==================================================Ascend
============================= test session starts ==============================
platform linux -- Python 3.9.19, pytest-6.2.5, py-1.11.0, pluggy-1.5.0
rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/mint, configfile: ../../../../../../sault/virtual_test/virtualenv_006/sault/config/pytest.ini
plugins: mock-3.14.0, hydra-core-1.3.2, forked-1.6.0, anyio-4.9.0, xdist-1.32.0
collected 90 items

test_functional_mul.py .
[hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x0-pynative]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x0-pynative],max_mem:2.0M
TotalTime = 8.14049, [24]
  [bootstrap]: 0.00073209
  [type_inference]: 0.120967
  [event_method]: 1.603e-05
  [auto_monad]: 0.00010071
  [graph_reusing]: 8.08001e-06
  [inline]: 2.42001e-06
  [add_attr]: 0.178431, [1]
    [add_attr_with_inline]: 0.178415, [1]
      [Cycle 1]: 0.00011165, [2]
        [tag_attr]: 2.65e-05
        [meta_addattr_fg_expand]: 5.61998e-06
  [parallel-infer-symbol]: 3.85e-06
  [pre_auto_parallel]: 5.241e-05
  [insert-virtual-dataset]: 2.93e-06
  [parallel-infer-symbol-second]: 7.50006e-07
  [dataset_repeat_opt]: 2.78e-06
  [pipeline_split]: 2.05002e-06
  [optimize]: 0.00579289, [53]
    [py_interpret_to_execute]: 3.499e-05
    [rewriter_before_opt_a]: 9.745e-05
    [opt_a]: 0.00310075, [2]
      [Cycle 1]: 0.00238085, [45]
        [expand_dump_flag]: 3.58999e-06
        [switch_simplify]: 3.92e-05
        [loop_unroll]: 2.13e-05
        [a_1]: 0.00053188
        [with_stream_mark]: 2.025e-05
        [recompute_prepare]: 9.41e-06
        [updatestate_depend_eliminate]: 5.28002e-06
        [updatestate_assign_eliminate]: 3.74002e-06
        [updatestate_loads_eliminate]: 3.28e-06
        [parameter_eliminate]: 1.92999e-06
        [a_2]: 9.752e-05
        [accelerated_algorithm]: 8.45001e-06
        [shard]: 2.31e-06
        [meta_shard_fg_expand]: 2.20002e-06
        [shard_inline]: 6.17001e-06
        [merge_send_recv]: 1.069e-05
        [auto_parallel]: 7.83999e-06
        [parallel]: 5.74e-05
        [flash_sp]: 1.169e-05
        [merge_comm]: 4.39998e-06
        [allreduce_fusion]: 3.65e-06
        [matmul_add_comm_reduction]: 1.141e-05
        [allreduce_slice_to_reducescatter]: 8.90024e-07
        [virtual_shard_identity]: 1.021e-05
        [virtual_dataset]: 6.56999e-06
        [get_grad_eliminate_]: 6.14001e-06
        [virtual_output]: 6.53e-06
        [merge_forward]: 4.53999e-06
        [cell_reuse_recompute_pass]: 1.64e-06
        [offload_activation]: 1.243e-05
        [cell_reuse_handle_not_recompute_node_pass]: 2.563e-05
        [merge_recompute_call_nodes]: 2.37001e-06
        [before_grad]: 1.166e-05
        [set_forward_comm_id_for_comm_node_pass]: 4.12998e-06
        [meta_fg_expand]: 3.44001e-06
        [flash_sp_send_recv_attached]: 2.89001e-06
        [receive_attached]: 2.58e-06
        [after_resolve]: 1.056e-05
        [a_after_grad]: 9.31e-06
        [renormalize]: 0.00095433
        [add_forward_monad_depend]: 1.239e-05
        [auto_monad_grad]: 1.539e-05
        [auto_monad_eliminator]: 2.107e-05
        [cse]: 6.306e-05
        [a_3]: 5.383e-05
      [Cycle 2]: 0.00070482, [45]
        [expand_dump_flag]: 2.14e-06
        [switch_simplify]: 7.42002e-06
        [loop_unroll]: 6.42001e-06
        [a_1]: 0.0001324
        [with_stream_mark]: 1.674e-05
        [recompute_prepare]: 7.4e-06
        [updatestate_depend_eliminate]: 3.83001e-06
        [updatestate_assign_eliminate]: 3.33e-06
        [updatestate_loads_eliminate]: 2.61999e-06
        [parameter_eliminate]: 1.19003e-06
        [a_2]: 7.407e-05
        [accelerated_algorithm]: 6.21e-06
        [shard]: 1.50999e-06
        [meta_shard_fg_expand]: 2.14e-06
        [shard_inline]: 5.74999e-06
        [merge_send_recv]: 1.113e-05
        [auto_parallel]: 8.48999e-06
        [parallel]: 6.88e-06
        [flash_sp]: 3.66001e-06
        [merge_comm]: 4.42998e-06
        [allreduce_fusion]: 4.09997e-06
        [matmul_add_comm_reduction]: 8.13001e-06
        [allreduce_slice_to_reducescatter]: 6.19999e-07
        [virtual_shard_identity]: 8.59e-06
        [virtual_dataset]: 5.79e-06
        [get_grad_eliminate_]: 5.73002e-06
        [virtual_output]: 5.24e-06
        [merge_forward]: 3.95e-06
        [cell_reuse_recompute_pass]: 2.67001e-06
        [offload_activation]: 9.86e-06
        [cell_reuse_handle_not_recompute_node_pass]: 1.277e-05
        [merge_recompute_call_nodes]: 9.70002e-07
        [before_grad]: 9.49e-06
        [set_forward_comm_id_for_comm_node_pass]: 3.9e-06
        [meta_fg_expand]: 2.91e-06
        [flash_sp_send_recv_attached]: 1.12999e-06
        [receive_attached]: 1.69e-06
        [after_resolve]: 8.91002e-06
        [a_after_grad]: 8.62e-06
        [renormalize]: 1.00001e-07
        [add_forward_monad_depend]: 1.62999e-06
        [auto_monad_grad]: 1.35999e-06
        [auto_monad_eliminator]: 8.65001e-06
        [cse]: 1.918e-05
        [a_3]: 3.596e-05
    [py_interpret_to_execute_after_opt_a]: 1.649e-05
    [slice_cell_reuse_recomputed_activation]: 2.77002e-06
    [rewriter_after_opt_a]: 7.188e-05
    [convert_after_rewriter]: 9.62999e-06
    [order_py_execute_after_rewriter]: 6.12999e-06
    [mutable_eliminate]: 0.00077464
    [opt_b]: 0.00022114, [1]
      [Cycle 1]: 0.00021229, [7]
        [b_1]: 0.00012146
        [b_2]: 8.74e-06
        [updatestate_depend_eliminate]: 8.80001e-06
        [updatestate_assign_eliminate]: 3.06001e-06
        [updatestate_loads_eliminate]: 2.39001e-06
        [renormalize]: 5.69999e-07
        [cse]: 2.632e-05
    [optimize_parallel_all_gather_comm]: 2.097e-05
    [overlap_param_gather]: 5.79999e-06
    [cconv]: 3.268e-05
    [loop_unroll]: 0.00053968
    [opt_after_cconv]: 0.00011964, [1]
      [Cycle 1]: 0.00011038, [7]
        [c_1]: 2.869e-05
        [parameter_eliminate]: 5.17999e-06
        [updatestate_depend_eliminate]: 7.52998e-06
        [updatestate_assign_eliminate]: 2.93e-06
        [updatestate_loads_eliminate]: 2.44001e-06
        [cse]: 2.345e-05
        [renormalize]: 8.79983e-07
    [remove_dup_value]: 2.096e-05
    [tuple_transform]: 7.936e-05, [1]
      [Cycle 1]: 7.368e-05, [4]
        [d_1]: 4.371e-05
        [none_parameter_eliminate]: 1.69e-06
        [renormalize]: 2.19996e-07
        [switch_simplify]: 6.94999e-06
    [partial_unused_args_eliminate]: 2.02001e-06
    [add_recomputation]: 8.036e-05
    [cse_after_recomputation]: 2.632e-05, [1]
      [Cycle 1]: 2.089e-05, [1]
        [cse]: 1.473e-05
    [environ_conv]: 2.055e-05
    [swap_dp_allreduce_reducescatter]: 5.61e-06
    [bias_add_comm_swap]: 3.48e-06
    [label_micro_interleaved_index]: 5.25001e-06
    [label_fine_grained_interleaved_index]: 3.46001e-06
    [merge_cast_opt]: 1.32e-06
    [slice_recompute_activation]: 2.21998e-06
    [micro_interleaved_order_control]: 3.41001e-06
    [assign_add_opt]: 1.92999e-06
    [ForceFp32Comm]: 8.39995e-07
    [remove_cast_before_assign_add]: 1.74e-06
    [full_micro_interleaved_order_control]: 2.59001e-06
    [reorder_send_recv_between_fp_bp]: 3.52002e-06
    [comm_op_add_attrs]: 1.72999e-06
    [add_comm_op_reuse_tag]: 1.11002e-06
    [interleave_split_concat_branches]: 1.45999e-06
    [interleave_parallel_branches]: 1.24003e-06
    [overlap_opt_shard_in_pipeline]: 2.661e-05
    [overlap_opt_shard_grad_in_pipeline]: 2.09e-06
    [control_data_broadcast_order]: 1.703e-05
    [grouped_pairwise_exchange_alltoall]: 1.77001e-06
    [offloading_packed_experts]: 4.53999e-06
    [overlap_recompute_and_grad_model_parallel]: 5.45001e-06
    [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06
    [overlap_recompute_allgather_and_fa_grad]: 1.57999e-06
    [overlap_recompute_comm]: 2.93e-06
    [overlap_grad_ring_attention]: 4.47998e-06
    [overlap_grad_flash_sp]: 4.003e-05
    [begin_end_overlap_inline]: 5.99975e-07
    [split_matmul_comm_elemetwise]: 3.23e-06
    [split_layernorm_comm]: 2.46e-06
    [handle_group_info]: 1.19e-06
    [symbol_engine_optimizer]: 9.359e-05, [1]
      [Cycle 1]: 8.745e-05, [6]
        [build]: 5.39e-06
        [elim_shapecalc]: 1.35e-05
        [elim_not_effective]: 1.426e-05
        [opt_reshape]: 7.20998e-06
        [fold_const_symbol]: 1.101e-05
        [renormalize]: 2.69996e-07
  [detach_backward]: 2.82002e-06
  [pipeline_parallel_scheduler]: 1.77999e-06
  [auto_monad_reorder]: 2.73e-05
  [get_jit_bprop_graph]: 1.91998e-06
  [rewriter_after_jit_bprop_graph]: 0.00015181
  [opt_after_jit_grad]: 0.00064229
  [validate]: 7.259e-05
  [backend_pass]: 1.30999e-06
  [task_emit]: 7.83305
  [execute]: 2.245e-05
Sums
  bootstrap : 0.000732s : 0.01%
  type_inference : 0.120967s : 1.52%
  event_method : 0.000016s : 0.00%
  auto_monad : 0.000101s : 0.00%
  graph_reusing : 0.000008s : 0.00%
  inline : 0.000002s : 0.00%
  add_attr.add_attr_with_inline.tag_attr : 0.000026s : 0.00%
  add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000006s : 0.00%
  parallel-infer-symbol : 0.000004s : 0.00%
  pre_auto_parallel : 0.000052s : 0.00%
  insert-virtual-dataset : 0.000003s : 0.00%
  parallel-infer-symbol-second : 0.000001s : 0.00%
  dataset_repeat_opt : 0.000003s : 0.00%
  pipeline_split : 0.000002s : 0.00%
  optimize.py_interpret_to_execute : 0.000035s : 0.00%
  optimize.rewriter_before_opt_a : 0.000097s : 0.00%
  optimize.opt_a.expand_dump_flag : 0.000006s : 0.00%
  optimize.opt_a.switch_simplify : 0.000047s : 0.00%
  optimize.opt_a.loop_unroll : 0.000028s : 0.00%
  optimize.opt_a.a_1 : 0.000664s : 0.01%
  optimize.opt_a.with_stream_mark : 0.000037s : 0.00%
  optimize.opt_a.recompute_prepare : 0.000017s : 0.00%
  optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.00%
  optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00%
  optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00%
  optimize.opt_a.parameter_eliminate : 0.000003s : 0.00%
  optimize.opt_a.a_2 : 0.000172s : 0.00%
  optimize.opt_a.accelerated_algorithm : 0.000015s : 0.00%
  optimize.opt_a.shard : 0.000004s : 0.00%
  optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00%
  optimize.opt_a.shard_inline : 0.000012s : 0.00%
  optimize.opt_a.merge_send_recv : 0.000022s : 0.00%
  optimize.opt_a.auto_parallel : 0.000016s : 0.00%
  optimize.opt_a.parallel : 0.000064s : 0.00%
  optimize.opt_a.flash_sp : 0.000015s : 0.00%
  optimize.opt_a.merge_comm : 0.000009s : 0.00%
  optimize.opt_a.allreduce_fusion : 0.000008s : 0.00%
  optimize.opt_a.matmul_add_comm_reduction : 0.000020s : 0.00%
  optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00%
  optimize.opt_a.virtual_shard_identity : 0.000019s : 0.00%
  optimize.opt_a.virtual_dataset : 0.000012s : 0.00%
  optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00%
  optimize.opt_a.virtual_output : 0.000012s : 0.00%
  optimize.opt_a.merge_forward : 0.000008s : 0.00%
  optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00%
  optimize.opt_a.offload_activation : 0.000022s : 0.00%
  optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000038s : 0.00%
  optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00%
  optimize.opt_a.before_grad : 0.000021s : 0.00%
  optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00%
  optimize.opt_a.meta_fg_expand : 0.000006s : 0.00%
  optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00%
  optimize.opt_a.receive_attached : 0.000004s : 0.00%
  optimize.opt_a.after_resolve : 0.000019s : 0.00%
  optimize.opt_a.a_after_grad : 0.000018s : 0.00%
  optimize.opt_a.renormalize : 0.000954s : 0.01%
  optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.00%
  optimize.opt_a.auto_monad_grad : 0.000017s : 0.00%
  optimize.opt_a.auto_monad_eliminator : 0.000030s : 0.00%
  optimize.opt_a.cse : 0.000082s : 0.00%
  optimize.opt_a.a_3 : 0.000090s : 0.00%
  optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.00%
  optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00%
  optimize.rewriter_after_opt_a : 0.000072s : 0.00%
  optimize.convert_after_rewriter : 0.000010s : 0.00%
  optimize.order_py_execute_after_rewriter : 0.000006s : 0.00%
  optimize.mutable_eliminate : 0.000775s : 0.01%
  optimize.opt_b.b_1 : 0.000121s : 0.00%
  optimize.opt_b.b_2 : 0.000009s : 0.00%
  optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00%
  optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00%
  optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00%
  optimize.opt_b.renormalize : 0.000001s : 0.00%
  optimize.opt_b.cse : 0.000026s : 0.00%
  optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00%
  optimize.overlap_param_gather : 0.000006s : 0.00%
  optimize.cconv : 0.000033s : 0.00%
  optimize.loop_unroll : 0.000540s : 0.01%
  optimize.opt_after_cconv.c_1 : 0.000029s : 0.00%
  optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00%
  optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00%
  optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00%
  optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00%
  optimize.opt_after_cconv.cse : 0.000023s : 0.00%
  optimize.opt_after_cconv.renormalize : 0.000001s : 0.00%
  optimize.remove_dup_value : 0.000021s : 0.00%
  optimize.tuple_transform.d_1 : 0.000044s : 0.00%
  optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00%
  optimize.tuple_transform.renormalize : 0.000000s : 0.00%
  optimize.tuple_transform.switch_simplify : 0.000007s : 0.00%
  optimize.partial_unused_args_eliminate : 0.000002s : 0.00%
  optimize.add_recomputation : 0.000080s : 0.00%
  optimize.cse_after_recomputation.cse : 0.000015s : 0.00%
  optimize.environ_conv : 0.000021s : 0.00%
  optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00%
  optimize.bias_add_comm_swap : 0.000003s : 0.00%
  optimize.label_micro_interleaved_index : 0.000005s : 0.00%
  optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00%
  optimize.merge_cast_opt : 0.000001s : 0.00%
  optimize.slice_recompute_activation : 0.000002s : 0.00%
  optimize.micro_interleaved_order_control : 0.000003s : 0.00%
  optimize.assign_add_opt : 0.000002s : 0.00%
  optimize.ForceFp32Comm : 0.000001s : 0.00%
  optimize.remove_cast_before_assign_add : 0.000002s : 0.00%
  optimize.full_micro_interleaved_order_control : 0.000003s : 0.00%
  optimize.reorder_send_recv_between_fp_bp : 0.000004s : 0.00%
  optimize.comm_op_add_attrs : 0.000002s : 0.00%
  optimize.add_comm_op_reuse_tag : 0.000001s : 0.00%
  optimize.interleave_split_concat_branches : 0.000001s : 0.00%
  optimize.interleave_parallel_branches : 0.000001s : 0.00%
  optimize.overlap_opt_shard_in_pipeline : 0.000027s : 0.00%
  optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00%
  optimize.control_data_broadcast_order : 0.000017s : 0.00%
  optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00%
  optimize.offloading_packed_experts : 0.000005s : 0.00%
  optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00%
  optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00%
  optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00%
  optimize.overlap_recompute_comm : 0.000003s : 0.00%
  optimize.overlap_grad_ring_attention : 0.000004s : 0.00%
  optimize.overlap_grad_flash_sp : 0.000040s : 0.00%
  optimize.begin_end_overlap_inline : 0.000001s : 0.00%
  optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00%
  optimize.split_layernorm_comm : 0.000002s : 0.00%
  optimize.handle_group_info : 0.000001s : 0.00%
  optimize.symbol_engine_optimizer.build : 0.000005s : 0.00%
  optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00%
  optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.00%
  optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00%
  optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00%
  optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00%
  detach_backward : 0.000003s : 0.00%
  pipeline_parallel_scheduler : 0.000002s : 0.00%
  auto_monad_reorder : 0.000027s : 0.00%
  get_jit_bprop_graph : 0.000002s : 0.00%
  rewriter_after_jit_bprop_graph : 0.000152s : 0.00%
  opt_after_jit_grad : 0.000642s : 0.01%
  validate : 0.000073s : 0.00%
  backend_pass : 0.000001s : 0.00%
  task_emit : 7.833053s : 98.40%
  execute : 0.000022s : 0.00%
Time group info:
------[substitution.] 0.000245 26
  17.83% : 0.000044s : 5: substitution.arithmetic_simplify
  0.85% : 0.000002s : 2: substitution.elim_not_effective
  0.85% : 0.000002s : 2: substitution.fold_const_symbol
  2.77% : 0.000007s : 3: substitution.graph_param_transform
  67.85% : 0.000166s : 3: substitution.inline
  1.67% : 0.000004s : 4: substitution.j_node_and_user_rematch
  2.34% : 0.000006s : 4: substitution.remove_not_recompute_node
  1.69% : 0.000004s : 2: substitution.replace_old_param
  4.15% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.120897 2
  99.36% : 0.120121s : 1: type_inference.infer
  0.64% : 0.000776s : 1: type_inference.specialize
------[replace.] 0.000046 4
  81.58% : 0.000038s : 3: replace.inline
  18.42% : 0.000009s : 1: replace.tuple_list_get_item_eliminator
------[match.] 0.000174 4
  94.60% : 0.000164s : 3: match.inline
  5.40% : 0.000009s : 1: match.tuple_list_get_item_eliminator
------[predicate.] 0.000193 883
  0.94% : 0.000002s : 9: predicate.accumulaten_eliminater
  1.45% : 0.000003s : 3: predicate.ad_related_special_op_eliminate
  0.69% : 0.000001s : 6: predicate.addn_check_dump
  0.81% : 0.000002s : 9: predicate.addn_zero_filter
  0.75% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add
  2.05% : 0.000004s : 15: predicate.arithmetic_simplify
  0.85% : 0.000002s : 9: predicate.cast_eliminate
  0.63% : 0.000001s : 6: predicate.check_bprop_eliminate
  0.57% : 0.000001s : 6: predicate.compare_switch_simplify
  0.16% : 0.000000s : 3: predicate.const_output_eliminate
  0.64% : 0.000001s : 6: predicate.depend_value_elim
  1.04% : 0.000002s : 9: predicate.dict_get_item_const_eliminator
  0.88% : 0.000002s : 9: predicate.dict_get_item_eliminator
  0.85% : 0.000002s : 9: predicate.dict_set_item_eliminator
  1.67% : 0.000003s : 6: predicate.dumpgradient_eliminate
  0.21% : 0.000000s : 3: predicate.elim_not_effective
  0.73% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs
  1.25% : 0.000002s : 12: predicate.environ_add_const_eliminate
  0.95% : 0.000002s : 12: predicate.environ_get_add_eliminate
  1.00% : 0.000002s : 12: predicate.environ_get_depend_swap
  1.70% : 0.000003s : 18: predicate.environ_get_eliminate
  1.05% : 0.000002s : 12: predicate.environ_get_set_eliminate
  1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value
  2.01% : 0.000004s : 13: predicate.float_depend_g_call
  0.64% : 0.000001s : 6: predicate.float_environ_get_switch
  0.97% : 0.000002s : 9: predicate.float_tuple_getitem_switch
  0.17% : 0.000000s : 3: predicate.fold_const_symbol
  0.66% : 0.000001s : 6: predicate.get_grad_eliminate
  0.31% : 0.000001s : 3: predicate.graph_param_transform
  0.54% : 0.000001s : 6: predicate.incorporate_call
  0.47% : 0.000001s : 6: predicate.incorporate_call_switch
  5.92% : 0.000011s : 40: predicate.inline
  0.76% : 0.000001s : 6: predicate.inline_without_move
  0.35% : 0.000001s : 6: predicate.j_node_and_user_rematch
  1.02% : 0.000002s : 6: predicate.less_batch_normalization
  1.45% : 0.000003s : 16: predicate.list_to_tuple_eliminator_
  2.06% : 0.000004s : 25: predicate.load_eliminater
  1.72% : 0.000003s : 3: predicate.loop_unroll_after_grad
  1.91% : 0.000004s : 21: predicate.loop_unroll_before_grad
  1.64% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator
  0.68% : 0.000001s : 6: predicate.merge_addn
  0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace
  0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace
  0.81% : 0.000002s : 9: predicate.minmaximum_grad
  2.39% : 0.000005s : 3: predicate.mutable_eliminate
  0.69% : 0.000001s : 3: predicate.opt_reshape
  0.42% : 0.000001s : 3: predicate.parallel_virtual_node
  1.44% : 0.000003s : 13: predicate.partial_defer_inline
  1.23% : 0.000002s : 13: predicate.partial_eliminate
  0.81% : 0.000002s : 9: predicate.print_const_string_wrapper
  0.62% : 0.000001s : 6: predicate.reduce_all_const_elim
  1.16% : 0.000002s : 9: predicate.reduce_eliminate
  2.14% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater
  0.48% : 0.000001s : 6: predicate.remove_not_recompute_node
  1.20% : 0.000002s : 16: predicate.replace_applicator
  0.55% : 0.000001s : 6: predicate.replace_old_param
  0.36% : 0.000001s : 3: predicate.reset_defer_inline
  1.10% : 0.000002s : 9: predicate.reshape_eliminate
  0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like
  0.43% : 0.000001s : 3: predicate.row_tensor_eliminate
  1.30% : 0.000003s : 6: predicate.same_eliminate
  0.39% : 0.000001s : 6: predicate.set_cell_output_no_recompute
  1.40% : 0.000003s : 6: predicate.shard_identity_eliminate
  0.88% : 0.000002s : 6: predicate.special_op_eliminate
  0.82% : 0.000002s : 6: predicate.specialize_transform
  1.26% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value
  0.72% : 0.000001s : 6: predicate.stack_unstack_eliminate
  0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater
  1.18% : 0.000002s : 13: predicate.switch_defer_inline
  1.71% : 0.000003s : 19: predicate.switch_layer_defer_inline
  4.56% : 0.000009s : 43: predicate.switch_simplify
  0.85% : 0.000002s : 9: predicate.tile_eliminate
  1.08% : 0.000002s : 9: predicate.transpose_eliminate
  1.51% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive
  1.74% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator
  1.38% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder
  2.87% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator
  1.28% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator
  2.06% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator
  1.70% : 0.000003s : 16: predicate.tuple_to_list_eliminator_
  2.08% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater
  3.06% : 0.000006s : 31: predicate.updatestate_useless_node_eliminater
  0.37% : 0.000001s : 3: predicate.value_based_eliminate
  0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate
  0.61% : 0.000001s : 6: predicate.virtual_output_eliminate
  0.29% : 0.000001s : 3: predicate.virtual_view_grad_eliminate
  0.69% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000567 8
  45.82% : 0.000260s : 3: func_graph_cloner_run.FuncGraphClonerGraph
  54.18% : 0.000307s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.] 8.326896 196
  0.00% : 0.000004s : 1: ForceFp32Comm
  2.14% : 0.178439s : 1: add_attr
  2.14% : 0.178420s : 1: add_attr_with_inline
  0.00% : 0.000004s : 1: add_comm_op_reuse_tag
  0.00% : 0.000085s : 1: add_recomputation
  0.00% : 0.000005s : 1: assign_add_opt
  0.00% : 0.000107s : 1: auto_monad
  0.00% : 0.000033s : 1: auto_monad_reorder
  0.00% : 0.000007s : 1: backend_pass
  0.00% : 0.000004s : 1: begin_end_overlap_inline
  0.00% : 0.000007s : 1: bias_add_comm_swap
  0.01% : 0.000770s : 1: bootstrap
  0.00% : 0.000037s : 1: cconv
  0.00% : 0.000005s : 1: comm_op_add_attrs
  0.00% : 0.000021s : 1: control_data_broadcast_order
  0.00% : 0.000013s : 1: convert_after_rewriter
  0.00% : 0.000029s : 1: cse_after_recomputation
  0.00% : 0.000006s : 1: dataset_repeat_opt
  0.00% : 0.000006s : 1: detach_backward
  0.00% : 0.000024s : 1: environ_conv
  0.00% : 0.000024s : 1: event_method
  0.00% : 0.000054s : 1: execute
  0.00% : 0.000006s : 1: full_micro_interleaved_order_control
  0.00% : 0.000006s : 1: get_jit_bprop_graph
  0.00% : 0.000012s : 1: graph_reusing
  0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall
  0.00% : 0.000004s : 1: handle_group_info
  0.00% : 0.000006s : 1: inline
  0.00% : 0.000007s : 1: insert-virtual-dataset
  0.00% : 0.000004s : 1: interleave_parallel_branches
  0.00% : 0.000004s : 1: interleave_split_concat_branches
  0.00% : 0.000007s : 1: label_fine_grained_interleaved_index
  0.00% : 0.000008s : 1: label_micro_interleaved_index
  0.01% : 0.000551s : 1: loop_unroll
  0.00% : 0.000004s : 1: merge_cast_opt
  0.00% : 0.000006s : 1: micro_interleaved_order_control
  0.01% : 0.000789s : 1: mutable_eliminate
  0.00% : 0.000008s : 1: offloading_packed_experts
  0.00% : 0.000019s : 1: opt.transform.loop_unroll_optimizer
  0.00% : 0.000021s : 1: opt.transform.mutable_eliminate
  0.01% : 0.001093s : 78: opt.transform.opt_a
  0.00% : 0.000026s : 1: opt.transform.opt_after_cconv
  0.00% : 0.000032s : 1: opt.transform.opt_after_jit_grad
  0.00% : 0.000098s : 28: opt.transform.opt_b
  0.00% : 0.000048s : 2: opt.transform.opt_trans_graph
  0.00% : 0.000041s : 4: opt.transform.symbol_engine_opt
  0.04% : 0.003105s : 1: opt_a
  0.00% : 0.000123s : 1: opt_after_cconv
  0.01% : 0.000660s : 1: opt_after_jit_grad
  0.00% : 0.000225s : 1: opt_b
  0.07% : 0.005799s : 1: optimize
  0.00% : 0.000025s : 1: optimize_parallel_all_gather_comm
  0.00% : 0.000009s : 1: order_py_execute_after_rewriter
  0.00% : 0.000045s : 1: overlap_grad_flash_sp
  0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce
  0.00% : 0.000009s : 1: overlap_grad_ring_attention
  0.00% : 0.000006s : 1: overlap_opt_shard_grad_in_pipeline
  0.00% : 0.000031s : 1: overlap_opt_shard_in_pipeline
  0.00% : 0.000009s : 1: overlap_param_gather
  0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad
  0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel
  0.00% : 0.000006s : 1: overlap_recompute_comm
  0.00% : 0.000008s : 1: parallel-infer-symbol
  0.00% : 0.000004s : 1: parallel-infer-symbol-second
  0.00% : 0.000005s : 1: partial_unused_args_eliminate
  0.00% : 0.000005s : 1: pipeline_parallel_scheduler
  0.00% : 0.000005s : 1: pipeline_split
  0.00% : 0.000057s : 1: pre_auto_parallel
  0.00% : 0.000039s : 1: py_interpret_to_execute
  0.00% : 0.000021s : 1: py_interpret_to_execute_after_opt_a
  0.00% : 0.000006s : 1: remove_cast_before_assign_add
  0.00% : 0.000025s : 1: remove_dup_value
  0.01% : 0.000448s : 1: renormalize.infer
  0.01% : 0.000497s : 1: renormalize.specialize
  0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp
  0.00% : 0.000215s : 1: rewriter_after_jit_bprop_graph
  0.00% : 0.000077s : 1: rewriter_after_opt_a
  0.00% : 0.000102s : 1: rewriter_before_opt_a
  0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation
  0.00% : 0.000005s : 1: slice_recompute_activation
  0.00% : 0.000005s : 1: split_layernorm_comm
  0.00% : 0.000006s : 1: split_matmul_comm_elemetwise
  0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter
  0.00% : 0.000096s : 1: symbol_engine_optimizer
  94.07% : 7.833084s : 1: task_emit
  0.00% : 0.000082s : 1: tuple_transform
  1.45% : 0.120988s : 1: type_inference
  0.00% : 0.000117s : 1: validate

TotalTime = 0.317116, [24]
  [bootstrap]: 0.00054759
  [type_inference]: 0.0159324
  [event_method]: 1.911e-05
  [auto_monad]: 7.648e-05
  [graph_reusing]: 6.38998e-06
  [inline]: 4.23999e-06
  [add_attr]: 0.284322, [1]
    [add_attr_with_inline]: 0.284307, [1]
      [Cycle 1]: 8.033e-05, [2]
        [tag_attr]: 2.233e-05
        [meta_addattr_fg_expand]: 4.70001e-06
  [parallel-infer-symbol]: 4.06001e-06
  [pre_auto_parallel]: 3.796e-05
  [insert-virtual-dataset]: 2.81e-06
  [parallel-infer-symbol-second]: 1.14998e-06
  [dataset_repeat_opt]: 3.21999e-06
  [pipeline_split]: 1.84e-06
  [optimize]: 0.00619383, [53]
    [py_interpret_to_execute]: 3.322e-05
    [rewriter_before_opt_a]: 7.607e-05
    [opt_a]: 0.00330558, [2]
      [Cycle 1]: 0.00244241, [45]
        [expand_dump_flag]: 4.63999e-06
        [switch_simplify]: 3.455e-05
        [loop_unroll]: 1.868e-05
        [a_1]: 0.00046669
        [with_stream_mark]: 2.683e-05
        [recompute_prepare]: 1.368e-05
        [updatestate_depend_eliminate]: 6.41998e-06
        [updatestate_assign_eliminate]: 4.08001e-06
        [updatestate_loads_eliminate]: 3.66001e-06
        [parameter_eliminate]: 2.98e-06
        [a_2]: 9.568e-05
        [accelerated_algorithm]: 8.94e-06
        [shard]: 3.10002e-06
        [meta_shard_fg_expand]: 2.73e-06
        [shard_inline]: 7.65998e-06
        [merge_send_recv]: 1.116e-05
        [auto_parallel]: 1.11e-05
        [parallel]: 5.969e-05
        [flash_sp]: 1.523e-05
        [merge_comm]: 6.46e-06
        [allreduce_fusion]: 3.65e-06
        [matmul_add_comm_reduction]: 1.255e-05
        [allreduce_slice_to_reducescatter]: 8.00006e-07
        [virtual_shard_identity]: 1.739e-05
        [virtual_dataset]: 8.23999e-06
        [get_grad_eliminate_]: 7.63001e-06
        [virtual_output]: 6.98e-06
        [merge_forward]: 6.16e-06
        [cell_reuse_recompute_pass]: 2.61999e-06
        [offload_activation]: 1.296e-05
        [cell_reuse_handle_not_recompute_node_pass]: 1.873e-05
        [merge_recompute_call_nodes]: 1.97999e-06
        [before_grad]: 1.4e-05
        [set_forward_comm_id_for_comm_node_pass]: 5.52001e-06
        [meta_fg_expand]: 3.26001e-06
        [flash_sp_send_recv_attached]: 3.63999e-06
        [receive_attached]: 2.79001e-06
        [after_resolve]: 1.467e-05
        [a_after_grad]: 9.47001e-06
        [renormalize]: 0.00101011
        [add_forward_monad_depend]: 1.657e-05
        [auto_monad_grad]: 3.26001e-06
        [auto_monad_eliminator]: 2.401e-05
        [cse]: 3.669e-05
        [a_3]: 5.886e-05
      [Cycle 2]: 0.00084512, [45]
        [expand_dump_flag]: 3.31999e-06
        [switch_simplify]: 1.019e-05
        [loop_unroll]: 6.30002e-06
        [a_1]: 0.0001454
        [with_stream_mark]: 2.108e-05
        [recompute_prepare]: 8.72e-06
        [updatestate_depend_eliminate]: 5.77999e-06
        [updatestate_assign_eliminate]: 3.73001e-06
        [updatestate_loads_eliminate]: 3.85e-06
        [parameter_eliminate]: 2.49001e-06
        [a_2]: 7.951e-05
        [accelerated_algorithm]: 7.78001e-06
        [shard]: 2.61e-06
        [meta_shard_fg_expand]: 2.35002e-06
        [shard_inline]: 1.343e-05
        [merge_send_recv]: 1.006e-05
        [auto_parallel]: 1.06e-05
        [parallel]: 9.79e-06
        [flash_sp]: 4.17e-06
        [merge_comm]: 5.10001e-06
        [allreduce_fusion]: 4.23999e-06
        [matmul_add_comm_reduction]: 9.96e-06
        [allreduce_slice_to_reducescatter]: 6.69999e-07
        [virtual_shard_identity]: 1.137e-05
        [virtual_dataset]: 6.00002e-06
        [get_grad_eliminate_]: 6.12001e-06
        [virtual_output]: 6.06e-06
        [merge_forward]: 6.31e-06
        [cell_reuse_recompute_pass]: 3.91999e-06
        [offload_activation]: 1.259e-05
        [cell_reuse_handle_not_recompute_node_pass]: 1.585e-05
        [merge_recompute_call_nodes]: 1.52001e-06
        [before_grad]: 1.195e-05
        [set_forward_comm_id_for_comm_node_pass]: 5.00999e-06
        [meta_fg_expand]: 3.26999e-06
        [flash_sp_send_recv_attached]: 1.77999e-06
        [receive_attached]: 2.43002e-06
        [after_resolve]: 1.621e-05
        [a_after_grad]: 1.006e-05
        [renormalize]: 7.99773e-08
        [add_forward_monad_depend]: 3.38e-06
        [auto_monad_grad]: 2.49999e-06
        [auto_monad_eliminator]: 1.29e-05
        [cse]: 2.534e-05
        [a_3]: 4.031e-05
    [py_interpret_to_execute_after_opt_a]: 1.796e-05
    [slice_cell_reuse_recomputed_activation]: 2.66e-06
    [rewriter_after_opt_a]: 5.169e-05
    [convert_after_rewriter]: 7.66999e-06
    [order_py_execute_after_rewriter]: 6.02001e-06
    [mutable_eliminate]: 0.00085174
    [opt_b]: 0.00025136, [1]
      [Cycle 1]: 0.00023916, [7]
        [b_1]: 0.00012844
        [b_2]: 1.06e-05
        [updatestate_depend_eliminate]: 1.223e-05
        [updatestate_assign_eliminate]: 3.65998e-06
        [updatestate_loads_eliminate]: 3.4e-06
        [renormalize]: 1.34e-06
        [cse]: 3.621e-05
    [optimize_parallel_all_gather_comm]: 2.407e-05
    [overlap_param_gather]: 2.24999e-06
    [cconv]: 4.043e-05
    [loop_unroll]: 0.00062484
    [opt_after_cconv]: 0.00013477, [1]
      [Cycle 1]: 0.00012482, [7]
        [c_1]: 3.078e-05
        [parameter_eliminate]: 6.64999e-06
        [updatestate_depend_eliminate]: 8.97e-06
        [updatestate_assign_eliminate]: 3.33e-06
        [updatestate_loads_eliminate]: 2.74001e-06
        [cse]: 2.979e-05
        [renormalize]: 9.20001e-07
    [remove_dup_value]: 1.982e-05
    [tuple_transform]: 0.00011149, [1]
      [Cycle 1]: 0.00010536, [4]
        [d_1]: 6.897e-05
        [none_parameter_eliminate]: 2.45002e-06
        [renormalize]: 2.19996e-07
        [switch_simplify]: 9.54e-06
    [partial_unused_args_eliminate]: 1.99e-06
    [add_recomputation]: 6.671e-05
    [cse_after_recomputation]: 3.143e-05, [1]
      [Cycle 1]: 2.475e-05, [1]
        [cse]: 1.739e-05
    [environ_conv]: 7.65998e-06
    [swap_dp_allreduce_reducescatter]: 5.91998e-06
    [bias_add_comm_swap]: 4.13999e-06
    [label_micro_interleaved_index]: 6.63e-06
    [label_fine_grained_interleaved_index]: 3.08e-06
    [merge_cast_opt]: 1.35999e-06
    [slice_recompute_activation]: 2.32999e-06
    [micro_interleaved_order_control]: 2.61999e-06
    [assign_add_opt]: 1.76998e-06
    [ForceFp32Comm]: 1.17e-06
    [remove_cast_before_assign_add]: 1.10001e-06
    [full_micro_interleaved_order_control]: 2.56e-06
    [reorder_send_recv_between_fp_bp]: 2.80997e-06
    [comm_op_add_attrs]: 1.25001e-06
    [add_comm_op_reuse_tag]: 1.15001e-06
    [interleave_split_concat_branches]: 1.55001e-06
    [interleave_parallel_branches]: 1.10001e-06
    [overlap_opt_shard_in_pipeline]: 1.67999e-06
    [overlap_opt_shard_grad_in_pipeline]: 2.24001e-06
    [control_data_broadcast_order]: 1.909e-05
    [grouped_pairwise_exchange_alltoall]: 1.87001e-06
    [offloading_packed_experts]: 5.02999e-06
    [overlap_recompute_and_grad_model_parallel]: 5.79e-06
    [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06
    [overlap_recompute_allgather_and_fa_grad]: 1.49998e-06
    [overlap_recompute_comm]: 2.21e-06
    [overlap_grad_ring_attention]: 4.95999e-06
    [overlap_grad_flash_sp]: 2.578e-05
    [begin_end_overlap_inline]: 7.60017e-07
    [split_matmul_comm_elemetwise]: 2.73998e-06
    [split_layernorm_comm]: 2.11998e-06
    [handle_group_info]: 1.78002e-06
    [symbol_engine_optimizer]: 0.00011001, [1]
      [Cycle 1]: 0.00010168, [6]
        [build]: 5.92999e-06
        [elim_shapecalc]: 1.703e-05
        [elim_not_effective]: 1.849e-05
        [opt_reshape]: 9.07999e-06
        [fold_const_symbol]: 1.2e-05
        [renormalize]: 2.3999e-07
  [detach_backward]: 3.03e-06
  [pipeline_parallel_scheduler]: 1.89e-06
  [auto_monad_reorder]: 2.258e-05
  [get_jit_bprop_graph]: 2.76e-06
  [rewriter_after_jit_bprop_graph]: 8.59998e-06
  [opt_after_jit_grad]: 0.00071287
  [validate]: 5.713e-05
  [backend_pass]: 1.91e-06
  [task_emit]: 0.00883151
  [execute]: 1.159e-05
Sums
  bootstrap : 0.000548s : 1.74%
  type_inference : 0.015932s : 50.75%
  event_method : 0.000019s : 0.06%
  auto_monad : 0.000076s : 0.24%
  graph_reusing : 0.000006s : 0.02%
  inline : 0.000004s : 0.01%
  add_attr.add_attr_with_inline.tag_attr : 0.000022s : 0.07%
  add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01%
  parallel-infer-symbol : 0.000004s : 0.01%
  pre_auto_parallel : 0.000038s : 0.12%
  insert-virtual-dataset : 0.000003s : 0.01%
  parallel-infer-symbol-second : 0.000001s : 0.00%
  dataset_repeat_opt : 0.000003s : 0.01%
  pipeline_split : 0.000002s : 0.01%
  optimize.py_interpret_to_execute : 0.000033s : 0.11%
  optimize.rewriter_before_opt_a : 0.000076s : 0.24%
  optimize.opt_a.expand_dump_flag : 0.000008s : 0.03%
  optimize.opt_a.switch_simplify : 0.000045s : 0.14%
  optimize.opt_a.loop_unroll : 0.000025s : 0.08%
  optimize.opt_a.a_1 : 0.000612s : 1.95%
  optimize.opt_a.with_stream_mark : 0.000048s : 0.15%
  optimize.opt_a.recompute_prepare : 0.000022s : 0.07%
  optimize.opt_a.updatestate_depend_eliminate : 0.000012s : 0.04%
  optimize.opt_a.updatestate_assign_eliminate : 0.000008s : 0.02%
  optimize.opt_a.updatestate_loads_eliminate : 0.000008s : 0.02%
  optimize.opt_a.parameter_eliminate : 0.000005s : 0.02%
  optimize.opt_a.a_2 : 0.000175s : 0.56%
  optimize.opt_a.accelerated_algorithm : 0.000017s : 0.05%
  optimize.opt_a.shard : 0.000006s : 0.02%
  optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.02%
  optimize.opt_a.shard_inline : 0.000021s : 0.07%
  optimize.opt_a.merge_send_recv : 0.000021s : 0.07%
  optimize.opt_a.auto_parallel : 0.000022s : 0.07%
  optimize.opt_a.parallel : 0.000069s : 0.22%
  optimize.opt_a.flash_sp : 0.000019s : 0.06%
  optimize.opt_a.merge_comm : 0.000012s : 0.04%
  optimize.opt_a.allreduce_fusion : 0.000008s : 0.03%
  optimize.opt_a.matmul_add_comm_reduction : 0.000023s : 0.07%
  optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00%
  optimize.opt_a.virtual_shard_identity : 0.000029s : 0.09%
  optimize.opt_a.virtual_dataset : 0.000014s : 0.05%
  optimize.opt_a.get_grad_eliminate_ : 0.000014s : 0.04%
  optimize.opt_a.virtual_output : 0.000013s : 0.04%
  optimize.opt_a.merge_forward : 0.000012s : 0.04%
  optimize.opt_a.cell_reuse_recompute_pass : 0.000007s : 0.02%
  optimize.opt_a.offload_activation : 0.000026s : 0.08%
  optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000035s : 0.11%
  optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01%
  optimize.opt_a.before_grad : 0.000026s : 0.08%
  optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000011s : 0.03%
  optimize.opt_a.meta_fg_expand : 0.000007s : 0.02%
  optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02%
  optimize.opt_a.receive_attached : 0.000005s : 0.02%
  optimize.opt_a.after_resolve : 0.000031s : 0.10%
  optimize.opt_a.a_after_grad : 0.000020s : 0.06%
  optimize.opt_a.renormalize : 0.001010s : 3.22%
  optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.06%
  optimize.opt_a.auto_monad_grad : 0.000006s : 0.02%
  optimize.opt_a.auto_monad_eliminator : 0.000037s : 0.12%
  optimize.opt_a.cse : 0.000062s : 0.20%
  optimize.opt_a.a_3 : 0.000099s : 0.32%
  optimize.py_interpret_to_execute_after_opt_a : 0.000018s : 0.06%
optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01%
optimize.rewriter_after_opt_a : 0.000052s : 0.16%
optimize.convert_after_rewriter : 0.000008s : 0.02%
optimize.order_py_execute_after_rewriter : 0.000006s : 0.02%
optimize.mutable_eliminate : 0.000852s : 2.71%
optimize.opt_b.b_1 : 0.000128s : 0.41%
optimize.opt_b.b_2 : 0.000011s : 0.03%
optimize.opt_b.updatestate_depend_eliminate : 0.000012s : 0.04%
optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01%
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01%
optimize.opt_b.renormalize : 0.000001s : 0.00%
optimize.opt_b.cse : 0.000036s : 0.12%
optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.08%
optimize.overlap_param_gather : 0.000002s : 0.01%
optimize.cconv : 0.000040s : 0.13%
optimize.loop_unroll : 0.000625s : 1.99%
optimize.opt_after_cconv.c_1 : 0.000031s : 0.10%
optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.02%
optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.03%
optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01%
optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01%
optimize.opt_after_cconv.cse : 0.000030s : 0.09%
optimize.opt_after_cconv.renormalize : 0.000001s : 0.00%
optimize.remove_dup_value : 0.000020s : 0.06%
optimize.tuple_transform.d_1 : 0.000069s : 0.22%
optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01%
optimize.tuple_transform.renormalize : 0.000000s : 0.00%
optimize.tuple_transform.switch_simplify : 0.000010s : 0.03%
optimize.partial_unused_args_eliminate : 0.000002s : 0.01%
optimize.add_recomputation : 0.000067s : 0.21%
optimize.cse_after_recomputation.cse : 0.000017s : 0.06%
optimize.environ_conv : 0.000008s : 0.02%
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02%
optimize.bias_add_comm_swap : 0.000004s : 0.01%
optimize.label_micro_interleaved_index : 0.000007s : 0.02%
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01%
optimize.merge_cast_opt : 0.000001s : 0.00%
optimize.slice_recompute_activation : 0.000002s : 0.01%
optimize.micro_interleaved_order_control : 0.000003s : 0.01%
optimize.assign_add_opt : 0.000002s : 0.01%
optimize.ForceFp32Comm : 0.000001s : 0.00%
optimize.remove_cast_before_assign_add : 0.000001s : 0.00%
optimize.full_micro_interleaved_order_control : 0.000003s : 0.01%
optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01%
optimize.comm_op_add_attrs : 0.000001s : 0.00%
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00%
optimize.interleave_split_concat_branches : 0.000002s : 0.00%
optimize.interleave_parallel_branches : 0.000001s : 0.00%
optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01%
optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01%
optimize.control_data_broadcast_order : 0.000019s : 0.06%
optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01%
optimize.offloading_packed_experts : 0.000005s : 0.02%
optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02%
optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00%
optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00%
optimize.overlap_recompute_comm : 0.000002s : 0.01%
optimize.overlap_grad_ring_attention : 0.000005s : 0.02%
optimize.overlap_grad_flash_sp : 0.000026s : 0.08%
optimize.begin_end_overlap_inline : 0.000001s : 0.00%
optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01%
optimize.split_layernorm_comm : 0.000002s : 0.01%
optimize.handle_group_info : 0.000002s : 0.01%
optimize.symbol_engine_optimizer.build : 0.000006s : 0.02%
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000017s : 0.05%
optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06%
optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.03%
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.04%
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00%
detach_backward : 0.000003s : 0.01%
pipeline_parallel_scheduler : 0.000002s : 0.01%
auto_monad_reorder : 0.000023s : 0.07%
get_jit_bprop_graph : 0.000003s : 0.01%
rewriter_after_jit_bprop_graph : 0.000009s : 0.03%
opt_after_jit_grad : 0.000713s : 2.27%
validate : 0.000057s : 0.18%
backend_pass : 0.000002s : 0.01%
task_emit : 0.008832s : 28.13%
execute : 0.000012s : 0.04%
Time group info:
------[substitution.] 0.000243 24
19.99% : 0.000049s : 4: substitution.arithmetic_simplify
1.47% : 0.000004s : 2: substitution.elim_not_effective
0.74% : 0.000002s : 2: substitution.fold_const_symbol
2.77% : 0.000007s : 3: substitution.graph_param_transform
67.98% : 0.000165s : 3: substitution.inline
2.38% : 0.000006s : 4: substitution.j_node_and_user_rematch
2.25% : 0.000005s : 4: substitution.remove_not_recompute_node
2.43% : 0.000006s : 2: substitution.replace_old_param
------[type_inference.] 0.015858 2
96.05% : 0.015232s : 1: type_inference.infer
3.95% : 0.000627s : 1: type_inference.specialize
------[replace.] 0.000037 3
100.00% : 0.000037s : 3: replace.inline
------[match.] 0.000162 3
100.00% : 0.000162s : 3: match.inline
------[predicate.] 0.000188 815
0.95% : 0.000002s : 8: predicate.accumulaten_eliminater
1.45% : 0.000003s : 3: predicate.ad_related_special_op_eliminate
0.80% : 0.000001s : 6: predicate.addn_check_dump
0.94% : 0.000002s : 8: predicate.addn_zero_filter
0.70% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add
2.73% : 0.000005s : 14: predicate.arithmetic_simplify
1.09% : 0.000002s : 8: predicate.cast_eliminate
0.60% : 0.000001s : 6: predicate.check_bprop_eliminate
0.54% : 0.000001s : 6: predicate.compare_switch_simplify
0.17% : 0.000000s : 3: predicate.const_output_eliminate
0.63% : 0.000001s : 6: predicate.depend_value_elim
0.77% : 0.000001s : 8: predicate.dict_get_item_const_eliminator
0.75% : 0.000001s : 8: predicate.dict_get_item_eliminator
0.83% : 0.000002s : 8: predicate.dict_set_item_eliminator
1.89% : 0.000004s : 6: predicate.dumpgradient_eliminate
0.33% : 0.000001s : 3: predicate.elim_not_effective
0.64% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs
1.00% : 0.000002s : 11: predicate.environ_add_const_eliminate
1.03% : 0.000002s : 11: predicate.environ_get_add_eliminate
1.15% : 0.000002s : 11: predicate.environ_get_depend_swap
1.44% : 0.000003s : 17: predicate.environ_get_eliminate
0.88% : 0.000002s : 11: predicate.environ_get_set_eliminate
0.95% : 0.000002s : 11: predicate.exchange_switch_depend_value
1.97% : 0.000004s : 11: predicate.float_depend_g_call
0.49% : 0.000001s : 6: predicate.float_environ_get_switch
0.76% : 0.000001s : 9: predicate.float_tuple_getitem_switch
0.20% : 0.000000s : 3: predicate.fold_const_symbol
0.89% : 0.000002s : 6: predicate.get_grad_eliminate
0.24% : 0.000000s : 3: predicate.graph_param_transform
0.70% : 0.000001s : 6: predicate.incorporate_call
0.49% : 0.000001s : 6: predicate.incorporate_call_switch
6.56% : 0.000012s : 37: predicate.inline
0.80% : 0.000002s : 6: predicate.inline_without_move
0.35% : 0.000001s : 6: predicate.j_node_and_user_rematch
0.87% : 0.000002s : 6: predicate.less_batch_normalization
1.53% : 0.000003s : 14: predicate.list_to_tuple_eliminator_
1.94% : 0.000004s : 22: predicate.load_eliminater
1.52% : 0.000003s : 3: predicate.loop_unroll_after_grad
1.79% : 0.000003s : 18: predicate.loop_unroll_before_grad
1.84% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator
0.52% : 0.000001s : 6: predicate.merge_addn
0.67% : 0.000001s : 6: predicate.micro_step_allgather_replace
0.82% : 0.000002s : 6: predicate.mini_step_allgather_replace
0.67% : 0.000001s : 8: predicate.minmaximum_grad
2.22% : 0.000004s : 3: predicate.mutable_eliminate
0.77% : 0.000001s : 3: predicate.opt_reshape
0.36% : 0.000001s : 3: predicate.parallel_virtual_node
1.49% : 0.000003s : 11: predicate.partial_defer_inline
1.04% : 0.000002s : 11: predicate.partial_eliminate
1.03% : 0.000002s : 8: predicate.print_const_string_wrapper
0.57% : 0.000001s : 6: predicate.reduce_all_const_elim
1.25% : 0.000002s : 8: predicate.reduce_eliminate
1.88% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater
0.59% : 0.000001s : 6: predicate.remove_not_recompute_node
1.10% : 0.000002s : 14: predicate.replace_applicator
0.74% : 0.000001s : 6: predicate.replace_old_param
0.47% : 0.000001s : 3: predicate.reset_defer_inline
0.81% : 0.000002s : 8: predicate.reshape_eliminate
0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like
0.56% : 0.000001s : 3: predicate.row_tensor_eliminate
0.85% : 0.000002s : 6: predicate.same_eliminate
0.42% : 0.000001s : 6: predicate.set_cell_output_no_recompute
1.68% : 0.000003s : 6: predicate.shard_identity_eliminate
0.93% : 0.000002s : 6: predicate.special_op_eliminate
0.73% : 0.000001s : 6: predicate.specialize_transform
1.69% : 0.000003s : 6: predicate.split_environ_get_set_with_tuple_value
0.87% : 0.000002s : 6: predicate.stack_unstack_eliminate
0.32% : 0.000001s : 3: predicate.switch_call_monad_eliminater
1.13% : 0.000002s : 11: predicate.switch_defer_inline
1.52% : 0.000003s : 17: predicate.switch_layer_defer_inline
4.99% : 0.000009s : 38: predicate.switch_simplify
0.77% : 0.000001s : 8: predicate.tile_eliminate
0.77% : 0.000001s : 8: predicate.transpose_eliminate
1.44% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive
1.80% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator
1.38% : 0.000003s : 14: predicate.tuple_list_get_item_depend_reorder
3.17% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator
1.25% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator
2.05% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator
1.41% : 0.000003s : 14: predicate.tuple_to_list_eliminator_
1.81% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater
3.04% : 0.000006s : 28: predicate.updatestate_useless_node_eliminater
0.40% : 0.000001s : 3: predicate.value_based_eliminate
0.76% : 0.000001s : 6: predicate.virtual_dataset_eliminate
0.69% : 0.000001s : 6: predicate.virtual_output_eliminate
0.26% : 0.000000s : 3: predicate.virtual_view_grad_eliminate
0.52% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000433 7
33.35% : 0.000144s : 2: func_graph_cloner_run.FuncGraphClonerGraph
66.65% : 0.000289s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.] 0.609921 196
0.00% : 0.000004s : 1: ForceFp32Comm
46.62% : 0.284330s : 1: add_attr
46.61% : 0.284311s : 1: add_attr_with_inline
0.00% : 0.000004s : 1: add_comm_op_reuse_tag
0.01% : 0.000073s : 1: add_recomputation
0.00% : 0.000005s : 1: assign_add_opt
0.01% : 0.000084s : 1: auto_monad
0.00% : 0.000028s : 1: auto_monad_reorder
0.00% : 0.000008s : 1: backend_pass
0.00% : 0.000004s : 1: begin_end_overlap_inline
0.00% : 0.000008s : 1: bias_add_comm_swap
0.10% : 0.000586s : 1: bootstrap
0.01% : 0.000046s : 1: cconv
0.00% : 0.000004s : 1: comm_op_add_attrs
0.00% : 0.000024s : 1: control_data_broadcast_order
0.00% : 0.000012s : 1: convert_after_rewriter
0.01% : 0.000035s : 1: cse_after_recomputation
0.00% : 0.000007s : 1: dataset_repeat_opt
0.00% : 0.000007s : 1: detach_backward
0.00% : 0.000012s : 1: environ_conv
0.00% : 0.000028s : 1: event_method
0.00% : 0.000020s : 1: execute
0.00% : 0.000005s : 1: full_micro_interleaved_order_control
0.00% : 0.000006s : 1: get_jit_bprop_graph
0.00% : 0.000010s : 1: graph_reusing
0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall
0.00% : 0.000007s : 1: handle_group_info
0.00% : 0.000008s : 1: inline
0.00% : 0.000007s : 1: insert-virtual-dataset
0.00% : 0.000004s : 1: interleave_parallel_branches
0.00% : 0.000006s : 1: interleave_split_concat_branches
0.00% : 0.000006s : 1: label_fine_grained_interleaved_index
0.00% : 0.000010s : 1: label_micro_interleaved_index
0.11% : 0.000642s : 1: loop_unroll
0.00% : 0.000004s : 1: merge_cast_opt
0.00% : 0.000007s : 1: micro_interleaved_order_control
0.14% : 0.000873s : 1: mutable_eliminate
0.00% : 0.000009s : 1: offloading_packed_experts
0.00% : 0.000022s : 1: opt.transform.loop_unroll_optimizer
0.00% : 0.000028s : 1: opt.transform.mutable_eliminate
0.18% : 0.001073s : 78: opt.transform.opt_a
0.00% : 0.000029s : 1: opt.transform.opt_after_cconv
0.01% : 0.000038s : 1: opt.transform.opt_after_jit_grad
0.02% : 0.000100s : 28: opt.transform.opt_b
0.01% : 0.000075s : 2: opt.transform.opt_trans_graph
0.01% : 0.000051s : 4: opt.transform.symbol_engine_opt
0.54% : 0.003310s : 1: opt_a
0.02% : 0.000139s : 1: opt_after_cconv
0.12% : 0.000734s : 1: opt_after_jit_grad
0.04% : 0.000256s : 1: opt_b
1.02% : 0.006200s : 1: optimize
0.00% : 0.000029s : 1: optimize_parallel_all_gather_comm
0.00% : 0.000010s : 1: order_py_execute_after_rewriter
0.01% : 0.000031s : 1: overlap_grad_flash_sp
0.00% : 0.000006s : 1: overlap_grad_matmul_and_grad_allreduce
0.00% : 0.000008s : 1: overlap_grad_ring_attention
0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline
0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline
0.00% : 0.000006s : 1: overlap_param_gather
0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad
0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel
0.00% : 0.000005s : 1: overlap_recompute_comm
0.00% : 0.000008s : 1: parallel-infer-symbol
0.00% : 0.000005s : 1: parallel-infer-symbol-second
0.00% : 0.000006s : 1: partial_unused_args_eliminate
0.00% : 0.000006s : 1: pipeline_parallel_scheduler
0.00% : 0.000005s : 1: pipeline_split
0.01% : 0.000044s : 1: pre_auto_parallel
0.01% : 0.000037s : 1: py_interpret_to_execute
0.00% : 0.000023s : 1: py_interpret_to_execute_after_opt_a
0.00% : 0.000004s : 1: remove_cast_before_assign_add
0.00% : 0.000024s : 1: remove_dup_value
0.10% : 0.000628s : 1: renormalize.infer
0.06% : 0.000367s : 1: renormalize.specialize
0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp
0.00% : 0.000013s : 1: rewriter_after_jit_bprop_graph
0.01% : 0.000059s : 1: rewriter_after_opt_a
0.01% : 0.000082s : 1: rewriter_before_opt_a
0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation
0.00% : 0.000005s : 1: slice_recompute_activation
0.00% : 0.000005s : 1: split_layernorm_comm
0.00% : 0.000006s : 1: split_matmul_comm_elemetwise
0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter
0.02% : 0.000113s : 1: symbol_engine_optimizer
1.45% : 0.008855s : 1: task_emit
0.02% : 0.000115s : 1: tuple_transform
2.62% : 0.015961s : 1: type_inference
0.02% : 0.000109s : 1: validate

TotalTime = 0.426013, [24]
[bootstrap]: 0.00048323
[type_inference]: 0.192867
[event_method]: 1.639e-05
[auto_monad]: 6.703e-05
[graph_reusing]: 6.48998e-06
[inline]: 3.01001e-06
[add_attr]: 0.00423123, [1]
[add_attr_with_inline]: 0.00421635, [1]
[Cycle 1]: 8.064e-05, [2]
[tag_attr]: 2.449e-05
[meta_addattr_fg_expand]: 5.26998e-06
[parallel-infer-symbol]: 4.89e-06
[pre_auto_parallel]: 4.069e-05
[insert-virtual-dataset]: 3.14001e-06
[parallel-infer-symbol-second]: 1.02e-06
[dataset_repeat_opt]: 2.80002e-06
[pipeline_split]: 2.14999e-06
[optimize]: 0.0062981, [53]
[py_interpret_to_execute]: 3.199e-05
[rewriter_before_opt_a]: 8.458e-05
[opt_a]: 0.00332952, [2]
[Cycle 1]: 0.00225079, [45]
[expand_dump_flag]: 3.08998e-06
[switch_simplify]: 3.846e-05
[loop_unroll]: 2.169e-05
[a_1]: 0.00054084
[with_stream_mark]: 2.076e-05
[recompute_prepare]: 9.32999e-06
[updatestate_depend_eliminate]: 5.34e-06
[updatestate_assign_eliminate]: 4.15e-06
[updatestate_loads_eliminate]: 3.70998e-06
[parameter_eliminate]: 1.91998e-06
[a_2]: 8.488e-05
[accelerated_algorithm]: 7.25e-06
[shard]: 2.39999e-06
[meta_shard_fg_expand]: 2.02999e-06
[shard_inline]: 6.64999e-06
[merge_send_recv]: 1.098e-05
[auto_parallel]: 8.62998e-06
[parallel]: 2.231e-05
[flash_sp]: 1.084e-05
[merge_comm]: 4.07e-06
[allreduce_fusion]: 3.69002e-06
[matmul_add_comm_reduction]: 1.115e-05
[allreduce_slice_to_reducescatter]: 9.5999e-07
[virtual_shard_identity]: 1.074e-05
[virtual_dataset]: 6.41998e-06
[get_grad_eliminate_]: 6.01e-06
[virtual_output]: 6.49001e-06
[merge_forward]: 4.70001e-06
[cell_reuse_recompute_pass]: 1.48002e-06
[offload_activation]: 1.172e-05
[cell_reuse_handle_not_recompute_node_pass]: 1.265e-05
[merge_recompute_call_nodes]: 1.78997e-06
[before_grad]: 1.132e-05
[set_forward_comm_id_for_comm_node_pass]: 3.78001e-06
[meta_fg_expand]: 3.37002e-06
[flash_sp_send_recv_attached]: 3.63e-06
[receive_attached]: 2.68e-06
[after_resolve]: 1.172e-05
[a_after_grad]: 9.71e-06
[renormalize]: 0.00090628
[add_forward_monad_depend]: 8.43001e-06
[auto_monad_grad]: 3.28998e-06
[auto_monad_eliminator]: 2.175e-05
[cse]: 3.857e-05
[a_3]: 5.4e-05
[Cycle 2]: 0.00106311, [45]
[expand_dump_flag]: 3.13998e-06
[switch_simplify]: 9.09e-06
[loop_unroll]: 5.79e-06
[a_1]: 0.00013823
[with_stream_mark]: 1.918e-05
[recompute_prepare]: 7.11999e-06
[updatestate_depend_eliminate]: 4.2e-06
[updatestate_assign_eliminate]: 2.66e-06
[updatestate_loads_eliminate]: 3.64002e-06
[parameter_eliminate]: 1.72001e-06
[a_2]: 7.62e-05
[accelerated_algorithm]: 6.38e-06
[shard]: 1.72001e-06
[meta_shard_fg_expand]: 2.83e-06
[shard_inline]: 6.66e-06
[merge_send_recv]: 8.49002e-06
[auto_parallel]: 1.035e-05
[parallel]: 1.484e-05
[flash_sp]: 5.09e-06
[merge_comm]: 5.79e-06
[allreduce_fusion]: 3.95998e-06
[matmul_add_comm_reduction]: 9.61998e-06
[allreduce_slice_to_reducescatter]: 4.7998e-07
[virtual_shard_identity]: 8.89e-06
[virtual_dataset]: 5.87001e-06
[get_grad_eliminate_]: 5.71e-06
[virtual_output]: 6.06e-06
[merge_forward]: 4.89e-06
[cell_reuse_recompute_pass]: 3.43e-06
[offload_activation]: 1.016e-05
[cell_reuse_handle_not_recompute_node_pass]: 1.379e-05
[merge_recompute_call_nodes]: 1.19e-06
[before_grad]: 1.02e-05
[set_forward_comm_id_for_comm_node_pass]: 3.68999e-06
[meta_fg_expand]: 2.91e-06
[flash_sp_send_recv_attached]: 1.30001e-06
[receive_attached]: 1.72999e-06
[after_resolve]: 9.99999e-06
[a_after_grad]: 8.89e-06
[renormalize]: 7.99773e-08
[add_forward_monad_depend]: 2.42001e-06
[auto_monad_grad]: 2.42001e-06
[auto_monad_eliminator]: 1.015e-05
[cse]: 0.00029887
[a_3]: 4.614e-05
[py_interpret_to_execute_after_opt_a]: 2.379e-05
[slice_cell_reuse_recomputed_activation]: 2.57001e-06
[rewriter_after_opt_a]: 5.551e-05
[convert_after_rewriter]: 8.30999e-06
[order_py_execute_after_rewriter]: 5.50001e-06
[mutable_eliminate]: 0.00089187
[opt_b]: 0.00025187, [1]
[Cycle 1]: 0.00024096, [7]
[b_1]: 0.00012738
[b_2]: 9.47001e-06
[updatestate_depend_eliminate]: 1.222e-05
[updatestate_assign_eliminate]: 3.97e-06
[updatestate_loads_eliminate]: 3.33e-06
[renormalize]: 1.49e-06
[cse]: 3.948e-05
[optimize_parallel_all_gather_comm]: 2.701e-05
[overlap_param_gather]: 2.22001e-06
[cconv]: 4.299e-05
[loop_unroll]: 0.000708
[opt_after_cconv]: 0.00013125, [1]
[Cycle 1]: 0.00012023, [7]
[c_1]: 2.898e-05
[parameter_eliminate]: 6.88e-06
[updatestate_depend_eliminate]: 9.66e-06
[updatestate_assign_eliminate]: 2.86e-06
[updatestate_loads_eliminate]: 2.79999e-06
[cse]: 3.069e-05
[renormalize]: 1.09998e-06
[remove_dup_value]: 2.071e-05
[tuple_transform]: 8.722e-05, [1]
[Cycle 1]: 7.985e-05, [4]
[d_1]: 4.881e-05
[none_parameter_eliminate]: 2.20002e-06
[renormalize]: 1.8999e-07
[switch_simplify]: 7.94002e-06
[partial_unused_args_eliminate]: 2.15002e-06
[add_recomputation]: 6.309e-05
[cse_after_recomputation]: 2.633e-05, [1]
[Cycle 1]: 2.107e-05, [1]
[cse]: 1.474e-05
[environ_conv]: 7.23e-06
[swap_dp_allreduce_reducescatter]: 5.67999e-06
[bias_add_comm_swap]: 3.95e-06
[label_micro_interleaved_index]: 2.325e-05
[label_fine_grained_interleaved_index]: 3.31999e-06
[merge_cast_opt]: 1.92999e-06
[slice_recompute_activation]: 2.53e-06
[micro_interleaved_order_control]: 2.93e-06
[assign_add_opt]: 1.42e-06
[ForceFp32Comm]: 9.89996e-07
[remove_cast_before_assign_add]: 1.15999e-06
[full_micro_interleaved_order_control]: 2.52001e-06
[reorder_send_recv_between_fp_bp]: 3.18e-06
[comm_op_add_attrs]: 1.37e-06
[add_comm_op_reuse_tag]: 1.14e-06
[interleave_split_concat_branches]: 1.37e-06
[interleave_parallel_branches]: 1.13001e-06
[overlap_opt_shard_in_pipeline]: 1.62001e-06
[overlap_opt_shard_grad_in_pipeline]: 2.22999e-06
[control_data_broadcast_order]: 1.804e-05
[grouped_pairwise_exchange_alltoall]: 1.67001e-06
[offloading_packed_experts]: 4.95001e-06
[overlap_recompute_and_grad_model_parallel]: 6.34001e-06
[overlap_grad_matmul_and_grad_allreduce]: 1.32e-06
[overlap_recompute_allgather_and_fa_grad]: 1.47001e-06
[overlap_recompute_comm]: 2.59999e-06
[overlap_grad_ring_attention]: 5.27001e-06
[overlap_grad_flash_sp]: 2.471e-05
[begin_end_overlap_inline]: 5.3001e-07
[split_matmul_comm_elemetwise]: 2.89001e-06
[split_layernorm_comm]: 2.17999e-06
[handle_group_info]: 1.15999e-06
[symbol_engine_optimizer]: 8.851e-05, [1]
[Cycle 1]: 8.262e-05, [6]
[build]: 4.13001e-06
[elim_shapecalc]: 1.393e-05
[elim_not_effective]: 1.503e-05
[opt_reshape]: 7.06999e-06
[fold_const_symbol]: 1.03e-05
[renormalize]: 2.09984e-07
[detach_backward]: 2.84001e-06
[pipeline_parallel_scheduler]: 2.58e-06
[auto_monad_reorder]: 1.995e-05
[get_jit_bprop_graph]: 3.7e-06
[rewriter_after_jit_bprop_graph]: 8.60999e-06
[opt_after_jit_grad]: 0.00071795
[validate]: 5.517e-05
[backend_pass]: 1.02e-06
[task_emit]: 0.220838
[execute]: 1.026e-05
Sums
bootstrap : 0.000483s : 0.11%
type_inference : 0.192867s : 45.87%
event_method : 0.000016s : 0.00%
auto_monad : 0.000067s : 0.02%
graph_reusing : 0.000006s : 0.00%
inline : 0.000003s : 0.00%
add_attr.add_attr_with_inline.tag_attr : 0.000024s : 0.01%
add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00%
parallel-infer-symbol : 0.000005s : 0.00%
pre_auto_parallel : 0.000041s : 0.01%
insert-virtual-dataset : 0.000003s : 0.00%
parallel-infer-symbol-second : 0.000001s : 0.00%
dataset_repeat_opt : 0.000003s : 0.00%
pipeline_split : 0.000002s : 0.00%
optimize.py_interpret_to_execute : 0.000032s : 0.01%
optimize.rewriter_before_opt_a : 0.000085s : 0.02%
optimize.opt_a.expand_dump_flag : 0.000006s : 0.00%
optimize.opt_a.switch_simplify : 0.000048s : 0.01%
optimize.opt_a.loop_unroll : 0.000027s : 0.01%
optimize.opt_a.a_1 : 0.000679s : 0.16%
optimize.opt_a.with_stream_mark : 0.000040s : 0.01%
optimize.opt_a.recompute_prepare : 0.000016s : 0.00%
optimize.opt_a.updatestate_depend_eliminate : 0.000010s : 0.00%
optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00%
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00%
optimize.opt_a.parameter_eliminate : 0.000004s : 0.00%
optimize.opt_a.a_2 : 0.000161s : 0.04%
optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00%
optimize.opt_a.shard : 0.000004s : 0.00%
optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00%
optimize.opt_a.shard_inline : 0.000013s : 0.00%
optimize.opt_a.merge_send_recv : 0.000019s : 0.00%
optimize.opt_a.auto_parallel : 0.000019s : 0.00%
optimize.opt_a.parallel : 0.000037s : 0.01%
optimize.opt_a.flash_sp : 0.000016s : 0.00%
optimize.opt_a.merge_comm : 0.000010s : 0.00%
optimize.opt_a.allreduce_fusion : 0.000008s : 0.00%
optimize.opt_a.matmul_add_comm_reduction : 0.000021s : 0.00%
optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00%
optimize.opt_a.virtual_shard_identity : 0.000020s : 0.00%
optimize.opt_a.virtual_dataset : 0.000012s : 0.00%
optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00%
optimize.opt_a.virtual_output : 0.000013s : 0.00%
optimize.opt_a.merge_forward : 0.000010s : 0.00%
optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00%
optimize.opt_a.offload_activation : 0.000022s : 0.01%
optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.01%
optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00%
optimize.opt_a.before_grad : 0.000022s : 0.01%
optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00%
optimize.opt_a.meta_fg_expand : 0.000006s : 0.00%
optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00%
optimize.opt_a.receive_attached : 0.000004s : 0.00%
optimize.opt_a.after_resolve : 0.000022s : 0.01%
optimize.opt_a.a_after_grad : 0.000019s : 0.00%
optimize.opt_a.renormalize : 0.000906s : 0.22%
optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.00%
optimize.opt_a.auto_monad_grad : 0.000006s : 0.00%
optimize.opt_a.auto_monad_eliminator : 0.000032s : 0.01%
optimize.opt_a.cse : 0.000337s : 0.08%
optimize.opt_a.a_3 : 0.000100s : 0.02%
optimize.py_interpret_to_execute_after_opt_a : 0.000024s : 0.01%
optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00%
optimize.rewriter_after_opt_a : 0.000056s : 0.01%
optimize.convert_after_rewriter : 0.000008s : 0.00%
optimize.order_py_execute_after_rewriter : 0.000006s : 0.00%
optimize.mutable_eliminate : 0.000892s : 0.21%
optimize.opt_b.b_1 : 0.000127s : 0.03%
optimize.opt_b.b_2 : 0.000009s : 0.00%
optimize.opt_b.updatestate_depend_eliminate : 0.000012s : 0.00%
optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00%
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00%
optimize.opt_b.renormalize : 0.000001s : 0.00%
optimize.opt_b.cse : 0.000039s : 0.01%
optimize.optimize_parallel_all_gather_comm : 0.000027s : 0.01%
optimize.overlap_param_gather : 0.000002s : 0.00%
optimize.cconv : 0.000043s : 0.01%
optimize.loop_unroll : 0.000708s : 0.17%
optimize.opt_after_cconv.c_1 : 0.000029s : 0.01%
optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.00%
optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000010s : 0.00%
optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00%
optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00%
optimize.opt_after_cconv.cse : 0.000031s : 0.01%
optimize.opt_after_cconv.renormalize : 0.000001s : 0.00%
optimize.remove_dup_value : 0.000021s : 0.00%
optimize.tuple_transform.d_1 : 0.000049s : 0.01%
optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00%
optimize.tuple_transform.renormalize : 0.000000s : 0.00%
optimize.tuple_transform.switch_simplify : 0.000008s : 0.00%
optimize.partial_unused_args_eliminate : 0.000002s : 0.00%
optimize.add_recomputation : 0.000063s : 0.02%
optimize.cse_after_recomputation.cse : 0.000015s : 0.00%
optimize.environ_conv : 0.000007s : 0.00%
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00%
optimize.bias_add_comm_swap : 0.000004s : 0.00%
optimize.label_micro_interleaved_index : 0.000023s : 0.01%
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00%
optimize.merge_cast_opt : 0.000002s : 0.00%
optimize.slice_recompute_activation : 0.000003s : 0.00%
optimize.micro_interleaved_order_control : 0.000003s : 0.00%
optimize.assign_add_opt : 0.000001s : 0.00%
optimize.ForceFp32Comm : 0.000001s : 0.00%
optimize.remove_cast_before_assign_add : 0.000001s : 0.00%
optimize.full_micro_interleaved_order_control : 0.000003s : 0.00%
optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00%
optimize.comm_op_add_attrs : 0.000001s : 0.00%
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00%
optimize.interleave_split_concat_branches : 0.000001s : 0.00%
optimize.interleave_parallel_branches : 0.000001s : 0.00%
optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00%
optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00%
optimize.control_data_broadcast_order : 0.000018s : 0.00%
optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00%
optimize.offloading_packed_experts : 0.000005s : 0.00%
optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00%
optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00%
optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00%
optimize.overlap_recompute_comm : 0.000003s : 0.00%
optimize.overlap_grad_ring_attention : 0.000005s : 0.00%
optimize.overlap_grad_flash_sp : 0.000025s : 0.01%
optimize.begin_end_overlap_inline : 0.000001s : 0.00%
optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00%
optimize.split_layernorm_comm : 0.000002s : 0.00%
optimize.handle_group_info : 0.000001s : 0.00%
optimize.symbol_engine_optimizer.build : 0.000004s : 0.00%
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00%
optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.00%
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00%
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00%
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00%
detach_backward : 0.000003s : 0.00%
pipeline_parallel_scheduler : 0.000003s : 0.00%
auto_monad_reorder : 0.000020s : 0.00%
get_jit_bprop_graph : 0.000004s : 0.00%
rewriter_after_jit_bprop_graph : 0.000009s : 0.00%
opt_after_jit_grad : 0.000718s : 0.17%
validate : 0.000055s : 0.01%
backend_pass : 0.000001s : 0.00%
task_emit : 0.220838s : 52.52%
execute : 0.000010s : 0.00%
Time group info:
------[substitution.] 0.000257 26
18.92% : 0.000049s : 5: substitution.arithmetic_simplify
0.93% : 0.000002s : 2: substitution.elim_not_effective
0.54% : 0.000001s : 2: substitution.fold_const_symbol
2.92% : 0.000008s : 3: substitution.graph_param_transform
66.48% : 0.000171s : 3: substitution.inline
1.69% : 0.000004s : 4: substitution.j_node_and_user_rematch
2.44% : 0.000006s : 4: substitution.remove_not_recompute_node
2.29% : 0.000006s : 2: substitution.replace_old_param
3.79% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.192799 2
99.61% : 0.192055s : 1: type_inference.infer
0.39% : 0.000745s : 1: type_inference.specialize
------[replace.] 0.000048 4
79.86% : 0.000038s : 3: replace.inline
20.14% : 0.000010s : 1: replace.tuple_list_get_item_eliminator
------[match.] 0.000178 4
95.01% : 0.000169s : 3: match.inline
4.99% : 0.000009s : 1: match.tuple_list_get_item_eliminator
------[predicate.] 0.000205 883
0.89% : 0.000002s : 9: predicate.accumulaten_eliminater
1.29% : 0.000003s : 3: predicate.ad_related_special_op_eliminate
0.55% : 0.000001s : 6: predicate.addn_check_dump
0.82% : 0.000002s : 9: predicate.addn_zero_filter
0.89% : 0.000002s : 9: predicate.adjust_all_reduce_mul_add
2.21% : 0.000005s : 15: predicate.arithmetic_simplify
0.91% : 0.000002s : 9: predicate.cast_eliminate
0.86% : 0.000002s : 6: predicate.check_bprop_eliminate
0.75% : 0.000002s : 6: predicate.compare_switch_simplify
0.18% : 0.000000s : 3: predicate.const_output_eliminate
0.75% : 0.000002s : 6: predicate.depend_value_elim
0.85% : 0.000002s : 9: predicate.dict_get_item_const_eliminator
0.81% : 0.000002s : 9: predicate.dict_get_item_eliminator
0.88% : 0.000002s : 9: predicate.dict_set_item_eliminator
1.37% : 0.000003s : 6: predicate.dumpgradient_eliminate
0.20% : 0.000000s : 3: predicate.elim_not_effective
0.72% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs
1.08% : 0.000002s : 12: predicate.environ_add_const_eliminate
1.04% : 0.000002s : 12: predicate.environ_get_add_eliminate
1.02% : 0.000002s : 12: predicate.environ_get_depend_swap
1.56% : 0.000003s : 18: predicate.environ_get_eliminate
0.98% : 0.000002s : 12: predicate.environ_get_set_eliminate
1.17% : 0.000002s : 13: predicate.exchange_switch_depend_value
2.00% : 0.000004s : 13: predicate.float_depend_g_call
0.81% : 0.000002s : 6: predicate.float_environ_get_switch
0.80% : 0.000002s : 9: predicate.float_tuple_getitem_switch
0.18% : 0.000000s : 3: predicate.fold_const_symbol
0.75% : 0.000002s : 6: predicate.get_grad_eliminate
0.31% : 0.000001s : 3: predicate.graph_param_transform
0.53% : 0.000001s : 6: predicate.incorporate_call
0.44% : 0.000001s : 6: predicate.incorporate_call_switch
6.10% : 0.000013s : 40: predicate.inline
0.94% : 0.000002s : 6: predicate.inline_without_move
0.32% : 0.000001s : 6: predicate.j_node_and_user_rematch
0.88% : 0.000002s : 6: predicate.less_batch_normalization
1.93% : 0.000004s : 16: predicate.list_to_tuple_eliminator_
2.41% : 0.000005s : 25: predicate.load_eliminater
1.79% : 0.000004s : 3: predicate.loop_unroll_after_grad
1.78% : 0.000004s : 21: predicate.loop_unroll_before_grad
1.65% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator
0.58% : 0.000001s : 6: predicate.merge_addn
0.55% : 0.000001s : 6: predicate.micro_step_allgather_replace
0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace
0.72% : 0.000001s : 9: predicate.minmaximum_grad
2.01% : 0.000004s : 3: predicate.mutable_eliminate
0.41% : 0.000001s : 3: predicate.opt_reshape
0.42% : 0.000001s : 3: predicate.parallel_virtual_node
1.61% : 0.000003s : 13: predicate.partial_defer_inline
1.12% : 0.000002s : 13: predicate.partial_eliminate
0.92% : 0.000002s : 9: predicate.print_const_string_wrapper
0.57% : 0.000001s : 6: predicate.reduce_all_const_elim
1.43% : 0.000003s : 9: predicate.reduce_eliminate
2.35% : 0.000005s : 25: predicate.redundant_stop_gradient_eliminater
0.54% : 0.000001s : 6: predicate.remove_not_recompute_node
1.16% : 0.000002s : 16: predicate.replace_applicator
0.54% : 0.000001s : 6: predicate.replace_old_param
0.59% : 0.000001s : 3: predicate.reset_defer_inline
0.92% : 0.000002s : 9: predicate.reshape_eliminate
0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like
0.54% : 0.000001s : 3: predicate.row_tensor_eliminate
1.18% : 0.000002s : 6: predicate.same_eliminate
0.36% : 0.000001s : 6: predicate.set_cell_output_no_recompute
1.02% : 0.000002s : 6: predicate.shard_identity_eliminate
0.79% : 0.000002s : 6: predicate.special_op_eliminate
0.64% : 0.000001s : 6: predicate.specialize_transform
1.27% : 0.000003s : 6: predicate.split_environ_get_set_with_tuple_value
0.87% : 0.000002s : 6: predicate.stack_unstack_eliminate
0.29% : 0.000001s : 3: predicate.switch_call_monad_eliminater
1.34% : 0.000003s : 13: predicate.switch_defer_inline
1.67% : 0.000003s : 19: predicate.switch_layer_defer_inline
4.40% : 0.000009s : 43: predicate.switch_simplify
0.78% : 0.000002s : 9: predicate.tile_eliminate
0.85% : 0.000002s : 9: predicate.transpose_eliminate
1.63% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive
1.48% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator
1.61% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder
3.61% : 0.000007s : 22: predicate.tuple_list_get_item_eliminator
1.85% : 0.000004s : 15: predicate.tuple_list_get_set_item_eliminator
1.91% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator
1.52% : 0.000003s : 16: predicate.tuple_to_list_eliminator_
2.04% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater
2.56% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater
0.50% : 0.000001s : 3: predicate.value_based_eliminate
0.66% : 0.000001s : 6: predicate.virtual_dataset_eliminate
0.73% : 0.000001s : 6: predicate.virtual_output_eliminate
0.29% : 0.000001s : 3: predicate.virtual_view_grad_eliminate
0.53% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000485 8
41.52% : 0.000201s : 3: func_graph_cloner_run.FuncGraphClonerGraph
58.48% : 0.000283s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.438735 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.97% : 0.004239s : 1: add_attr 0.96% : 0.004221s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000069s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000073s : 1: auto_monad 0.01% : 0.000056s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.12% : 0.000524s : 1: bootstrap 0.01% : 0.000048s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.01% : 0.000029s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000010s : 1: environ_conv 0.01% : 0.000023s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000010s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000027s : 1: label_micro_interleaved_index 0.17% : 0.000725s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.21% : 0.000913s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000024s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000024s : 1: opt.transform.mutable_eliminate 0.25% : 0.001099s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.01% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000102s : 28: opt.transform.opt_b 0.01% : 0.000054s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000042s : 4: opt.transform.symbol_engine_opt 0.76% : 0.003333s : 1: opt_a 0.03% : 0.000136s : 1: opt_after_cconv 0.17% : 0.000737s : 1: opt_after_jit_grad 0.06% : 0.000255s : 1: opt_b 1.44% : 0.006305s : 1: optimize 0.01% : 0.000031s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000010s : 1: overlap_grad_ring_attention 0.00% : 0.000006s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000010s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000046s : 1: pre_auto_parallel 0.01% : 0.000036s : 1: py_interpret_to_execute 0.01% : 0.000028s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000005s : 1: remove_cast_before_assign_add 0.01% : 0.000025s : 1: remove_dup_value 0.11% : 0.000467s : 1: renormalize.infer 0.10% : 0.000428s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000012s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000062s : 1: rewriter_after_opt_a 0.02% : 0.000089s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000092s : 1: symbol_engine_optimizer 50.34% : 0.220859s : 1: task_emit 0.02% : 0.000091s : 1: 
tuple_transform 43.97% : 0.192894s : 1: type_inference 0.02% : 0.000106s : 1: validate TotalTime = 0.833876, [24] [bootstrap]: 0.00055476 [type_inference]: 0.240256 [event_method]: 5.632e-05 [auto_monad]: 0.00019645 [graph_reusing]: 8.84e-06 [inline]: 2.54999e-06 [add_attr]: 0.00408943, [1] [add_attr_with_inline]: 0.00407627, [1] [Cycle 1]: 0.0001021, [2] [tag_attr]: 4.631e-05 [meta_addattr_fg_expand]: 1.087e-05 [parallel-infer-symbol]: 3.91001e-06 [pre_auto_parallel]: 6.463e-05 [insert-virtual-dataset]: 2.84999e-06 [parallel-infer-symbol-second]: 1.29e-06 [dataset_repeat_opt]: 2.70002e-06 [pipeline_split]: 1.71998e-06 [optimize]: 0.398274, [53] [py_interpret_to_execute]: 4.968e-05 [rewriter_before_opt_a]: 0.00018704 [opt_a]: 0.394919, [3] [Cycle 1]: 0.390275, [45] [expand_dump_flag]: 6.96999e-06 [switch_simplify]: 8.301e-05 [loop_unroll]: 6.345e-05 [a_1]: 0.0017031 [with_stream_mark]: 3.96e-05 [recompute_prepare]: 3.499e-05 [updatestate_depend_eliminate]: 1.058e-05 [updatestate_assign_eliminate]: 8.94e-06 [updatestate_loads_eliminate]: 7.23999e-06 [parameter_eliminate]: 3.3e-06 [a_2]: 0.00026787 [accelerated_algorithm]: 4.35e-05 [shard]: 2.30002e-06 [meta_shard_fg_expand]: 6.89999e-06 [shard_inline]: 1.774e-05 [merge_send_recv]: 2.185e-05 [auto_parallel]: 1.59e-05 [parallel]: 2.251e-05 [flash_sp]: 1.509e-05 [merge_comm]: 1.092e-05 [allreduce_fusion]: 9.52001e-06 [matmul_add_comm_reduction]: 3.644e-05 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 2.32e-05 [virtual_dataset]: 1.775e-05 [get_grad_eliminate_]: 1.594e-05 [virtual_output]: 1.686e-05 [merge_forward]: 1.107e-05 [cell_reuse_recompute_pass]: 1.47999e-06 [offload_activation]: 1.998e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.803e-05 [merge_recompute_call_nodes]: 1.88002e-06 [before_grad]: 4.789e-05 [set_forward_comm_id_for_comm_node_pass]: 1.306e-05 [meta_fg_expand]: 0.00228578 [flash_sp_send_recv_attached]: 6.47001e-06 [receive_attached]: 2.99001e-06 [after_resolve]: 
0.00010633 [a_after_grad]: 0.00010624 [renormalize]: 0.384012 [add_forward_monad_depend]: 1.578e-05 [auto_monad_grad]: 6.89999e-06 [auto_monad_eliminator]: 6.323e-05 [cse]: 0.00022189 [a_3]: 0.00036532 [Cycle 2]: 0.00378615, [45] [expand_dump_flag]: 3.2e-06 [switch_simplify]: 4.988e-05 [loop_unroll]: 4.258e-05 [a_1]: 0.0015928 [with_stream_mark]: 2.687e-05 [recompute_prepare]: 1.589e-05 [updatestate_depend_eliminate]: 5.96998e-06 [updatestate_assign_eliminate]: 4.80001e-06 [updatestate_loads_eliminate]: 3.71999e-06 [parameter_eliminate]: 2.44001e-06 [a_2]: 0.00010472 [accelerated_algorithm]: 1.59e-05 [shard]: 2.17999e-06 [meta_shard_fg_expand]: 2.76e-06 [shard_inline]: 7.06001e-06 [merge_send_recv]: 1.288e-05 [auto_parallel]: 1.254e-05 [parallel]: 1.229e-05 [flash_sp]: 5.00001e-06 [merge_comm]: 4.75999e-06 [allreduce_fusion]: 4.33999e-06 [matmul_add_comm_reduction]: 1.306e-05 [allreduce_slice_to_reducescatter]: 9.30013e-07 [virtual_shard_identity]: 1.268e-05 [virtual_dataset]: 6.88998e-06 [get_grad_eliminate_]: 6.81001e-06 [virtual_output]: 7.01001e-06 [merge_forward]: 5.34e-06 [cell_reuse_recompute_pass]: 3.04999e-06 [offload_activation]: 1.391e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.801e-05 [merge_recompute_call_nodes]: 2.11998e-06 [before_grad]: 1.467e-05 [set_forward_comm_id_for_comm_node_pass]: 5.96e-06 [meta_fg_expand]: 0.00014305 [flash_sp_send_recv_attached]: 3.59002e-06 [receive_attached]: 2.89001e-06 [after_resolve]: 2.549e-05 [a_after_grad]: 1.137e-05 [renormalize]: 0.00108181 [add_forward_monad_depend]: 1.015e-05 [auto_monad_grad]: 2.44999e-06 [auto_monad_eliminator]: 2.222e-05 [cse]: 3.905e-05 [a_3]: 6.057e-05 [Cycle 3]: 0.00083238, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 1.033e-05 [loop_unroll]: 6.95002e-06 [a_1]: 0.00016671 [with_stream_mark]: 1.511e-05 [recompute_prepare]: 8.70999e-06 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 3.45e-06 [updatestate_loads_eliminate]: 2.98e-06 
[parameter_eliminate]: 1.48002e-06 [a_2]: 9.011e-05 [accelerated_algorithm]: 1.263e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 2.37999e-06 [shard_inline]: 8.13001e-06 [merge_send_recv]: 8.80001e-06 [auto_parallel]: 1.043e-05 [parallel]: 1.009e-05 [flash_sp]: 1.30999e-06 [merge_comm]: 4.13001e-06 [allreduce_fusion]: 3.71999e-06 [matmul_add_comm_reduction]: 1.021e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.209e-05 [virtual_dataset]: 6.68e-06 [get_grad_eliminate_]: 7.36001e-06 [virtual_output]: 6.74001e-06 [merge_forward]: 4.58999e-06 [cell_reuse_recompute_pass]: 3.76001e-06 [offload_activation]: 1.096e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.664e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 1.253e-05 [set_forward_comm_id_for_comm_node_pass]: 4.89e-06 [meta_fg_expand]: 2.99001e-06 [flash_sp_send_recv_attached]: 1.75001e-06 [receive_attached]: 1.94e-06 [after_resolve]: 1.159e-05 [a_after_grad]: 9.46998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 2.33998e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.211e-05 [cse]: 2.802e-05 [a_3]: 4.415e-05 [py_interpret_to_execute_after_opt_a]: 2.167e-05 [slice_cell_reuse_recomputed_activation]: 2.39001e-06 [rewriter_after_opt_a]: 7.331e-05 [convert_after_rewriter]: 9.40001e-06 [order_py_execute_after_rewriter]: 6.27001e-06 [mutable_eliminate]: 0.00094789 [opt_b]: 0.00044964, [1] [Cycle 1]: 0.00043799, [7] [b_1]: 0.00030981 [b_2]: 1.156e-05 [updatestate_depend_eliminate]: 1.205e-05 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.35e-06 [renormalize]: 3.59985e-07 [cse]: 4.411e-05 [optimize_parallel_all_gather_comm]: 2.854e-05 [overlap_param_gather]: 2.54001e-06 [cconv]: 4.145e-05 [loop_unroll]: 0.00063229 [opt_after_cconv]: 0.00013816, [1] [Cycle 1]: 0.00012882, [7] [c_1]: 3.55e-05 [parameter_eliminate]: 5.89999e-06 [updatestate_depend_eliminate]: 9.19e-06 [updatestate_assign_eliminate]: 3.48999e-06 
[updatestate_loads_eliminate]: 3.03998e-06 [cse]: 3.254e-05 [renormalize]: 9.80013e-07 [remove_dup_value]: 1.971e-05 [tuple_transform]: 9.995e-05, [1] [Cycle 1]: 9.435e-05, [4] [d_1]: 5.845e-05 [none_parameter_eliminate]: 1.96998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 1.026e-05 [partial_unused_args_eliminate]: 2.34999e-06 [add_recomputation]: 6.906e-05 [cse_after_recomputation]: 3.485e-05, [1] [Cycle 1]: 2.856e-05, [1] [cse]: 2.091e-05 [environ_conv]: 1.24e-05 [swap_dp_allreduce_reducescatter]: 6.12999e-06 [bias_add_comm_swap]: 3.75998e-06 [label_micro_interleaved_index]: 6.82002e-06 [label_fine_grained_interleaved_index]: 3.07002e-06 [merge_cast_opt]: 1.41002e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.43998e-06 [assign_add_opt]: 1.67999e-06 [ForceFp32Comm]: 1.11002e-06 [remove_cast_before_assign_add]: 1.35001e-06 [full_micro_interleaved_order_control]: 2.81e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.19003e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.31002e-06 [interleave_parallel_branches]: 1.29003e-06 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.58998e-06 [control_data_broadcast_order]: 2.011e-05 [grouped_pairwise_exchange_alltoall]: 2.41998e-06 [offloading_packed_experts]: 5.86e-06 [overlap_recompute_and_grad_model_parallel]: 5.69e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.56998e-06 [overlap_grad_ring_attention]: 6.04001e-06 [overlap_grad_flash_sp]: 2.777e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.46e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 1.59e-06 [symbol_engine_optimizer]: 0.00011453, [1] [Cycle 1]: 0.00010714, [6] [build]: 1.296e-05 [elim_shapecalc]: 1.819e-05 [elim_not_effective]: 1.806e-05 [opt_reshape]: 8.58001e-06 [fold_const_symbol]: 1.292e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 3.73001e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 2.67e-05 [get_jit_bprop_graph]: 2.64999e-06 [rewriter_after_jit_bprop_graph]: 7.58999e-06 [opt_after_jit_grad]: 0.00068435 [validate]: 6.664e-05 [backend_pass]: 1.22999e-06 [task_emit]: 0.189264 [execute]: 9.10999e-06 Sums bootstrap : 0.000555s : 0.07% type_inference : 0.240256s : 29.02% event_method : 0.000056s : 0.01% auto_monad : 0.000196s : 0.02% graph_reusing : 0.000009s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000046s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000065s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000050s : 0.01% optimize.rewriter_before_opt_a : 0.000187s : 0.02% optimize.opt_a.expand_dump_flag : 0.000013s : 0.00% optimize.opt_a.switch_simplify : 0.000143s : 0.02% optimize.opt_a.loop_unroll : 0.000113s : 0.01% optimize.opt_a.a_1 : 0.003463s : 0.42% optimize.opt_a.with_stream_mark : 0.000082s : 0.01% optimize.opt_a.recompute_prepare : 0.000060s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000463s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000072s : 0.01% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000012s : 0.00% optimize.opt_a.shard_inline : 0.000033s : 0.00% optimize.opt_a.merge_send_recv : 0.000044s : 0.01% optimize.opt_a.auto_parallel : 0.000039s : 0.00% optimize.opt_a.parallel : 0.000045s : 0.01% optimize.opt_a.flash_sp : 0.000021s : 0.00% optimize.opt_a.merge_comm 
: 0.000020s : 0.00% optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000060s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000048s : 0.01% optimize.opt_a.virtual_dataset : 0.000031s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.00% optimize.opt_a.virtual_output : 0.000031s : 0.00% optimize.opt_a.merge_forward : 0.000021s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000008s : 0.00% optimize.opt_a.offload_activation : 0.000045s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000073s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000006s : 0.00% optimize.opt_a.before_grad : 0.000075s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000024s : 0.00% optimize.opt_a.meta_fg_expand : 0.002432s : 0.29% optimize.opt_a.flash_sp_send_recv_attached : 0.000012s : 0.00% optimize.opt_a.receive_attached : 0.000008s : 0.00% optimize.opt_a.after_resolve : 0.000143s : 0.02% optimize.opt_a.a_after_grad : 0.000127s : 0.02% optimize.opt_a.renormalize : 0.385094s : 46.51% optimize.opt_a.add_forward_monad_depend : 0.000028s : 0.00% optimize.opt_a.auto_monad_grad : 0.000011s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000098s : 0.01% optimize.opt_a.cse : 0.000289s : 0.03% optimize.opt_a.a_3 : 0.000470s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000022s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000073s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000948s : 0.11% optimize.opt_b.b_1 : 0.000310s : 0.04% optimize.opt_b.b_2 : 0.000012s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000012s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s 
: 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000044s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000029s : 0.00% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000041s : 0.01% optimize.loop_unroll : 0.000632s : 0.08% optimize.opt_after_cconv.c_1 : 0.000036s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000033s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000020s : 0.00% optimize.tuple_transform.d_1 : 0.000058s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000069s : 0.01% optimize.cse_after_recomputation.cse : 0.000021s : 0.00% optimize.environ_conv : 0.000012s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000007s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000003s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000018s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000004s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.00% get_jit_bprop_graph : 0.000003s : 0.00% rewriter_after_jit_bprop_graph : 0.000008s : 0.00% opt_after_jit_grad : 0.000684s : 0.08% validate : 0.000067s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.189264s : 22.86% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.001131 161 7.82% : 0.000088s : 8: substitution.arithmetic_simplify 0.24% : 0.000003s : 3: substitution.elim_not_effective 0.56% : 0.000006s : 5: substitution.float_depend_g_call 0.38% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.20% : 0.000002s : 3: substitution.fold_const_symbol 0.69% : 0.000008s : 4: substitution.graph_param_transform 0.34% : 0.000004s : 2: substitution.incorporate_call 0.20% : 0.000002s : 2: substitution.incorporate_call_switch 59.62% : 0.000674s : 17: substitution.inline 2.74% : 0.000031s : 2: substitution.inline_without_move 2.29% : 0.000026s : 15: substitution.j_node_and_user_rematch 1.99% : 0.000023s : 3: substitution.less_batch_normalization 1.06% : 0.000012s : 7: substitution.minmaximum_grad 2.13% : 0.000024s : 5: substitution.partial_eliminate 1.32% : 0.000015s : 15: substitution.remove_not_recompute_node 3.65% : 0.000041s : 10: substitution.replace_applicator 1.31% : 0.000015s : 10: substitution.replace_old_param 0.50% : 0.000006s : 1: substitution.set_cell_output_no_recompute 2.25% : 0.000025s : 7: substitution.tuple_list_convert_item_index_to_positive 1.04% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 1.55% : 0.000018s : 7: substitution.tuple_list_get_item_depend_reorder 6.60% : 0.000075s : 19: substitution.tuple_list_get_item_eliminator 1.52% : 0.000017s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.240146 2 99.25% : 0.238350s : 1: type_inference.infer 0.75% : 0.001796s : 1: type_inference.specialize ------[replace.] 0.000272 27 63.72% : 0.000173s : 17: replace.inline 36.28% : 0.000099s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000700 27 94.54% : 0.000661s : 17: match.inline 5.46% : 0.000038s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000858 4248 0.97% : 0.000008s : 53: predicate.accumulaten_eliminater 0.33% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.40% : 0.000003s : 21: predicate.addn_check_dump 0.94% : 0.000008s : 53: predicate.addn_zero_filter 0.90% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.89% : 0.000016s : 74: predicate.arithmetic_simplify 1.00% : 0.000009s : 53: predicate.cast_eliminate 0.99% : 0.000008s : 50: predicate.check_bprop_eliminate 0.38% : 0.000003s : 21: predicate.compare_switch_simplify 0.05% : 0.000000s : 4: predicate.const_output_eliminate 0.41% : 0.000003s : 21: predicate.depend_value_elim 0.92% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.00% : 0.000009s : 53: predicate.dict_get_item_eliminator 0.94% : 0.000008s : 53: predicate.dict_set_item_eliminator 11.24% : 0.000096s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.22% : 0.000002s : 4: predicate.elim_shapecalc_of_broadcastargs 1.02% : 0.000009s : 57: predicate.environ_add_const_eliminate 1.04% : 0.000009s : 57: predicate.environ_get_add_eliminate 0.97% : 0.000008s : 57: predicate.environ_get_depend_swap 1.45% : 0.000012s : 78: predicate.environ_get_eliminate 0.96% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.53% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.28% : 0.000020s : 80: predicate.float_depend_g_call 0.39% : 0.000003s : 21: predicate.float_environ_get_switch 0.53% : 0.000005s : 25: predicate.float_tuple_getitem_switch 0.05% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000004s : 21: predicate.get_grad_eliminate 0.05% : 0.000000s : 4: predicate.graph_param_transform 0.45% : 0.000004s : 21: predicate.incorporate_call 0.39% : 0.000003s : 21: predicate.incorporate_call_switch 5.22% : 0.000045s : 183: predicate.inline 1.25% : 0.000011s : 45: predicate.inline_without_move 0.26% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.71% : 0.000006s : 21: predicate.less_batch_normalization 1.35% 
: 0.000012s : 71: predicate.list_to_tuple_eliminator_ 2.46% : 0.000021s : 124: predicate.load_eliminater 0.49% : 0.000004s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000019s : 113: predicate.loop_unroll_before_grad 1.26% : 0.000011s : 61: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 21: predicate.merge_addn 0.89% : 0.000008s : 50: predicate.micro_step_allgather_replace 0.96% : 0.000008s : 50: predicate.mini_step_allgather_replace 0.93% : 0.000008s : 53: predicate.minmaximum_grad 0.63% : 0.000005s : 4: predicate.mutable_eliminate 0.13% : 0.000001s : 4: predicate.opt_reshape 0.14% : 0.000001s : 4: predicate.parallel_virtual_node 1.98% : 0.000017s : 80: predicate.partial_defer_inline 1.42% : 0.000012s : 67: predicate.partial_eliminate 0.97% : 0.000008s : 53: predicate.print_const_string_wrapper 0.45% : 0.000004s : 21: predicate.reduce_all_const_elim 1.22% : 0.000010s : 53: predicate.reduce_eliminate 2.19% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 21: predicate.remove_not_recompute_node 1.59% : 0.000014s : 113: predicate.replace_applicator 0.65% : 0.000006s : 45: predicate.replace_old_param 0.10% : 0.000001s : 4: predicate.reset_defer_inline 0.98% : 0.000008s : 53: predicate.reshape_eliminate 0.90% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 4: predicate.row_tensor_eliminate 1.11% : 0.000010s : 50: predicate.same_eliminate 0.38% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.68% : 0.000006s : 21: predicate.shard_identity_eliminate 0.21% : 0.000002s : 8: predicate.special_op_eliminate 0.53% : 0.000005s : 21: predicate.specialize_transform 1.14% : 0.000010s : 50: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000010s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.59% : 0.000014s : 80: predicate.switch_defer_inline 2.49% : 0.000021s : 130: predicate.switch_layer_defer_inline 4.72% : 0.000041s : 218: 
predicate.switch_simplify 0.93% : 0.000008s : 53: predicate.tile_eliminate 0.94% : 0.000008s : 53: predicate.transpose_eliminate 1.21% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.30% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000011s : 61: predicate.tuple_list_get_item_depend_reorder 2.51% : 0.000021s : 92: predicate.tuple_list_get_item_eliminator 1.37% : 0.000012s : 61: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000018s : 82: predicate.tuple_list_set_item_eliminator 1.41% : 0.000012s : 71: predicate.tuple_to_list_eliminator_ 2.17% : 0.000019s : 124: predicate.updatestate_pure_node_eliminater 2.64% : 0.000023s : 145: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 4: predicate.value_based_eliminate 0.47% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.53% : 0.000005s : 21: predicate.virtual_output_eliminate 0.08% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002478 36 62.73% : 0.001555s : 15: func_graph_cloner_run.FuncGraphClonerGraph 37.27% : 0.000924s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.626911 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.25% : 0.004096s : 1: add_attr 0.25% : 0.004081s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000076s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000205s : 1: auto_monad 0.00% : 0.000033s : 1: auto_monad_reorder 0.00% : 0.000008s : 1: backend_pass 0.00% : 0.000005s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.04% : 0.000588s : 1: bootstrap 0.00% : 0.000045s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000024s : 1: control_data_broadcast_order 0.00% : 0.000014s : 1: convert_after_rewriter 0.00% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000016s : 1: environ_conv 0.00% : 0.000065s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000007s : 1: get_jit_bprop_graph 0.00% : 0.000015s : 1: graph_reusing 0.00% : 0.000006s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000010s : 1: label_micro_interleaved_index 0.04% : 0.000648s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000968s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000025s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000031s : 1: opt.transform.mutable_eliminate 0.32% : 0.005187s : 117: opt.transform.opt_a 0.00% : 0.000034s : 1: opt.transform.opt_after_cconv 0.00% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000281s : 28: opt.transform.opt_b 0.00% : 0.000066s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000052s : 4: opt.transform.symbol_engine_opt 24.27% : 0.394923s : 1: opt_a 0.01% : 0.000143s : 1: opt_after_cconv 0.04% : 0.000703s : 1: opt_after_jit_grad 0.03% : 0.000454s : 1: opt_b 24.48% : 0.398280s : 1: optimize 0.00% : 0.000035s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000033s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000006s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000069s : 1: pre_auto_parallel 0.00% : 0.000055s : 1: py_interpret_to_execute 0.00% : 0.000026s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000024s : 1: remove_dup_value 23.54% : 0.382939s : 2: renormalize.infer 0.13% : 0.002128s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000013s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000082s : 1: rewriter_after_opt_a 0.01% : 0.000193s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000118s : 1: symbol_engine_optimizer 11.63% : 0.189285s : 1: task_emit 0.01% : 0.000104s : 1: 
tuple_transform 14.77% : 0.240285s : 1: type_inference 0.01% : 0.000119s : 1: validate TotalTime = 0.117682, [24] [bootstrap]: 0.00051697 [type_inference]: 0.0998016 [event_method]: 1.589e-05 [auto_monad]: 6.797e-05 [graph_reusing]: 5.44e-06 [inline]: 3.46999e-06 [add_attr]: 0.00366713, [1] [add_attr_with_inline]: 0.003654, [1] [Cycle 1]: 6.336e-05, [2] [tag_attr]: 1.837e-05 [meta_addattr_fg_expand]: 4.23001e-06 [parallel-infer-symbol]: 4.02998e-06 [pre_auto_parallel]: 3.184e-05 [insert-virtual-dataset]: 2.84999e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.93998e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00506117, [53] [py_interpret_to_execute]: 2.821e-05 [rewriter_before_opt_a]: 6.276e-05 [opt_a]: 0.00263878, [2] [Cycle 1]: 0.00192019, [45] [expand_dump_flag]: 3.41001e-06 [switch_simplify]: 3.119e-05 [loop_unroll]: 1.739e-05 [a_1]: 0.00040239 [with_stream_mark]: 2.055e-05 [recompute_prepare]: 9.51e-06 [updatestate_depend_eliminate]: 4.60001e-06 [updatestate_assign_eliminate]: 3.90998e-06 [updatestate_loads_eliminate]: 3.30998e-06 [parameter_eliminate]: 1.84998e-06 [a_2]: 8.269e-05 [accelerated_algorithm]: 6.98e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 6.14999e-06 [merge_send_recv]: 9.94001e-06 [auto_parallel]: 7.52002e-06 [parallel]: 2.19e-05 [flash_sp]: 1.006e-05 [merge_comm]: 4.05e-06 [allreduce_fusion]: 3.83999e-06 [matmul_add_comm_reduction]: 9.66e-06 [allreduce_slice_to_reducescatter]: 1.38002e-06 [virtual_shard_identity]: 8.04997e-06 [virtual_dataset]: 6.14001e-06 [get_grad_eliminate_]: 5.78002e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 4.11001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.188e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.354e-05 [merge_recompute_call_nodes]: 1.92001e-06 [before_grad]: 1.083e-05 [set_forward_comm_id_for_comm_node_pass]: 4.50999e-06 [meta_fg_expand]: 2.91e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 2.46998e-06 
[after_resolve]: 1.098e-05 [a_after_grad]: 8.99e-06 [renormalize]: 0.00071307 [add_forward_monad_depend]: 6.38e-06 [auto_monad_grad]: 3.13e-06 [auto_monad_eliminator]: 1.819e-05 [cse]: 0.00010028 [a_3]: 4.771e-05 [Cycle 2]: 0.00070439, [45] [expand_dump_flag]: 2.04999e-06 [switch_simplify]: 9.32999e-06 [loop_unroll]: 6.02999e-06 [a_1]: 0.00012447 [with_stream_mark]: 1.554e-05 [recompute_prepare]: 6.92002e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.39e-06 [a_2]: 7.251e-05 [accelerated_algorithm]: 7.45e-06 [shard]: 1.79998e-06 [meta_shard_fg_expand]: 2.11998e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 6.52001e-06 [auto_parallel]: 7.56999e-06 [parallel]: 6.78998e-06 [flash_sp]: 4.53999e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.8e-06 [matmul_add_comm_reduction]: 8.59002e-06 [allreduce_slice_to_reducescatter]: 4.89992e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 5.50001e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.58997e-06 [merge_forward]: 3.94002e-06 [cell_reuse_recompute_pass]: 4.3e-06 [offload_activation]: 1.02e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.333e-05 [merge_recompute_call_nodes]: 1.16002e-06 [before_grad]: 1.124e-05 [set_forward_comm_id_for_comm_node_pass]: 3.75e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 1.25999e-06 [receive_attached]: 1.71e-06 [after_resolve]: 1.062e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 2.36e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.118e-05 [cse]: 2.035e-05 [a_3]: 3.421e-05 [py_interpret_to_execute_after_opt_a]: 1.428e-05 [slice_cell_reuse_recomputed_activation]: 2.43e-06 [rewriter_after_opt_a]: 4.306e-05 [convert_after_rewriter]: 7.74002e-06 [order_py_execute_after_rewriter]: 5.87999e-06 [mutable_eliminate]: 0.00071424 [opt_b]: 0.00022436, [1] [Cycle 1]: 0.00021362, [7] [b_1]: 0.00011825 [b_2]: 
7.95e-06 [updatestate_depend_eliminate]: 1.054e-05 [updatestate_assign_eliminate]: 3.36999e-06 [updatestate_loads_eliminate]: 2.94999e-06 [renormalize]: 9.70002e-07 [cse]: 3.022e-05 [optimize_parallel_all_gather_comm]: 2.244e-05 [overlap_param_gather]: 2.34001e-06 [cconv]: 3.656e-05 [loop_unroll]: 0.00052803 [opt_after_cconv]: 0.00010921, [1] [Cycle 1]: 0.00010165, [7] [c_1]: 2.718e-05 [parameter_eliminate]: 4.70999e-06 [updatestate_depend_eliminate]: 6.36e-06 [updatestate_assign_eliminate]: 2.68998e-06 [updatestate_loads_eliminate]: 2.45002e-06 [cse]: 2.271e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.775e-05 [tuple_transform]: 8.174e-05, [1] [Cycle 1]: 7.617e-05, [4] [d_1]: 4.64e-05 [none_parameter_eliminate]: 2.22999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.68001e-06 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 5.45e-05 [cse_after_recomputation]: 2.419e-05, [1] [Cycle 1]: 1.893e-05, [1] [cse]: 1.299e-05 [environ_conv]: 7.23e-06 [swap_dp_allreduce_reducescatter]: 5.08002e-06 [bias_add_comm_swap]: 3.73001e-06 [label_micro_interleaved_index]: 4.99e-06 [label_fine_grained_interleaved_index]: 2.78998e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.47001e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 8.99978e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.84001e-06 [reorder_send_recv_between_fp_bp]: 2.68998e-06 [comm_op_add_attrs]: 1.35001e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.30999e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.30999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 [control_data_broadcast_order]: 1.518e-05 [grouped_pairwise_exchange_alltoall]: 2.09e-06 [offloading_packed_experts]: 4.55001e-06 [overlap_recompute_and_grad_model_parallel]: 5.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.75001e-06 [overlap_recompute_comm]: 2.58e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 2.424e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 2.14999e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 8.385e-05, [1] [Cycle 1]: 7.752e-05, [6] [build]: 5.00999e-06 [elim_shapecalc]: 1.12e-05 [elim_not_effective]: 1.355e-05 [opt_reshape]: 7.28e-06 [fold_const_symbol]: 1.051e-05 [renormalize]: 3.7998e-07 [detach_backward]: 2.36998e-06 [pipeline_parallel_scheduler]: 1.71998e-06 [auto_monad_reorder]: 1.773e-05 [get_jit_bprop_graph]: 2.43998e-06 [rewriter_after_jit_bprop_graph]: 6.17001e-06 [opt_after_jit_grad]: 0.00054706 [validate]: 4.947e-05 [backend_pass]: 1.39998e-06 [task_emit]: 0.00760383 [execute]: 1.03e-05 Sums bootstrap : 0.000517s : 0.46% type_inference : 0.099802s : 88.42% event_method : 0.000016s : 0.01% auto_monad : 0.000068s : 0.06% graph_reusing : 0.000005s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000032s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000028s : 0.02% optimize.rewriter_before_opt_a : 0.000063s : 0.06% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.04% optimize.opt_a.loop_unroll : 0.000023s : 0.02% optimize.opt_a.a_1 : 0.000527s : 0.47% optimize.opt_a.with_stream_mark : 0.000036s : 0.03% optimize.opt_a.recompute_prepare : 0.000016s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000016s : 0.01% optimize.opt_a.auto_parallel : 0.000015s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000008s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000022s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000027s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000022s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000713s : 0.63% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.01% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000029s : 0.03% optimize.opt_a.cse : 0.000121s : 0.11% optimize.opt_a.a_3 : 0.000082s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000714s : 0.63% optimize.opt_b.b_1 : 0.000118s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000037s : 0.03% optimize.loop_unroll : 0.000528s : 0.47% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000023s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000018s : 0.02% optimize.tuple_transform.d_1 : 0.000046s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.05% optimize.cse_after_recomputation.cse : 0.000013s : 0.01% optimize.environ_conv : 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00%
auto_monad_reorder : 0.000018s : 0.02%
get_jit_bprop_graph : 0.000002s : 0.00%
rewriter_after_jit_bprop_graph : 0.000006s : 0.01%
opt_after_jit_grad : 0.000547s : 0.48%
validate : 0.000049s : 0.04%
backend_pass : 0.000001s : 0.00%
task_emit : 0.007604s : 6.74%
execute : 0.000010s : 0.01%
Time group info:
------[substitution.] 0.000191 24
20.08% : 0.000038s : 4: substitution.arithmetic_simplify
1.24% : 0.000002s : 2: substitution.elim_not_effective
0.86% : 0.000002s : 2: substitution.fold_const_symbol
3.30% : 0.000006s : 3: substitution.graph_param_transform
66.69% : 0.000128s : 3: substitution.inline
2.43% : 0.000005s : 4: substitution.j_node_and_user_rematch
2.93% : 0.000006s : 4: substitution.remove_not_recompute_node
2.49% : 0.000005s : 2: substitution.replace_old_param
------[type_inference.] 0.099734 2
99.27% : 0.099003s : 1: type_inference.infer
0.73% : 0.000731s : 1: type_inference.specialize
------[replace.] 0.000032 3
100.00% : 0.000032s : 3: replace.inline
------[match.] 0.000125 3
100.00% : 0.000125s : 3: match.inline
------[predicate.] 0.000161 815
1.01% : 0.000002s : 8: predicate.accumulaten_eliminater
1.23% : 0.000002s : 3: predicate.ad_related_special_op_eliminate
0.60% : 0.000001s : 6: predicate.addn_check_dump
0.77% : 0.000001s : 8: predicate.addn_zero_filter
0.71% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add
2.54% : 0.000004s : 14: predicate.arithmetic_simplify
0.81% : 0.000001s : 8: predicate.cast_eliminate
0.83% : 0.000001s : 6: predicate.check_bprop_eliminate
0.60% : 0.000001s : 6: predicate.compare_switch_simplify
0.22% : 0.000000s : 3: predicate.const_output_eliminate
0.67% : 0.000001s : 6: predicate.depend_value_elim
0.78% : 0.000001s : 8: predicate.dict_get_item_const_eliminator
1.05% : 0.000002s : 8: predicate.dict_get_item_eliminator
0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator
2.00% : 0.000003s : 6: predicate.dumpgradient_eliminate
0.23% : 0.000000s : 3: predicate.elim_not_effective
0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs
1.18% : 0.000002s : 11: predicate.environ_add_const_eliminate
0.96% : 0.000002s : 11: predicate.environ_get_add_eliminate
1.01% : 0.000002s : 11: predicate.environ_get_depend_swap
1.93% : 0.000003s : 17: predicate.environ_get_eliminate
1.06% : 0.000002s : 11: predicate.environ_get_set_eliminate
1.09% : 0.000002s : 11: predicate.exchange_switch_depend_value
2.15% : 0.000003s : 11: predicate.float_depend_g_call
0.55% : 0.000001s : 6: predicate.float_environ_get_switch
0.81% : 0.000001s : 9: predicate.float_tuple_getitem_switch
0.20% : 0.000000s : 3: predicate.fold_const_symbol
0.74% : 0.000001s : 6: predicate.get_grad_eliminate
0.22% : 0.000000s : 3: predicate.graph_param_transform
0.68% : 0.000001s : 6: predicate.incorporate_call
0.55% : 0.000001s : 6: predicate.incorporate_call_switch
6.13% : 0.000010s : 37: predicate.inline
0.91% : 0.000001s : 6: predicate.inline_without_move
0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch
0.83% : 0.000001s : 6: predicate.less_batch_normalization
1.39% : 0.000002s : 14: predicate.list_to_tuple_eliminator_
2.05% : 0.000003s : 22: predicate.load_eliminater
1.26% : 0.000002s : 3: predicate.loop_unroll_after_grad
1.83% : 0.000003s : 18: predicate.loop_unroll_before_grad
1.65% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator
0.63% : 0.000001s : 6: predicate.merge_addn
0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace
0.57% : 0.000001s : 6: predicate.mini_step_allgather_replace
0.73% : 0.000001s : 8: predicate.minmaximum_grad
2.31% : 0.000004s : 3: predicate.mutable_eliminate
0.42% : 0.000001s : 3: predicate.opt_reshape
0.52% : 0.000001s : 3: predicate.parallel_virtual_node
1.41% : 0.000002s : 11: predicate.partial_defer_inline
1.18% : 0.000002s : 11: predicate.partial_eliminate
0.86% : 0.000001s : 8: predicate.print_const_string_wrapper
0.62% : 0.000001s : 6: predicate.reduce_all_const_elim
1.05% : 0.000002s : 8: predicate.reduce_eliminate
2.07% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater
0.79% : 0.000001s : 6: predicate.remove_not_recompute_node
1.11% : 0.000002s : 14: predicate.replace_applicator
0.72% : 0.000001s : 6: predicate.replace_old_param
0.59% : 0.000001s : 3: predicate.reset_defer_inline
0.86% : 0.000001s : 8: predicate.reshape_eliminate
0.65% : 0.000001s : 6: predicate.row_tensor_add_zeros_like
0.50% : 0.000001s : 3: predicate.row_tensor_eliminate
0.86% : 0.000001s : 6: predicate.same_eliminate
0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute
0.86% : 0.000001s : 6: predicate.shard_identity_eliminate
0.88% : 0.000001s : 6: predicate.special_op_eliminate
0.80% : 0.000001s : 6: predicate.specialize_transform
1.29% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value
0.88% : 0.000001s : 6: predicate.stack_unstack_eliminate
0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater
1.14% : 0.000002s : 11: predicate.switch_defer_inline
1.79% : 0.000003s : 17: predicate.switch_layer_defer_inline
4.45% : 0.000007s : 38: predicate.switch_simplify
0.86% : 0.000001s : 8: predicate.tile_eliminate
0.81% : 0.000001s : 8: predicate.transpose_eliminate
2.02% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive
1.47% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator
1.41% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder
3.31% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator
1.40% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator
2.27% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator
1.42% : 0.000002s : 14: predicate.tuple_to_list_eliminator_
2.30% : 0.000004s : 22: predicate.updatestate_pure_node_eliminater
2.89% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater
0.50% : 0.000001s : 3: predicate.value_based_eliminate
0.88% : 0.000001s : 6: predicate.virtual_dataset_eliminate
0.81% : 0.000001s : 6: predicate.virtual_output_eliminate
0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate
0.51% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000372 7
36.09% : 0.000134s : 2: func_graph_cloner_run.FuncGraphClonerGraph
63.91% : 0.000238s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.128198 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.87% : 0.003673s : 1: add_attr 2.85% : 0.003658s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000059s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000073s : 1: auto_monad 0.02% : 0.000022s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000007s : 1: bias_add_comm_swap 0.43% : 0.000546s : 1: bootstrap 0.03% : 0.000041s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000027s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.02% : 0.000023s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.42% : 0.000538s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.57% : 0.000729s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000021s : 1: opt.transform.mutable_eliminate 0.71% : 0.000916s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000094s : 28: opt.transform.opt_b 0.04% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000038s : 4: opt.transform.symbol_engine_opt 2.06% : 0.002642s : 1: opt_a 0.09% : 0.000113s : 1: opt_after_cconv 0.44% : 0.000558s : 1: opt_after_jit_grad 0.18% : 0.000228s : 1: opt_b 3.95% : 0.005067s : 1: optimize 0.02% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000005s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000036s : 1: pre_auto_parallel 0.03% : 0.000032s : 1: py_interpret_to_execute 0.01% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000022s : 1: remove_dup_value 0.32% : 0.000407s : 1: renormalize.infer 0.23% : 0.000295s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000048s : 1: rewriter_after_opt_a 0.05% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000087s : 1: symbol_engine_optimizer 5.95% : 0.007627s : 1: task_emit 0.07% : 0.000085s : 1: 
tuple_transform 77.87% : 0.099824s : 1: type_inference 0.07% : 0.000092s : 1: validate TotalTime = 0.485598, [24] [bootstrap]: 0.00048735 [type_inference]: 0.190677 [event_method]: 9.267e-05 [auto_monad]: 0.00014977 [graph_reusing]: 9.31e-06 [inline]: 3.88001e-06 [add_attr]: 0.00423942, [1] [add_attr_with_inline]: 0.0042239, [1] [Cycle 1]: 0.00010392, [2] [tag_attr]: 4.538e-05 [meta_addattr_fg_expand]: 1.021e-05 [parallel-infer-symbol]: 4.16001e-06 [pre_auto_parallel]: 6.403e-05 [insert-virtual-dataset]: 2.64001e-06 [parallel-infer-symbol-second]: 1.18001e-06 [dataset_repeat_opt]: 2.48002e-06 [pipeline_split]: 2.09999e-06 [optimize]: 0.281345, [53] [py_interpret_to_execute]: 4.685e-05 [rewriter_before_opt_a]: 0.00017405 [opt_a]: 0.186056, [3] [Cycle 1]: 0.181981, [45] [expand_dump_flag]: 6.66e-06 [switch_simplify]: 7.758e-05 [loop_unroll]: 6.103e-05 [a_1]: 0.00155123 [with_stream_mark]: 4.12e-05 [recompute_prepare]: 3.206e-05 [updatestate_depend_eliminate]: 1.038e-05 [updatestate_assign_eliminate]: 7.98001e-06 [updatestate_loads_eliminate]: 7.86001e-06 [parameter_eliminate]: 4.69002e-06 [a_2]: 0.00025661 [accelerated_algorithm]: 3.805e-05 [shard]: 2.98998e-06 [meta_shard_fg_expand]: 6.11e-06 [shard_inline]: 1.876e-05 [merge_send_recv]: 2.213e-05 [auto_parallel]: 1.597e-05 [parallel]: 2.308e-05 [flash_sp]: 1.565e-05 [merge_comm]: 1.064e-05 [allreduce_fusion]: 8.60001e-06 [matmul_add_comm_reduction]: 3.97e-05 [allreduce_slice_to_reducescatter]: 9.69972e-07 [virtual_shard_identity]: 2.083e-05 [virtual_dataset]: 1.622e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.539e-05 [merge_forward]: 1.075e-05 [cell_reuse_recompute_pass]: 2.14999e-06 [offload_activation]: 2.109e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.365e-05 [merge_recompute_call_nodes]: 2.10002e-06 [before_grad]: 3.065e-05 [set_forward_comm_id_for_comm_node_pass]: 1.076e-05 [meta_fg_expand]: 0.00221911 [flash_sp_send_recv_attached]: 5.22e-06 [receive_attached]: 2.95002e-06 [after_resolve]: 
8.61e-05 [a_after_grad]: 0.00010071 [renormalize]: 0.176102 [add_forward_monad_depend]: 1.477e-05 [auto_monad_grad]: 6.39001e-06 [auto_monad_eliminator]: 5.829e-05 [cse]: 0.00020873 [a_3]: 0.00035904 [Cycle 2]: 0.00332367, [45] [expand_dump_flag]: 3.57997e-06 [switch_simplify]: 4.824e-05 [loop_unroll]: 4.237e-05 [a_1]: 0.00145636 [with_stream_mark]: 2.09e-05 [recompute_prepare]: 1.208e-05 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 4.58001e-06 [updatestate_loads_eliminate]: 3.48999e-06 [parameter_eliminate]: 3.06999e-06 [a_2]: 9.409e-05 [accelerated_algorithm]: 1.347e-05 [shard]: 2.46998e-06 [meta_shard_fg_expand]: 2.53998e-06 [shard_inline]: 7.39002e-06 [merge_send_recv]: 1.135e-05 [auto_parallel]: 1.263e-05 [parallel]: 1.14e-05 [flash_sp]: 4.61002e-06 [merge_comm]: 4.22998e-06 [allreduce_fusion]: 4.07e-06 [matmul_add_comm_reduction]: 1.241e-05 [allreduce_slice_to_reducescatter]: 1.07998e-06 [virtual_shard_identity]: 9.81998e-06 [virtual_dataset]: 7.15e-06 [get_grad_eliminate_]: 6.53e-06 [virtual_output]: 6.78e-06 [merge_forward]: 5.15001e-06 [cell_reuse_recompute_pass]: 1.57001e-06 [offload_activation]: 1.105e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.583e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 1.192e-05 [set_forward_comm_id_for_comm_node_pass]: 4.50999e-06 [meta_fg_expand]: 9.323e-05 [flash_sp_send_recv_attached]: 2.03997e-06 [receive_attached]: 2.68e-06 [after_resolve]: 1.421e-05 [a_after_grad]: 1.107e-05 [renormalize]: 0.00093847 [add_forward_monad_depend]: 6.43998e-06 [auto_monad_grad]: 2.69999e-06 [auto_monad_eliminator]: 1.658e-05 [cse]: 3.749e-05 [a_3]: 5.279e-05 [Cycle 3]: 0.00072747, [45] [expand_dump_flag]: 2.32999e-06 [switch_simplify]: 8.83001e-06 [loop_unroll]: 7.16999e-06 [a_1]: 0.00015706 [with_stream_mark]: 1.089e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 4.15999e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.99001e-06 
[parameter_eliminate]: 1.07e-06 [a_2]: 8.693e-05 [accelerated_algorithm]: 1.068e-05 [shard]: 1.47001e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 6.81999e-06 [merge_send_recv]: 5.95002e-06 [auto_parallel]: 8.37e-06 [parallel]: 6.94999e-06 [flash_sp]: 1.35999e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.68e-06 [matmul_add_comm_reduction]: 7.26999e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 7.73999e-06 [virtual_dataset]: 6.63e-06 [get_grad_eliminate_]: 6.21e-06 [virtual_output]: 6.07001e-06 [merge_forward]: 4.28001e-06 [cell_reuse_recompute_pass]: 2.79001e-06 [offload_activation]: 8.37998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.32e-05 [merge_recompute_call_nodes]: 1.25999e-06 [before_grad]: 1.138e-05 [set_forward_comm_id_for_comm_node_pass]: 4.46002e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 1.56998e-06 [receive_attached]: 2.09999e-06 [after_resolve]: 1.036e-05 [a_after_grad]: 9.70002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 1.42e-06 [auto_monad_eliminator]: 9.60001e-06 [cse]: 1.862e-05 [a_3]: 4.118e-05 [py_interpret_to_execute_after_opt_a]: 1.674e-05 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 4.612e-05 [convert_after_rewriter]: 1.26e-05 [order_py_execute_after_rewriter]: 6.01e-06 [mutable_eliminate]: 0.00077969 [opt_b]: 0.0924336, [1] [Cycle 1]: 0.0924201, [7] [b_1]: 0.00013628 [b_2]: 9.00999e-06 [updatestate_depend_eliminate]: 8.05e-06 [updatestate_assign_eliminate]: 0.0920978 [updatestate_loads_eliminate]: 1.08e-05 [renormalize]: 1.25001e-06 [cse]: 5.471e-05 [optimize_parallel_all_gather_comm]: 3.266e-05 [overlap_param_gather]: 2.46e-06 [cconv]: 4.637e-05 [loop_unroll]: 0.00084539 [opt_after_cconv]: 0.00014057, [1] [Cycle 1]: 0.00012964, [7] [c_1]: 3.817e-05 [parameter_eliminate]: 6.69999e-06 [updatestate_depend_eliminate]: 1.102e-05 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 
2.84999e-06 [cse]: 3.055e-05 [renormalize]: 8.30012e-07 [remove_dup_value]: 2.054e-05 [tuple_transform]: 9.7e-05, [1] [Cycle 1]: 9.192e-05, [4] [d_1]: 6.099e-05 [none_parameter_eliminate]: 2.08002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 8.13999e-06 [partial_unused_args_eliminate]: 2.16998e-06 [add_recomputation]: 7.033e-05 [cse_after_recomputation]: 2.918e-05, [1] [Cycle 1]: 2.321e-05, [1] [cse]: 1.753e-05 [environ_conv]: 1.19e-05 [swap_dp_allreduce_reducescatter]: 6.39001e-06 [bias_add_comm_swap]: 3.61999e-06 [label_micro_interleaved_index]: 5.57001e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.43998e-06 [micro_interleaved_order_control]: 2.90002e-06 [assign_add_opt]: 1.50001e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.78e-06 [reorder_send_recv_between_fp_bp]: 2.93998e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.74e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.33002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01e-06 [control_data_broadcast_order]: 1.619e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 4.67e-06 [overlap_recompute_and_grad_model_parallel]: 5.52999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.72001e-06 [overlap_grad_ring_attention]: 5.07e-06 [overlap_grad_flash_sp]: 2.306e-05 [begin_end_overlap_inline]: 6.50005e-07 [split_matmul_comm_elemetwise]: 2.77002e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 9.914e-05, [1] [Cycle 1]: 9.362e-05, [6] [build]: 1.125e-05 [elim_shapecalc]: 1.299e-05 [elim_not_effective]: 1.612e-05 [opt_reshape]: 8.70999e-06 [fold_const_symbol]: 1.237e-05 [renormalize]: 1.80007e-07 [detach_backward]: 2.36e-06 
[pipeline_parallel_scheduler]: 1.88002e-06 [auto_monad_reorder]: 2.255e-05 [get_jit_bprop_graph]: 2.14999e-06 [rewriter_after_jit_bprop_graph]: 6.47001e-06 [opt_after_jit_grad]: 0.00057777 [validate]: 5.917e-05 [backend_pass]: 1.58002e-06 [task_emit]: 0.00756438 [execute]: 9.67001e-06 Sums bootstrap : 0.000487s : 0.10% type_inference : 0.190677s : 39.74% event_method : 0.000093s : 0.02% auto_monad : 0.000150s : 0.03% graph_reusing : 0.000009s : 0.00% inline : 0.000004s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000045s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000064s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000047s : 0.01% optimize.rewriter_before_opt_a : 0.000174s : 0.04% optimize.opt_a.expand_dump_flag : 0.000013s : 0.00% optimize.opt_a.switch_simplify : 0.000135s : 0.03% optimize.opt_a.loop_unroll : 0.000111s : 0.02% optimize.opt_a.a_1 : 0.003165s : 0.66% optimize.opt_a.with_stream_mark : 0.000073s : 0.02% optimize.opt_a.recompute_prepare : 0.000052s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000009s : 0.00% optimize.opt_a.a_2 : 0.000438s : 0.09% optimize.opt_a.accelerated_algorithm : 0.000062s : 0.01% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.00% optimize.opt_a.shard_inline : 0.000033s : 0.01% optimize.opt_a.merge_send_recv : 0.000039s : 0.01% optimize.opt_a.auto_parallel : 0.000037s : 0.01% optimize.opt_a.parallel : 0.000041s : 0.01% optimize.opt_a.flash_sp : 0.000022s : 0.00% optimize.opt_a.merge_comm : 0.000019s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000016s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000059s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.01% optimize.opt_a.virtual_dataset : 0.000030s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.01% optimize.opt_a.virtual_output : 0.000028s : 0.01% optimize.opt_a.merge_forward : 0.000020s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000007s : 0.00% optimize.opt_a.offload_activation : 0.000041s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000054s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.002315s : 0.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000009s : 0.00% optimize.opt_a.receive_attached : 0.000008s : 0.00% optimize.opt_a.after_resolve : 0.000111s : 0.02% optimize.opt_a.a_after_grad : 0.000121s : 0.03% optimize.opt_a.renormalize : 0.177041s : 36.90% optimize.opt_a.add_forward_monad_depend : 0.000022s : 0.00% optimize.opt_a.auto_monad_grad : 0.000011s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.02% optimize.opt_a.cse : 0.000265s : 0.06% optimize.opt_a.a_3 : 0.000453s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.01% optimize.convert_after_rewriter : 0.000013s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000780s : 0.16% optimize.opt_b.b_1 : 0.000136s : 0.03% optimize.opt_b.b_2 : 0.000009s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.092098s : 19.20% optimize.opt_b.updatestate_loads_eliminate : 0.000011s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000055s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000033s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000046s : 0.01% optimize.loop_unroll : 0.000845s : 0.18% optimize.opt_after_cconv.c_1 : 0.000038s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000011s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000021s : 0.00% optimize.tuple_transform.d_1 : 0.000061s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000070s : 0.01% optimize.cse_after_recomputation.cse : 0.000018s : 0.00% optimize.environ_conv : 0.000012s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000023s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000578s : 0.12% validate : 0.000059s : 0.01% backend_pass : 0.000002s : 0.00% task_emit : 0.007564s : 1.58% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000937 159 7.56% : 0.000071s : 7: substitution.arithmetic_simplify 0.32% : 0.000003s : 3: substitution.elim_not_effective 0.59% : 0.000006s : 5: substitution.float_depend_g_call 0.48% : 0.000005s : 2: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 3: substitution.fold_const_symbol 0.78% : 0.000007s : 4: substitution.graph_param_transform 0.38% : 0.000004s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 59.60% : 0.000558s : 17: substitution.inline 2.80% : 0.000026s : 2: substitution.inline_without_move 1.34% : 0.000013s : 15: substitution.j_node_and_user_rematch 2.30% : 0.000022s : 3: substitution.less_batch_normalization 1.26% : 0.000012s : 7: substitution.minmaximum_grad 0.82% : 0.000008s : 5: substitution.partial_eliminate 1.42% : 0.000013s : 15: substitution.remove_not_recompute_node 3.92% : 0.000037s : 10: substitution.replace_applicator 1.42% : 0.000013s : 10: substitution.replace_old_param 0.49% : 0.000005s : 1: substitution.set_cell_output_no_recompute 2.71% : 0.000025s : 7: substitution.tuple_list_convert_item_index_to_positive 1.21% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.60% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 6.72% : 0.000063s : 18: substitution.tuple_list_get_item_eliminator 1.80% : 0.000017s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.190566 2 99.08% : 0.188822s : 1: type_inference.infer 0.92% : 0.001745s : 1: type_inference.specialize ------[replace.] 0.000221 26 68.21% : 0.000151s : 17: replace.inline 31.79% : 0.000070s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000579 26 94.64% : 0.000548s : 17: match.inline 5.36% : 0.000031s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000731 4180 1.21% : 0.000009s : 52: predicate.accumulaten_eliminater 0.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000005s : 21: predicate.addn_check_dump 1.11% : 0.000008s : 52: predicate.addn_zero_filter 1.04% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 2.12% : 0.000016s : 73: predicate.arithmetic_simplify 1.16% : 0.000008s : 52: predicate.cast_eliminate 1.08% : 0.000008s : 50: predicate.check_bprop_eliminate 0.48% : 0.000004s : 21: predicate.compare_switch_simplify 0.07% : 0.000000s : 4: predicate.const_output_eliminate 0.43% : 0.000003s : 21: predicate.depend_value_elim 1.17% : 0.000009s : 52: predicate.dict_get_item_const_eliminator 1.12% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 56: predicate.environ_add_const_eliminate 1.12% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.16% : 0.000008s : 56: predicate.environ_get_depend_swap 1.62% : 0.000012s : 77: predicate.environ_get_eliminate 1.13% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.43% : 0.000018s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.61% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.10% : 0.000001s : 4: predicate.graph_param_transform 0.53% : 0.000004s : 21: predicate.incorporate_call 0.43% : 0.000003s : 21: predicate.incorporate_call_switch 5.85% : 0.000043s : 180: predicate.inline 1.41% : 0.000010s : 45: predicate.inline_without_move 0.27% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 21: predicate.less_batch_normalization 1.59% : 
0.000012s : 69: predicate.list_to_tuple_eliminator_ 2.53% : 0.000018s : 121: predicate.load_eliminater 0.53% : 0.000004s : 4: predicate.loop_unroll_after_grad 2.46% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.49% : 0.000011s : 60: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 21: predicate.merge_addn 1.14% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.15% : 0.000008s : 52: predicate.minmaximum_grad 0.52% : 0.000004s : 4: predicate.mutable_eliminate 0.15% : 0.000001s : 4: predicate.opt_reshape 0.18% : 0.000001s : 4: predicate.parallel_virtual_node 2.20% : 0.000016s : 78: predicate.partial_defer_inline 1.62% : 0.000012s : 65: predicate.partial_eliminate 1.11% : 0.000008s : 52: predicate.print_const_string_wrapper 0.51% : 0.000004s : 21: predicate.reduce_all_const_elim 1.35% : 0.000010s : 52: predicate.reduce_eliminate 2.48% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.38% : 0.000003s : 21: predicate.remove_not_recompute_node 1.87% : 0.000014s : 111: predicate.replace_applicator 0.81% : 0.000006s : 45: predicate.replace_old_param 0.11% : 0.000001s : 4: predicate.reset_defer_inline 1.21% : 0.000009s : 52: predicate.reshape_eliminate 1.06% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 4: predicate.row_tensor_eliminate 1.25% : 0.000009s : 50: predicate.same_eliminate 0.36% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 21: predicate.shard_identity_eliminate 0.24% : 0.000002s : 8: predicate.special_op_eliminate 0.57% : 0.000004s : 21: predicate.specialize_transform 1.41% : 0.000010s : 50: predicate.split_environ_get_set_with_tuple_value 1.26% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.93% : 0.000014s : 78: predicate.switch_defer_inline 2.89% : 0.000021s : 128: predicate.switch_layer_defer_inline 4.95% : 0.000036s : 213: 
predicate.switch_simplify 1.08% : 0.000008s : 52: predicate.tile_eliminate 1.07% : 0.000008s : 52: predicate.transpose_eliminate 1.36% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 60: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000020s : 90: predicate.tuple_list_get_item_eliminator 1.55% : 0.000011s : 60: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000016s : 81: predicate.tuple_list_set_item_eliminator 1.48% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.55% : 0.000019s : 121: predicate.updatestate_pure_node_eliminater 2.99% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 4: predicate.value_based_eliminate 0.56% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.11% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002237 35 61.85% : 0.001384s : 14: func_graph_cloner_run.FuncGraphClonerGraph 38.15% : 0.000854s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.953191 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.45% : 0.004246s : 1: add_attr 0.44% : 0.004230s : 1: add_attr_with_inline 0.00% : 0.000005s : 1: add_comm_op_reuse_tag 0.01% : 0.000075s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000158s : 1: auto_monad 0.00% : 0.000027s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000005s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.05% : 0.000518s : 1: bootstrap 0.01% : 0.000050s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000016s : 1: convert_after_rewriter 0.00% : 0.000032s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000015s : 1: environ_conv 0.01% : 0.000103s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000006s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.09% : 0.000858s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.08% : 0.000794s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000028s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000022s : 1: opt.transform.mutable_eliminate 0.50% : 0.004761s : 117: opt.transform.opt_a 0.00% : 0.000036s : 1: opt.transform.opt_after_cconv 0.00% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000115s : 28: opt.transform.opt_b 0.01% : 0.000066s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000046s : 4: opt.transform.symbol_engine_opt 19.52% : 0.186060s : 1: opt_a 0.02% : 0.000144s : 1: opt_after_cconv 0.06% : 0.000590s : 1: opt_after_jit_grad 9.70% : 0.092439s : 1: opt_b 29.52% : 0.281351s : 1: optimize 0.00% : 0.000037s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000006s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000070s : 1: pre_auto_parallel 0.01% : 0.000053s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000024s : 1: remove_dup_value 18.37% : 0.175100s : 2: renormalize.infer 0.20% : 0.001918s : 2: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000011s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000051s : 1: rewriter_after_opt_a 0.02% : 0.000180s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000102s : 1: symbol_engine_optimizer 0.80% : 0.007585s : 1: task_emit 0.01% : 0.000101s : 1: 
tuple_transform 20.01% : 0.190705s : 1: type_inference 0.01% : 0.000101s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x0-kbk],max_mem:4.0M ..... TotalTime = 147.336, [24] [bootstrap]: 0.00075793 [type_inference]: 0.00914796 [event_method]: 1.707e-05 [auto_monad]: 6.99e-05 [graph_reusing]: 6.54001e-06 [inline]: 3.3e-06 [add_attr]: 0.00558668, [1] [add_attr_with_inline]: 0.00556732, [1] [Cycle 1]: 7.738e-05, [2] [tag_attr]: 2.416e-05 [meta_addattr_fg_expand]: 4.92e-06 [parallel-infer-symbol]: 3.85e-06 [pre_auto_parallel]: 4.075e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 2.95998e-06 [pipeline_split]: 2.24001e-06 [optimize]: 0.171803, [53] [py_interpret_to_execute]: 3.371e-05 [rewriter_before_opt_a]: 8.64e-05 [opt_a]: 0.168907, [2] [Cycle 1]: 0.168079, [45] [expand_dump_flag]: 3.41999e-06 [switch_simplify]: 3.76e-05 [loop_unroll]: 2.099e-05 [a_1]: 0.00060321 [with_stream_mark]: 2.391e-05 [recompute_prepare]: 1.433e-05 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 3.8e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 2.30002e-06 [a_2]: 8.771e-05 [accelerated_algorithm]: 7.6e-06 [shard]: 2.56e-06 [meta_shard_fg_expand]: 2.52001e-06 [shard_inline]: 6.71e-06 [merge_send_recv]: 1.067e-05 [auto_parallel]: 1.035e-05 [parallel]: 3.46e-05 [flash_sp]: 1.166e-05 [merge_comm]: 4.52e-06 [allreduce_fusion]: 4.3e-06 [matmul_add_comm_reduction]: 1.327e-05 [allreduce_slice_to_reducescatter]: 1.22e-06 [virtual_shard_identity]: 1.362e-05 [virtual_dataset]: 7.21001e-06 [get_grad_eliminate_]: 6.04999e-06 [virtual_output]: 7.01999e-06 [merge_forward]: 4.55999e-06 [cell_reuse_recompute_pass]: 2.81e-06 [offload_activation]: 1.158e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.653e-05 [merge_recompute_call_nodes]: 1.95001e-06 [before_grad]: 
1.185e-05 [set_forward_comm_id_for_comm_node_pass]: 4.89e-06 [meta_fg_expand]: 3.56001e-06 [flash_sp_send_recv_attached]: 3.35003e-06 [receive_attached]: 2.93e-06 [after_resolve]: 1.254e-05 [a_after_grad]: 9.24e-06 [renormalize]: 0.166551 [add_forward_monad_depend]: 2.023e-05 [auto_monad_grad]: 3.11999e-06 [auto_monad_eliminator]: 2.818e-05 [cse]: 3.88e-05 [a_3]: 6.769e-05 [Cycle 2]: 0.0008112, [45] [expand_dump_flag]: 2.28998e-06 [switch_simplify]: 1.051e-05 [loop_unroll]: 6.54001e-06 [a_1]: 0.00015029 [with_stream_mark]: 2.073e-05 [recompute_prepare]: 8.57e-06 [updatestate_depend_eliminate]: 4.82998e-06 [updatestate_assign_eliminate]: 3.58e-06 [updatestate_loads_eliminate]: 3.95998e-06 [parameter_eliminate]: 2.09e-06 [a_2]: 7.881e-05 [accelerated_algorithm]: 7.60998e-06 [shard]: 2.94999e-06 [meta_shard_fg_expand]: 2.79999e-06 [shard_inline]: 7.56001e-06 [merge_send_recv]: 9.89999e-06 [auto_parallel]: 1.09e-05 [parallel]: 1.142e-05 [flash_sp]: 4.88001e-06 [merge_comm]: 4.62e-06 [allreduce_fusion]: 4.03001e-06 [matmul_add_comm_reduction]: 1.234e-05 [allreduce_slice_to_reducescatter]: 1.03001e-06 [virtual_shard_identity]: 1.03e-05 [virtual_dataset]: 6.19001e-06 [get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.27999e-06 [merge_forward]: 5.07999e-06 [cell_reuse_recompute_pass]: 4.57e-06 [offload_activation]: 1.133e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.478e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 1.234e-05 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.36998e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 3.14999e-06 [after_resolve]: 1.368e-05 [a_after_grad]: 9.22001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.83002e-06 [auto_monad_grad]: 2.44999e-06 [auto_monad_eliminator]: 1.414e-05 [cse]: 2.46e-05 [a_3]: 3.702e-05 [py_interpret_to_execute_after_opt_a]: 2.085e-05 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 6.057e-05 
[convert_after_rewriter]: 9.55001e-06 [order_py_execute_after_rewriter]: 5.56e-06 [mutable_eliminate]: 0.0008433 [opt_b]: 0.00023719, [1] [Cycle 1]: 0.0002247, [7] [b_1]: 0.00011802 [b_2]: 9.32001e-06 [updatestate_depend_eliminate]: 1.23e-05 [updatestate_assign_eliminate]: 3.04999e-06 [updatestate_loads_eliminate]: 3.23e-06 [renormalize]: 1.15001e-06 [cse]: 3.491e-05 [optimize_parallel_all_gather_comm]: 2.862e-05 [overlap_param_gather]: 2.73e-06 [cconv]: 4.193e-05 [loop_unroll]: 0.00059394 [opt_after_cconv]: 0.00012643, [1] [Cycle 1]: 0.00011636, [7] [c_1]: 2.849e-05 [parameter_eliminate]: 6.52001e-06 [updatestate_depend_eliminate]: 8.64e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 2.94999e-06 [cse]: 2.619e-05 [renormalize]: 6.89994e-07 [remove_dup_value]: 1.93e-05 [tuple_transform]: 0.00016658, [1] [Cycle 1]: 0.0001591, [4] [d_1]: 5.048e-05 [none_parameter_eliminate]: 2.12999e-06 [renormalize]: 4.80009e-07 [switch_simplify]: 1.002e-05 [partial_unused_args_eliminate]: 2.27001e-06 [add_recomputation]: 7.017e-05 [cse_after_recomputation]: 2.887e-05, [1] [Cycle 1]: 2.232e-05, [1] [cse]: 1.535e-05 [environ_conv]: 7.78001e-06 [swap_dp_allreduce_reducescatter]: 6.53e-06 [bias_add_comm_swap]: 3.78001e-06 [label_micro_interleaved_index]: 8.32e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.54999e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.86998e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 1.29998e-06 [full_micro_interleaved_order_control]: 2.54999e-06 [reorder_send_recv_between_fp_bp]: 3.08e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.20001e-06 [interleave_split_concat_branches]: 1.42e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.45999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.16e-06 [control_data_broadcast_order]: 1.757e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 
[offloading_packed_experts]: 5.49e-06 [overlap_recompute_and_grad_model_parallel]: 5.66e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.65001e-06 [overlap_recompute_comm]: 3.24001e-06 [overlap_grad_ring_attention]: 5.04998e-06 [overlap_grad_flash_sp]: 2.938e-05 [begin_end_overlap_inline]: 7.50006e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 2.02001e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 0.00010252, [1] [Cycle 1]: 9.496e-05, [6] [build]: 5.25999e-06 [elim_shapecalc]: 1.706e-05 [elim_not_effective]: 1.669e-05 [opt_reshape]: 8.29002e-06 [fold_const_symbol]: 1.012e-05 [renormalize]: 7.40023e-07 [detach_backward]: 2.67001e-06 [pipeline_parallel_scheduler]: 2.05002e-06 [auto_monad_reorder]: 2.385e-05 [get_jit_bprop_graph]: 2.47001e-06 [rewriter_after_jit_bprop_graph]: 7.60998e-06 [opt_after_jit_grad]: 0.00078652 [validate]: 6.055e-05 [backend_pass]: 1.71e-06 [task_emit]: 147.148 [execute]: 9.83002e-06 Sums bootstrap : 0.000758s : 0.00% type_inference : 0.009148s : 0.01% event_method : 0.000017s : 0.00% auto_monad : 0.000070s : 0.00% graph_reusing : 0.000007s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000024s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000041s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000034s : 0.00% optimize.rewriter_before_opt_a : 0.000086s : 0.00% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000048s : 0.00% optimize.opt_a.loop_unroll : 0.000028s : 0.00% optimize.opt_a.a_1 : 0.000754s : 0.00% optimize.opt_a.with_stream_mark : 0.000045s : 0.00% optimize.opt_a.recompute_prepare : 0.000023s : 0.00% 
optimize.opt_a.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000008s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000167s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.00% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00% optimize.opt_a.shard_inline : 0.000014s : 0.00% optimize.opt_a.merge_send_recv : 0.000021s : 0.00% optimize.opt_a.auto_parallel : 0.000021s : 0.00% optimize.opt_a.parallel : 0.000046s : 0.00% optimize.opt_a.flash_sp : 0.000017s : 0.00% optimize.opt_a.merge_comm : 0.000009s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000026s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000024s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000010s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000007s : 0.00% optimize.opt_a.offload_activation : 0.000023s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000031s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000024s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000026s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.166551s : 0.11% optimize.opt_a.add_forward_monad_depend : 0.000022s : 0.00% optimize.opt_a.auto_monad_grad : 0.000006s : 0.00% optimize.opt_a.auto_monad_eliminator : 
0.000042s : 0.00% optimize.opt_a.cse : 0.000063s : 0.00% optimize.opt_a.a_3 : 0.000105s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000021s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000061s : 0.00% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000843s : 0.00% optimize.opt_b.b_1 : 0.000118s : 0.00% optimize.opt_b.b_2 : 0.000009s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000012s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000035s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000029s : 0.00% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000042s : 0.00% optimize.loop_unroll : 0.000594s : 0.00% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000026s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.00% optimize.tuple_transform.d_1 : 0.000050s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000070s : 0.00% optimize.cse_after_recomputation.cse : 0.000015s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% 
optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000029s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000017s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% 
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000001s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000008s : 0.00% opt_after_jit_grad : 0.000787s : 0.00% validate : 0.000061s : 0.00% backend_pass : 0.000002s : 0.00% task_emit : 147.147614s : 99.88% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000319 26 17.72% : 0.000057s : 5: substitution.arithmetic_simplify 0.73% : 0.000002s : 2: substitution.elim_not_effective 0.43% : 0.000001s : 2: substitution.fold_const_symbol 1.85% : 0.000006s : 3: substitution.graph_param_transform 70.01% : 0.000223s : 3: substitution.inline 1.93% : 0.000006s : 4: substitution.j_node_and_user_rematch 1.95% : 0.000006s : 4: substitution.remove_not_recompute_node 1.87% : 0.000006s : 2: substitution.replace_old_param 3.51% : 0.000011s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.009064 2 91.75% : 0.008316s : 1: type_inference.infer 8.25% : 0.000748s : 1: type_inference.specialize ------[replace.] 0.000047 4 79.34% : 0.000037s : 3: replace.inline 20.66% : 0.000010s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000231 4 95.50% : 0.000221s : 3: match.inline 4.50% : 0.000010s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000207 883 0.77% : 0.000002s : 9: predicate.accumulaten_eliminater 1.84% : 0.000004s : 3: predicate.ad_related_special_op_eliminate 0.49% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000002s : 9: predicate.addn_zero_filter 0.90% : 0.000002s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000005s : 15: predicate.arithmetic_simplify 1.03% : 0.000002s : 9: predicate.cast_eliminate 0.61% : 0.000001s : 6: predicate.check_bprop_eliminate 0.74% : 0.000002s : 6: predicate.compare_switch_simplify 0.15% : 0.000000s : 3: predicate.const_output_eliminate 0.84% : 0.000002s : 6: predicate.depend_value_elim 0.80% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.54% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.28% : 0.000001s : 3: predicate.elim_not_effective 0.54% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.00% : 0.000002s : 12: predicate.environ_get_depend_swap 1.96% : 0.000004s : 18: predicate.environ_get_eliminate 0.98% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.10% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.23% : 0.000005s : 13: predicate.float_depend_g_call 0.52% : 0.000001s : 6: predicate.float_environ_get_switch 0.71% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.16% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.30% : 0.000001s : 3: predicate.graph_param_transform 0.51% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 5.78% : 0.000012s : 40: predicate.inline 0.87% : 0.000002s : 6: predicate.inline_without_move 0.31% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.86% : 0.000002s : 6: predicate.less_batch_normalization 1.60% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000004s : 25: predicate.load_eliminater 1.70% : 0.000004s : 3: predicate.loop_unroll_after_grad 1.70% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.77% : 0.000004s : 15: predicate.make_slice_get_slice_eliminator 0.48% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.63% : 0.000001s : 9: predicate.minmaximum_grad 2.02% : 0.000004s : 3: predicate.mutable_eliminate 0.50% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.38% : 0.000003s : 13: predicate.partial_defer_inline 1.11% : 0.000002s : 13: predicate.partial_eliminate 0.96% : 0.000002s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.60% : 0.000003s : 9: predicate.reduce_eliminate 2.31% : 0.000005s : 25: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000002s : 6: predicate.remove_not_recompute_node 1.37% : 0.000003s : 16: predicate.replace_applicator 0.76% : 0.000002s : 6: predicate.replace_old_param 0.34% : 0.000001s : 3: predicate.reset_defer_inline 1.01% : 0.000002s : 9: predicate.reshape_eliminate 0.47% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 1.05% : 0.000002s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.58% : 0.000003s : 6: predicate.shard_identity_eliminate 0.63% : 0.000001s : 6: predicate.special_op_eliminate 0.64% : 0.000001s : 6: predicate.specialize_transform 1.25% : 0.000003s : 6: predicate.split_environ_get_set_with_tuple_value 0.69% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.08% : 0.000002s : 13: predicate.switch_defer_inline 1.76% : 0.000004s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000009s : 43: predicate.switch_simplify 1.24% : 
0.000003s : 9: predicate.tile_eliminate 0.83% : 0.000002s : 9: predicate.transpose_eliminate 1.49% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.53% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.84% : 0.000008s : 22: predicate.tuple_list_get_item_eliminator 1.29% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.38% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 1.88% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.56% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.75% : 0.000002s : 6: predicate.virtual_output_eliminate 0.26% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.42% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000611 8 41.95% : 0.000256s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.05% : 0.000355s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
147.681606 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.00% : 0.005594s : 1: add_attr 0.00% : 0.005572s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000077s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.00% : 0.000075s : 1: auto_monad 0.00% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.00% : 0.000801s : 1: bootstrap 0.00% : 0.000046s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000022s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000032s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000011s : 1: environ_conv 0.00% : 0.000025s : 1: event_method 0.00% : 0.000025s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000012s : 1: label_micro_interleaved_index 0.00% : 0.000609s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.00% : 0.000861s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000022s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000026s : 1: opt.transform.mutable_eliminate 0.00% : 0.001200s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000039s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000096s : 28: opt.transform.opt_b 0.00% : 0.000056s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000046s : 4: opt.transform.symbol_engine_opt 0.11% : 0.168911s : 1: opt_a 0.00% : 0.000131s : 1: opt_after_cconv 0.00% : 0.000813s : 1: opt_after_jit_grad 0.00% : 0.000241s : 1: opt_b 0.12% : 0.171810s : 1: optimize 0.00% : 0.000033s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000035s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000006s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000007s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000006s : 1: pipeline_split 0.00% : 0.000045s : 1: pre_auto_parallel 0.00% : 0.000039s : 1: py_interpret_to_execute 0.00% : 0.000025s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000005s : 1: remove_cast_before_assign_add 0.00% : 0.000023s : 1: remove_dup_value 0.11% : 0.166008s : 1: renormalize.infer 0.00% : 0.000522s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000011s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000069s : 1: rewriter_after_opt_a 0.00% : 0.000092s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000106s : 1: symbol_engine_optimizer 99.64% : 147.147668s : 1: task_emit 0.00% : 0.000170s : 1: 
tuple_transform 0.01% : 0.009179s : 1: type_inference 0.00% : 0.000102s : 1: validate TotalTime = 1.09107, [24] [bootstrap]: 0.0005923 [type_inference]: 0.0914971 [event_method]: 1.524e-05 [auto_monad]: 0.00010489 [graph_reusing]: 6.39001e-06 [inline]: 3.06001e-06 [add_attr]: 0.00521194, [1] [add_attr_with_inline]: 0.00519706, [1] [Cycle 1]: 6.749e-05, [2] [tag_attr]: 1.867e-05 [meta_addattr_fg_expand]: 4.64998e-06 [parallel-infer-symbol]: 3.34001e-06 [pre_auto_parallel]: 3.494e-05 [insert-virtual-dataset]: 2.58998e-06 [parallel-infer-symbol-second]: 8.40024e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.00602392, [53] [py_interpret_to_execute]: 2.925e-05 [rewriter_before_opt_a]: 6.805e-05 [opt_a]: 0.0033971, [2] [Cycle 1]: 0.00213028, [45] [expand_dump_flag]: 3.26999e-06 [switch_simplify]: 3.34e-05 [loop_unroll]: 1.79e-05 [a_1]: 0.0004225 [with_stream_mark]: 2.509e-05 [recompute_prepare]: 1.089e-05 [updatestate_depend_eliminate]: 5.28002e-06 [updatestate_assign_eliminate]: 3.61999e-06 [updatestate_loads_eliminate]: 3.43999e-06 [parameter_eliminate]: 2.12999e-06 [a_2]: 8.907e-05 [accelerated_algorithm]: 9.25999e-06 [shard]: 3.08e-06 [meta_shard_fg_expand]: 2.29001e-06 [shard_inline]: 7.13e-06 [merge_send_recv]: 9.56e-06 [auto_parallel]: 9.19998e-06 [parallel]: 5.107e-05 [flash_sp]: 1.224e-05 [merge_comm]: 5.62999e-06 [allreduce_fusion]: 4.1e-06 [matmul_add_comm_reduction]: 1.143e-05 [allreduce_slice_to_reducescatter]: 1.02e-06 [virtual_shard_identity]: 1.11e-05 [virtual_dataset]: 6.98e-06 [get_grad_eliminate_]: 6.44999e-06 [virtual_output]: 6.55002e-06 [merge_forward]: 4.93001e-06 [cell_reuse_recompute_pass]: 1.72999e-06 [offload_activation]: 1.153e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.585e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 1.204e-05 [set_forward_comm_id_for_comm_node_pass]: 4.31002e-06 [meta_fg_expand]: 3.4e-06 [flash_sp_send_recv_attached]: 3.66999e-06 [receive_attached]: 2.06998e-06 
[after_resolve]: 1.203e-05 [a_after_grad]: 1.071e-05 [renormalize]: 0.00082644 [add_forward_monad_depend]: 3.397e-05 [auto_monad_grad]: 2.54999e-06 [auto_monad_eliminator]: 1.961e-05 [cse]: 3.589e-05 [a_3]: 5.444e-05 [Cycle 2]: 0.00125299, [45] [expand_dump_flag]: 1.91998e-06 [switch_simplify]: 8.61002e-06 [loop_unroll]: 6.39001e-06 [a_1]: 0.00012799 [with_stream_mark]: 1.591e-05 [recompute_prepare]: 7.87e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 2.93e-06 [updatestate_loads_eliminate]: 3.35e-06 [parameter_eliminate]: 1.53002e-06 [a_2]: 7.455e-05 [accelerated_algorithm]: 7.31001e-06 [shard]: 2.43998e-06 [meta_shard_fg_expand]: 2.22001e-06 [shard_inline]: 7.424e-05 [merge_send_recv]: 7.51999e-06 [auto_parallel]: 9.33002e-06 [parallel]: 9.14e-06 [flash_sp]: 3.88999e-06 [merge_comm]: 4.25e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 8.84e-06 [virtual_dataset]: 6.65998e-06 [get_grad_eliminate_]: 6.64999e-06 [virtual_output]: 5.49e-06 [merge_forward]: 4.18999e-06 [cell_reuse_recompute_pass]: 2.16e-06 [offload_activation]: 1.145e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.447e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 1.127e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86999e-06 [meta_fg_expand]: 2.67001e-06 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.89999e-06 [after_resolve]: 1.027e-05 [a_after_grad]: 9.37001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 2.89001e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.101e-05 [cse]: 3.02e-05 [a_3]: 0.00045838 [py_interpret_to_execute_after_opt_a]: 2.17e-05 [slice_cell_reuse_recomputed_activation]: 2.98e-06 [rewriter_after_opt_a]: 5.746e-05 [convert_after_rewriter]: 8.50999e-06 [order_py_execute_after_rewriter]: 5.37001e-06 [mutable_eliminate]: 0.00077865 [opt_b]: 0.00023309, [1] [Cycle 1]: 0.0002235, [7] [b_1]: 0.00012025 
[b_2]: 8.92e-06 [updatestate_depend_eliminate]: 9.86e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.19001e-06 [renormalize]: 7.89994e-07 [cse]: 3.554e-05 [optimize_parallel_all_gather_comm]: 2.389e-05 [overlap_param_gather]: 2.08002e-06 [cconv]: 3.691e-05 [loop_unroll]: 0.00054737 [opt_after_cconv]: 0.00012776, [1] [Cycle 1]: 0.00011972, [7] [c_1]: 2.968e-05 [parameter_eliminate]: 6.95002e-06 [updatestate_depend_eliminate]: 9.15001e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 2.70002e-06 [cse]: 2.835e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.774e-05 [tuple_transform]: 8.361e-05, [1] [Cycle 1]: 7.84e-05, [4] [d_1]: 4.601e-05 [none_parameter_eliminate]: 2.28998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 8.19002e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.673e-05 [cse_after_recomputation]: 3.047e-05, [1] [Cycle 1]: 2.449e-05, [1] [cse]: 1.662e-05 [environ_conv]: 7.05002e-06 [swap_dp_allreduce_reducescatter]: 5.77001e-06 [bias_add_comm_swap]: 3.77998e-06 [label_micro_interleaved_index]: 6.41998e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.51e-06 [micro_interleaved_order_control]: 2.48998e-06 [assign_add_opt]: 2.26e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.21998e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.42e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.54998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.616e-05 [grouped_pairwise_exchange_alltoall]: 2.24001e-06 [offloading_packed_experts]: 4.17e-06 [overlap_recompute_and_grad_model_parallel]: 5.99999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.54999e-06 [overlap_grad_ring_attention]: 5.14998e-06 [overlap_grad_flash_sp]: 2.383e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 9.993e-05, [1] [Cycle 1]: 9.113e-05, [6] [build]: 4.95001e-06 [elim_shapecalc]: 1.685e-05 [elim_not_effective]: 1.563e-05 [opt_reshape]: 6.74001e-06 [fold_const_symbol]: 1.08e-05 [renormalize]: 7.59988e-07 [detach_backward]: 3.32002e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 2.369e-05 [get_jit_bprop_graph]: 2.06998e-06 [rewriter_after_jit_bprop_graph]: 5.94e-06 [opt_after_jit_grad]: 0.00064449 [validate]: 5.154e-05 [backend_pass]: 1.49998e-06 [task_emit]: 0.98654 [execute]: 1.115e-05 Sums bootstrap : 0.000592s : 0.05% type_inference : 0.091497s : 8.44% event_method : 0.000015s : 0.00% auto_monad : 0.000105s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000035s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000029s : 0.00% optimize.rewriter_before_opt_a : 0.000068s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.00% optimize.opt_a.loop_unroll : 0.000024s : 0.00% optimize.opt_a.a_1 : 0.000550s : 0.05% optimize.opt_a.with_stream_mark : 0.000041s : 0.00% optimize.opt_a.recompute_prepare : 0.000019s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000164s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000017s : 0.00% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00% optimize.opt_a.shard_inline : 0.000081s : 0.01% optimize.opt_a.merge_send_recv : 0.000017s : 0.00% optimize.opt_a.auto_parallel : 0.000019s : 0.00% optimize.opt_a.parallel : 0.000060s : 0.01% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000010s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000020s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000020s : 0.00% optimize.opt_a.virtual_dataset : 0.000014s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000013s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000009s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000023s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000030s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000023s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000020s : 0.00% optimize.opt_a.renormalize : 0.000827s : 0.08% optimize.opt_a.add_forward_monad_depend : 0.000037s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000031s : 0.00% optimize.opt_a.cse : 0.000066s : 0.01% optimize.opt_a.a_3 : 0.000513s : 0.05% 
optimize.py_interpret_to_execute_after_opt_a : 0.000022s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000057s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000779s : 0.07% optimize.opt_b.b_1 : 0.000120s : 0.01% optimize.opt_b.b_2 : 0.000009s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000036s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000037s : 0.00% optimize.loop_unroll : 0.000547s : 0.05% optimize.opt_after_cconv.c_1 : 0.000030s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000028s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.00% optimize.tuple_transform.d_1 : 0.000046s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000017s : 0.00% optimize.environ_conv : 0.000007s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000017s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000001s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000644s : 0.06% validate : 0.000052s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.986540s : 90.96% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000197 24 20.09% : 0.000040s : 4: substitution.arithmetic_simplify 1.29% : 0.000003s : 2: substitution.elim_not_effective 0.81% : 0.000002s : 2: substitution.fold_const_symbol 3.24% : 0.000006s : 3: substitution.graph_param_transform 67.65% : 0.000133s : 3: substitution.inline 2.27% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000005s : 4: substitution.remove_not_recompute_node 1.99% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.091438 2 99.32% : 0.090819s : 1: type_inference.infer 0.68% : 0.000618s : 1: type_inference.specialize ------[replace.] 0.000034 3 100.00% : 0.000034s : 3: replace.inline ------[match.] 0.000130 3 100.00% : 0.000130s : 3: match.inline ------[predicate.] 
0.000182 815 0.83% : 0.000002s : 8: predicate.accumulaten_eliminater 1.58% : 0.000003s : 3: predicate.ad_related_special_op_eliminate 0.52% : 0.000001s : 6: predicate.addn_check_dump 0.83% : 0.000002s : 8: predicate.addn_zero_filter 0.76% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.47% : 0.000004s : 14: predicate.arithmetic_simplify 0.84% : 0.000002s : 8: predicate.cast_eliminate 0.61% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000002s : 8: predicate.dict_get_item_const_eliminator 0.79% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.45% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.44% : 0.000001s : 3: predicate.elim_not_effective 0.88% : 0.000002s : 3: predicate.elim_shapecalc_of_broadcastargs 1.04% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.23% : 0.000002s : 11: predicate.environ_get_add_eliminate 0.87% : 0.000002s : 11: predicate.environ_get_depend_swap 1.46% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.07% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.04% : 0.000004s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 6: predicate.float_environ_get_switch 0.77% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.30% : 0.000001s : 3: predicate.fold_const_symbol 0.82% : 0.000001s : 6: predicate.get_grad_eliminate 0.34% : 0.000001s : 3: predicate.graph_param_transform 0.59% : 0.000001s : 6: predicate.incorporate_call 0.50% : 0.000001s : 6: predicate.incorporate_call_switch 6.08% : 0.000011s : 37: predicate.inline 1.55% : 0.000003s : 6: predicate.inline_without_move 0.34% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.00% : 0.000002s : 6: predicate.less_batch_normalization 1.44% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.00% : 0.000004s : 22: predicate.load_eliminater 2.01% : 0.000004s : 3: predicate.loop_unroll_after_grad 1.73% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.44% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.78% : 0.000001s : 6: predicate.merge_addn 0.55% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 8: predicate.minmaximum_grad 2.09% : 0.000004s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.49% : 0.000003s : 11: predicate.partial_defer_inline 1.03% : 0.000002s : 11: predicate.partial_eliminate 0.83% : 0.000002s : 8: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 8: predicate.reduce_eliminate 2.01% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.82% : 0.000001s : 6: predicate.remove_not_recompute_node 1.32% : 0.000002s : 14: predicate.replace_applicator 0.51% : 0.000001s : 6: predicate.replace_old_param 0.65% : 0.000001s : 3: predicate.reset_defer_inline 0.79% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000002s : 6: predicate.same_eliminate 0.67% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.44% : 0.000003s : 6: predicate.shard_identity_eliminate 0.93% : 0.000002s : 6: predicate.special_op_eliminate 0.84% : 0.000002s : 6: predicate.specialize_transform 1.79% : 0.000003s : 6: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.09% : 0.000002s : 11: predicate.switch_defer_inline 1.81% : 0.000003s : 17: predicate.switch_layer_defer_inline 5.26% : 0.000010s : 38: predicate.switch_simplify 0.72% : 
0.000001s : 8: predicate.tile_eliminate 0.77% : 0.000001s : 8: predicate.transpose_eliminate 1.30% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000003s : 14: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.23% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.28% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 1.83% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.54% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.94% : 0.000002s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000415 7 32.57% : 0.000135s : 2: func_graph_cloner_run.FuncGraphClonerGraph 67.43% : 0.000280s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.104766 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.47% : 0.005219s : 1: add_attr 0.47% : 0.005201s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000063s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000111s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.06% : 0.000632s : 1: bootstrap 0.00% : 0.000042s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000034s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000010s : 1: environ_conv 0.00% : 0.000022s : 1: event_method 0.00% : 0.000020s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000011s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000006s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000010s : 1: label_micro_interleaved_index 0.05% : 0.000563s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000007s : 1: micro_interleaved_order_control 0.07% : 0.000796s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000022s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000026s : 1: opt.transform.mutable_eliminate 0.13% : 0.001454s : 78: opt.transform.opt_a 0.00% : 0.000028s : 1: opt.transform.opt_after_cconv 0.00% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000098s : 28: opt.transform.opt_b 0.00% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000045s : 4: opt.transform.symbol_engine_opt 0.31% : 0.003402s : 1: opt_a 0.01% : 0.000132s : 1: opt_after_cconv 0.06% : 0.000662s : 1: opt_after_jit_grad 0.02% : 0.000237s : 1: opt_b 0.55% : 0.006030s : 1: optimize 0.00% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000040s : 1: pre_auto_parallel 0.00% : 0.000033s : 1: py_interpret_to_execute 0.00% : 0.000027s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000022s : 1: remove_dup_value 0.04% : 0.000454s : 1: renormalize.infer 0.03% : 0.000361s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000062s : 1: rewriter_after_opt_a 0.01% : 0.000074s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000103s : 1: symbol_engine_optimizer 89.30% : 0.986568s : 1: task_emit 0.01% : 0.000087s : 1: 
tuple_transform 8.28% : 0.091525s : 1: type_inference 0.01% : 0.000088s : 1: validate TotalTime = 0.761146, [24] [bootstrap]: 0.00043317 [type_inference]: 0.0435898 [event_method]: 1.663e-05 [auto_monad]: 6.31e-05 [graph_reusing]: 5.60001e-06 [inline]: 2.58e-06 [add_attr]: 0.00375747, [1] [add_attr_with_inline]: 0.00374338, [1] [Cycle 1]: 6.913e-05, [2] [tag_attr]: 2.091e-05 [meta_addattr_fg_expand]: 4.42e-06 [parallel-infer-symbol]: 3.87002e-06 [pre_auto_parallel]: 3.608e-05 [insert-virtual-dataset]: 2.94001e-06 [parallel-infer-symbol-second]: 1.45999e-06 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.0422442, [53] [py_interpret_to_execute]: 3.031e-05 [rewriter_before_opt_a]: 7.695e-05 [opt_a]: 0.00282726, [2] [Cycle 1]: 0.00210595, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 3.642e-05 [loop_unroll]: 2.167e-05 [a_1]: 0.00051283 [with_stream_mark]: 1.763e-05 [recompute_prepare]: 1.011e-05 [updatestate_depend_eliminate]: 4.23001e-06 [updatestate_assign_eliminate]: 3.94002e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 8.557e-05 [accelerated_algorithm]: 7.49002e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 6.61e-06 [merge_send_recv]: 1e-05 [auto_parallel]: 8.43999e-06 [parallel]: 2.061e-05 [flash_sp]: 9.64e-06 [merge_comm]: 4.25e-06 [allreduce_fusion]: 3.83999e-06 [matmul_add_comm_reduction]: 1.115e-05 [allreduce_slice_to_reducescatter]: 9.70002e-07 [virtual_shard_identity]: 8.64e-06 [virtual_dataset]: 6.34999e-06 [get_grad_eliminate_]: 6.61999e-06 [virtual_output]: 6.43e-06 [merge_forward]: 4.77e-06 [cell_reuse_recompute_pass]: 1.77999e-06 [offload_activation]: 1.145e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.35e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 1.075e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 3.43999e-06 [flash_sp_send_recv_attached]: 2.96001e-06 [receive_attached]: 2.73003e-06 [after_resolve]: 
1.067e-05 [a_after_grad]: 9.35001e-06 [renormalize]: 0.00082464 [add_forward_monad_depend]: 6.41e-06 [auto_monad_grad]: 3.13998e-06 [auto_monad_eliminator]: 1.839e-05 [cse]: 3.602e-05 [a_3]: 5.174e-05 [Cycle 2]: 0.00070815, [45] [expand_dump_flag]: 1.66998e-06 [switch_simplify]: 7.84997e-06 [loop_unroll]: 5.92001e-06 [a_1]: 0.00013834 [with_stream_mark]: 1.484e-05 [recompute_prepare]: 7.06999e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 2.68998e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 1.13001e-06 [a_2]: 7.841e-05 [accelerated_algorithm]: 7.2e-06 [shard]: 1.40001e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.51999e-06 [merge_send_recv]: 7.82002e-06 [auto_parallel]: 7.65998e-06 [parallel]: 8.2e-06 [flash_sp]: 3.95e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.44001e-06 [matmul_add_comm_reduction]: 7.92e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.91001e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.51998e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 2.32001e-06 [offload_activation]: 9.30001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.302e-05 [merge_recompute_call_nodes]: 1.14998e-06 [before_grad]: 1.089e-05 [set_forward_comm_id_for_comm_node_pass]: 4.1e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 1.20999e-06 [receive_attached]: 2.18002e-06 [after_resolve]: 1.13e-05 [a_after_grad]: 8.47e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.68002e-06 [auto_monad_grad]: 1.81998e-06 [auto_monad_eliminator]: 8.15e-06 [cse]: 1.894e-05 [a_3]: 3.576e-05 [py_interpret_to_execute_after_opt_a]: 1.238e-05 [slice_cell_reuse_recomputed_activation]: 2.41e-06 [rewriter_after_opt_a]: 4.34e-05 [convert_after_rewriter]: 7.88001e-06 [order_py_execute_after_rewriter]: 5.25001e-06 [mutable_eliminate]: 0.037286 [opt_b]: 0.00028933, [1] [Cycle 1]: 0.00027633, [7] [b_1]: 0.00014969 [b_2]: 
1.019e-05 [updatestate_depend_eliminate]: 1.398e-05 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.78999e-06 [renormalize]: 9.10019e-07 [cse]: 4.948e-05 [optimize_parallel_all_gather_comm]: 2.77e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 4.201e-05 [loop_unroll]: 0.00073479 [opt_after_cconv]: 0.0001441, [1] [Cycle 1]: 0.00013498, [7] [c_1]: 3.613e-05 [parameter_eliminate]: 7.11999e-06 [updatestate_depend_eliminate]: 8.85001e-06 [updatestate_assign_eliminate]: 3.32002e-06 [updatestate_loads_eliminate]: 3.06001e-06 [cse]: 3.435e-05 [renormalize]: 1.11002e-06 [remove_dup_value]: 2.059e-05 [tuple_transform]: 9.16e-05, [1] [Cycle 1]: 8.429e-05, [4] [d_1]: 5.282e-05 [none_parameter_eliminate]: 2.27999e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 7.18e-06 [partial_unused_args_eliminate]: 2.36998e-06 [add_recomputation]: 6.151e-05 [cse_after_recomputation]: 2.93e-05, [1] [Cycle 1]: 2.344e-05, [1] [cse]: 1.666e-05 [environ_conv]: 8.18999e-06 [swap_dp_allreduce_reducescatter]: 5.79e-06 [bias_add_comm_swap]: 3.75e-06 [label_micro_interleaved_index]: 8.07e-06 [label_fine_grained_interleaved_index]: 3.03e-06 [merge_cast_opt]: 1.78002e-06 [slice_recompute_activation]: 2.67001e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.46002e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.32e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.22e-06 [overlap_opt_shard_in_pipeline]: 1.40001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.703e-05 [grouped_pairwise_exchange_alltoall]: 2.11998e-06 [offloading_packed_experts]: 4.86002e-06 [overlap_recompute_and_grad_model_parallel]: 5.54e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.49e-06 [overlap_recompute_comm]: 2.81e-06 [overlap_grad_ring_attention]: 4.57e-06 [overlap_grad_flash_sp]: 2.49e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.59999e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 9.105e-05, [1] [Cycle 1]: 8.56e-05, [6] [build]: 4.46002e-06 [elim_shapecalc]: 1.363e-05 [elim_not_effective]: 1.555e-05 [opt_reshape]: 7.17997e-06 [fold_const_symbol]: 1.136e-05 [renormalize]: 2.89991e-07 [detach_backward]: 2.64001e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.949e-05 [get_jit_bprop_graph]: 2.34999e-06 [rewriter_after_jit_bprop_graph]: 7.31001e-06 [opt_after_jit_grad]: 0.00078068 [validate]: 5.665e-05 [backend_pass]: 1.12e-06 [task_emit]: 0.669828 [execute]: 9.04998e-06 Sums bootstrap : 0.000433s : 0.06% type_inference : 0.043590s : 5.76% event_method : 0.000017s : 0.00% auto_monad : 0.000063s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000036s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000030s : 0.00% optimize.rewriter_before_opt_a : 0.000077s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000044s : 0.01% optimize.opt_a.loop_unroll : 0.000028s : 0.00% optimize.opt_a.a_1 : 0.000651s : 0.09% optimize.opt_a.with_stream_mark : 0.000032s : 0.00% optimize.opt_a.recompute_prepare : 0.000017s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000164s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000018s : 0.00% optimize.opt_a.auto_parallel : 0.000016s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000014s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000021s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000027s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000022s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000825s : 0.11% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.00% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000027s : 0.00% optimize.opt_a.cse : 0.000055s : 0.01% optimize.opt_a.a_3 : 0.000087s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.037286s : 4.93% optimize.opt_b.b_1 : 0.000150s : 0.02% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000014s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000049s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000028s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000042s : 0.01% optimize.loop_unroll : 0.000735s : 0.10% optimize.opt_after_cconv.c_1 : 0.000036s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000034s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000021s : 0.00% optimize.tuple_transform.d_1 : 0.000053s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.01% optimize.cse_after_recomputation.cse : 0.000017s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000019s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.00% opt_after_jit_grad : 0.000781s : 0.10% validate : 0.000057s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.669828s : 88.58% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000224 26 18.38% : 0.000041s : 5: substitution.arithmetic_simplify 1.34% : 0.000003s : 2: substitution.elim_not_effective 0.66% : 0.000001s : 2: substitution.fold_const_symbol 3.05% : 0.000007s : 3: substitution.graph_param_transform 65.99% : 0.000148s : 3: substitution.inline 1.72% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.35% : 0.000005s : 4: substitution.remove_not_recompute_node 1.96% : 0.000004s : 2: substitution.replace_old_param 4.56% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.043532 2 98.20% : 0.042748s : 1: type_inference.infer 1.80% : 0.000783s : 1: type_inference.specialize ------[replace.] 0.000044 4 77.87% : 0.000035s : 3: replace.inline 22.13% : 0.000010s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000155 4 93.93% : 0.000146s : 3: match.inline 6.07% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000200 883 0.81% : 0.000002s : 9: predicate.accumulaten_eliminater 1.56% : 0.000003s : 3: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 6: predicate.addn_check_dump 1.00% : 0.000002s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 1.96% : 0.000004s : 15: predicate.arithmetic_simplify 0.81% : 0.000002s : 9: predicate.cast_eliminate 0.60% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.17% : 0.000000s : 3: predicate.const_output_eliminate 0.54% : 0.000001s : 6: predicate.depend_value_elim 0.75% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.92% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_depend_swap 1.48% : 0.000003s : 18: predicate.environ_get_eliminate 1.01% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.12% : 0.000002s : 13: predicate.exchange_switch_depend_value 1.95% : 0.000004s : 13: predicate.float_depend_g_call 0.53% : 0.000001s : 6: predicate.float_environ_get_switch 1.18% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.73% : 0.000001s : 6: predicate.get_grad_eliminate 0.29% : 0.000001s : 3: predicate.graph_param_transform 0.50% : 0.000001s : 6: predicate.incorporate_call 0.46% : 0.000001s : 6: predicate.incorporate_call_switch 5.81% : 0.000012s : 40: predicate.inline 0.81% : 0.000002s : 6: predicate.inline_without_move 0.31% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.99% : 0.000002s : 6: predicate.less_batch_normalization 1.99% : 0.000004s : 16: 
predicate.list_to_tuple_eliminator_ 2.34% : 0.000005s : 25: predicate.load_eliminater 1.75% : 0.000004s : 3: predicate.loop_unroll_after_grad 1.79% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.52% : 0.000001s : 6: predicate.merge_addn 0.57% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.57% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 2.53% : 0.000005s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.48% : 0.000003s : 13: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000002s : 9: predicate.print_const_string_wrapper 0.53% : 0.000001s : 6: predicate.reduce_all_const_elim 1.19% : 0.000002s : 9: predicate.reduce_eliminate 2.36% : 0.000005s : 25: predicate.redundant_stop_gradient_eliminater 0.38% : 0.000001s : 6: predicate.remove_not_recompute_node 1.06% : 0.000002s : 16: predicate.replace_applicator 0.58% : 0.000001s : 6: predicate.replace_old_param 0.56% : 0.000001s : 3: predicate.reset_defer_inline 1.02% : 0.000002s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.69% : 0.000001s : 3: predicate.row_tensor_eliminate 0.94% : 0.000002s : 6: predicate.same_eliminate 0.36% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.10% : 0.000002s : 6: predicate.shard_identity_eliminate 1.04% : 0.000002s : 6: predicate.special_op_eliminate 0.67% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.64% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.54% : 0.000003s : 13: predicate.switch_defer_inline 1.75% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.36% : 0.000009s : 43: predicate.switch_simplify 0.78% : 
0.000002s : 9: predicate.tile_eliminate 0.88% : 0.000002s : 9: predicate.transpose_eliminate 1.65% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.43% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.70% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000007s : 22: predicate.tuple_list_get_item_eliminator 1.95% : 0.000004s : 15: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.52% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.13% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.94% : 0.000006s : 31: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 3: predicate.value_based_eliminate 0.70% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000481 8 41.04% : 0.000197s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.96% : 0.000283s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.809263 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.47% : 0.003764s : 1: add_attr 0.46% : 0.003749s : 1: add_attr_with_inline 0.00% : 0.000005s : 1: add_comm_op_reuse_tag 0.01% : 0.000066s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000068s : 1: auto_monad 0.00% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.06% : 0.000471s : 1: bootstrap 0.01% : 0.000046s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000022s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000033s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000024s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000011s : 1: label_micro_interleaved_index 0.09% : 0.000753s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 4.61% : 0.037318s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000023s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000040s : 1: opt.transform.mutable_eliminate 0.13% : 0.001063s : 78: opt.transform.opt_a 0.00% : 0.000034s : 1: opt.transform.opt_after_cconv 0.00% : 0.000039s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000115s : 28: opt.transform.opt_b 0.01% : 0.000058s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000043s : 4: opt.transform.symbol_engine_opt 0.35% : 0.002832s : 1: opt_a 0.02% : 0.000148s : 1: opt_after_cconv 0.10% : 0.000804s : 1: opt_after_jit_grad 0.04% : 0.000295s : 1: opt_b 5.22% : 0.042251s : 1: optimize 0.00% : 0.000032s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000005s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000041s : 1: pre_auto_parallel 0.00% : 0.000035s : 1: py_interpret_to_execute 0.00% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000025s : 1: remove_dup_value 0.05% : 0.000434s : 1: renormalize.infer 0.05% : 0.000381s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000011s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000049s : 1: rewriter_after_opt_a 0.01% : 0.000081s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000094s : 1: symbol_engine_optimizer 82.77% : 0.669854s : 1: task_emit 0.01% : 0.000095s : 1: 
tuple_transform 5.39% : 0.043612s : 1: type_inference 0.01% : 0.000095s : 1: validate TotalTime = 0.940991, [24] [bootstrap]: 0.00049235 [type_inference]: 0.0132793 [event_method]: 7.098e-05 [auto_monad]: 0.00015509 [graph_reusing]: 1.006e-05 [inline]: 2.81999e-06 [add_attr]: 0.00400788, [1] [add_attr_with_inline]: 0.00399673, [1] [Cycle 1]: 9.666e-05, [2] [tag_attr]: 4.339e-05 [meta_addattr_fg_expand]: 1.144e-05 [parallel-infer-symbol]: 3.55e-06 [pre_auto_parallel]: 5.789e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.022249, [53] [py_interpret_to_execute]: 4.257e-05 [rewriter_before_opt_a]: 0.000179 [opt_a]: 0.0195726, [3] [Cycle 1]: 0.0150362, [45] [expand_dump_flag]: 5.53002e-06 [switch_simplify]: 7.978e-05 [loop_unroll]: 6.452e-05 [a_1]: 0.00158249 [with_stream_mark]: 2.851e-05 [recompute_prepare]: 2.544e-05 [updatestate_depend_eliminate]: 9.05999e-06 [updatestate_assign_eliminate]: 7.96001e-06 [updatestate_loads_eliminate]: 7.69002e-06 [parameter_eliminate]: 3.13e-06 [a_2]: 0.00025331 [accelerated_algorithm]: 3.282e-05 [shard]: 2.59999e-06 [meta_shard_fg_expand]: 5.03002e-06 [shard_inline]: 1.758e-05 [merge_send_recv]: 1.796e-05 [auto_parallel]: 1.296e-05 [parallel]: 2.027e-05 [flash_sp]: 1.397e-05 [merge_comm]: 9.86e-06 [allreduce_fusion]: 8.75999e-06 [matmul_add_comm_reduction]: 3.009e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.924e-05 [virtual_dataset]: 1.595e-05 [get_grad_eliminate_]: 1.594e-05 [virtual_output]: 1.53e-05 [merge_forward]: 9.82999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 1.934e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.139e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 2.995e-05 [set_forward_comm_id_for_comm_node_pass]: 1.032e-05 [meta_fg_expand]: 0.00197562 [flash_sp_send_recv_attached]: 4.52998e-06 [receive_attached]: 2.46e-06 [after_resolve]: 
7.535e-05 [a_after_grad]: 9.983e-05 [renormalize]: 0.00931432 [add_forward_monad_depend]: 1.461e-05 [auto_monad_grad]: 8.23001e-06 [auto_monad_eliminator]: 6.114e-05 [cse]: 0.00033255 [a_3]: 0.00037156 [Cycle 2]: 0.00375646, [45] [expand_dump_flag]: 2.99999e-06 [switch_simplify]: 5.16e-05 [loop_unroll]: 4.527e-05 [a_1]: 0.00158624 [with_stream_mark]: 2.193e-05 [recompute_prepare]: 1.606e-05 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 3.38999e-06 [parameter_eliminate]: 2.88998e-06 [a_2]: 9.913e-05 [accelerated_algorithm]: 1.28e-05 [shard]: 2.46e-06 [meta_shard_fg_expand]: 3.14001e-06 [shard_inline]: 7.35998e-06 [merge_send_recv]: 1.129e-05 [auto_parallel]: 1.042e-05 [parallel]: 1.058e-05 [flash_sp]: 4.82998e-06 [merge_comm]: 4.58001e-06 [allreduce_fusion]: 4.27e-06 [matmul_add_comm_reduction]: 1.187e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.322e-05 [virtual_dataset]: 7.83001e-06 [get_grad_eliminate_]: 7.18e-06 [virtual_output]: 7.08e-06 [merge_forward]: 5.54e-06 [cell_reuse_recompute_pass]: 1.74e-06 [offload_activation]: 1.242e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.082e-05 [merge_recompute_call_nodes]: 1.61998e-06 [before_grad]: 1.494e-05 [set_forward_comm_id_for_comm_node_pass]: 5.62999e-06 [meta_fg_expand]: 0.00015516 [flash_sp_send_recv_attached]: 2.34999e-06 [receive_attached]: 2.85998e-06 [after_resolve]: 2.07e-05 [a_after_grad]: 1.177e-05 [renormalize]: 0.00104792 [add_forward_monad_depend]: 6.46e-06 [auto_monad_grad]: 3.02002e-06 [auto_monad_eliminator]: 1.707e-05 [cse]: 3.797e-05 [a_3]: 5.447e-05 [Cycle 3]: 0.00075746, [45] [expand_dump_flag]: 1.71e-06 [switch_simplify]: 8.62998e-06 [loop_unroll]: 6.74999e-06 [a_1]: 0.00016339 [with_stream_mark]: 1.202e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 4.66002e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 
1.49998e-06 [a_2]: 9.036e-05 [accelerated_algorithm]: 1.116e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 2.02001e-06 [shard_inline]: 7.15e-06 [merge_send_recv]: 7.80998e-06 [auto_parallel]: 7.93001e-06 [parallel]: 8.03999e-06 [flash_sp]: 1.35999e-06 [merge_comm]: 4.05e-06 [allreduce_fusion]: 4.11001e-06 [matmul_add_comm_reduction]: 7.41999e-06 [allreduce_slice_to_reducescatter]: 4.59986e-07 [virtual_shard_identity]: 8.63001e-06 [virtual_dataset]: 7.1e-06 [get_grad_eliminate_]: 7.74002e-06 [virtual_output]: 6.85002e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.89999e-06 [offload_activation]: 9.47001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.302e-05 [merge_recompute_call_nodes]: 1.14998e-06 [before_grad]: 1.211e-05 [set_forward_comm_id_for_comm_node_pass]: 4.68999e-06 [meta_fg_expand]: 2.68998e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.64998e-06 [after_resolve]: 1.134e-05 [a_after_grad]: 1.045e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.59e-06 [auto_monad_grad]: 1.04003e-06 [auto_monad_eliminator]: 8.28001e-06 [cse]: 1.999e-05 [a_3]: 4.26e-05 [py_interpret_to_execute_after_opt_a]: 1.6e-05 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 4.819e-05 [convert_after_rewriter]: 7.48e-06 [order_py_execute_after_rewriter]: 5.79e-06 [mutable_eliminate]: 0.00074106 [opt_b]: 0.00025647, [1] [Cycle 1]: 0.00024762, [7] [b_1]: 0.00015653 [b_2]: 9.52001e-06 [updatestate_depend_eliminate]: 6.71e-06 [updatestate_assign_eliminate]: 3.63e-06 [updatestate_loads_eliminate]: 3.41001e-06 [renormalize]: 6.19999e-07 [cse]: 2.658e-05 [optimize_parallel_all_gather_comm]: 2.03e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 2.476e-05 [loop_unroll]: 0.00047771 [opt_after_cconv]: 0.00011811, [1] [Cycle 1]: 0.0001116, [7] [c_1]: 3.522e-05 [parameter_eliminate]: 2.62001e-06 [updatestate_depend_eliminate]: 6.06e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 2.99001e-06 
[cse]: 2.355e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.792e-05 [tuple_transform]: 8.662e-05, [1] [Cycle 1]: 8.136e-05, [4] [d_1]: 5.015e-05 [none_parameter_eliminate]: 2.17999e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 8.37e-06 [partial_unused_args_eliminate]: 1.96998e-06 [add_recomputation]: 5.688e-05 [cse_after_recomputation]: 2.917e-05, [1] [Cycle 1]: 2.386e-05, [1] [cse]: 1.759e-05 [environ_conv]: 1.005e-05 [swap_dp_allreduce_reducescatter]: 6.89999e-06 [bias_add_comm_swap]: 3.04001e-06 [label_micro_interleaved_index]: 5.10999e-06 [label_fine_grained_interleaved_index]: 2.99999e-06 [merge_cast_opt]: 1.64e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.91e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 9.09989e-07 [remove_cast_before_assign_add]: 1.25001e-06 [full_micro_interleaved_order_control]: 2.56998e-06 [reorder_send_recv_between_fp_bp]: 3.11999e-06 [comm_op_add_attrs]: 1.33002e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.35001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.688e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 5.32001e-06 [overlap_recompute_and_grad_model_parallel]: 5.70001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49e-06 [overlap_recompute_allgather_and_fa_grad]: 1.55001e-06 [overlap_recompute_comm]: 2.46e-06 [overlap_grad_ring_attention]: 5.16002e-06 [overlap_grad_flash_sp]: 2.543e-05 [begin_end_overlap_inline]: 6.49976e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.20999e-06 [symbol_engine_optimizer]: 9.966e-05, [1] [Cycle 1]: 9.442e-05, [6] [build]: 1.01e-05 [elim_shapecalc]: 1.364e-05 [elim_not_effective]: 1.627e-05 [opt_reshape]: 8.89e-06 [fold_const_symbol]: 1.22e-05 [renormalize]: 2.79979e-07 [detach_backward]: 2.37999e-06 
[pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 2.19e-05 [get_jit_bprop_graph]: 1.87001e-06 [rewriter_after_jit_bprop_graph]: 4.89003e-06 [opt_after_jit_grad]: 0.00053264 [validate]: 4.911e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.899776 [execute]: 8.69e-06 Sums bootstrap : 0.000492s : 0.05% type_inference : 0.013279s : 1.42% event_method : 0.000071s : 0.01% auto_monad : 0.000155s : 0.02% graph_reusing : 0.000010s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000043s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000058s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000043s : 0.00% optimize.rewriter_before_opt_a : 0.000179s : 0.02% optimize.opt_a.expand_dump_flag : 0.000010s : 0.00% optimize.opt_a.switch_simplify : 0.000140s : 0.01% optimize.opt_a.loop_unroll : 0.000117s : 0.01% optimize.opt_a.a_1 : 0.003332s : 0.36% optimize.opt_a.with_stream_mark : 0.000062s : 0.01% optimize.opt_a.recompute_prepare : 0.000049s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000008s : 0.00% optimize.opt_a.a_2 : 0.000443s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.01% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.00% optimize.opt_a.shard_inline : 0.000032s : 0.00% optimize.opt_a.merge_send_recv : 0.000037s : 0.00% optimize.opt_a.auto_parallel : 0.000031s : 0.00% optimize.opt_a.parallel : 0.000039s : 0.00% optimize.opt_a.flash_sp : 0.000020s : 0.00% optimize.opt_a.merge_comm : 0.000018s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000017s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000049s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.00% optimize.opt_a.virtual_dataset : 0.000031s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000031s : 0.00% optimize.opt_a.virtual_output : 0.000029s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000041s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.002133s : 0.23% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000107s : 0.01% optimize.opt_a.a_after_grad : 0.000122s : 0.01% optimize.opt_a.renormalize : 0.010362s : 1.11% optimize.opt_a.add_forward_monad_depend : 0.000023s : 0.00% optimize.opt_a.auto_monad_grad : 0.000012s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.01% optimize.opt_a.cse : 0.000391s : 0.04% optimize.opt_a.a_3 : 0.000469s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000741s : 0.08% optimize.opt_b.b_1 : 0.000157s : 0.02% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000027s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.00% optimize.loop_unroll : 0.000478s : 0.05% optimize.opt_after_cconv.c_1 : 0.000035s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000024s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000018s : 0.00% optimize.tuple_transform.d_1 : 0.000050s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000018s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000533s : 0.06% validate : 0.000049s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.899776s : 96.20% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000943 161 7.34% : 0.000069s : 8: substitution.arithmetic_simplify 0.29% : 0.000003s : 3: substitution.elim_not_effective 0.61% : 0.000006s : 5: substitution.float_depend_g_call 0.44% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.18% : 0.000002s : 3: substitution.fold_const_symbol 0.78% : 0.000007s : 4: substitution.graph_param_transform 0.39% : 0.000004s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 60.00% : 0.000566s : 17: substitution.inline 2.46% : 0.000023s : 2: substitution.inline_without_move 1.23% : 0.000012s : 15: substitution.j_node_and_user_rematch 1.92% : 0.000018s : 3: substitution.less_batch_normalization 1.48% : 0.000014s : 7: substitution.minmaximum_grad 0.84% : 0.000008s : 5: substitution.partial_eliminate 1.47% : 0.000014s : 15: substitution.remove_not_recompute_node 4.03% : 0.000038s : 10: substitution.replace_applicator 1.28% : 0.000012s : 10: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.78% : 0.000026s : 7: substitution.tuple_list_convert_item_index_to_positive 1.26% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 1.77% : 0.000017s : 7: substitution.tuple_list_get_item_depend_reorder 7.16% : 0.000068s : 19: substitution.tuple_list_get_item_eliminator 1.74% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013177 2 85.59% : 0.011278s : 1: type_inference.infer 14.41% : 0.001899s : 1: type_inference.specialize ------[replace.] 0.000249 27 63.04% : 0.000157s : 17: replace.inline 36.96% : 0.000092s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000589 27 94.14% : 0.000554s : 17: match.inline 5.86% : 0.000035s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 4248 1.14% : 0.000009s : 53: predicate.accumulaten_eliminater 0.23% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.44% : 0.000003s : 21: predicate.addn_check_dump 1.12% : 0.000008s : 53: predicate.addn_zero_filter 1.03% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.17% : 0.000016s : 74: predicate.arithmetic_simplify 1.23% : 0.000009s : 53: predicate.cast_eliminate 1.09% : 0.000008s : 50: predicate.check_bprop_eliminate 0.44% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000004s : 21: predicate.depend_value_elim 1.22% : 0.000009s : 53: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.07% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.34% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.15% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000009s : 57: predicate.environ_add_const_eliminate 1.12% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 57: predicate.environ_get_depend_swap 1.68% : 0.000013s : 78: predicate.environ_get_eliminate 1.18% : 0.000009s : 57: predicate.environ_get_set_eliminate 1.78% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.73% : 0.000021s : 80: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.09% : 0.000001s : 4: predicate.graph_param_transform 0.48% : 0.000004s : 21: predicate.incorporate_call 0.43% : 0.000003s : 21: predicate.incorporate_call_switch 5.70% : 0.000043s : 183: predicate.inline 1.37% : 0.000010s : 45: predicate.inline_without_move 0.26% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 21: predicate.less_batch_normalization 1.57% : 
0.000012s : 71: predicate.list_to_tuple_eliminator_ 2.58% : 0.000019s : 124: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.45% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 61: predicate.make_slice_get_slice_eliminator 0.50% : 0.000004s : 21: predicate.merge_addn 1.06% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 53: predicate.minmaximum_grad 0.33% : 0.000002s : 4: predicate.mutable_eliminate 0.16% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.22% : 0.000017s : 80: predicate.partial_defer_inline 1.70% : 0.000013s : 67: predicate.partial_eliminate 1.12% : 0.000008s : 53: predicate.print_const_string_wrapper 0.49% : 0.000004s : 21: predicate.reduce_all_const_elim 1.47% : 0.000011s : 53: predicate.reduce_eliminate 2.57% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.81% : 0.000014s : 113: predicate.replace_applicator 0.80% : 0.000006s : 45: predicate.replace_old_param 0.07% : 0.000001s : 4: predicate.reset_defer_inline 1.17% : 0.000009s : 53: predicate.reshape_eliminate 1.08% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 4: predicate.row_tensor_eliminate 1.33% : 0.000010s : 50: predicate.same_eliminate 0.40% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 21: predicate.shard_identity_eliminate 0.25% : 0.000002s : 8: predicate.special_op_eliminate 0.58% : 0.000004s : 21: predicate.specialize_transform 1.31% : 0.000010s : 50: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.94% : 0.000015s : 80: predicate.switch_defer_inline 2.93% : 0.000022s : 130: predicate.switch_layer_defer_inline 5.08% : 0.000038s : 218: 
predicate.switch_simplify 1.13% : 0.000009s : 53: predicate.tile_eliminate 1.14% : 0.000009s : 53: predicate.transpose_eliminate 1.46% : 0.000011s : 61: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 61: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.70% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 1.60% : 0.000012s : 61: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 82: predicate.tuple_list_set_item_eliminator 1.66% : 0.000012s : 71: predicate.tuple_to_list_eliminator_ 2.52% : 0.000019s : 124: predicate.updatestate_pure_node_eliminater 3.09% : 0.000023s : 145: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.50% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002093 36 56.00% : 0.001172s : 15: func_graph_cloner_run.FuncGraphClonerGraph 44.00% : 0.000921s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.982763 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.41% : 0.004014s : 1: add_attr 0.41% : 0.004001s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000062s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000164s : 1: auto_monad 0.00% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.05% : 0.000520s : 1: bootstrap 0.00% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000033s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000014s : 1: environ_conv 0.01% : 0.000081s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000015s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.05% : 0.000487s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.08% : 0.000752s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.50% : 0.004951s : 117: opt.transform.opt_a 0.00% : 0.000034s : 1: opt.transform.opt_after_cconv 0.00% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000133s : 28: opt.transform.opt_b 0.01% : 0.000056s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000046s : 4: opt.transform.symbol_engine_opt 1.99% : 0.019577s : 1: opt_a 0.01% : 0.000121s : 1: opt_after_cconv 0.06% : 0.000543s : 1: opt_after_jit_grad 0.03% : 0.000260s : 1: opt_b 2.26% : 0.022254s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000068s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000063s : 1: pre_auto_parallel 0.00% : 0.000047s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000022s : 1: remove_dup_value 0.80% : 0.007873s : 2: renormalize.infer 0.25% : 0.002465s : 2: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000052s : 1: rewriter_after_opt_a 0.02% : 0.000184s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000103s : 1: symbol_engine_optimizer 91.56% : 0.899801s : 1: task_emit 0.01% : 0.000090s : 1: 
tuple_transform 1.35% : 0.013308s : 1: type_inference 0.01% : 0.000077s : 1: validate . TotalTime = 0.677163, [24] [bootstrap]: 0.00046361 [type_inference]: 0.0319628 [event_method]: 1.546e-05 [auto_monad]: 6.992e-05 [graph_reusing]: 6.79999e-06 [inline]: 3.34001e-06 [add_attr]: 0.00414631, [1] [add_attr_with_inline]: 0.00413312, [1] [Cycle 1]: 7.11e-05, [2] [tag_attr]: 2.093e-05 [meta_addattr_fg_expand]: 4.40999e-06 [parallel-infer-symbol]: 4.76002e-06 [pre_auto_parallel]: 3.684e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 9.90025e-07 [dataset_repeat_opt]: 2.39999e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.00502325, [53] [py_interpret_to_execute]: 2.897e-05 [rewriter_before_opt_a]: 6.577e-05 [opt_a]: 0.00268991, [2] [Cycle 1]: 0.00198245, [45] [expand_dump_flag]: 3.23e-06 [switch_simplify]: 3.106e-05 [loop_unroll]: 1.808e-05 [a_1]: 0.0004371 [with_stream_mark]: 2.295e-05 [recompute_prepare]: 9.05999e-06 [updatestate_depend_eliminate]: 4.37e-06 [updatestate_assign_eliminate]: 3.65e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 2.24999e-06 [a_2]: 8.799e-05 [accelerated_algorithm]: 7.68001e-06 [shard]: 2.41998e-06 [meta_shard_fg_expand]: 2.57001e-06 [shard_inline]: 6.56999e-06 [merge_send_recv]: 9.23002e-06 [auto_parallel]: 7.01001e-06 [parallel]: 2.14e-05 [flash_sp]: 9.77999e-06 [merge_comm]: 4.49998e-06 [allreduce_fusion]: 3.6e-06 [matmul_add_comm_reduction]: 1.102e-05 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 8.27e-06 [virtual_dataset]: 6.39999e-06 [get_grad_eliminate_]: 6.61999e-06 [virtual_output]: 6.66999e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 1.167e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.351e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 1.111e-05 [set_forward_comm_id_for_comm_node_pass]: 3.89002e-06 [meta_fg_expand]: 3.41999e-06 [flash_sp_send_recv_attached]: 2.98e-06 [receive_attached]: 2.31e-06 
[after_resolve]: 1.053e-05 [a_after_grad]: 9.67001e-06 [renormalize]: 0.00079207 [add_forward_monad_depend]: 7.71999e-06 [auto_monad_grad]: 2.86e-06 [auto_monad_eliminator]: 1.766e-05 [cse]: 3.428e-05 [a_3]: 4.831e-05 [Cycle 2]: 0.00069486, [45] [expand_dump_flag]: 1.66998e-06 [switch_simplify]: 7.49002e-06 [loop_unroll]: 6.16e-06 [a_1]: 0.00012879 [with_stream_mark]: 1.378e-05 [recompute_prepare]: 6.83e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.08998e-06 [updatestate_loads_eliminate]: 3.40998e-06 [parameter_eliminate]: 1.44e-06 [a_2]: 7.625e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.50002e-06 [merge_send_recv]: 6.37001e-06 [auto_parallel]: 6.51999e-06 [parallel]: 5.52001e-06 [flash_sp]: 3.61999e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.53999e-06 [matmul_add_comm_reduction]: 7.07002e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 6.81001e-06 [virtual_dataset]: 5.77001e-06 [get_grad_eliminate_]: 5.83002e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.35998e-06 [cell_reuse_recompute_pass]: 1.82999e-06 [offload_activation]: 8.65001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.178e-05 [merge_recompute_call_nodes]: 1.20001e-06 [before_grad]: 9.66998e-06 [set_forward_comm_id_for_comm_node_pass]: 4.51002e-06 [meta_fg_expand]: 2.54999e-06 [flash_sp_send_recv_attached]: 1.04e-06 [receive_attached]: 1.71998e-06 [after_resolve]: 1.163e-05 [a_after_grad]: 9.07999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.53002e-06 [auto_monad_grad]: 1.60001e-06 [auto_monad_eliminator]: 8.69e-06 [cse]: 1.734e-05 [a_3]: 3.485e-05 [py_interpret_to_execute_after_opt_a]: 1.166e-05 [slice_cell_reuse_recomputed_activation]: 1.96998e-06 [rewriter_after_opt_a]: 4.064e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00070264 [opt_b]: 0.00020918, [1] [Cycle 1]: 
0.00020066, [7] [b_1]: 0.00011876 [b_2]: 8.46002e-06 [updatestate_depend_eliminate]: 7.40998e-06 [updatestate_assign_eliminate]: 2.76e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.19997e-07 [cse]: 2.233e-05 [optimize_parallel_all_gather_comm]: 1.815e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.973e-05 [loop_unroll]: 0.00049421 [opt_after_cconv]: 0.00011113, [1] [Cycle 1]: 0.00010403, [7] [c_1]: 3.08e-05 [parameter_eliminate]: 3.68e-06 [updatestate_depend_eliminate]: 6.47001e-06 [updatestate_assign_eliminate]: 2.99001e-06 [updatestate_loads_eliminate]: 2.59999e-06 [cse]: 2.054e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.773e-05 [tuple_transform]: 7.745e-05, [1] [Cycle 1]: 7.23e-05, [4] [d_1]: 4.244e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 7.13e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.082e-05 [cse_after_recomputation]: 2.46e-05, [1] [Cycle 1]: 1.959e-05, [1] [cse]: 1.328e-05 [environ_conv]: 6.64001e-06 [swap_dp_allreduce_reducescatter]: 5.71e-06 [bias_add_comm_swap]: 2.42001e-06 [label_micro_interleaved_index]: 5.19e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.68e-06 [assign_add_opt]: 1.87001e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.63e-06 [reorder_send_recv_between_fp_bp]: 2.99001e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 1.35999e-06 [interleave_split_concat_branches]: 1.49e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.62001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.409e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 4.14002e-06 [overlap_recompute_and_grad_model_parallel]: 5.39e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.92001e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 4.94e-06 [overlap_grad_flash_sp]: 2.143e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 1.45001e-06 [symbol_engine_optimizer]: 7.936e-05, [1] [Cycle 1]: 7.408e-05, [6] [build]: 3.88001e-06 [elim_shapecalc]: 9.99999e-06 [elim_not_effective]: 1.332e-05 [opt_reshape]: 6.66e-06 [fold_const_symbol]: 1.063e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.34999e-06 [pipeline_parallel_scheduler]: 1.66e-06 [auto_monad_reorder]: 1.767e-05 [get_jit_bprop_graph]: 1.67999e-06 [rewriter_after_jit_bprop_graph]: 4.28999e-06 [opt_after_jit_grad]: 0.00050586 [validate]: 4.578e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.634564 [execute]: 1.048e-05 Sums bootstrap : 0.000464s : 0.07% type_inference : 0.031963s : 4.76% event_method : 0.000015s : 0.00% auto_monad : 0.000070s : 0.01% graph_reusing : 0.000007s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000005s : 0.00% pre_auto_parallel : 0.000037s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000029s : 0.00% optimize.rewriter_before_opt_a : 0.000066s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.01% optimize.opt_a.loop_unroll : 0.000024s : 0.00% optimize.opt_a.a_1 : 0.000566s : 0.08% optimize.opt_a.with_stream_mark : 0.000037s : 0.01% optimize.opt_a.recompute_prepare : 0.000016s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000164s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000016s : 0.00% optimize.opt_a.auto_parallel : 0.000014s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.000792s : 0.12% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.00% optimize.opt_a.cse : 0.000052s : 0.01% optimize.opt_a.a_3 : 0.000083s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000703s : 0.10% optimize.opt_b.b_1 : 0.000119s : 0.02% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.00% optimize.loop_unroll : 0.000494s : 0.07% optimize.opt_after_cconv.c_1 : 0.000031s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000018s : 0.00% optimize.tuple_transform.d_1 : 0.000042s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000013s : 0.00% optimize.environ_conv : 0.000007s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000506s : 0.08% validate : 0.000046s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.634564s : 94.45% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000198 24 19.68% : 0.000039s : 4: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000002s : 2: substitution.fold_const_symbol 3.24% : 0.000006s : 3: substitution.graph_param_transform 68.18% : 0.000135s : 3: substitution.inline 2.11% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000005s : 4: substitution.remove_not_recompute_node 2.28% : 0.000005s : 2: substitution.replace_old_param ------[type_inference.] 0.031893 2 98.01% : 0.031259s : 1: type_inference.infer 1.99% : 0.000634s : 1: type_inference.specialize ------[replace.] 0.000035 3 100.00% : 0.000035s : 3: replace.inline ------[match.] 0.000133 3 100.00% : 0.000133s : 3: match.inline ------[predicate.] 
0.000171 815 0.85% : 0.000001s : 8: predicate.accumulaten_eliminater 1.16% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.85% : 0.000001s : 8: predicate.addn_zero_filter 0.90% : 0.000002s : 8: predicate.adjust_all_reduce_mul_add 2.61% : 0.000004s : 14: predicate.arithmetic_simplify 1.22% : 0.000002s : 8: predicate.cast_eliminate 0.82% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.74% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000002s : 8: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 11: predicate.environ_add_const_eliminate 0.98% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.36% : 0.000002s : 11: predicate.environ_get_depend_swap 2.02% : 0.000003s : 17: predicate.environ_get_eliminate 1.28% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.10% : 0.000004s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.88% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.62% : 0.000001s : 6: predicate.incorporate_call 0.54% : 0.000001s : 6: predicate.incorporate_call_switch 5.90% : 0.000010s : 37: predicate.inline 1.13% : 0.000002s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.93% : 0.000002s : 6: predicate.less_batch_normalization 1.55% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000004s : 22: predicate.load_eliminater 1.07% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.85% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.97% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 6: predicate.merge_addn 0.70% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 8: predicate.minmaximum_grad 1.41% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.65% : 0.000003s : 11: predicate.partial_defer_inline 1.14% : 0.000002s : 11: predicate.partial_eliminate 0.83% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.30% : 0.000002s : 8: predicate.reduce_eliminate 2.07% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 6: predicate.remove_not_recompute_node 1.13% : 0.000002s : 14: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.41% : 0.000001s : 3: predicate.reset_defer_inline 1.01% : 0.000002s : 8: predicate.reshape_eliminate 0.69% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 6: predicate.shard_identity_eliminate 0.91% : 0.000002s : 6: predicate.special_op_eliminate 0.80% : 0.000001s : 6: predicate.specialize_transform 1.13% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.13% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.39% : 0.000007s : 38: predicate.switch_simplify 0.84% : 
0.000001s : 8: predicate.tile_eliminate 1.00% : 0.000002s : 8: predicate.transpose_eliminate 1.80% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 2.06% : 0.000004s : 14: predicate.tuple_list_get_item_const_eliminator 1.65% : 0.000003s : 14: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.63% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.54% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 1.96% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.83% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.72% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000447 7 35.15% : 0.000157s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.85% : 0.000290s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.688233 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.60% : 0.004153s : 1: add_attr 0.60% : 0.004137s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000055s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000076s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000496s : 1: bootstrap 0.00% : 0.000033s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000018s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000010s : 1: environ_conv 0.00% : 0.000023s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.07% : 0.000505s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.10% : 0.000713s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000016s : 1: opt.transform.mutable_eliminate 0.14% : 0.000961s : 78: opt.transform.opt_a 0.00% : 0.000029s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000095s : 28: opt.transform.opt_b 0.01% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000037s : 4: opt.transform.symbol_engine_opt 0.39% : 0.002694s : 1: opt_a 0.02% : 0.000115s : 1: opt_after_cconv 0.08% : 0.000517s : 1: opt_after_jit_grad 0.03% : 0.000212s : 1: opt_b 0.73% : 0.005029s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000005s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000042s : 1: pre_auto_parallel 0.00% : 0.000033s : 1: py_interpret_to_execute 0.00% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000021s : 1: remove_dup_value 0.06% : 0.000434s : 1: renormalize.infer 0.05% : 0.000349s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000045s : 1: rewriter_after_opt_a 0.01% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000082s : 1: symbol_engine_optimizer 92.21% : 0.634592s : 1: task_emit 0.01% : 0.000081s : 1: 
tuple_transform 4.65% : 0.031993s : 1: type_inference 0.01% : 0.000078s : 1: validate TotalTime = 0.632701, [24] [bootstrap]: 0.0005183 [type_inference]: 0.0380655 [event_method]: 5.759e-05 [auto_monad]: 0.00013892 [graph_reusing]: 9.25999e-06 [inline]: 2.68998e-06 [add_attr]: 0.00385844, [1] [add_attr_with_inline]: 0.00384644, [1] [Cycle 1]: 9.016e-05, [2] [tag_attr]: 3.923e-05 [meta_addattr_fg_expand]: 1.059e-05 [parallel-infer-symbol]: 3.63e-06 [pre_auto_parallel]: 5.641e-05 [insert-virtual-dataset]: 3.32002e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.34999e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.0228442, [53] [py_interpret_to_execute]: 4.196e-05 [rewriter_before_opt_a]: 0.00016879 [opt_a]: 0.0199456, [3] [Cycle 1]: 0.0151485, [45] [expand_dump_flag]: 6.24999e-06 [switch_simplify]: 7.651e-05 [loop_unroll]: 6.196e-05 [a_1]: 0.00150637 [with_stream_mark]: 3.211e-05 [recompute_prepare]: 2.687e-05 [updatestate_depend_eliminate]: 9.62001e-06 [updatestate_assign_eliminate]: 7.94002e-06 [updatestate_loads_eliminate]: 7.6e-06 [parameter_eliminate]: 3.33998e-06 [a_2]: 0.00028847 [accelerated_algorithm]: 3.944e-05 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 5.19998e-06 [shard_inline]: 1.851e-05 [merge_send_recv]: 1.947e-05 [auto_parallel]: 1.546e-05 [parallel]: 2.342e-05 [flash_sp]: 1.436e-05 [merge_comm]: 9.68997e-06 [allreduce_fusion]: 9.42999e-06 [matmul_add_comm_reduction]: 3.33e-05 [allreduce_slice_to_reducescatter]: 1.07e-06 [virtual_shard_identity]: 2.688e-05 [virtual_dataset]: 1.824e-05 [get_grad_eliminate_]: 1.598e-05 [virtual_output]: 1.574e-05 [merge_forward]: 1.114e-05 [cell_reuse_recompute_pass]: 1.86e-06 [offload_activation]: 2.123e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.899e-05 [merge_recompute_call_nodes]: 2.11e-06 [before_grad]: 3.479e-05 [set_forward_comm_id_for_comm_node_pass]: 1.237e-05 [meta_fg_expand]: 0.00211246 [flash_sp_send_recv_attached]: 5.30999e-06 [receive_attached]: 2.90998e-06 [after_resolve]: 
8.091e-05 [a_after_grad]: 0.00010269 [renormalize]: 0.00930004 [add_forward_monad_depend]: 1.871e-05 [auto_monad_grad]: 7.11001e-06 [auto_monad_eliminator]: 6.459e-05 [cse]: 0.00023645 [a_3]: 0.00036969 [Cycle 2]: 0.00392956, [45] [expand_dump_flag]: 3.62998e-06 [switch_simplify]: 5.115e-05 [loop_unroll]: 4.509e-05 [a_1]: 0.00185381 [with_stream_mark]: 2.48e-05 [recompute_prepare]: 1.58e-05 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 3.89002e-06 [updatestate_loads_eliminate]: 3.42002e-06 [parameter_eliminate]: 2.62001e-06 [a_2]: 9.942e-05 [accelerated_algorithm]: 1.354e-05 [shard]: 2.36998e-06 [meta_shard_fg_expand]: 2.93e-06 [shard_inline]: 7.15e-06 [merge_send_recv]: 1.019e-05 [auto_parallel]: 1.087e-05 [parallel]: 1.103e-05 [flash_sp]: 6.06e-06 [merge_comm]: 4.13001e-06 [allreduce_fusion]: 5.40999e-06 [matmul_add_comm_reduction]: 1.166e-05 [allreduce_slice_to_reducescatter]: 8.10018e-07 [virtual_shard_identity]: 1.071e-05 [virtual_dataset]: 7.26001e-06 [get_grad_eliminate_]: 6.83e-06 [virtual_output]: 6.97002e-06 [merge_forward]: 5.64e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.306e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.875e-05 [merge_recompute_call_nodes]: 1.72999e-06 [before_grad]: 1.346e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 9.913e-05 [flash_sp_send_recv_attached]: 1.96998e-06 [receive_attached]: 2.67001e-06 [after_resolve]: 2.048e-05 [a_after_grad]: 1.158e-05 [renormalize]: 0.0010513 [add_forward_monad_depend]: 8.2e-06 [auto_monad_grad]: 2.54001e-06 [auto_monad_eliminator]: 1.917e-05 [cse]: 3.823e-05 [a_3]: 5.998e-05 [Cycle 3]: 0.00084583, [45] [expand_dump_flag]: 2.21e-06 [switch_simplify]: 9.89999e-06 [loop_unroll]: 7.83001e-06 [a_1]: 0.00016882 [with_stream_mark]: 1.512e-05 [recompute_prepare]: 8.25e-06 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 3.60998e-06 [updatestate_loads_eliminate]: 3.21001e-06 [parameter_eliminate]: 
2.76999e-06 [a_2]: 9.185e-05 [accelerated_algorithm]: 1.481e-05 [shard]: 2.04e-06 [meta_shard_fg_expand]: 2.34001e-06 [shard_inline]: 8.54998e-06 [merge_send_recv]: 9.10001e-06 [auto_parallel]: 9.54999e-06 [parallel]: 9.55001e-06 [flash_sp]: 1.35001e-06 [merge_comm]: 4.16001e-06 [allreduce_fusion]: 4e-06 [matmul_add_comm_reduction]: 9.39998e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.076e-05 [virtual_dataset]: 7.65e-06 [get_grad_eliminate_]: 6.76e-06 [virtual_output]: 6.88e-06 [merge_forward]: 5.54e-06 [cell_reuse_recompute_pass]: 2.61e-06 [offload_activation]: 1.085e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.739e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 1.436e-05 [set_forward_comm_id_for_comm_node_pass]: 5.69999e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 1.27e-06 [receive_attached]: 1.72999e-06 [after_resolve]: 1.392e-05 [a_after_grad]: 1.004e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 3.58999e-06 [auto_monad_grad]: 2.37001e-06 [auto_monad_eliminator]: 1.249e-05 [cse]: 2.386e-05 [a_3]: 4.153e-05 [py_interpret_to_execute_after_opt_a]: 2.027e-05 [slice_cell_reuse_recomputed_activation]: 2.56998e-06 [rewriter_after_opt_a]: 5.508e-05 [convert_after_rewriter]: 8.18999e-06 [order_py_execute_after_rewriter]: 5.74e-06 [mutable_eliminate]: 0.00079496 [opt_b]: 0.00027856, [1] [Cycle 1]: 0.00026789, [7] [b_1]: 0.00014429 [b_2]: 1.932e-05 [updatestate_depend_eliminate]: 1.088e-05 [updatestate_assign_eliminate]: 4.27998e-06 [updatestate_loads_eliminate]: 4.39998e-06 [renormalize]: 1.04e-06 [cse]: 3.821e-05 [optimize_parallel_all_gather_comm]: 2.589e-05 [overlap_param_gather]: 2.56e-06 [cconv]: 3.474e-05 [loop_unroll]: 0.00057736 [opt_after_cconv]: 0.00013542, [1] [Cycle 1]: 0.00012759, [7] [c_1]: 3.494e-05 [parameter_eliminate]: 6.45997e-06 [updatestate_depend_eliminate]: 7.92998e-06 [updatestate_assign_eliminate]: 3.66999e-06 [updatestate_loads_eliminate]: 3.22002e-06 [cse]: 
3.079e-05 [renormalize]: 9.00007e-07 [remove_dup_value]: 1.954e-05 [tuple_transform]: 9.256e-05, [1] [Cycle 1]: 8.667e-05, [4] [d_1]: 5.459e-05 [none_parameter_eliminate]: 2.04999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 8.10999e-06 [partial_unused_args_eliminate]: 2.17001e-06 [add_recomputation]: 6.592e-05 [cse_after_recomputation]: 3.179e-05, [1] [Cycle 1]: 2.607e-05, [1] [cse]: 1.943e-05 [environ_conv]: 1.214e-05 [swap_dp_allreduce_reducescatter]: 6.75998e-06 [bias_add_comm_swap]: 2.88998e-06 [label_micro_interleaved_index]: 6.74001e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.74999e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 9.90025e-07 [remove_cast_before_assign_add]: 1.50001e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.47001e-06 [add_comm_op_reuse_tag]: 1.20001e-06 [interleave_split_concat_branches]: 1.55999e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.35999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.93997e-06 [control_data_broadcast_order]: 2.052e-05 [grouped_pairwise_exchange_alltoall]: 2.21e-06 [offloading_packed_experts]: 6.19001e-06 [overlap_recompute_and_grad_model_parallel]: 6.04999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.74001e-06 [overlap_grad_ring_attention]: 5.31998e-06 [overlap_grad_flash_sp]: 2.687e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.43e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 1.40999e-06 [symbol_engine_optimizer]: 0.00011261, [1] [Cycle 1]: 0.00010652, [6] [build]: 1.289e-05 [elim_shapecalc]: 1.602e-05 [elim_not_effective]: 1.813e-05 [opt_reshape]: 9.04e-06 [fold_const_symbol]: 1.273e-05 [renormalize]: 5.00004e-07 [detach_backward]: 2.54999e-06 
[pipeline_parallel_scheduler]: 2.11e-06 [auto_monad_reorder]: 2.65e-05 [get_jit_bprop_graph]: 2.04e-06 [rewriter_after_jit_bprop_graph]: 6.67002e-06 [opt_after_jit_grad]: 0.00066346 [validate]: 6.204e-05 [backend_pass]: 1.27999e-06 [task_emit]: 0.566075 [execute]: 1.147e-05 Sums bootstrap : 0.000518s : 0.08% type_inference : 0.038065s : 6.07% event_method : 0.000058s : 0.01% auto_monad : 0.000139s : 0.02% graph_reusing : 0.000009s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000039s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000056s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000042s : 0.01% optimize.rewriter_before_opt_a : 0.000169s : 0.03% optimize.opt_a.expand_dump_flag : 0.000012s : 0.00% optimize.opt_a.switch_simplify : 0.000138s : 0.02% optimize.opt_a.loop_unroll : 0.000115s : 0.02% optimize.opt_a.a_1 : 0.003529s : 0.56% optimize.opt_a.with_stream_mark : 0.000072s : 0.01% optimize.opt_a.recompute_prepare : 0.000051s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000009s : 0.00% optimize.opt_a.a_2 : 0.000480s : 0.08% optimize.opt_a.accelerated_algorithm : 0.000068s : 0.01% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.01% optimize.opt_a.merge_send_recv : 0.000039s : 0.01% optimize.opt_a.auto_parallel : 0.000036s : 0.01% optimize.opt_a.parallel : 0.000044s : 0.01% optimize.opt_a.flash_sp : 0.000022s : 0.00% optimize.opt_a.merge_comm : 0.000018s : 0.00% optimize.opt_a.allreduce_fusion 
: 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000054s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000048s : 0.01% optimize.opt_a.virtual_dataset : 0.000033s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.00% optimize.opt_a.virtual_output : 0.000030s : 0.00% optimize.opt_a.merge_forward : 0.000022s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000045s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000075s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000063s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.00% optimize.opt_a.meta_fg_expand : 0.002215s : 0.35% optimize.opt_a.flash_sp_send_recv_attached : 0.000009s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000115s : 0.02% optimize.opt_a.a_after_grad : 0.000124s : 0.02% optimize.opt_a.renormalize : 0.010351s : 1.65% optimize.opt_a.add_forward_monad_depend : 0.000030s : 0.00% optimize.opt_a.auto_monad_grad : 0.000012s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000096s : 0.02% optimize.opt_a.cse : 0.000299s : 0.05% optimize.opt_a.a_3 : 0.000471s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000055s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000795s : 0.13% optimize.opt_b.b_1 : 0.000144s : 0.02% optimize.opt_b.b_2 : 0.000019s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 
0.00% optimize.opt_b.cse : 0.000038s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.00% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000035s : 0.01% optimize.loop_unroll : 0.000577s : 0.09% optimize.opt_after_cconv.c_1 : 0.000035s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000020s : 0.00% optimize.tuple_transform.d_1 : 0.000055s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000066s : 0.01% optimize.cse_after_recomputation.cse : 0.000019s : 0.00% optimize.environ_conv : 0.000012s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000007s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000021s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000027s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000001s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.00% opt_after_jit_grad : 0.000663s : 0.11% validate : 0.000062s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.566075s : 90.27% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 
0.001036 159 6.93% : 0.000072s : 7: substitution.arithmetic_simplify 0.24% : 0.000002s : 3: substitution.elim_not_effective 0.53% : 0.000005s : 5: substitution.float_depend_g_call 0.43% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.20% : 0.000002s : 3: substitution.fold_const_symbol 0.72% : 0.000007s : 4: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000003s : 2: substitution.incorporate_call_switch 62.12% : 0.000644s : 17: substitution.inline 2.53% : 0.000026s : 2: substitution.inline_without_move 1.37% : 0.000014s : 15: substitution.j_node_and_user_rematch 2.04% : 0.000021s : 3: substitution.less_batch_normalization 1.38% : 0.000014s : 7: substitution.minmaximum_grad 0.70% : 0.000007s : 5: substitution.partial_eliminate 1.38% : 0.000014s : 15: substitution.remove_not_recompute_node 3.63% : 0.000038s : 10: substitution.replace_applicator 1.29% : 0.000013s : 10: substitution.replace_old_param 0.39% : 0.000004s : 1: substitution.set_cell_output_no_recompute 2.59% : 0.000027s : 7: substitution.tuple_list_convert_item_index_to_positive 1.16% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 1.63% : 0.000017s : 7: substitution.tuple_list_get_item_depend_reorder 6.54% : 0.000068s : 18: substitution.tuple_list_get_item_eliminator 1.59% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.037971 2 95.17% : 0.036137s : 1: type_inference.infer 4.83% : 0.001834s : 1: type_inference.specialize ------[replace.] 0.000247 26 66.79% : 0.000165s : 17: replace.inline 33.21% : 0.000082s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000666 26 94.76% : 0.000631s : 17: match.inline 5.24% : 0.000035s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 4180 1.14% : 0.000009s : 52: predicate.accumulaten_eliminater 0.50% : 0.000004s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.11% : 0.000008s : 52: predicate.addn_zero_filter 1.02% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 73: predicate.arithmetic_simplify 1.10% : 0.000008s : 52: predicate.cast_eliminate 1.06% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.45% : 0.000003s : 21: predicate.depend_value_elim 1.11% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 52: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 4: predicate.elim_not_effective 0.14% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 56: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 56: predicate.environ_get_add_eliminate 1.12% : 0.000008s : 56: predicate.environ_get_depend_swap 1.56% : 0.000012s : 77: predicate.environ_get_eliminate 1.10% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.67% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.77% : 0.000021s : 78: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.55% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000004s : 21: predicate.get_grad_eliminate 0.09% : 0.000001s : 4: predicate.graph_param_transform 0.48% : 0.000004s : 21: predicate.incorporate_call 0.42% : 0.000003s : 21: predicate.incorporate_call_switch 5.82% : 0.000044s : 180: predicate.inline 1.49% : 0.000011s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 21: predicate.less_batch_normalization 1.47% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.46% : 0.000019s : 121: predicate.load_eliminater 0.40% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.45% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 60: predicate.make_slice_get_slice_eliminator 0.50% : 0.000004s : 21: predicate.merge_addn 1.07% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 50: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 52: predicate.minmaximum_grad 0.44% : 0.000003s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.14% : 0.000001s : 4: predicate.parallel_virtual_node 2.29% : 0.000017s : 78: predicate.partial_defer_inline 1.71% : 0.000013s : 65: predicate.partial_eliminate 1.08% : 0.000008s : 52: predicate.print_const_string_wrapper 0.56% : 0.000004s : 21: predicate.reduce_all_const_elim 1.36% : 0.000010s : 52: predicate.reduce_eliminate 2.43% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000004s : 21: predicate.remove_not_recompute_node 1.87% : 0.000014s : 111: predicate.replace_applicator 0.80% : 0.000006s : 45: predicate.replace_old_param 0.14% : 0.000001s : 4: predicate.reset_defer_inline 1.14% : 0.000009s : 52: predicate.reshape_eliminate 1.08% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.27% : 0.000010s : 50: predicate.same_eliminate 0.38% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.69% : 0.000005s : 21: predicate.shard_identity_eliminate 0.23% : 0.000002s : 8: predicate.special_op_eliminate 0.67% : 0.000005s : 21: predicate.specialize_transform 1.45% : 0.000011s : 50: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 78: predicate.switch_defer_inline 2.88% : 0.000022s : 128: predicate.switch_layer_defer_inline 5.02% : 0.000038s : 213: 
predicate.switch_simplify 1.04% : 0.000008s : 52: predicate.tile_eliminate 1.10% : 0.000008s : 52: predicate.transpose_eliminate 1.47% : 0.000011s : 60: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 60: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000022s : 90: predicate.tuple_list_get_item_eliminator 1.62% : 0.000012s : 60: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 81: predicate.tuple_list_set_item_eliminator 1.45% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.40% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.22% : 0.000024s : 142: predicate.updatestate_useless_node_eliminater 0.19% : 0.000001s : 4: predicate.value_based_eliminate 0.67% : 0.000005s : 21: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.13% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002504 35 59.25% : 0.001484s : 14: func_graph_cloner_run.FuncGraphClonerGraph 40.75% : 0.001020s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.675181 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.57% : 0.003865s : 1: add_attr 0.57% : 0.003851s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000071s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000147s : 1: auto_monad 0.00% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.08% : 0.000549s : 1: bootstrap 0.01% : 0.000040s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.00% : 0.000025s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.01% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000016s : 1: environ_conv 0.01% : 0.000067s : 1: event_method 0.00% : 0.000020s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000010s : 1: label_micro_interleaved_index 0.09% : 0.000591s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.12% : 0.000810s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000021s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000026s : 1: opt.transform.mutable_eliminate 0.77% : 0.005216s : 117: opt.transform.opt_a 0.00% : 0.000033s : 1: opt.transform.opt_after_cconv 0.01% : 0.000039s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000130s : 28: opt.transform.opt_b 0.01% : 0.000060s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000051s : 4: opt.transform.symbol_engine_opt 2.95% : 0.019949s : 1: opt_a 0.02% : 0.000140s : 1: opt_after_cconv 0.10% : 0.000684s : 1: opt_after_jit_grad 0.04% : 0.000283s : 1: opt_b 3.38% : 0.022850s : 1: optimize 0.00% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000032s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000062s : 1: pre_auto_parallel 0.01% : 0.000046s : 1: py_interpret_to_execute 0.00% : 0.000025s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000024s : 1: remove_dup_value 1.18% : 0.007951s : 2: renormalize.infer 0.35% : 0.002374s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000064s : 1: rewriter_after_opt_a 0.03% : 0.000173s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000115s : 1: symbol_engine_optimizer 83.84% : 0.566102s : 1: task_emit 0.01% : 0.000096s : 1: 
tuple_transform 5.64% : 0.038096s : 1: type_inference 0.01% : 0.000101s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x0-ge],max_mem:4.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x1-pynative],max_mem:4.0M TotalTime = 0.0751205, [24] [bootstrap]: 0.00070386 [type_inference]: 0.0284738 [event_method]: 2.055e-05 [auto_monad]: 7.493e-05 [graph_reusing]: 6.58e-06 [inline]: 3.44001e-06 [add_attr]: 0.0175518, [1] [add_attr_with_inline]: 0.0175345, [1] [Cycle 1]: 8.604e-05, [2] [tag_attr]: 2.403e-05 [meta_addattr_fg_expand]: 5.12e-06 [parallel-infer-symbol]: 4.47e-06 [pre_auto_parallel]: 3.922e-05 [insert-virtual-dataset]: 3.14001e-06 [parallel-infer-symbol-second]: 9.5999e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 2.12999e-06 [optimize]: 0.00640591, [53] [py_interpret_to_execute]: 3.357e-05 [rewriter_before_opt_a]: 8.227e-05 [opt_a]: 0.00334665, [2] [Cycle 1]: 0.00248976, [45] [expand_dump_flag]: 3.16001e-06 [switch_simplify]: 3.719e-05 [loop_unroll]: 2.218e-05 [a_1]: 0.00054039 [with_stream_mark]: 4.795e-05 [recompute_prepare]: 1.707e-05 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 3.55998e-06 [parameter_eliminate]: 1.94e-06 [a_2]: 8.86e-05 [accelerated_algorithm]: 8.08001e-06 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 2.41e-06 [shard_inline]: 7.51001e-06 [merge_send_recv]: 1.006e-05 [auto_parallel]: 9.23002e-06 [parallel]: 3.246e-05 [flash_sp]: 1.152e-05 [merge_comm]: 5.07999e-06 [allreduce_fusion]: 4.22e-06 [matmul_add_comm_reduction]: 1.199e-05 [allreduce_slice_to_reducescatter]: 7.99977e-07 [virtual_shard_identity]: 1.35e-05 [virtual_dataset]: 7.38999e-06 [get_grad_eliminate_]: 6.58e-06 [virtual_output]: 
6.69999e-06 [merge_forward]: 5.72001e-06 [cell_reuse_recompute_pass]: 1.72001e-06 [offload_activation]: 1.218e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.925e-05 [merge_recompute_call_nodes]: 2.26998e-06 [before_grad]: 1.295e-05 [set_forward_comm_id_for_comm_node_pass]: 5.09e-06 [meta_fg_expand]: 3.51999e-06 [flash_sp_send_recv_attached]: 3.21001e-06 [receive_attached]: 3.10998e-06 [after_resolve]: 1.427e-05 [a_after_grad]: 1.102e-05 [renormalize]: 0.00100884 [add_forward_monad_depend]: 1.47e-05 [auto_monad_grad]: 3.31999e-06 [auto_monad_eliminator]: 2.773e-05 [cse]: 3.609e-05 [a_3]: 6.159e-05 [Cycle 2]: 0.00084164, [45] [expand_dump_flag]: 3.33e-06 [switch_simplify]: 9.51003e-06 [loop_unroll]: 6.56e-06 [a_1]: 0.00013518 [with_stream_mark]: 2.051e-05 [recompute_prepare]: 9.29998e-06 [updatestate_depend_eliminate]: 4.37e-06 [updatestate_assign_eliminate]: 3.41001e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 7.848e-05 [accelerated_algorithm]: 8.39002e-06 [shard]: 2.32999e-06 [meta_shard_fg_expand]: 2.04e-06 [shard_inline]: 6.98e-06 [merge_send_recv]: 9.14e-06 [auto_parallel]: 1.024e-05 [parallel]: 9.81e-06 [flash_sp]: 4.56002e-06 [merge_comm]: 4.63999e-06 [allreduce_fusion]: 3.55e-06 [matmul_add_comm_reduction]: 1.188e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.09e-05 [virtual_dataset]: 6.69001e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 6.12999e-06 [merge_forward]: 6.31998e-06 [cell_reuse_recompute_pass]: 2.98e-06 [offload_activation]: 1.201e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.931e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 1.248e-05 [set_forward_comm_id_for_comm_node_pass]: 7.15e-06 [meta_fg_expand]: 2.88e-06 [flash_sp_send_recv_attached]: 2.06003e-06 [receive_attached]: 2.48e-06 [after_resolve]: 1.467e-05 [a_after_grad]: 9.28997e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 5.64998e-06 [auto_monad_grad]: 2.86e-06 
[auto_monad_eliminator]: 1.447e-05 [cse]: 2.806e-05 [a_3]: 4.254e-05 [py_interpret_to_execute_after_opt_a]: 1.726e-05 [slice_cell_reuse_recomputed_activation]: 2.63e-06 [rewriter_after_opt_a]: 5.649e-05 [convert_after_rewriter]: 8.69e-06 [order_py_execute_after_rewriter]: 7.03998e-06 [mutable_eliminate]: 0.000821 [opt_b]: 0.00024903, [1] [Cycle 1]: 0.00023778, [7] [b_1]: 0.00012344 [b_2]: 9.20001e-06 [updatestate_depend_eliminate]: 1.283e-05 [updatestate_assign_eliminate]: 3.66001e-06 [updatestate_loads_eliminate]: 3.35e-06 [renormalize]: 9.5999e-07 [cse]: 3.631e-05 [optimize_parallel_all_gather_comm]: 2.632e-05 [overlap_param_gather]: 2.94001e-06 [cconv]: 4.061e-05 [loop_unroll]: 0.00078934 [opt_after_cconv]: 0.00014537, [1] [Cycle 1]: 0.0001352, [7] [c_1]: 3.308e-05 [parameter_eliminate]: 6.50997e-06 [updatestate_depend_eliminate]: 1.177e-05 [updatestate_assign_eliminate]: 3.76999e-06 [updatestate_loads_eliminate]: 3.03e-06 [cse]: 3.518e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 2.096e-05 [tuple_transform]: 9.198e-05, [1] [Cycle 1]: 8.605e-05, [4] [d_1]: 5.378e-05 [none_parameter_eliminate]: 1.92999e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 7.4e-06 [partial_unused_args_eliminate]: 2.21e-06 [add_recomputation]: 7.126e-05 [cse_after_recomputation]: 3.093e-05, [1] [Cycle 1]: 2.42e-05, [1] [cse]: 1.663e-05 [environ_conv]: 1.116e-05 [swap_dp_allreduce_reducescatter]: 6.57002e-06 [bias_add_comm_swap]: 3.7e-06 [label_micro_interleaved_index]: 7.68001e-06 [label_fine_grained_interleaved_index]: 3.09001e-06 [merge_cast_opt]: 1.36002e-06 [slice_recompute_activation]: 2.73e-06 [micro_interleaved_order_control]: 2.84001e-06 [assign_add_opt]: 2.34999e-06 [ForceFp32Comm]: 9.60019e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.66999e-06 [reorder_send_recv_between_fp_bp]: 3.03998e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.50999e-06 
[interleave_parallel_branches]: 1.20001e-06 [overlap_opt_shard_in_pipeline]: 1.84e-06 [overlap_opt_shard_grad_in_pipeline]: 2.02001e-06 [control_data_broadcast_order]: 2.285e-05 [grouped_pairwise_exchange_alltoall]: 1.84e-06 [offloading_packed_experts]: 5.20999e-06 [overlap_recompute_and_grad_model_parallel]: 6.74999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.64e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 6.38e-06 [overlap_grad_flash_sp]: 2.799e-05 [begin_end_overlap_inline]: 6.89994e-07 [split_matmul_comm_elemetwise]: 2.91e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 0.00011528, [1] [Cycle 1]: 0.0001073, [6] [build]: 5.49998e-06 [elim_shapecalc]: 1.912e-05 [elim_not_effective]: 1.743e-05 [opt_reshape]: 8.43001e-06 [fold_const_symbol]: 1.076e-05 [renormalize]: 1.80007e-07 [detach_backward]: 2.85002e-06 [pipeline_parallel_scheduler]: 1.99e-06 [auto_monad_reorder]: 2.558e-05 [get_jit_bprop_graph]: 2.81e-06 [rewriter_after_jit_bprop_graph]: 0.00020792 [opt_after_jit_grad]: 0.00072525 [validate]: 5.448e-05 [backend_pass]: 1.32e-06 [task_emit]: 0.0204879 [execute]: 1.183e-05 Sums bootstrap : 0.000704s : 1.25% type_inference : 0.028474s : 50.70% event_method : 0.000021s : 0.04% auto_monad : 0.000075s : 0.13% graph_reusing : 0.000007s : 0.01% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000024s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000039s : 0.07% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000034s : 0.06% optimize.rewriter_before_opt_a : 0.000082s : 0.15% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000047s : 
0.08% optimize.opt_a.loop_unroll : 0.000029s : 0.05% optimize.opt_a.a_1 : 0.000676s : 1.20% optimize.opt_a.with_stream_mark : 0.000068s : 0.12% optimize.opt_a.recompute_prepare : 0.000026s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000010s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000167s : 0.30% optimize.opt_a.accelerated_algorithm : 0.000016s : 0.03% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000014s : 0.03% optimize.opt_a.merge_send_recv : 0.000019s : 0.03% optimize.opt_a.auto_parallel : 0.000019s : 0.03% optimize.opt_a.parallel : 0.000042s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.03% optimize.opt_a.merge_comm : 0.000010s : 0.02% optimize.opt_a.allreduce_fusion : 0.000008s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000024s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000024s : 0.04% optimize.opt_a.virtual_dataset : 0.000014s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000013s : 0.02% optimize.opt_a.merge_forward : 0.000012s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000024s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000039s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000025s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000012s : 0.02% optimize.opt_a.meta_fg_expand : 0.000006s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000029s : 0.05% optimize.opt_a.a_after_grad : 0.000020s : 0.04% 
optimize.opt_a.renormalize : 0.001009s : 1.80% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.04% optimize.opt_a.auto_monad_grad : 0.000006s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000042s : 0.08% optimize.opt_a.cse : 0.000064s : 0.11% optimize.opt_a.a_3 : 0.000104s : 0.19% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000056s : 0.10% optimize.convert_after_rewriter : 0.000009s : 0.02% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000821s : 1.46% optimize.opt_b.b_1 : 0.000123s : 0.22% optimize.opt_b.b_2 : 0.000009s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000013s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000036s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.05% optimize.overlap_param_gather : 0.000003s : 0.01% optimize.cconv : 0.000041s : 0.07% optimize.loop_unroll : 0.000789s : 1.41% optimize.opt_after_cconv.c_1 : 0.000033s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000012s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000035s : 0.06% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000021s : 0.04% optimize.tuple_transform.d_1 : 0.000054s : 0.10% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 
0.000071s : 0.13% optimize.cse_after_recomputation.cse : 0.000017s : 0.03% optimize.environ_conv : 0.000011s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000004s : 0.01% optimize.label_micro_interleaved_index : 0.000008s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000023s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000028s : 0.05% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000019s 
: 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.05% get_jit_bprop_graph : 0.000003s : 0.01% rewriter_after_jit_bprop_graph : 0.000208s : 0.37% opt_after_jit_grad : 0.000725s : 1.29% validate : 0.000054s : 0.10% backend_pass : 0.000001s : 0.00% task_emit : 0.020488s : 36.48% execute : 0.000012s : 0.02% Time group info: ------[substitution.] 0.000249 26 19.21% : 0.000048s : 5: substitution.arithmetic_simplify 0.99% : 0.000002s : 2: substitution.elim_not_effective 0.62% : 0.000002s : 2: substitution.fold_const_symbol 2.76% : 0.000007s : 3: substitution.graph_param_transform 65.22% : 0.000162s : 3: substitution.inline 1.89% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.42% : 0.000006s : 4: substitution.remove_not_recompute_node 2.92% : 0.000007s : 2: substitution.replace_old_param 3.98% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.028376 2 95.85% : 0.027198s : 1: type_inference.infer 4.15% : 0.001178s : 1: type_inference.specialize ------[replace.] 0.000050 4 79.61% : 0.000039s : 3: replace.inline 20.39% : 0.000010s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000169 4 94.67% : 0.000160s : 3: match.inline 5.33% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000195 883 0.82% : 0.000002s : 9: predicate.accumulaten_eliminater 1.39% : 0.000003s : 3: predicate.ad_related_special_op_eliminate 0.47% : 0.000001s : 6: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.04% : 0.000004s : 15: predicate.arithmetic_simplify 0.80% : 0.000002s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.49% : 0.000001s : 6: predicate.compare_switch_simplify 0.24% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.77% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.78% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.52% : 0.000001s : 3: predicate.elim_not_effective 1.14% : 0.000002s : 3: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 12: predicate.environ_add_const_eliminate 0.89% : 0.000002s : 12: predicate.environ_get_add_eliminate 0.94% : 0.000002s : 12: predicate.environ_get_depend_swap 1.74% : 0.000003s : 18: predicate.environ_get_eliminate 1.30% : 0.000003s : 12: predicate.environ_get_set_eliminate 1.08% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 13: predicate.float_depend_g_call 0.48% : 0.000001s : 6: predicate.float_environ_get_switch 0.68% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.33% : 0.000001s : 3: predicate.graph_param_transform 0.55% : 0.000001s : 6: predicate.incorporate_call 0.49% : 0.000001s : 6: predicate.incorporate_call_switch 7.04% : 0.000014s : 40: predicate.inline 1.36% : 0.000003s : 6: predicate.inline_without_move 0.32% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.76% : 0.000001s : 6: predicate.less_batch_normalization 1.47% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000004s : 25: predicate.load_eliminater 2.32% : 0.000005s : 3: predicate.loop_unroll_after_grad 1.83% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.60% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.49% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 2.33% : 0.000005s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.47% : 0.000003s : 13: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 0.96% : 0.000002s : 9: predicate.reduce_eliminate 2.05% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 6: predicate.remove_not_recompute_node 1.60% : 0.000003s : 16: predicate.replace_applicator 0.55% : 0.000001s : 6: predicate.replace_old_param 0.64% : 0.000001s : 3: predicate.reset_defer_inline 0.89% : 0.000002s : 9: predicate.reshape_eliminate 0.56% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.53% : 0.000001s : 3: predicate.row_tensor_eliminate 1.13% : 0.000002s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.13% : 0.000002s : 6: predicate.shard_identity_eliminate 0.78% : 0.000002s : 6: predicate.special_op_eliminate 0.82% : 0.000002s : 6: predicate.specialize_transform 1.55% : 0.000003s : 6: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.17% : 0.000002s : 13: predicate.switch_defer_inline 1.68% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.30% : 0.000008s : 43: predicate.switch_simplify 0.80% : 
0.000002s : 9: predicate.tile_eliminate 0.80% : 0.000002s : 9: predicate.transpose_eliminate 1.72% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.43% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.22% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.57% : 0.000007s : 22: predicate.tuple_list_get_item_eliminator 1.33% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.54% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 1.94% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.67% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 3: predicate.value_based_eliminate 0.60% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.65% : 0.000001s : 6: predicate.virtual_output_eliminate 0.25% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000763 8 37.98% : 0.000290s : 3: func_graph_cloner_run.FuncGraphClonerGraph 62.02% : 0.000473s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.101416 196 0.00% : 0.000004s : 1: ForceFp32Comm 17.31% : 0.017559s : 1: add_attr 17.29% : 0.017539s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000078s : 1: add_recomputation 0.01% : 0.000006s : 1: assign_add_opt 0.08% : 0.000081s : 1: auto_monad 0.03% : 0.000032s : 1: auto_monad_reorder 0.01% : 0.000008s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000007s : 1: bias_add_comm_swap 0.73% : 0.000740s : 1: bootstrap 0.05% : 0.000046s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000028s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000007s : 1: detach_backward 0.02% : 0.000016s : 1: environ_conv 0.03% : 0.000029s : 1: event_method 0.02% : 0.000020s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000007s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000007s : 1: inline 0.01% : 0.000008s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000011s : 1: label_micro_interleaved_index 0.79% : 0.000805s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.83% : 0.000843s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.03% : 0.000031s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000027s : 1: opt.transform.mutable_eliminate 1.12% : 0.001138s : 78: opt.transform.opt_a 0.03% : 0.000031s : 1: opt.transform.opt_after_cconv 0.03% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000098s : 28: opt.transform.opt_b 0.06% : 0.000058s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000051s : 4: opt.transform.symbol_engine_opt 3.30% : 0.003351s : 1: opt_a 0.15% : 0.000150s : 1: opt_after_cconv 0.73% : 0.000745s : 1: opt_after_jit_grad 0.25% : 0.000253s : 1: opt_b 6.32% : 0.006412s : 1: optimize 0.03% : 0.000031s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.03% : 0.000033s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000010s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000006s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000044s : 1: pre_auto_parallel 0.04% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000026s : 1: remove_dup_value 0.53% : 0.000542s : 1: renormalize.infer 0.45% : 0.000453s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.22% : 0.000220s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000067s : 1: rewriter_after_opt_a 0.09% : 0.000087s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000119s : 1: symbol_engine_optimizer 20.23% : 0.020512s : 1: task_emit 0.10% : 0.000097s : 1: 
tuple_transform 28.10% : 0.028503s : 1: type_inference 0.10% : 0.000103s : 1: validate TotalTime = 0.0384396, [24] [bootstrap]: 0.00049224 [type_inference]: 0.00716145 [event_method]: 1.681e-05 [auto_monad]: 7.206e-05 [graph_reusing]: 6.09001e-06 [inline]: 2.85998e-06 [add_attr]: 0.0160418, [1] [add_attr_with_inline]: 0.0160253, [1] [Cycle 1]: 7.884e-05, [2] [tag_attr]: 2.008e-05 [meta_addattr_fg_expand]: 5.10001e-06 [parallel-infer-symbol]: 4.22998e-06 [pre_auto_parallel]: 3.731e-05 [insert-virtual-dataset]: 3.11001e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.41e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00551892, [53] [py_interpret_to_execute]: 2.717e-05 [rewriter_before_opt_a]: 6.122e-05 [opt_a]: 0.00281358, [2] [Cycle 1]: 0.00205305, [45] [expand_dump_flag]: 3.26001e-06 [switch_simplify]: 3.16e-05 [loop_unroll]: 1.862e-05 [a_1]: 0.00043231 [with_stream_mark]: 2.093e-05 [recompute_prepare]: 1.197e-05 [updatestate_depend_eliminate]: 4.35e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 3.41999e-06 [parameter_eliminate]: 3.04999e-06 [a_2]: 9.165e-05 [accelerated_algorithm]: 8.42e-06 [shard]: 2.57001e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 7.08998e-06 [merge_send_recv]: 1.034e-05 [auto_parallel]: 9.00999e-06 [parallel]: 2.292e-05 [flash_sp]: 1.096e-05 [merge_comm]: 5.56e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 1.182e-05 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 1.265e-05 [virtual_dataset]: 6.88e-06 [get_grad_eliminate_]: 6.02001e-06 [virtual_output]: 6.19001e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 1.099e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.685e-05 [merge_recompute_call_nodes]: 2.12999e-06 [before_grad]: 1.226e-05 [set_forward_comm_id_for_comm_node_pass]: 5.64e-06 [meta_fg_expand]: 4.11001e-06 [flash_sp_send_recv_attached]: 3.4e-06 [receive_attached]: 2.61999e-06 [after_resolve]: 
1.38e-05 [a_after_grad]: 1.008e-05 [renormalize]: 0.00079368 [add_forward_monad_depend]: 9.25999e-06 [auto_monad_grad]: 2.64001e-06 [auto_monad_eliminator]: 2.023e-05 [cse]: 3.634e-05 [a_3]: 5.523e-05 [Cycle 2]: 0.00074447, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 8.45999e-06 [loop_unroll]: 5.92001e-06 [a_1]: 0.00013796 [with_stream_mark]: 1.678e-05 [recompute_prepare]: 6.61999e-06 [updatestate_depend_eliminate]: 3.93001e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 3.74002e-06 [parameter_eliminate]: 1.34998e-06 [a_2]: 7.527e-05 [accelerated_algorithm]: 7.26001e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 2.31e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 8.25e-06 [auto_parallel]: 9.46e-06 [parallel]: 7.83001e-06 [flash_sp]: 4.17e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 9.45001e-06 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 9.04e-06 [virtual_dataset]: 6.12999e-06 [get_grad_eliminate_]: 5.94999e-06 [virtual_output]: 5.76e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 2.11e-06 [offload_activation]: 9.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.385e-05 [merge_recompute_call_nodes]: 1.68002e-06 [before_grad]: 1.012e-05 [set_forward_comm_id_for_comm_node_pass]: 5.62999e-06 [meta_fg_expand]: 2.69001e-06 [flash_sp_send_recv_attached]: 1.94e-06 [receive_attached]: 2.16998e-06 [after_resolve]: 1.232e-05 [a_after_grad]: 8.86002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 2.43e-06 [auto_monad_grad]: 2.24999e-06 [auto_monad_eliminator]: 8.52e-06 [cse]: 2.046e-05 [a_3]: 3.649e-05 [py_interpret_to_execute_after_opt_a]: 1.42e-05 [slice_cell_reuse_recomputed_activation]: 2.37999e-06 [rewriter_after_opt_a]: 4.379e-05 [convert_after_rewriter]: 7.66999e-06 [order_py_execute_after_rewriter]: 6.16e-06 [mutable_eliminate]: 0.00082653 [opt_b]: 0.00023545, [1] [Cycle 1]: 0.00022514, [7] [b_1]: 0.00012418 [b_2]: 9.37999e-06 
[updatestate_depend_eliminate]: 9.31e-06 [updatestate_assign_eliminate]: 3.62998e-06 [updatestate_loads_eliminate]: 2.84001e-06 [renormalize]: 9.39996e-07 [cse]: 3.301e-05 [optimize_parallel_all_gather_comm]: 2.16e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 3.576e-05 [loop_unroll]: 0.00060429 [opt_after_cconv]: 0.00012844, [1] [Cycle 1]: 0.00011981, [7] [c_1]: 3.204e-05 [parameter_eliminate]: 5.69e-06 [updatestate_depend_eliminate]: 8.97e-06 [updatestate_assign_eliminate]: 2.98003e-06 [updatestate_loads_eliminate]: 2.60002e-06 [cse]: 2.633e-05 [renormalize]: 6.00005e-07 [remove_dup_value]: 1.874e-05 [tuple_transform]: 8.506e-05, [1] [Cycle 1]: 7.895e-05, [4] [d_1]: 4.782e-05 [none_parameter_eliminate]: 1.97001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 7.67998e-06 [partial_unused_args_eliminate]: 2.17999e-06 [add_recomputation]: 5.983e-05 [cse_after_recomputation]: 2.836e-05, [1] [Cycle 1]: 2.295e-05, [1] [cse]: 1.663e-05 [environ_conv]: 7.17002e-06 [swap_dp_allreduce_reducescatter]: 7.31001e-06 [bias_add_comm_swap]: 3.54002e-06 [label_micro_interleaved_index]: 7.66999e-06 [label_fine_grained_interleaved_index]: 2.68998e-06 [merge_cast_opt]: 1.52001e-06 [slice_recompute_activation]: 2.68998e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.61998e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 8.70001e-07 [full_micro_interleaved_order_control]: 2.82002e-06 [reorder_send_recv_between_fp_bp]: 3.62998e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.25999e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.42e-06 [overlap_opt_shard_in_pipeline]: 1.43002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.806e-05 [grouped_pairwise_exchange_alltoall]: 1.92001e-06 [offloading_packed_experts]: 5.47001e-06 [overlap_recompute_and_grad_model_parallel]: 6.66e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.80001e-06 [overlap_recompute_comm]: 2.54001e-06 [overlap_grad_ring_attention]: 4.94e-06 [overlap_grad_flash_sp]: 2.432e-05 [begin_end_overlap_inline]: 7.49977e-07 [split_matmul_comm_elemetwise]: 2.48e-06 [split_layernorm_comm]: 2.14999e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 9.378e-05, [1] [Cycle 1]: 8.787e-05, [6] [build]: 4.60999e-06 [elim_shapecalc]: 1.55e-05 [elim_not_effective]: 1.416e-05 [opt_reshape]: 8.08001e-06 [fold_const_symbol]: 1.018e-05 [renormalize]: 1.8999e-07 [detach_backward]: 2.76e-06 [pipeline_parallel_scheduler]: 1.79998e-06 [auto_monad_reorder]: 2.059e-05 [get_jit_bprop_graph]: 2.00002e-06 [rewriter_after_jit_bprop_graph]: 5.41002e-06 [opt_after_jit_grad]: 0.0007065 [validate]: 4.955e-05 [backend_pass]: 1.39003e-06 [task_emit]: 0.00799012 [execute]: 1.044e-05 Sums bootstrap : 0.000492s : 2.33% type_inference : 0.007161s : 33.90% event_method : 0.000017s : 0.08% auto_monad : 0.000072s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000037s : 0.18% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000027s : 0.13% optimize.rewriter_before_opt_a : 0.000061s : 0.29% optimize.opt_a.expand_dump_flag : 0.000006s : 0.03% optimize.opt_a.switch_simplify : 0.000040s : 0.19% optimize.opt_a.loop_unroll : 0.000025s : 0.12% optimize.opt_a.a_1 : 0.000570s : 2.70% optimize.opt_a.with_stream_mark : 0.000038s : 0.18% optimize.opt_a.recompute_prepare : 0.000019s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.03% optimize.opt_a.parameter_eliminate : 0.000004s : 0.02% optimize.opt_a.a_2 : 0.000167s : 0.79% optimize.opt_a.accelerated_algorithm : 0.000016s : 0.07% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.06% optimize.opt_a.merge_send_recv : 0.000019s : 0.09% optimize.opt_a.auto_parallel : 0.000018s : 0.09% optimize.opt_a.parallel : 0.000031s : 0.15% optimize.opt_a.flash_sp : 0.000015s : 0.07% optimize.opt_a.merge_comm : 0.000010s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000021s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000022s : 0.10% optimize.opt_a.virtual_dataset : 0.000013s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.06% optimize.opt_a.merge_forward : 0.000009s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000021s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000031s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.02% optimize.opt_a.before_grad : 0.000022s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000011s : 0.05% optimize.opt_a.meta_fg_expand : 0.000007s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.03% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000026s : 0.12% optimize.opt_a.a_after_grad : 0.000019s : 0.09% optimize.opt_a.renormalize : 0.000794s : 3.76% optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.06% optimize.opt_a.auto_monad_grad : 0.000005s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000029s : 0.14% optimize.opt_a.cse : 0.000057s : 0.27% optimize.opt_a.a_3 : 0.000092s : 0.43% 
optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000044s : 0.21% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000827s : 3.91% optimize.opt_b.b_1 : 0.000124s : 0.59% optimize.opt_b.b_2 : 0.000009s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.16% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000036s : 0.17% optimize.loop_unroll : 0.000604s : 2.86% optimize.opt_after_cconv.c_1 : 0.000032s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000026s : 0.12% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.09% optimize.tuple_transform.d_1 : 0.000048s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.28% optimize.cse_after_recomputation.cse : 0.000017s : 0.08% optimize.environ_conv : 0.000007s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.03% optimize.bias_add_comm_swap : 0.000004s : 0.02% optimize.label_micro_interleaved_index : 0.000008s : 0.04% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000004s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.09% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000005s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.07% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000021s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000707s : 3.34% validate : 0.000050s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.007990s : 37.82% execute : 0.000010s : 0.05% Time group info: ------[substitution.] 0.000203 24 21.29% : 0.000043s : 4: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000002s : 2: substitution.fold_const_symbol 3.31% : 0.000007s : 3: substitution.graph_param_transform 65.75% : 0.000134s : 3: substitution.inline 2.28% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.93% : 0.000006s : 4: substitution.remove_not_recompute_node 2.55% : 0.000005s : 2: substitution.replace_old_param ------[type_inference.] 0.007093 2 91.07% : 0.006460s : 1: type_inference.infer 8.93% : 0.000634s : 1: type_inference.specialize ------[replace.] 0.000035 3 100.00% : 0.000035s : 3: replace.inline ------[match.] 0.000131 3 100.00% : 0.000131s : 3: match.inline ------[predicate.] 
0.000181 815 0.86% : 0.000002s : 8: predicate.accumulaten_eliminater 1.24% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.75% : 0.000001s : 8: predicate.addn_zero_filter 0.73% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 14: predicate.arithmetic_simplify 1.08% : 0.000002s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.17% : 0.000000s : 3: predicate.const_output_eliminate 0.58% : 0.000001s : 6: predicate.depend_value_elim 0.72% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 8: predicate.dict_get_item_eliminator 1.01% : 0.000002s : 8: predicate.dict_set_item_eliminator 1.53% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.20% : 0.000000s : 3: predicate.elim_not_effective 0.57% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.45% : 0.000003s : 11: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.01% : 0.000002s : 11: predicate.environ_get_depend_swap 1.67% : 0.000003s : 17: predicate.environ_get_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.02% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.00% : 0.000004s : 11: predicate.float_depend_g_call 0.54% : 0.000001s : 6: predicate.float_environ_get_switch 0.81% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 3: predicate.fold_const_symbol 0.84% : 0.000002s : 6: predicate.get_grad_eliminate 0.35% : 0.000001s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.51% : 0.000001s : 6: predicate.incorporate_call_switch 5.79% : 0.000010s : 37: predicate.inline 0.83% : 0.000002s : 6: predicate.inline_without_move 0.36% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.14% : 0.000002s : 6: predicate.less_batch_normalization 1.69% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000004s : 22: predicate.load_eliminater 1.98% : 0.000004s : 3: predicate.loop_unroll_after_grad 1.88% : 0.000003s : 18: predicate.loop_unroll_before_grad 2.11% : 0.000004s : 14: predicate.make_slice_get_slice_eliminator 0.51% : 0.000001s : 6: predicate.merge_addn 0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 8: predicate.minmaximum_grad 1.66% : 0.000003s : 3: predicate.mutable_eliminate 0.71% : 0.000001s : 3: predicate.opt_reshape 0.42% : 0.000001s : 3: predicate.parallel_virtual_node 1.35% : 0.000002s : 11: predicate.partial_defer_inline 1.10% : 0.000002s : 11: predicate.partial_eliminate 0.79% : 0.000001s : 8: predicate.print_const_string_wrapper 0.72% : 0.000001s : 6: predicate.reduce_all_const_elim 1.30% : 0.000002s : 8: predicate.reduce_eliminate 2.27% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 6: predicate.remove_not_recompute_node 1.04% : 0.000002s : 14: predicate.replace_applicator 0.84% : 0.000002s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.95% : 0.000002s : 8: predicate.reshape_eliminate 0.54% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.94% : 0.000002s : 6: predicate.same_eliminate 0.40% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.29% : 0.000002s : 6: predicate.shard_identity_eliminate 0.83% : 0.000002s : 6: predicate.special_op_eliminate 0.88% : 0.000002s : 6: predicate.specialize_transform 1.35% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.18% : 0.000002s : 11: predicate.switch_defer_inline 1.73% : 0.000003s : 17: predicate.switch_layer_defer_inline 3.92% : 0.000007s : 38: predicate.switch_simplify 1.13% : 
0.000002s : 8: predicate.tile_eliminate 0.86% : 0.000002s : 8: predicate.transpose_eliminate 1.64% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000003s : 14: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator 1.58% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.38% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.49% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.09% : 0.000004s : 22: predicate.updatestate_pure_node_eliminater 3.09% : 0.000006s : 28: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 3: predicate.value_based_eliminate 0.94% : 0.000002s : 6: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.39% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000442 7 34.58% : 0.000153s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.42% : 0.000289s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061951 196 0.01% : 0.000004s : 1: ForceFp32Comm 25.91% : 0.016051s : 1: add_attr 25.87% : 0.016030s : 1: add_attr_with_inline 0.01% : 0.000005s : 1: add_comm_op_reuse_tag 0.11% : 0.000066s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.13% : 0.000079s : 1: auto_monad 0.04% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000007s : 1: bias_add_comm_swap 0.85% : 0.000525s : 1: bootstrap 0.06% : 0.000040s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000022s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.05% : 0.000032s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000007s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.04% : 0.000024s : 1: event_method 0.03% : 0.000018s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000014s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000005s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000011s : 1: label_micro_interleaved_index 1.00% : 0.000617s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 1.36% : 0.000841s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.03% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000023s : 1: opt.transform.mutable_eliminate 1.60% : 0.000988s : 78: opt.transform.opt_a 0.05% : 0.000030s : 1: opt.transform.opt_after_cconv 0.05% : 0.000031s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000098s : 28: opt.transform.opt_b 0.09% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.07% : 0.000044s : 4: opt.transform.symbol_engine_opt 4.55% : 0.002818s : 1: opt_a 0.21% : 0.000132s : 1: opt_after_cconv 1.16% : 0.000721s : 1: opt_after_jit_grad 0.39% : 0.000240s : 1: opt_b 8.92% : 0.005525s : 1: optimize 0.04% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000009s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000043s : 1: pre_auto_parallel 0.05% : 0.000032s : 1: py_interpret_to_execute 0.03% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.04% : 0.000024s : 1: remove_dup_value 0.68% : 0.000421s : 1: renormalize.infer 0.58% : 0.000360s : 1: renormalize.specialize 0.01% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.11% : 0.000067s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000097s : 1: symbol_engine_optimizer 12.93% : 0.008013s : 1: task_emit 0.14% : 0.000089s : 1: 
tuple_transform 11.60% : 0.007189s : 1: type_inference 0.15% : 0.000095s : 1: validate TotalTime = 0.0694165, [24] [bootstrap]: 0.00048198 [type_inference]: 0.0065678 [event_method]: 1.513e-05 [auto_monad]: 6.462e-05 [graph_reusing]: 6.06e-06 [inline]: 2.64999e-06 [add_attr]: 0.0164403, [1] [add_attr_with_inline]: 0.0164262, [1] [Cycle 1]: 8.227e-05, [2] [tag_attr]: 2.493e-05 [meta_addattr_fg_expand]: 5.18002e-06 [parallel-infer-symbol]: 4.66002e-06 [pre_auto_parallel]: 3.899e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 1.19e-06 [dataset_repeat_opt]: 2.38998e-06 [pipeline_split]: 1.77999e-06 [optimize]: 0.00551163, [53] [py_interpret_to_execute]: 3.397e-05 [rewriter_before_opt_a]: 0.00013125 [opt_a]: 0.00297939, [2] [Cycle 1]: 0.00222314, [45] [expand_dump_flag]: 3.89002e-06 [switch_simplify]: 4.042e-05 [loop_unroll]: 2.33e-05 [a_1]: 0.00053071 [with_stream_mark]: 2.389e-05 [recompute_prepare]: 1.232e-05 [updatestate_depend_eliminate]: 4.39002e-06 [updatestate_assign_eliminate]: 3.87002e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 2.23002e-06 [a_2]: 8.596e-05 [accelerated_algorithm]: 8.43999e-06 [shard]: 2.49999e-06 [meta_shard_fg_expand]: 2.12001e-06 [shard_inline]: 7.07002e-06 [merge_send_recv]: 1.083e-05 [auto_parallel]: 8.18001e-06 [parallel]: 2.189e-05 [flash_sp]: 9.99999e-06 [merge_comm]: 4.82998e-06 [allreduce_fusion]: 3.93001e-06 [matmul_add_comm_reduction]: 1.104e-05 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 1.046e-05 [virtual_dataset]: 7.08998e-06 [get_grad_eliminate_]: 6.68998e-06 [virtual_output]: 7.7e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.188e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.553e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 1.262e-05 [set_forward_comm_id_for_comm_node_pass]: 4.64998e-06 [meta_fg_expand]: 3.33e-06 [flash_sp_send_recv_attached]: 2.98998e-06 [receive_attached]: 
2.27001e-06 [after_resolve]: 1.183e-05 [a_after_grad]: 9.49e-06 [renormalize]: 0.0008649 [add_forward_monad_depend]: 8.32998e-06 [auto_monad_grad]: 3.08e-06 [auto_monad_eliminator]: 1.985e-05 [cse]: 3.203e-05 [a_3]: 5.319e-05 [Cycle 2]: 0.00074058, [45] [expand_dump_flag]: 1.89999e-06 [switch_simplify]: 7.56999e-06 [loop_unroll]: 5.92999e-06 [a_1]: 0.00012957 [with_stream_mark]: 1.631e-05 [recompute_prepare]: 7.23e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 1.50001e-06 [a_2]: 7.348e-05 [accelerated_algorithm]: 6.41998e-06 [shard]: 1.73002e-06 [meta_shard_fg_expand]: 2.78e-06 [shard_inline]: 6.56999e-06 [merge_send_recv]: 8.28001e-06 [auto_parallel]: 7.37997e-06 [parallel]: 8.71997e-06 [flash_sp]: 3.66001e-06 [merge_comm]: 4.05998e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 9.02999e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 9.50001e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.40001e-06 [merge_forward]: 4.72e-06 [cell_reuse_recompute_pass]: 2.36998e-06 [offload_activation]: 1.117e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.562e-05 [merge_recompute_call_nodes]: 1.24e-06 [before_grad]: 1.036e-05 [set_forward_comm_id_for_comm_node_pass]: 4.94e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 1.25001e-06 [receive_attached]: 1.59e-06 [after_resolve]: 1.088e-05 [a_after_grad]: 9.00999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 2.91999e-06 [auto_monad_grad]: 1.98002e-06 [auto_monad_eliminator]: 1.119e-05 [cse]: 2.062e-05 [a_3]: 3.848e-05 [py_interpret_to_execute_after_opt_a]: 1.36e-05 [slice_cell_reuse_recomputed_activation]: 2.49999e-06 [rewriter_after_opt_a]: 4.509e-05 [convert_after_rewriter]: 8.32e-06 [order_py_execute_after_rewriter]: 6.76999e-06 [mutable_eliminate]: 0.00073774 [opt_b]: 0.00027765, [1] [Cycle 1]: 0.00026869, [7] 
[b_1]: 0.00015459 [b_2]: 9.10999e-06 [updatestate_depend_eliminate]: 8.95001e-06 [updatestate_assign_eliminate]: 8.90001e-06 [updatestate_loads_eliminate]: 2.71e-06 [renormalize]: 7.00005e-07 [cse]: 2.55e-05 [optimize_parallel_all_gather_comm]: 2.06e-05 [overlap_param_gather]: 2.10002e-06 [cconv]: 3.35e-05 [loop_unroll]: 0.00050396 [opt_after_cconv]: 0.00010635, [1] [Cycle 1]: 9.853e-05, [7] [c_1]: 2.674e-05 [parameter_eliminate]: 5.52999e-06 [updatestate_depend_eliminate]: 6.48e-06 [updatestate_assign_eliminate]: 2.76e-06 [updatestate_loads_eliminate]: 3.11001e-06 [cse]: 1.864e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 1.744e-05 [tuple_transform]: 7.634e-05, [1] [Cycle 1]: 7.016e-05, [4] [d_1]: 4.252e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 2.70025e-07 [switch_simplify]: 6.50002e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 4.914e-05 [cse_after_recomputation]: 2.348e-05, [1] [Cycle 1]: 1.814e-05, [1] [cse]: 1.247e-05 [environ_conv]: 6.69001e-06 [swap_dp_allreduce_reducescatter]: 5.97999e-06 [bias_add_comm_swap]: 2.85998e-06 [label_micro_interleaved_index]: 5.11002e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.86999e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 7.89994e-07 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 3.13e-06 [comm_op_add_attrs]: 1.19998e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.32e-06 [overlap_opt_shard_grad_in_pipeline]: 1.83002e-06 [control_data_broadcast_order]: 1.411e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 4.33001e-06 [overlap_recompute_and_grad_model_parallel]: 5.48002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.55002e-06 [overlap_grad_ring_attention]: 5.54998e-06 [overlap_grad_flash_sp]: 2.357e-05 [begin_end_overlap_inline]: 7.39994e-07 [split_matmul_comm_elemetwise]: 2.43002e-06 [split_layernorm_comm]: 1.91003e-06 [handle_group_info]: 1.40001e-06 [symbol_engine_optimizer]: 7.743e-05, [1] [Cycle 1]: 7.267e-05, [6] [build]: 3.25998e-06 [elim_shapecalc]: 9.92999e-06 [elim_not_effective]: 1.324e-05 [opt_reshape]: 6.71999e-06 [fold_const_symbol]: 9.82999e-06 [renormalize]: 2.50002e-07 [detach_backward]: 2.51e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.801e-05 [get_jit_bprop_graph]: 2.24999e-06 [rewriter_after_jit_bprop_graph]: 5.87001e-06 [opt_after_jit_grad]: 0.0005034 [validate]: 4.634e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0394265 [execute]: 9.84001e-06 Sums bootstrap : 0.000482s : 0.93% type_inference : 0.006568s : 12.69% event_method : 0.000015s : 0.03% auto_monad : 0.000065s : 0.12% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000025s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000005s : 0.01% pre_auto_parallel : 0.000039s : 0.08% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000034s : 0.07% optimize.rewriter_before_opt_a : 0.000131s : 0.25% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000048s : 0.09% optimize.opt_a.loop_unroll : 0.000029s : 0.06% optimize.opt_a.a_1 : 0.000660s : 1.28% optimize.opt_a.with_stream_mark : 0.000040s : 0.08% optimize.opt_a.recompute_prepare : 0.000020s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000159s : 0.31% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.03% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.01% optimize.opt_a.shard_inline : 0.000014s : 0.03% optimize.opt_a.merge_send_recv : 0.000019s : 0.04% optimize.opt_a.auto_parallel : 0.000016s : 0.03% optimize.opt_a.parallel : 0.000031s : 0.06% optimize.opt_a.flash_sp : 0.000014s : 0.03% optimize.opt_a.merge_comm : 0.000009s : 0.02% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000020s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000020s : 0.04% optimize.opt_a.virtual_dataset : 0.000013s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000013s : 0.03% optimize.opt_a.merge_forward : 0.000009s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000023s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000031s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000023s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000010s : 0.02% optimize.opt_a.meta_fg_expand : 0.000006s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000023s : 0.04% optimize.opt_a.a_after_grad : 0.000018s : 0.04% optimize.opt_a.renormalize : 0.000865s : 1.67% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.02% optimize.opt_a.auto_monad_grad : 0.000005s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000031s : 0.06% optimize.opt_a.cse : 0.000053s : 0.10% optimize.opt_a.a_3 : 0.000092s : 0.18% 
optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000045s : 0.09% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000738s : 1.43% optimize.opt_b.b_1 : 0.000155s : 0.30% optimize.opt_b.b_2 : 0.000009s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000009s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.05% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.04% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000034s : 0.06% optimize.loop_unroll : 0.000504s : 0.97% optimize.opt_after_cconv.c_1 : 0.000027s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.03% optimize.tuple_transform.d_1 : 0.000043s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.05% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.01% opt_after_jit_grad : 0.000503s : 0.97% validate : 0.000046s : 0.09% backend_pass : 0.000001s : 0.00% task_emit : 0.039427s : 76.18% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000242 26 18.01% : 0.000044s : 5: substitution.arithmetic_simplify 0.91% : 0.000002s : 2: substitution.elim_not_effective 0.59% : 0.000001s : 2: substitution.fold_const_symbol 2.84% : 0.000007s : 3: substitution.graph_param_transform 66.95% : 0.000162s : 3: substitution.inline 1.93% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.22% : 0.000005s : 4: substitution.remove_not_recompute_node 1.73% : 0.000004s : 2: substitution.replace_old_param 4.81% : 0.000012s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006512 2 89.54% : 0.005830s : 1: type_inference.infer 10.46% : 0.000681s : 1: type_inference.specialize ------[replace.] 0.000046 4 79.88% : 0.000037s : 3: replace.inline 20.12% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000170 4 93.74% : 0.000160s : 3: match.inline 6.26% : 0.000011s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000175 883 0.96% : 0.000002s : 9: predicate.accumulaten_eliminater 0.77% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.52% : 0.000001s : 6: predicate.addn_check_dump 1.05% : 0.000002s : 9: predicate.addn_zero_filter 0.78% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.14% : 0.000004s : 15: predicate.arithmetic_simplify 0.88% : 0.000002s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.55% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.60% : 0.000001s : 6: predicate.depend_value_elim 0.95% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_depend_swap 1.69% : 0.000003s : 18: predicate.environ_get_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.41% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.79% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.67% : 0.000001s : 6: predicate.get_grad_eliminate 0.32% : 0.000001s : 3: predicate.graph_param_transform 0.64% : 0.000001s : 6: predicate.incorporate_call 0.53% : 0.000001s : 6: predicate.incorporate_call_switch 6.24% : 0.000011s : 40: predicate.inline 0.97% : 0.000002s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.90% : 0.000002s : 6: predicate.less_batch_normalization 1.58% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000004s : 25: predicate.load_eliminater 1.18% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.07% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.54% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.73% : 0.000003s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.66% : 0.000001s : 3: predicate.parallel_virtual_node 1.80% : 0.000003s : 13: predicate.partial_defer_inline 1.37% : 0.000002s : 13: predicate.partial_eliminate 0.85% : 0.000001s : 9: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.20% : 0.000002s : 9: predicate.reduce_eliminate 2.29% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 1.12% : 0.000002s : 6: predicate.remove_not_recompute_node 1.80% : 0.000003s : 16: predicate.replace_applicator 0.69% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 1.09% : 0.000002s : 9: predicate.reshape_eliminate 0.55% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.98% : 0.000002s : 6: predicate.same_eliminate 0.60% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.73% : 0.000001s : 6: predicate.shard_identity_eliminate 0.90% : 0.000002s : 6: predicate.special_op_eliminate 0.75% : 0.000001s : 6: predicate.specialize_transform 1.05% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 13: predicate.switch_defer_inline 1.82% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.86% : 0.000009s : 43: predicate.switch_simplify 0.85% : 
0.000001s : 9: predicate.tile_eliminate 0.86% : 0.000002s : 9: predicate.transpose_eliminate 1.47% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.45% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.56% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.16% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000452 8 40.71% : 0.000184s : 3: func_graph_cloner_run.FuncGraphClonerGraph 59.29% : 0.000268s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.093469 196 0.00% : 0.000004s : 1: ForceFp32Comm 17.60% : 0.016447s : 1: add_attr 17.58% : 0.016431s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000070s : 1: auto_monad 0.02% : 0.000022s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.56% : 0.000520s : 1: bootstrap 0.04% : 0.000038s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.02% : 0.000022s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000005s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.55% : 0.000514s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.80% : 0.000752s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000020s : 1: opt.transform.mutable_eliminate 1.16% : 0.001086s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000095s : 28: opt.transform.opt_b 0.05% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000036s : 4: opt.transform.symbol_engine_opt 3.19% : 0.002983s : 1: opt_a 0.12% : 0.000110s : 1: opt_after_cconv 0.55% : 0.000515s : 1: opt_after_jit_grad 0.30% : 0.000281s : 1: opt_b 5.90% : 0.005517s : 1: optimize 0.03% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.03% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000006s : 1: pipeline_split 0.05% : 0.000045s : 1: pre_auto_parallel 0.04% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000022s : 1: remove_dup_value 0.50% : 0.000468s : 1: renormalize.infer 0.41% : 0.000386s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000051s : 1: rewriter_after_opt_a 0.15% : 0.000141s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000080s : 1: symbol_engine_optimizer 42.21% : 0.039449s : 1: task_emit 0.08% : 0.000079s : 1: 
tuple_transform 7.05% : 0.006588s : 1: type_inference 0.09% : 0.000085s : 1: validate TotalTime = 0.227173, [24] [bootstrap]: 0.00051304 [type_inference]: 0.0753188 [event_method]: 6.08e-05 [auto_monad]: 0.00015371 [graph_reusing]: 9.55001e-06 [inline]: 2.94001e-06 [add_attr]: 0.00369888, [1] [add_attr_with_inline]: 0.00368622, [1] [Cycle 1]: 0.00012381, [2] [tag_attr]: 6.994e-05 [meta_addattr_fg_expand]: 1.169e-05 [parallel-infer-symbol]: 3.77998e-06 [pre_auto_parallel]: 6.126e-05 [insert-virtual-dataset]: 3.4e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.27999e-06 [pipeline_split]: 2.36e-06 [optimize]: 0.0924463, [53] [py_interpret_to_execute]: 4.306e-05 [rewriter_before_opt_a]: 0.00017887 [opt_a]: 0.0898017, [3] [Cycle 1]: 0.084423, [45] [expand_dump_flag]: 6.58e-06 [switch_simplify]: 8.085e-05 [loop_unroll]: 6.561e-05 [a_1]: 0.00168373 [with_stream_mark]: 3.517e-05 [recompute_prepare]: 3.014e-05 [updatestate_depend_eliminate]: 9.60001e-06 [updatestate_assign_eliminate]: 7.86001e-06 [updatestate_loads_eliminate]: 7.08998e-06 [parameter_eliminate]: 4.28001e-06 [a_2]: 0.00026148 [accelerated_algorithm]: 4.18e-05 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 4.97999e-06 [shard_inline]: 1.755e-05 [merge_send_recv]: 2.237e-05 [auto_parallel]: 1.385e-05 [parallel]: 2.166e-05 [flash_sp]: 1.542e-05 [merge_comm]: 1.026e-05 [allreduce_fusion]: 9.05001e-06 [matmul_add_comm_reduction]: 3.736e-05 [allreduce_slice_to_reducescatter]: 1.00999e-06 [virtual_shard_identity]: 2.451e-05 [virtual_dataset]: 1.643e-05 [get_grad_eliminate_]: 1.583e-05 [virtual_output]: 1.535e-05 [merge_forward]: 1.097e-05 [cell_reuse_recompute_pass]: 1.67001e-06 [offload_activation]: 2.082e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.506e-05 [merge_recompute_call_nodes]: 2.07999e-06 [before_grad]: 3.035e-05 [set_forward_comm_id_for_comm_node_pass]: 1.179e-05 [meta_fg_expand]: 0.00218813 [flash_sp_send_recv_attached]: 5.09998e-06 [receive_attached]: 2.29001e-06 
[after_resolve]: 8.172e-05 [a_after_grad]: 0.0001046 [renormalize]: 0.0782151 [add_forward_monad_depend]: 1.757e-05 [auto_monad_grad]: 7.77e-06 [auto_monad_eliminator]: 6.402e-05 [cse]: 0.000331 [a_3]: 0.00038658 [Cycle 2]: 0.00456919, [45] [expand_dump_flag]: 3.38e-06 [switch_simplify]: 5.133e-05 [loop_unroll]: 4.382e-05 [a_1]: 0.00153475 [with_stream_mark]: 2.319e-05 [recompute_prepare]: 1.322e-05 [updatestate_depend_eliminate]: 6.22001e-06 [updatestate_assign_eliminate]: 4.45999e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 3.14001e-06 [a_2]: 9.868e-05 [accelerated_algorithm]: 1.392e-05 [shard]: 2.34001e-06 [meta_shard_fg_expand]: 2.96001e-06 [shard_inline]: 7.43999e-06 [merge_send_recv]: 1.152e-05 [auto_parallel]: 1.145e-05 [parallel]: 1.02e-05 [flash_sp]: 4.64998e-06 [merge_comm]: 4.39998e-06 [allreduce_fusion]: 4.17998e-06 [matmul_add_comm_reduction]: 1.183e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.119e-05 [virtual_dataset]: 8.02e-06 [get_grad_eliminate_]: 7.11001e-06 [virtual_output]: 6.63998e-06 [merge_forward]: 5.00001e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 1.259e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.686e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 1.287e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 0.0001516 [flash_sp_send_recv_attached]: 1.81e-06 [receive_attached]: 2.66e-06 [after_resolve]: 1.679e-05 [a_after_grad]: 1.156e-05 [renormalize]: 0.00199138 [add_forward_monad_depend]: 7.89002e-06 [auto_monad_grad]: 2.42001e-06 [auto_monad_eliminator]: 2.007e-05 [cse]: 3.901e-05 [a_3]: 6.042e-05 [Cycle 3]: 0.00078651, [45] [expand_dump_flag]: 2.12001e-06 [switch_simplify]: 9.10001e-06 [loop_unroll]: 7.27997e-06 [a_1]: 0.00016874 [with_stream_mark]: 1.304e-05 [recompute_prepare]: 7.45998e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 3.65e-06 [updatestate_loads_eliminate]: 3.21999e-06 
[parameter_eliminate]: 1.65001e-06 [a_2]: 9.062e-05 [accelerated_algorithm]: 1.165e-05 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 7.11001e-06 [merge_send_recv]: 7.96001e-06 [auto_parallel]: 1.004e-05 [parallel]: 7.97e-06 [flash_sp]: 1.40999e-06 [merge_comm]: 4.33001e-06 [allreduce_fusion]: 3.9e-06 [matmul_add_comm_reduction]: 8.55001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 8.99e-06 [virtual_dataset]: 7.03998e-06 [get_grad_eliminate_]: 6.46999e-06 [virtual_output]: 6.79999e-06 [merge_forward]: 4.42e-06 [cell_reuse_recompute_pass]: 2.44001e-06 [offload_activation]: 1.059e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.533e-05 [merge_recompute_call_nodes]: 1.25999e-06 [before_grad]: 1.168e-05 [set_forward_comm_id_for_comm_node_pass]: 4.34002e-06 [meta_fg_expand]: 3.25e-06 [flash_sp_send_recv_attached]: 1.49e-06 [receive_attached]: 1.66998e-06 [after_resolve]: 1.069e-05 [a_after_grad]: 1.027e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.47001e-06 [auto_monad_grad]: 1.34998e-06 [auto_monad_eliminator]: 1.032e-05 [cse]: 2.213e-05 [a_3]: 4.715e-05 [py_interpret_to_execute_after_opt_a]: 1.937e-05 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 5.039e-05 [convert_after_rewriter]: 7.74002e-06 [order_py_execute_after_rewriter]: 5.56e-06 [mutable_eliminate]: 0.00075267 [opt_b]: 0.00023251, [1] [Cycle 1]: 0.00022371, [7] [b_1]: 0.00014015 [b_2]: 1e-05 [updatestate_depend_eliminate]: 6.75998e-06 [updatestate_assign_eliminate]: 3.13998e-06 [updatestate_loads_eliminate]: 2.83e-06 [renormalize]: 6.50005e-07 [cse]: 2.369e-05 [optimize_parallel_all_gather_comm]: 1.959e-05 [overlap_param_gather]: 2.12999e-06 [cconv]: 3.119e-05 [loop_unroll]: 0.0004783 [opt_after_cconv]: 0.00015454, [1] [Cycle 1]: 0.00014782, [7] [c_1]: 3.575e-05 [parameter_eliminate]: 3.36001e-06 [updatestate_depend_eliminate]: 6.98998e-06 [updatestate_assign_eliminate]: 3.88999e-06 
[updatestate_loads_eliminate]: 3.16999e-06 [cse]: 2.348e-05 [renormalize]: 7.59988e-07 [remove_dup_value]: 1.807e-05 [tuple_transform]: 9.08e-05, [1] [Cycle 1]: 8.563e-05, [4] [d_1]: 5.343e-05 [none_parameter_eliminate]: 2.09e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 8.46002e-06 [partial_unused_args_eliminate]: 2.69001e-06 [add_recomputation]: 5.976e-05 [cse_after_recomputation]: 2.869e-05, [1] [Cycle 1]: 2.247e-05, [1] [cse]: 1.589e-05 [environ_conv]: 1.05e-05 [swap_dp_allreduce_reducescatter]: 6.84999e-06 [bias_add_comm_swap]: 2.74001e-06 [label_micro_interleaved_index]: 5.18002e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.46998e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 9.99979e-07 [remove_cast_before_assign_add]: 1.33002e-06 [full_micro_interleaved_order_control]: 2.42001e-06 [reorder_send_recv_between_fp_bp]: 3.18998e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 1.27e-06 [interleave_split_concat_branches]: 1.30999e-06 [interleave_parallel_branches]: 1.20999e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99999e-06 [control_data_broadcast_order]: 1.618e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 4.97999e-06 [overlap_recompute_and_grad_model_parallel]: 6.04001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 5.40999e-06 [overlap_grad_flash_sp]: 2.524e-05 [begin_end_overlap_inline]: 5.99975e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 9.467e-05, [1] [Cycle 1]: 8.98e-05, [6] [build]: 1.098e-05 [elim_shapecalc]: 1.196e-05 [elim_not_effective]: 1.631e-05 [opt_reshape]: 7.78999e-06 [fold_const_symbol]: 1.223e-05 [renormalize]: 
2.3999e-07 [detach_backward]: 2.44999e-06 [pipeline_parallel_scheduler]: 1.61998e-06 [auto_monad_reorder]: 2.216e-05 [get_jit_bprop_graph]: 1.80001e-06 [rewriter_after_jit_bprop_graph]: 5.20001e-06 [opt_after_jit_grad]: 0.00051018 [validate]: 5.325e-05 [backend_pass]: 1.17e-06 [task_emit]: 0.054019 [execute]: 1.083e-05 Sums bootstrap : 0.000513s : 0.23% type_inference : 0.075319s : 33.95% event_method : 0.000061s : 0.03% auto_monad : 0.000154s : 0.07% graph_reusing : 0.000010s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000070s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000012s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000061s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000043s : 0.02% optimize.rewriter_before_opt_a : 0.000179s : 0.08% optimize.opt_a.expand_dump_flag : 0.000012s : 0.01% optimize.opt_a.switch_simplify : 0.000141s : 0.06% optimize.opt_a.loop_unroll : 0.000117s : 0.05% optimize.opt_a.a_1 : 0.003387s : 1.53% optimize.opt_a.with_stream_mark : 0.000071s : 0.03% optimize.opt_a.recompute_prepare : 0.000051s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.01% optimize.opt_a.parameter_eliminate : 0.000009s : 0.00% optimize.opt_a.a_2 : 0.000451s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000067s : 0.03% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.00% optimize.opt_a.shard_inline : 0.000032s : 0.01% optimize.opt_a.merge_send_recv : 0.000042s : 0.02% optimize.opt_a.auto_parallel : 0.000035s : 0.02% optimize.opt_a.parallel : 0.000040s : 0.02% optimize.opt_a.flash_sp : 0.000021s : 0.01% optimize.opt_a.merge_comm : 
0.000019s : 0.01% optimize.opt_a.allreduce_fusion : 0.000017s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000058s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000045s : 0.02% optimize.opt_a.virtual_dataset : 0.000031s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.01% optimize.opt_a.virtual_output : 0.000029s : 0.01% optimize.opt_a.merge_forward : 0.000020s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000044s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000067s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.002343s : 1.06% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000109s : 0.05% optimize.opt_a.a_after_grad : 0.000126s : 0.06% optimize.opt_a.renormalize : 0.080207s : 36.15% optimize.opt_a.add_forward_monad_depend : 0.000027s : 0.01% optimize.opt_a.auto_monad_grad : 0.000012s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000094s : 0.04% optimize.opt_a.cse : 0.000392s : 0.18% optimize.opt_a.a_3 : 0.000494s : 0.22% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.02% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000753s : 0.34% optimize.opt_b.b_1 : 0.000140s : 0.06% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000024s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000031s : 0.01% optimize.loop_unroll : 0.000478s : 0.22% optimize.opt_after_cconv.c_1 : 0.000036s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000023s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.01% optimize.tuple_transform.d_1 : 0.000053s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000003s : 0.00% optimize.add_recomputation : 0.000060s : 0.03% optimize.cse_after_recomputation.cse : 0.000016s : 0.01% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000510s : 0.23% validate : 0.000053s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.054019s : 24.35% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 
0.001019 161 7.58% : 0.000077s : 8: substitution.arithmetic_simplify 0.28% : 0.000003s : 3: substitution.elim_not_effective 0.64% : 0.000007s : 5: substitution.float_depend_g_call 0.43% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.19% : 0.000002s : 3: substitution.fold_const_symbol 0.71% : 0.000007s : 4: substitution.graph_param_transform 0.36% : 0.000004s : 2: substitution.incorporate_call 0.29% : 0.000003s : 2: substitution.incorporate_call_switch 60.26% : 0.000614s : 17: substitution.inline 2.67% : 0.000027s : 2: substitution.inline_without_move 1.22% : 0.000012s : 15: substitution.j_node_and_user_rematch 2.25% : 0.000023s : 3: substitution.less_batch_normalization 1.31% : 0.000013s : 7: substitution.minmaximum_grad 0.73% : 0.000007s : 5: substitution.partial_eliminate 1.45% : 0.000015s : 15: substitution.remove_not_recompute_node 3.98% : 0.000041s : 10: substitution.replace_applicator 1.30% : 0.000013s : 10: substitution.replace_old_param 0.41% : 0.000004s : 1: substitution.set_cell_output_no_recompute 2.57% : 0.000026s : 7: substitution.tuple_list_convert_item_index_to_positive 1.17% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 1.64% : 0.000017s : 7: substitution.tuple_list_get_item_depend_reorder 6.87% : 0.000070s : 19: substitution.tuple_list_get_item_eliminator 1.68% : 0.000017s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.075216 2 97.48% : 0.073318s : 1: type_inference.infer 2.52% : 0.001898s : 1: type_inference.specialize ------[replace.] 0.000259 27 64.47% : 0.000167s : 17: replace.inline 35.53% : 0.000092s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000637 27 94.46% : 0.000601s : 17: match.inline 5.54% : 0.000035s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000755 4248 1.13% : 0.000009s : 53: predicate.accumulaten_eliminater 0.26% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 21: predicate.addn_check_dump 1.10% : 0.000008s : 53: predicate.addn_zero_filter 1.05% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.22% : 0.000017s : 74: predicate.arithmetic_simplify 1.15% : 0.000009s : 53: predicate.cast_eliminate 1.10% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000004s : 21: predicate.depend_value_elim 1.19% : 0.000009s : 53: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.14% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000009s : 57: predicate.environ_add_const_eliminate 1.12% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 57: predicate.environ_get_depend_swap 1.61% : 0.000012s : 78: predicate.environ_get_eliminate 1.10% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.76% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.68% : 0.000020s : 80: predicate.float_depend_g_call 0.44% : 0.000003s : 21: predicate.float_environ_get_switch 0.60% : 0.000005s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000004s : 21: predicate.get_grad_eliminate 0.09% : 0.000001s : 4: predicate.graph_param_transform 0.46% : 0.000003s : 21: predicate.incorporate_call 0.43% : 0.000003s : 21: predicate.incorporate_call_switch 5.84% : 0.000044s : 183: predicate.inline 1.38% : 0.000010s : 45: predicate.inline_without_move 0.26% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.71% : 0.000005s : 21: predicate.less_batch_normalization 1.64% : 
0.000012s : 71: predicate.list_to_tuple_eliminator_ 2.53% : 0.000019s : 124: predicate.load_eliminater 0.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.41% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.45% : 0.000003s : 21: predicate.merge_addn 1.16% : 0.000009s : 50: predicate.micro_step_allgather_replace 1.20% : 0.000009s : 50: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 53: predicate.minmaximum_grad 0.37% : 0.000003s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.39% : 0.000018s : 80: predicate.partial_defer_inline 1.64% : 0.000012s : 67: predicate.partial_eliminate 1.08% : 0.000008s : 53: predicate.print_const_string_wrapper 0.47% : 0.000004s : 21: predicate.reduce_all_const_elim 1.47% : 0.000011s : 53: predicate.reduce_eliminate 2.56% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 21: predicate.remove_not_recompute_node 1.86% : 0.000014s : 113: predicate.replace_applicator 0.69% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000009s : 53: predicate.reshape_eliminate 1.17% : 0.000009s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.32% : 0.000010s : 50: predicate.same_eliminate 0.33% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.70% : 0.000005s : 21: predicate.shard_identity_eliminate 0.23% : 0.000002s : 8: predicate.special_op_eliminate 0.65% : 0.000005s : 21: predicate.specialize_transform 1.51% : 0.000011s : 50: predicate.split_environ_get_set_with_tuple_value 1.27% : 0.000010s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 80: predicate.switch_defer_inline 2.96% : 0.000022s : 130: predicate.switch_layer_defer_inline 5.09% : 0.000038s : 218: 
predicate.switch_simplify 1.11% : 0.000008s : 53: predicate.tile_eliminate 1.08% : 0.000008s : 53: predicate.transpose_eliminate 1.45% : 0.000011s : 61: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 61: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.73% : 0.000021s : 92: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 61: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 82: predicate.tuple_list_set_item_eliminator 1.55% : 0.000012s : 71: predicate.tuple_to_list_eliminator_ 2.50% : 0.000019s : 124: predicate.updatestate_pure_node_eliminater 3.04% : 0.000023s : 145: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.60% : 0.000005s : 21: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.004038 36 53.25% : 0.002150s : 15: func_graph_cloner_run.FuncGraphClonerGraph 46.75% : 0.001888s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.408766 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.91% : 0.003706s : 1: add_attr 0.90% : 0.003691s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000162s : 1: auto_monad 0.01% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000005s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.13% : 0.000550s : 1: bootstrap 0.01% : 0.000035s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000032s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000014s : 1: environ_conv 0.02% : 0.000069s : 1: event_method 0.00% : 0.000019s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.12% : 0.000489s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.19% : 0.000763s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 1.24% : 0.005052s : 117: opt.transform.opt_a 0.01% : 0.000034s : 1: opt.transform.opt_after_cconv 0.01% : 0.000027s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000119s : 28: opt.transform.opt_b 0.01% : 0.000059s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000044s : 4: opt.transform.symbol_engine_opt 21.97% : 0.089806s : 1: opt_a 0.04% : 0.000158s : 1: opt_after_cconv 0.13% : 0.000521s : 1: opt_after_jit_grad 0.06% : 0.000236s : 1: opt_b 22.62% : 0.092452s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000006s : 1: pipeline_split 0.02% : 0.000067s : 1: pre_auto_parallel 0.01% : 0.000048s : 1: py_interpret_to_execute 0.01% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000005s : 1: remove_cast_before_assign_add 0.01% : 0.000022s : 1: remove_dup_value 18.84% : 0.077021s : 2: renormalize.infer 0.77% : 0.003163s : 2: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000055s : 1: rewriter_after_opt_a 0.05% : 0.000185s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000098s : 1: symbol_engine_optimizer 13.22% : 0.054044s : 1: task_emit 0.02% : 0.000094s : 1: 
tuple_transform 18.43% : 0.075345s : 1: type_inference 0.02% : 0.000092s : 1: validate TotalTime = 0.0636871, [24] [bootstrap]: 0.00047294 [type_inference]: 0.046839 [event_method]: 1.403e-05 [auto_monad]: 6.576e-05 [graph_reusing]: 5.57001e-06 [inline]: 2.69999e-06 [add_attr]: 0.00368053, [1] [add_attr_with_inline]: 0.00366844, [1] [Cycle 1]: 6.241e-05, [2] [tag_attr]: 1.657e-05 [meta_addattr_fg_expand]: 4.92e-06 [parallel-infer-symbol]: 4.04002e-06 [pre_auto_parallel]: 3.033e-05 [insert-virtual-dataset]: 2.91e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 3.01001e-06 [pipeline_split]: 2.30002e-06 [optimize]: 0.00484766, [53] [py_interpret_to_execute]: 2.326e-05 [rewriter_before_opt_a]: 5.997e-05 [opt_a]: 0.00241388, [2] [Cycle 1]: 0.00172402, [45] [expand_dump_flag]: 2.58e-06 [switch_simplify]: 3.187e-05 [loop_unroll]: 1.832e-05 [a_1]: 0.00040796 [with_stream_mark]: 1.545e-05 [recompute_prepare]: 9.16002e-06 [updatestate_depend_eliminate]: 4.10998e-06 [updatestate_assign_eliminate]: 3.81001e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 2.04e-06 [a_2]: 8.525e-05 [accelerated_algorithm]: 6.83e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 2.37999e-06 [shard_inline]: 6.54001e-06 [merge_send_recv]: 8.60001e-06 [auto_parallel]: 7.14001e-06 [parallel]: 1.967e-05 [flash_sp]: 9.26998e-06 [merge_comm]: 4.17998e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 9.57999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 8.48999e-06 [virtual_dataset]: 6.38998e-06 [get_grad_eliminate_]: 6.05002e-06 [virtual_output]: 5.97001e-06 [merge_forward]: 4.17998e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 1.064e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.276e-05 [merge_recompute_call_nodes]: 1.81998e-06 [before_grad]: 1.073e-05 [set_forward_comm_id_for_comm_node_pass]: 4e-06 [meta_fg_expand]: 3.11001e-06 [flash_sp_send_recv_attached]: 2.86999e-06 [receive_attached]: 2.17999e-06 
[after_resolve]: 1.1e-05 [a_after_grad]: 8.94e-06 [renormalize]: 0.00059471 [add_forward_monad_depend]: 5.90002e-06 [auto_monad_grad]: 2.98e-06 [auto_monad_eliminator]: 1.591e-05 [cse]: 3.302e-05 [a_3]: 4.813e-05 [Cycle 2]: 0.00067625, [45] [expand_dump_flag]: 1.61002e-06 [switch_simplify]: 7.66001e-06 [loop_unroll]: 6.01e-06 [a_1]: 0.0001247 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 6.54001e-06 [updatestate_depend_eliminate]: 3.83999e-06 [updatestate_assign_eliminate]: 3.09001e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.42999e-06 [a_2]: 7.466e-05 [accelerated_algorithm]: 6.29001e-06 [shard]: 1.40001e-06 [meta_shard_fg_expand]: 1.66998e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 6.02999e-06 [auto_parallel]: 7.38e-06 [parallel]: 6.94001e-06 [flash_sp]: 3.81001e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.55003e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 7.33999e-06 [virtual_dataset]: 5.55001e-06 [get_grad_eliminate_]: 5.33002e-06 [virtual_output]: 5.34e-06 [merge_forward]: 3.44001e-06 [cell_reuse_recompute_pass]: 1.85001e-06 [offload_activation]: 8.40999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.21e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 1.029e-05 [set_forward_comm_id_for_comm_node_pass]: 4.05e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 1.05001e-06 [receive_attached]: 2.00002e-06 [after_resolve]: 1.05e-05 [a_after_grad]: 8.56002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.42e-06 [auto_monad_grad]: 1.48002e-06 [auto_monad_eliminator]: 8.85001e-06 [cse]: 1.789e-05 [a_3]: 3.502e-05 [py_interpret_to_execute_after_opt_a]: 1.188e-05 [slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 4.24e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.71998e-06 [mutable_eliminate]: 0.00081284 [opt_b]: 0.00022038, [1] [Cycle 1]: 0.00021171, [7] [b_1]: 
0.00012625 [b_2]: 8.48999e-06 [updatestate_depend_eliminate]: 6.98998e-06 [updatestate_assign_eliminate]: 3.05998e-06 [updatestate_loads_eliminate]: 2.49001e-06 [renormalize]: 6.10016e-07 [cse]: 2.472e-05 [optimize_parallel_all_gather_comm]: 1.87e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 2.521e-05 [loop_unroll]: 0.00048887 [opt_after_cconv]: 0.00011077, [1] [Cycle 1]: 0.00010332, [7] [c_1]: 2.986e-05 [parameter_eliminate]: 3.76001e-06 [updatestate_depend_eliminate]: 5.99999e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [cse]: 2.185e-05 [renormalize]: 7.10017e-07 [remove_dup_value]: 1.753e-05 [tuple_transform]: 7.671e-05, [1] [Cycle 1]: 7.137e-05, [4] [d_1]: 4.196e-05 [none_parameter_eliminate]: 1.85001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 7.76001e-06 [partial_unused_args_eliminate]: 2.27001e-06 [add_recomputation]: 5.02e-05 [cse_after_recomputation]: 2.314e-05, [1] [Cycle 1]: 1.783e-05, [1] [cse]: 1.211e-05 [environ_conv]: 6.09001e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 3.16999e-06 [label_micro_interleaved_index]: 5.85002e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.70001e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 1.51002e-06 [full_micro_interleaved_order_control]: 2.25002e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.36002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.327e-05 [grouped_pairwise_exchange_alltoall]: 2.21e-06 [offloading_packed_experts]: 4.28999e-06 [overlap_recompute_and_grad_model_parallel]: 5.24998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.57001e-06 [overlap_recompute_comm]: 2.46e-06 [overlap_grad_ring_attention]: 4.25e-06 [overlap_grad_flash_sp]: 1.962e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.53998e-06 [split_layernorm_comm]: 2.10002e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 8.171e-05, [1] [Cycle 1]: 7.61e-05, [6] [build]: 2.99999e-06 [elim_shapecalc]: 1.147e-05 [elim_not_effective]: 1.294e-05 [opt_reshape]: 7.08998e-06 [fold_const_symbol]: 1.011e-05 [renormalize]: 2.59985e-07 [detach_backward]: 2.33998e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 1.756e-05 [get_jit_bprop_graph]: 2.09e-06 [rewriter_after_jit_bprop_graph]: 4.08001e-06 [opt_after_jit_grad]: 0.00053498 [validate]: 4.221e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.00685791 [execute]: 8.10999e-06 Sums bootstrap : 0.000473s : 0.80% type_inference : 0.046839s : 79.54% event_method : 0.000014s : 0.02% auto_monad : 0.000066s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000030s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.04% optimize.rewriter_before_opt_a : 0.000060s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.07% optimize.opt_a.loop_unroll : 0.000024s : 0.04% optimize.opt_a.a_1 : 0.000533s : 0.90% optimize.opt_a.with_stream_mark : 0.000029s : 0.05% optimize.opt_a.recompute_prepare : 0.000016s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000160s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000013s : 0.02% optimize.opt_a.merge_send_recv : 0.000015s : 0.02% optimize.opt_a.auto_parallel : 0.000015s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.05% optimize.opt_a.flash_sp : 0.000013s : 0.02% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000019s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000021s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.04% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000595s : 1.01% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.04% optimize.opt_a.cse : 0.000051s : 0.09% optimize.opt_a.a_3 : 0.000083s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000042s : 0.07% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000813s : 1.38% optimize.opt_b.b_1 : 0.000126s : 0.21% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.04% optimize.loop_unroll : 0.000489s : 0.83% optimize.opt_after_cconv.c_1 : 0.000030s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.04% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.03% optimize.tuple_transform.d_1 : 0.000042s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000006s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000535s : 0.91% validate : 0.000042s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.006858s : 11.65% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000181 24 19.49% : 0.000035s : 4: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000002s : 2: substitution.fold_const_symbol 3.80% : 0.000007s : 3: substitution.graph_param_transform 67.93% : 0.000123s : 3: substitution.inline 1.89% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000005s : 4: substitution.remove_not_recompute_node 2.22% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.046779 2 98.86% : 0.046246s : 1: type_inference.infer 1.14% : 0.000533s : 1: type_inference.specialize ------[replace.] 0.000032 3 100.00% : 0.000032s : 3: replace.inline ------[match.] 0.000121 3 100.00% : 0.000121s : 3: match.inline ------[predicate.] 
0.000161 815 0.80% : 0.000001s : 8: predicate.accumulaten_eliminater 0.88% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.80% : 0.000001s : 8: predicate.addn_zero_filter 0.76% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.86% : 0.000005s : 14: predicate.arithmetic_simplify 0.95% : 0.000002s : 8: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.32% : 0.000001s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_depend_swap 1.76% : 0.000003s : 17: predicate.environ_get_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.11% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.25% : 0.000004s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.10% : 0.000010s : 37: predicate.inline 0.99% : 0.000002s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 6: predicate.less_batch_normalization 1.61% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000004s : 22: predicate.load_eliminater 1.35% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.88% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 8: predicate.minmaximum_grad 1.44% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.37% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.63% : 0.000001s : 6: predicate.reduce_all_const_elim 1.30% : 0.000002s : 8: predicate.reduce_eliminate 2.22% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 6: predicate.remove_not_recompute_node 1.32% : 0.000002s : 14: predicate.replace_applicator 0.71% : 0.000001s : 6: predicate.replace_old_param 0.47% : 0.000001s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 8: predicate.reshape_eliminate 0.72% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.53% : 0.000001s : 3: predicate.row_tensor_eliminate 0.93% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 0.83% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 1.04% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.93% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.78% : 0.000008s : 38: predicate.switch_simplify 0.86% : 
0.000001s : 8: predicate.tile_eliminate 1.01% : 0.000002s : 8: predicate.transpose_eliminate 1.62% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.99% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.69% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000318 7 32.84% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 67.16% : 0.000213s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.073896 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.99% : 0.003688s : 1: add_attr 4.97% : 0.003672s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000054s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.10% : 0.000072s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.68% : 0.000506s : 1: bootstrap 0.04% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.03% : 0.000021s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.68% : 0.000500s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.12% : 0.000825s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 1.25% : 0.000924s : 78: opt.transform.opt_a 0.04% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000102s : 28: opt.transform.opt_b 0.06% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000037s : 4: opt.transform.symbol_engine_opt 3.27% : 0.002418s : 1: opt_a 0.15% : 0.000114s : 1: opt_after_cconv 0.74% : 0.000547s : 1: opt_after_jit_grad 0.30% : 0.000224s : 1: opt_b 6.57% : 0.004853s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000035s : 1: pre_auto_parallel 0.04% : 0.000027s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000005s : 1: remove_cast_before_assign_add 0.03% : 0.000021s : 1: remove_dup_value 0.42% : 0.000310s : 1: renormalize.infer 0.37% : 0.000276s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000047s : 1: rewriter_after_opt_a 0.09% : 0.000065s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000085s : 1: symbol_engine_optimizer 9.30% : 0.006876s : 1: task_emit 0.11% : 0.000080s : 1: 
tuple_transform 63.42% : 0.046863s : 1: type_inference 0.10% : 0.000073s : 1: validate TotalTime = 0.115121, [24] [bootstrap]: 0.00047677 [type_inference]: 0.0607751 [event_method]: 5.509e-05 [auto_monad]: 0.00015444 [graph_reusing]: 1.056e-05 [inline]: 3.3e-06 [add_attr]: 0.00402764, [1] [add_attr_with_inline]: 0.00401402, [1] [Cycle 1]: 0.00010479, [2] [tag_attr]: 4.478e-05 [meta_addattr_fg_expand]: 1.038e-05 [parallel-infer-symbol]: 3.85e-06 [pre_auto_parallel]: 6.174e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.57001e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.0409363, [53] [py_interpret_to_execute]: 4.583e-05 [rewriter_before_opt_a]: 0.00017529 [opt_a]: 0.0382815, [3] [Cycle 1]: 0.0340246, [45] [expand_dump_flag]: 6.24999e-06 [switch_simplify]: 7.979e-05 [loop_unroll]: 6.262e-05 [a_1]: 0.00166233 [with_stream_mark]: 3.911e-05 [recompute_prepare]: 3.319e-05 [updatestate_depend_eliminate]: 1.018e-05 [updatestate_assign_eliminate]: 7.65e-06 [updatestate_loads_eliminate]: 7.66001e-06 [parameter_eliminate]: 3.95e-06 [a_2]: 0.00026548 [accelerated_algorithm]: 3.954e-05 [shard]: 2.61e-06 [meta_shard_fg_expand]: 4.75999e-06 [shard_inline]: 1.78e-05 [merge_send_recv]: 2.078e-05 [auto_parallel]: 1.437e-05 [parallel]: 2.363e-05 [flash_sp]: 1.533e-05 [merge_comm]: 1.138e-05 [allreduce_fusion]: 9.09e-06 [matmul_add_comm_reduction]: 3.649e-05 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 2.472e-05 [virtual_dataset]: 1.665e-05 [get_grad_eliminate_]: 1.635e-05 [virtual_output]: 1.607e-05 [merge_forward]: 1.002e-05 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 2.043e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.428e-05 [merge_recompute_call_nodes]: 2.13002e-06 [before_grad]: 3.261e-05 [set_forward_comm_id_for_comm_node_pass]: 1.256e-05 [meta_fg_expand]: 0.0221329 [flash_sp_send_recv_attached]: 1.045e-05 [receive_attached]: 2.49001e-06 [after_resolve]: 0.00010021 
[a_after_grad]: 0.00011117 [renormalize]: 0.00802962 [add_forward_monad_depend]: 1.237e-05 [auto_monad_grad]: 7.88001e-06 [auto_monad_eliminator]: 5.684e-05 [cse]: 0.00022481 [a_3]: 0.00036799 [Cycle 2]: 0.00346952, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 4.962e-05 [loop_unroll]: 4.412e-05 [a_1]: 0.00150902 [with_stream_mark]: 2.048e-05 [recompute_prepare]: 1.32e-05 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 4.32003e-06 [updatestate_loads_eliminate]: 3.41999e-06 [parameter_eliminate]: 2.59999e-06 [a_2]: 9.732e-05 [accelerated_algorithm]: 1.266e-05 [shard]: 2.09e-06 [meta_shard_fg_expand]: 2.99001e-06 [shard_inline]: 7.33e-06 [merge_send_recv]: 1.077e-05 [auto_parallel]: 1.044e-05 [parallel]: 1.13e-05 [flash_sp]: 5.10999e-06 [merge_comm]: 4.50001e-06 [allreduce_fusion]: 3.88999e-06 [matmul_add_comm_reduction]: 1.081e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 9.94001e-06 [virtual_dataset]: 7.2e-06 [get_grad_eliminate_]: 7.15998e-06 [virtual_output]: 6.17999e-06 [merge_forward]: 5.09003e-06 [cell_reuse_recompute_pass]: 1.55001e-06 [offload_activation]: 1.211e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.609e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 1.313e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 9.921e-05 [flash_sp_send_recv_attached]: 1.92001e-06 [receive_attached]: 2.51e-06 [after_resolve]: 1.606e-05 [a_after_grad]: 1.241e-05 [renormalize]: 0.00101742 [add_forward_monad_depend]: 6.01e-06 [auto_monad_grad]: 2.53003e-06 [auto_monad_eliminator]: 1.775e-05 [cse]: 3.692e-05 [a_3]: 6.039e-05 [Cycle 3]: 0.00076337, [45] [expand_dump_flag]: 2.35002e-06 [switch_simplify]: 9.37999e-06 [loop_unroll]: 9.33002e-06 [a_1]: 0.00016686 [with_stream_mark]: 1.082e-05 [recompute_prepare]: 7.41001e-06 [updatestate_depend_eliminate]: 4.55001e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.34e-06 
[a_2]: 9.017e-05 [accelerated_algorithm]: 1.113e-05 [shard]: 1.30001e-06 [meta_shard_fg_expand]: 2.40002e-06 [shard_inline]: 7.33e-06 [merge_send_recv]: 7.78001e-06 [auto_parallel]: 8.36002e-06 [parallel]: 7.67002e-06 [flash_sp]: 1.24e-06 [merge_comm]: 4.29002e-06 [allreduce_fusion]: 4.28999e-06 [matmul_add_comm_reduction]: 7.36999e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 8.78001e-06 [virtual_dataset]: 6.83e-06 [get_grad_eliminate_]: 6.59999e-06 [virtual_output]: 6.58998e-06 [merge_forward]: 4.04002e-06 [cell_reuse_recompute_pass]: 2.56e-06 [offload_activation]: 9.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.395e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 1.148e-05 [set_forward_comm_id_for_comm_node_pass]: 4.13001e-06 [meta_fg_expand]: 3.22002e-06 [flash_sp_send_recv_attached]: 1.16002e-06 [receive_attached]: 1.49e-06 [after_resolve]: 1.104e-05 [a_after_grad]: 1.049e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.54e-06 [auto_monad_grad]: 1.50999e-06 [auto_monad_eliminator]: 8.57998e-06 [cse]: 1.885e-05 [a_3]: 4.194e-05 [py_interpret_to_execute_after_opt_a]: 1.423e-05 [slice_cell_reuse_recomputed_activation]: 2.04999e-06 [rewriter_after_opt_a]: 4.532e-05 [convert_after_rewriter]: 7.55998e-06 [order_py_execute_after_rewriter]: 5.59e-06 [mutable_eliminate]: 0.0007178 [opt_b]: 0.00024979, [1] [Cycle 1]: 0.00024161, [7] [b_1]: 0.00014203 [b_2]: 2.223e-05 [updatestate_depend_eliminate]: 7.06001e-06 [updatestate_assign_eliminate]: 3.10002e-06 [updatestate_loads_eliminate]: 2.91999e-06 [renormalize]: 7.2e-07 [cse]: 2.512e-05 [optimize_parallel_all_gather_comm]: 1.863e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.989e-05 [loop_unroll]: 0.00053482 [opt_after_cconv]: 0.00012752, [1] [Cycle 1]: 0.00012036, [7] [c_1]: 3.822e-05 [parameter_eliminate]: 3.61999e-06 [updatestate_depend_eliminate]: 7.25e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 3.86999e-06 [cse]: 
2.514e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.82e-05 [tuple_transform]: 9.275e-05, [1] [Cycle 1]: 8.749e-05, [4] [d_1]: 5.615e-05 [none_parameter_eliminate]: 2.14e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 8.59998e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 6.125e-05 [cse_after_recomputation]: 2.93e-05, [1] [Cycle 1]: 2.397e-05, [1] [cse]: 1.769e-05 [environ_conv]: 9.37999e-06 [swap_dp_allreduce_reducescatter]: 7.42998e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 5.74999e-06 [label_fine_grained_interleaved_index]: 3.31999e-06 [merge_cast_opt]: 1.50999e-06 [slice_recompute_activation]: 2.51e-06 [micro_interleaved_order_control]: 2.66999e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 1.14e-06 [remove_cast_before_assign_add]: 1.20999e-06 [full_micro_interleaved_order_control]: 2.56e-06 [reorder_send_recv_between_fp_bp]: 2.96999e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 1.14998e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.565e-05 [grouped_pairwise_exchange_alltoall]: 1.97999e-06 [offloading_packed_experts]: 5.26998e-06 [overlap_recompute_and_grad_model_parallel]: 5.99e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.48002e-06 [overlap_recompute_comm]: 2.96999e-06 [overlap_grad_ring_attention]: 5.00001e-06 [overlap_grad_flash_sp]: 2.577e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 2.08002e-06 [handle_group_info]: 1.42999e-06 [symbol_engine_optimizer]: 9.838e-05, [1] [Cycle 1]: 9.329e-05, [6] [build]: 1.071e-05 [elim_shapecalc]: 1.26e-05 [elim_not_effective]: 1.66e-05 [opt_reshape]: 8.77e-06 [fold_const_symbol]: 1.32e-05 [renormalize]: 2.59985e-07 [detach_backward]: 2.49001e-06 
[pipeline_parallel_scheduler]: 1.84998e-06 [auto_monad_reorder]: 2.257e-05 [get_jit_bprop_graph]: 1.98002e-06 [rewriter_after_jit_bprop_graph]: 4.25999e-06 [opt_after_jit_grad]: 0.00057625 [validate]: 5.303e-05 [backend_pass]: 1.16002e-06 [task_emit]: 0.00767661 [execute]: 8.93002e-06 Sums bootstrap : 0.000477s : 0.44% type_inference : 0.060775s : 55.48% event_method : 0.000055s : 0.05% auto_monad : 0.000154s : 0.14% graph_reusing : 0.000011s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000045s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000062s : 0.06% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000046s : 0.04% optimize.rewriter_before_opt_a : 0.000175s : 0.16% optimize.opt_a.expand_dump_flag : 0.000012s : 0.01% optimize.opt_a.switch_simplify : 0.000139s : 0.13% optimize.opt_a.loop_unroll : 0.000116s : 0.11% optimize.opt_a.a_1 : 0.003338s : 3.05% optimize.opt_a.with_stream_mark : 0.000070s : 0.06% optimize.opt_a.recompute_prepare : 0.000054s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.01% optimize.opt_a.parameter_eliminate : 0.000008s : 0.01% optimize.opt_a.a_2 : 0.000453s : 0.41% optimize.opt_a.accelerated_algorithm : 0.000063s : 0.06% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.01% optimize.opt_a.shard_inline : 0.000032s : 0.03% optimize.opt_a.merge_send_recv : 0.000039s : 0.04% optimize.opt_a.auto_parallel : 0.000033s : 0.03% optimize.opt_a.parallel : 0.000043s : 0.04% optimize.opt_a.flash_sp : 0.000022s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000017s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000055s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.04% optimize.opt_a.virtual_dataset : 0.000031s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.03% optimize.opt_a.virtual_output : 0.000029s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000042s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.02% optimize.opt_a.meta_fg_expand : 0.022235s : 20.30% optimize.opt_a.flash_sp_send_recv_attached : 0.000014s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000127s : 0.12% optimize.opt_a.a_after_grad : 0.000134s : 0.12% optimize.opt_a.renormalize : 0.009047s : 8.26% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.02% optimize.opt_a.auto_monad_grad : 0.000012s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000281s : 0.26% optimize.opt_a.a_3 : 0.000470s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000045s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000718s : 0.66% optimize.opt_b.b_1 : 0.000142s : 0.13% optimize.opt_b.b_2 : 0.000022s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.03% optimize.loop_unroll : 0.000535s : 0.49% optimize.opt_after_cconv.c_1 : 0.000038s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000025s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000018s : 0.02% optimize.tuple_transform.d_1 : 0.000056s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.06% optimize.cse_after_recomputation.cse : 0.000018s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000023s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000576s : 0.53% validate : 0.000053s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.007677s : 7.01% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.001000 159 6.62% : 0.000066s : 7: substitution.arithmetic_simplify 0.28% : 0.000003s : 3: substitution.elim_not_effective 0.56% : 0.000006s : 5: substitution.float_depend_g_call 0.42% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.22% : 0.000002s : 3: substitution.fold_const_symbol 0.80% : 0.000008s : 4: substitution.graph_param_transform 0.37% : 0.000004s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 60.58% : 0.000606s : 17: substitution.inline 3.37% : 0.000034s : 2: substitution.inline_without_move 1.22% : 0.000012s : 15: substitution.j_node_and_user_rematch 2.21% : 0.000022s : 3: substitution.less_batch_normalization 1.27% : 0.000013s : 7: substitution.minmaximum_grad 0.83% : 0.000008s : 5: substitution.partial_eliminate 1.47% : 0.000015s : 15: substitution.remove_not_recompute_node 3.71% : 0.000037s : 10: substitution.replace_applicator 1.34% : 0.000013s : 10: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.78% : 0.000028s : 7: substitution.tuple_list_convert_item_index_to_positive 1.25% : 0.000013s : 7: substitution.tuple_list_get_item_const_eliminator 1.65% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 6.71% : 0.000067s : 18: substitution.tuple_list_get_item_eliminator 1.77% : 0.000018s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.060650 2 96.71% : 0.058655s : 1: type_inference.infer 3.29% : 0.001995s : 1: type_inference.specialize ------[replace.] 0.000252 26 68.00% : 0.000171s : 17: replace.inline 32.00% : 0.000081s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000623 26 95.21% : 0.000593s : 17: match.inline 4.79% : 0.000030s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 4180 1.15% : 0.000009s : 52: predicate.accumulaten_eliminater 0.29% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.06% : 0.000008s : 52: predicate.addn_zero_filter 1.03% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 2.06% : 0.000016s : 73: predicate.arithmetic_simplify 1.15% : 0.000009s : 52: predicate.cast_eliminate 1.05% : 0.000008s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.49% : 0.000004s : 21: predicate.depend_value_elim 1.16% : 0.000009s : 52: predicate.dict_get_item_const_eliminator 1.12% : 0.000009s : 52: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.06% : 0.000000s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 56: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 56: predicate.environ_get_add_eliminate 1.09% : 0.000008s : 56: predicate.environ_get_depend_swap 1.62% : 0.000012s : 77: predicate.environ_get_eliminate 1.10% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.70% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.61% : 0.000020s : 78: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.56% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.05% : 0.000000s : 4: predicate.fold_const_symbol 0.56% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000001s : 4: predicate.graph_param_transform 0.48% : 0.000004s : 21: predicate.incorporate_call 0.42% : 0.000003s : 21: predicate.incorporate_call_switch 5.80% : 0.000044s : 180: predicate.inline 1.55% : 0.000012s : 45: predicate.inline_without_move 0.26% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.59% : 0.000004s : 21: predicate.less_batch_normalization 1.56% : 
0.000012s : 69: predicate.list_to_tuple_eliminator_ 2.50% : 0.000019s : 121: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.37% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.31% : 0.000010s : 60: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 21: predicate.merge_addn 1.04% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 52: predicate.minmaximum_grad 0.37% : 0.000003s : 4: predicate.mutable_eliminate 0.15% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.32% : 0.000018s : 78: predicate.partial_defer_inline 3.96% : 0.000030s : 65: predicate.partial_eliminate 1.06% : 0.000008s : 52: predicate.print_const_string_wrapper 0.53% : 0.000004s : 21: predicate.reduce_all_const_elim 1.39% : 0.000011s : 52: predicate.reduce_eliminate 2.53% : 0.000019s : 121: predicate.redundant_stop_gradient_eliminater 0.28% : 0.000002s : 21: predicate.remove_not_recompute_node 1.79% : 0.000014s : 111: predicate.replace_applicator 0.78% : 0.000006s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000009s : 52: predicate.reshape_eliminate 1.07% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 4: predicate.row_tensor_eliminate 1.33% : 0.000010s : 50: predicate.same_eliminate 0.35% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 21: predicate.shard_identity_eliminate 0.25% : 0.000002s : 8: predicate.special_op_eliminate 0.57% : 0.000004s : 21: predicate.specialize_transform 1.35% : 0.000010s : 50: predicate.split_environ_get_set_with_tuple_value 1.30% : 0.000010s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 78: predicate.switch_defer_inline 2.82% : 0.000021s : 128: predicate.switch_layer_defer_inline 4.81% : 0.000036s : 213: 
predicate.switch_simplify 1.12% : 0.000009s : 52: predicate.tile_eliminate 1.06% : 0.000008s : 52: predicate.transpose_eliminate 1.41% : 0.000011s : 60: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 60: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 60: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 90: predicate.tuple_list_get_item_eliminator 1.39% : 0.000011s : 60: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 81: predicate.tuple_list_set_item_eliminator 1.50% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.42% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 2.96% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.53% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.48% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002288 35 55.56% : 0.001271s : 14: func_graph_cloner_run.FuncGraphClonerGraph 44.44% : 0.001017s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.174333 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.31% : 0.004034s : 1: add_attr 2.31% : 0.004019s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000066s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.09% : 0.000164s : 1: auto_monad 0.02% : 0.000027s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.29% : 0.000501s : 1: bootstrap 0.02% : 0.000033s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000019s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000033s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000065s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000015s : 1: graph_reusing 0.00% : 0.000006s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.31% : 0.000546s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.42% : 0.000729s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.87% : 0.005003s : 117: opt.transform.opt_a 0.02% : 0.000036s : 1: opt.transform.opt_after_cconv 0.02% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000119s : 28: opt.transform.opt_b 0.04% : 0.000062s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000047s : 4: opt.transform.symbol_engine_opt 21.96% : 0.038285s : 1: opt_a 0.08% : 0.000131s : 1: opt_after_cconv 0.34% : 0.000588s : 1: opt_after_jit_grad 0.15% : 0.000254s : 1: opt_b 23.49% : 0.040942s : 1: optimize 0.01% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000067s : 1: pre_auto_parallel 0.03% : 0.000051s : 1: py_interpret_to_execute 0.01% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000023s : 1: remove_dup_value 3.94% : 0.006863s : 2: renormalize.infer 1.24% : 0.002162s : 2: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000049s : 1: rewriter_after_opt_a 0.10% : 0.000183s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 4.42% : 0.007697s : 1: task_emit 0.06% : 0.000096s : 1: 
tuple_transform 34.88% : 0.060808s : 1: type_inference 0.05% : 0.000088s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x1-kbk],max_mem:4.0M .... TotalTime = 31.4279, [24] [bootstrap]: 0.00075062 [type_inference]: 0.0456515 [event_method]: 1.715e-05 [auto_monad]: 6.686e-05 [graph_reusing]: 6.26e-06 [inline]: 3.18e-06 [add_attr]: 0.046059, [1] [add_attr_with_inline]: 0.046041, [1] [Cycle 1]: 8.056e-05, [2] [tag_attr]: 2.469e-05 [meta_addattr_fg_expand]: 5.20999e-06 [parallel-infer-symbol]: 4.35999e-06 [pre_auto_parallel]: 3.973e-05 [insert-virtual-dataset]: 2.88e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.68e-06 [pipeline_split]: 2.22999e-06 [optimize]: 0.00633532, [53] [py_interpret_to_execute]: 3.533e-05 [rewriter_before_opt_a]: 8.683e-05 [opt_a]: 0.00333673, [2] [Cycle 1]: 0.00245233, [45] [expand_dump_flag]: 3.06999e-06 [switch_simplify]: 3.62e-05 [loop_unroll]: 2.159e-05 [a_1]: 0.00057052 [with_stream_mark]: 2.516e-05 [recompute_prepare]: 1.567e-05 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 3.82998e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 2.71e-06 [a_2]: 9.252e-05 [accelerated_algorithm]: 1.049e-05 [shard]: 2.99001e-06 [meta_shard_fg_expand]: 2.29999e-06 [shard_inline]: 7.53e-06 [merge_send_recv]: 1.167e-05 [auto_parallel]: 1.213e-05 [parallel]: 3.306e-05 [flash_sp]: 1.26e-05 [merge_comm]: 5.10999e-06 [allreduce_fusion]: 4.43001e-06 [matmul_add_comm_reduction]: 1.23e-05 [allreduce_slice_to_reducescatter]: 9.29984e-07 [virtual_shard_identity]: 1.581e-05 [virtual_dataset]: 7.33e-06 [get_grad_eliminate_]: 6.52001e-06 [virtual_output]: 7e-06 [merge_forward]: 5.42999e-06 [cell_reuse_recompute_pass]: 1.74e-06 [offload_activation]: 1.128e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.993e-05 [merge_recompute_call_nodes]: 2.01003e-06 
[before_grad]: 1.347e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30999e-06 [meta_fg_expand]: 3.22002e-06 [flash_sp_send_recv_attached]: 3.04001e-06 [receive_attached]: 2.68e-06 [after_resolve]: 1.499e-05 [a_after_grad]: 1.18e-05 [renormalize]: 0.00093101 [add_forward_monad_depend]: 1.717e-05 [auto_monad_grad]: 3.09999e-06 [auto_monad_eliminator]: 2.265e-05 [cse]: 3.79e-05 [a_3]: 5.493e-05 [Cycle 2]: 0.0008684, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 8.08999e-06 [loop_unroll]: 6.12999e-06 [a_1]: 0.00013884 [with_stream_mark]: 2.28e-05 [recompute_prepare]: 8.65001e-06 [updatestate_depend_eliminate]: 4.61002e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 4.64998e-06 [parameter_eliminate]: 2.09999e-06 [a_2]: 7.841e-05 [accelerated_algorithm]: 7.58001e-06 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 2.39999e-06 [shard_inline]: 6.35002e-06 [merge_send_recv]: 1.255e-05 [auto_parallel]: 1.119e-05 [parallel]: 8.92999e-06 [flash_sp]: 5.19e-06 [merge_comm]: 5.02e-06 [allreduce_fusion]: 4.01001e-06 [matmul_add_comm_reduction]: 1.199e-05 [allreduce_slice_to_reducescatter]: 7.99977e-07 [virtual_shard_identity]: 1.242e-05 [virtual_dataset]: 6.72002e-06 [get_grad_eliminate_]: 7.1e-06 [virtual_output]: 5.52001e-06 [merge_forward]: 4.85001e-06 [cell_reuse_recompute_pass]: 2.44999e-06 [offload_activation]: 1.391e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.828e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 1.097e-05 [set_forward_comm_id_for_comm_node_pass]: 5.81e-06 [meta_fg_expand]: 2.73e-06 [flash_sp_send_recv_attached]: 1.45999e-06 [receive_attached]: 2.34999e-06 [after_resolve]: 1.373e-05 [a_after_grad]: 1.046e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 3.56999e-06 [auto_monad_grad]: 2.86e-06 [auto_monad_eliminator]: 1.567e-05 [cse]: 4.22e-05 [a_3]: 4.209e-05 [py_interpret_to_execute_after_opt_a]: 1.763e-05 [slice_cell_reuse_recomputed_activation]: 2.28002e-06 [rewriter_after_opt_a]: 5.159e-05 
[convert_after_rewriter]: 8.53001e-06 [order_py_execute_after_rewriter]: 5.92999e-06 [mutable_eliminate]: 0.00089777 [opt_b]: 0.00025236, [1] [Cycle 1]: 0.00024093, [7] [b_1]: 0.00012594 [b_2]: 9.87001e-06 [updatestate_depend_eliminate]: 1.271e-05 [updatestate_assign_eliminate]: 3.56001e-06 [updatestate_loads_eliminate]: 3.24001e-06 [renormalize]: 1.29e-06 [cse]: 3.788e-05 [optimize_parallel_all_gather_comm]: 2.702e-05 [overlap_param_gather]: 2.39001e-06 [cconv]: 4.129e-05 [loop_unroll]: 0.00064036 [opt_after_cconv]: 0.00012956, [1] [Cycle 1]: 0.00012073, [7] [c_1]: 3.15e-05 [parameter_eliminate]: 4.90001e-06 [updatestate_depend_eliminate]: 8.99998e-06 [updatestate_assign_eliminate]: 3.11001e-06 [updatestate_loads_eliminate]: 2.96999e-06 [cse]: 2.962e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.826e-05 [tuple_transform]: 8.608e-05, [1] [Cycle 1]: 7.971e-05, [4] [d_1]: 4.593e-05 [none_parameter_eliminate]: 2.21998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 8.94e-06 [partial_unused_args_eliminate]: 2.19999e-06 [add_recomputation]: 6.599e-05 [cse_after_recomputation]: 3.157e-05, [1] [Cycle 1]: 2.452e-05, [1] [cse]: 1.557e-05 [environ_conv]: 1.094e-05 [swap_dp_allreduce_reducescatter]: 6.29999e-06 [bias_add_comm_swap]: 3.98001e-06 [label_micro_interleaved_index]: 7.85e-06 [label_fine_grained_interleaved_index]: 3.39001e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.81e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 9.5999e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.53e-06 [reorder_send_recv_between_fp_bp]: 3.46999e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.39998e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.37999e-06 [overlap_opt_shard_in_pipeline]: 1.41002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.04e-06 [control_data_broadcast_order]: 1.862e-05 [grouped_pairwise_exchange_alltoall]: 
1.89999e-06 [offloading_packed_experts]: 5.71e-06 [overlap_recompute_and_grad_model_parallel]: 5.57001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 5.91e-06 [overlap_grad_flash_sp]: 2.44e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.86e-06 [split_layernorm_comm]: 2.16e-06 [handle_group_info]: 1.20999e-06 [symbol_engine_optimizer]: 0.00015817, [1] [Cycle 1]: 0.00015133, [6] [build]: 5.37001e-06 [elim_shapecalc]: 1.905e-05 [elim_not_effective]: 1.605e-05 [opt_reshape]: 7.97e-06 [fold_const_symbol]: 1.058e-05 [renormalize]: 5.8001e-07 [detach_backward]: 3.04001e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 2.479e-05 [get_jit_bprop_graph]: 2.39999e-06 [rewriter_after_jit_bprop_graph]: 8.22998e-06 [opt_after_jit_grad]: 0.0008277 [validate]: 5.125e-05 [backend_pass]: 9.89996e-07 [task_emit]: 31.3278 [execute]: 1.033e-05 Sums bootstrap : 0.000751s : 0.00% type_inference : 0.045652s : 0.15% event_method : 0.000017s : 0.00% auto_monad : 0.000067s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000025s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000040s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.00% optimize.rewriter_before_opt_a : 0.000087s : 0.00% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000044s : 0.00% optimize.opt_a.loop_unroll : 0.000028s : 0.00% optimize.opt_a.a_1 : 0.000709s : 0.00% optimize.opt_a.with_stream_mark : 0.000048s : 0.00% optimize.opt_a.recompute_prepare : 0.000024s : 0.00% 
optimize.opt_a.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000008s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000171s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000018s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00% optimize.opt_a.shard_inline : 0.000014s : 0.00% optimize.opt_a.merge_send_recv : 0.000024s : 0.00% optimize.opt_a.auto_parallel : 0.000023s : 0.00% optimize.opt_a.parallel : 0.000042s : 0.00% optimize.opt_a.flash_sp : 0.000018s : 0.00% optimize.opt_a.merge_comm : 0.000010s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000024s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000028s : 0.00% optimize.opt_a.virtual_dataset : 0.000014s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000014s : 0.00% optimize.opt_a.virtual_output : 0.000013s : 0.00% optimize.opt_a.merge_forward : 0.000010s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000025s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000038s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000024s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000011s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000029s : 0.00% optimize.opt_a.a_after_grad : 0.000022s : 0.00% optimize.opt_a.renormalize : 0.000931s : 0.00% optimize.opt_a.add_forward_monad_depend : 0.000021s : 0.00% optimize.opt_a.auto_monad_grad : 0.000006s : 0.00% optimize.opt_a.auto_monad_eliminator : 
0.000038s : 0.00% optimize.opt_a.cse : 0.000080s : 0.00% optimize.opt_a.a_3 : 0.000097s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000018s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000052s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000898s : 0.00% optimize.opt_b.b_1 : 0.000126s : 0.00% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000013s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000038s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000027s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000041s : 0.00% optimize.loop_unroll : 0.000640s : 0.00% optimize.opt_after_cconv.c_1 : 0.000032s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000018s : 0.00% optimize.tuple_transform.d_1 : 0.000046s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000066s : 0.00% optimize.cse_after_recomputation.cse : 0.000016s : 0.00% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% 
optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000019s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% 
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000001s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000008s : 0.00% opt_after_jit_grad : 0.000828s : 0.00% validate : 0.000051s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 31.327758s : 99.83% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000262 26 20.13% : 0.000053s : 5: substitution.arithmetic_simplify 0.89% : 0.000002s : 2: substitution.elim_not_effective 0.54% : 0.000001s : 2: substitution.fold_const_symbol 2.70% : 0.000007s : 3: substitution.graph_param_transform 64.47% : 0.000169s : 3: substitution.inline 1.82% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000007s : 4: substitution.remove_not_recompute_node 2.56% : 0.000007s : 2: substitution.replace_old_param 4.38% : 0.000012s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.045566 2 98.21% : 0.044752s : 1: type_inference.infer 1.79% : 0.000814s : 1: type_inference.specialize ------[replace.] 0.000049 4 80.21% : 0.000039s : 3: replace.inline 19.79% : 0.000010s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000177 4 93.96% : 0.000166s : 3: match.inline 6.04% : 0.000011s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000198 883 0.81% : 0.000002s : 9: predicate.accumulaten_eliminater 0.96% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.46% : 0.000001s : 6: predicate.addn_check_dump 0.79% : 0.000002s : 9: predicate.addn_zero_filter 0.79% : 0.000002s : 9: predicate.adjust_all_reduce_mul_add 2.15% : 0.000004s : 15: predicate.arithmetic_simplify 1.20% : 0.000002s : 9: predicate.cast_eliminate 0.54% : 0.000001s : 6: predicate.check_bprop_eliminate 0.54% : 0.000001s : 6: predicate.compare_switch_simplify 0.17% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.80% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 1.14% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 3: predicate.elim_not_effective 0.62% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.04% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.00% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 12: predicate.environ_get_depend_swap 1.93% : 0.000004s : 18: predicate.environ_get_eliminate 0.98% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.06% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 13: predicate.float_depend_g_call 0.52% : 0.000001s : 6: predicate.float_environ_get_switch 0.79% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.85% : 0.000002s : 6: predicate.get_grad_eliminate 0.43% : 0.000001s : 3: predicate.graph_param_transform 0.57% : 0.000001s : 6: predicate.incorporate_call 0.51% : 0.000001s : 6: predicate.incorporate_call_switch 5.82% : 0.000012s : 40: predicate.inline 1.27% : 0.000003s : 6: predicate.inline_without_move 0.32% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 6: predicate.less_batch_normalization 1.57% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000004s : 25: predicate.load_eliminater 1.55% : 0.000003s : 3: predicate.loop_unroll_after_grad 1.85% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.52% : 0.000001s : 6: predicate.merge_addn 0.48% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 2.93% : 0.000006s : 3: predicate.mutable_eliminate 0.45% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 2.06% : 0.000004s : 13: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000002s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.01% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 6: predicate.remove_not_recompute_node 1.13% : 0.000002s : 16: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.39% : 0.000001s : 3: predicate.reset_defer_inline 0.99% : 0.000002s : 9: predicate.reshape_eliminate 0.70% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.35% : 0.000001s : 3: predicate.row_tensor_eliminate 1.02% : 0.000002s : 6: predicate.same_eliminate 0.73% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.28% : 0.000003s : 6: predicate.shard_identity_eliminate 0.76% : 0.000002s : 6: predicate.special_op_eliminate 0.66% : 0.000001s : 6: predicate.specialize_transform 1.15% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.69% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.30% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.19% : 0.000002s : 13: predicate.switch_defer_inline 1.84% : 0.000004s : 19: predicate.switch_layer_defer_inline 4.38% : 0.000009s : 43: predicate.switch_simplify 1.00% : 
0.000002s : 9: predicate.tile_eliminate 0.87% : 0.000002s : 9: predicate.transpose_eliminate 1.72% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.71% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.47% : 0.000007s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.46% : 0.000005s : 21: predicate.tuple_list_set_item_eliminator 1.52% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 1.98% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.71% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 3: predicate.value_based_eliminate 0.60% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.72% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000584 8 45.16% : 0.000264s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.84% : 0.000321s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
31.482564 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.15% : 0.046066s : 1: add_attr 0.15% : 0.046045s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000074s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.00% : 0.000072s : 1: auto_monad 0.00% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000008s : 1: bias_add_comm_swap 0.00% : 0.000780s : 1: bootstrap 0.00% : 0.000047s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000024s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000015s : 1: environ_conv 0.00% : 0.000024s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000006s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000005s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000007s : 1: label_fine_grained_interleaved_index 0.00% : 0.000011s : 1: label_micro_interleaved_index 0.00% : 0.000660s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.00% : 0.000920s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000030s : 1: opt.transform.mutable_eliminate 0.00% : 0.001167s : 78: opt.transform.opt_a 0.00% : 0.000029s : 1: opt.transform.opt_after_cconv 0.00% : 0.000031s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000100s : 28: opt.transform.opt_b 0.00% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000048s : 4: opt.transform.symbol_engine_opt 0.01% : 0.003341s : 1: opt_a 0.00% : 0.000133s : 1: opt_after_cconv 0.00% : 0.000844s : 1: opt_after_jit_grad 0.00% : 0.000257s : 1: opt_b 0.02% : 0.006342s : 1: optimize 0.00% : 0.000031s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000010s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000010s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000044s : 1: pre_auto_parallel 0.00% : 0.000041s : 1: py_interpret_to_execute 0.00% : 0.000022s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000023s : 1: remove_dup_value 0.00% : 0.000479s : 1: renormalize.infer 0.00% : 0.000439s : 1: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000012s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000060s : 1: rewriter_after_opt_a 0.00% : 0.000093s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000161s : 1: symbol_engine_optimizer 99.51% : 31.327785s : 1: task_emit 0.00% : 0.000089s : 1: 
tuple_transform 0.15% : 0.045678s : 1: type_inference 0.00% : 0.000088s : 1: validate TotalTime = 0.723922, [24] [bootstrap]: 0.00043381 [type_inference]: 0.00676298 [event_method]: 1.458e-05 [auto_monad]: 6.404e-05 [graph_reusing]: 5.72001e-06 [inline]: 2.41e-06 [add_attr]: 0.00344574, [1] [add_attr_with_inline]: 0.00343419, [1] [Cycle 1]: 6.358e-05, [2] [tag_attr]: 1.679e-05 [meta_addattr_fg_expand]: 4.63999e-06 [parallel-infer-symbol]: 3.73999e-06 [pre_auto_parallel]: 3.272e-05 [insert-virtual-dataset]: 3.16001e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.94e-06 [optimize]: 0.00527906, [53] [py_interpret_to_execute]: 2.661e-05 [rewriter_before_opt_a]: 6.194e-05 [opt_a]: 0.0027453, [2] [Cycle 1]: 0.00203563, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 3.126e-05 [loop_unroll]: 1.813e-05 [a_1]: 0.00039931 [with_stream_mark]: 1.791e-05 [recompute_prepare]: 8.53001e-06 [updatestate_depend_eliminate]: 4.27998e-06 [updatestate_assign_eliminate]: 3.58999e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 8.572e-05 [accelerated_algorithm]: 7.13998e-06 [shard]: 2.67001e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 6.63e-06 [merge_send_recv]: 8.89e-06 [auto_parallel]: 7.5e-06 [parallel]: 2.117e-05 [flash_sp]: 9.36998e-06 [merge_comm]: 4.35999e-06 [allreduce_fusion]: 3.93999e-06 [matmul_add_comm_reduction]: 1.023e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 8.82999e-06 [virtual_dataset]: 6.29001e-06 [get_grad_eliminate_]: 5.92999e-06 [virtual_output]: 6.35002e-06 [merge_forward]: 4.07e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 1.037e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.366e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 1.223e-05 [set_forward_comm_id_for_comm_node_pass]: 4.03001e-06 [meta_fg_expand]: 3.18998e-06 [flash_sp_send_recv_attached]: 0.00020023 [receive_attached]: 
2.34999e-06 [after_resolve]: 1.63e-05 [a_after_grad]: 1.136e-05 [renormalize]: 0.00067022 [add_forward_monad_depend]: 6.71e-06 [auto_monad_grad]: 2.76e-06 [auto_monad_eliminator]: 1.828e-05 [cse]: 3.313e-05 [a_3]: 6.672e-05 [Cycle 2]: 0.00069666, [45] [expand_dump_flag]: 2.19001e-06 [switch_simplify]: 7.63999e-06 [loop_unroll]: 5.97001e-06 [a_1]: 0.00012863 [with_stream_mark]: 1.767e-05 [recompute_prepare]: 6.59999e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 3.36001e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 7.764e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 1.72001e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 6.25002e-06 [merge_send_recv]: 7.13e-06 [auto_parallel]: 8.79e-06 [parallel]: 7.9e-06 [flash_sp]: 4.23999e-06 [merge_comm]: 4.06001e-06 [allreduce_fusion]: 3.88001e-06 [matmul_add_comm_reduction]: 8.18999e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 8.2e-06 [virtual_dataset]: 5.91998e-06 [get_grad_eliminate_]: 6.26998e-06 [virtual_output]: 5.78002e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 2.20002e-06 [offload_activation]: 9.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.171e-05 [merge_recompute_call_nodes]: 1.10999e-06 [before_grad]: 9.82001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.50999e-06 [meta_fg_expand]: 2.38998e-06 [flash_sp_send_recv_attached]: 1.12e-06 [receive_attached]: 1.67999e-06 [after_resolve]: 1.154e-05 [a_after_grad]: 8.16002e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.53002e-06 [auto_monad_grad]: 2.44999e-06 [auto_monad_eliminator]: 9.37999e-06 [cse]: 1.984e-05 [a_3]: 3.505e-05 [py_interpret_to_execute_after_opt_a]: 1.595e-05 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 4.452e-05 [convert_after_rewriter]: 6.98e-06 [order_py_execute_after_rewriter]: 5.75001e-06 [mutable_eliminate]: 0.00075288 [opt_b]: 0.00022633, [1] [Cycle 1]: 0.00021762, 
[7] [b_1]: 0.00012273 [b_2]: 1.03e-05 [updatestate_depend_eliminate]: 8.33999e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 2.91e-06 [renormalize]: 4.09986e-07 [cse]: 2.988e-05 [optimize_parallel_all_gather_comm]: 1.947e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 3.675e-05 [loop_unroll]: 0.00056487 [opt_after_cconv]: 0.00012001, [1] [Cycle 1]: 0.00011253, [7] [c_1]: 2.998e-05 [parameter_eliminate]: 5.81e-06 [updatestate_depend_eliminate]: 7.21001e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 3.88001e-06 [cse]: 2.577e-05 [renormalize]: 7.09988e-07 [remove_dup_value]: 1.811e-05 [tuple_transform]: 8.413e-05, [1] [Cycle 1]: 7.809e-05, [4] [d_1]: 4.708e-05 [none_parameter_eliminate]: 1.96e-06 [renormalize]: 3.19997e-07 [switch_simplify]: 7.97003e-06 [partial_unused_args_eliminate]: 2.16e-06 [add_recomputation]: 5.74e-05 [cse_after_recomputation]: 2.638e-05, [1] [Cycle 1]: 2.066e-05, [1] [cse]: 1.403e-05 [environ_conv]: 8.42998e-06 [swap_dp_allreduce_reducescatter]: 5.66003e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 6.06e-06 [label_fine_grained_interleaved_index]: 2.96999e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 1.19e-06 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.59999e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.40001e-06 [interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.22e-06 [overlap_opt_shard_in_pipeline]: 1.57999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.539e-05 [grouped_pairwise_exchange_alltoall]: 1.75001e-06 [offloading_packed_experts]: 4.60001e-06 [overlap_recompute_and_grad_model_parallel]: 5.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.69998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.28002e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.80001e-06 [overlap_grad_flash_sp]: 2.292e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.46e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.14998e-06 [symbol_engine_optimizer]: 8.591e-05, [1] [Cycle 1]: 8.1e-05, [6] [build]: 4.22e-06 [elim_shapecalc]: 1.165e-05 [elim_not_effective]: 1.42e-05 [opt_reshape]: 7.42998e-06 [fold_const_symbol]: 1.088e-05 [renormalize]: 2.79979e-07 [detach_backward]: 3.11001e-06 [pipeline_parallel_scheduler]: 2.39999e-06 [auto_monad_reorder]: 1.943e-05 [get_jit_bprop_graph]: 2.37999e-06 [rewriter_after_jit_bprop_graph]: 5.98998e-06 [opt_after_jit_grad]: 0.00067491 [validate]: 0.0001009 [backend_pass]: 1.14e-06 [task_emit]: 0.706767 [execute]: 1.063e-05 Sums bootstrap : 0.000434s : 0.06% type_inference : 0.006763s : 0.94% event_method : 0.000015s : 0.00% auto_monad : 0.000064s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000033s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000027s : 0.00% optimize.rewriter_before_opt_a : 0.000062s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.01% optimize.opt_a.loop_unroll : 0.000024s : 0.00% optimize.opt_a.a_1 : 0.000528s : 0.07% optimize.opt_a.with_stream_mark : 0.000036s : 0.00% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000163s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000016s : 0.00% optimize.opt_a.auto_parallel : 0.000016s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000014s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000022s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000201s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000028s : 0.00% optimize.opt_a.a_after_grad : 0.000020s : 0.00% optimize.opt_a.renormalize : 0.000670s : 0.09% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.00% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000028s : 0.00% optimize.opt_a.cse : 0.000053s : 0.01% optimize.opt_a.a_3 : 0.000102s : 0.01% 
optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000045s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000753s : 0.10% optimize.opt_b.b_1 : 0.000123s : 0.02% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000037s : 0.01% optimize.loop_unroll : 0.000565s : 0.08% optimize.opt_after_cconv.c_1 : 0.000030s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000026s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.00% optimize.tuple_transform.d_1 : 0.000047s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000014s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000019s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000675s : 0.09% validate : 0.000101s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.706767s : 98.26% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000183 24 21.77% : 0.000040s : 4: substitution.arithmetic_simplify 1.19% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000002s : 2: substitution.fold_const_symbol 3.82% : 0.000007s : 3: substitution.graph_param_transform 64.62% : 0.000118s : 3: substitution.inline 2.18% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.86% : 0.000005s : 4: substitution.remove_not_recompute_node 2.49% : 0.000005s : 2: substitution.replace_old_param ------[type_inference.] 0.006709 2 91.70% : 0.006152s : 1: type_inference.infer 8.30% : 0.000557s : 1: type_inference.specialize ------[replace.] 0.000033 3 100.00% : 0.000033s : 3: replace.inline ------[match.] 0.000116 3 100.00% : 0.000116s : 3: match.inline ------[predicate.] 
0.000187 815 0.75% : 0.000001s : 8: predicate.accumulaten_eliminater 1.27% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.79% : 0.000001s : 8: predicate.addn_zero_filter 0.64% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 14: predicate.arithmetic_simplify 0.78% : 0.000001s : 8: predicate.cast_eliminate 9.47% : 0.000018s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.68% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.78% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.64% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.30% : 0.000001s : 3: predicate.elim_not_effective 0.49% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 0.97% : 0.000002s : 11: predicate.environ_add_const_eliminate 0.93% : 0.000002s : 11: predicate.environ_get_add_eliminate 0.98% : 0.000002s : 11: predicate.environ_get_depend_swap 1.87% : 0.000004s : 17: predicate.environ_get_eliminate 0.89% : 0.000002s : 11: predicate.environ_get_set_eliminate 0.94% : 0.000002s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 1.05% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.54% : 0.000001s : 6: predicate.incorporate_call 0.50% : 0.000001s : 6: predicate.incorporate_call_switch 5.63% : 0.000011s : 37: predicate.inline 0.89% : 0.000002s : 6: predicate.inline_without_move 0.34% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.78% : 0.000001s : 6: predicate.less_batch_normalization 1.41% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000004s : 22: predicate.load_eliminater 1.06% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.60% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.51% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.65% : 0.000001s : 8: predicate.minmaximum_grad 1.51% : 0.000003s : 3: predicate.mutable_eliminate 0.45% : 0.000001s : 3: predicate.opt_reshape 0.45% : 0.000001s : 3: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.10% : 0.000002s : 11: predicate.partial_eliminate 0.91% : 0.000002s : 8: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.04% : 0.000002s : 8: predicate.reduce_eliminate 2.01% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.58% : 0.000001s : 6: predicate.remove_not_recompute_node 1.02% : 0.000002s : 14: predicate.replace_applicator 0.53% : 0.000001s : 6: predicate.replace_old_param 0.42% : 0.000001s : 3: predicate.reset_defer_inline 0.85% : 0.000002s : 8: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.52% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000002s : 6: predicate.same_eliminate 0.56% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.03% : 0.000002s : 6: predicate.shard_identity_eliminate 0.82% : 0.000002s : 6: predicate.special_op_eliminate 0.70% : 0.000001s : 6: predicate.specialize_transform 1.00% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.02% : 0.000002s : 11: predicate.switch_defer_inline 1.58% : 0.000003s : 17: predicate.switch_layer_defer_inline 3.98% : 0.000007s : 38: predicate.switch_simplify 0.73% : 
0.000001s : 8: predicate.tile_eliminate 0.72% : 0.000001s : 8: predicate.transpose_eliminate 1.84% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.40% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000003s : 14: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator 1.31% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.32% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 1.79% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.77% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.64% : 0.000001s : 6: predicate.virtual_output_eliminate 0.44% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000402 7 36.07% : 0.000145s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.93% : 0.000257s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.734444 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.47% : 0.003452s : 1: add_attr 0.47% : 0.003439s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000062s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000069s : 1: auto_monad 0.00% : 0.000024s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.06% : 0.000467s : 1: bootstrap 0.01% : 0.000041s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.00% : 0.000019s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000030s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000021s : 1: event_method 0.00% : 0.000020s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.08% : 0.000577s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.10% : 0.000769s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000019s : 1: opt.transform.mutable_eliminate 0.13% : 0.000951s : 78: opt.transform.opt_a 0.00% : 0.000028s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000100s : 28: opt.transform.opt_b 0.01% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000040s : 4: opt.transform.symbol_engine_opt 0.37% : 0.002749s : 1: opt_a 0.02% : 0.000123s : 1: opt_after_cconv 0.09% : 0.000694s : 1: opt_after_jit_grad 0.03% : 0.000230s : 1: opt_b 0.72% : 0.005285s : 1: optimize 0.00% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000038s : 1: pre_auto_parallel 0.00% : 0.000031s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000023s : 1: remove_dup_value 0.05% : 0.000338s : 1: renormalize.infer 0.04% : 0.000324s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000049s : 1: rewriter_after_opt_a 0.01% : 0.000067s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000089s : 1: symbol_engine_optimizer 96.24% : 0.706798s : 1: task_emit 0.01% : 0.000087s : 1: 
tuple_transform 0.92% : 0.006790s : 1: type_inference 0.02% : 0.000141s : 1: validate TotalTime = 0.628409, [24] [bootstrap]: 0.00042099 [type_inference]: 0.0189962 [event_method]: 1.637e-05 [auto_monad]: 6.666e-05 [graph_reusing]: 6.49001e-06 [inline]: 2.70002e-06 [add_attr]: 0.00377574, [1] [add_attr_with_inline]: 0.00376391, [1] [Cycle 1]: 6.862e-05, [2] [tag_attr]: 2.046e-05 [meta_addattr_fg_expand]: 5.19e-06 [parallel-infer-symbol]: 3.85e-06 [pre_auto_parallel]: 3.231e-05 [insert-virtual-dataset]: 2.93e-06 [parallel-infer-symbol-second]: 9.89996e-07 [dataset_repeat_opt]: 2.36e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.0173076, [53] [py_interpret_to_execute]: 2.465e-05 [rewriter_before_opt_a]: 7.302e-05 [opt_a]: 0.00269199, [2] [Cycle 1]: 0.00201393, [45] [expand_dump_flag]: 3.51001e-06 [switch_simplify]: 3.627e-05 [loop_unroll]: 2.148e-05 [a_1]: 0.00049271 [with_stream_mark]: 1.842e-05 [recompute_prepare]: 9.31998e-06 [updatestate_depend_eliminate]: 4.32998e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 3.2e-06 [parameter_eliminate]: 2.22001e-06 [a_2]: 8.44e-05 [accelerated_algorithm]: 6.95002e-06 [shard]: 2.56e-06 [meta_shard_fg_expand]: 1.94999e-06 [shard_inline]: 6.25002e-06 [merge_send_recv]: 8.77999e-06 [auto_parallel]: 7.73001e-06 [parallel]: 2.028e-05 [flash_sp]: 9.10001e-06 [merge_comm]: 4.26001e-06 [allreduce_fusion]: 3.6e-06 [matmul_add_comm_reduction]: 1.016e-05 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 9.01998e-06 [virtual_dataset]: 2.508e-05 [get_grad_eliminate_]: 7.21999e-06 [virtual_output]: 6.74999e-06 [merge_forward]: 4.48001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 1.028e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.329e-05 [merge_recompute_call_nodes]: 1.95001e-06 [before_grad]: 1.054e-05 [set_forward_comm_id_for_comm_node_pass]: 4.55001e-06 [meta_fg_expand]: 3.64002e-06 [flash_sp_send_recv_attached]: 3.16001e-06 [receive_attached]: 2.51998e-06 
[after_resolve]: 1.065e-05 [a_after_grad]: 9.19e-06 [renormalize]: 0.00075642 [add_forward_monad_depend]: 8.23999e-06 [auto_monad_grad]: 2.61e-06 [auto_monad_eliminator]: 1.56e-05 [cse]: 3.513e-05 [a_3]: 4.798e-05 [Cycle 2]: 0.00066424, [45] [expand_dump_flag]: 1.99e-06 [switch_simplify]: 8e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.00012333 [with_stream_mark]: 1.47e-05 [recompute_prepare]: 6.48e-06 [updatestate_depend_eliminate]: 3.50998e-06 [updatestate_assign_eliminate]: 2.76999e-06 [updatestate_loads_eliminate]: 3.13998e-06 [parameter_eliminate]: 1.15001e-06 [a_2]: 7.247e-05 [accelerated_algorithm]: 5.94e-06 [shard]: 1.29998e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 6.19999e-06 [merge_send_recv]: 6.58e-06 [auto_parallel]: 8.12e-06 [parallel]: 6.58e-06 [flash_sp]: 4.07003e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.58999e-06 [matmul_add_comm_reduction]: 6.29001e-06 [allreduce_slice_to_reducescatter]: 6.99976e-07 [virtual_shard_identity]: 6.91999e-06 [virtual_dataset]: 5.76998e-06 [get_grad_eliminate_]: 5.69999e-06 [virtual_output]: 5.40001e-06 [merge_forward]: 3.14999e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 7.67002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.131e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.002e-05 [set_forward_comm_id_for_comm_node_pass]: 4.34002e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 1.39998e-06 [receive_attached]: 2.07001e-06 [after_resolve]: 1.127e-05 [a_after_grad]: 8.67e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.69e-06 [auto_monad_grad]: 1.06002e-06 [auto_monad_eliminator]: 7.10998e-06 [cse]: 1.722e-05 [a_3]: 3.382e-05 [py_interpret_to_execute_after_opt_a]: 1.082e-05 [slice_cell_reuse_recomputed_activation]: 2.28002e-06 [rewriter_after_opt_a]: 3.993e-05 [convert_after_rewriter]: 7.83999e-06 [order_py_execute_after_rewriter]: 5.61e-06 [mutable_eliminate]: 0.00069092 [opt_b]: 0.00021274, [1] [Cycle 1]: 0.00020369, [7] [b_1]: 
0.00011818 [b_2]: 8.25e-06 [updatestate_depend_eliminate]: 9.39998e-06 [updatestate_assign_eliminate]: 2.78e-06 [updatestate_loads_eliminate]: 2.49999e-06 [renormalize]: 6.19999e-07 [cse]: 2.289e-05 [optimize_parallel_all_gather_comm]: 1.894e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 3.134e-05 [loop_unroll]: 0.00050275 [opt_after_cconv]: 0.00010873, [1] [Cycle 1]: 0.00010073, [7] [c_1]: 2.841e-05 [parameter_eliminate]: 4.13001e-06 [updatestate_depend_eliminate]: 6.09001e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 2.025e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.913e-05 [tuple_transform]: 7.948e-05, [1] [Cycle 1]: 7.429e-05, [4] [d_1]: 4.388e-05 [none_parameter_eliminate]: 2.10002e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 7.33999e-06 [partial_unused_args_eliminate]: 2.40002e-06 [add_recomputation]: 5.316e-05 [cse_after_recomputation]: 2.501e-05, [1] [Cycle 1]: 1.933e-05, [1] [cse]: 1.348e-05 [environ_conv]: 7.6e-06 [swap_dp_allreduce_reducescatter]: 5.64e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 6.07001e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.79001e-06 [micro_interleaved_order_control]: 2.74999e-06 [assign_add_opt]: 1.65001e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 3.25e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.32e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.466e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 5.34998e-06 [overlap_recompute_and_grad_model_parallel]: 5.61e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.53003e-06 [overlap_grad_ring_attention]: 4.77998e-06 [overlap_grad_flash_sp]: 2.194e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.52001e-06 [split_layernorm_comm]: 2.13998e-06 [handle_group_info]: 1.14998e-06 [symbol_engine_optimizer]: 0.0123307, [1] [Cycle 1]: 0.0123205, [6] [build]: 4.33999e-06 [elim_shapecalc]: 1.049e-05 [elim_not_effective]: 1.457e-05 [opt_reshape]: 7.75e-06 [fold_const_symbol]: 1.012e-05 [renormalize]: 1.40001e-06 [detach_backward]: 5.99e-06 [pipeline_parallel_scheduler]: 2.73e-06 [auto_monad_reorder]: 4.773e-05 [get_jit_bprop_graph]: 2.78e-06 [rewriter_after_jit_bprop_graph]: 1.521e-05 [opt_after_jit_grad]: 0.00087248 [validate]: 6.134e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.586457 [execute]: 1.074e-05 Sums bootstrap : 0.000421s : 0.07% type_inference : 0.018996s : 3.11% event_method : 0.000016s : 0.00% auto_monad : 0.000067s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000032s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000025s : 0.00% optimize.rewriter_before_opt_a : 0.000073s : 0.01% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000044s : 0.01% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000616s : 0.10% optimize.opt_a.with_stream_mark : 0.000033s : 0.01% optimize.opt_a.recompute_prepare : 0.000016s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000157s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000015s : 0.00% optimize.opt_a.auto_parallel : 0.000016s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000031s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000013s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000756s : 0.12% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000052s : 0.01% optimize.opt_a.a_3 : 0.000082s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000691s : 0.11% optimize.opt_b.b_1 : 0.000118s : 0.02% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000031s : 0.01% optimize.loop_unroll : 0.000503s : 0.08% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000019s : 0.00% optimize.tuple_transform.d_1 : 0.000044s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.01% optimize.cse_after_recomputation.cse : 0.000013s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000001s : 0.00% detach_backward : 0.000006s : 0.00% pipeline_parallel_scheduler : 
0.000003s : 0.00% auto_monad_reorder : 0.000048s : 0.01% get_jit_bprop_graph : 0.000003s : 0.00% rewriter_after_jit_bprop_graph : 0.000015s : 0.00% opt_after_jit_grad : 0.000872s : 0.14% validate : 0.000061s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.586457s : 95.95% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000210 26 18.14% : 0.000038s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000002s : 2: substitution.fold_const_symbol 3.34% : 0.000007s : 3: substitution.graph_param_transform 64.99% : 0.000136s : 3: substitution.inline 1.97% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.78% : 0.000006s : 4: substitution.remove_not_recompute_node 2.41% : 0.000005s : 2: substitution.replace_old_param 4.53% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.018940 2 96.25% : 0.018229s : 1: type_inference.infer 3.75% : 0.000711s : 1: type_inference.specialize ------[replace.] 0.000044 4 79.64% : 0.000035s : 3: replace.inline 20.36% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000143 4 93.92% : 0.000134s : 3: match.inline 6.08% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000176 883 0.98% : 0.000002s : 9: predicate.accumulaten_eliminater 1.49% : 0.000003s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.96% : 0.000002s : 9: predicate.addn_zero_filter 0.92% : 0.000002s : 9: predicate.adjust_all_reduce_mul_add 2.10% : 0.000004s : 15: predicate.arithmetic_simplify 0.97% : 0.000002s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.69% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.32% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_depend_swap 1.74% : 0.000003s : 18: predicate.environ_get_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.46% : 0.000004s : 13: predicate.float_depend_g_call 0.52% : 0.000001s : 6: predicate.float_environ_get_switch 0.79% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.20% : 0.000000s : 3: predicate.graph_param_transform 0.61% : 0.000001s : 6: predicate.incorporate_call 0.52% : 0.000001s : 6: predicate.incorporate_call_switch 5.87% : 0.000010s : 40: predicate.inline 0.82% : 0.000001s : 6: predicate.inline_without_move 0.36% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 6: predicate.less_batch_normalization 1.86% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 25: predicate.load_eliminater 1.03% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 3: predicate.mutable_eliminate 0.57% : 0.000001s : 3: predicate.opt_reshape 0.58% : 0.000001s : 3: predicate.parallel_virtual_node 1.53% : 0.000003s : 13: predicate.partial_defer_inline 1.40% : 0.000002s : 13: predicate.partial_eliminate 0.86% : 0.000002s : 9: predicate.print_const_string_wrapper 0.55% : 0.000001s : 6: predicate.reduce_all_const_elim 1.30% : 0.000002s : 9: predicate.reduce_eliminate 2.28% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 6: predicate.remove_not_recompute_node 1.16% : 0.000002s : 16: predicate.replace_applicator 0.72% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000002s : 9: predicate.reshape_eliminate 0.60% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 3: predicate.row_tensor_eliminate 0.94% : 0.000002s : 6: predicate.same_eliminate 0.40% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.74% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.73% : 0.000001s : 6: predicate.specialize_transform 1.13% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.27% : 0.000002s : 13: predicate.switch_defer_inline 1.92% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.76% : 0.000008s : 43: predicate.switch_simplify 0.81% : 
0.000001s : 9: predicate.tile_eliminate 0.92% : 0.000002s : 9: predicate.transpose_eliminate 1.64% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.55% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.48% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.14% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.67% : 0.000001s : 3: predicate.value_based_eliminate 0.61% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.97% : 0.000002s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.43% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000479 8 46.41% : 0.000222s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.59% : 0.000257s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.651426 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.58% : 0.003782s : 1: add_attr 0.58% : 0.003768s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000058s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000072s : 1: auto_monad 0.01% : 0.000054s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000460s : 1: bootstrap 0.01% : 0.000035s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000018s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000012s : 1: detach_backward 0.00% : 0.000011s : 1: environ_conv 0.00% : 0.000023s : 1: event_method 0.00% : 0.000019s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000007s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.08% : 0.000512s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.11% : 0.000702s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.16% : 0.001031s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000039s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000094s : 28: opt.transform.opt_b 0.01% : 0.000049s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000039s : 4: opt.transform.symbol_engine_opt 0.41% : 0.002695s : 1: opt_a 0.02% : 0.000113s : 1: opt_after_cconv 0.14% : 0.000893s : 1: opt_after_jit_grad 0.03% : 0.000217s : 1: opt_b 2.66% : 0.017315s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000037s : 1: pre_auto_parallel 0.00% : 0.000029s : 1: py_interpret_to_execute 0.00% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000023s : 1: remove_dup_value 0.06% : 0.000402s : 1: renormalize.infer 0.05% : 0.000346s : 1: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000019s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000044s : 1: rewriter_after_opt_a 0.01% : 0.000078s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 1.89% : 0.012338s : 1: symbol_engine_optimizer 90.03% : 0.586485s : 1: task_emit 0.01% : 0.000083s : 1: 
tuple_transform 2.92% : 0.019020s : 1: type_inference 0.02% : 0.000103s : 1: validate TotalTime = 0.647839, [24] [bootstrap]: 0.00050162 [type_inference]: 0.0315254 [event_method]: 6.352e-05 [auto_monad]: 0.00015681 [graph_reusing]: 9.61998e-06 [inline]: 3.37002e-06 [add_attr]: 0.00831169, [1] [add_attr_with_inline]: 0.00829741, [1] [Cycle 1]: 0.00010165, [2] [tag_attr]: 4.478e-05 [meta_addattr_fg_expand]: 1.14e-05 [parallel-infer-symbol]: 3.78001e-06 [pre_auto_parallel]: 6.439e-05 [insert-virtual-dataset]: 2.71e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.34999e-06 [pipeline_split]: 2.14e-06 [optimize]: 0.0445904, [53] [py_interpret_to_execute]: 5.076e-05 [rewriter_before_opt_a]: 0.00018586 [opt_a]: 0.0417643, [3] [Cycle 1]: 0.0316657, [45] [expand_dump_flag]: 6.68e-06 [switch_simplify]: 8.149e-05 [loop_unroll]: 6.551e-05 [a_1]: 0.00158752 [with_stream_mark]: 3.016e-05 [recompute_prepare]: 2.656e-05 [updatestate_depend_eliminate]: 9.86e-06 [updatestate_assign_eliminate]: 8.23999e-06 [updatestate_loads_eliminate]: 7.46001e-06 [parameter_eliminate]: 3.5e-06 [a_2]: 0.00025483 [accelerated_algorithm]: 3.735e-05 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 5.12999e-06 [shard_inline]: 1.837e-05 [merge_send_recv]: 1.899e-05 [auto_parallel]: 1.322e-05 [parallel]: 2.315e-05 [flash_sp]: 1.427e-05 [merge_comm]: 1.017e-05 [allreduce_fusion]: 8.94998e-06 [matmul_add_comm_reduction]: 3.512e-05 [allreduce_slice_to_reducescatter]: 8.99978e-07 [virtual_shard_identity]: 3.508e-05 [virtual_dataset]: 1.721e-05 [get_grad_eliminate_]: 1.688e-05 [virtual_output]: 1.578e-05 [merge_forward]: 9.94001e-06 [cell_reuse_recompute_pass]: 1.50001e-06 [offload_activation]: 1.989e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.692e-05 [merge_recompute_call_nodes]: 2.26003e-06 [before_grad]: 3.182e-05 [set_forward_comm_id_for_comm_node_pass]: 1.112e-05 [meta_fg_expand]: 0.00216137 [flash_sp_send_recv_attached]: 5.92999e-06 [receive_attached]: 2.93998e-06 
[after_resolve]: 8.258e-05 [a_after_grad]: 0.00010193 [renormalize]: 0.0232866 [add_forward_monad_depend]: 1.954e-05 [auto_monad_grad]: 7.58001e-06 [auto_monad_eliminator]: 0.00242052 [cse]: 0.00029203 [a_3]: 0.00038492 [Cycle 2]: 0.00922698, [45] [expand_dump_flag]: 3.54002e-06 [switch_simplify]: 4.989e-05 [loop_unroll]: 4.377e-05 [a_1]: 0.00152923 [with_stream_mark]: 2.443e-05 [recompute_prepare]: 1.227e-05 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 3.92002e-06 [updatestate_loads_eliminate]: 3.5e-06 [parameter_eliminate]: 2.57001e-06 [a_2]: 9.497e-05 [accelerated_algorithm]: 1.262e-05 [shard]: 2.52001e-06 [meta_shard_fg_expand]: 3.45e-06 [shard_inline]: 7.71001e-06 [merge_send_recv]: 1.181e-05 [auto_parallel]: 1.118e-05 [parallel]: 1.111e-05 [flash_sp]: 4.72998e-06 [merge_comm]: 4.77e-06 [allreduce_fusion]: 4.2e-06 [matmul_add_comm_reduction]: 1.098e-05 [allreduce_slice_to_reducescatter]: 9.60019e-07 [virtual_shard_identity]: 1.095e-05 [virtual_dataset]: 7.76001e-06 [get_grad_eliminate_]: 8.65001e-06 [virtual_output]: 7.01001e-06 [merge_forward]: 5.04e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 1.188e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.519e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 1.262e-05 [set_forward_comm_id_for_comm_node_pass]: 4.68999e-06 [meta_fg_expand]: 0.00015334 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.26998e-06 [after_resolve]: 1.633e-05 [a_after_grad]: 1.194e-05 [renormalize]: 0.00661903 [add_forward_monad_depend]: 1.414e-05 [auto_monad_grad]: 3.05998e-06 [auto_monad_eliminator]: 2.719e-05 [cse]: 4.422e-05 [a_3]: 6.858e-05 [Cycle 3]: 0.00084865, [45] [expand_dump_flag]: 2.56998e-06 [switch_simplify]: 1.068e-05 [loop_unroll]: 7.31001e-06 [a_1]: 0.00018559 [with_stream_mark]: 1.684e-05 [recompute_prepare]: 7.31001e-06 [updatestate_depend_eliminate]: 5.69999e-06 [updatestate_assign_eliminate]: 3.92002e-06 [updatestate_loads_eliminate]: 3.53e-06 
[parameter_eliminate]: 2.26e-06 [a_2]: 9.186e-05 [accelerated_algorithm]: 1.295e-05 [shard]: 2.14e-06 [meta_shard_fg_expand]: 3.16001e-06 [shard_inline]: 7.98999e-06 [merge_send_recv]: 1.044e-05 [auto_parallel]: 1.147e-05 [parallel]: 1.108e-05 [flash_sp]: 1.95001e-06 [merge_comm]: 4.43001e-06 [allreduce_fusion]: 4.26001e-06 [matmul_add_comm_reduction]: 1.125e-05 [allreduce_slice_to_reducescatter]: 8.79983e-07 [virtual_shard_identity]: 9.19e-06 [virtual_dataset]: 7.05e-06 [get_grad_eliminate_]: 6.71999e-06 [virtual_output]: 6.92002e-06 [merge_forward]: 5.96e-06 [cell_reuse_recompute_pass]: 3.13e-06 [offload_activation]: 1.189e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.507e-05 [merge_recompute_call_nodes]: 1.89e-06 [before_grad]: 1.368e-05 [set_forward_comm_id_for_comm_node_pass]: 4.94e-06 [meta_fg_expand]: 3.33e-06 [flash_sp_send_recv_attached]: 2.37001e-06 [receive_attached]: 2.22001e-06 [after_resolve]: 1.336e-05 [a_after_grad]: 1.164e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.96e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.017e-05 [cse]: 2.54e-05 [a_3]: 4.196e-05 [py_interpret_to_execute_after_opt_a]: 2.251e-05 [slice_cell_reuse_recomputed_activation]: 2.44001e-06 [rewriter_after_opt_a]: 5.16e-05 [convert_after_rewriter]: 8.32e-06 [order_py_execute_after_rewriter]: 6.10002e-06 [mutable_eliminate]: 0.00081866 [opt_b]: 0.00028244, [1] [Cycle 1]: 0.00027307, [7] [b_1]: 0.00016717 [b_2]: 1.049e-05 [updatestate_depend_eliminate]: 1.001e-05 [updatestate_assign_eliminate]: 3.92998e-06 [updatestate_loads_eliminate]: 4.01001e-06 [renormalize]: 1.17e-06 [cse]: 3.393e-05 [optimize_parallel_all_gather_comm]: 2.359e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 3.516e-05 [loop_unroll]: 0.00050409 [opt_after_cconv]: 0.00012616, [1] [Cycle 1]: 0.00011903, [7] [c_1]: 3.492e-05 [parameter_eliminate]: 6.49999e-06 [updatestate_depend_eliminate]: 7.77002e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 2.93998e-06 
[cse]: 2.523e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.987e-05 [tuple_transform]: 9.033e-05, [1] [Cycle 1]: 8.516e-05, [4] [d_1]: 5.58e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.61999e-06 [partial_unused_args_eliminate]: 2.31e-06 [add_recomputation]: 6.9e-05 [cse_after_recomputation]: 2.887e-05, [1] [Cycle 1]: 2.375e-05, [1] [cse]: 1.774e-05 [environ_conv]: 1.194e-05 [swap_dp_allreduce_reducescatter]: 6.71999e-06 [bias_add_comm_swap]: 2.98e-06 [label_micro_interleaved_index]: 5.74e-06 [label_fine_grained_interleaved_index]: 3.00002e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.43002e-06 [full_micro_interleaved_order_control]: 2.45002e-06 [reorder_send_recv_between_fp_bp]: 3.31001e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.20999e-06 [interleave_split_concat_branches]: 1.38002e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.59e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.717e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 5.26998e-06 [overlap_recompute_and_grad_model_parallel]: 6.19999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.70001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 5.36002e-06 [overlap_grad_flash_sp]: 2.638e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.18998e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 0.00010235, [1] [Cycle 1]: 9.701e-05, [6] [build]: 1.191e-05 [elim_shapecalc]: 1.309e-05 [elim_not_effective]: 1.727e-05 [opt_reshape]: 8.14997e-06 [fold_const_symbol]: 1.346e-05 [renormalize]: 2.50002e-07 [detach_backward]: 2.50997e-06 
[pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 2.221e-05 [get_jit_bprop_graph]: 1.96003e-06 [rewriter_after_jit_bprop_graph]: 5.02e-06 [opt_after_jit_grad]: 0.00055909 [validate]: 5.84e-05 [backend_pass]: 1.24e-06 [task_emit]: 0.561648 [execute]: 1.177e-05 Sums bootstrap : 0.000502s : 0.08% type_inference : 0.031525s : 4.94% event_method : 0.000064s : 0.01% auto_monad : 0.000157s : 0.02% graph_reusing : 0.000010s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000045s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000064s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000051s : 0.01% optimize.rewriter_before_opt_a : 0.000186s : 0.03% optimize.opt_a.expand_dump_flag : 0.000013s : 0.00% optimize.opt_a.switch_simplify : 0.000142s : 0.02% optimize.opt_a.loop_unroll : 0.000117s : 0.02% optimize.opt_a.a_1 : 0.003302s : 0.52% optimize.opt_a.with_stream_mark : 0.000071s : 0.01% optimize.opt_a.recompute_prepare : 0.000046s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000008s : 0.00% optimize.opt_a.a_2 : 0.000442s : 0.07% optimize.opt_a.accelerated_algorithm : 0.000063s : 0.01% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000012s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.01% optimize.opt_a.merge_send_recv : 0.000041s : 0.01% optimize.opt_a.auto_parallel : 0.000036s : 0.01% optimize.opt_a.parallel : 0.000045s : 0.01% optimize.opt_a.flash_sp : 0.000021s : 0.00% optimize.opt_a.merge_comm : 0.000019s : 0.00% optimize.opt_a.allreduce_fusion : 
0.000017s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000057s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000055s : 0.01% optimize.opt_a.virtual_dataset : 0.000032s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.01% optimize.opt_a.virtual_output : 0.000030s : 0.00% optimize.opt_a.merge_forward : 0.000021s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000044s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000067s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000006s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.002318s : 0.36% optimize.opt_a.flash_sp_send_recv_attached : 0.000011s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000112s : 0.02% optimize.opt_a.a_after_grad : 0.000126s : 0.02% optimize.opt_a.renormalize : 0.029906s : 4.69% optimize.opt_a.add_forward_monad_depend : 0.000036s : 0.01% optimize.opt_a.auto_monad_grad : 0.000012s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.002458s : 0.39% optimize.opt_a.cse : 0.000362s : 0.06% optimize.opt_a.a_3 : 0.000495s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000023s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000052s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000819s : 0.13% optimize.opt_b.b_1 : 0.000167s : 0.03% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% 
optimize.opt_b.cse : 0.000034s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.01% optimize.loop_unroll : 0.000504s : 0.08% optimize.opt_after_cconv.c_1 : 0.000035s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000025s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000020s : 0.00% optimize.tuple_transform.d_1 : 0.000056s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000069s : 0.01% optimize.cse_after_recomputation.cse : 0.000018s : 0.00% optimize.environ_conv : 0.000012s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000559s : 0.09% validate : 0.000058s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.561648s : 88.05% execute : 0.000012s : 0.00% Time group info: ------[substitution.] 
0.000969 161 7.88% : 0.000076s : 8: substitution.arithmetic_simplify 0.29% : 0.000003s : 3: substitution.elim_not_effective 0.59% : 0.000006s : 5: substitution.float_depend_g_call 0.46% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.20% : 0.000002s : 3: substitution.fold_const_symbol 0.80% : 0.000008s : 4: substitution.graph_param_transform 0.42% : 0.000004s : 2: substitution.incorporate_call 0.22% : 0.000002s : 2: substitution.incorporate_call_switch 59.90% : 0.000580s : 17: substitution.inline 2.59% : 0.000025s : 2: substitution.inline_without_move 1.32% : 0.000013s : 15: substitution.j_node_and_user_rematch 2.20% : 0.000021s : 3: substitution.less_batch_normalization 1.35% : 0.000013s : 7: substitution.minmaximum_grad 0.76% : 0.000007s : 5: substitution.partial_eliminate 1.54% : 0.000015s : 15: substitution.remove_not_recompute_node 3.72% : 0.000036s : 10: substitution.replace_applicator 1.46% : 0.000014s : 10: substitution.replace_old_param 0.32% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.76% : 0.000027s : 7: substitution.tuple_list_convert_item_index_to_positive 1.23% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 1.63% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 6.64% : 0.000064s : 19: substitution.tuple_list_get_item_eliminator 1.71% : 0.000017s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.031389 2 85.38% : 0.026799s : 1: type_inference.infer 14.62% : 0.004590s : 1: type_inference.specialize ------[replace.] 0.000241 27 63.37% : 0.000153s : 17: replace.inline 36.63% : 0.000088s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000600 27 94.91% : 0.000570s : 17: match.inline 5.09% : 0.000031s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 4248 1.08% : 0.000008s : 53: predicate.accumulaten_eliminater 0.27% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.44% : 0.000003s : 21: predicate.addn_check_dump 1.07% : 0.000008s : 53: predicate.addn_zero_filter 1.05% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.09% : 0.000016s : 74: predicate.arithmetic_simplify 1.22% : 0.000009s : 53: predicate.cast_eliminate 1.11% : 0.000008s : 50: predicate.check_bprop_eliminate 0.43% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000004s : 21: predicate.depend_value_elim 1.21% : 0.000009s : 53: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 53: predicate.dict_set_item_eliminator 0.34% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.06% : 0.000000s : 4: predicate.elim_not_effective 0.14% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 57: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 57: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 57: predicate.environ_get_depend_swap 1.61% : 0.000012s : 78: predicate.environ_get_eliminate 1.12% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.78% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.54% : 0.000019s : 80: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.53% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.51% : 0.000004s : 21: predicate.get_grad_eliminate 0.08% : 0.000001s : 4: predicate.graph_param_transform 0.52% : 0.000004s : 21: predicate.incorporate_call 0.42% : 0.000003s : 21: predicate.incorporate_call_switch 5.95% : 0.000045s : 183: predicate.inline 1.42% : 0.000011s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 21: predicate.less_batch_normalization 1.71% : 
0.000013s : 71: predicate.list_to_tuple_eliminator_ 2.53% : 0.000019s : 124: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.41% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.46% : 0.000003s : 21: predicate.merge_addn 1.10% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 53: predicate.minmaximum_grad 0.39% : 0.000003s : 4: predicate.mutable_eliminate 0.12% : 0.000001s : 4: predicate.opt_reshape 0.16% : 0.000001s : 4: predicate.parallel_virtual_node 2.07% : 0.000016s : 80: predicate.partial_defer_inline 1.68% : 0.000013s : 67: predicate.partial_eliminate 1.10% : 0.000008s : 53: predicate.print_const_string_wrapper 0.49% : 0.000004s : 21: predicate.reduce_all_const_elim 1.37% : 0.000010s : 53: predicate.reduce_eliminate 2.59% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.88% : 0.000014s : 113: predicate.replace_applicator 0.74% : 0.000006s : 45: predicate.replace_old_param 0.10% : 0.000001s : 4: predicate.reset_defer_inline 1.21% : 0.000009s : 53: predicate.reshape_eliminate 1.16% : 0.000009s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.45% : 0.000011s : 50: predicate.same_eliminate 0.35% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 21: predicate.shard_identity_eliminate 0.28% : 0.000002s : 8: predicate.special_op_eliminate 0.58% : 0.000004s : 21: predicate.specialize_transform 1.37% : 0.000010s : 50: predicate.split_environ_get_set_with_tuple_value 1.26% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.90% : 0.000014s : 80: predicate.switch_defer_inline 2.88% : 0.000022s : 130: predicate.switch_layer_defer_inline 5.08% : 0.000038s : 218: 
predicate.switch_simplify 1.18% : 0.000009s : 53: predicate.tile_eliminate 1.07% : 0.000008s : 53: predicate.transpose_eliminate 1.57% : 0.000012s : 61: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 61: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000011s : 61: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 92: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 61: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000016s : 82: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 71: predicate.tuple_to_list_eliminator_ 2.46% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.08% : 0.000023s : 145: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.55% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.49% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002983 36 49.19% : 0.001467s : 15: func_graph_cloner_run.FuncGraphClonerGraph 50.81% : 0.001516s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.735821 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.13% : 0.008319s : 1: add_attr 1.13% : 0.008302s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000074s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000166s : 1: auto_monad 0.00% : 0.000027s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000541s : 1: bootstrap 0.01% : 0.000039s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000032s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000016s : 1: environ_conv 0.01% : 0.000074s : 1: event_method 0.00% : 0.000020s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.07% : 0.000515s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.11% : 0.000835s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000022s : 1: opt.transform.mutable_eliminate 0.68% : 0.004969s : 117: opt.transform.opt_a 0.00% : 0.000033s : 1: opt.transform.opt_after_cconv 0.00% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000141s : 28: opt.transform.opt_b 0.01% : 0.000061s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000048s : 4: opt.transform.symbol_engine_opt 5.68% : 0.041768s : 1: opt_a 0.02% : 0.000130s : 1: opt_after_cconv 0.08% : 0.000574s : 1: opt_after_jit_grad 0.04% : 0.000287s : 1: opt_b 6.06% : 0.044596s : 1: optimize 0.00% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000070s : 1: pre_auto_parallel 0.01% : 0.000055s : 1: py_interpret_to_execute 0.00% : 0.000027s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000024s : 1: remove_dup_value 3.69% : 0.027143s : 2: renormalize.infer 0.37% : 0.002729s : 2: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000057s : 1: rewriter_after_opt_a 0.03% : 0.000191s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000106s : 1: symbol_engine_optimizer 76.33% : 0.561675s : 1: task_emit 0.01% : 0.000094s : 1: 
tuple_transform 4.29% : 0.031558s : 1: type_inference 0.01% : 0.000096s : 1: validate TotalTime = 0.669609, [24] [bootstrap]: 0.00046094 [type_inference]: 0.021966 [event_method]: 1.844e-05 [auto_monad]: 7.353e-05 [graph_reusing]: 6.36e-06 [inline]: 3.81999e-06 [add_attr]: 0.00393543, [1] [add_attr_with_inline]: 0.0039218, [1] [Cycle 1]: 7.421e-05, [2] [tag_attr]: 1.817e-05 [meta_addattr_fg_expand]: 4.43001e-06 [parallel-infer-symbol]: 3.74002e-06 [pre_auto_parallel]: 3.491e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 9.50007e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.98002e-06 [optimize]: 0.0219036, [53] [py_interpret_to_execute]: 2.866e-05 [rewriter_before_opt_a]: 6.535e-05 [opt_a]: 0.0191575, [2] [Cycle 1]: 0.018327, [45] [expand_dump_flag]: 3.39001e-06 [switch_simplify]: 3.088e-05 [loop_unroll]: 1.86e-05 [a_1]: 0.00041535 [with_stream_mark]: 2.249e-05 [recompute_prepare]: 9.22001e-06 [updatestate_depend_eliminate]: 4.08999e-06 [updatestate_assign_eliminate]: 3.70998e-06 [updatestate_loads_eliminate]: 3.45e-06 [parameter_eliminate]: 1.93002e-06 [a_2]: 9.096e-05 [accelerated_algorithm]: 8.13001e-06 [shard]: 3.07002e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 7.2e-06 [merge_send_recv]: 1.049e-05 [auto_parallel]: 7.97e-06 [parallel]: 2.186e-05 [flash_sp]: 1.011e-05 [merge_comm]: 4.27e-06 [allreduce_fusion]: 3.6e-06 [matmul_add_comm_reduction]: 1.11e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 9.60001e-06 [virtual_dataset]: 7.23999e-06 [get_grad_eliminate_]: 6.10002e-06 [virtual_output]: 6.41998e-06 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 1.137e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.394e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 1.172e-05 [set_forward_comm_id_for_comm_node_pass]: 4.52e-06 [meta_fg_expand]: 3.19001e-06 [flash_sp_send_recv_attached]: 2.65002e-06 [receive_attached]: 2.26e-06 [after_resolve]: 
1.142e-05 [a_after_grad]: 9.24998e-06 [renormalize]: 0.0170271 [add_forward_monad_depend]: 1.296e-05 [auto_monad_grad]: 3.03e-06 [auto_monad_eliminator]: 2.521e-05 [cse]: 3.818e-05 [a_3]: 6.473e-05 [Cycle 2]: 0.00081538, [45] [expand_dump_flag]: 2.66999e-06 [switch_simplify]: 1.055e-05 [loop_unroll]: 7.44002e-06 [a_1]: 0.00014849 [with_stream_mark]: 2.082e-05 [recompute_prepare]: 7.45e-06 [updatestate_depend_eliminate]: 4.42e-06 [updatestate_assign_eliminate]: 3.59002e-06 [updatestate_loads_eliminate]: 4.28999e-06 [parameter_eliminate]: 2.54001e-06 [a_2]: 8.481e-05 [accelerated_algorithm]: 7.06001e-06 [shard]: 2.81999e-06 [meta_shard_fg_expand]: 2.48e-06 [shard_inline]: 6.80998e-06 [merge_send_recv]: 1.003e-05 [auto_parallel]: 1.015e-05 [parallel]: 9.67001e-06 [flash_sp]: 4.47998e-06 [merge_comm]: 4.87e-06 [allreduce_fusion]: 3.95e-06 [matmul_add_comm_reduction]: 1.166e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.65e-06 [virtual_dataset]: 6.81999e-06 [get_grad_eliminate_]: 5.99e-06 [virtual_output]: 6.08002e-06 [merge_forward]: 4.82e-06 [cell_reuse_recompute_pass]: 3.63999e-06 [offload_activation]: 1.192e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.547e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 1.252e-05 [set_forward_comm_id_for_comm_node_pass]: 4.99e-06 [meta_fg_expand]: 3.5e-06 [flash_sp_send_recv_attached]: 1.86e-06 [receive_attached]: 2.78e-06 [after_resolve]: 1.309e-05 [a_after_grad]: 9.31002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 2.58e-06 [auto_monad_grad]: 2.22999e-06 [auto_monad_eliminator]: 1.166e-05 [cse]: 2.445e-05 [a_3]: 3.698e-05 [py_interpret_to_execute_after_opt_a]: 2.132e-05 [slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 4.765e-05 [convert_after_rewriter]: 8.01001e-06 [order_py_execute_after_rewriter]: 6.89001e-06 [mutable_eliminate]: 0.00081848 [opt_b]: 0.00024136, [1] [Cycle 1]: 0.00023012, [7] [b_1]: 0.00012415 [b_2]: 8.97e-06 
[updatestate_depend_eliminate]: 1.182e-05 [updatestate_assign_eliminate]: 2.93e-06 [updatestate_loads_eliminate]: 3.13998e-06 [renormalize]: 1.03001e-06 [cse]: 3.625e-05 [optimize_parallel_all_gather_comm]: 2.479e-05 [overlap_param_gather]: 2.74001e-06 [cconv]: 3.83e-05 [loop_unroll]: 0.00062247 [opt_after_cconv]: 0.00013112, [1] [Cycle 1]: 0.00012122, [7] [c_1]: 3.16e-05 [parameter_eliminate]: 6.58e-06 [updatestate_depend_eliminate]: 8.40001e-06 [updatestate_assign_eliminate]: 3.14001e-06 [updatestate_loads_eliminate]: 3.06999e-06 [cse]: 2.913e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.862e-05 [tuple_transform]: 8.926e-05, [1] [Cycle 1]: 8.361e-05, [4] [d_1]: 5.05e-05 [none_parameter_eliminate]: 2.12001e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 8.15999e-06 [partial_unused_args_eliminate]: 2.02001e-06 [add_recomputation]: 5.701e-05 [cse_after_recomputation]: 2.517e-05, [1] [Cycle 1]: 1.92e-05, [1] [cse]: 1.241e-05 [environ_conv]: 7.61999e-06 [swap_dp_allreduce_reducescatter]: 6.33002e-06 [bias_add_comm_swap]: 3.45e-06 [label_micro_interleaved_index]: 7.2e-06 [label_fine_grained_interleaved_index]: 3.08e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.86999e-06 [assign_add_opt]: 1.79e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.63998e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 1.15001e-06 [interleave_split_concat_branches]: 1.35001e-06 [interleave_parallel_branches]: 1.25001e-06 [overlap_opt_shard_in_pipeline]: 1.62001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.774e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 5.94e-06 [overlap_recompute_and_grad_model_parallel]: 5.89e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 
[overlap_recompute_comm]: 2.38998e-06 [overlap_grad_ring_attention]: 5.49e-06 [overlap_grad_flash_sp]: 2.102e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.60999e-06 [symbol_engine_optimizer]: 9.329e-05, [1] [Cycle 1]: 8.771e-05, [6] [build]: 4.07e-06 [elim_shapecalc]: 1.518e-05 [elim_not_effective]: 1.579e-05 [opt_reshape]: 7.21999e-06 [fold_const_symbol]: 1.081e-05 [renormalize]: 4.60015e-07 [detach_backward]: 2.58e-06 [pipeline_parallel_scheduler]: 1.97001e-06 [auto_monad_reorder]: 1.983e-05 [get_jit_bprop_graph]: 1.87999e-06 [rewriter_after_jit_bprop_graph]: 6.04999e-06 [opt_after_jit_grad]: 0.0006342 [validate]: 4.932e-05 [backend_pass]: 1.34003e-06 [task_emit]: 0.620194 [execute]: 1.087e-05 Sums bootstrap : 0.000461s : 0.07% type_inference : 0.021966s : 3.31% event_method : 0.000018s : 0.00% auto_monad : 0.000074s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000004s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000035s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000029s : 0.00% optimize.rewriter_before_opt_a : 0.000065s : 0.01% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000564s : 0.08% optimize.opt_a.with_stream_mark : 0.000043s : 0.01% optimize.opt_a.recompute_prepare : 0.000017s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000008s : 0.00% optimize.opt_a.parameter_eliminate 
: 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000176s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.00% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000014s : 0.00% optimize.opt_a.merge_send_recv : 0.000021s : 0.00% optimize.opt_a.auto_parallel : 0.000018s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000015s : 0.00% optimize.opt_a.merge_comm : 0.000009s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000023s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.00% optimize.opt_a.virtual_dataset : 0.000014s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000013s : 0.00% optimize.opt_a.merge_forward : 0.000009s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000023s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000029s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000024s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000010s : 0.00% optimize.opt_a.meta_fg_expand : 0.000007s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000025s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.017027s : 2.56% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.00% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000037s : 0.01% optimize.opt_a.cse : 0.000063s : 0.01% optimize.opt_a.a_3 : 0.000102s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000021s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 
0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000818s : 0.12% optimize.opt_b.b_1 : 0.000124s : 0.02% optimize.opt_b.b_2 : 0.000009s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000012s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000036s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.00% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000038s : 0.01% optimize.loop_unroll : 0.000622s : 0.09% optimize.opt_after_cconv.c_1 : 0.000032s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000029s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000019s : 0.00% optimize.tuple_transform.d_1 : 0.000051s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000007s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% 
optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s 
: 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000634s : 0.10% validate : 0.000049s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.620194s : 93.36% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000206 24 22.13% : 0.000046s : 4: substitution.arithmetic_simplify 1.29% : 0.000003s : 2: substitution.elim_not_effective 0.69% : 0.000001s : 2: substitution.fold_const_symbol 3.73% : 0.000008s : 3: substitution.graph_param_transform 64.09% : 0.000132s : 3: substitution.inline 2.51% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.92% : 0.000006s : 4: substitution.remove_not_recompute_node 2.66% : 0.000005s : 2: substitution.replace_old_param ------[type_inference.] 0.021891 2 96.43% : 0.021109s : 1: type_inference.infer 3.57% : 0.000782s : 1: type_inference.specialize ------[replace.] 0.000033 3 100.00% : 0.000033s : 3: replace.inline ------[match.] 0.000130 3 100.00% : 0.000130s : 3: match.inline ------[predicate.] 
0.000179 815 0.96% : 0.000002s : 8: predicate.accumulaten_eliminater 1.36% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 6: predicate.addn_check_dump 0.82% : 0.000001s : 8: predicate.addn_zero_filter 0.64% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 14: predicate.arithmetic_simplify 1.12% : 0.000002s : 8: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.78% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.89% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.74% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 3: predicate.elim_not_effective 0.65% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_depend_swap 1.89% : 0.000003s : 17: predicate.environ_get_eliminate 1.16% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.02% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.03% : 0.000004s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.20% : 0.000000s : 3: predicate.graph_param_transform 0.62% : 0.000001s : 6: predicate.incorporate_call 0.70% : 0.000001s : 6: predicate.incorporate_call_switch 6.07% : 0.000011s : 37: predicate.inline 0.96% : 0.000002s : 6: predicate.inline_without_move 0.93% : 0.000002s : 6: predicate.j_node_and_user_rematch 1.21% : 0.000002s : 6: predicate.less_batch_normalization 1.61% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000004s : 22: predicate.load_eliminater 1.61% : 0.000003s : 3: predicate.loop_unroll_after_grad 1.69% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.59% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.53% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.64% : 0.000001s : 8: predicate.minmaximum_grad 2.29% : 0.000004s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.32% : 0.000002s : 11: predicate.partial_defer_inline 1.14% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000002s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.34% : 0.000002s : 8: predicate.reduce_eliminate 2.05% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 6: predicate.remove_not_recompute_node 1.22% : 0.000002s : 14: predicate.replace_applicator 0.77% : 0.000001s : 6: predicate.replace_old_param 0.23% : 0.000000s : 3: predicate.reset_defer_inline 0.86% : 0.000002s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 3: predicate.row_tensor_eliminate 1.21% : 0.000002s : 6: predicate.same_eliminate 0.42% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.00% : 0.000002s : 6: predicate.shard_identity_eliminate 0.85% : 0.000002s : 6: predicate.special_op_eliminate 0.93% : 0.000002s : 6: predicate.specialize_transform 1.40% : 0.000003s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.06% : 0.000002s : 11: predicate.switch_defer_inline 1.66% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.35% : 0.000008s : 38: predicate.switch_simplify 0.82% : 
0.000001s : 8: predicate.tile_eliminate 0.88% : 0.000002s : 8: predicate.transpose_eliminate 1.56% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator 1.64% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 1.89% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.70% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.91% : 0.000002s : 6: predicate.virtual_dataset_eliminate 0.84% : 0.000002s : 6: predicate.virtual_output_eliminate 0.26% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.75% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000562 7 24.33% : 0.000137s : 2: func_graph_cloner_run.FuncGraphClonerGraph 75.67% : 0.000425s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.713633 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.55% : 0.003942s : 1: add_attr 0.55% : 0.003926s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000062s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000080s : 1: auto_monad 0.00% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.07% : 0.000491s : 1: bootstrap 0.01% : 0.000043s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000022s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000029s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000011s : 1: environ_conv 0.00% : 0.000025s : 1: event_method 0.00% : 0.000019s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000010s : 1: label_micro_interleaved_index 0.09% : 0.000638s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.12% : 0.000840s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000027s : 1: opt.transform.mutable_eliminate 0.14% : 0.000991s : 78: opt.transform.opt_a 0.00% : 0.000029s : 1: opt.transform.opt_after_cconv 0.00% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000098s : 28: opt.transform.opt_b 0.01% : 0.000055s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000044s : 4: opt.transform.symbol_engine_opt 2.69% : 0.019162s : 1: opt_a 0.02% : 0.000135s : 1: opt_after_cconv 0.09% : 0.000650s : 1: opt_after_jit_grad 0.03% : 0.000246s : 1: opt_b 3.07% : 0.021910s : 1: optimize 0.00% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000011s : 1: order_py_execute_after_rewriter 0.00% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000039s : 1: pre_auto_parallel 0.00% : 0.000033s : 1: py_interpret_to_execute 0.00% : 0.000026s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000022s : 1: remove_dup_value 2.31% : 0.016520s : 1: renormalize.infer 0.07% : 0.000486s : 1: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000055s : 1: rewriter_after_opt_a 0.01% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000096s : 1: symbol_engine_optimizer 86.91% : 0.620220s : 1: task_emit 0.01% : 0.000093s : 1: 
tuple_transform 3.08% : 0.022001s : 1: type_inference 0.01% : 0.000081s : 1: validate TotalTime = 0.932217, [24] [bootstrap]: 0.00046991 [type_inference]: 0.0547252 [event_method]: 5.987e-05 [auto_monad]: 0.00015432 [graph_reusing]: 9.32999e-06 [inline]: 3.07002e-06 [add_attr]: 0.00403899, [1] [add_attr_with_inline]: 0.00402655, [1] [Cycle 1]: 9.465e-05, [2] [tag_attr]: 4.3e-05 [meta_addattr_fg_expand]: 1.054e-05 [parallel-infer-symbol]: 3.93001e-06 [pre_auto_parallel]: 5.865e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 9.5999e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.0827061, [53] [py_interpret_to_execute]: 4.652e-05 [rewriter_before_opt_a]: 0.00023874 [opt_a]: 0.0797775, [3] [Cycle 1]: 0.0751793, [45] [expand_dump_flag]: 7.46001e-06 [switch_simplify]: 8.131e-05 [loop_unroll]: 6.348e-05 [a_1]: 0.0016331 [with_stream_mark]: 3.397e-05 [recompute_prepare]: 2.987e-05 [updatestate_depend_eliminate]: 1.007e-05 [updatestate_assign_eliminate]: 7.73001e-06 [updatestate_loads_eliminate]: 7.68001e-06 [parameter_eliminate]: 3.93001e-06 [a_2]: 0.00025884 [accelerated_algorithm]: 3.792e-05 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 5.15999e-06 [shard_inline]: 1.691e-05 [merge_send_recv]: 1.889e-05 [auto_parallel]: 1.585e-05 [parallel]: 2.208e-05 [flash_sp]: 1.616e-05 [merge_comm]: 1.094e-05 [allreduce_fusion]: 9.31e-06 [matmul_add_comm_reduction]: 2.933e-05 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 2.209e-05 [virtual_dataset]: 1.639e-05 [get_grad_eliminate_]: 1.627e-05 [virtual_output]: 1.566e-05 [merge_forward]: 9.42001e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 2.055e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.372e-05 [merge_recompute_call_nodes]: 2.16e-06 [before_grad]: 3.079e-05 [set_forward_comm_id_for_comm_node_pass]: 1.035e-05 [meta_fg_expand]: 0.0625407 [flash_sp_send_recv_attached]: 6.36998e-06 [receive_attached]: 2.16e-06 [after_resolve]: 
9.312e-05 [a_after_grad]: 0.00010527 [renormalize]: 0.0088029 [add_forward_monad_depend]: 1.418e-05 [auto_monad_grad]: 7.95e-06 [auto_monad_eliminator]: 6.162e-05 [cse]: 0.00024957 [a_3]: 0.00036746 [Cycle 2]: 0.00373893, [45] [expand_dump_flag]: 3.76001e-06 [switch_simplify]: 4.901e-05 [loop_unroll]: 4.382e-05 [a_1]: 0.00160049 [with_stream_mark]: 2.419e-05 [recompute_prepare]: 1.492e-05 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 3.67998e-06 [parameter_eliminate]: 3.03e-06 [a_2]: 0.00010332 [accelerated_algorithm]: 1.34e-05 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 3.73001e-06 [shard_inline]: 7.51999e-06 [merge_send_recv]: 1.013e-05 [auto_parallel]: 1.203e-05 [parallel]: 1.148e-05 [flash_sp]: 4.97999e-06 [merge_comm]: 4.38001e-06 [allreduce_fusion]: 4.21001e-06 [matmul_add_comm_reduction]: 1.15e-05 [allreduce_slice_to_reducescatter]: 8.29983e-07 [virtual_shard_identity]: 1.076e-05 [virtual_dataset]: 7.48e-06 [get_grad_eliminate_]: 7.13e-06 [virtual_output]: 7.74002e-06 [merge_forward]: 4.87e-06 [cell_reuse_recompute_pass]: 1.76e-06 [offload_activation]: 1.248e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.78e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 1.469e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 0.00011193 [flash_sp_send_recv_attached]: 1.92001e-06 [receive_attached]: 2.69001e-06 [after_resolve]: 1.648e-05 [a_after_grad]: 1.217e-05 [renormalize]: 0.00109416 [add_forward_monad_depend]: 8.60999e-06 [auto_monad_grad]: 2.98e-06 [auto_monad_eliminator]: 1.923e-05 [cse]: 4.389e-05 [a_3]: 6.442e-05 [Cycle 3]: 0.00083572, [45] [expand_dump_flag]: 2.75002e-06 [switch_simplify]: 9.96998e-06 [loop_unroll]: 7.88001e-06 [a_1]: 0.00017549 [with_stream_mark]: 1.424e-05 [recompute_prepare]: 8.27003e-06 [updatestate_depend_eliminate]: 4.89998e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 
2.12001e-06 [a_2]: 9.216e-05 [accelerated_algorithm]: 1.202e-05 [shard]: 2.39001e-06 [meta_shard_fg_expand]: 2.56998e-06 [shard_inline]: 7.09001e-06 [merge_send_recv]: 9.70002e-06 [auto_parallel]: 1.071e-05 [parallel]: 1.129e-05 [flash_sp]: 1.90001e-06 [merge_comm]: 4.15999e-06 [allreduce_fusion]: 4.26001e-06 [matmul_add_comm_reduction]: 1.099e-05 [allreduce_slice_to_reducescatter]: 9.50007e-07 [virtual_shard_identity]: 1.056e-05 [virtual_dataset]: 6.94999e-06 [get_grad_eliminate_]: 7.30998e-06 [virtual_output]: 6.42001e-06 [merge_forward]: 5.31998e-06 [cell_reuse_recompute_pass]: 2.99001e-06 [offload_activation]: 1.136e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.618e-05 [merge_recompute_call_nodes]: 1.94e-06 [before_grad]: 1.201e-05 [set_forward_comm_id_for_comm_node_pass]: 4.63999e-06 [meta_fg_expand]: 3.52997e-06 [flash_sp_send_recv_attached]: 1.69e-06 [receive_attached]: 2.12999e-06 [after_resolve]: 1.241e-05 [a_after_grad]: 1.041e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 2.38998e-06 [auto_monad_grad]: 1.71002e-06 [auto_monad_eliminator]: 1.092e-05 [cse]: 2.51e-05 [a_3]: 4.308e-05 [py_interpret_to_execute_after_opt_a]: 1.948e-05 [slice_cell_reuse_recomputed_activation]: 2.96001e-06 [rewriter_after_opt_a]: 5.229e-05 [convert_after_rewriter]: 9.18002e-06 [order_py_execute_after_rewriter]: 6.06998e-06 [mutable_eliminate]: 0.00079339 [opt_b]: 0.00029645, [1] [Cycle 1]: 0.00028478, [7] [b_1]: 0.00015732 [b_2]: 2.827e-05 [updatestate_depend_eliminate]: 7.78001e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 4.03001e-06 [renormalize]: 8.2e-07 [cse]: 3.844e-05 [optimize_parallel_all_gather_comm]: 2.327e-05 [overlap_param_gather]: 2.08002e-06 [cconv]: 3.024e-05 [loop_unroll]: 0.00054107 [opt_after_cconv]: 0.00013358, [1] [Cycle 1]: 0.00012601, [7] [c_1]: 3.64e-05 [parameter_eliminate]: 5.07999e-06 [updatestate_depend_eliminate]: 6.58e-06 [updatestate_assign_eliminate]: 3.22002e-06 [updatestate_loads_eliminate]: 
3.31001e-06 [cse]: 3.206e-05 [renormalize]: 6.30011e-07 [remove_dup_value]: 1.931e-05 [tuple_transform]: 9.035e-05, [1] [Cycle 1]: 8.52e-05, [4] [d_1]: 5.307e-05 [none_parameter_eliminate]: 2.05002e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 8.27e-06 [partial_unused_args_eliminate]: 2.19001e-06 [add_recomputation]: 6.168e-05 [cse_after_recomputation]: 3.118e-05, [1] [Cycle 1]: 2.506e-05, [1] [cse]: 1.865e-05 [environ_conv]: 1.093e-05 [swap_dp_allreduce_reducescatter]: 6.88998e-06 [bias_add_comm_swap]: 3.26999e-06 [label_micro_interleaved_index]: 5.34e-06 [label_fine_grained_interleaved_index]: 3.19001e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.86e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 9.90025e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 3.08e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.46998e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.771e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 5.03002e-06 [overlap_recompute_and_grad_model_parallel]: 5.66998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.89001e-06 [overlap_grad_ring_attention]: 5.74e-06 [overlap_grad_flash_sp]: 2.672e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.89999e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 0.00011045, [1] [Cycle 1]: 0.00010383, [6] [build]: 1.197e-05 [elim_shapecalc]: 1.652e-05 [elim_not_effective]: 1.77e-05 [opt_reshape]: 8.59e-06 [fold_const_symbol]: 1.3e-05 [renormalize]: 2.80008e-07 [detach_backward]: 2.69999e-06 
[pipeline_parallel_scheduler]: 1.74e-06 [auto_monad_reorder]: 2.429e-05 [get_jit_bprop_graph]: 1.99e-06 [rewriter_after_jit_bprop_graph]: 4.42e-06 [opt_after_jit_grad]: 0.00059192 [validate]: 5.546e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.789011 [execute]: 1.119e-05 Sums bootstrap : 0.000470s : 0.05% type_inference : 0.054725s : 5.91% event_method : 0.000060s : 0.01% auto_monad : 0.000154s : 0.02% graph_reusing : 0.000009s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000043s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000059s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000047s : 0.01% optimize.rewriter_before_opt_a : 0.000239s : 0.03% optimize.opt_a.expand_dump_flag : 0.000014s : 0.00% optimize.opt_a.switch_simplify : 0.000140s : 0.02% optimize.opt_a.loop_unroll : 0.000115s : 0.01% optimize.opt_a.a_1 : 0.003409s : 0.37% optimize.opt_a.with_stream_mark : 0.000072s : 0.01% optimize.opt_a.recompute_prepare : 0.000053s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000009s : 0.00% optimize.opt_a.a_2 : 0.000454s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000063s : 0.01% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.00% optimize.opt_a.shard_inline : 0.000032s : 0.00% optimize.opt_a.merge_send_recv : 0.000039s : 0.00% optimize.opt_a.auto_parallel : 0.000039s : 0.00% optimize.opt_a.parallel : 0.000045s : 0.00% optimize.opt_a.flash_sp : 0.000023s : 0.00% optimize.opt_a.merge_comm : 0.000019s : 0.00% optimize.opt_a.allreduce_fusion : 
0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000052s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.00% optimize.opt_a.virtual_dataset : 0.000031s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000031s : 0.00% optimize.opt_a.virtual_output : 0.000030s : 0.00% optimize.opt_a.merge_forward : 0.000020s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000044s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000068s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000006s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.062656s : 6.76% optimize.opt_a.flash_sp_send_recv_attached : 0.000010s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000122s : 0.01% optimize.opt_a.a_after_grad : 0.000128s : 0.01% optimize.opt_a.renormalize : 0.009897s : 1.07% optimize.opt_a.add_forward_monad_depend : 0.000025s : 0.00% optimize.opt_a.auto_monad_grad : 0.000013s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000092s : 0.01% optimize.opt_a.cse : 0.000319s : 0.03% optimize.opt_a.a_3 : 0.000475s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000052s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000793s : 0.09% optimize.opt_b.b_1 : 0.000157s : 0.02% optimize.opt_b.b_2 : 0.000028s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% 
optimize.opt_b.cse : 0.000038s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.00% optimize.loop_unroll : 0.000541s : 0.06% optimize.opt_after_cconv.c_1 : 0.000036s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.00% optimize.tuple_transform.d_1 : 0.000053s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.01% optimize.cse_after_recomputation.cse : 0.000019s : 0.00% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000027s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000017s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000592s : 0.06% validate : 0.000055s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.789011s : 85.16% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 
0.001030 159 7.36% : 0.000076s : 7: substitution.arithmetic_simplify 0.25% : 0.000003s : 3: substitution.elim_not_effective 0.59% : 0.000006s : 5: substitution.float_depend_g_call 0.44% : 0.000005s : 2: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 3: substitution.fold_const_symbol 0.71% : 0.000007s : 4: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 60.87% : 0.000627s : 17: substitution.inline 2.77% : 0.000029s : 2: substitution.inline_without_move 1.20% : 0.000012s : 15: substitution.j_node_and_user_rematch 2.02% : 0.000021s : 3: substitution.less_batch_normalization 1.33% : 0.000014s : 7: substitution.minmaximum_grad 0.74% : 0.000008s : 5: substitution.partial_eliminate 1.44% : 0.000015s : 15: substitution.remove_not_recompute_node 3.61% : 0.000037s : 10: substitution.replace_applicator 1.44% : 0.000015s : 10: substitution.replace_old_param 0.40% : 0.000004s : 1: substitution.set_cell_output_no_recompute 2.58% : 0.000027s : 7: substitution.tuple_list_convert_item_index_to_positive 1.22% : 0.000013s : 7: substitution.tuple_list_get_item_const_eliminator 1.56% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 7.05% : 0.000073s : 18: substitution.tuple_list_get_item_eliminator 1.64% : 0.000017s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.054613 2 96.32% : 0.052604s : 1: type_inference.infer 3.68% : 0.002009s : 1: type_inference.specialize ------[replace.] 0.000270 26 62.45% : 0.000169s : 17: replace.inline 37.55% : 0.000101s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000650 26 94.47% : 0.000614s : 17: match.inline 5.53% : 0.000036s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000761 4180 1.10% : 0.000008s : 52: predicate.accumulaten_eliminater 0.38% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.44% : 0.000003s : 21: predicate.addn_check_dump 1.13% : 0.000009s : 52: predicate.addn_zero_filter 1.08% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 2.24% : 0.000017s : 73: predicate.arithmetic_simplify 1.24% : 0.000009s : 52: predicate.cast_eliminate 1.15% : 0.000009s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000004s : 21: predicate.depend_value_elim 1.17% : 0.000009s : 52: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 52: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.33% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 4: predicate.elim_not_effective 0.16% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.29% : 0.000010s : 56: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 56: predicate.environ_get_add_eliminate 1.14% : 0.000009s : 56: predicate.environ_get_depend_swap 1.64% : 0.000012s : 77: predicate.environ_get_eliminate 1.15% : 0.000009s : 56: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.53% : 0.000019s : 78: predicate.float_depend_g_call 0.48% : 0.000004s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.60% : 0.000005s : 21: predicate.get_grad_eliminate 0.09% : 0.000001s : 4: predicate.graph_param_transform 0.46% : 0.000004s : 21: predicate.incorporate_call 0.42% : 0.000003s : 21: predicate.incorporate_call_switch 5.88% : 0.000045s : 180: predicate.inline 1.41% : 0.000011s : 45: predicate.inline_without_move 0.27% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.72% : 0.000006s : 21: predicate.less_batch_normalization 1.52% : 
0.000012s : 69: predicate.list_to_tuple_eliminator_ 2.49% : 0.000019s : 121: predicate.load_eliminater 0.35% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.39% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000004s : 21: predicate.merge_addn 1.09% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.06% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 52: predicate.minmaximum_grad 0.42% : 0.000003s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.15% : 0.000001s : 4: predicate.parallel_virtual_node 2.09% : 0.000016s : 78: predicate.partial_defer_inline 1.60% : 0.000012s : 65: predicate.partial_eliminate 1.13% : 0.000009s : 52: predicate.print_const_string_wrapper 0.54% : 0.000004s : 21: predicate.reduce_all_const_elim 1.35% : 0.000010s : 52: predicate.reduce_eliminate 2.55% : 0.000019s : 121: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 21: predicate.remove_not_recompute_node 1.86% : 0.000014s : 111: predicate.replace_applicator 0.74% : 0.000006s : 45: predicate.replace_old_param 0.06% : 0.000000s : 4: predicate.reset_defer_inline 1.17% : 0.000009s : 52: predicate.reshape_eliminate 1.08% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 4: predicate.row_tensor_eliminate 1.34% : 0.000010s : 50: predicate.same_eliminate 0.37% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.66% : 0.000005s : 21: predicate.shard_identity_eliminate 0.20% : 0.000002s : 8: predicate.special_op_eliminate 0.67% : 0.000005s : 21: predicate.specialize_transform 1.40% : 0.000011s : 50: predicate.split_environ_get_set_with_tuple_value 1.28% : 0.000010s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 78: predicate.switch_defer_inline 2.83% : 0.000022s : 128: predicate.switch_layer_defer_inline 5.06% : 0.000039s : 213: 
predicate.switch_simplify 1.11% : 0.000008s : 52: predicate.tile_eliminate 1.14% : 0.000009s : 52: predicate.transpose_eliminate 1.46% : 0.000011s : 60: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000012s : 60: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000011s : 60: predicate.tuple_list_get_item_depend_reorder 2.94% : 0.000022s : 90: predicate.tuple_list_get_item_eliminator 1.52% : 0.000012s : 60: predicate.tuple_list_get_set_item_eliminator 1.94% : 0.000015s : 81: predicate.tuple_list_set_item_eliminator 1.53% : 0.000012s : 69: predicate.tuple_to_list_eliminator_ 2.47% : 0.000019s : 121: predicate.updatestate_pure_node_eliminater 2.98% : 0.000023s : 142: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 4: predicate.value_based_eliminate 0.56% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.062766 35 98.25% : 0.061665s : 14: func_graph_cloner_run.FuncGraphClonerGraph 1.75% : 0.001101s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.034138 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.39% : 0.004045s : 1: add_attr 0.39% : 0.004031s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000068s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000164s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000005s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.05% : 0.000499s : 1: bootstrap 0.00% : 0.000035s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000015s : 1: environ_conv 0.01% : 0.000070s : 1: event_method 0.00% : 0.000019s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.05% : 0.000553s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.08% : 0.000806s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000025s : 1: opt.transform.mutable_eliminate 0.49% : 0.005067s : 117: opt.transform.opt_a 0.00% : 0.000035s : 1: opt.transform.opt_after_cconv 0.00% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000147s : 28: opt.transform.opt_b 0.01% : 0.000058s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000050s : 4: opt.transform.symbol_engine_opt 7.71% : 0.079781s : 1: opt_a 0.01% : 0.000138s : 1: opt_after_cconv 0.06% : 0.000606s : 1: opt_after_jit_grad 0.03% : 0.000301s : 1: opt_b 8.00% : 0.082712s : 1: optimize 0.00% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000006s : 1: pipeline_split 0.01% : 0.000064s : 1: pre_auto_parallel 0.00% : 0.000052s : 1: py_interpret_to_execute 0.00% : 0.000026s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000024s : 1: remove_dup_value 0.73% : 0.007528s : 2: renormalize.infer 0.23% : 0.002343s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000058s : 1: rewriter_after_opt_a 0.02% : 0.000246s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000113s : 1: symbol_engine_optimizer 76.30% : 0.789038s : 1: task_emit 0.01% : 0.000094s : 1: 
tuple_transform 5.30% : 0.054762s : 1: type_inference 0.01% : 0.000085s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x1-ge],max_mem:4.0M
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x2-pynative],max_mem:4.0M
TotalTime = 0.0300474, [24] [bootstrap]: 0.00071937 [type_inference]: 0.0106234 [event_method]: 1.73e-05 [auto_monad]: 6.224e-05 [graph_reusing]: 4.61002e-06 [inline]: 3.54002e-06 [add_attr]: 0.00437387, [1] [add_attr_with_inline]: 0.00436024, [1] [Cycle 1]: 5.468e-05, [2] [tag_attr]: 2.014e-05 [meta_addattr_fg_expand]: 5.46002e-06 [parallel-infer-symbol]: 3.72002e-06 [pre_auto_parallel]: 3.714e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 1.10001e-06 [dataset_repeat_opt]: 2.33998e-06 [pipeline_split]: 1.86998e-06 [optimize]: 0.00502794, [53] [py_interpret_to_execute]: 2.824e-05 [rewriter_before_opt_a]: 7.447e-05 [opt_a]: 0.00269604, [2] [Cycle 1]: 0.00201674, [45] [expand_dump_flag]: 2.48e-06 [switch_simplify]: 2.944e-05 [loop_unroll]: 2.125e-05 [a_1]: 0.00045383 [with_stream_mark]: 1.186e-05 [recompute_prepare]: 8.67e-06 [updatestate_depend_eliminate]: 3.54002e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 1.07998e-06 [a_2]: 8.071e-05 [accelerated_algorithm]: 7.01999e-06 [shard]: 2.31998e-06 [meta_shard_fg_expand]: 1.36002e-06 [shard_inline]: 6.33e-06 [merge_send_recv]: 9.51e-06 [auto_parallel]: 7.68999e-06 [parallel]: 2.814e-05 [flash_sp]: 8.50001e-06 [merge_comm]: 5.17e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 1.116e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 8.31002e-06 [virtual_dataset]: 6.99001e-06 [get_grad_eliminate_]: 6.38e-06
[virtual_output]: 6.11e-06 [merge_forward]: 2.85998e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 1.155e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.279e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 1.124e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88999e-06 [meta_fg_expand]: 2.56e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 9.97999e-06 [a_after_grad]: 9.97001e-06 [renormalize]: 0.00082923 [add_forward_monad_depend]: 1.353e-05 [auto_monad_grad]: 2.53e-06 [auto_monad_eliminator]: 1.669e-05 [cse]: 3.618e-05 [a_3]: 4.911e-05 [Cycle 2]: 0.00066467, [45] [expand_dump_flag]: 2.28998e-06 [switch_simplify]: 8.28001e-06 [loop_unroll]: 6.06998e-06 [a_1]: 0.00012634 [with_stream_mark]: 1.418e-05 [recompute_prepare]: 6.26e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.49e-06 [a_2]: 7.333e-05 [accelerated_algorithm]: 6.61e-06 [shard]: 1.44e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 6.26998e-06 [merge_send_recv]: 7.10002e-06 [auto_parallel]: 8.27e-06 [parallel]: 6.80998e-06 [flash_sp]: 3.99002e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 7.73001e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 6.72002e-06 [virtual_dataset]: 5.62001e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.22999e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 2.61e-06 [offload_activation]: 8.08001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.144e-05 [merge_recompute_call_nodes]: 1.20001e-06 [before_grad]: 9.56e-06 [set_forward_comm_id_for_comm_node_pass]: 4.4e-06 [meta_fg_expand]: 1.94999e-06 [flash_sp_send_recv_attached]: 1.94e-06 [receive_attached]: 2.05002e-06 [after_resolve]: 9.17999e-06 [a_after_grad]: 8.37e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.14e-06 
[auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 7.11999e-06 [cse]: 2.146e-05 [a_3]: 3.381e-05 [py_interpret_to_execute_after_opt_a]: 1.218e-05 [slice_cell_reuse_recomputed_activation]: 2.06998e-06 [rewriter_after_opt_a]: 3.651e-05 [convert_after_rewriter]: 7.16999e-06 [order_py_execute_after_rewriter]: 5.45001e-06 [mutable_eliminate]: 0.00072042 [opt_b]: 0.00020261, [1] [Cycle 1]: 0.00019554, [7] [b_1]: 0.00011753 [b_2]: 8.23001e-06 [updatestate_depend_eliminate]: 5.84999e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 4.69998e-07 [cse]: 2.076e-05 [optimize_parallel_all_gather_comm]: 1.973e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 2.936e-05 [loop_unroll]: 0.00047123 [opt_after_cconv]: 0.00010511, [1] [Cycle 1]: 9.902e-05, [7] [c_1]: 2.832e-05 [parameter_eliminate]: 3.6e-06 [updatestate_depend_eliminate]: 6.14001e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.39999e-06 [cse]: 1.897e-05 [renormalize]: 6.50005e-07 [remove_dup_value]: 1.766e-05 [tuple_transform]: 8.146e-05, [1] [Cycle 1]: 7.692e-05, [4] [d_1]: 4.489e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 4.69998e-07 [switch_simplify]: 8.26002e-06 [partial_unused_args_eliminate]: 2.19001e-06 [add_recomputation]: 5.13e-05 [cse_after_recomputation]: 2.386e-05, [1] [Cycle 1]: 1.878e-05, [1] [cse]: 1.315e-05 [environ_conv]: 1.231e-05 [swap_dp_allreduce_reducescatter]: 5.66e-06 [bias_add_comm_swap]: 3.11001e-06 [label_micro_interleaved_index]: 5.78002e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 1.12e-06 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.69001e-06 [reorder_send_recv_between_fp_bp]: 3.21999e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.10001e-06 
[interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.57001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.423e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 4.2e-06 [overlap_recompute_and_grad_model_parallel]: 5.48002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39e-06 [overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.50002e-06 [overlap_grad_ring_attention]: 4.68999e-06 [overlap_grad_flash_sp]: 2.059e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.69999e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 8.064e-05, [1] [Cycle 1]: 7.569e-05, [6] [build]: 3.49001e-06 [elim_shapecalc]: 1.057e-05 [elim_not_effective]: 1.306e-05 [opt_reshape]: 7.05e-06 [fold_const_symbol]: 1.099e-05 [renormalize]: 2.50002e-07 [detach_backward]: 2.13998e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 1.776e-05 [get_jit_bprop_graph]: 2.26998e-06 [rewriter_after_jit_bprop_graph]: 0.00016196 [opt_after_jit_grad]: 0.00061383 [validate]: 9.666e-05 [backend_pass]: 1.97999e-06 [task_emit]: 0.00781352 [execute]: 9.69e-06 Sums bootstrap : 0.000719s : 2.95% type_inference : 0.010623s : 43.59% event_method : 0.000017s : 0.07% auto_monad : 0.000062s : 0.26% graph_reusing : 0.000005s : 0.02% inline : 0.000004s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000037s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000028s : 0.12% optimize.rewriter_before_opt_a : 0.000074s : 0.31% optimize.opt_a.expand_dump_flag : 
0.000005s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.15% optimize.opt_a.loop_unroll : 0.000027s : 0.11% optimize.opt_a.a_1 : 0.000580s : 2.38% optimize.opt_a.with_stream_mark : 0.000026s : 0.11% optimize.opt_a.recompute_prepare : 0.000015s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.02% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000154s : 0.63% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000013s : 0.05% optimize.opt_a.merge_send_recv : 0.000017s : 0.07% optimize.opt_a.auto_parallel : 0.000016s : 0.07% optimize.opt_a.parallel : 0.000035s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.05% optimize.opt_a.merge_comm : 0.000009s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.06% optimize.opt_a.virtual_dataset : 0.000013s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000020s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.10% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000021s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.03% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000019s : 0.08% optimize.opt_a.a_after_grad : 0.000018s : 0.08% optimize.opt_a.renormalize : 0.000829s : 3.40% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.10% optimize.opt_a.cse : 0.000058s : 0.24% optimize.opt_a.a_3 : 0.000083s : 0.34% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.15% optimize.convert_after_rewriter : 0.000007s : 0.03% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000720s : 2.96% optimize.opt_b.b_1 : 0.000118s : 0.48% optimize.opt_b.b_2 : 0.000008s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000029s : 0.12% optimize.loop_unroll : 0.000471s : 1.93% optimize.opt_after_cconv.c_1 : 0.000028s : 0.12% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.08% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.07% optimize.tuple_transform.d_1 : 0.000045s : 0.18% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.03% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.21% optimize.cse_after_recomputation.cse : 0.000013s : 0.05% optimize.environ_conv : 0.000012s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000006s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000021s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000162s : 0.66% opt_after_jit_grad : 0.000614s : 2.52% validate : 0.000097s : 0.40% backend_pass : 0.000002s : 0.01% task_emit : 0.007814s : 32.06% execute : 0.000010s : 0.04% Time group info: ------[substitution.] 0.000174 26 23.18% : 0.000040s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 1.21% : 0.000002s : 2: substitution.fold_const_symbol 3.50% : 0.000006s : 3: substitution.graph_param_transform 60.40% : 0.000105s : 3: substitution.inline 2.44% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000005s : 4: substitution.remove_not_recompute_node 1.85% : 0.000003s : 2: substitution.replace_old_param 3.49% : 0.000006s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.010549 2 90.62% : 0.009559s : 1: type_inference.infer 9.38% : 0.000989s : 1: type_inference.specialize ------[replace.] 0.000037 4 77.46% : 0.000029s : 3: replace.inline 22.54% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000108 4 95.08% : 0.000103s : 3: match.inline 4.92% : 0.000005s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000172 883 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.19% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 9: predicate.addn_zero_filter 0.78% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000004s : 15: predicate.arithmetic_simplify 1.07% : 0.000002s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.54% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.58% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_depend_swap 1.83% : 0.000003s : 18: predicate.environ_get_eliminate 1.29% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 13: predicate.exchange_switch_depend_value 1.99% : 0.000003s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.83% : 0.000001s : 6: predicate.get_grad_eliminate 0.20% : 0.000000s : 3: predicate.graph_param_transform 0.60% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 5.92% : 0.000010s : 40: predicate.inline 0.98% : 0.000002s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.00% : 0.000002s : 6: predicate.less_batch_normalization 1.94% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 25: predicate.load_eliminater 1.11% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.99% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 3: predicate.mutable_eliminate 0.33% : 0.000001s : 3: predicate.opt_reshape 0.42% : 0.000001s : 3: predicate.parallel_virtual_node 1.65% : 0.000003s : 13: predicate.partial_defer_inline 1.39% : 0.000002s : 13: predicate.partial_eliminate 0.90% : 0.000002s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 6: predicate.reduce_all_const_elim 1.22% : 0.000002s : 9: predicate.reduce_eliminate 2.37% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.38% : 0.000002s : 16: predicate.replace_applicator 0.47% : 0.000001s : 6: predicate.replace_old_param 0.48% : 0.000001s : 3: predicate.reset_defer_inline 0.90% : 0.000002s : 9: predicate.reshape_eliminate 0.72% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 1.13% : 0.000002s : 6: predicate.special_op_eliminate 0.77% : 0.000001s : 6: predicate.specialize_transform 1.20% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.91% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.71% : 0.000008s : 43: predicate.switch_simplify 0.86% : 
0.000001s : 9: predicate.tile_eliminate 0.88% : 0.000002s : 9: predicate.transpose_eliminate 1.58% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.61% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.55% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.55% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.20% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.92% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 3: predicate.value_based_eliminate 0.78% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000627 8 45.91% : 0.000288s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.09% : 0.000339s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.041397 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.58% : 0.004381s : 1: add_attr 10.54% : 0.004364s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.13% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.17% : 0.000069s : 1: auto_monad 0.05% : 0.000022s : 1: auto_monad_reorder 0.03% : 0.000012s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 1.85% : 0.000764s : 1: bootstrap 0.08% : 0.000033s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000018s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.04% : 0.000016s : 1: environ_conv 0.06% : 0.000023s : 1: event_method 0.04% : 0.000017s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000007s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000009s : 1: label_micro_interleaved_index 1.16% : 0.000480s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 1.76% : 0.000730s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000015s : 1: opt.transform.mutable_eliminate 2.34% : 0.000968s : 78: opt.transform.opt_a 0.06% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.23% : 0.000094s : 28: opt.transform.opt_b 0.12% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000037s : 4: opt.transform.symbol_engine_opt 6.52% : 0.002700s : 1: opt_a 0.26% : 0.000108s : 1: opt_after_cconv 1.52% : 0.000629s : 1: opt_after_jit_grad 0.50% : 0.000206s : 1: opt_b 12.16% : 0.005034s : 1: optimize 0.06% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000005s : 1: parallel-infer-symbol-second 0.01% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000041s : 1: pre_auto_parallel 0.08% : 0.000033s : 1: py_interpret_to_execute 0.04% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000022s : 1: remove_dup_value 1.00% : 0.000413s : 1: renormalize.infer 0.99% : 0.000408s : 1: renormalize.specialize 0.02% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.41% : 0.000169s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000041s : 1: rewriter_after_opt_a 0.19% : 0.000079s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.20% : 0.000084s : 1: symbol_engine_optimizer 18.92% : 0.007834s : 1: task_emit 0.20% : 0.000085s : 1: 
tuple_transform 25.71% : 0.010645s : 1: type_inference 0.73% : 0.000302s : 1: validate TotalTime = 0.168753, [24] [bootstrap]: 0.0653553 [type_inference]: 0.00675747 [event_method]: 1.583e-05 [auto_monad]: 9.111e-05 [graph_reusing]: 6.68e-06 [inline]: 2.87002e-06 [add_attr]: 0.00378478, [1] [add_attr_with_inline]: 0.00377339, [1] [Cycle 1]: 6.862e-05, [2] [tag_attr]: 1.812e-05 [meta_addattr_fg_expand]: 4.34002e-06 [parallel-infer-symbol]: 3.53999e-06 [pre_auto_parallel]: 3.394e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 2.12001e-06 [optimize]: 0.00520009, [53] [py_interpret_to_execute]: 2.494e-05 [rewriter_before_opt_a]: 6.37e-05 [opt_a]: 0.00265462, [2] [Cycle 1]: 0.00198266, [45] [expand_dump_flag]: 3.14999e-06 [switch_simplify]: 3.196e-05 [loop_unroll]: 1.772e-05 [a_1]: 0.00048229 [with_stream_mark]: 2.235e-05 [recompute_prepare]: 8.72e-06 [updatestate_depend_eliminate]: 4.60001e-06 [updatestate_assign_eliminate]: 3.42002e-06 [updatestate_loads_eliminate]: 3.26001e-06 [parameter_eliminate]: 2.02999e-06 [a_2]: 8.718e-05 [accelerated_algorithm]: 7.38999e-06 [shard]: 2.84999e-06 [meta_shard_fg_expand]: 2.30002e-06 [shard_inline]: 6.51999e-06 [merge_send_recv]: 8.93002e-06 [auto_parallel]: 7.58001e-06 [parallel]: 2.12e-05 [flash_sp]: 1.018e-05 [merge_comm]: 4.23001e-06 [allreduce_fusion]: 4.09002e-06 [matmul_add_comm_reduction]: 1.022e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 8.12e-06 [virtual_dataset]: 6.23998e-06 [get_grad_eliminate_]: 6.07999e-06 [virtual_output]: 6.29001e-06 [merge_forward]: 4.39002e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.12e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.306e-05 [merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 1.066e-05 [set_forward_comm_id_for_comm_node_pass]: 3.81001e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 2.64001e-06 [receive_attached]: 2.39999e-06 
[after_resolve]: 1.106e-05 [a_after_grad]: 8.92999e-06 [renormalize]: 0.00076522 [add_forward_monad_depend]: 5.35999e-06 [auto_monad_grad]: 2.93e-06 [auto_monad_eliminator]: 1.628e-05 [cse]: 3.65e-05 [a_3]: 4.858e-05 [Cycle 2]: 0.00065935, [45] [expand_dump_flag]: 1.45999e-06 [switch_simplify]: 7.31999e-06 [loop_unroll]: 5.81998e-06 [a_1]: 0.00012524 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 6.34001e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 3.13998e-06 [parameter_eliminate]: 1.17e-06 [a_2]: 7.375e-05 [accelerated_algorithm]: 5.98998e-06 [shard]: 1.76998e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 6.46999e-06 [merge_send_recv]: 5.86e-06 [auto_parallel]: 7.46999e-06 [parallel]: 5.44998e-06 [flash_sp]: 3.92002e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.2e-06 [matmul_add_comm_reduction]: 6.63998e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 7.09001e-06 [virtual_dataset]: 6.09999e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.46e-06 [merge_forward]: 3.08e-06 [cell_reuse_recompute_pass]: 2.09e-06 [offload_activation]: 1.233e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.184e-05 [merge_recompute_call_nodes]: 9.89996e-07 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 4.63999e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.46002e-06 [after_resolve]: 9.84001e-06 [a_after_grad]: 8.54002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.44e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 8.05e-06 [cse]: 1.607e-05 [a_3]: 3.402e-05 [py_interpret_to_execute_after_opt_a]: 1.194e-05 [slice_cell_reuse_recomputed_activation]: 2.53003e-06 [rewriter_after_opt_a]: 3.972e-05 [convert_after_rewriter]: 7.09001e-06 [order_py_execute_after_rewriter]: 5.40999e-06 [mutable_eliminate]: 0.0007533 [opt_b]: 0.00020941, [1] [Cycle 1]: 0.00020071, [7] [b_1]: 0.000118 
[b_2]: 7.71999e-06 [updatestate_depend_eliminate]: 7.59002e-06 [updatestate_assign_eliminate]: 2.77002e-06 [updatestate_loads_eliminate]: 3.49001e-06 [renormalize]: 1.00999e-06 [cse]: 2.383e-05 [optimize_parallel_all_gather_comm]: 1.889e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 3.213e-05 [loop_unroll]: 0.00059462 [opt_after_cconv]: 0.00012145, [1] [Cycle 1]: 0.00011317, [7] [c_1]: 3.154e-05 [parameter_eliminate]: 4.65999e-06 [updatestate_depend_eliminate]: 7.95998e-06 [updatestate_assign_eliminate]: 3.04001e-06 [updatestate_loads_eliminate]: 2.86e-06 [cse]: 2.319e-05 [renormalize]: 5.8001e-07 [remove_dup_value]: 1.871e-05 [tuple_transform]: 8.742e-05, [1] [Cycle 1]: 8.203e-05, [4] [d_1]: 5.013e-05 [none_parameter_eliminate]: 1.94e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 8.52e-06 [partial_unused_args_eliminate]: 2.16003e-06 [add_recomputation]: 5.699e-05 [cse_after_recomputation]: 2.688e-05, [1] [Cycle 1]: 2.161e-05, [1] [cse]: 1.406e-05 [environ_conv]: 7.00998e-06 [swap_dp_allreduce_reducescatter]: 6.57002e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 6.26e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.91e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 3.06999e-06 [assign_add_opt]: 1.64998e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 8.29983e-07 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.31002e-06 [interleave_parallel_branches]: 1.34e-06 [overlap_opt_shard_in_pipeline]: 1.30999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.58e-05 [grouped_pairwise_exchange_alltoall]: 2.31998e-06 [offloading_packed_experts]: 5.39e-06 [overlap_recompute_and_grad_model_parallel]: 5.57001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 5.09e-06 [overlap_grad_flash_sp]: 2.233e-05 [begin_end_overlap_inline]: 6.19999e-07 [split_matmul_comm_elemetwise]: 2.56e-06 [split_layernorm_comm]: 2.39999e-06 [handle_group_info]: 1.43002e-06 [symbol_engine_optimizer]: 8.707e-05, [1] [Cycle 1]: 8.189e-05, [6] [build]: 3.96001e-06 [elim_shapecalc]: 1.098e-05 [elim_not_effective]: 1.524e-05 [opt_reshape]: 8.10999e-06 [fold_const_symbol]: 1.126e-05 [renormalize]: 2.50002e-07 [detach_backward]: 2.66e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 1.826e-05 [get_jit_bprop_graph]: 2.24001e-06 [rewriter_after_jit_bprop_graph]: 6.22001e-06 [opt_after_jit_grad]: 0.00061371 [validate]: 5.043e-05 [backend_pass]: 1.23002e-06 [task_emit]: 0.0865065 [execute]: 1.113e-05 Sums bootstrap : 0.065355s : 39.90% type_inference : 0.006757s : 4.13% event_method : 0.000016s : 0.01% auto_monad : 0.000091s : 0.06% graph_reusing : 0.000007s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000034s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000025s : 0.02% optimize.rewriter_before_opt_a : 0.000064s : 0.04% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.02% optimize.opt_a.loop_unroll : 0.000024s : 0.01% optimize.opt_a.a_1 : 0.000608s : 0.37% optimize.opt_a.with_stream_mark : 0.000036s : 0.02% optimize.opt_a.recompute_prepare : 0.000015s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000161s : 0.10% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.01% optimize.opt_a.merge_send_recv : 0.000015s : 0.01% optimize.opt_a.auto_parallel : 0.000015s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000014s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.01% optimize.opt_a.virtual_output : 0.000012s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000024s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000765s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.01% optimize.opt_a.cse : 0.000053s : 0.03% optimize.opt_a.a_3 : 0.000083s : 0.05% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000753s : 0.46% optimize.opt_b.b_1 : 0.000118s : 0.07% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000024s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000032s : 0.02% optimize.loop_unroll : 0.000595s : 0.36% optimize.opt_after_cconv.c_1 : 0.000032s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000023s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.01% optimize.tuple_transform.d_1 : 0.000050s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.03% optimize.cse_after_recomputation.cse : 0.000014s : 0.01% optimize.environ_conv : 0.000007s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000614s : 0.37% validate : 0.000050s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.086506s : 52.81% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 0.000188 24 19.36% : 0.000036s : 4: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000006s : 3: substitution.graph_param_transform 68.73% : 0.000129s : 3: substitution.inline 1.97% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000005s : 4: substitution.remove_not_recompute_node 2.16% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006694 2 91.47% : 0.006123s : 1: type_inference.infer 8.53% : 0.000571s : 1: type_inference.specialize ------[replace.] 0.000035 3 100.00% : 0.000035s : 3: replace.inline ------[match.] 0.000127 3 100.00% : 0.000127s : 3: match.inline ------[predicate.] 
0.000165 815 0.90% : 0.000001s : 8: predicate.accumulaten_eliminater 1.31% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 1.06% : 0.000002s : 8: predicate.addn_zero_filter 0.72% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.42% : 0.000004s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.86% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.56% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 3: predicate.elim_not_effective 0.49% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 11: predicate.environ_get_depend_swap 1.62% : 0.000003s : 17: predicate.environ_get_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.13% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.07% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 1.08% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.79% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.67% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.22% : 0.000010s : 37: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 6: predicate.less_batch_normalization 1.55% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000004s : 22: predicate.load_eliminater 1.27% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.86% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 8: predicate.minmaximum_grad 2.12% : 0.000004s : 3: predicate.mutable_eliminate 0.45% : 0.000001s : 3: predicate.opt_reshape 0.46% : 0.000001s : 3: predicate.parallel_virtual_node 1.49% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 11: predicate.partial_eliminate 0.79% : 0.000001s : 8: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 8: predicate.reduce_eliminate 2.12% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 14: predicate.replace_applicator 0.80% : 0.000001s : 6: predicate.replace_old_param 0.45% : 0.000001s : 3: predicate.reset_defer_inline 0.83% : 0.000001s : 8: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.99% : 0.000002s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 6: predicate.shard_identity_eliminate 0.87% : 0.000001s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.18% : 0.000002s : 11: predicate.switch_defer_inline 1.73% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.78% : 0.000008s : 38: predicate.switch_simplify 0.78% : 
0.000001s : 8: predicate.tile_eliminate 0.90% : 0.000001s : 8: predicate.transpose_eliminate 1.48% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.70% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.57% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.17% : 0.000004s : 22: predicate.updatestate_pure_node_eliminater 2.89% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.76% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.70% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000390 7 31.45% : 0.000123s : 2: func_graph_cloner_run.FuncGraphClonerGraph 68.55% : 0.000267s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.179667 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.11% : 0.003791s : 1: add_attr 2.10% : 0.003777s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000062s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.06% : 0.000100s : 1: auto_monad 0.01% : 0.000023s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 36.40% : 0.065392s : 1: bootstrap 0.02% : 0.000036s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000031s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.01% : 0.000023s : 1: event_method 0.01% : 0.000019s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000011s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000005s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000010s : 1: label_micro_interleaved_index 0.34% : 0.000606s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.43% : 0.000766s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000019s : 1: opt.transform.mutable_eliminate 0.55% : 0.000996s : 78: opt.transform.opt_a 0.02% : 0.000030s : 1: opt.transform.opt_after_cconv 0.02% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000095s : 28: opt.transform.opt_b 0.03% : 0.000056s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000041s : 4: opt.transform.symbol_engine_opt 1.48% : 0.002658s : 1: opt_a 0.07% : 0.000127s : 1: opt_after_cconv 0.35% : 0.000628s : 1: opt_after_jit_grad 0.12% : 0.000213s : 1: opt_b 2.90% : 0.005206s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000038s : 1: pre_auto_parallel 0.02% : 0.000029s : 1: py_interpret_to_execute 0.01% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000023s : 1: remove_dup_value 0.24% : 0.000428s : 1: renormalize.infer 0.18% : 0.000328s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000044s : 1: rewriter_after_opt_a 0.04% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000006s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000090s : 1: symbol_engine_optimizer 48.16% : 0.086529s : 1: task_emit 0.05% : 0.000091s : 1: 
tuple_transform 3.78% : 0.006786s : 1: type_inference 0.05% : 0.000092s : 1: validate TotalTime = 0.136523, [24] [bootstrap]: 0.00052279 [type_inference]: 0.0483783 [event_method]: 1.904e-05 [auto_monad]: 6.838e-05 [graph_reusing]: 5.95002e-06 [inline]: 3.19001e-06 [add_attr]: 0.00449921, [1] [add_attr_with_inline]: 0.00448238, [1] [Cycle 1]: 8.439e-05, [2] [tag_attr]: 2.39e-05 [meta_addattr_fg_expand]: 6.11e-06 [parallel-infer-symbol]: 3.91999e-06 [pre_auto_parallel]: 4.161e-05 [insert-virtual-dataset]: 3.18e-06 [parallel-infer-symbol-second]: 9.79984e-07 [dataset_repeat_opt]: 2.41998e-06 [pipeline_split]: 2.24001e-06 [optimize]: 0.00619802, [53] [py_interpret_to_execute]: 3.326e-05 [rewriter_before_opt_a]: 8.388e-05 [opt_a]: 0.00324137, [2] [Cycle 1]: 0.00246262, [45] [expand_dump_flag]: 3.39001e-06 [switch_simplify]: 3.819e-05 [loop_unroll]: 2.256e-05 [a_1]: 0.00054332 [with_stream_mark]: 2.219e-05 [recompute_prepare]: 1.417e-05 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 2.53e-06 [a_2]: 9.123e-05 [accelerated_algorithm]: 9.13002e-06 [shard]: 2.75002e-06 [meta_shard_fg_expand]: 2.48998e-06 [shard_inline]: 6.66e-06 [merge_send_recv]: 1.148e-05 [auto_parallel]: 1.206e-05 [parallel]: 2.344e-05 [flash_sp]: 1.2e-05 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 4.06001e-06 [matmul_add_comm_reduction]: 1.147e-05 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 1.409e-05 [virtual_dataset]: 7.87e-06 [get_grad_eliminate_]: 7.19001e-06 [virtual_output]: 6.54999e-06 [merge_forward]: 5.02e-06 [cell_reuse_recompute_pass]: 1.60001e-06 [offload_activation]: 1.163e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.599e-05 [merge_recompute_call_nodes]: 1.62999e-06 [before_grad]: 1.281e-05 [set_forward_comm_id_for_comm_node_pass]: 4.60999e-06 [meta_fg_expand]: 4.28001e-06 [flash_sp_send_recv_attached]: 3.76001e-06 [receive_attached]: 2.58e-06 
[after_resolve]: 1.588e-05 [a_after_grad]: 1.096e-05 [renormalize]: 0.00098145 [add_forward_monad_depend]: 1.078e-05 [auto_monad_grad]: 2.70002e-06 [auto_monad_eliminator]: 2.131e-05 [cse]: 3.538e-05 [a_3]: 5.918e-05 [Cycle 2]: 0.00076216, [45] [expand_dump_flag]: 2.96001e-06 [switch_simplify]: 8.45001e-06 [loop_unroll]: 7.1e-06 [a_1]: 0.0001432 [with_stream_mark]: 1.856e-05 [recompute_prepare]: 6.78e-06 [updatestate_depend_eliminate]: 4.05e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 3.75e-06 [parameter_eliminate]: 2.04999e-06 [a_2]: 7.779e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 2.88e-06 [shard_inline]: 6.98998e-06 [merge_send_recv]: 8.85999e-06 [auto_parallel]: 1.016e-05 [parallel]: 1.03e-05 [flash_sp]: 4.82e-06 [merge_comm]: 4.07998e-06 [allreduce_fusion]: 3.9e-06 [matmul_add_comm_reduction]: 9.37001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.77e-06 [virtual_dataset]: 6.21e-06 [get_grad_eliminate_]: 5.75001e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 4.64002e-06 [cell_reuse_recompute_pass]: 2.80002e-06 [offload_activation]: 1.082e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.271e-05 [merge_recompute_call_nodes]: 1.20999e-06 [before_grad]: 1.067e-05 [set_forward_comm_id_for_comm_node_pass]: 4.95999e-06 [meta_fg_expand]: 3.01999e-06 [flash_sp_send_recv_attached]: 1.88002e-06 [receive_attached]: 2.11e-06 [after_resolve]: 1.212e-05 [a_after_grad]: 8.80001e-06 [renormalize]: 1.60013e-07 [add_forward_monad_depend]: 2.83e-06 [auto_monad_grad]: 1.99999e-06 [auto_monad_eliminator]: 1.105e-05 [cse]: 3.235e-05 [a_3]: 3.548e-05 [py_interpret_to_execute_after_opt_a]: 1.912e-05 [slice_cell_reuse_recomputed_activation]: 2.33998e-06 [rewriter_after_opt_a]: 4.626e-05 [convert_after_rewriter]: 7.65e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00088557 [opt_b]: 0.00024706, [1] [Cycle 1]: 0.00023611, [7] [b_1]: 0.00013109 
[b_2]: 8.69003e-06 [updatestate_depend_eliminate]: 1.072e-05 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 3.14001e-06 [renormalize]: 1.12e-06 [cse]: 3.646e-05 [optimize_parallel_all_gather_comm]: 2.609e-05 [overlap_param_gather]: 2.34999e-06 [cconv]: 3.662e-05 [loop_unroll]: 0.00074497 [opt_after_cconv]: 0.00013294, [1] [Cycle 1]: 0.0001232, [7] [c_1]: 3.29e-05 [parameter_eliminate]: 6.21e-06 [updatestate_depend_eliminate]: 8.70999e-06 [updatestate_assign_eliminate]: 3.13998e-06 [updatestate_loads_eliminate]: 2.64999e-06 [cse]: 2.953e-05 [renormalize]: 5.50004e-07 [remove_dup_value]: 1.9e-05 [tuple_transform]: 8.632e-05, [1] [Cycle 1]: 8.114e-05, [4] [d_1]: 5.179e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.03998e-06 [partial_unused_args_eliminate]: 2.33998e-06 [add_recomputation]: 5.644e-05 [cse_after_recomputation]: 2.633e-05, [1] [Cycle 1]: 2.083e-05, [1] [cse]: 1.385e-05 [environ_conv]: 7.73001e-06 [swap_dp_allreduce_reducescatter]: 5.53002e-06 [bias_add_comm_swap]: 2.99999e-06 [label_micro_interleaved_index]: 7.66001e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.64e-06 [slice_recompute_activation]: 2.76999e-06 [micro_interleaved_order_control]: 2.83e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 1.20001e-06 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.40997e-06 [reorder_send_recv_between_fp_bp]: 2.98e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.11997e-06 [overlap_opt_shard_in_pipeline]: 1.40001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.66e-05 [grouped_pairwise_exchange_alltoall]: 1.87001e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 5.34998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 4.38001e-06 [overlap_grad_flash_sp]: 2.477e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.57001e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.31002e-06 [symbol_engine_optimizer]: 9.457e-05, [1] [Cycle 1]: 8.763e-05, [6] [build]: 5.10999e-06 [elim_shapecalc]: 1.588e-05 [elim_not_effective]: 1.44e-05 [opt_reshape]: 8.33999e-06 [fold_const_symbol]: 1.05e-05 [renormalize]: 5.10016e-07 [detach_backward]: 2.97002e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 2.011e-05 [get_jit_bprop_graph]: 2.48e-06 [rewriter_after_jit_bprop_graph]: 5.73002e-06 [opt_after_jit_grad]: 0.00077304 [validate]: 5.715e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.0755997 [execute]: 9.46e-06 Sums bootstrap : 0.000523s : 0.40% type_inference : 0.048378s : 37.02% event_method : 0.000019s : 0.01% auto_monad : 0.000068s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000024s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000006s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000042s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000033s : 0.03% optimize.rewriter_before_opt_a : 0.000084s : 0.06% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000047s : 0.04% optimize.opt_a.loop_unroll : 0.000030s : 0.02% optimize.opt_a.a_1 : 0.000687s : 0.53% optimize.opt_a.with_stream_mark : 0.000041s : 0.03% optimize.opt_a.recompute_prepare : 0.000021s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000169s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.01% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00% optimize.opt_a.shard_inline : 0.000014s : 0.01% optimize.opt_a.merge_send_recv : 0.000020s : 0.02% optimize.opt_a.auto_parallel : 0.000022s : 0.02% optimize.opt_a.parallel : 0.000034s : 0.03% optimize.opt_a.flash_sp : 0.000017s : 0.01% optimize.opt_a.merge_comm : 0.000009s : 0.01% optimize.opt_a.allreduce_fusion : 0.000008s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000021s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000022s : 0.02% optimize.opt_a.virtual_dataset : 0.000014s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000013s : 0.01% optimize.opt_a.virtual_output : 0.000012s : 0.01% optimize.opt_a.merge_forward : 0.000010s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000022s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000029s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000023s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000010s : 0.01% optimize.opt_a.meta_fg_expand : 0.000007s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000028s : 0.02% optimize.opt_a.a_after_grad : 0.000020s : 0.02% optimize.opt_a.renormalize : 0.000982s : 0.75% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000032s : 0.02% optimize.opt_a.cse : 0.000068s : 0.05% optimize.opt_a.a_3 : 0.000095s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000886s : 0.68% optimize.opt_b.b_1 : 0.000131s : 0.10% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000036s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000037s : 0.03% optimize.loop_unroll : 0.000745s : 0.57% optimize.opt_after_cconv.c_1 : 0.000033s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.01% optimize.tuple_transform.d_1 : 0.000052s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.04% optimize.cse_after_recomputation.cse : 0.000014s : 0.01% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000001s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000773s : 0.59% validate : 0.000057s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.075600s : 57.85% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000253 26 19.29% : 0.000049s : 5: substitution.arithmetic_simplify 1.01% : 0.000003s : 2: substitution.elim_not_effective 0.64% : 0.000002s : 2: substitution.fold_const_symbol 2.80% : 0.000007s : 3: substitution.graph_param_transform 64.21% : 0.000162s : 3: substitution.inline 2.26% : 0.000006s : 4: substitution.j_node_and_user_rematch 2.36% : 0.000006s : 4: substitution.remove_not_recompute_node 2.90% : 0.000007s : 2: substitution.replace_old_param 4.53% : 0.000011s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.048310 2 98.36% : 0.047520s : 1: type_inference.infer 1.64% : 0.000790s : 1: type_inference.specialize ------[replace.] 0.000048 4 78.97% : 0.000038s : 3: replace.inline 21.03% : 0.000010s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000170 4 94.07% : 0.000160s : 3: match.inline 5.93% : 0.000010s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000197 883 0.76% : 0.000001s : 9: predicate.accumulaten_eliminater 1.76% : 0.000003s : 3: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 6: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000002s : 9: predicate.adjust_all_reduce_mul_add 2.50% : 0.000005s : 15: predicate.arithmetic_simplify 1.05% : 0.000002s : 9: predicate.cast_eliminate 0.60% : 0.000001s : 6: predicate.check_bprop_eliminate 0.47% : 0.000001s : 6: predicate.compare_switch_simplify 0.15% : 0.000000s : 3: predicate.const_output_eliminate 0.78% : 0.000002s : 6: predicate.depend_value_elim 0.77% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.29% : 0.000001s : 3: predicate.elim_not_effective 0.65% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.25% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_depend_swap 1.90% : 0.000004s : 18: predicate.environ_get_eliminate 1.00% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.21% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.13% : 0.000004s : 13: predicate.float_depend_g_call 0.52% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 3: predicate.fold_const_symbol 0.62% : 0.000001s : 6: predicate.get_grad_eliminate 0.31% : 0.000001s : 3: predicate.graph_param_transform 0.59% : 0.000001s : 6: predicate.incorporate_call 0.47% : 0.000001s : 6: predicate.incorporate_call_switch 6.09% : 0.000012s : 40: predicate.inline 1.24% : 0.000002s : 6: predicate.inline_without_move 0.34% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.84% : 0.000002s : 6: predicate.less_batch_normalization 1.56% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.27% : 0.000004s : 25: predicate.load_eliminater 1.58% : 0.000003s : 3: predicate.loop_unroll_after_grad 1.97% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.49% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.92% : 0.000002s : 6: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.66% : 0.000003s : 3: predicate.mutable_eliminate 0.45% : 0.000001s : 3: predicate.opt_reshape 0.49% : 0.000001s : 3: predicate.parallel_virtual_node 1.40% : 0.000003s : 13: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.98% : 0.000002s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 6: predicate.reduce_all_const_elim 1.30% : 0.000003s : 9: predicate.reduce_eliminate 2.27% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 6: predicate.remove_not_recompute_node 1.17% : 0.000002s : 16: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.35% : 0.000001s : 3: predicate.reset_defer_inline 0.97% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.58% : 0.000001s : 3: predicate.row_tensor_eliminate 0.71% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.17% : 0.000002s : 6: predicate.shard_identity_eliminate 0.80% : 0.000002s : 6: predicate.special_op_eliminate 0.69% : 0.000001s : 6: predicate.specialize_transform 1.21% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.13% : 0.000002s : 13: predicate.switch_defer_inline 1.63% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.34% : 0.000009s : 43: predicate.switch_simplify 0.90% : 
0.000002s : 9: predicate.tile_eliminate 0.76% : 0.000002s : 9: predicate.transpose_eliminate 1.32% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.62% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.68% : 0.000007s : 22: predicate.tuple_list_get_item_eliminator 1.50% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.55% : 0.000005s : 21: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.13% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.93% : 0.000006s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.80% : 0.000002s : 6: predicate.virtual_dataset_eliminate 0.61% : 0.000001s : 6: predicate.virtual_output_eliminate 0.26% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000565 8 45.02% : 0.000254s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.98% : 0.000310s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.149516 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.01% : 0.004507s : 1: add_attr 3.00% : 0.004487s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000075s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.38% : 0.000568s : 1: bootstrap 0.03% : 0.000040s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000030s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.02% : 0.000027s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.01% : 0.000009s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000011s : 1: label_micro_interleaved_index 0.51% : 0.000760s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.60% : 0.000904s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000024s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000027s : 1: opt.transform.mutable_eliminate 0.75% : 0.001127s : 78: opt.transform.opt_a 0.02% : 0.000031s : 1: opt.transform.opt_after_cconv 0.02% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000103s : 28: opt.transform.opt_b 0.04% : 0.000056s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000044s : 4: opt.transform.symbol_engine_opt 2.17% : 0.003245s : 1: opt_a 0.09% : 0.000139s : 1: opt_after_cconv 0.53% : 0.000792s : 1: opt_after_jit_grad 0.17% : 0.000252s : 1: opt_b 4.15% : 0.006205s : 1: optimize 0.02% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000047s : 1: pre_auto_parallel 0.03% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000023s : 1: remove_dup_value 0.35% : 0.000524s : 1: renormalize.infer 0.30% : 0.000444s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.06% : 0.000089s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000097s : 1: symbol_engine_optimizer 50.58% : 0.075618s : 1: task_emit 0.06% : 0.000089s : 1: 
tuple_transform 32.38% : 0.048407s : 1: type_inference 0.07% : 0.000108s : 1: validate TotalTime = 0.157709, [24] [bootstrap]: 0.00052945 [type_inference]: 0.0451341 [event_method]: 6.163e-05 [auto_monad]: 0.00015104 [graph_reusing]: 8.97999e-06 [inline]: 2.58e-06 [add_attr]: 0.00358084, [1] [add_attr_with_inline]: 0.00356903, [1] [Cycle 1]: 8.808e-05, [2] [tag_attr]: 4.119e-05 [meta_addattr_fg_expand]: 1.178e-05 [parallel-infer-symbol]: 3.15998e-06 [pre_auto_parallel]: 5.781e-05 [insert-virtual-dataset]: 2.92002e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.07999e-06 [pipeline_split]: 2.05002e-06 [optimize]: 0.0787542, [53] [py_interpret_to_execute]: 4.3e-05 [rewriter_before_opt_a]: 0.00017444 [opt_a]: 0.0760118, [3] [Cycle 1]: 0.0546376, [45] [expand_dump_flag]: 5.69999e-06 [switch_simplify]: 7.966e-05 [loop_unroll]: 6.524e-05 [a_1]: 0.00157525 [with_stream_mark]: 2.704e-05 [recompute_prepare]: 2.522e-05 [updatestate_depend_eliminate]: 9.89001e-06 [updatestate_assign_eliminate]: 8.00999e-06 [updatestate_loads_eliminate]: 7.23e-06 [parameter_eliminate]: 3.01999e-06 [a_2]: 0.00026338 [accelerated_algorithm]: 3.437e-05 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 4.82998e-06 [shard_inline]: 1.713e-05 [merge_send_recv]: 1.704e-05 [auto_parallel]: 1.43e-05 [parallel]: 2.086e-05 [flash_sp]: 1.387e-05 [merge_comm]: 4.551e-05 [allreduce_fusion]: 1.068e-05 [matmul_add_comm_reduction]: 3.722e-05 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 4.565e-05 [virtual_dataset]: 1.83e-05 [get_grad_eliminate_]: 1.72e-05 [virtual_output]: 1.619e-05 [merge_forward]: 1.167e-05 [cell_reuse_recompute_pass]: 3.18998e-06 [offload_activation]: 2.154e-05 [cell_reuse_handle_not_recompute_node_pass]: 4.324e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 3.022e-05 [set_forward_comm_id_for_comm_node_pass]: 1.114e-05 [meta_fg_expand]: 0.0130877 [flash_sp_send_recv_attached]: 5.94e-06 [receive_attached]: 3.33998e-06 [after_resolve]: 
8.491e-05 [a_after_grad]: 9.963e-05 [renormalize]: 0.0187489 [add_forward_monad_depend]: 1.771e-05 [auto_monad_grad]: 7.19001e-06 [auto_monad_eliminator]: 6.515e-05 [cse]: 0.00022619 [a_3]: 0.00037735 [Cycle 2]: 0.00366372, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 5.027e-05 [loop_unroll]: 4.355e-05 [a_1]: 0.00150918 [with_stream_mark]: 2.348e-05 [recompute_prepare]: 1.389e-05 [updatestate_depend_eliminate]: 5.64e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 3.46999e-06 [parameter_eliminate]: 2.39001e-06 [a_2]: 0.0001007 [accelerated_algorithm]: 1.502e-05 [shard]: 2.36998e-06 [meta_shard_fg_expand]: 2.88e-06 [shard_inline]: 7.92998e-06 [merge_send_recv]: 1.118e-05 [auto_parallel]: 1.171e-05 [parallel]: 1.067e-05 [flash_sp]: 5.05999e-06 [merge_comm]: 4.71002e-06 [allreduce_fusion]: 4.10998e-06 [matmul_add_comm_reduction]: 1.187e-05 [allreduce_slice_to_reducescatter]: 1.11002e-06 [virtual_shard_identity]: 1.086e-05 [virtual_dataset]: 7.31999e-06 [get_grad_eliminate_]: 7.08998e-06 [virtual_output]: 6.57002e-06 [merge_forward]: 4.70999e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 1.27e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.58e-05 [merge_recompute_call_nodes]: 1.72999e-06 [before_grad]: 1.331e-05 [set_forward_comm_id_for_comm_node_pass]: 4.4e-06 [meta_fg_expand]: 0.00014152 [flash_sp_send_recv_attached]: 2.36998e-06 [receive_attached]: 2.74001e-06 [after_resolve]: 2.003e-05 [a_after_grad]: 1.173e-05 [renormalize]: 0.00109451 [add_forward_monad_depend]: 9.11002e-06 [auto_monad_grad]: 2.69001e-06 [auto_monad_eliminator]: 2.074e-05 [cse]: 3.932e-05 [a_3]: 6.139e-05 [Cycle 3]: 0.0176879, [45] [expand_dump_flag]: 0.016632 [switch_simplify]: 4.848e-05 [loop_unroll]: 1.037e-05 [a_1]: 0.0002031 [with_stream_mark]: 3.132e-05 [recompute_prepare]: 8.34998e-06 [updatestate_depend_eliminate]: 6.44001e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 3.75e-06 [parameter_eliminate]: 
2.89001e-06 [a_2]: 0.00010372 [accelerated_algorithm]: 1.384e-05 [shard]: 2.63998e-06 [meta_shard_fg_expand]: 4.1e-06 [shard_inline]: 7.82e-06 [merge_send_recv]: 1.14e-05 [auto_parallel]: 1.129e-05 [parallel]: 1.304e-05 [flash_sp]: 2.07001e-06 [merge_comm]: 4.86002e-06 [allreduce_fusion]: 4.16001e-06 [matmul_add_comm_reduction]: 1.442e-05 [allreduce_slice_to_reducescatter]: 1.46002e-06 [virtual_shard_identity]: 8.80999e-06 [virtual_dataset]: 7.31001e-06 [get_grad_eliminate_]: 7.02002e-06 [virtual_output]: 8e-06 [merge_forward]: 5.74999e-06 [cell_reuse_recompute_pass]: 3.65e-06 [offload_activation]: 1.356e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.634e-05 [merge_recompute_call_nodes]: 1.89999e-06 [before_grad]: 1.264e-05 [set_forward_comm_id_for_comm_node_pass]: 4.67e-06 [meta_fg_expand]: 3.73001e-06 [flash_sp_send_recv_attached]: 1.79e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 1.358e-05 [a_after_grad]: 1.062e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 3.21001e-06 [auto_monad_grad]: 3.77002e-06 [auto_monad_eliminator]: 2.07e-05 [cse]: 4.43e-05 [a_3]: 4.659e-05 [py_interpret_to_execute_after_opt_a]: 2.298e-05 [slice_cell_reuse_recomputed_activation]: 2.32001e-06 [rewriter_after_opt_a]: 5.431e-05 [convert_after_rewriter]: 7.42998e-06 [order_py_execute_after_rewriter]: 5.40999e-06 [mutable_eliminate]: 0.00077821 [opt_b]: 0.00024566, [1] [Cycle 1]: 0.00023586, [7] [b_1]: 0.00014533 [b_2]: 9.78002e-06 [updatestate_depend_eliminate]: 7.48e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 3.03e-06 [renormalize]: 6.19999e-07 [cse]: 2.797e-05 [optimize_parallel_all_gather_comm]: 2.069e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 3.695e-05 [loop_unroll]: 0.00052358 [opt_after_cconv]: 0.00012952, [1] [Cycle 1]: 0.00012201, [7] [c_1]: 3.779e-05 [parameter_eliminate]: 3.63e-06 [updatestate_depend_eliminate]: 7.18998e-06 [updatestate_assign_eliminate]: 3.88999e-06 [updatestate_loads_eliminate]: 3.81001e-06 [cse]: 
2.591e-05 [renormalize]: 7.09988e-07 [remove_dup_value]: 1.926e-05 [tuple_transform]: 9.036e-05, [1] [Cycle 1]: 8.506e-05, [4] [d_1]: 5.37e-05 [none_parameter_eliminate]: 1.77999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 8.23001e-06 [partial_unused_args_eliminate]: 2.46e-06 [add_recomputation]: 6.2e-05 [cse_after_recomputation]: 2.843e-05, [1] [Cycle 1]: 2.34e-05, [1] [cse]: 1.671e-05 [environ_conv]: 1.114e-05 [swap_dp_allreduce_reducescatter]: 6.52001e-06 [bias_add_comm_swap]: 3.63999e-06 [label_micro_interleaved_index]: 5.15001e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.54998e-06 [slice_recompute_activation]: 2.88e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 1.26997e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.32e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.30999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.73e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.78001e-06 [overlap_recompute_and_grad_model_parallel]: 5.27999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.479e-05 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.73e-06 [overlap_grad_ring_attention]: 6.21e-06 [overlap_grad_flash_sp]: 2.622e-05 [begin_end_overlap_inline]: 5.90022e-07 [split_matmul_comm_elemetwise]: 2.47001e-06 [split_layernorm_comm]: 2.16e-06 [handle_group_info]: 1.20999e-06 [symbol_engine_optimizer]: 9.766e-05, [1] [Cycle 1]: 9.194e-05, [6] [build]: 1.181e-05 [elim_shapecalc]: 1.341e-05 [elim_not_effective]: 1.544e-05 [opt_reshape]: 8.37998e-06 [fold_const_symbol]: 1.204e-05 [renormalize]: 2.3999e-07 [detach_backward]: 2.58e-06 
[pipeline_parallel_scheduler]: 1.87999e-06 [auto_monad_reorder]: 2.224e-05 [get_jit_bprop_graph]: 1.74e-06 [rewriter_after_jit_bprop_graph]: 6.13002e-06 [opt_after_jit_grad]: 0.00053694 [validate]: 5.698e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.0284896 [execute]: 1.255e-05 Sums bootstrap : 0.000529s : 0.40% type_inference : 0.045134s : 33.83% event_method : 0.000062s : 0.05% auto_monad : 0.000151s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000041s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000012s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000058s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000043s : 0.03% optimize.rewriter_before_opt_a : 0.000174s : 0.13% optimize.opt_a.expand_dump_flag : 0.016641s : 12.47% optimize.opt_a.switch_simplify : 0.000178s : 0.13% optimize.opt_a.loop_unroll : 0.000119s : 0.09% optimize.opt_a.a_1 : 0.003288s : 2.46% optimize.opt_a.with_stream_mark : 0.000082s : 0.06% optimize.opt_a.recompute_prepare : 0.000047s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.01% optimize.opt_a.parameter_eliminate : 0.000008s : 0.01% optimize.opt_a.a_2 : 0.000468s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000063s : 0.05% optimize.opt_a.shard : 0.000007s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000012s : 0.01% optimize.opt_a.shard_inline : 0.000033s : 0.02% optimize.opt_a.merge_send_recv : 0.000040s : 0.03% optimize.opt_a.auto_parallel : 0.000037s : 0.03% optimize.opt_a.parallel : 0.000045s : 0.03% optimize.opt_a.flash_sp : 0.000021s : 0.02% optimize.opt_a.merge_comm : 0.000055s : 0.04% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000064s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000065s : 0.05% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000031s : 0.02% optimize.opt_a.virtual_output : 0.000031s : 0.02% optimize.opt_a.merge_forward : 0.000022s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000008s : 0.01% optimize.opt_a.offload_activation : 0.000048s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000075s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.013233s : 9.92% optimize.opt_a.flash_sp_send_recv_attached : 0.000010s : 0.01% optimize.opt_a.receive_attached : 0.000009s : 0.01% optimize.opt_a.after_resolve : 0.000119s : 0.09% optimize.opt_a.a_after_grad : 0.000122s : 0.09% optimize.opt_a.renormalize : 0.019844s : 14.87% optimize.opt_a.add_forward_monad_depend : 0.000030s : 0.02% optimize.opt_a.auto_monad_grad : 0.000014s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000107s : 0.08% optimize.opt_a.cse : 0.000310s : 0.23% optimize.opt_a.a_3 : 0.000485s : 0.36% optimize.py_interpret_to_execute_after_opt_a : 0.000023s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000054s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000778s : 0.58% optimize.opt_b.b_1 : 0.000145s : 0.11% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000028s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000037s : 0.03% optimize.loop_unroll : 0.000524s : 0.39% optimize.opt_after_cconv.c_1 : 0.000038s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000026s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.01% optimize.tuple_transform.d_1 : 0.000054s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.05% optimize.cse_after_recomputation.cse : 0.000017s : 0.01% optimize.environ_conv : 0.000011s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000015s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000537s : 0.40% validate : 0.000057s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.028490s : 21.35% execute : 0.000013s : 0.01% Time group info: ------[substitution.] 
0.000925 161 8.32% : 0.000077s : 8: substitution.arithmetic_simplify 0.28% : 0.000003s : 3: substitution.elim_not_effective 0.54% : 0.000005s : 5: substitution.float_depend_g_call 0.47% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.22% : 0.000002s : 3: substitution.fold_const_symbol 0.81% : 0.000007s : 4: substitution.graph_param_transform 0.39% : 0.000004s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 58.86% : 0.000544s : 17: substitution.inline 2.47% : 0.000023s : 2: substitution.inline_without_move 1.38% : 0.000013s : 15: substitution.j_node_and_user_rematch 2.24% : 0.000021s : 3: substitution.less_batch_normalization 1.37% : 0.000013s : 7: substitution.minmaximum_grad 0.83% : 0.000008s : 5: substitution.partial_eliminate 1.61% : 0.000015s : 15: substitution.remove_not_recompute_node 3.78% : 0.000035s : 10: substitution.replace_applicator 1.47% : 0.000014s : 10: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.75% : 0.000025s : 7: substitution.tuple_list_convert_item_index_to_positive 1.24% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.77% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 6.84% : 0.000063s : 19: substitution.tuple_list_get_item_eliminator 1.78% : 0.000017s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.045031 2 95.38% : 0.042951s : 1: type_inference.infer 4.62% : 0.002080s : 1: type_inference.specialize ------[replace.] 0.000242 27 63.59% : 0.000154s : 17: replace.inline 36.41% : 0.000088s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000566 27 94.24% : 0.000533s : 17: match.inline 5.76% : 0.000033s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000769 4248 1.10% : 0.000008s : 53: predicate.accumulaten_eliminater 0.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.44% : 0.000003s : 21: predicate.addn_check_dump 1.08% : 0.000008s : 53: predicate.addn_zero_filter 1.02% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.12% : 0.000016s : 74: predicate.arithmetic_simplify 1.17% : 0.000009s : 53: predicate.cast_eliminate 1.24% : 0.000010s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000004s : 21: predicate.depend_value_elim 1.13% : 0.000009s : 53: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.16% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 57: predicate.environ_add_const_eliminate 1.11% : 0.000009s : 57: predicate.environ_get_add_eliminate 1.14% : 0.000009s : 57: predicate.environ_get_depend_swap 1.59% : 0.000012s : 78: predicate.environ_get_eliminate 1.10% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.70% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.70% : 0.000021s : 80: predicate.float_depend_g_call 0.47% : 0.000004s : 21: predicate.float_environ_get_switch 0.62% : 0.000005s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000001s : 4: predicate.graph_param_transform 0.50% : 0.000004s : 21: predicate.incorporate_call 0.43% : 0.000003s : 21: predicate.incorporate_call_switch 5.88% : 0.000045s : 183: predicate.inline 1.38% : 0.000011s : 45: predicate.inline_without_move 0.26% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.80% : 0.000006s : 21: predicate.less_batch_normalization 1.56% : 
0.000012s : 71: predicate.list_to_tuple_eliminator_ 2.48% : 0.000019s : 124: predicate.load_eliminater 0.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.42% : 0.000019s : 113: predicate.loop_unroll_before_grad 1.36% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 21: predicate.merge_addn 1.09% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.06% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.07% : 0.000008s : 53: predicate.minmaximum_grad 0.35% : 0.000003s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.13% : 0.000001s : 4: predicate.parallel_virtual_node 2.05% : 0.000016s : 80: predicate.partial_defer_inline 1.62% : 0.000012s : 67: predicate.partial_eliminate 1.20% : 0.000009s : 53: predicate.print_const_string_wrapper 0.52% : 0.000004s : 21: predicate.reduce_all_const_elim 1.34% : 0.000010s : 53: predicate.reduce_eliminate 2.59% : 0.000020s : 124: predicate.redundant_stop_gradient_eliminater 0.40% : 0.000003s : 21: predicate.remove_not_recompute_node 1.75% : 0.000013s : 113: predicate.replace_applicator 0.71% : 0.000005s : 45: predicate.replace_old_param 0.11% : 0.000001s : 4: predicate.reset_defer_inline 1.15% : 0.000009s : 53: predicate.reshape_eliminate 1.09% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.37% : 0.000011s : 50: predicate.same_eliminate 0.31% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.78% : 0.000006s : 21: predicate.shard_identity_eliminate 0.24% : 0.000002s : 8: predicate.special_op_eliminate 0.57% : 0.000004s : 21: predicate.specialize_transform 1.45% : 0.000011s : 50: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 80: predicate.switch_defer_inline 2.90% : 0.000022s : 130: predicate.switch_layer_defer_inline 5.82% : 0.000045s : 218: 
predicate.switch_simplify 1.14% : 0.000009s : 53: predicate.tile_eliminate 1.14% : 0.000009s : 53: predicate.transpose_eliminate 1.39% : 0.000011s : 61: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 61: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000011s : 61: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000022s : 92: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 61: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000016s : 82: predicate.tuple_list_set_item_eliminator 1.55% : 0.000012s : 71: predicate.tuple_to_list_eliminator_ 2.47% : 0.000019s : 124: predicate.updatestate_pure_node_eliminater 3.07% : 0.000024s : 145: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 4: predicate.value_based_eliminate 0.53% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002396 36 57.41% : 0.001376s : 15: func_graph_cloner_run.FuncGraphClonerGraph 42.59% : 0.001021s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.265098 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.35% : 0.003587s : 1: add_attr 1.35% : 0.003573s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000067s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.06% : 0.000159s : 1: auto_monad 0.01% : 0.000027s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.21% : 0.000567s : 1: bootstrap 0.02% : 0.000041s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000032s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000015s : 1: environ_conv 0.03% : 0.000072s : 1: event_method 0.01% : 0.000020s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.20% : 0.000534s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.30% : 0.000791s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000019s : 1: opt.transform.mutable_eliminate 1.89% : 0.005017s : 117: opt.transform.opt_a 0.01% : 0.000035s : 1: opt.transform.opt_after_cconv 0.01% : 0.000027s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000123s : 28: opt.transform.opt_b 0.02% : 0.000059s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000045s : 4: opt.transform.symbol_engine_opt 28.67% : 0.076015s : 1: opt_a 0.05% : 0.000133s : 1: opt_after_cconv 0.21% : 0.000549s : 1: opt_after_jit_grad 0.09% : 0.000250s : 1: opt_b 29.71% : 0.078760s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000030s : 1: overlap_grad_flash_sp 0.01% : 0.000019s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000063s : 1: pre_auto_parallel 0.02% : 0.000047s : 1: py_interpret_to_execute 0.01% : 0.000027s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000005s : 1: remove_cast_before_assign_add 0.01% : 0.000023s : 1: remove_dup_value 2.56% : 0.006781s : 2: renormalize.infer 4.92% : 0.013040s : 2: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000059s : 1: rewriter_after_opt_a 0.07% : 0.000180s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000101s : 1: symbol_engine_optimizer 10.76% : 0.028517s : 1: task_emit 0.04% : 0.000094s : 1: 
tuple_transform 17.04% : 0.045163s : 1: type_inference 0.04% : 0.000100s : 1: validate TotalTime = 0.0841586, [24] [bootstrap]: 0.00047046 [type_inference]: 0.0331976 [event_method]: 1.754e-05 [auto_monad]: 7.059e-05 [graph_reusing]: 6.56e-06 [inline]: 3.28e-06 [add_attr]: 0.00513509, [1] [add_attr_with_inline]: 0.00512138, [1] [Cycle 1]: 7.3e-05, [2] [tag_attr]: 1.948e-05 [meta_addattr_fg_expand]: 4.88001e-06 [parallel-infer-symbol]: 4.43999e-06 [pre_auto_parallel]: 3.548e-05 [insert-virtual-dataset]: 2.68998e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.0189136, [53] [py_interpret_to_execute]: 2.738e-05 [rewriter_before_opt_a]: 6.692e-05 [opt_a]: 0.0164158, [2] [Cycle 1]: 0.014465, [45] [expand_dump_flag]: 3.15002e-06 [switch_simplify]: 3.304e-05 [loop_unroll]: 2.01e-05 [a_1]: 0.00044343 [with_stream_mark]: 2.279e-05 [recompute_prepare]: 1.138e-05 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 2.06998e-06 [a_2]: 9.022e-05 [accelerated_algorithm]: 7.85e-06 [shard]: 2.94001e-06 [meta_shard_fg_expand]: 2.36998e-06 [shard_inline]: 7.39002e-06 [merge_send_recv]: 9.51e-06 [auto_parallel]: 8.42e-06 [parallel]: 2.219e-05 [flash_sp]: 1.014e-05 [merge_comm]: 4.60001e-06 [allreduce_fusion]: 4.15e-06 [matmul_add_comm_reduction]: 1.096e-05 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 9.30001e-06 [virtual_dataset]: 6.535e-05 [get_grad_eliminate_]: 7.08998e-06 [virtual_output]: 6.78998e-06 [merge_forward]: 4.60999e-06 [cell_reuse_recompute_pass]: 1.74998e-06 [offload_activation]: 1.311e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.474e-05 [merge_recompute_call_nodes]: 2.18998e-06 [before_grad]: 1.33e-05 [set_forward_comm_id_for_comm_node_pass]: 4.79e-06 [meta_fg_expand]: 3.66001e-06 [flash_sp_send_recv_attached]: 4.02e-06 [receive_attached]: 2.74999e-06 [after_resolve]: 
1.223e-05 [a_after_grad]: 1.166e-05 [renormalize]: 0.0131079 [add_forward_monad_depend]: 1.203e-05 [auto_monad_grad]: 3.11001e-06 [auto_monad_eliminator]: 2.324e-05 [cse]: 3.508e-05 [a_3]: 6.32e-05 [Cycle 2]: 0.0019352, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 9.34e-06 [loop_unroll]: 7.38e-06 [a_1]: 0.00015319 [with_stream_mark]: 1.87e-05 [recompute_prepare]: 7.11001e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 4.14002e-06 [parameter_eliminate]: 2.37999e-06 [a_2]: 0.00119058 [accelerated_algorithm]: 9.71e-06 [shard]: 3.16001e-06 [meta_shard_fg_expand]: 2.44999e-06 [shard_inline]: 7.15e-06 [merge_send_recv]: 1.371e-05 [auto_parallel]: 1.172e-05 [parallel]: 1.07e-05 [flash_sp]: 5.09e-06 [merge_comm]: 3.96001e-06 [allreduce_fusion]: 3.88001e-06 [matmul_add_comm_reduction]: 1.098e-05 [allreduce_slice_to_reducescatter]: 1.13001e-06 [virtual_shard_identity]: 7.93999e-06 [virtual_dataset]: 6.31998e-06 [get_grad_eliminate_]: 5.91998e-06 [virtual_output]: 5.77999e-06 [merge_forward]: 5.72999e-06 [cell_reuse_recompute_pass]: 4.15e-06 [offload_activation]: 1.213e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.592e-05 [merge_recompute_call_nodes]: 1.61998e-06 [before_grad]: 1.168e-05 [set_forward_comm_id_for_comm_node_pass]: 4.78001e-06 [meta_fg_expand]: 2.74001e-06 [flash_sp_send_recv_attached]: 1.89999e-06 [receive_attached]: 2.48e-06 [after_resolve]: 1.269e-05 [a_after_grad]: 9.72999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 4.13999e-06 [auto_monad_grad]: 2.30002e-06 [auto_monad_eliminator]: 1.424e-05 [cse]: 2.364e-05 [a_3]: 3.592e-05 [py_interpret_to_execute_after_opt_a]: 1.854e-05 [slice_cell_reuse_recomputed_activation]: 2.61999e-06 [rewriter_after_opt_a]: 4.327e-05 [convert_after_rewriter]: 7.48e-06 [order_py_execute_after_rewriter]: 6.23002e-06 [mutable_eliminate]: 0.00080407 [opt_b]: 0.00022621, [1] [Cycle 1]: 0.00021635, [7] [b_1]: 0.00012458 [b_2]: 8.17e-06 
[updatestate_depend_eliminate]: 9.99001e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 3.21001e-06 [renormalize]: 1.07998e-06 [cse]: 2.69e-05 [optimize_parallel_all_gather_comm]: 2.015e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 3.496e-05 [loop_unroll]: 0.00050637 [opt_after_cconv]: 0.00010934, [1] [Cycle 1]: 0.00010208, [7] [c_1]: 2.934e-05 [parameter_eliminate]: 4.17e-06 [updatestate_depend_eliminate]: 5.96e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.48002e-06 [cse]: 1.949e-05 [renormalize]: 8.70001e-07 [remove_dup_value]: 1.698e-05 [tuple_transform]: 8.022e-05, [1] [Cycle 1]: 7.435e-05, [4] [d_1]: 4.476e-05 [none_parameter_eliminate]: 1.77999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.24001e-06 [partial_unused_args_eliminate]: 2.36e-06 [add_recomputation]: 5.25e-05 [cse_after_recomputation]: 2.348e-05, [1] [Cycle 1]: 1.791e-05, [1] [cse]: 1.145e-05 [environ_conv]: 6.44999e-06 [swap_dp_allreduce_reducescatter]: 5.55001e-06 [bias_add_comm_swap]: 3.11999e-06 [label_micro_interleaved_index]: 5.43002e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.63e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 1.09998e-06 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 1.17999e-06 [add_comm_op_reuse_tag]: 1.07998e-06 [interleave_split_concat_branches]: 1.40999e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.55001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.448e-05 [grouped_pairwise_exchange_alltoall]: 2.02001e-06 [offloading_packed_experts]: 4.33999e-06 [overlap_recompute_and_grad_model_parallel]: 5.04e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.42999e-06 [overlap_recompute_comm]: 2.66999e-06 [overlap_grad_ring_attention]: 4.82e-06 [overlap_grad_flash_sp]: 2.308e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.59999e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 8.305e-05, [1] [Cycle 1]: 7.756e-05, [6] [build]: 3.45e-06 [elim_shapecalc]: 1.118e-05 [elim_not_effective]: 1.345e-05 [opt_reshape]: 7.31999e-06 [fold_const_symbol]: 1.115e-05 [renormalize]: 2.3999e-07 [detach_backward]: 2.84999e-06 [pipeline_parallel_scheduler]: 1.84998e-06 [auto_monad_reorder]: 1.803e-05 [get_jit_bprop_graph]: 1.80001e-06 [rewriter_after_jit_bprop_graph]: 4.35999e-06 [opt_after_jit_grad]: 0.00053636 [validate]: 4.729e-05 [backend_pass]: 1.40001e-06 [task_emit]: 0.0253557 [execute]: 9.54999e-06 Sums bootstrap : 0.000470s : 0.61% type_inference : 0.033198s : 42.71% event_method : 0.000018s : 0.02% auto_monad : 0.000071s : 0.09% graph_reusing : 0.000007s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000035s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000027s : 0.04% optimize.rewriter_before_opt_a : 0.000067s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000042s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000597s : 0.77% optimize.opt_a.with_stream_mark : 0.000041s : 0.05% optimize.opt_a.recompute_prepare : 0.000018s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000011s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000008s : 0.01% 
optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.001281s : 1.65% optimize.opt_a.accelerated_algorithm : 0.000018s : 0.02% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.01% optimize.opt_a.shard_inline : 0.000015s : 0.02% optimize.opt_a.merge_send_recv : 0.000023s : 0.03% optimize.opt_a.auto_parallel : 0.000020s : 0.03% optimize.opt_a.parallel : 0.000033s : 0.04% optimize.opt_a.flash_sp : 0.000015s : 0.02% optimize.opt_a.merge_comm : 0.000009s : 0.01% optimize.opt_a.allreduce_fusion : 0.000008s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000022s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.02% optimize.opt_a.virtual_dataset : 0.000072s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000013s : 0.02% optimize.opt_a.virtual_output : 0.000013s : 0.02% optimize.opt_a.merge_forward : 0.000010s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.01% optimize.opt_a.offload_activation : 0.000025s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000031s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000025s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000010s : 0.01% optimize.opt_a.meta_fg_expand : 0.000006s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000025s : 0.03% optimize.opt_a.a_after_grad : 0.000021s : 0.03% optimize.opt_a.renormalize : 0.013108s : 16.86% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.02% optimize.opt_a.auto_monad_grad : 0.000005s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000037s : 0.05% optimize.opt_a.cse : 0.000059s : 0.08% optimize.opt_a.a_3 : 0.000099s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.02% 
optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000804s : 1.03% optimize.opt_b.b_1 : 0.000125s : 0.16% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000027s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.04% optimize.loop_unroll : 0.000506s : 0.65% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000536s : 0.69% validate : 0.000047s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.025356s : 32.62% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000205 24 22.81% : 0.000047s : 4: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000002s : 2: substitution.fold_const_symbol 3.23% : 0.000007s : 3: substitution.graph_param_transform 63.31% : 0.000130s : 3: substitution.inline 2.98% : 0.000006s : 4: substitution.j_node_and_user_rematch 2.95% : 0.000006s : 4: substitution.remove_not_recompute_node 2.83% : 0.000006s : 2: substitution.replace_old_param ------[type_inference.] 0.033129 2 97.90% : 0.032433s : 1: type_inference.infer 2.10% : 0.000696s : 1: type_inference.specialize ------[replace.] 0.000038 3 100.00% : 0.000038s : 3: replace.inline ------[match.] 0.000127 3 100.00% : 0.000127s : 3: match.inline ------[predicate.] 
0.000183 815 0.86% : 0.000002s : 8: predicate.accumulaten_eliminater 0.92% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 1.05% : 0.000002s : 8: predicate.addn_zero_filter 0.83% : 0.000002s : 8: predicate.adjust_all_reduce_mul_add 2.81% : 0.000005s : 14: predicate.arithmetic_simplify 1.06% : 0.000002s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.82% : 0.000002s : 6: predicate.depend_value_elim 0.77% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.97% : 0.000002s : 8: predicate.dict_set_item_eliminator 1.38% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.20% : 0.000000s : 3: predicate.elim_not_effective 0.50% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000002s : 11: predicate.environ_add_const_eliminate 0.98% : 0.000002s : 11: predicate.environ_get_add_eliminate 0.95% : 0.000002s : 11: predicate.environ_get_depend_swap 1.57% : 0.000003s : 17: predicate.environ_get_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.12% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.08% : 0.000004s : 11: predicate.float_depend_g_call 0.55% : 0.000001s : 6: predicate.float_environ_get_switch 1.34% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.82% : 0.000002s : 6: predicate.get_grad_eliminate 0.54% : 0.000001s : 3: predicate.graph_param_transform 0.77% : 0.000001s : 6: predicate.incorporate_call 0.52% : 0.000001s : 6: predicate.incorporate_call_switch 6.56% : 0.000012s : 37: predicate.inline 1.36% : 0.000002s : 6: predicate.inline_without_move 0.35% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 6: predicate.less_batch_normalization 1.81% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.01% : 0.000004s : 22: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.84% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.28% : 0.000002s : 3: predicate.mutable_eliminate 0.52% : 0.000001s : 3: predicate.opt_reshape 0.47% : 0.000001s : 3: predicate.parallel_virtual_node 1.58% : 0.000003s : 11: predicate.partial_defer_inline 1.11% : 0.000002s : 11: predicate.partial_eliminate 0.89% : 0.000002s : 8: predicate.print_const_string_wrapper 0.73% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 8: predicate.reduce_eliminate 2.09% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 6: predicate.remove_not_recompute_node 1.11% : 0.000002s : 14: predicate.replace_applicator 0.67% : 0.000001s : 6: predicate.replace_old_param 0.38% : 0.000001s : 3: predicate.reset_defer_inline 0.99% : 0.000002s : 8: predicate.reshape_eliminate 0.74% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 3: predicate.row_tensor_eliminate 1.03% : 0.000002s : 6: predicate.same_eliminate 0.40% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.93% : 0.000002s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.96% : 0.000002s : 6: predicate.specialize_transform 1.10% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.63% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.11% : 0.000002s : 11: predicate.switch_defer_inline 1.65% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.43% : 0.000008s : 38: predicate.switch_simplify 0.95% : 
0.000002s : 8: predicate.tile_eliminate 0.89% : 0.000002s : 8: predicate.transpose_eliminate 1.48% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.56% : 0.000003s : 14: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator 1.47% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.47% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.00% : 0.000004s : 22: predicate.updatestate_pure_node_eliminater 2.69% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.28% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.70% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000492 7 25.54% : 0.000126s : 2: func_graph_cloner_run.FuncGraphClonerGraph 74.46% : 0.000366s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.122568 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.20% : 0.005143s : 1: add_attr 4.18% : 0.005126s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000057s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.06% : 0.000077s : 1: auto_monad 0.02% : 0.000022s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000499s : 1: bootstrap 0.03% : 0.000040s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000027s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.02% : 0.000026s : 1: event_method 0.06% : 0.000068s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000011s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000007s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.42% : 0.000517s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.67% : 0.000820s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000019s : 1: opt.transform.mutable_eliminate 0.90% : 0.001101s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000098s : 28: opt.transform.opt_b 0.04% : 0.000049s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000039s : 4: opt.transform.symbol_engine_opt 13.40% : 0.016420s : 1: opt_a 0.09% : 0.000113s : 1: opt_after_cconv 0.45% : 0.000549s : 1: opt_after_jit_grad 0.19% : 0.000231s : 1: opt_b 15.44% : 0.018920s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000040s : 1: pre_auto_parallel 0.03% : 0.000032s : 1: py_interpret_to_execute 0.02% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000021s : 1: remove_dup_value 10.31% : 0.012635s : 1: renormalize.infer 0.37% : 0.000455s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000048s : 1: rewriter_after_opt_a 0.06% : 0.000072s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000086s : 1: symbol_engine_optimizer 20.70% : 0.025376s : 1: task_emit 0.07% : 0.000084s : 1: 
tuple_transform 27.11% : 0.033229s : 1: type_inference 0.07% : 0.000087s : 1: validate TotalTime = 0.145649, [24] [bootstrap]: 0.00051359 [type_inference]: 0.0554042 [event_method]: 5.55e-05 [auto_monad]: 0.00015011 [graph_reusing]: 9.44e-06 [inline]: 3.06999e-06 [add_attr]: 0.00963818, [1] [add_attr_with_inline]: 0.00962391, [1] [Cycle 1]: 0.00010304, [2] [tag_attr]: 4.162e-05 [meta_addattr_fg_expand]: 1.104e-05 [parallel-infer-symbol]: 4.77e-06 [pre_auto_parallel]: 5.463e-05 [insert-virtual-dataset]: 3.03e-06 [parallel-infer-symbol-second]: 9.40025e-07 [dataset_repeat_opt]: 2.46e-06 [pipeline_split]: 2.02999e-06 [optimize]: 0.0705734, [53] [py_interpret_to_execute]: 4.119e-05 [rewriter_before_opt_a]: 0.00017037 [opt_a]: 0.0633818, [3] [Cycle 1]: 0.046349, [45] [expand_dump_flag]: 7.34002e-06 [switch_simplify]: 7.777e-05 [loop_unroll]: 6.218e-05 [a_1]: 0.00157453 [with_stream_mark]: 3.038e-05 [recompute_prepare]: 2.762e-05 [updatestate_depend_eliminate]: 9.76003e-06 [updatestate_assign_eliminate]: 7.56001e-06 [updatestate_loads_eliminate]: 7.08e-06 [parameter_eliminate]: 3.50003e-06 [a_2]: 0.00025754 [accelerated_algorithm]: 3.735e-05 [shard]: 2.51e-06 [meta_shard_fg_expand]: 4.23001e-06 [shard_inline]: 1.767e-05 [merge_send_recv]: 1.774e-05 [auto_parallel]: 1.521e-05 [parallel]: 2.133e-05 [flash_sp]: 1.705e-05 [merge_comm]: 1.01e-05 [allreduce_fusion]: 9.63002e-06 [matmul_add_comm_reduction]: 3.225e-05 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 2.171e-05 [virtual_dataset]: 1.627e-05 [get_grad_eliminate_]: 1.611e-05 [virtual_output]: 1.555e-05 [merge_forward]: 9.81e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 1.96e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.441e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 3.077e-05 [set_forward_comm_id_for_comm_node_pass]: 1.102e-05 [meta_fg_expand]: 0.0164721 [flash_sp_send_recv_attached]: 6.02999e-06 [receive_attached]: 2.69001e-06 [after_resolve]: 
8.805e-05 [a_after_grad]: 0.00010409 [renormalize]: 0.0238162 [add_forward_monad_depend]: 1.599e-05 [auto_monad_grad]: 7.63001e-06 [auto_monad_eliminator]: 6.548e-05 [cse]: 0.00254335 [a_3]: 0.00037602 [Cycle 2]: 0.0161096, [45] [expand_dump_flag]: 3.68e-06 [switch_simplify]: 4.956e-05 [loop_unroll]: 4.366e-05 [a_1]: 0.00352988 [with_stream_mark]: 2.494e-05 [recompute_prepare]: 1.386e-05 [updatestate_depend_eliminate]: 5.80002e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 3.57002e-06 [parameter_eliminate]: 2.91e-06 [a_2]: 0.000101 [accelerated_algorithm]: 1.393e-05 [shard]: 2.30002e-06 [meta_shard_fg_expand]: 2.78e-06 [shard_inline]: 7.83999e-06 [merge_send_recv]: 1.074e-05 [auto_parallel]: 1.158e-05 [parallel]: 1.064e-05 [flash_sp]: 4.99e-06 [merge_comm]: 4.40999e-06 [allreduce_fusion]: 4.24002e-06 [matmul_add_comm_reduction]: 1.116e-05 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 1.022e-05 [virtual_dataset]: 7.49002e-06 [get_grad_eliminate_]: 6.73e-06 [virtual_output]: 6.58e-06 [merge_forward]: 5.71998e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 1.225e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.652e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 1.348e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22999e-06 [meta_fg_expand]: 0.00011094 [flash_sp_send_recv_attached]: 2.26998e-06 [receive_attached]: 2.78998e-06 [after_resolve]: 2.095e-05 [a_after_grad]: 1.25e-05 [renormalize]: 0.0115004 [add_forward_monad_depend]: 1.213e-05 [auto_monad_grad]: 2.84999e-06 [auto_monad_eliminator]: 2.687e-05 [cse]: 4.756e-05 [a_3]: 7.435e-05 [Cycle 3]: 0.00090046, [45] [expand_dump_flag]: 2.84001e-06 [switch_simplify]: 1.204e-05 [loop_unroll]: 7.42998e-06 [a_1]: 0.00019286 [with_stream_mark]: 1.7e-05 [recompute_prepare]: 7.96001e-06 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 
1.96e-06 [a_2]: 9.542e-05 [accelerated_algorithm]: 1.366e-05 [shard]: 2.34001e-06 [meta_shard_fg_expand]: 2.46998e-06 [shard_inline]: 7.93001e-06 [merge_send_recv]: 1.142e-05 [auto_parallel]: 1.195e-05 [parallel]: 1.197e-05 [flash_sp]: 1.97001e-06 [merge_comm]: 4.99003e-06 [allreduce_fusion]: 4.36002e-06 [matmul_add_comm_reduction]: 1.356e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.209e-05 [virtual_dataset]: 7.31001e-06 [get_grad_eliminate_]: 7.62998e-06 [virtual_output]: 6.96001e-06 [merge_forward]: 5.97999e-06 [cell_reuse_recompute_pass]: 3.33998e-06 [offload_activation]: 1.195e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.744e-05 [merge_recompute_call_nodes]: 1.82001e-06 [before_grad]: 1.442e-05 [set_forward_comm_id_for_comm_node_pass]: 5.37999e-06 [meta_fg_expand]: 3.21999e-06 [flash_sp_send_recv_attached]: 2.11003e-06 [receive_attached]: 3.56001e-06 [after_resolve]: 1.365e-05 [a_after_grad]: 1.066e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 2.20002e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.219e-05 [cse]: 2.535e-05 [a_3]: 4.475e-05 [py_interpret_to_execute_after_opt_a]: 2.217e-05 [slice_cell_reuse_recomputed_activation]: 2.56e-06 [rewriter_after_opt_a]: 5.64e-05 [convert_after_rewriter]: 1.084e-05 [order_py_execute_after_rewriter]: 6.14999e-06 [mutable_eliminate]: 0.00296981 [opt_b]: 0.0002751, [1] [Cycle 1]: 0.00026388, [7] [b_1]: 0.00015072 [b_2]: 1.098e-05 [updatestate_depend_eliminate]: 1.134e-05 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 4.02e-06 [renormalize]: 6.10016e-07 [cse]: 4.089e-05 [optimize_parallel_all_gather_comm]: 2.587e-05 [overlap_param_gather]: 2.31e-06 [cconv]: 3.338e-05 [loop_unroll]: 0.00055912 [opt_after_cconv]: 0.0022406, [1] [Cycle 1]: 0.00223098, [7] [c_1]: 3.68e-05 [parameter_eliminate]: 4.68999e-06 [updatestate_depend_eliminate]: 8.90999e-06 [updatestate_assign_eliminate]: 3.5e-06 [updatestate_loads_eliminate]: 3.4e-06 [cse]: 
0.00208093 [renormalize]: 9.30013e-07 [remove_dup_value]: 2.11e-05 [tuple_transform]: 0.00011817, [1] [Cycle 1]: 0.00010948, [4] [d_1]: 7.089e-05 [none_parameter_eliminate]: 4.57e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 1.013e-05 [partial_unused_args_eliminate]: 2.16998e-06 [add_recomputation]: 7.733e-05 [cse_after_recomputation]: 3.247e-05, [1] [Cycle 1]: 2.716e-05, [1] [cse]: 2.018e-05 [environ_conv]: 1.154e-05 [swap_dp_allreduce_reducescatter]: 7.38e-06 [bias_add_comm_swap]: 4.13001e-06 [label_micro_interleaved_index]: 8.18001e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.89e-06 [slice_recompute_activation]: 2.41998e-06 [micro_interleaved_order_control]: 2.68e-06 [assign_add_opt]: 1.72001e-06 [ForceFp32Comm]: 9.49978e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 3.36999e-06 [comm_op_add_attrs]: 1.22999e-06 [add_comm_op_reuse_tag]: 1.38002e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.18001e-06 [overlap_opt_shard_in_pipeline]: 1.26002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.19999e-06 [control_data_broadcast_order]: 1.853e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 5.29e-06 [overlap_recompute_and_grad_model_parallel]: 5.67999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.56998e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 5.74e-06 [overlap_grad_flash_sp]: 2.632e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.79999e-06 [split_layernorm_comm]: 1.95001e-06 [handle_group_info]: 1.14998e-06 [symbol_engine_optimizer]: 0.00011015, [1] [Cycle 1]: 0.00010462, [6] [build]: 1.196e-05 [elim_shapecalc]: 1.796e-05 [elim_not_effective]: 1.785e-05 [opt_reshape]: 8.32998e-06 [fold_const_symbol]: 1.318e-05 [renormalize]: 2.30008e-07 [detach_backward]: 2.84999e-06 
[pipeline_parallel_scheduler]: 2.07999e-06 [auto_monad_reorder]: 2.408e-05 [get_jit_bprop_graph]: 2.07001e-06 [rewriter_after_jit_bprop_graph]: 7.46999e-06 [opt_after_jit_grad]: 0.00069445 [validate]: 6.1e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.00813868 [execute]: 1.058e-05 Sums bootstrap : 0.000514s : 0.38% type_inference : 0.055404s : 41.28% event_method : 0.000055s : 0.04% auto_monad : 0.000150s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000042s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.01% parallel-infer-symbol : 0.000005s : 0.00% pre_auto_parallel : 0.000055s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.03% optimize.rewriter_before_opt_a : 0.000170s : 0.13% optimize.opt_a.expand_dump_flag : 0.000014s : 0.01% optimize.opt_a.switch_simplify : 0.000139s : 0.10% optimize.opt_a.loop_unroll : 0.000113s : 0.08% optimize.opt_a.a_1 : 0.005297s : 3.95% optimize.opt_a.with_stream_mark : 0.000072s : 0.05% optimize.opt_a.recompute_prepare : 0.000049s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.01% optimize.opt_a.parameter_eliminate : 0.000008s : 0.01% optimize.opt_a.a_2 : 0.000454s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000065s : 0.05% optimize.opt_a.shard : 0.000007s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.01% optimize.opt_a.shard_inline : 0.000033s : 0.02% optimize.opt_a.merge_send_recv : 0.000040s : 0.03% optimize.opt_a.auto_parallel : 0.000039s : 0.03% optimize.opt_a.parallel : 0.000044s : 0.03% optimize.opt_a.flash_sp : 0.000024s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000057s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000044s : 0.03% optimize.opt_a.virtual_dataset : 0.000031s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.02% optimize.opt_a.virtual_output : 0.000029s : 0.02% optimize.opt_a.merge_forward : 0.000022s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000044s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000068s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000059s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.02% optimize.opt_a.meta_fg_expand : 0.016586s : 12.36% optimize.opt_a.flash_sp_send_recv_attached : 0.000010s : 0.01% optimize.opt_a.receive_attached : 0.000009s : 0.01% optimize.opt_a.after_resolve : 0.000123s : 0.09% optimize.opt_a.a_after_grad : 0.000127s : 0.09% optimize.opt_a.renormalize : 0.035317s : 26.31% optimize.opt_a.add_forward_monad_depend : 0.000030s : 0.02% optimize.opt_a.auto_monad_grad : 0.000012s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000105s : 0.08% optimize.opt_a.cse : 0.002616s : 1.95% optimize.opt_a.a_3 : 0.000495s : 0.37% optimize.py_interpret_to_execute_after_opt_a : 0.000022s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000056s : 0.04% optimize.convert_after_rewriter : 0.000011s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.002970s : 2.21% optimize.opt_b.b_1 : 0.000151s : 0.11% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000041s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000033s : 0.02% optimize.loop_unroll : 0.000559s : 0.42% optimize.opt_after_cconv.c_1 : 0.000037s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.002081s : 1.55% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000021s : 0.02% optimize.tuple_transform.d_1 : 0.000071s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000005s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000077s : 0.06% optimize.cse_after_recomputation.cse : 0.000020s : 0.02% optimize.environ_conv : 0.000012s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000018s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.01% opt_after_jit_grad : 0.000694s : 0.52% validate : 0.000061s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.008139s : 6.06% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 
0.001023 159 7.48% : 0.000077s : 7: substitution.arithmetic_simplify 0.30% : 0.000003s : 3: substitution.elim_not_effective 0.63% : 0.000006s : 5: substitution.float_depend_g_call 0.41% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.21% : 0.000002s : 3: substitution.fold_const_symbol 0.83% : 0.000008s : 4: substitution.graph_param_transform 0.40% : 0.000004s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 59.15% : 0.000605s : 17: substitution.inline 2.63% : 0.000027s : 2: substitution.inline_without_move 1.32% : 0.000014s : 15: substitution.j_node_and_user_rematch 2.13% : 0.000022s : 3: substitution.less_batch_normalization 1.36% : 0.000014s : 7: substitution.minmaximum_grad 0.75% : 0.000008s : 5: substitution.partial_eliminate 1.53% : 0.000016s : 15: substitution.remove_not_recompute_node 4.21% : 0.000043s : 10: substitution.replace_applicator 1.45% : 0.000015s : 10: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.70% : 0.000028s : 7: substitution.tuple_list_convert_item_index_to_positive 1.28% : 0.000013s : 7: substitution.tuple_list_get_item_const_eliminator 1.64% : 0.000017s : 7: substitution.tuple_list_get_item_depend_reorder 7.31% : 0.000075s : 18: substitution.tuple_list_get_item_eliminator 1.77% : 0.000018s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.055293 2 96.40% : 0.053305s : 1: type_inference.infer 3.60% : 0.001988s : 1: type_inference.specialize ------[replace.] 0.000269 26 59.31% : 0.000160s : 17: replace.inline 40.69% : 0.000109s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000633 26 93.85% : 0.000594s : 17: match.inline 6.15% : 0.000039s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000773 4180 1.15% : 0.000009s : 52: predicate.accumulaten_eliminater 0.35% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000004s : 21: predicate.addn_check_dump 1.10% : 0.000009s : 52: predicate.addn_zero_filter 1.01% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 2.27% : 0.000018s : 73: predicate.arithmetic_simplify 1.12% : 0.000009s : 52: predicate.cast_eliminate 1.05% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000004s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000004s : 21: predicate.depend_value_elim 1.31% : 0.000010s : 52: predicate.dict_get_item_const_eliminator 1.14% : 0.000009s : 52: predicate.dict_get_item_eliminator 1.19% : 0.000009s : 52: predicate.dict_set_item_eliminator 0.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 4: predicate.elim_not_effective 0.24% : 0.000002s : 4: predicate.elim_shapecalc_of_broadcastargs 1.28% : 0.000010s : 56: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 56: predicate.environ_get_add_eliminate 1.08% : 0.000008s : 56: predicate.environ_get_depend_swap 1.62% : 0.000012s : 77: predicate.environ_get_eliminate 1.17% : 0.000009s : 56: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.59% : 0.000020s : 78: predicate.float_depend_g_call 0.46% : 0.000004s : 21: predicate.float_environ_get_switch 0.60% : 0.000005s : 25: predicate.float_tuple_getitem_switch 0.05% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000004s : 21: predicate.get_grad_eliminate 0.13% : 0.000001s : 4: predicate.graph_param_transform 0.51% : 0.000004s : 21: predicate.incorporate_call 0.42% : 0.000003s : 21: predicate.incorporate_call_switch 5.71% : 0.000044s : 180: predicate.inline 1.41% : 0.000011s : 45: predicate.inline_without_move 0.30% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.75% : 0.000006s : 21: predicate.less_batch_normalization 1.54% : 
0.000012s : 69: predicate.list_to_tuple_eliminator_ 2.45% : 0.000019s : 121: predicate.load_eliminater 0.39% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.29% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.35% : 0.000010s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000004s : 21: predicate.merge_addn 1.14% : 0.000009s : 50: predicate.micro_step_allgather_replace 1.05% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.10% : 0.000009s : 52: predicate.minmaximum_grad 0.39% : 0.000003s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.14% : 0.000001s : 4: predicate.parallel_virtual_node 2.10% : 0.000016s : 78: predicate.partial_defer_inline 1.60% : 0.000012s : 65: predicate.partial_eliminate 1.08% : 0.000008s : 52: predicate.print_const_string_wrapper 0.48% : 0.000004s : 21: predicate.reduce_all_const_elim 1.53% : 0.000012s : 52: predicate.reduce_eliminate 2.46% : 0.000019s : 121: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 21: predicate.remove_not_recompute_node 1.81% : 0.000014s : 111: predicate.replace_applicator 0.80% : 0.000006s : 45: predicate.replace_old_param 0.10% : 0.000001s : 4: predicate.reset_defer_inline 1.16% : 0.000009s : 52: predicate.reshape_eliminate 1.06% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.47% : 0.000011s : 50: predicate.same_eliminate 0.37% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 21: predicate.shard_identity_eliminate 0.30% : 0.000002s : 8: predicate.special_op_eliminate 0.57% : 0.000004s : 21: predicate.specialize_transform 1.37% : 0.000011s : 50: predicate.split_environ_get_set_with_tuple_value 1.29% : 0.000010s : 45: predicate.stack_unstack_eliminate 0.13% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 78: predicate.switch_defer_inline 2.88% : 0.000022s : 128: predicate.switch_layer_defer_inline 4.97% : 0.000038s : 213: 
predicate.switch_simplify 1.18% : 0.000009s : 52: predicate.tile_eliminate 1.08% : 0.000008s : 52: predicate.transpose_eliminate 1.47% : 0.000011s : 60: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 60: predicate.tuple_list_get_item_const_eliminator 1.56% : 0.000012s : 60: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000024s : 90: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 60: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000016s : 81: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 69: predicate.tuple_to_list_eliminator_ 2.44% : 0.000019s : 121: predicate.updatestate_pure_node_eliminater 2.96% : 0.000023s : 142: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000004s : 21: predicate.virtual_output_eliminate 0.08% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002616 35 51.85% : 0.001356s : 14: func_graph_cloner_run.FuncGraphClonerGraph 48.15% : 0.001260s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.268360 237 0.00% : 0.000004s : 1: ForceFp32Comm 3.59% : 0.009645s : 1: add_attr 3.59% : 0.009628s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000084s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.06% : 0.000160s : 1: auto_monad 0.01% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.20% : 0.000544s : 1: bootstrap 0.01% : 0.000037s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.01% : 0.000023s : 1: control_data_broadcast_order 0.01% : 0.000015s : 1: convert_after_rewriter 0.01% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000015s : 1: environ_conv 0.02% : 0.000064s : 1: event_method 0.01% : 0.000019s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000011s : 1: label_micro_interleaved_index 0.21% : 0.000574s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 1.11% : 0.002990s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000024s : 1: opt.transform.mutable_eliminate 2.60% : 0.006965s : 117: opt.transform.opt_a 0.01% : 0.000035s : 1: opt.transform.opt_after_cconv 0.01% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000126s : 28: opt.transform.opt_b 0.03% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000053s : 4: opt.transform.symbol_engine_opt 23.62% : 0.063385s : 1: opt_a 0.84% : 0.002246s : 1: opt_after_cconv 0.27% : 0.000713s : 1: opt_after_jit_grad 0.10% : 0.000279s : 1: opt_b 26.30% : 0.070579s : 1: optimize 0.01% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000005s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000060s : 1: pre_auto_parallel 0.02% : 0.000046s : 1: py_interpret_to_execute 0.01% : 0.000027s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000026s : 1: remove_dup_value 12.15% : 0.032619s : 2: renormalize.infer 0.99% : 0.002664s : 2: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000011s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000064s : 1: rewriter_after_opt_a 0.07% : 0.000175s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000113s : 1: symbol_engine_optimizer 3.04% : 0.008161s : 1: task_emit 0.05% : 0.000122s : 1: 
tuple_transform 20.66% : 0.055436s : 1: type_inference 0.04% : 0.000108s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x2-kbk],max_mem:4.0M TotalTime = 0.517826, [24] [bootstrap]: 0.00081254 [type_inference]: 0.0289016 [event_method]: 2.145e-05 [auto_monad]: 6.755e-05 [graph_reusing]: 5.67001e-06 [inline]: 3.58999e-06 [add_attr]: 0.00713646, [1] [add_attr_with_inline]: 0.00711831, [1] [Cycle 1]: 7.235e-05, [2] [tag_attr]: 2.165e-05 [meta_addattr_fg_expand]: 4.87e-06 [parallel-infer-symbol]: 4.22e-06 [pre_auto_parallel]: 3.888e-05 [insert-virtual-dataset]: 3.16999e-06 [parallel-infer-symbol-second]: 9.39996e-07 [dataset_repeat_opt]: 2.37999e-06 [pipeline_split]: 1.87999e-06 [optimize]: 0.0157921, [53] [py_interpret_to_execute]: 3.473e-05 [rewriter_before_opt_a]: 7.961e-05 [opt_a]: 0.00886142, [2] [Cycle 1]: 0.00802261, [45] [expand_dump_flag]: 2.83003e-06 [switch_simplify]: 3.656e-05 [loop_unroll]: 2.156e-05 [a_1]: 0.00052504 [with_stream_mark]: 1.981e-05 [recompute_prepare]: 9.83998e-06 [updatestate_depend_eliminate]: 4.74e-06 [updatestate_assign_eliminate]: 3.63e-06 [updatestate_loads_eliminate]: 3.4e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 8.462e-05 [accelerated_algorithm]: 8.45001e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 6.89999e-06 [merge_send_recv]: 9.92999e-06 [auto_parallel]: 8.08001e-06 [parallel]: 3.054e-05 [flash_sp]: 8.87e-06 [merge_comm]: 4.48999e-06 [allreduce_fusion]: 3.60998e-06 [matmul_add_comm_reduction]: 1.088e-05 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 9.17001e-06 [virtual_dataset]: 6.92002e-06 [get_grad_eliminate_]: 6.06e-06 [virtual_output]: 6.42001e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 1.41002e-06 [offload_activation]: 1.168e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.401e-05 [merge_recompute_call_nodes]: 
1.85001e-06 [before_grad]: 1.066e-05 [set_forward_comm_id_for_comm_node_pass]: 4.10998e-06 [meta_fg_expand]: 3.08998e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.12001e-06 [after_resolve]: 1.061e-05 [a_after_grad]: 9.84001e-06 [renormalize]: 0.00664772 [add_forward_monad_depend]: 1.852e-05 [auto_monad_grad]: 2.78e-06 [auto_monad_eliminator]: 2.592e-05 [cse]: 3.831e-05 [a_3]: 6.525e-05 [Cycle 2]: 0.00082212, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 9.89001e-06 [loop_unroll]: 7.03e-06 [a_1]: 0.00015316 [with_stream_mark]: 2.16e-05 [recompute_prepare]: 8.87999e-06 [updatestate_depend_eliminate]: 4.28001e-06 [updatestate_assign_eliminate]: 3.91001e-06 [updatestate_loads_eliminate]: 4.45999e-06 [parameter_eliminate]: 2.20002e-06 [a_2]: 7.754e-05 [accelerated_algorithm]: 8.23001e-06 [shard]: 2.63e-06 [meta_shard_fg_expand]: 3.37002e-06 [shard_inline]: 7.7e-06 [merge_send_recv]: 9.60001e-06 [auto_parallel]: 1.082e-05 [parallel]: 1.086e-05 [flash_sp]: 5.19e-06 [merge_comm]: 4.75999e-06 [allreduce_fusion]: 3.64002e-06 [matmul_add_comm_reduction]: 1.163e-05 [allreduce_slice_to_reducescatter]: 8.10018e-07 [virtual_shard_identity]: 1.013e-05 [virtual_dataset]: 5.61998e-06 [get_grad_eliminate_]: 5.99e-06 [virtual_output]: 6.12999e-06 [merge_forward]: 5.40999e-06 [cell_reuse_recompute_pass]: 2.96001e-06 [offload_activation]: 1.371e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.607e-05 [merge_recompute_call_nodes]: 1.97999e-06 [before_grad]: 1.306e-05 [set_forward_comm_id_for_comm_node_pass]: 4.73001e-06 [meta_fg_expand]: 2.76e-06 [flash_sp_send_recv_attached]: 2.08002e-06 [receive_attached]: 2.46998e-06 [after_resolve]: 1.452e-05 [a_after_grad]: 9.14e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 3.11001e-06 [auto_monad_grad]: 1.81003e-06 [auto_monad_eliminator]: 1.168e-05 [cse]: 2.32e-05 [a_3]: 3.813e-05 [py_interpret_to_execute_after_opt_a]: 1.958e-05 [slice_cell_reuse_recomputed_activation]: 2.86999e-06 
[rewriter_after_opt_a]: 5.124e-05 [convert_after_rewriter]: 8.69003e-06 [order_py_execute_after_rewriter]: 6.33e-06 [mutable_eliminate]: 0.00084997 [opt_b]: 0.00312756, [1] [Cycle 1]: 0.00311507, [7] [b_1]: 0.00012261 [b_2]: 9.91e-06 [updatestate_depend_eliminate]: 1.07e-05 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 3.48999e-06 [renormalize]: 0.00283085 [cse]: 5.308e-05 [optimize_parallel_all_gather_comm]: 3.252e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 4.239e-05 [loop_unroll]: 0.00174774 [opt_after_cconv]: 0.00015411, [1] [Cycle 1]: 0.0001426, [7] [c_1]: 3.342e-05 [parameter_eliminate]: 8.33999e-06 [updatestate_depend_eliminate]: 1.281e-05 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.66999e-06 [cse]: 3.881e-05 [renormalize]: 8.09989e-07 [remove_dup_value]: 1.914e-05 [tuple_transform]: 9.519e-05, [1] [Cycle 1]: 8.934e-05, [4] [d_1]: 5.396e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 4.19997e-07 [switch_simplify]: 8.86002e-06 [partial_unused_args_eliminate]: 2.41e-06 [add_recomputation]: 6.766e-05 [cse_after_recomputation]: 3.452e-05, [1] [Cycle 1]: 2.742e-05, [1] [cse]: 1.719e-05 [environ_conv]: 1.504e-05 [swap_dp_allreduce_reducescatter]: 6.68e-06 [bias_add_comm_swap]: 4.92e-06 [label_micro_interleaved_index]: 8.35999e-06 [label_fine_grained_interleaved_index]: 3.16001e-06 [merge_cast_opt]: 1.58002e-06 [slice_recompute_activation]: 2.78e-06 [micro_interleaved_order_control]: 2.50002e-06 [assign_add_opt]: 2.08998e-06 [ForceFp32Comm]: 1.29e-06 [remove_cast_before_assign_add]: 1.20999e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 3.48e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 1.11002e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.88002e-06 [control_data_broadcast_order]: 1.854e-05 
[grouped_pairwise_exchange_alltoall]: 1.56002e-06 [offloading_packed_experts]: 5.35999e-06 [overlap_recompute_and_grad_model_parallel]: 5.39998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.43002e-06 [overlap_grad_ring_attention]: 4.97e-06 [overlap_grad_flash_sp]: 2.58e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.48e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 0.00010781, [1] [Cycle 1]: 9.93e-05, [6] [build]: 5.11002e-06 [elim_shapecalc]: 1.859e-05 [elim_not_effective]: 1.683e-05 [opt_reshape]: 7.94002e-06 [fold_const_symbol]: 1.055e-05 [renormalize]: 2.50002e-07 [detach_backward]: 2.21e-06 [pipeline_parallel_scheduler]: 1.76e-06 [auto_monad_reorder]: 2.415e-05 [get_jit_bprop_graph]: 2.13998e-06 [rewriter_after_jit_bprop_graph]: 8.84e-06 [opt_after_jit_grad]: 0.00073325 [validate]: 5.26e-05 [backend_pass]: 1.55999e-06 [task_emit]: 0.463895 [execute]: 8.87e-06 Sums bootstrap : 0.000813s : 0.16% type_inference : 0.028902s : 5.67% event_method : 0.000021s : 0.00% auto_monad : 0.000068s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000004s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000022s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000039s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.01% optimize.rewriter_before_opt_a : 0.000080s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000046s : 0.01% optimize.opt_a.loop_unroll : 0.000029s : 0.01% optimize.opt_a.a_1 : 0.000678s : 0.13% optimize.opt_a.with_stream_mark : 0.000041s : 0.01% optimize.opt_a.recompute_prepare 
: 0.000019s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000008s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000162s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000017s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00% optimize.opt_a.shard_inline : 0.000015s : 0.00% optimize.opt_a.merge_send_recv : 0.000020s : 0.00% optimize.opt_a.auto_parallel : 0.000019s : 0.00% optimize.opt_a.parallel : 0.000041s : 0.01% optimize.opt_a.flash_sp : 0.000014s : 0.00% optimize.opt_a.merge_comm : 0.000009s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000023s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000019s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000013s : 0.00% optimize.opt_a.merge_forward : 0.000010s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000025s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000030s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000024s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000025s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.006648s : 1.31% optimize.opt_a.add_forward_monad_depend : 0.000022s : 0.00% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% 
optimize.opt_a.auto_monad_eliminator : 0.000038s : 0.01% optimize.opt_a.cse : 0.000062s : 0.01% optimize.opt_a.a_3 : 0.000103s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000051s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000850s : 0.17% optimize.opt_b.b_1 : 0.000123s : 0.02% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.002831s : 0.56% optimize.opt_b.cse : 0.000053s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000033s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000042s : 0.01% optimize.loop_unroll : 0.001748s : 0.34% optimize.opt_after_cconv.c_1 : 0.000033s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000013s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000039s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.00% optimize.tuple_transform.d_1 : 0.000054s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000068s : 0.01% optimize.cse_after_recomputation.cse : 0.000017s : 0.00% optimize.environ_conv : 0.000015s : 0.00% optimize.swap_dp_allreduce_reducescatter : 
0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000005s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000019s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% 
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000009s : 0.00% opt_after_jit_grad : 0.000733s : 0.14% validate : 0.000053s : 0.01% backend_pass : 0.000002s : 0.00% task_emit : 0.463895s : 91.09% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000241 26 19.88% : 0.000048s : 5: substitution.arithmetic_simplify 1.14% : 0.000003s : 2: substitution.elim_not_effective 0.60% : 0.000001s : 2: substitution.fold_const_symbol 2.93% : 0.000007s : 3: substitution.graph_param_transform 65.36% : 0.000158s : 3: substitution.inline 1.85% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.37% : 0.000006s : 4: substitution.remove_not_recompute_node 1.97% : 0.000005s : 2: substitution.replace_old_param 3.90% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.028813 2 96.52% : 0.027809s : 1: type_inference.infer 3.48% : 0.001004s : 1: type_inference.specialize ------[replace.] 0.000044 4 80.44% : 0.000036s : 3: replace.inline 19.56% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000164 4 94.68% : 0.000155s : 3: match.inline 5.32% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000197 883 0.84% : 0.000002s : 9: predicate.accumulaten_eliminater 1.52% : 0.000003s : 3: predicate.ad_related_special_op_eliminate 0.50% : 0.000001s : 6: predicate.addn_check_dump 1.15% : 0.000002s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.14% : 0.000004s : 15: predicate.arithmetic_simplify 0.90% : 0.000002s : 9: predicate.cast_eliminate 0.56% : 0.000001s : 6: predicate.check_bprop_eliminate 0.52% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.52% : 0.000001s : 6: predicate.depend_value_elim 0.99% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.48% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000001s : 3: predicate.elim_not_effective 0.99% : 0.000002s : 3: predicate.elim_shapecalc_of_broadcastargs 1.00% : 0.000002s : 12: predicate.environ_add_const_eliminate 0.93% : 0.000002s : 12: predicate.environ_get_add_eliminate 0.90% : 0.000002s : 12: predicate.environ_get_depend_swap 1.47% : 0.000003s : 18: predicate.environ_get_eliminate 1.05% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.11% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.32% : 0.000005s : 13: predicate.float_depend_g_call 0.50% : 0.000001s : 6: predicate.float_environ_get_switch 0.67% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.34% : 0.000001s : 3: predicate.graph_param_transform 0.56% : 0.000001s : 6: predicate.incorporate_call 0.46% : 0.000001s : 6: predicate.incorporate_call_switch 6.28% : 0.000012s : 40: predicate.inline 0.99% : 0.000002s : 6: predicate.inline_without_move 0.46% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.97% : 0.000002s : 6: predicate.less_batch_normalization 1.77% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000004s : 25: predicate.load_eliminater 2.46% : 0.000005s : 3: predicate.loop_unroll_after_grad 1.87% : 0.000004s : 21: predicate.loop_unroll_before_grad 2.02% : 0.000004s : 15: predicate.make_slice_get_slice_eliminator 0.51% : 0.000001s : 6: predicate.merge_addn 0.49% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.56% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 2.55% : 0.000005s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.41% : 0.000001s : 3: predicate.parallel_virtual_node 1.43% : 0.000003s : 13: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000002s : 9: predicate.print_const_string_wrapper 0.57% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.06% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000003s : 16: predicate.replace_applicator 0.56% : 0.000001s : 6: predicate.replace_old_param 0.45% : 0.000001s : 3: predicate.reset_defer_inline 0.97% : 0.000002s : 9: predicate.reshape_eliminate 0.65% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 1.00% : 0.000002s : 6: predicate.same_eliminate 0.39% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.02% : 0.000002s : 6: predicate.shard_identity_eliminate 0.82% : 0.000002s : 6: predicate.special_op_eliminate 0.78% : 0.000002s : 6: predicate.specialize_transform 1.48% : 0.000003s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.17% : 0.000002s : 13: predicate.switch_defer_inline 1.63% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.69% : 0.000009s : 43: predicate.switch_simplify 0.82% : 
0.000002s : 9: predicate.tile_eliminate 0.96% : 0.000002s : 9: predicate.transpose_eliminate 1.38% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.26% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000007s : 22: predicate.tuple_list_get_item_eliminator 1.32% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.15% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.59% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 3: predicate.value_based_eliminate 0.64% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.64% : 0.000001s : 6: predicate.virtual_output_eliminate 0.27% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.75% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000743 8 40.09% : 0.000298s : 3: func_graph_cloner_run.FuncGraphClonerGraph 59.91% : 0.000445s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.548746 196 0.00% : 0.000005s : 1: ForceFp32Comm 1.30% : 0.007144s : 1: add_attr 1.30% : 0.007123s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000075s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000073s : 1: auto_monad 0.01% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000008s : 1: bias_add_comm_swap 0.16% : 0.000862s : 1: bootstrap 0.01% : 0.000047s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000023s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.01% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000020s : 1: environ_conv 0.01% : 0.000029s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000012s : 1: label_micro_interleaved_index 0.32% : 0.001772s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.16% : 0.000873s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000075s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000027s : 1: opt.transform.mutable_eliminate 0.20% : 0.001109s : 78: opt.transform.opt_a 0.01% : 0.000032s : 1: opt.transform.opt_after_cconv 0.01% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000098s : 28: opt.transform.opt_b 0.01% : 0.000059s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000048s : 4: opt.transform.symbol_engine_opt 1.62% : 0.008865s : 1: opt_a 0.03% : 0.000158s : 1: opt_after_cconv 0.14% : 0.000753s : 1: opt_after_jit_grad 0.57% : 0.003133s : 1: opt_b 2.88% : 0.015800s : 1: optimize 0.01% : 0.000039s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.01% : 0.000032s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000043s : 1: pre_auto_parallel 0.01% : 0.000039s : 1: py_interpret_to_execute 0.00% : 0.000026s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000023s : 1: remove_dup_value 1.12% : 0.006143s : 1: renormalize.infer 0.09% : 0.000490s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000013s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000058s : 1: rewriter_after_opt_a 0.02% : 0.000084s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000111s : 1: symbol_engine_optimizer 84.54% : 0.463917s : 1: task_emit 0.02% : 0.000098s : 1: 
tuple_transform 5.27% : 0.028937s : 1: type_inference 0.02% : 0.000097s : 1: validate TotalTime = 0.350962, [24] [bootstrap]: 0.00051455 [type_inference]: 0.00667108 [event_method]: 1.398e-05 [auto_monad]: 6.236e-05 [graph_reusing]: 5.66e-06 [inline]: 2.69001e-06 [add_attr]: 0.00326496, [1] [add_attr_with_inline]: 0.00325448, [1] [Cycle 1]: 5.406e-05, [2] [tag_attr]: 1.535e-05 [meta_addattr_fg_expand]: 4.58999e-06 [parallel-infer-symbol]: 3.18998e-06 [pre_auto_parallel]: 2.673e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.29001e-06 [pipeline_split]: 1.78002e-06 [optimize]: 0.00443654, [53] [py_interpret_to_execute]: 2.132e-05 [rewriter_before_opt_a]: 5.442e-05 [opt_a]: 0.00243121, [2] [Cycle 1]: 0.00171449, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 3.054e-05 [loop_unroll]: 1.9e-05 [a_1]: 0.00039273 [with_stream_mark]: 1.711e-05 [recompute_prepare]: 8.10999e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.33e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 8.669e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 6.08998e-06 [merge_send_recv]: 9.29e-06 [auto_parallel]: 7.43e-06 [parallel]: 1.955e-05 [flash_sp]: 8.99998e-06 [merge_comm]: 5.00001e-06 [allreduce_fusion]: 4.02e-06 [matmul_add_comm_reduction]: 9.95002e-06 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 9.59e-06 [virtual_dataset]: 6.58003e-06 [get_grad_eliminate_]: 5.92001e-06 [virtual_output]: 6.87002e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.39e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 1.143e-05 [set_forward_comm_id_for_comm_node_pass]: 3.74002e-06 [meta_fg_expand]: 2.76e-06 [flash_sp_send_recv_attached]: 3.30998e-06 [receive_attached]: 2.41e-06 
[after_resolve]: 1.069e-05 [a_after_grad]: 9.99999e-06 [renormalize]: 0.00056522 [add_forward_monad_depend]: 4.77e-06 [auto_monad_grad]: 2.41998e-06 [auto_monad_eliminator]: 1.397e-05 [cse]: 3.067e-05 [a_3]: 9.099e-05 [Cycle 2]: 0.0007059, [45] [expand_dump_flag]: 1.38002e-06 [switch_simplify]: 7.65e-06 [loop_unroll]: 5.74999e-06 [a_1]: 0.00012006 [with_stream_mark]: 1.208e-05 [recompute_prepare]: 6.33998e-06 [updatestate_depend_eliminate]: 3.29001e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 0.00012171 [accelerated_algorithm]: 7.33999e-06 [shard]: 1.16997e-06 [meta_shard_fg_expand]: 1.31002e-06 [shard_inline]: 6.90002e-06 [merge_send_recv]: 5.77999e-06 [auto_parallel]: 6.43e-06 [parallel]: 4.83001e-06 [flash_sp]: 3.93001e-06 [merge_comm]: 3.62998e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 5.99999e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.88998e-06 [virtual_dataset]: 5.92001e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.44e-06 [merge_forward]: 2.78e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 7.05002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.142e-05 [merge_recompute_call_nodes]: 1.00999e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.79002e-06 [meta_fg_expand]: 1.97999e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.29e-06 [after_resolve]: 9.07999e-06 [a_after_grad]: 8.17998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.56002e-06 [auto_monad_grad]: 1.21002e-06 [auto_monad_eliminator]: 7.61999e-06 [cse]: 1.647e-05 [a_3]: 3.441e-05 [py_interpret_to_execute_after_opt_a]: 9.52001e-06 [slice_cell_reuse_recomputed_activation]: 2.35002e-06 [rewriter_after_opt_a]: 3.571e-05 [convert_after_rewriter]: 6.94999e-06 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.00051462 [opt_b]: 0.00019879, [1] [Cycle 1]: 
0.00019202, [7] [b_1]: 0.00011584 [b_2]: 7.45e-06 [updatestate_depend_eliminate]: 5.92001e-06 [updatestate_assign_eliminate]: 2.74001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 9.00007e-07 [cse]: 2.002e-05 [optimize_parallel_all_gather_comm]: 1.671e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.55e-05 [loop_unroll]: 0.00043753 [opt_after_cconv]: 0.00010173, [1] [Cycle 1]: 9.532e-05, [7] [c_1]: 2.684e-05 [parameter_eliminate]: 3.12002e-06 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.918e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.665e-05 [tuple_transform]: 7.342e-05, [1] [Cycle 1]: 6.837e-05, [4] [d_1]: 3.987e-05 [none_parameter_eliminate]: 1.87001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.81999e-06 [partial_unused_args_eliminate]: 1.91003e-06 [add_recomputation]: 4.567e-05 [cse_after_recomputation]: 2.333e-05, [1] [Cycle 1]: 1.858e-05, [1] [cse]: 1.297e-05 [environ_conv]: 5.37999e-06 [swap_dp_allreduce_reducescatter]: 5.92999e-06 [bias_add_comm_swap]: 2.42001e-06 [label_micro_interleaved_index]: 5.00999e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.18001e-06 [full_micro_interleaved_order_control]: 2.46998e-06 [reorder_send_recv_between_fp_bp]: 2.91999e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.26997e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.57999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.288e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 4.90999e-06 [overlap_recompute_and_grad_model_parallel]: 5.38002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.74e-06 [overlap_recompute_allgather_and_fa_grad]: 1.64e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 4.77998e-06 [overlap_grad_flash_sp]: 1.997e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.53e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 7.622e-05, [1] [Cycle 1]: 7.155e-05, [6] [build]: 3.63999e-06 [elim_shapecalc]: 9.41998e-06 [elim_not_effective]: 1.211e-05 [opt_reshape]: 7.22002e-06 [fold_const_symbol]: 1.041e-05 [renormalize]: 2.69996e-07 [detach_backward]: 2.21998e-06 [pipeline_parallel_scheduler]: 1.89e-06 [auto_monad_reorder]: 1.706e-05 [get_jit_bprop_graph]: 1.45999e-06 [rewriter_after_jit_bprop_graph]: 4.63001e-06 [opt_after_jit_grad]: 0.00048688 [validate]: 4.112e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.335146 [execute]: 9.89999e-06 Sums bootstrap : 0.000515s : 0.15% type_inference : 0.006671s : 1.92% event_method : 0.000014s : 0.00% auto_monad : 0.000062s : 0.02% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.01% optimize.rewriter_before_opt_a : 0.000054s : 0.02% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.01% optimize.opt_a.loop_unroll : 0.000025s : 0.01% optimize.opt_a.a_1 : 0.000513s : 0.15% optimize.opt_a.with_stream_mark : 0.000029s : 0.01% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 
0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000208s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000015s : 0.00% optimize.opt_a.auto_parallel : 0.000014s : 0.00% optimize.opt_a.parallel : 0.000024s : 0.01% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000009s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.01% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000565s : 0.16% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.01% optimize.opt_a.cse : 0.000047s : 0.01% optimize.opt_a.a_3 : 0.000125s : 0.04% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000515s : 0.15% optimize.opt_b.b_1 : 0.000116s : 0.03% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.01% optimize.loop_unroll : 0.000438s : 0.13% optimize.opt_after_cconv.c_1 : 0.000027s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.01% optimize.cse_after_recomputation.cse : 0.000013s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000487s : 0.14% validate : 0.000041s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.335146s : 96.69% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000162 24 19.77% : 0.000032s : 4: substitution.arithmetic_simplify 1.21% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000001s : 2: substitution.fold_const_symbol 3.42% : 0.000006s : 3: substitution.graph_param_transform 67.30% : 0.000109s : 3: substitution.inline 2.25% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.00% : 0.000005s : 4: substitution.remove_not_recompute_node 2.14% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006617 2 91.83% : 0.006077s : 1: type_inference.infer 8.17% : 0.000540s : 1: type_inference.specialize ------[replace.] 0.000031 3 100.00% : 0.000031s : 3: replace.inline ------[match.] 0.000107 3 100.00% : 0.000107s : 3: match.inline ------[predicate.] 
0.000160 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 0.98% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.75% : 0.000001s : 6: predicate.addn_check_dump 0.97% : 0.000002s : 8: predicate.addn_zero_filter 0.74% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.36% : 0.000004s : 14: predicate.arithmetic_simplify 0.89% : 0.000001s : 8: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.81% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.01% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 11: predicate.environ_get_depend_swap 1.92% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000004s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.79% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.27% : 0.000010s : 37: predicate.inline 1.05% : 0.000002s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 6: predicate.less_batch_normalization 1.47% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 22: predicate.load_eliminater 1.06% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 8: predicate.minmaximum_grad 1.11% : 0.000002s : 3: predicate.mutable_eliminate 0.41% : 0.000001s : 3: predicate.opt_reshape 0.46% : 0.000001s : 3: predicate.parallel_virtual_node 1.44% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 11: predicate.partial_eliminate 0.81% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.30% : 0.000002s : 8: predicate.reduce_eliminate 2.32% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 6: predicate.remove_not_recompute_node 1.16% : 0.000002s : 14: predicate.replace_applicator 0.71% : 0.000001s : 6: predicate.replace_old_param 0.26% : 0.000000s : 3: predicate.reset_defer_inline 0.97% : 0.000002s : 8: predicate.reshape_eliminate 0.74% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 1.04% : 0.000002s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.79% : 0.000001s : 6: predicate.specialize_transform 1.19% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 11: predicate.switch_defer_inline 1.93% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.55% : 0.000007s : 38: predicate.switch_simplify 0.89% : 
0.000001s : 8: predicate.tile_eliminate 0.91% : 0.000001s : 8: predicate.transpose_eliminate 1.60% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.59% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.44% : 0.000004s : 22: predicate.updatestate_pure_node_eliminater 3.00% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.58% : 0.000001s : 3: predicate.value_based_eliminate 0.78% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000359 7 36.39% : 0.000131s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.61% : 0.000228s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.360327 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.91% : 0.003270s : 1: add_attr 0.90% : 0.003259s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000068s : 1: auto_monad 0.01% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.15% : 0.000552s : 1: bootstrap 0.01% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.12% : 0.000447s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.15% : 0.000524s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.26% : 0.000948s : 78: opt.transform.opt_a 0.01% : 0.000025s : 1: opt.transform.opt_after_cconv 0.01% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000094s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.68% : 0.002434s : 1: opt_a 0.03% : 0.000105s : 1: opt_after_cconv 0.14% : 0.000498s : 1: opt_after_jit_grad 0.06% : 0.000202s : 1: opt_b 1.23% : 0.004441s : 1: optimize 0.01% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000031s : 1: pre_auto_parallel 0.01% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000020s : 1: remove_dup_value 0.08% : 0.000306s : 1: renormalize.infer 0.07% : 0.000252s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000040s : 1: rewriter_after_opt_a 0.02% : 0.000059s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000079s : 1: symbol_engine_optimizer 93.02% : 0.335172s : 1: task_emit 0.02% : 0.000077s : 1: 
tuple_transform 1.86% : 0.006689s : 1: type_inference 0.02% : 0.000072s : 1: validate TotalTime = 0.458413, [24] [bootstrap]: 0.00047676 [type_inference]: 0.025381 [event_method]: 2.049e-05 [auto_monad]: 7.231e-05 [graph_reusing]: 7.65e-06 [inline]: 3.71001e-06 [add_attr]: 0.00607879, [1] [add_attr_with_inline]: 0.00606705, [1] [Cycle 1]: 7.207e-05, [2] [tag_attr]: 2.127e-05 [meta_addattr_fg_expand]: 5.02e-06 [parallel-infer-symbol]: 3.65e-06 [pre_auto_parallel]: 3.701e-05 [insert-virtual-dataset]: 2.95998e-06 [parallel-infer-symbol-second]: 9.5999e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.0230437, [53] [py_interpret_to_execute]: 2.921e-05 [rewriter_before_opt_a]: 7.828e-05 [opt_a]: 0.00283519, [2] [Cycle 1]: 0.00212644, [45] [expand_dump_flag]: 3.16999e-06 [switch_simplify]: 3.649e-05 [loop_unroll]: 2.176e-05 [a_1]: 0.00053553 [with_stream_mark]: 2.039e-05 [recompute_prepare]: 1.127e-05 [updatestate_depend_eliminate]: 4.30999e-06 [updatestate_assign_eliminate]: 4.11001e-06 [updatestate_loads_eliminate]: 3.26999e-06 [parameter_eliminate]: 2.04e-06 [a_2]: 8.584e-05 [accelerated_algorithm]: 7.42002e-06 [shard]: 2.31998e-06 [meta_shard_fg_expand]: 2.21e-06 [shard_inline]: 6.71e-06 [merge_send_recv]: 9.79999e-06 [auto_parallel]: 7.58999e-06 [parallel]: 1.968e-05 [flash_sp]: 8.70999e-06 [merge_comm]: 3.88999e-06 [allreduce_fusion]: 3.89002e-06 [matmul_add_comm_reduction]: 1.043e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 8.76002e-06 [virtual_dataset]: 6.31998e-06 [get_grad_eliminate_]: 6.76999e-06 [virtual_output]: 6.26998e-06 [merge_forward]: 4.90001e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 1.093e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.415e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 1.132e-05 [set_forward_comm_id_for_comm_node_pass]: 3.98001e-06 [meta_fg_expand]: 3.38e-06 [flash_sp_send_recv_attached]: 2.65002e-06 [receive_attached]: 2.54001e-06 
[after_resolve]: 1.056e-05 [a_after_grad]: 9.32999e-06 [renormalize]: 0.0008289 [add_forward_monad_depend]: 6.28e-06 [auto_monad_grad]: 3.03e-06 [auto_monad_eliminator]: 1.663e-05 [cse]: 3.52e-05 [a_3]: 4.934e-05 [Cycle 2]: 0.00069582, [45] [expand_dump_flag]: 1.27e-06 [switch_simplify]: 7.66999e-06 [loop_unroll]: 6.44999e-06 [a_1]: 0.00012353 [with_stream_mark]: 1.471e-05 [recompute_prepare]: 6.94001e-06 [updatestate_depend_eliminate]: 3.79002e-06 [updatestate_assign_eliminate]: 2.91e-06 [updatestate_loads_eliminate]: 3.74002e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 7.487e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 1.40999e-06 [meta_shard_fg_expand]: 2.19001e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 6.81999e-06 [auto_parallel]: 6.91001e-06 [parallel]: 6.61999e-06 [flash_sp]: 3.53999e-06 [merge_comm]: 4.67998e-06 [allreduce_fusion]: 3.57002e-06 [matmul_add_comm_reduction]: 8.53001e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 7.65e-06 [virtual_dataset]: 6.29999e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.53999e-06 [cell_reuse_recompute_pass]: 1.86998e-06 [offload_activation]: 7.89002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.163e-05 [merge_recompute_call_nodes]: 1.27999e-06 [before_grad]: 1.124e-05 [set_forward_comm_id_for_comm_node_pass]: 4.80999e-06 [meta_fg_expand]: 1.96998e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.35001e-06 [after_resolve]: 1.042e-05 [a_after_grad]: 9.05999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.94e-06 [auto_monad_grad]: 1.59998e-06 [auto_monad_eliminator]: 9.37999e-06 [cse]: 1.904e-05 [a_3]: 3.671e-05 [py_interpret_to_execute_after_opt_a]: 1.254e-05 [slice_cell_reuse_recomputed_activation]: 2.69001e-06 [rewriter_after_opt_a]: 4.272e-05 [convert_after_rewriter]: 7.25e-06 [order_py_execute_after_rewriter]: 5.58997e-06 [mutable_eliminate]: 0.0007439 [opt_b]: 0.00021877, [1] [Cycle 1]: 0.00020986, [7] 
[b_1]: 0.00012061 [b_2]: 8.28001e-06 [updatestate_depend_eliminate]: 7.57002e-06 [updatestate_assign_eliminate]: 3.02002e-06 [updatestate_loads_eliminate]: 2.81999e-06 [renormalize]: 1.20999e-06 [cse]: 2.768e-05 [optimize_parallel_all_gather_comm]: 2.036e-05 [overlap_param_gather]: 2.08998e-06 [cconv]: 3.481e-05 [loop_unroll]: 0.0181331 [opt_after_cconv]: 0.00017288, [1] [Cycle 1]: 0.00016082, [7] [c_1]: 3.816e-05 [parameter_eliminate]: 7.9e-06 [updatestate_depend_eliminate]: 1.489e-05 [updatestate_assign_eliminate]: 3.8e-06 [updatestate_loads_eliminate]: 3.61001e-06 [cse]: 4.935e-05 [renormalize]: 5.39992e-07 [remove_dup_value]: 2.037e-05 [tuple_transform]: 9.401e-05, [1] [Cycle 1]: 8.885e-05, [4] [d_1]: 5.578e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 3.60014e-07 [switch_simplify]: 8.84998e-06 [partial_unused_args_eliminate]: 2.41998e-06 [add_recomputation]: 6.661e-05 [cse_after_recomputation]: 2.632e-05, [1] [Cycle 1]: 1.991e-05, [1] [cse]: 1.319e-05 [environ_conv]: 8.67998e-06 [swap_dp_allreduce_reducescatter]: 5.69e-06 [bias_add_comm_swap]: 3.2e-06 [label_micro_interleaved_index]: 8.25999e-06 [label_fine_grained_interleaved_index]: 2.94001e-06 [merge_cast_opt]: 1.55999e-06 [slice_recompute_activation]: 2.63998e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.81998e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.27e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.28002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.14e-06 [control_data_broadcast_order]: 1.782e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 4.92e-06 [overlap_recompute_and_grad_model_parallel]: 5.19e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.60999e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 4.73001e-06 [overlap_grad_flash_sp]: 2.314e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.30002e-06 [split_layernorm_comm]: 1.94999e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 9.066e-05, [1] [Cycle 1]: 8.486e-05, [6] [build]: 4.47998e-06 [elim_shapecalc]: 1.402e-05 [elim_not_effective]: 1.418e-05 [opt_reshape]: 7.46001e-06 [fold_const_symbol]: 1.055e-05 [renormalize]: 2.59985e-07 [detach_backward]: 2.78e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.012e-05 [get_jit_bprop_graph]: 2.09999e-06 [rewriter_after_jit_bprop_graph]: 8.19002e-06 [opt_after_jit_grad]: 0.00080079 [validate]: 4.981e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.40211 [execute]: 1.13e-05 Sums bootstrap : 0.000477s : 0.11% type_inference : 0.025381s : 5.63% event_method : 0.000020s : 0.00% auto_monad : 0.000072s : 0.02% graph_reusing : 0.000008s : 0.00% inline : 0.000004s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000037s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000029s : 0.01% optimize.rewriter_before_opt_a : 0.000078s : 0.02% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000044s : 0.01% optimize.opt_a.loop_unroll : 0.000028s : 0.01% optimize.opt_a.a_1 : 0.000659s : 0.15% optimize.opt_a.with_stream_mark : 0.000035s : 0.01% optimize.opt_a.recompute_prepare : 0.000018s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000161s : 0.04% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000017s : 0.00% optimize.opt_a.auto_parallel : 0.000014s : 0.00% optimize.opt_a.parallel : 0.000026s : 0.01% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000009s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000023s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000829s : 0.18% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.00% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.01% optimize.opt_a.cse : 0.000054s : 0.01% optimize.opt_a.a_3 : 0.000086s : 0.02% 
optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000744s : 0.16% optimize.opt_b.b_1 : 0.000121s : 0.03% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000028s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.01% optimize.loop_unroll : 0.018133s : 4.02% optimize.opt_after_cconv.c_1 : 0.000038s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000015s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000049s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000020s : 0.00% optimize.tuple_transform.d_1 : 0.000056s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000067s : 0.01% optimize.cse_after_recomputation.cse : 0.000013s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000008s : 0.00% opt_after_jit_grad : 0.000801s : 0.18% validate : 0.000050s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.402110s : 89.14% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000248 26 15.96% : 0.000040s : 5: substitution.arithmetic_simplify 0.95% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000002s : 2: substitution.fold_const_symbol 2.86% : 0.000007s : 3: substitution.graph_param_transform 69.98% : 0.000174s : 3: substitution.inline 2.07% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.10% : 0.000005s : 4: substitution.remove_not_recompute_node 1.54% : 0.000004s : 2: substitution.replace_old_param 3.84% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.025303 2 23.54% : 0.005957s : 1: type_inference.infer 76.46% : 0.019346s : 1: type_inference.specialize ------[replace.] 0.000046 4 80.25% : 0.000037s : 3: replace.inline 19.75% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000180 4 95.28% : 0.000171s : 3: match.inline 4.72% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000186 883 0.87% : 0.000002s : 9: predicate.accumulaten_eliminater 0.95% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.49% : 0.000001s : 6: predicate.addn_check_dump 0.81% : 0.000001s : 9: predicate.addn_zero_filter 0.77% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.38% : 0.000004s : 15: predicate.arithmetic_simplify 0.83% : 0.000002s : 9: predicate.cast_eliminate 0.54% : 0.000001s : 6: predicate.check_bprop_eliminate 0.51% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.55% : 0.000001s : 6: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.54% : 0.000001s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.01% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_depend_swap 1.60% : 0.000003s : 18: predicate.environ_get_eliminate 1.02% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.13% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.80% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 3: predicate.fold_const_symbol 0.78% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.59% : 0.000001s : 6: predicate.incorporate_call 0.50% : 0.000001s : 6: predicate.incorporate_call_switch 6.56% : 0.000012s : 40: predicate.inline 0.85% : 0.000002s : 6: predicate.inline_without_move 0.36% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.83% : 0.000002s : 6: predicate.less_batch_normalization 1.70% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000004s : 25: predicate.load_eliminater 3.78% : 0.000007s : 3: predicate.loop_unroll_after_grad 1.92% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.93% : 0.000004s : 15: predicate.make_slice_get_slice_eliminator 0.53% : 0.000001s : 6: predicate.merge_addn 0.56% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 9: predicate.minmaximum_grad 2.07% : 0.000004s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.45% : 0.000003s : 13: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.15% : 0.000002s : 9: predicate.reduce_eliminate 2.25% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.40% : 0.000001s : 6: predicate.remove_not_recompute_node 1.41% : 0.000003s : 16: predicate.replace_applicator 0.60% : 0.000001s : 6: predicate.replace_old_param 0.37% : 0.000001s : 3: predicate.reset_defer_inline 0.89% : 0.000002s : 9: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.76% : 0.000001s : 6: predicate.same_eliminate 0.54% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 6: predicate.shard_identity_eliminate 0.79% : 0.000001s : 6: predicate.special_op_eliminate 0.73% : 0.000001s : 6: predicate.specialize_transform 0.90% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.21% : 0.000002s : 13: predicate.switch_defer_inline 1.71% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.92% : 0.000009s : 43: predicate.switch_simplify 0.81% : 
0.000001s : 9: predicate.tile_eliminate 0.85% : 0.000002s : 9: predicate.transpose_eliminate 1.56% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.58% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.02% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.09% : 0.000006s : 31: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 3: predicate.value_based_eliminate 0.86% : 0.000002s : 6: predicate.virtual_dataset_eliminate 0.64% : 0.000001s : 6: predicate.virtual_output_eliminate 0.37% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000487 8 41.61% : 0.000203s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.39% : 0.000284s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.489635 196 0.00% : 0.000004s : 1: ForceFp32Comm 1.24% : 0.006085s : 1: add_attr 1.24% : 0.006071s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000072s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000078s : 1: auto_monad 0.00% : 0.000024s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.10% : 0.000512s : 1: bootstrap 0.01% : 0.000039s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000022s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000030s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.01% : 0.000027s : 1: event_method 0.00% : 0.000020s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000011s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000012s : 1: label_micro_interleaved_index 3.71% : 0.018161s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.15% : 0.000758s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000044s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000021s : 1: opt.transform.mutable_eliminate 0.22% : 0.001066s : 78: opt.transform.opt_a 0.01% : 0.000034s : 1: opt.transform.opt_after_cconv 0.01% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000095s : 28: opt.transform.opt_b 0.01% : 0.000061s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000041s : 4: opt.transform.symbol_engine_opt 0.58% : 0.002839s : 1: opt_a 0.04% : 0.000178s : 1: opt_after_cconv 0.17% : 0.000815s : 1: opt_after_jit_grad 0.05% : 0.000223s : 1: opt_b 4.71% : 0.023050s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000042s : 1: pre_auto_parallel 0.01% : 0.000034s : 1: py_interpret_to_execute 0.00% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000025s : 1: remove_dup_value 0.09% : 0.000448s : 1: renormalize.infer 0.08% : 0.000372s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000012s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000048s : 1: rewriter_after_opt_a 0.02% : 0.000084s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000094s : 1: symbol_engine_optimizer 82.13% : 0.402137s : 1: task_emit 0.02% : 0.000097s : 1: 
tuple_transform 5.19% : 0.025410s : 1: type_inference 0.02% : 0.000087s : 1: validate TotalTime = 0.513005, [24] [bootstrap]: 0.00054559 [type_inference]: 0.0151048 [event_method]: 6.444e-05 [auto_monad]: 0.00021505 [graph_reusing]: 9.20999e-06 [inline]: 4.52e-06 [add_attr]: 0.00414944, [1] [add_attr_with_inline]: 0.00413536, [1] [Cycle 1]: 0.00010662, [2] [tag_attr]: 4.671e-05 [meta_addattr_fg_expand]: 1.191e-05 [parallel-infer-symbol]: 3.58999e-06 [pre_auto_parallel]: 6.374e-05 [insert-virtual-dataset]: 2.90998e-06 [parallel-infer-symbol-second]: 1.15001e-06 [dataset_repeat_opt]: 2.49999e-06 [pipeline_split]: 1.97001e-06 [optimize]: 0.0222492, [53] [py_interpret_to_execute]: 4.953e-05 [rewriter_before_opt_a]: 0.0001838 [opt_a]: 0.0196005, [3] [Cycle 1]: 0.0152082, [45] [expand_dump_flag]: 6.39001e-06 [switch_simplify]: 8.07e-05 [loop_unroll]: 6.499e-05 [a_1]: 0.00166777 [with_stream_mark]: 3.305e-05 [recompute_prepare]: 2.803e-05 [updatestate_depend_eliminate]: 1.033e-05 [updatestate_assign_eliminate]: 8.40001e-06 [updatestate_loads_eliminate]: 7.18998e-06 [parameter_eliminate]: 3.88999e-06 [a_2]: 0.00025782 [accelerated_algorithm]: 3.859e-05 [shard]: 2.17001e-06 [meta_shard_fg_expand]: 5.22e-06 [shard_inline]: 1.652e-05 [merge_send_recv]: 1.95e-05 [auto_parallel]: 1.495e-05 [parallel]: 2.181e-05 [flash_sp]: 1.503e-05 [merge_comm]: 1.061e-05 [allreduce_fusion]: 9.44998e-06 [matmul_add_comm_reduction]: 3.704e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 2.23e-05 [virtual_dataset]: 1.641e-05 [get_grad_eliminate_]: 1.586e-05 [virtual_output]: 1.606e-05 [merge_forward]: 1.071e-05 [cell_reuse_recompute_pass]: 2.01e-06 [offload_activation]: 2.002e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.346e-05 [merge_recompute_call_nodes]: 1.80001e-06 [before_grad]: 3.009e-05 [set_forward_comm_id_for_comm_node_pass]: 1.256e-05 [meta_fg_expand]: 0.00214889 [flash_sp_send_recv_attached]: 4.96997e-06 [receive_attached]: 2.59999e-06 
[after_resolve]: 8.398e-05 [a_after_grad]: 0.00010468 [renormalize]: 0.00921618 [add_forward_monad_depend]: 1.499e-05 [auto_monad_grad]: 7.2e-06 [auto_monad_eliminator]: 6.004e-05 [cse]: 0.00023242 [a_3]: 0.00037058 [Cycle 2]: 0.00353971, [45] [expand_dump_flag]: 2.97002e-06 [switch_simplify]: 6.523e-05 [loop_unroll]: 4.426e-05 [a_1]: 0.00149802 [with_stream_mark]: 2.061e-05 [recompute_prepare]: 1.251e-05 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 4.15999e-06 [updatestate_loads_eliminate]: 3.46001e-06 [parameter_eliminate]: 2.68e-06 [a_2]: 9.514e-05 [accelerated_algorithm]: 1.248e-05 [shard]: 2.13998e-06 [meta_shard_fg_expand]: 3.05998e-06 [shard_inline]: 7.07002e-06 [merge_send_recv]: 9.66003e-06 [auto_parallel]: 1.116e-05 [parallel]: 1.129e-05 [flash_sp]: 4.68999e-06 [merge_comm]: 4.28001e-06 [allreduce_fusion]: 4.13001e-06 [matmul_add_comm_reduction]: 1.159e-05 [allreduce_slice_to_reducescatter]: 9.09989e-07 [virtual_shard_identity]: 9.00999e-06 [virtual_dataset]: 7.18998e-06 [get_grad_eliminate_]: 6.49999e-06 [virtual_output]: 6.39001e-06 [merge_forward]: 4.57e-06 [cell_reuse_recompute_pass]: 1.64998e-06 [offload_activation]: 1.193e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.525e-05 [merge_recompute_call_nodes]: 1.94e-06 [before_grad]: 1.442e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 0.00013457 [flash_sp_send_recv_attached]: 2.16998e-06 [receive_attached]: 2.53e-06 [after_resolve]: 1.622e-05 [a_after_grad]: 1.14e-05 [renormalize]: 0.00104102 [add_forward_monad_depend]: 7.53e-06 [auto_monad_grad]: 2.49999e-06 [auto_monad_eliminator]: 1.653e-05 [cse]: 3.674e-05 [a_3]: 5.537e-05 [Cycle 3]: 0.00083111, [45] [expand_dump_flag]: 1.84998e-06 [switch_simplify]: 8.55001e-06 [loop_unroll]: 7.03e-06 [a_1]: 0.0001639 [with_stream_mark]: 1.041e-05 [recompute_prepare]: 7.67998e-06 [updatestate_depend_eliminate]: 4.33001e-06 [updatestate_assign_eliminate]: 3.15002e-06 [updatestate_loads_eliminate]: 3.48e-06 
[parameter_eliminate]: 1.15999e-06 [a_2]: 9.097e-05 [accelerated_algorithm]: 1.149e-05 [shard]: 1.42e-06 [meta_shard_fg_expand]: 2.37001e-06 [shard_inline]: 7.06001e-06 [merge_send_recv]: 6.83e-06 [auto_parallel]: 7.36001e-06 [parallel]: 7.53999e-06 [flash_sp]: 1.21997e-06 [merge_comm]: 4.15e-06 [allreduce_fusion]: 4.01001e-06 [matmul_add_comm_reduction]: 7.93001e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 9.04e-06 [virtual_dataset]: 6.84001e-06 [get_grad_eliminate_]: 6.55002e-06 [virtual_output]: 6.45997e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.96e-06 [offload_activation]: 9.66998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.549e-05 [merge_recompute_call_nodes]: 1.02998e-06 [before_grad]: 1.306e-05 [set_forward_comm_id_for_comm_node_pass]: 4.72998e-06 [meta_fg_expand]: 2.89001e-06 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.40999e-06 [after_resolve]: 1.087e-05 [a_after_grad]: 1.046e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.84e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 8.98002e-06 [cse]: 2.162e-05 [a_3]: 4.086e-05 [py_interpret_to_execute_after_opt_a]: 1.659e-05 [slice_cell_reuse_recomputed_activation]: 2.06998e-06 [rewriter_after_opt_a]: 4.673e-05 [convert_after_rewriter]: 7.7e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00075888 [opt_b]: 0.00024135, [1] [Cycle 1]: 0.00023222, [7] [b_1]: 0.00014519 [b_2]: 8.98002e-06 [updatestate_depend_eliminate]: 7.26001e-06 [updatestate_assign_eliminate]: 3.20002e-06 [updatestate_loads_eliminate]: 4.18001e-06 [renormalize]: 7.2e-07 [cse]: 2.527e-05 [optimize_parallel_all_gather_comm]: 2.019e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 4.794e-05 [loop_unroll]: 0.00048086 [opt_after_cconv]: 0.00012069, [1] [Cycle 1]: 0.00011417, [7] [c_1]: 3.509e-05 [parameter_eliminate]: 3.63e-06 [updatestate_depend_eliminate]: 6.37001e-06 [updatestate_assign_eliminate]: 3.51999e-06 
[updatestate_loads_eliminate]: 3.51999e-06 [cse]: 2.415e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.781e-05 [tuple_transform]: 8.644e-05, [1] [Cycle 1]: 8.158e-05, [4] [d_1]: 5.072e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 8.18999e-06 [partial_unused_args_eliminate]: 2.05002e-06 [add_recomputation]: 5.805e-05 [cse_after_recomputation]: 2.879e-05, [1] [Cycle 1]: 2.348e-05, [1] [cse]: 1.721e-05 [environ_conv]: 1.145e-05 [swap_dp_allreduce_reducescatter]: 6.02001e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 4.92999e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 1.11002e-06 [remove_cast_before_assign_add]: 1.17999e-06 [full_micro_interleaved_order_control]: 2.55002e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.61002e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 2.09999e-06 [control_data_broadcast_order]: 1.577e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 5.40999e-06 [overlap_recompute_and_grad_model_parallel]: 5.74e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.60001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 5.38002e-06 [overlap_grad_flash_sp]: 2.582e-05 [begin_end_overlap_inline]: 6.69999e-07 [split_matmul_comm_elemetwise]: 2.38002e-06 [split_layernorm_comm]: 2.28002e-06 [handle_group_info]: 1.16002e-06 [symbol_engine_optimizer]: 9.503e-05, [1] [Cycle 1]: 9.04e-05, [6] [build]: 1.097e-05 [elim_shapecalc]: 1.243e-05 [elim_not_effective]: 1.546e-05 [opt_reshape]: 8.79e-06 [fold_const_symbol]: 1.271e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 2.61e-06 [pipeline_parallel_scheduler]: 1.91998e-06 [auto_monad_reorder]: 2.124e-05 [get_jit_bprop_graph]: 2.50997e-06 [rewriter_after_jit_bprop_graph]: 4.75999e-06 [opt_after_jit_grad]: 0.00052874 [validate]: 5.755e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.469681 [execute]: 9.12999e-06 Sums bootstrap : 0.000546s : 0.11% type_inference : 0.015105s : 2.98% event_method : 0.000064s : 0.01% auto_monad : 0.000215s : 0.04% graph_reusing : 0.000009s : 0.00% inline : 0.000005s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000047s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000012s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000064s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000050s : 0.01% optimize.rewriter_before_opt_a : 0.000184s : 0.04% optimize.opt_a.expand_dump_flag : 0.000011s : 0.00% optimize.opt_a.switch_simplify : 0.000154s : 0.03% optimize.opt_a.loop_unroll : 0.000116s : 0.02% optimize.opt_a.a_1 : 0.003330s : 0.66% optimize.opt_a.with_stream_mark : 0.000064s : 0.01% optimize.opt_a.recompute_prepare : 0.000048s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000008s : 0.00% optimize.opt_a.a_2 : 0.000444s : 0.09% optimize.opt_a.accelerated_algorithm : 0.000063s : 0.01% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.00% optimize.opt_a.shard_inline : 0.000031s : 0.01% optimize.opt_a.merge_send_recv : 0.000036s : 0.01% optimize.opt_a.auto_parallel : 0.000033s : 0.01% optimize.opt_a.parallel : 0.000041s : 0.01% optimize.opt_a.flash_sp : 0.000021s : 0.00% optimize.opt_a.merge_comm : 
0.000019s : 0.00% optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000057s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.01% optimize.opt_a.virtual_dataset : 0.000030s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.01% optimize.opt_a.virtual_output : 0.000029s : 0.01% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000042s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.00% optimize.opt_a.meta_fg_expand : 0.002286s : 0.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000111s : 0.02% optimize.opt_a.a_after_grad : 0.000127s : 0.02% optimize.opt_a.renormalize : 0.010257s : 2.02% optimize.opt_a.add_forward_monad_depend : 0.000024s : 0.00% optimize.opt_a.auto_monad_grad : 0.000012s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.02% optimize.opt_a.cse : 0.000291s : 0.06% optimize.opt_a.a_3 : 0.000467s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000759s : 0.15% optimize.opt_b.b_1 : 0.000145s : 0.03% optimize.opt_b.b_2 : 0.000009s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000048s : 0.01% optimize.loop_unroll : 0.000481s : 0.09% optimize.opt_after_cconv.c_1 : 0.000035s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000024s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000018s : 0.00% optimize.tuple_transform.d_1 : 0.000051s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.01% optimize.cse_after_recomputation.cse : 0.000017s : 0.00% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.00% get_jit_bprop_graph : 0.000003s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000529s : 0.10% validate : 0.000058s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.469681s : 92.60% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000978 161 7.47% : 0.000073s : 8: substitution.arithmetic_simplify 0.25% : 0.000002s : 3: substitution.elim_not_effective 0.58% : 0.000006s : 5: substitution.float_depend_g_call 0.44% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 3: substitution.fold_const_symbol 0.70% : 0.000007s : 4: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000003s : 2: substitution.incorporate_call_switch 60.83% : 0.000595s : 17: substitution.inline 2.79% : 0.000027s : 2: substitution.inline_without_move 1.26% : 0.000012s : 15: substitution.j_node_and_user_rematch 2.08% : 0.000020s : 3: substitution.less_batch_normalization 1.33% : 0.000013s : 7: substitution.minmaximum_grad 0.81% : 0.000008s : 5: substitution.partial_eliminate 1.42% : 0.000014s : 15: substitution.remove_not_recompute_node 3.40% : 0.000033s : 10: substitution.replace_applicator 1.27% : 0.000012s : 10: substitution.replace_old_param 0.42% : 0.000004s : 1: substitution.set_cell_output_no_recompute 2.73% : 0.000027s : 7: substitution.tuple_list_convert_item_index_to_positive 1.24% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 1.64% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 6.82% : 0.000067s : 19: substitution.tuple_list_get_item_eliminator 1.67% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.014983 2 86.35% : 0.012937s : 1: type_inference.infer 13.65% : 0.002046s : 1: type_inference.specialize ------[replace.] 0.000244 27 65.25% : 0.000159s : 17: replace.inline 34.75% : 0.000085s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000617 27 94.67% : 0.000584s : 17: match.inline 5.33% : 0.000033s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000746 4248 1.08% : 0.000008s : 53: predicate.accumulaten_eliminater 0.30% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.44% : 0.000003s : 21: predicate.addn_check_dump 1.21% : 0.000009s : 53: predicate.addn_zero_filter 1.07% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 74: predicate.arithmetic_simplify 1.18% : 0.000009s : 53: predicate.cast_eliminate 1.10% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.05% : 0.000000s : 4: predicate.const_output_eliminate 0.43% : 0.000003s : 21: predicate.depend_value_elim 1.18% : 0.000009s : 53: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.06% : 0.000000s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 57: predicate.environ_add_const_eliminate 1.12% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 57: predicate.environ_get_depend_swap 1.63% : 0.000012s : 78: predicate.environ_get_eliminate 1.13% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.59% : 0.000019s : 80: predicate.float_depend_g_call 0.47% : 0.000004s : 21: predicate.float_environ_get_switch 0.53% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.05% : 0.000000s : 4: predicate.fold_const_symbol 0.51% : 0.000004s : 21: predicate.get_grad_eliminate 0.09% : 0.000001s : 4: predicate.graph_param_transform 0.49% : 0.000004s : 21: predicate.incorporate_call 0.42% : 0.000003s : 21: predicate.incorporate_call_switch 5.84% : 0.000044s : 183: predicate.inline 1.49% : 0.000011s : 45: predicate.inline_without_move 0.31% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 21: predicate.less_batch_normalization 1.56% : 
0.000012s : 71: predicate.list_to_tuple_eliminator_ 2.55% : 0.000019s : 124: predicate.load_eliminater 0.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.45% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.45% : 0.000003s : 21: predicate.merge_addn 1.11% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.21% : 0.000009s : 50: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 53: predicate.minmaximum_grad 0.32% : 0.000002s : 4: predicate.mutable_eliminate 0.16% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.23% : 0.000017s : 80: predicate.partial_defer_inline 1.66% : 0.000012s : 67: predicate.partial_eliminate 1.11% : 0.000008s : 53: predicate.print_const_string_wrapper 0.49% : 0.000004s : 21: predicate.reduce_all_const_elim 1.50% : 0.000011s : 53: predicate.reduce_eliminate 2.57% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.89% : 0.000014s : 113: predicate.replace_applicator 0.70% : 0.000005s : 45: predicate.replace_old_param 0.10% : 0.000001s : 4: predicate.reset_defer_inline 1.24% : 0.000009s : 53: predicate.reshape_eliminate 1.17% : 0.000009s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.31% : 0.000010s : 50: predicate.same_eliminate 0.31% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 21: predicate.shard_identity_eliminate 0.24% : 0.000002s : 8: predicate.special_op_eliminate 0.58% : 0.000004s : 21: predicate.specialize_transform 1.35% : 0.000010s : 50: predicate.split_environ_get_set_with_tuple_value 1.27% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.10% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.90% : 0.000014s : 80: predicate.switch_defer_inline 2.92% : 0.000022s : 130: predicate.switch_layer_defer_inline 5.08% : 0.000038s : 218: 
predicate.switch_simplify 1.19% : 0.000009s : 53: predicate.tile_eliminate 1.09% : 0.000008s : 53: predicate.transpose_eliminate 1.47% : 0.000011s : 61: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000012s : 61: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000023s : 92: predicate.tuple_list_get_item_eliminator 1.51% : 0.000011s : 61: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000015s : 82: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 71: predicate.tuple_to_list_eliminator_ 2.49% : 0.000019s : 124: predicate.updatestate_pure_node_eliminater 3.01% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 4: predicate.value_based_eliminate 0.49% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002393 36 58.87% : 0.001409s : 15: func_graph_cloner_run.FuncGraphClonerGraph 41.13% : 0.000984s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.554818 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.75% : 0.004156s : 1: add_attr 0.75% : 0.004140s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000226s : 1: auto_monad 0.00% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.11% : 0.000585s : 1: bootstrap 0.01% : 0.000053s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000019s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000032s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000015s : 1: environ_conv 0.01% : 0.000074s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000008s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.09% : 0.000490s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.14% : 0.000770s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.90% : 0.004972s : 117: opt.transform.opt_a 0.01% : 0.000033s : 1: opt.transform.opt_after_cconv 0.00% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000121s : 28: opt.transform.opt_b 0.01% : 0.000056s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000045s : 4: opt.transform.symbol_engine_opt 3.53% : 0.019604s : 1: opt_a 0.02% : 0.000124s : 1: opt_after_cconv 0.10% : 0.000541s : 1: opt_after_jit_grad 0.04% : 0.000245s : 1: opt_b 4.01% : 0.022255s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000014s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000070s : 1: pre_auto_parallel 0.01% : 0.000054s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000022s : 1: remove_dup_value 1.47% : 0.008146s : 2: renormalize.infer 0.38% : 0.002086s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000051s : 1: rewriter_after_opt_a 0.03% : 0.000189s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000098s : 1: symbol_engine_optimizer 84.66% : 0.469700s : 1: task_emit 0.02% : 0.000090s : 1: 
tuple_transform 2.73% : 0.015139s : 1: type_inference 0.02% : 0.000095s : 1: validate TotalTime = 0.314388, [24] [bootstrap]: 0.00050361 [type_inference]: 0.0143923 [event_method]: 1.816e-05 [auto_monad]: 7.618e-05 [graph_reusing]: 6.93998e-06 [inline]: 2.89999e-06 [add_attr]: 0.00427822, [1] [add_attr_with_inline]: 0.00426304, [1] [Cycle 1]: 8.192e-05, [2] [tag_attr]: 2.112e-05 [meta_addattr_fg_expand]: 4.36002e-06 [parallel-infer-symbol]: 4.55999e-06 [pre_auto_parallel]: 0.0162202 [insert-virtual-dataset]: 3.75998e-06 [parallel-infer-symbol-second]: 2.88998e-06 [dataset_repeat_opt]: 3.48e-06 [pipeline_split]: 2.06e-06 [optimize]: 0.0113195, [53] [py_interpret_to_execute]: 4.047e-05 [rewriter_before_opt_a]: 7.313e-05 [opt_a]: 0.00521053, [2] [Cycle 1]: 0.00448576, [45] [expand_dump_flag]: 3.83999e-06 [switch_simplify]: 3.289e-05 [loop_unroll]: 1.918e-05 [a_1]: 0.00043529 [with_stream_mark]: 2.23e-05 [recompute_prepare]: 9.24998e-06 [updatestate_depend_eliminate]: 4.08001e-06 [updatestate_assign_eliminate]: 3.81001e-06 [updatestate_loads_eliminate]: 3.18e-06 [parameter_eliminate]: 2.31e-06 [a_2]: 8.5e-05 [accelerated_algorithm]: 6.86001e-06 [shard]: 2.29999e-06 [meta_shard_fg_expand]: 2.66e-06 [shard_inline]: 6.63e-06 [merge_send_recv]: 9.85002e-06 [auto_parallel]: 7.93999e-06 [parallel]: 2.129e-05 [flash_sp]: 1.37e-05 [merge_comm]: 4.37e-06 [allreduce_fusion]: 3.9e-06 [matmul_add_comm_reduction]: 1.066e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 9.05999e-06 [virtual_dataset]: 6.64001e-06 [get_grad_eliminate_]: 6.06998e-06 [virtual_output]: 6.30002e-06 [merge_forward]: 4.62e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.282e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.292e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 1.131e-05 [set_forward_comm_id_for_comm_node_pass]: 4.19002e-06 [meta_fg_expand]: 3.83999e-06 [flash_sp_send_recv_attached]: 3.25e-06 [receive_attached]: 2.36e-06 [after_resolve]: 
1.082e-05 [a_after_grad]: 9.07999e-06 [renormalize]: 0.0032628 [add_forward_monad_depend]: 7.25003e-06 [auto_monad_grad]: 2.96001e-06 [auto_monad_eliminator]: 2.035e-05 [cse]: 3.6e-05 [a_3]: 5.357e-05 [Cycle 2]: 0.00070447, [45] [expand_dump_flag]: 2.20002e-06 [switch_simplify]: 9.15001e-06 [loop_unroll]: 6.44001e-06 [a_1]: 0.00013476 [with_stream_mark]: 1.363e-05 [recompute_prepare]: 6.66999e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 3.61999e-06 [updatestate_loads_eliminate]: 3.75998e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 8.124e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 1.44e-06 [meta_shard_fg_expand]: 2.09e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 7.4e-06 [auto_parallel]: 5.86998e-06 [parallel]: 7.14001e-06 [flash_sp]: 3.65998e-06 [merge_comm]: 3.73999e-06 [allreduce_fusion]: 3.58e-06 [matmul_add_comm_reduction]: 9.54e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 7.24001e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.63997e-06 [virtual_output]: 5.39998e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 1.99999e-06 [offload_activation]: 1.02e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.286e-05 [merge_recompute_call_nodes]: 1.19e-06 [before_grad]: 9.42001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.17e-06 [meta_fg_expand]: 2.66e-06 [flash_sp_send_recv_attached]: 1.18001e-06 [receive_attached]: 1.85001e-06 [after_resolve]: 1.263e-05 [a_after_grad]: 8.89e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.42e-06 [auto_monad_grad]: 1.62001e-06 [auto_monad_eliminator]: 7.56999e-06 [cse]: 1.677e-05 [a_3]: 3.508e-05 [py_interpret_to_execute_after_opt_a]: 1.361e-05 [slice_cell_reuse_recomputed_activation]: 2.68998e-06 [rewriter_after_opt_a]: 4.207e-05 [convert_after_rewriter]: 7.69002e-06 [order_py_execute_after_rewriter]: 5.96998e-06 [mutable_eliminate]: 0.0008142 [opt_b]: 0.003571, [1] [Cycle 1]: 0.00356132, [7] [b_1]: 0.00342122 [b_2]: 
1.425e-05 [updatestate_depend_eliminate]: 1.287e-05 [updatestate_assign_eliminate]: 2.88e-06 [updatestate_loads_eliminate]: 3.03998e-06 [renormalize]: 1.42e-06 [cse]: 3.905e-05 [optimize_parallel_all_gather_comm]: 2.617e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 3.759e-05 [loop_unroll]: 0.00071192 [opt_after_cconv]: 0.00012081, [1] [Cycle 1]: 0.0001111, [7] [c_1]: 2.872e-05 [parameter_eliminate]: 5.56998e-06 [updatestate_depend_eliminate]: 7.02997e-06 [updatestate_assign_eliminate]: 2.85998e-06 [updatestate_loads_eliminate]: 2.79999e-06 [cse]: 2.549e-05 [renormalize]: 8.60018e-07 [remove_dup_value]: 1.782e-05 [tuple_transform]: 8.149e-05, [1] [Cycle 1]: 7.634e-05, [4] [d_1]: 4.687e-05 [none_parameter_eliminate]: 2.16e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.91001e-06 [partial_unused_args_eliminate]: 1.81998e-06 [add_recomputation]: 5.806e-05 [cse_after_recomputation]: 2.584e-05, [1] [Cycle 1]: 1.984e-05, [1] [cse]: 1.391e-05 [environ_conv]: 6.56e-06 [swap_dp_allreduce_reducescatter]: 5.69e-06 [bias_add_comm_swap]: 3.01001e-06 [label_micro_interleaved_index]: 6.12001e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.63997e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 1.24e-06 [remove_cast_before_assign_add]: 1.55001e-06 [full_micro_interleaved_order_control]: 2.69999e-06 [reorder_send_recv_between_fp_bp]: 3.04999e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.14e-06 [interleave_split_concat_branches]: 1.57999e-06 [interleave_parallel_branches]: 1.32999e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.16998e-06 [control_data_broadcast_order]: 1.576e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 4.70001e-06 [overlap_recompute_and_grad_model_parallel]: 5.46998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.51002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.92001e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 4.94e-06 [overlap_grad_flash_sp]: 2.31e-05 [begin_end_overlap_inline]: 6.30011e-07 [split_matmul_comm_elemetwise]: 2.72001e-06 [split_layernorm_comm]: 2.08002e-06 [handle_group_info]: 1.45999e-06 [symbol_engine_optimizer]: 8.508e-05, [1] [Cycle 1]: 7.907e-05, [6] [build]: 3.91999e-06 [elim_shapecalc]: 1.078e-05 [elim_not_effective]: 1.376e-05 [opt_reshape]: 7.16999e-06 [fold_const_symbol]: 1.107e-05 [renormalize]: 2.3999e-07 [detach_backward]: 2.27001e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.82e-05 [get_jit_bprop_graph]: 2.53998e-06 [rewriter_after_jit_bprop_graph]: 7.07002e-06 [opt_after_jit_grad]: 0.00064694 [validate]: 4.803e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.266474 [execute]: 1.023e-05 Sums bootstrap : 0.000504s : 0.16% type_inference : 0.014392s : 4.66% event_method : 0.000018s : 0.01% auto_monad : 0.000076s : 0.02% graph_reusing : 0.000007s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000005s : 0.00% pre_auto_parallel : 0.016220s : 5.25% insert-virtual-dataset : 0.000004s : 0.00% parallel-infer-symbol-second : 0.000003s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.01% optimize.rewriter_before_opt_a : 0.000073s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.01% optimize.opt_a.a_1 : 0.000570s : 0.18% optimize.opt_a.with_stream_mark : 0.000036s : 0.01% optimize.opt_a.recompute_prepare : 0.000016s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000166s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000017s : 0.01% optimize.opt_a.auto_parallel : 0.000014s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.01% optimize.opt_a.flash_sp : 0.000017s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000020s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000023s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000006s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000023s : 0.01% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.003263s : 1.06% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.00% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000028s : 0.01% optimize.opt_a.cse : 0.000053s : 0.02% optimize.opt_a.a_3 : 0.000089s : 0.03% 
optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000042s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000814s : 0.26% optimize.opt_b.b_1 : 0.003421s : 1.11% optimize.opt_b.b_2 : 0.000014s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000013s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000039s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000038s : 0.01% optimize.loop_unroll : 0.000712s : 0.23% optimize.opt_after_cconv.c_1 : 0.000029s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000025s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.01% optimize.tuple_transform.d_1 : 0.000047s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.02% optimize.cse_after_recomputation.cse : 0.000014s : 0.00% optimize.environ_conv : 0.000007s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.01% get_jit_bprop_graph : 0.000003s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.00% opt_after_jit_grad : 0.000647s : 0.21% validate : 0.000048s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.266474s : 86.29% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000203 24 19.85% : 0.000040s : 4: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.85% : 0.000002s : 2: substitution.fold_const_symbol 3.08% : 0.000006s : 3: substitution.graph_param_transform 68.24% : 0.000138s : 3: substitution.inline 1.92% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000005s : 4: substitution.remove_not_recompute_node 2.35% : 0.000005s : 2: substitution.replace_old_param ------[type_inference.] 0.014316 2 95.10% : 0.013613s : 1: type_inference.infer 4.90% : 0.000702s : 1: type_inference.specialize ------[replace.] 0.000034 3 100.00% : 0.000034s : 3: replace.inline ------[match.] 0.000136 3 100.00% : 0.000136s : 3: match.inline ------[predicate.] 
0.000175 815 0.86% : 0.000002s : 8: predicate.accumulaten_eliminater 1.48% : 0.000003s : 3: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000002s : 8: predicate.addn_zero_filter 0.76% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.42% : 0.000004s : 14: predicate.arithmetic_simplify 0.98% : 0.000002s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.41% : 0.000001s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.79% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.42% : 0.000002s : 11: predicate.environ_add_const_eliminate 0.98% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.94% : 0.000003s : 11: predicate.environ_get_depend_swap 1.60% : 0.000003s : 17: predicate.environ_get_eliminate 0.97% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.05% : 0.000002s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.98% : 0.000002s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.83% : 0.000001s : 6: predicate.get_grad_eliminate 0.33% : 0.000001s : 3: predicate.graph_param_transform 0.65% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.16% : 0.000011s : 37: predicate.inline 0.86% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.92% : 0.000002s : 6: predicate.less_batch_normalization 1.55% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.04% : 0.000004s : 22: predicate.load_eliminater 1.40% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.76% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 8: predicate.minmaximum_grad 1.11% : 0.000002s : 3: predicate.mutable_eliminate 0.46% : 0.000001s : 3: predicate.opt_reshape 0.44% : 0.000001s : 3: predicate.parallel_virtual_node 1.29% : 0.000002s : 11: predicate.partial_defer_inline 1.14% : 0.000002s : 11: predicate.partial_eliminate 0.79% : 0.000001s : 8: predicate.print_const_string_wrapper 0.73% : 0.000001s : 6: predicate.reduce_all_const_elim 1.38% : 0.000002s : 8: predicate.reduce_eliminate 2.09% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 6: predicate.remove_not_recompute_node 1.27% : 0.000002s : 14: predicate.replace_applicator 0.72% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000001s : 3: predicate.reset_defer_inline 0.96% : 0.000002s : 8: predicate.reshape_eliminate 0.83% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.60% : 0.000001s : 3: predicate.row_tensor_eliminate 0.80% : 0.000001s : 6: predicate.same_eliminate 0.43% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.95% : 0.000002s : 6: predicate.shard_identity_eliminate 0.96% : 0.000002s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 1.13% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.10% : 0.000002s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.40% : 0.000008s : 38: predicate.switch_simplify 0.81% : 
0.000001s : 8: predicate.tile_eliminate 0.93% : 0.000002s : 8: predicate.transpose_eliminate 1.64% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator 1.59% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.48% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.34% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 1.96% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.82% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.83% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.61% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000495 7 33.26% : 0.000165s : 2: func_graph_cloner_run.FuncGraphClonerGraph 66.74% : 0.000331s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.334389 196 0.00% : 0.000004s : 1: ForceFp32Comm 1.28% : 0.004286s : 1: add_attr 1.28% : 0.004268s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000063s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000083s : 1: auto_monad 0.01% : 0.000022s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.16% : 0.000545s : 1: bootstrap 0.01% : 0.000042s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000019s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000029s : 1: cse_after_recomputation 0.00% : 0.000007s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000010s : 1: environ_conv 0.01% : 0.000027s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000011s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000012s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.22% : 0.000726s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.25% : 0.000824s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000016s : 1: opt.transform.mutable_eliminate 0.29% : 0.000975s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.01% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000114s : 28: opt.transform.opt_b 0.02% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000038s : 4: opt.transform.symbol_engine_opt 1.56% : 0.005214s : 1: opt_a 0.04% : 0.000125s : 1: opt_after_cconv 0.20% : 0.000662s : 1: opt_after_jit_grad 1.07% : 0.003575s : 1: opt_b 3.39% : 0.011326s : 1: optimize 0.01% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000006s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000006s : 1: pipeline_split 4.86% : 0.016264s : 1: pre_auto_parallel 0.01% : 0.000046s : 1: py_interpret_to_execute 0.01% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000005s : 1: remove_cast_before_assign_add 0.01% : 0.000022s : 1: remove_dup_value 0.86% : 0.002865s : 1: renormalize.infer 0.12% : 0.000386s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000046s : 1: rewriter_after_opt_a 0.02% : 0.000078s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.03% : 0.000088s : 1: symbol_engine_optimizer 79.70% : 0.266501s : 1: task_emit 0.03% : 0.000084s : 1: 
tuple_transform 4.32% : 0.014429s : 1: type_inference 0.03% : 0.000085s : 1: validate TotalTime = 0.21254, [24] [bootstrap]: 0.00045383 [type_inference]: 0.0126346 [event_method]: 5.402e-05 [auto_monad]: 0.00013669 [graph_reusing]: 9.92999e-06 [inline]: 2.39999e-06 [add_attr]: 0.00383317, [1] [add_attr_with_inline]: 0.00382223, [1] [Cycle 1]: 8.633e-05, [2] [tag_attr]: 3.619e-05 [meta_addattr_fg_expand]: 9.90002e-06 [parallel-infer-symbol]: 3.78001e-06 [pre_auto_parallel]: 5.218e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.06998e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.019667, [53] [py_interpret_to_execute]: 3.902e-05 [rewriter_before_opt_a]: 0.00016834 [opt_a]: 0.0168107, [3] [Cycle 1]: 0.012674, [45] [expand_dump_flag]: 5.29e-06 [switch_simplify]: 7.508e-05 [loop_unroll]: 6.611e-05 [a_1]: 0.00149548 [with_stream_mark]: 3.277e-05 [recompute_prepare]: 2.807e-05 [updatestate_depend_eliminate]: 9.79e-06 [updatestate_assign_eliminate]: 7.82e-06 [updatestate_loads_eliminate]: 7.55998e-06 [parameter_eliminate]: 3.42002e-06 [a_2]: 0.00026162 [accelerated_algorithm]: 3.856e-05 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 5.57999e-06 [shard_inline]: 1.712e-05 [merge_send_recv]: 2.018e-05 [auto_parallel]: 1.457e-05 [parallel]: 2.296e-05 [flash_sp]: 1.397e-05 [merge_comm]: 1.067e-05 [allreduce_fusion]: 1.007e-05 [matmul_add_comm_reduction]: 3.066e-05 [allreduce_slice_to_reducescatter]: 1.17e-06 [virtual_shard_identity]: 2.123e-05 [virtual_dataset]: 1.82e-05 [get_grad_eliminate_]: 1.683e-05 [virtual_output]: 1.725e-05 [merge_forward]: 1.02e-05 [cell_reuse_recompute_pass]: 2.11998e-06 [offload_activation]: 2.057e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.59e-05 [merge_recompute_call_nodes]: 1.67999e-06 [before_grad]: 3.341e-05 [set_forward_comm_id_for_comm_node_pass]: 1.046e-05 [meta_fg_expand]: 0.00187906 [flash_sp_send_recv_attached]: 4.28001e-06 [receive_attached]: 2.78e-06 [after_resolve]: 
7.412e-05 [a_after_grad]: 9.601e-05 [renormalize]: 0.00725487 [add_forward_monad_depend]: 1.347e-05 [auto_monad_grad]: 6.74999e-06 [auto_monad_eliminator]: 5.544e-05 [cse]: 0.00021409 [a_3]: 0.00035135 [Cycle 2]: 0.00335075, [45] [expand_dump_flag]: 2.93998e-06 [switch_simplify]: 4.809e-05 [loop_unroll]: 4.283e-05 [a_1]: 0.0015349 [with_stream_mark]: 1.846e-05 [recompute_prepare]: 1.321e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.05998e-06 [updatestate_loads_eliminate]: 3.33998e-06 [parameter_eliminate]: 2.00002e-06 [a_2]: 9.568e-05 [accelerated_algorithm]: 1.29e-05 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 2.78e-06 [shard_inline]: 7.75e-06 [merge_send_recv]: 1.045e-05 [auto_parallel]: 1.13e-05 [parallel]: 1.005e-05 [flash_sp]: 4.97e-06 [merge_comm]: 4.57e-06 [allreduce_fusion]: 4.00998e-06 [matmul_add_comm_reduction]: 1.086e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 8.97e-06 [virtual_dataset]: 7.55e-06 [get_grad_eliminate_]: 7.74002e-06 [virtual_output]: 7.38e-06 [merge_forward]: 5.54e-06 [cell_reuse_recompute_pass]: 1.52999e-06 [offload_activation]: 1.11e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.737e-05 [merge_recompute_call_nodes]: 1.79998e-06 [before_grad]: 1.291e-05 [set_forward_comm_id_for_comm_node_pass]: 5.44e-06 [meta_fg_expand]: 9.26e-05 [flash_sp_send_recv_attached]: 2.09e-06 [receive_attached]: 2.61e-06 [after_resolve]: 1.495e-05 [a_after_grad]: 1.156e-05 [renormalize]: 0.00086469 [add_forward_monad_depend]: 5.92999e-06 [auto_monad_grad]: 2.46998e-06 [auto_monad_eliminator]: 1.692e-05 [cse]: 3.427e-05 [a_3]: 5.672e-05 [Cycle 3]: 0.00076435, [45] [expand_dump_flag]: 2.17999e-06 [switch_simplify]: 8.75999e-06 [loop_unroll]: 6.84999e-06 [a_1]: 0.0001583 [with_stream_mark]: 1.09e-05 [recompute_prepare]: 7.78001e-06 [updatestate_depend_eliminate]: 4.33999e-06 [updatestate_assign_eliminate]: 2.99001e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 
1.70001e-06 [a_2]: 8.772e-05 [accelerated_algorithm]: 1.247e-05 [shard]: 1.37999e-06 [meta_shard_fg_expand]: 2.17999e-06 [shard_inline]: 6.84999e-06 [merge_send_recv]: 7.5e-06 [auto_parallel]: 8.26002e-06 [parallel]: 7.43e-06 [flash_sp]: 1.04e-06 [merge_comm]: 4.56002e-06 [allreduce_fusion]: 3.89002e-06 [matmul_add_comm_reduction]: 7.58001e-06 [allreduce_slice_to_reducescatter]: 4.90021e-07 [virtual_shard_identity]: 1.101e-05 [virtual_dataset]: 6.52001e-06 [get_grad_eliminate_]: 6.94999e-06 [virtual_output]: 6.07999e-06 [merge_forward]: 4.13999e-06 [cell_reuse_recompute_pass]: 1.94e-06 [offload_activation]: 9.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.461e-05 [merge_recompute_call_nodes]: 1.17e-06 [before_grad]: 1.174e-05 [set_forward_comm_id_for_comm_node_pass]: 4.06001e-06 [meta_fg_expand]: 2.80002e-06 [flash_sp_send_recv_attached]: 1.22999e-06 [receive_attached]: 1.18001e-06 [after_resolve]: 1.104e-05 [a_after_grad]: 9.56e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 2.32001e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 9.36998e-06 [cse]: 2.284e-05 [a_3]: 4.087e-05 [py_interpret_to_execute_after_opt_a]: 1.69e-05 [slice_cell_reuse_recomputed_activation]: 2.00002e-06 [rewriter_after_opt_a]: 4.616e-05 [convert_after_rewriter]: 7.61999e-06 [order_py_execute_after_rewriter]: 6.39001e-06 [mutable_eliminate]: 0.0008985 [opt_b]: 0.00024741, [1] [Cycle 1]: 0.00023791, [7] [b_1]: 0.00014354 [b_2]: 8.98002e-06 [updatestate_depend_eliminate]: 7.63999e-06 [updatestate_assign_eliminate]: 4.13001e-06 [updatestate_loads_eliminate]: 3.64002e-06 [renormalize]: 5.19998e-07 [cse]: 2.886e-05 [optimize_parallel_all_gather_comm]: 2.195e-05 [overlap_param_gather]: 2.42001e-06 [cconv]: 2.991e-05 [loop_unroll]: 0.00055558 [opt_after_cconv]: 0.00013293, [1] [Cycle 1]: 0.00012483, [7] [c_1]: 3.819e-05 [parameter_eliminate]: 3.95e-06 [updatestate_depend_eliminate]: 7.55998e-06 [updatestate_assign_eliminate]: 3.83001e-06 [updatestate_loads_eliminate]: 
3.41999e-06 [cse]: 2.809e-05 [renormalize]: 9.29984e-07 [remove_dup_value]: 1.817e-05 [tuple_transform]: 8.931e-05, [1] [Cycle 1]: 8.344e-05, [4] [d_1]: 5.147e-05 [none_parameter_eliminate]: 1.90001e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 9.12001e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 5.7e-05 [cse_after_recomputation]: 2.893e-05, [1] [Cycle 1]: 2.381e-05, [1] [cse]: 1.684e-05 [environ_conv]: 9.89999e-06 [swap_dp_allreduce_reducescatter]: 6.84999e-06 [bias_add_comm_swap]: 3.22002e-06 [label_micro_interleaved_index]: 6.44999e-06 [label_fine_grained_interleaved_index]: 3.18e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.79998e-06 [ForceFp32Comm]: 1.00001e-06 [remove_cast_before_assign_add]: 1.27e-06 [full_micro_interleaved_order_control]: 2.26998e-06 [reorder_send_recv_between_fp_bp]: 2.92002e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 1.28002e-06 [interleave_split_concat_branches]: 1.28002e-06 [interleave_parallel_branches]: 1.49e-06 [overlap_opt_shard_in_pipeline]: 1.61002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.71e-05 [grouped_pairwise_exchange_alltoall]: 1.83002e-06 [offloading_packed_experts]: 4.61002e-06 [overlap_recompute_and_grad_model_parallel]: 5.44998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.76999e-06 [overlap_grad_ring_attention]: 5.74e-06 [overlap_grad_flash_sp]: 2.46e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 2.11998e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 0.000102, [1] [Cycle 1]: 9.569e-05, [6] [build]: 1.04e-05 [elim_shapecalc]: 1.494e-05 [elim_not_effective]: 1.608e-05 [opt_reshape]: 7.63001e-06 [fold_const_symbol]: 1.258e-05 [renormalize]: 3.00002e-07 [detach_backward]: 
3.14999e-06 [pipeline_parallel_scheduler]: 1.70001e-06 [auto_monad_reorder]: 2.438e-05 [get_jit_bprop_graph]: 1.87001e-06 [rewriter_after_jit_bprop_graph]: 5.81998e-06 [opt_after_jit_grad]: 0.00058051 [validate]: 5.309e-05 [backend_pass]: 1.18001e-06 [task_emit]: 0.174743 [execute]: 1.086e-05 Sums bootstrap : 0.000454s : 0.22% type_inference : 0.012635s : 6.10% event_method : 0.000054s : 0.03% auto_monad : 0.000137s : 0.07% graph_reusing : 0.000010s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000052s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.02% optimize.rewriter_before_opt_a : 0.000168s : 0.08% optimize.opt_a.expand_dump_flag : 0.000010s : 0.01% optimize.opt_a.switch_simplify : 0.000132s : 0.06% optimize.opt_a.loop_unroll : 0.000116s : 0.06% optimize.opt_a.a_1 : 0.003189s : 1.54% optimize.opt_a.with_stream_mark : 0.000062s : 0.03% optimize.opt_a.recompute_prepare : 0.000049s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.01% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000445s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000064s : 0.03% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.01% optimize.opt_a.shard_inline : 0.000032s : 0.02% optimize.opt_a.merge_send_recv : 0.000038s : 0.02% optimize.opt_a.auto_parallel : 0.000034s : 0.02% optimize.opt_a.parallel : 0.000040s : 0.02% optimize.opt_a.flash_sp : 0.000020s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000049s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.02% optimize.opt_a.virtual_dataset : 0.000032s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000031s : 0.01% optimize.opt_a.merge_forward : 0.000020s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000042s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000068s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001974s : 0.95% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000100s : 0.05% optimize.opt_a.a_after_grad : 0.000117s : 0.06% optimize.opt_a.renormalize : 0.008120s : 3.92% optimize.opt_a.add_forward_monad_depend : 0.000022s : 0.01% optimize.opt_a.auto_monad_grad : 0.000011s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.04% optimize.opt_a.cse : 0.000271s : 0.13% optimize.opt_a.a_3 : 0.000449s : 0.22% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.02% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000899s : 0.43% optimize.opt_b.b_1 : 0.000144s : 0.07% optimize.opt_b.b_2 : 0.000009s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000029s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.01% optimize.loop_unroll : 0.000556s : 0.27% optimize.opt_after_cconv.c_1 : 0.000038s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000028s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.01% optimize.tuple_transform.d_1 : 0.000051s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.03% optimize.cse_after_recomputation.cse : 0.000017s : 0.01% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000581s : 0.28% validate : 0.000053s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.174743s : 84.35% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 
0.000952 159 7.01% : 0.000067s : 7: substitution.arithmetic_simplify 0.29% : 0.000003s : 3: substitution.elim_not_effective 0.56% : 0.000005s : 5: substitution.float_depend_g_call 0.48% : 0.000005s : 2: substitution.float_tuple_getitem_switch 0.20% : 0.000002s : 3: substitution.fold_const_symbol 0.82% : 0.000008s : 4: substitution.graph_param_transform 0.41% : 0.000004s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.14% : 0.000525s : 17: substitution.inline 2.20% : 0.000021s : 2: substitution.inline_without_move 1.32% : 0.000013s : 15: substitution.j_node_and_user_rematch 2.18% : 0.000021s : 3: substitution.less_batch_normalization 1.45% : 0.000014s : 7: substitution.minmaximum_grad 0.77% : 0.000007s : 5: substitution.partial_eliminate 1.62% : 0.000015s : 15: substitution.remove_not_recompute_node 3.50% : 0.000033s : 10: substitution.replace_applicator 1.16% : 0.000011s : 10: substitution.replace_old_param 0.44% : 0.000004s : 1: substitution.set_cell_output_no_recompute 8.82% : 0.000084s : 7: substitution.tuple_list_convert_item_index_to_positive 1.22% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 1.72% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 6.74% : 0.000064s : 18: substitution.tuple_list_get_item_eliminator 1.69% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012546 2 87.06% : 0.010922s : 1: type_inference.infer 12.94% : 0.001624s : 1: type_inference.specialize ------[replace.] 0.000219 26 67.16% : 0.000147s : 17: replace.inline 32.84% : 0.000072s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000546 26 94.30% : 0.000515s : 17: match.inline 5.70% : 0.000031s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000723 4180 1.13% : 0.000008s : 52: predicate.accumulaten_eliminater 0.27% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.09% : 0.000008s : 52: predicate.addn_zero_filter 1.09% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 2.20% : 0.000016s : 73: predicate.arithmetic_simplify 1.17% : 0.000008s : 52: predicate.cast_eliminate 1.06% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.49% : 0.000004s : 21: predicate.depend_value_elim 1.13% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 52: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 4: predicate.elim_not_effective 0.15% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 56: predicate.environ_add_const_eliminate 1.15% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.15% : 0.000008s : 56: predicate.environ_get_depend_swap 1.64% : 0.000012s : 77: predicate.environ_get_eliminate 1.14% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.62% : 0.000019s : 78: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.59% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.58% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.49% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.79% : 0.000042s : 180: predicate.inline 1.48% : 0.000011s : 45: predicate.inline_without_move 0.32% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.76% : 0.000005s : 21: predicate.less_batch_normalization 1.48% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.55% : 0.000018s : 121: predicate.load_eliminater 0.36% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.56% : 0.000019s : 110: predicate.loop_unroll_before_grad 1.35% : 0.000010s : 60: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 21: predicate.merge_addn 1.04% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.07% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 52: predicate.minmaximum_grad 0.43% : 0.000003s : 4: predicate.mutable_eliminate 0.15% : 0.000001s : 4: predicate.opt_reshape 0.17% : 0.000001s : 4: predicate.parallel_virtual_node 2.17% : 0.000016s : 78: predicate.partial_defer_inline 1.62% : 0.000012s : 65: predicate.partial_eliminate 1.08% : 0.000008s : 52: predicate.print_const_string_wrapper 0.49% : 0.000004s : 21: predicate.reduce_all_const_elim 1.39% : 0.000010s : 52: predicate.reduce_eliminate 2.56% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 21: predicate.remove_not_recompute_node 1.89% : 0.000014s : 111: predicate.replace_applicator 0.78% : 0.000006s : 45: predicate.replace_old_param 0.11% : 0.000001s : 4: predicate.reset_defer_inline 1.14% : 0.000008s : 52: predicate.reshape_eliminate 1.06% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.29% : 0.000009s : 50: predicate.same_eliminate 0.36% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.61% : 0.000004s : 21: predicate.shard_identity_eliminate 0.28% : 0.000002s : 8: predicate.special_op_eliminate 0.62% : 0.000004s : 21: predicate.specialize_transform 1.28% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.28% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.90% : 0.000014s : 78: predicate.switch_defer_inline 2.89% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.08% : 0.000037s : 213: 
predicate.switch_simplify 1.09% : 0.000008s : 52: predicate.tile_eliminate 1.13% : 0.000008s : 52: predicate.transpose_eliminate 1.48% : 0.000011s : 60: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000010s : 60: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000020s : 90: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 60: predicate.tuple_list_get_set_item_eliminator 1.96% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.59% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.47% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.11% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.55% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.60% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001866 35 57.79% : 0.001078s : 14: func_graph_cloner_run.FuncGraphClonerGraph 42.21% : 0.000788s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.249151 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.54% : 0.003839s : 1: add_attr 1.54% : 0.003827s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000062s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.06% : 0.000144s : 1: auto_monad 0.01% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.20% : 0.000486s : 1: bootstrap 0.01% : 0.000035s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000032s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.01% : 0.000014s : 1: environ_conv 0.03% : 0.000063s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000007s : 1: label_fine_grained_interleaved_index 0.00% : 0.000010s : 1: label_micro_interleaved_index 0.23% : 0.000567s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000915s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000021s : 1: opt.transform.mutable_eliminate 1.92% : 0.004784s : 117: opt.transform.opt_a 0.01% : 0.000036s : 1: opt.transform.opt_after_cconv 0.01% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000117s : 28: opt.transform.opt_b 0.02% : 0.000058s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000047s : 4: opt.transform.symbol_engine_opt 6.75% : 0.016815s : 1: opt_a 0.05% : 0.000136s : 1: opt_after_cconv 0.24% : 0.000595s : 1: opt_after_jit_grad 0.10% : 0.000251s : 1: opt_b 7.90% : 0.019673s : 1: optimize 0.01% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.01% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000058s : 1: pre_auto_parallel 0.02% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000022s : 1: remove_dup_value 2.51% : 0.006265s : 2: renormalize.infer 0.74% : 0.001834s : 2: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000053s : 1: rewriter_after_opt_a 0.07% : 0.000174s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000105s : 1: symbol_engine_optimizer 70.15% : 0.174768s : 1: task_emit 0.04% : 0.000093s : 1: 
tuple_transform 5.08% : 0.012657s : 1: type_inference 0.04% : 0.000088s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x2-ge]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x2-ge],max_mem:6.0M
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x3-pynative]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x3-pynative],max_mem:6.0M
TotalTime = 0.0263984, [24] [bootstrap]: 0.00068912 [type_inference]: 0.00823335 [event_method]: 1.574e-05 [auto_monad]: 7.072e-05 [graph_reusing]: 5.70001e-06 [inline]: 2.64001e-06 [add_attr]: 0.00422433, [1] [add_attr_with_inline]: 0.0042116, [1] [Cycle 1]: 6.248e-05, [2] [tag_attr]: 1.757e-05 [meta_addattr_fg_expand]: 5.19e-06 [parallel-infer-symbol]: 3.68e-06 [pre_auto_parallel]: 3.113e-05 [insert-virtual-dataset]: 2.99999e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.82001e-06 [optimize]: 0.00472427, [53] [py_interpret_to_execute]: 2.329e-05 [rewriter_before_opt_a]: 7.299e-05 [opt_a]: 0.00256468, [2] [Cycle 1]: 0.00192026, [45] [expand_dump_flag]: 2.97002e-06 [switch_simplify]: 3.379e-05 [loop_unroll]: 2.057e-05 [a_1]: 0.00048319 [with_stream_mark]: 1.633e-05 [recompute_prepare]: 9.25999e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 4.22998e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 1.94e-06 [a_2]: 8.196e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 2.37999e-06 [shard_inline]: 6.12001e-06 [merge_send_recv]: 9.05999e-06 [auto_parallel]: 6.29001e-06 [parallel]: 2.824e-05 [flash_sp]: 8.18001e-06 [merge_comm]: 4.38999e-06 [allreduce_fusion]: 3.98999e-06 [matmul_add_comm_reduction]: 1.108e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 8.2e-06 [virtual_dataset]: 6.59999e-06 [get_grad_eliminate_]: 5.71e-06
[virtual_output]: 5.81e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 1.029e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.321e-05 [merge_recompute_call_nodes]: 2.12001e-06 [before_grad]: 1.08e-05 [set_forward_comm_id_for_comm_node_pass]: 4.12e-06 [meta_fg_expand]: 3.01001e-06 [flash_sp_send_recv_attached]: 2.66e-06 [receive_attached]: 2.56e-06 [after_resolve]: 1.031e-05 [a_after_grad]: 9.29e-06 [renormalize]: 0.00070605 [add_forward_monad_depend]: 9.71003e-06 [auto_monad_grad]: 2.68e-06 [auto_monad_eliminator]: 1.629e-05 [cse]: 3.284e-05 [a_3]: 4.743e-05 [Cycle 2]: 0.00063209, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 7.33999e-06 [loop_unroll]: 5.97001e-06 [a_1]: 0.00012048 [with_stream_mark]: 1.138e-05 [recompute_prepare]: 6.15002e-06 [updatestate_depend_eliminate]: 3.18e-06 [updatestate_assign_eliminate]: 2.89999e-06 [updatestate_loads_eliminate]: 3.35e-06 [parameter_eliminate]: 1.20001e-06 [a_2]: 7.163e-05 [accelerated_algorithm]: 5.70001e-06 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 6.07999e-06 [auto_parallel]: 5.36002e-06 [parallel]: 6.19001e-06 [flash_sp]: 3.73999e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 3.44001e-06 [matmul_add_comm_reduction]: 5.89e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.46e-06 [virtual_dataset]: 5.57999e-06 [get_grad_eliminate_]: 5.21002e-06 [virtual_output]: 5.17999e-06 [merge_forward]: 3.04999e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 6.84001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.04e-06 [before_grad]: 9.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.88001e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.17999e-06 [after_resolve]: 8.94e-06 [a_after_grad]: 7.85e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.37999e-06 
[auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 7.46001e-06 [cse]: 1.586e-05 [a_3]: 3.272e-05 [py_interpret_to_execute_after_opt_a]: 1.03e-05 [slice_cell_reuse_recomputed_activation]: 2.42001e-06 [rewriter_after_opt_a]: 3.448e-05 [convert_after_rewriter]: 7.43e-06 [order_py_execute_after_rewriter]: 5.44e-06 [mutable_eliminate]: 0.00058954 [opt_b]: 0.00021647, [1] [Cycle 1]: 0.00020934, [7] [b_1]: 0.00012314 [b_2]: 9.33002e-06 [updatestate_depend_eliminate]: 7.48e-06 [updatestate_assign_eliminate]: 2.89001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 6.59988e-07 [cse]: 2.29e-05 [optimize_parallel_all_gather_comm]: 1.913e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.875e-05 [loop_unroll]: 0.00046056 [opt_after_cconv]: 0.00010107, [1] [Cycle 1]: 9.503e-05, [7] [c_1]: 2.717e-05 [parameter_eliminate]: 3.41999e-06 [updatestate_depend_eliminate]: 5.49e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.81e-06 [cse]: 1.821e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.581e-05 [tuple_transform]: 7.187e-05, [1] [Cycle 1]: 6.714e-05, [4] [d_1]: 3.942e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.73998e-06 [partial_unused_args_eliminate]: 1.81998e-06 [add_recomputation]: 5.367e-05 [cse_after_recomputation]: 2.318e-05, [1] [Cycle 1]: 1.813e-05, [1] [cse]: 1.279e-05 [environ_conv]: 9.94999e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.71999e-06 [label_micro_interleaved_index]: 4.63999e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 9.5999e-07 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 
1.30001e-06 [interleave_parallel_branches]: 1.48002e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.338e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 4.80001e-06 [overlap_recompute_and_grad_model_parallel]: 5.07e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.62001e-06 [overlap_recompute_comm]: 2.39001e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 2.032e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 2.12999e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 7.445e-05, [1] [Cycle 1]: 6.975e-05, [6] [build]: 3.50998e-06 [elim_shapecalc]: 9.29e-06 [elim_not_effective]: 1.248e-05 [opt_reshape]: 6.21e-06 [fold_const_symbol]: 9.52999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 2.17001e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.772e-05 [get_jit_bprop_graph]: 2.41998e-06 [rewriter_after_jit_bprop_graph]: 0.00013883 [opt_after_jit_grad]: 0.00050499 [validate]: 4.159e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00736925 [execute]: 9.82999e-06 Sums bootstrap : 0.000689s : 3.28% type_inference : 0.008233s : 39.16% event_method : 0.000016s : 0.07% auto_monad : 0.000071s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000031s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.11% optimize.rewriter_before_opt_a : 0.000073s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% 
optimize.opt_a.switch_simplify : 0.000041s : 0.20% optimize.opt_a.loop_unroll : 0.000027s : 0.13% optimize.opt_a.a_1 : 0.000604s : 2.87% optimize.opt_a.with_stream_mark : 0.000028s : 0.13% optimize.opt_a.recompute_prepare : 0.000015s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000154s : 0.73% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000015s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000034s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000007s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000020s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.09% 
optimize.opt_a.a_after_grad : 0.000017s : 0.08% optimize.opt_a.renormalize : 0.000706s : 3.36% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.05% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.11% optimize.opt_a.cse : 0.000049s : 0.23% optimize.opt_a.a_3 : 0.000080s : 0.38% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000590s : 2.80% optimize.opt_b.b_1 : 0.000123s : 0.59% optimize.opt_b.b_2 : 0.000009s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000029s : 0.14% optimize.loop_unroll : 0.000461s : 2.19% optimize.opt_after_cconv.c_1 : 0.000027s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.03% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000054s : 0.26% optimize.cse_after_recomputation.cse : 0.000013s : 0.06% optimize.environ_conv : 0.000010s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000139s : 0.66% opt_after_jit_grad : 0.000505s : 2.40% validate : 0.000042s : 0.20% backend_pass : 0.000001s : 0.00% task_emit : 0.007369s : 35.05% execute : 0.000010s : 0.05% Time group info: ------[substitution.] 0.000195 26 19.16% : 0.000037s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 2.81% : 0.000005s : 3: substitution.graph_param_transform 65.27% : 0.000127s : 3: substitution.inline 1.96% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000005s : 4: substitution.remove_not_recompute_node 1.91% : 0.000004s : 2: substitution.replace_old_param 4.45% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.008152 2 90.38% : 0.007368s : 1: type_inference.infer 9.62% : 0.000784s : 1: type_inference.specialize ------[replace.] 0.000039 4 77.51% : 0.000031s : 3: replace.inline 22.49% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000133 4 93.97% : 0.000125s : 3: match.inline 6.03% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000167 883 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 0.84% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000002s : 9: predicate.addn_zero_filter 0.80% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.05% : 0.000003s : 15: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 6: predicate.check_bprop_eliminate 0.55% : 0.000001s : 6: predicate.compare_switch_simplify 0.17% : 0.000000s : 3: predicate.const_output_eliminate 0.59% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.20% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.40% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 12: predicate.environ_get_depend_swap 1.72% : 0.000003s : 18: predicate.environ_get_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.87% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.18% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.55% : 0.000011s : 40: predicate.inline 0.82% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 6: predicate.less_batch_normalization 1.78% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 25: predicate.load_eliminater 1.03% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.55% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 3: predicate.mutable_eliminate 0.33% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.55% : 0.000003s : 13: predicate.partial_defer_inline 1.39% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.34% : 0.000002s : 9: predicate.reduce_eliminate 2.34% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.40% : 0.000002s : 16: predicate.replace_applicator 0.58% : 0.000001s : 6: predicate.replace_old_param 0.51% : 0.000001s : 3: predicate.reset_defer_inline 0.85% : 0.000001s : 9: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.62% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.70% : 0.000001s : 6: predicate.shard_identity_eliminate 1.05% : 0.000002s : 6: predicate.special_op_eliminate 0.74% : 0.000001s : 6: predicate.specialize_transform 1.06% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.93% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.88% : 0.000008s : 43: predicate.switch_simplify 0.93% : 
0.000002s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.66% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.51% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.54% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.63% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.65% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000604 8 49.71% : 0.000300s : 3: func_graph_cloner_run.FuncGraphClonerGraph 50.29% : 0.000304s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.037172 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.38% : 0.004230s : 1: add_attr 11.34% : 0.004216s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000076s : 1: auto_monad 0.06% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 2.03% : 0.000753s : 1: bootstrap 0.09% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.04% : 0.000013s : 1: environ_conv 0.06% : 0.000023s : 1: event_method 0.05% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.27% : 0.000471s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.61% : 0.000600s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000016s : 1: opt.transform.mutable_eliminate 2.65% : 0.000986s : 78: opt.transform.opt_a 0.07% : 0.000026s : 1: opt.transform.opt_after_cconv 0.06% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000102s : 28: opt.transform.opt_b 0.12% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000034s : 4: opt.transform.symbol_engine_opt 6.91% : 0.002568s : 1: opt_a 0.28% : 0.000105s : 1: opt_after_cconv 1.39% : 0.000517s : 1: opt_after_jit_grad 0.59% : 0.000220s : 1: opt_b 12.72% : 0.004729s : 1: optimize 0.06% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000036s : 1: pre_auto_parallel 0.07% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000020s : 1: remove_dup_value 1.01% : 0.000374s : 1: renormalize.infer 0.87% : 0.000323s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.39% : 0.000145s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000038s : 1: rewriter_after_opt_a 0.21% : 0.000077s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.21% : 0.000077s : 1: symbol_engine_optimizer 19.88% : 0.007390s : 1: task_emit 0.20% : 0.000075s : 1: 
tuple_transform 22.21% : 0.008255s : 1: type_inference 0.22% : 0.000083s : 1: validate TotalTime = 0.0236578, [24] [bootstrap]: 0.00046401 [type_inference]: 0.00692727 [event_method]: 1.436e-05 [auto_monad]: 6.538e-05 [graph_reusing]: 6.04999e-06 [inline]: 2.83998e-06 [add_attr]: 0.00362084, [1] [add_attr_with_inline]: 0.00361008, [1] [Cycle 1]: 5.975e-05, [2] [tag_attr]: 1.697e-05 [meta_addattr_fg_expand]: 4.25e-06 [parallel-infer-symbol]: 4.15e-06 [pre_auto_parallel]: 3.132e-05 [insert-virtual-dataset]: 2.84999e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.92001e-06 [optimize]: 0.0048336, [53] [py_interpret_to_execute]: 2.652e-05 [rewriter_before_opt_a]: 6.392e-05 [opt_a]: 0.00266007, [2] [Cycle 1]: 0.00182307, [45] [expand_dump_flag]: 3.25e-06 [switch_simplify]: 3.207e-05 [loop_unroll]: 1.667e-05 [a_1]: 0.00040669 [with_stream_mark]: 2.119e-05 [recompute_prepare]: 1.081e-05 [updatestate_depend_eliminate]: 4.60001e-06 [updatestate_assign_eliminate]: 3.71999e-06 [updatestate_loads_eliminate]: 3.66999e-06 [parameter_eliminate]: 1.90001e-06 [a_2]: 8.594e-05 [accelerated_algorithm]: 7.06999e-06 [shard]: 3.26001e-06 [meta_shard_fg_expand]: 2.01998e-06 [shard_inline]: 6.58e-06 [merge_send_recv]: 1.018e-05 [auto_parallel]: 7.97e-06 [parallel]: 1.963e-05 [flash_sp]: 9.59e-06 [merge_comm]: 4.89003e-06 [allreduce_fusion]: 3.76001e-06 [matmul_add_comm_reduction]: 1.19e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.105e-05 [virtual_dataset]: 6.34999e-06 [get_grad_eliminate_]: 5.91003e-06 [virtual_output]: 6.46e-06 [merge_forward]: 4.87998e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.111e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.499e-05 [merge_recompute_call_nodes]: 1.66002e-06 [before_grad]: 1.21e-05 [set_forward_comm_id_for_comm_node_pass]: 4.47e-06 [meta_fg_expand]: 3.16999e-06 [flash_sp_send_recv_attached]: 3.2e-06 [receive_attached]: 2.34999e-06 
[after_resolve]: 1.197e-05 [a_after_grad]: 9.45001e-06 [renormalize]: 0.00064301 [add_forward_monad_depend]: 6.41e-06 [auto_monad_grad]: 2.94999e-06 [auto_monad_eliminator]: 1.707e-05 [cse]: 3.234e-05 [a_3]: 4.874e-05 [Cycle 2]: 0.0008236, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 7.88999e-06 [loop_unroll]: 6.16e-06 [a_1]: 0.0001221 [with_stream_mark]: 1.607e-05 [recompute_prepare]: 7.88001e-06 [updatestate_depend_eliminate]: 3.57002e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.35001e-06 [a_2]: 7.419e-05 [accelerated_algorithm]: 7.16999e-06 [shard]: 1.52999e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 6.40002e-06 [merge_send_recv]: 8.84e-06 [auto_parallel]: 6.51e-06 [parallel]: 6.56e-06 [flash_sp]: 3.65e-06 [merge_comm]: 4.20999e-06 [allreduce_fusion]: 3.59002e-06 [matmul_add_comm_reduction]: 8.72998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 9.07999e-06 [virtual_dataset]: 5.83002e-06 [get_grad_eliminate_]: 5.39998e-06 [virtual_output]: 5.17e-06 [merge_forward]: 3.87002e-06 [cell_reuse_recompute_pass]: 1.90001e-06 [offload_activation]: 8.03001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.219e-05 [merge_recompute_call_nodes]: 1.27999e-06 [before_grad]: 9.77001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.99e-06 [meta_fg_expand]: 2.47001e-06 [flash_sp_send_recv_attached]: 1.19e-06 [receive_attached]: 1.13001e-06 [after_resolve]: 1.088e-05 [a_after_grad]: 8.025e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 3.36999e-06 [auto_monad_grad]: 2.24999e-06 [auto_monad_eliminator]: 1.108e-05 [cse]: 1.989e-05 [a_3]: 3.766e-05 [py_interpret_to_execute_after_opt_a]: 1.302e-05 [slice_cell_reuse_recomputed_activation]: 2.49999e-06 [rewriter_after_opt_a]: 4.037e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 5.79999e-06 [mutable_eliminate]: 0.00062794 [opt_b]: 0.00019663, [1] [Cycle 1]: 0.0001895, [7] [b_1]: 
0.00011339 [b_2]: 6.89001e-06 [updatestate_depend_eliminate]: 7.6e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 2.68e-06 [renormalize]: 6.19999e-07 [cse]: 1.986e-05 [optimize_parallel_all_gather_comm]: 1.868e-05 [overlap_param_gather]: 1.91998e-06 [cconv]: 3.043e-05 [loop_unroll]: 0.00045402 [opt_after_cconv]: 0.00010212, [1] [Cycle 1]: 9.577e-05, [7] [c_1]: 2.643e-05 [parameter_eliminate]: 3.95e-06 [updatestate_depend_eliminate]: 6.20002e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.78998e-06 [cse]: 1.82e-05 [renormalize]: 6.59988e-07 [remove_dup_value]: 1.673e-05 [tuple_transform]: 7.101e-05, [1] [Cycle 1]: 6.614e-05, [4] [d_1]: 3.876e-05 [none_parameter_eliminate]: 1.68002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.99001e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 4.907e-05 [cse_after_recomputation]: 2.201e-05, [1] [Cycle 1]: 1.769e-05, [1] [cse]: 1.203e-05 [environ_conv]: 6.15002e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.88003e-06 [label_micro_interleaved_index]: 5.27001e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.18001e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.381e-05 [grouped_pairwise_exchange_alltoall]: 1.40999e-06 [offloading_packed_experts]: 4.44998e-06 [overlap_recompute_and_grad_model_parallel]: 6.41998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 4.93001e-06 [overlap_grad_flash_sp]: 2.101e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.34999e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 1.12999e-06 [symbol_engine_optimizer]: 7.653e-05, [1] [Cycle 1]: 7.162e-05, [6] [build]: 2.92002e-06 [elim_shapecalc]: 9.38002e-06 [elim_not_effective]: 1.219e-05 [opt_reshape]: 6.48e-06 [fold_const_symbol]: 9.62001e-06 [renormalize]: 2.9002e-07 [detach_backward]: 2.28998e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.698e-05 [get_jit_bprop_graph]: 1.82999e-06 [rewriter_after_jit_bprop_graph]: 4.35999e-06 [opt_after_jit_grad]: 0.00051056 [validate]: 4.198e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00683585 [execute]: 9.10999e-06 Sums bootstrap : 0.000464s : 2.46% type_inference : 0.006927s : 36.78% event_method : 0.000014s : 0.08% auto_monad : 0.000065s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000031s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000027s : 0.14% optimize.rewriter_before_opt_a : 0.000064s : 0.34% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000040s : 0.21% optimize.opt_a.loop_unroll : 0.000023s : 0.12% optimize.opt_a.a_1 : 0.000529s : 2.81% optimize.opt_a.with_stream_mark : 0.000037s : 0.20% optimize.opt_a.recompute_prepare : 0.000019s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000160s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.08% optimize.opt_a.shard : 0.000005s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.07% optimize.opt_a.merge_send_recv : 0.000019s : 0.10% optimize.opt_a.auto_parallel : 0.000014s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.14% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000009s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000021s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000020s : 0.11% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.06% optimize.opt_a.merge_forward : 0.000009s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000019s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000027s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000022s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.05% optimize.opt_a.meta_fg_expand : 0.000006s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000023s : 0.12% optimize.opt_a.a_after_grad : 0.000090s : 0.48% optimize.opt_a.renormalize : 0.000643s : 3.41% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.05% optimize.opt_a.auto_monad_grad : 0.000005s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000028s : 0.15% optimize.opt_a.cse : 0.000052s : 0.28% optimize.opt_a.a_3 : 0.000086s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000040s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000628s : 3.33% optimize.opt_b.b_1 : 0.000113s : 0.60% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000030s : 0.16% optimize.loop_unroll : 0.000454s : 2.41% optimize.opt_after_cconv.c_1 : 0.000026s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.26% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000021s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000511s : 2.71% validate : 0.000042s : 0.22% backend_pass : 0.000001s : 0.00% task_emit : 0.006836s : 36.29% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000180 24 19.18% : 0.000034s : 4: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000006s : 3: substitution.graph_param_transform 68.02% : 0.000122s : 3: substitution.inline 2.71% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000005s : 4: substitution.remove_not_recompute_node 2.36% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006863 2 92.18% : 0.006327s : 1: type_inference.infer 7.82% : 0.000537s : 1: type_inference.specialize ------[replace.] 0.000035 3 100.00% : 0.000035s : 3: replace.inline ------[match.] 0.000120 3 100.00% : 0.000120s : 3: match.inline ------[predicate.] 
0.000158 815 0.95% : 0.000002s : 8: predicate.accumulaten_eliminater 0.95% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 14: predicate.arithmetic_simplify 0.82% : 0.000001s : 8: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 8: predicate.dict_set_item_eliminator 0.98% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.00% : 0.000002s : 11: predicate.environ_get_depend_swap 1.72% : 0.000003s : 17: predicate.environ_get_eliminate 1.03% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.57% : 0.000004s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.69% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.46% : 0.000010s : 37: predicate.inline 1.17% : 0.000002s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.21% : 0.000002s : 6: predicate.less_batch_normalization 1.49% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 22: predicate.load_eliminater 1.15% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.91% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.28% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.49% : 0.000002s : 11: predicate.partial_defer_inline 1.46% : 0.000002s : 11: predicate.partial_eliminate 0.78% : 0.000001s : 8: predicate.print_const_string_wrapper 0.77% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 8: predicate.reduce_eliminate 2.29% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 6: predicate.remove_not_recompute_node 1.36% : 0.000002s : 14: predicate.replace_applicator 0.90% : 0.000001s : 6: predicate.replace_old_param 0.61% : 0.000001s : 3: predicate.reset_defer_inline 0.83% : 0.000001s : 8: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 1.19% : 0.000002s : 6: predicate.same_eliminate 0.71% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.97% : 0.000002s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 1.02% : 0.000002s : 6: predicate.specialize_transform 0.98% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.22% : 0.000002s : 11: predicate.switch_defer_inline 1.85% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.75% : 0.000008s : 38: predicate.switch_simplify 0.82% : 
0.000001s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.68% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.94% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.35% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.68% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000363 7 33.84% : 0.000123s : 2: func_graph_cloner_run.FuncGraphClonerGraph 66.16% : 0.000240s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.033903 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.70% : 0.003627s : 1: add_attr 10.66% : 0.003615s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000072s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.49% : 0.000504s : 1: bootstrap 0.10% : 0.000034s : 1: cconv 0.01% : 0.000005s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000009s : 1: label_micro_interleaved_index 1.37% : 0.000464s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.89% : 0.000640s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000015s : 1: opt.transform.mutable_eliminate 2.96% : 0.001004s : 78: opt.transform.opt_a 0.07% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000091s : 28: opt.transform.opt_b 0.13% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.86% : 0.002663s : 1: opt_a 0.31% : 0.000106s : 1: opt_after_cconv 1.54% : 0.000522s : 1: opt_after_jit_grad 0.59% : 0.000200s : 1: opt_b 14.27% : 0.004838s : 1: optimize 0.07% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000036s : 1: pre_auto_parallel 0.09% : 0.000031s : 1: py_interpret_to_execute 0.05% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 1.03% : 0.000349s : 1: renormalize.infer 0.84% : 0.000284s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000044s : 1: rewriter_after_opt_a 0.21% : 0.000071s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000079s : 1: symbol_engine_optimizer 20.22% : 0.006854s : 1: task_emit 0.22% : 0.000074s : 1: 
tuple_transform 20.51% : 0.006955s : 1: type_inference 0.23% : 0.000078s : 1: validate TotalTime = 0.022534, [24] [bootstrap]: 0.00043949 [type_inference]: 0.00618612 [event_method]: 1.465e-05 [auto_monad]: 6.128e-05 [graph_reusing]: 5.52001e-06 [inline]: 2.23002e-06 [add_attr]: 0.00333626, [1] [add_attr_with_inline]: 0.00332587, [1] [Cycle 1]: 6.364e-05, [2] [tag_attr]: 1.699e-05 [meta_addattr_fg_expand]: 4.79e-06 [parallel-infer-symbol]: 3.56001e-06 [pre_auto_parallel]: 2.918e-05 [insert-virtual-dataset]: 2.72001e-06 [parallel-infer-symbol-second]: 9.5999e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 2.00002e-06 [optimize]: 0.00456489, [53] [py_interpret_to_execute]: 2.385e-05 [rewriter_before_opt_a]: 6.588e-05 [opt_a]: 0.00242085, [2] [Cycle 1]: 0.0017826, [45] [expand_dump_flag]: 3.13e-06 [switch_simplify]: 3.616e-05 [loop_unroll]: 2.094e-05 [a_1]: 0.00046243 [with_stream_mark]: 1.669e-05 [recompute_prepare]: 8.59e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 3.66999e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 8.309e-05 [accelerated_algorithm]: 6.63e-06 [shard]: 2.68003e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 5.96998e-06 [merge_send_recv]: 8.97e-06 [auto_parallel]: 6.81999e-06 [parallel]: 2.029e-05 [flash_sp]: 8.37e-06 [merge_comm]: 3.81999e-06 [allreduce_fusion]: 3.55998e-06 [matmul_add_comm_reduction]: 9.76e-06 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 6.79001e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.89e-06 [merge_forward]: 4.43999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 1.081e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.217e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 1.061e-05 [set_forward_comm_id_for_comm_node_pass]: 3.66001e-06 [meta_fg_expand]: 2.75997e-06 [flash_sp_send_recv_attached]: 2.63998e-06 [receive_attached]: 3.01999e-06 
[after_resolve]: 9.93998e-06 [a_after_grad]: 8.94998e-06 [renormalize]: 0.00061334 [add_forward_monad_depend]: 5.14e-06 [auto_monad_grad]: 2.34001e-06 [auto_monad_eliminator]: 1.575e-05 [cse]: 3.23e-05 [a_3]: 4.476e-05 [Cycle 2]: 0.00062714, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 7.97e-06 [loop_unroll]: 6.24001e-06 [a_1]: 0.0001168 [with_stream_mark]: 1.029e-05 [recompute_prepare]: 6.11e-06 [updatestate_depend_eliminate]: 3.05002e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.83998e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 7.237e-05 [accelerated_algorithm]: 5.91e-06 [shard]: 1.19e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 6.19999e-06 [merge_send_recv]: 4.97999e-06 [auto_parallel]: 5.52001e-06 [parallel]: 4.70999e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 3.35998e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 6.08002e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.27001e-06 [merge_forward]: 2.75997e-06 [cell_reuse_recompute_pass]: 1.66002e-06 [offload_activation]: 7.12002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.185e-05 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 9.14e-06 [set_forward_comm_id_for_comm_node_pass]: 3.71999e-06 [meta_fg_expand]: 1.99e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.16002e-06 [after_resolve]: 8.61002e-06 [a_after_grad]: 8.12e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.69e-06 [auto_monad_grad]: 1.45999e-06 [auto_monad_eliminator]: 6.69001e-06 [cse]: 1.524e-05 [a_3]: 3.326e-05 [py_interpret_to_execute_after_opt_a]: 9.27999e-06 [slice_cell_reuse_recomputed_activation]: 2.44001e-06 [rewriter_after_opt_a]: 3.73e-05 [convert_after_rewriter]: 7.32997e-06 [order_py_execute_after_rewriter]: 6.02999e-06 [mutable_eliminate]: 0.00057495 [opt_b]: 0.00020074, [1] [Cycle 1]: 0.00019209, [7] 
[b_1]: 0.00011468 [b_2]: 7.18998e-06 [updatestate_depend_eliminate]: 6.34001e-06 [updatestate_assign_eliminate]: 3.04999e-06 [updatestate_loads_eliminate]: 2.84999e-06 [renormalize]: 9.50007e-07 [cse]: 2.086e-05 [optimize_parallel_all_gather_comm]: 1.652e-05 [overlap_param_gather]: 2.22001e-06 [cconv]: 2.712e-05 [loop_unroll]: 0.00047121 [opt_after_cconv]: 0.00010363, [1] [Cycle 1]: 9.71e-05, [7] [c_1]: 2.73e-05 [parameter_eliminate]: 3.39001e-06 [updatestate_depend_eliminate]: 5.74999e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.73003e-06 [cse]: 2.019e-05 [renormalize]: 6.30011e-07 [remove_dup_value]: 1.598e-05 [tuple_transform]: 7.267e-05, [1] [Cycle 1]: 6.787e-05, [4] [d_1]: 4.009e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 3.50003e-07 [switch_simplify]: 7.23e-06 [partial_unused_args_eliminate]: 2.11e-06 [add_recomputation]: 4.877e-05 [cse_after_recomputation]: 2.344e-05, [1] [Cycle 1]: 1.793e-05, [1] [cse]: 1.205e-05 [environ_conv]: 5.79e-06 [swap_dp_allreduce_reducescatter]: 5.37999e-06 [bias_add_comm_swap]: 3.13998e-06 [label_micro_interleaved_index]: 5.52999e-06 [label_fine_grained_interleaved_index]: 2.80002e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.71e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.51e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 1.25999e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.32e-06 [overlap_opt_shard_in_pipeline]: 1.28002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.426e-05 [grouped_pairwise_exchange_alltoall]: 1.78002e-06 [offloading_packed_experts]: 4.68999e-06 [overlap_recompute_and_grad_model_parallel]: 5.48002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.62001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 4.62e-06 [overlap_grad_flash_sp]: 2.076e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.14998e-06 [symbol_engine_optimizer]: 8.511e-05, [1] [Cycle 1]: 7.985e-05, [6] [build]: 2.54999e-06 [elim_shapecalc]: 1.196e-05 [elim_not_effective]: 1.426e-05 [opt_reshape]: 7.03998e-06 [fold_const_symbol]: 9.72999e-06 [renormalize]: 4.19997e-07 [detach_backward]: 2.17999e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.772e-05 [get_jit_bprop_graph]: 1.67001e-06 [rewriter_after_jit_bprop_graph]: 4.04002e-06 [opt_after_jit_grad]: 0.00051439 [validate]: 3.974e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00705427 [execute]: 8.75001e-06 Sums bootstrap : 0.000439s : 2.43% type_inference : 0.006186s : 34.15% event_method : 0.000015s : 0.08% auto_monad : 0.000061s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.13% optimize.rewriter_before_opt_a : 0.000066s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000044s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000579s : 3.20% optimize.opt_a.with_stream_mark : 0.000027s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000025s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000613s : 3.39% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.12% optimize.opt_a.cse : 0.000048s : 0.26% optimize.opt_a.a_3 : 0.000078s : 0.43% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000575s : 3.17% optimize.opt_b.b_1 : 0.000115s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.01% optimize.opt_b.cse : 0.000021s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.15% optimize.loop_unroll : 0.000471s : 2.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000020s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000006s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000021s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.07% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000514s : 2.84% validate : 0.000040s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.007054s : 38.95% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000182 26 18.70% : 0.000034s : 5: substitution.arithmetic_simplify 1.39% : 0.000003s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.16% : 0.000006s : 3: substitution.graph_param_transform 64.69% : 0.000118s : 3: substitution.inline 1.94% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000005s : 4: substitution.remove_not_recompute_node 1.88% : 0.000003s : 2: substitution.replace_old_param 4.86% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006132 2 89.60% : 0.005494s : 1: type_inference.infer 10.40% : 0.000638s : 1: type_inference.specialize ------[replace.] 0.000040 4 77.67% : 0.000031s : 3: replace.inline 22.33% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000124 4 93.47% : 0.000116s : 3: match.inline 6.53% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 883 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 6: predicate.addn_check_dump 0.85% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.15% : 0.000004s : 15: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.76% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.32% : 0.000001s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_depend_swap 1.72% : 0.000003s : 18: predicate.environ_get_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.49% : 0.000004s : 13: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.73% : 0.000001s : 6: predicate.get_grad_eliminate 0.34% : 0.000001s : 3: predicate.graph_param_transform 0.67% : 0.000001s : 6: predicate.incorporate_call 0.54% : 0.000001s : 6: predicate.incorporate_call_switch 6.53% : 0.000011s : 40: predicate.inline 0.95% : 0.000002s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.92% : 0.000002s : 6: predicate.less_batch_normalization 1.68% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.31% : 0.000004s : 25: predicate.load_eliminater 0.92% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.06% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.54% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.74% : 0.000003s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.42% : 0.000002s : 13: predicate.partial_eliminate 0.94% : 0.000002s : 9: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 9: predicate.reduce_eliminate 2.31% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.24% : 0.000002s : 16: predicate.replace_applicator 0.65% : 0.000001s : 6: predicate.replace_old_param 0.31% : 0.000001s : 3: predicate.reset_defer_inline 0.96% : 0.000002s : 9: predicate.reshape_eliminate 0.65% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.91% : 0.000001s : 6: predicate.same_eliminate 0.43% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.75% : 0.000001s : 6: predicate.shard_identity_eliminate 0.79% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 13: predicate.switch_defer_inline 1.86% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.06% : 0.000008s : 43: predicate.switch_simplify 0.88% : 
0.000001s : 9: predicate.tile_eliminate 0.93% : 0.000002s : 9: predicate.transpose_eliminate 1.55% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.54% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.21% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.65% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000391 8 42.13% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.87% : 0.000226s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032167 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.39% : 0.003342s : 1: add_attr 10.35% : 0.003330s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000054s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.21% : 0.000067s : 1: auto_monad 0.07% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.48% : 0.000476s : 1: bootstrap 0.10% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000018s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000027s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000005s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.50% : 0.000482s : 1: loop_unroll 0.02% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.82% : 0.000586s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.05% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 2.99% : 0.000961s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000039s : 4: opt.transform.symbol_engine_opt 7.54% : 0.002424s : 1: opt_a 0.34% : 0.000108s : 1: opt_after_cconv 1.64% : 0.000527s : 1: opt_after_jit_grad 0.63% : 0.000204s : 1: opt_b 14.21% : 0.004570s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000034s : 1: pre_auto_parallel 0.09% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 0.98% : 0.000315s : 1: renormalize.infer 0.90% : 0.000291s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000041s : 1: rewriter_after_opt_a 0.22% : 0.000070s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000006s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000006s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000088s : 1: symbol_engine_optimizer 21.99% : 0.007074s : 1: task_emit 0.24% : 0.000076s : 1: 
tuple_transform 19.29% : 0.006205s : 1: type_inference 0.22% : 0.000072s : 1: validate TotalTime = 0.0475138, [24] [bootstrap]: 0.00051272 [type_inference]: 0.0144202 [event_method]: 5.467e-05 [auto_monad]: 0.00015502 [graph_reusing]: 9.12001e-06 [inline]: 3.71001e-06 [add_attr]: 0.00376449, [1] [add_attr_with_inline]: 0.00375149, [1] [Cycle 1]: 9.738e-05, [2] [tag_attr]: 3.909e-05 [meta_addattr_fg_expand]: 1.056e-05 [parallel-infer-symbol]: 3.70998e-06 [pre_auto_parallel]: 5.734e-05 [insert-virtual-dataset]: 3.06999e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.86999e-06 [pipeline_split]: 2.12001e-06 [optimize]: 0.0199057, [53] [py_interpret_to_execute]: 4.458e-05 [rewriter_before_opt_a]: 0.0001718 [opt_a]: 0.017309, [3] [Cycle 1]: 0.0132287, [45] [expand_dump_flag]: 5.28002e-06 [switch_simplify]: 7.922e-05 [loop_unroll]: 6.353e-05 [a_1]: 0.00156063 [with_stream_mark]: 7.481e-05 [recompute_prepare]: 2.919e-05 [updatestate_depend_eliminate]: 1.029e-05 [updatestate_assign_eliminate]: 7.55e-06 [updatestate_loads_eliminate]: 6.99001e-06 [parameter_eliminate]: 0.00012649 [a_2]: 0.00025933 [accelerated_algorithm]: 3.855e-05 [shard]: 2.36e-06 [meta_shard_fg_expand]: 4.63999e-06 [shard_inline]: 1.649e-05 [merge_send_recv]: 2.21e-05 [auto_parallel]: 1.228e-05 [parallel]: 2.167e-05 [flash_sp]: 1.371e-05 [merge_comm]: 1.069e-05 [allreduce_fusion]: 9.00999e-06 [matmul_add_comm_reduction]: 3.518e-05 [allreduce_slice_to_reducescatter]: 8.79983e-07 [virtual_shard_identity]: 2.043e-05 [virtual_dataset]: 1.629e-05 [get_grad_eliminate_]: 1.573e-05 [virtual_output]: 1.589e-05 [merge_forward]: 9.95002e-06 [cell_reuse_recompute_pass]: 1.75001e-06 [offload_activation]: 2.006e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.298e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 2.992e-05 [set_forward_comm_id_for_comm_node_pass]: 1.023e-05 [meta_fg_expand]: 0.00183829 [flash_sp_send_recv_attached]: 4.12998e-06 [receive_attached]: 2.64001e-06 
[after_resolve]: 7.417e-05 [a_after_grad]: 9.371e-05 [renormalize]: 0.00760635 [add_forward_monad_depend]: 1.339e-05 [auto_monad_grad]: 6.61999e-06 [auto_monad_eliminator]: 5.836e-05 [cse]: 0.00020811 [a_3]: 0.00035293 [Cycle 2]: 0.00331609, [45] [expand_dump_flag]: 3.02002e-06 [switch_simplify]: 4.721e-05 [loop_unroll]: 4.323e-05 [a_1]: 0.0014811 [with_stream_mark]: 1.983e-05 [recompute_prepare]: 1.278e-05 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 4.38999e-06 [updatestate_loads_eliminate]: 3.38999e-06 [parameter_eliminate]: 2.84001e-06 [a_2]: 9.535e-05 [accelerated_algorithm]: 1.254e-05 [shard]: 2.62001e-06 [meta_shard_fg_expand]: 2.71e-06 [shard_inline]: 6.86001e-06 [merge_send_recv]: 1.057e-05 [auto_parallel]: 1.065e-05 [parallel]: 1.033e-05 [flash_sp]: 4.74e-06 [merge_comm]: 4.33001e-06 [allreduce_fusion]: 3.98001e-06 [matmul_add_comm_reduction]: 1.042e-05 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 9.34998e-06 [virtual_dataset]: 7.46999e-06 [get_grad_eliminate_]: 7.06001e-06 [virtual_output]: 6.43e-06 [merge_forward]: 4.82e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 1.121e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.461e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 1.356e-05 [set_forward_comm_id_for_comm_node_pass]: 4.42e-06 [meta_fg_expand]: 0.00011895 [flash_sp_send_recv_attached]: 2.15002e-06 [receive_attached]: 2.56e-06 [after_resolve]: 1.55e-05 [a_after_grad]: 1.073e-05 [renormalize]: 0.00087808 [add_forward_monad_depend]: 7.05998e-06 [auto_monad_grad]: 2.76999e-06 [auto_monad_eliminator]: 1.776e-05 [cse]: 3.659e-05 [a_3]: 5.467e-05 [Cycle 3]: 0.00074415, [45] [expand_dump_flag]: 1.79e-06 [switch_simplify]: 8.27e-06 [loop_unroll]: 7.38e-06 [a_1]: 0.00015703 [with_stream_mark]: 1.105e-05 [recompute_prepare]: 7.37997e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 3.12002e-06 [updatestate_loads_eliminate]: 3.06999e-06 
[parameter_eliminate]: 1.72999e-06 [a_2]: 8.7e-05 [accelerated_algorithm]: 1.094e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 2.09e-06 [shard_inline]: 7.64002e-06 [merge_send_recv]: 7.66001e-06 [auto_parallel]: 7.78001e-06 [parallel]: 8.23001e-06 [flash_sp]: 9.29984e-07 [merge_comm]: 3.97998e-06 [allreduce_fusion]: 3.86001e-06 [matmul_add_comm_reduction]: 8.16002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 7.71001e-06 [virtual_dataset]: 6.42001e-06 [get_grad_eliminate_]: 6.33e-06 [virtual_output]: 6.14001e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 2.30002e-06 [offload_activation]: 9.32999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.392e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 1.171e-05 [set_forward_comm_id_for_comm_node_pass]: 4.89e-06 [meta_fg_expand]: 2.70002e-06 [flash_sp_send_recv_attached]: 1.27e-06 [receive_attached]: 1.12e-06 [after_resolve]: 1.048e-05 [a_after_grad]: 9.91e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.94999e-06 [auto_monad_grad]: 1.34e-06 [auto_monad_eliminator]: 1.012e-05 [cse]: 1.947e-05 [a_3]: 4.054e-05 [py_interpret_to_execute_after_opt_a]: 1.603e-05 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 4.952e-05 [convert_after_rewriter]: 7.65998e-06 [order_py_execute_after_rewriter]: 5.73997e-06 [mutable_eliminate]: 0.00074567 [opt_b]: 0.00023818, [1] [Cycle 1]: 0.00022938, [7] [b_1]: 0.00013785 [b_2]: 8.50001e-06 [updatestate_depend_eliminate]: 9.72999e-06 [updatestate_assign_eliminate]: 3.68e-06 [updatestate_loads_eliminate]: 3.23e-06 [renormalize]: 1.25999e-06 [cse]: 2.784e-05 [optimize_parallel_all_gather_comm]: 2.062e-05 [overlap_param_gather]: 2.00002e-06 [cconv]: 3.128e-05 [loop_unroll]: 0.00047615 [opt_after_cconv]: 0.00011728, [1] [Cycle 1]: 0.00011052, [7] [c_1]: 3.42e-05 [parameter_eliminate]: 3.4e-06 [updatestate_depend_eliminate]: 6.54001e-06 [updatestate_assign_eliminate]: 2.96999e-06 
[updatestate_loads_eliminate]: 3.13998e-06 [cse]: 2.286e-05 [renormalize]: 8.29983e-07 [remove_dup_value]: 1.875e-05 [tuple_transform]: 8.936e-05, [1] [Cycle 1]: 8.399e-05, [4] [d_1]: 5.237e-05 [none_parameter_eliminate]: 1.86003e-06 [renormalize]: 4.09986e-07 [switch_simplify]: 8.45999e-06 [partial_unused_args_eliminate]: 2.13998e-06 [add_recomputation]: 6.342e-05 [cse_after_recomputation]: 2.701e-05, [1] [Cycle 1]: 2.221e-05, [1] [cse]: 1.623e-05 [environ_conv]: 1.101e-05 [swap_dp_allreduce_reducescatter]: 6.39001e-06 [bias_add_comm_swap]: 3.13e-06 [label_micro_interleaved_index]: 5.94e-06 [label_fine_grained_interleaved_index]: 3.13e-06 [merge_cast_opt]: 1.83997e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.43002e-06 [ForceFp32Comm]: 8.80013e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 3.5e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.57999e-06 [interleave_parallel_branches]: 1.21002e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.504e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.52e-06 [overlap_recompute_and_grad_model_parallel]: 5.26002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.62001e-06 [overlap_grad_ring_attention]: 5.30999e-06 [overlap_grad_flash_sp]: 2.47e-05 [begin_end_overlap_inline]: 6.30011e-07 [split_matmul_comm_elemetwise]: 2.25002e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.40001e-06 [symbol_engine_optimizer]: 9.691e-05, [1] [Cycle 1]: 9.154e-05, [6] [build]: 1.174e-05 [elim_shapecalc]: 1.309e-05 [elim_not_effective]: 1.603e-05 [opt_reshape]: 8.12e-06 [fold_const_symbol]: 1.221e-05 [renormalize]: 
3.00002e-07 [detach_backward]: 2.01998e-06 [pipeline_parallel_scheduler]: 1.82001e-06 [auto_monad_reorder]: 2.233e-05 [get_jit_bprop_graph]: 1.93002e-06 [rewriter_after_jit_bprop_graph]: 5.57001e-06 [opt_after_jit_grad]: 0.00055334 [validate]: 5.301e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00769151 [execute]: 8.85001e-06 Sums bootstrap : 0.000513s : 1.21% type_inference : 0.014420s : 34.17% event_method : 0.000055s : 0.13% auto_monad : 0.000155s : 0.37% graph_reusing : 0.000009s : 0.02% inline : 0.000004s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000039s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000057s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000045s : 0.11% optimize.rewriter_before_opt_a : 0.000172s : 0.41% optimize.opt_a.expand_dump_flag : 0.000010s : 0.02% optimize.opt_a.switch_simplify : 0.000135s : 0.32% optimize.opt_a.loop_unroll : 0.000114s : 0.27% optimize.opt_a.a_1 : 0.003199s : 7.58% optimize.opt_a.with_stream_mark : 0.000106s : 0.25% optimize.opt_a.recompute_prepare : 0.000049s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.03% optimize.opt_a.parameter_eliminate : 0.000131s : 0.31% optimize.opt_a.a_2 : 0.000442s : 1.05% optimize.opt_a.accelerated_algorithm : 0.000062s : 0.15% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000031s : 0.07% optimize.opt_a.merge_send_recv : 0.000040s : 0.10% optimize.opt_a.auto_parallel : 0.000031s : 0.07% optimize.opt_a.parallel : 0.000040s : 0.10% optimize.opt_a.flash_sp : 0.000019s : 0.05% 
optimize.opt_a.merge_comm : 0.000019s : 0.05% optimize.opt_a.allreduce_fusion : 0.000017s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000054s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.09% optimize.opt_a.virtual_dataset : 0.000030s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.07% optimize.opt_a.virtual_output : 0.000028s : 0.07% optimize.opt_a.merge_forward : 0.000019s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000041s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.05% optimize.opt_a.meta_fg_expand : 0.001960s : 4.64% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000100s : 0.24% optimize.opt_a.a_after_grad : 0.000114s : 0.27% optimize.opt_a.renormalize : 0.008485s : 20.10% optimize.opt_a.add_forward_monad_depend : 0.000022s : 0.05% optimize.opt_a.auto_monad_grad : 0.000011s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.20% optimize.opt_a.cse : 0.000264s : 0.63% optimize.opt_a.a_3 : 0.000448s : 1.06% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000050s : 0.12% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000746s : 1.77% optimize.opt_b.b_1 : 0.000138s : 0.33% optimize.opt_b.b_2 : 0.000009s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000028s : 0.07% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000031s : 0.07% optimize.loop_unroll : 0.000476s : 1.13% optimize.opt_after_cconv.c_1 : 0.000034s : 0.08% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000023s : 0.05% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.04% optimize.tuple_transform.d_1 : 0.000052s : 0.12% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000063s : 0.15% optimize.cse_after_recomputation.cse : 0.000016s : 0.04% optimize.environ_conv : 0.000011s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000006s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000025s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.05% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.01% opt_after_jit_grad : 0.000553s : 1.31% validate : 0.000053s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.007692s : 18.23% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 
0.000900 161 7.46% : 0.000067s : 8: substitution.arithmetic_simplify 0.27% : 0.000002s : 3: substitution.elim_not_effective 0.57% : 0.000005s : 5: substitution.float_depend_g_call 0.55% : 0.000005s : 2: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 3: substitution.fold_const_symbol 0.78% : 0.000007s : 4: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 59.35% : 0.000534s : 17: substitution.inline 2.44% : 0.000022s : 2: substitution.inline_without_move 1.31% : 0.000012s : 15: substitution.j_node_and_user_rematch 2.35% : 0.000021s : 3: substitution.less_batch_normalization 1.44% : 0.000013s : 7: substitution.minmaximum_grad 0.74% : 0.000007s : 5: substitution.partial_eliminate 1.43% : 0.000013s : 15: substitution.remove_not_recompute_node 3.77% : 0.000034s : 10: substitution.replace_applicator 1.37% : 0.000012s : 10: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.76% : 0.000025s : 7: substitution.tuple_list_convert_item_index_to_positive 1.26% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.71% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 7.46% : 0.000067s : 19: substitution.tuple_list_get_item_eliminator 1.82% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.014311 2 87.16% : 0.012474s : 1: type_inference.infer 12.84% : 0.001837s : 1: type_inference.specialize ------[replace.] 0.000263 27 55.83% : 0.000147s : 17: replace.inline 44.17% : 0.000116s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000556 27 94.09% : 0.000523s : 17: match.inline 5.91% : 0.000033s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000723 4248 1.19% : 0.000009s : 53: predicate.accumulaten_eliminater 0.31% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.48% : 0.000003s : 21: predicate.addn_check_dump 1.24% : 0.000009s : 53: predicate.addn_zero_filter 1.04% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 74: predicate.arithmetic_simplify 1.21% : 0.000009s : 53: predicate.cast_eliminate 1.08% : 0.000008s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.43% : 0.000003s : 21: predicate.depend_value_elim 1.12% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 57: predicate.environ_add_const_eliminate 1.17% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 57: predicate.environ_get_depend_swap 1.65% : 0.000012s : 78: predicate.environ_get_eliminate 1.14% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.81% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.65% : 0.000019s : 80: predicate.float_depend_g_call 0.49% : 0.000004s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.57% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.50% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.89% : 0.000043s : 183: predicate.inline 1.39% : 0.000010s : 45: predicate.inline_without_move 0.27% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 21: predicate.less_batch_normalization 1.62% : 
0.000012s : 71: predicate.list_to_tuple_eliminator_ 2.59% : 0.000019s : 124: predicate.load_eliminater 0.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.53% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.10% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.06% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 53: predicate.minmaximum_grad 0.50% : 0.000004s : 4: predicate.mutable_eliminate 0.12% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.21% : 0.000016s : 80: predicate.partial_defer_inline 1.64% : 0.000012s : 67: predicate.partial_eliminate 1.09% : 0.000008s : 53: predicate.print_const_string_wrapper 0.46% : 0.000003s : 21: predicate.reduce_all_const_elim 1.38% : 0.000010s : 53: predicate.reduce_eliminate 2.68% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 21: predicate.remove_not_recompute_node 1.96% : 0.000014s : 113: predicate.replace_applicator 0.74% : 0.000005s : 45: predicate.replace_old_param 0.21% : 0.000001s : 4: predicate.reset_defer_inline 1.18% : 0.000009s : 53: predicate.reshape_eliminate 1.07% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 4: predicate.row_tensor_eliminate 1.27% : 0.000009s : 50: predicate.same_eliminate 0.37% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 21: predicate.shard_identity_eliminate 0.23% : 0.000002s : 8: predicate.special_op_eliminate 0.67% : 0.000005s : 21: predicate.specialize_transform 1.31% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.90% : 0.000014s : 80: predicate.switch_defer_inline 2.95% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.25% : 0.000038s : 218: 
predicate.switch_simplify 1.08% : 0.000008s : 53: predicate.tile_eliminate 1.08% : 0.000008s : 53: predicate.transpose_eliminate 1.39% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 1.42% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.93% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.53% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.49% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.11% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002137 36 60.25% : 0.001288s : 15: func_graph_cloner_run.FuncGraphClonerGraph 39.75% : 0.000850s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084627 237 0.00% : 0.000004s : 1: ForceFp32Comm 4.45% : 0.003770s : 1: add_attr 4.44% : 0.003756s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000068s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000164s : 1: auto_monad 0.03% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000552s : 1: bootstrap 0.04% : 0.000035s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.04% : 0.000030s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000014s : 1: environ_conv 0.08% : 0.000065s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000014s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000008s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000005s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.57% : 0.000486s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.90% : 0.000758s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000020s : 1: opt.transform.mutable_eliminate 5.64% : 0.004776s : 117: opt.transform.opt_a 0.04% : 0.000032s : 1: opt.transform.opt_after_cconv 0.03% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000115s : 28: opt.transform.opt_b 0.07% : 0.000058s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000045s : 4: opt.transform.symbol_engine_opt 20.46% : 0.017312s : 1: opt_a 0.14% : 0.000121s : 1: opt_after_cconv 0.67% : 0.000567s : 1: opt_after_jit_grad 0.29% : 0.000242s : 1: opt_b 23.53% : 0.019911s : 1: optimize 0.03% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000063s : 1: pre_auto_parallel 0.06% : 0.000049s : 1: py_interpret_to_execute 0.02% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000022s : 1: remove_dup_value 7.82% : 0.006618s : 2: renormalize.infer 2.18% : 0.001846s : 2: renormalize.specialize 0.01% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000055s : 1: rewriter_after_opt_a 0.21% : 0.000177s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000100s : 1: symbol_engine_optimizer 9.11% : 0.007712s : 1: task_emit 0.11% : 0.000093s : 1: 
tuple_transform 17.08% : 0.014451s : 1: type_inference 0.11% : 0.000094s : 1: validate TotalTime = 0.0291515, [24] [bootstrap]: 0.00046098 [type_inference]: 0.00702307 [event_method]: 1.505e-05 [auto_monad]: 6.354e-05 [graph_reusing]: 6.12999e-06 [inline]: 2.76e-06 [add_attr]: 0.00356221, [1] [add_attr_with_inline]: 0.00354832, [1] [Cycle 1]: 6.105e-05, [2] [tag_attr]: 1.586e-05 [meta_addattr_fg_expand]: 4.51002e-06 [parallel-infer-symbol]: 3.41001e-06 [pre_auto_parallel]: 3.239e-05 [insert-virtual-dataset]: 3.01999e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 2.05002e-06 [optimize]: 0.00587029, [53] [py_interpret_to_execute]: 2.596e-05 [rewriter_before_opt_a]: 6.213e-05 [opt_a]: 0.00346858, [2] [Cycle 1]: 0.00278019, [45] [expand_dump_flag]: 2.94999e-06 [switch_simplify]: 2.933e-05 [loop_unroll]: 1.745e-05 [a_1]: 0.00039405 [with_stream_mark]: 1.986e-05 [recompute_prepare]: 9.30001e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.66001e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 8.414e-05 [accelerated_algorithm]: 6.88e-06 [shard]: 2.29999e-06 [meta_shard_fg_expand]: 2.12001e-06 [shard_inline]: 6.36e-06 [merge_send_recv]: 8.98002e-06 [auto_parallel]: 6.74999e-06 [parallel]: 0.00092087 [flash_sp]: 1.142e-05 [merge_comm]: 7.68999e-06 [allreduce_fusion]: 4.2e-06 [matmul_add_comm_reduction]: 1.107e-05 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 1.517e-05 [virtual_dataset]: 6.81001e-06 [get_grad_eliminate_]: 5.99e-06 [virtual_output]: 6.26e-06 [merge_forward]: 4.38001e-06 [cell_reuse_recompute_pass]: 1.62001e-06 [offload_activation]: 1.024e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.644e-05 [merge_recompute_call_nodes]: 1.76998e-06 [before_grad]: 1.128e-05 [set_forward_comm_id_for_comm_node_pass]: 3.96001e-06 [meta_fg_expand]: 3.12002e-06 [flash_sp_send_recv_attached]: 3.04999e-06 [receive_attached]: 2.09e-06 
[after_resolve]: 1.111e-05 [a_after_grad]: 1.101e-05 [renormalize]: 0.00071956 [add_forward_monad_depend]: 6.23e-06 [auto_monad_grad]: 3.2e-06 [auto_monad_eliminator]: 1.734e-05 [cse]: 3.387e-05 [a_3]: 4.81e-05 [Cycle 2]: 0.00067598, [45] [expand_dump_flag]: 2.19001e-06 [switch_simplify]: 7.43e-06 [loop_unroll]: 6.51e-06 [a_1]: 0.00012706 [with_stream_mark]: 1.367e-05 [recompute_prepare]: 6.11998e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 2.98998e-06 [parameter_eliminate]: 1.35999e-06 [a_2]: 7.405e-05 [accelerated_algorithm]: 6.54001e-06 [shard]: 1.22999e-06 [meta_shard_fg_expand]: 2.07001e-06 [shard_inline]: 6.04999e-06 [merge_send_recv]: 6.46999e-06 [auto_parallel]: 7.2e-06 [parallel]: 7.40003e-06 [flash_sp]: 3.85998e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 3.68999e-06 [matmul_add_comm_reduction]: 7.87e-06 [allreduce_slice_to_reducescatter]: 5.40022e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.78002e-06 [virtual_output]: 5.27001e-06 [merge_forward]: 3.5e-06 [cell_reuse_recompute_pass]: 1.57999e-06 [offload_activation]: 1.287e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.126e-05 [merge_recompute_call_nodes]: 1.78002e-06 [before_grad]: 1.057e-05 [set_forward_comm_id_for_comm_node_pass]: 4.43001e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.29998e-06 [after_resolve]: 9.37999e-06 [a_after_grad]: 8.33999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.79e-06 [auto_monad_grad]: 1.44e-06 [auto_monad_eliminator]: 8.13001e-06 [cse]: 1.631e-05 [a_3]: 3.319e-05 [py_interpret_to_execute_after_opt_a]: 1.328e-05 [slice_cell_reuse_recomputed_activation]: 2.36e-06 [rewriter_after_opt_a]: 3.773e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 5.91998e-06 [mutable_eliminate]: 0.00071044 [opt_b]: 0.00020913, [1] [Cycle 1]: 0.00020064, [7] [b_1]: 0.00011567 
[b_2]: 7.92e-06 [updatestate_depend_eliminate]: 8.02e-06 [updatestate_assign_eliminate]: 2.86e-06 [updatestate_loads_eliminate]: 2.61e-06 [renormalize]: 6.09987e-07 [cse]: 2.57e-05 [optimize_parallel_all_gather_comm]: 1.845e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 3.163e-05 [loop_unroll]: 0.00050835 [opt_after_cconv]: 0.00010979, [1] [Cycle 1]: 0.00010241, [7] [c_1]: 2.778e-05 [parameter_eliminate]: 3.98999e-06 [updatestate_depend_eliminate]: 6.99001e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 3.02002e-06 [cse]: 2.166e-05 [renormalize]: 9.20001e-07 [remove_dup_value]: 1.66e-05 [tuple_transform]: 0.00011556, [1] [Cycle 1]: 0.00010999, [4] [d_1]: 4.186e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 7.13e-06 [partial_unused_args_eliminate]: 2.31e-06 [add_recomputation]: 5.239e-05 [cse_after_recomputation]: 2.608e-05, [1] [Cycle 1]: 2.056e-05, [1] [cse]: 1.459e-05 [environ_conv]: 6.35997e-06 [swap_dp_allreduce_reducescatter]: 5.76e-06 [bias_add_comm_swap]: 3.2e-06 [label_micro_interleaved_index]: 5.91e-06 [label_fine_grained_interleaved_index]: 3.04001e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.23998e-06 [assign_add_opt]: 1.42999e-06 [ForceFp32Comm]: 1.35001e-06 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.51998e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.13001e-06 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.53002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.49001e-06 [control_data_broadcast_order]: 1.558e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 5.26002e-06 [overlap_recompute_and_grad_model_parallel]: 5.25001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.30999e-06 [overlap_recompute_comm]: 2.66999e-06 [overlap_grad_ring_attention]: 5.29998e-06 [overlap_grad_flash_sp]: 2.195e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.89999e-06 [split_layernorm_comm]: 2.05002e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 8.502e-05, [1] [Cycle 1]: 7.975e-05, [6] [build]: 4.27e-06 [elim_shapecalc]: 1.149e-05 [elim_not_effective]: 1.47e-05 [opt_reshape]: 7.75e-06 [fold_const_symbol]: 1.066e-05 [renormalize]: 3.30008e-07 [detach_backward]: 2.13998e-06 [pipeline_parallel_scheduler]: 1.82999e-06 [auto_monad_reorder]: 1.926e-05 [get_jit_bprop_graph]: 1.97001e-06 [rewriter_after_jit_bprop_graph]: 4.53999e-06 [opt_after_jit_grad]: 0.00059912 [validate]: 4.506e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0111619 [execute]: 9.32999e-06 Sums bootstrap : 0.000461s : 1.89% type_inference : 0.007023s : 28.78% event_method : 0.000015s : 0.06% auto_monad : 0.000064s : 0.26% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000032s : 0.13% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000026s : 0.11% optimize.rewriter_before_opt_a : 0.000062s : 0.25% optimize.opt_a.expand_dump_flag : 0.000005s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.15% optimize.opt_a.loop_unroll : 0.000024s : 0.10% optimize.opt_a.a_1 : 0.000521s : 2.14% optimize.opt_a.with_stream_mark : 0.000034s : 0.14% optimize.opt_a.recompute_prepare : 0.000015s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.03% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000158s : 0.65% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.06% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.05% optimize.opt_a.merge_send_recv : 0.000015s : 0.06% optimize.opt_a.auto_parallel : 0.000014s : 0.06% optimize.opt_a.parallel : 0.000928s : 3.80% optimize.opt_a.flash_sp : 0.000015s : 0.06% optimize.opt_a.merge_comm : 0.000011s : 0.05% optimize.opt_a.allreduce_fusion : 0.000008s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000022s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.05% optimize.opt_a.virtual_output : 0.000012s : 0.05% optimize.opt_a.merge_forward : 0.000008s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000023s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000028s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000022s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.03% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.08% optimize.opt_a.a_after_grad : 0.000019s : 0.08% optimize.opt_a.renormalize : 0.000720s : 2.95% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.03% optimize.opt_a.auto_monad_grad : 0.000005s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.10% optimize.opt_a.cse : 0.000050s : 0.21% optimize.opt_a.a_3 : 0.000081s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000038s : 0.15% optimize.convert_after_rewriter : 0.000007s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000710s : 2.91% optimize.opt_b.b_1 : 0.000116s : 0.47% optimize.opt_b.b_2 : 0.000008s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000026s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000032s : 0.13% optimize.loop_unroll : 0.000508s : 2.08% optimize.opt_after_cconv.c_1 : 0.000028s : 0.11% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000022s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.07% optimize.tuple_transform.d_1 : 0.000042s : 0.17% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.21% optimize.cse_after_recomputation.cse : 0.000015s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000006s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000016s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000022s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000019s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.02% opt_after_jit_grad : 0.000599s : 2.46% validate : 0.000045s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.011162s : 45.75% execute : 0.000009s : 0.04% Time group info: ------[substitution.] 0.000183 24 21.09% : 0.000039s : 4: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000002s : 2: substitution.fold_const_symbol 3.14% : 0.000006s : 3: substitution.graph_param_transform 66.03% : 0.000121s : 3: substitution.inline 2.47% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.80% : 0.000005s : 4: substitution.remove_not_recompute_node 2.47% : 0.000005s : 2: substitution.replace_old_param ------[type_inference.] 0.006964 2 92.36% : 0.006432s : 1: type_inference.infer 7.64% : 0.000532s : 1: type_inference.specialize ------[replace.] 0.000030 3 100.00% : 0.000030s : 3: replace.inline ------[match.] 0.000119 3 100.00% : 0.000119s : 3: match.inline ------[predicate.] 
0.000161 815 0.86% : 0.000001s : 8: predicate.accumulaten_eliminater 1.20% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 8: predicate.addn_zero_filter 0.70% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.41% : 0.000004s : 14: predicate.arithmetic_simplify 0.78% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.80% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.99% : 0.000002s : 8: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.60% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.03% : 0.000002s : 11: predicate.environ_add_const_eliminate 0.97% : 0.000002s : 11: predicate.environ_get_add_eliminate 0.96% : 0.000002s : 11: predicate.environ_get_depend_swap 1.68% : 0.000003s : 17: predicate.environ_get_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.11% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000004s : 11: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.40% : 0.000001s : 3: predicate.fold_const_symbol 0.81% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.09% : 0.000010s : 37: predicate.inline 1.14% : 0.000002s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 6: predicate.less_batch_normalization 1.58% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 22: predicate.load_eliminater 1.29% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.93% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.83% : 0.000001s : 6: predicate.merge_addn 0.75% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 8: predicate.minmaximum_grad 1.81% : 0.000003s : 3: predicate.mutable_eliminate 0.41% : 0.000001s : 3: predicate.opt_reshape 0.48% : 0.000001s : 3: predicate.parallel_virtual_node 1.43% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 11: predicate.partial_eliminate 0.77% : 0.000001s : 8: predicate.print_const_string_wrapper 0.94% : 0.000002s : 6: predicate.reduce_all_const_elim 1.19% : 0.000002s : 8: predicate.reduce_eliminate 2.09% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 6: predicate.remove_not_recompute_node 1.14% : 0.000002s : 14: predicate.replace_applicator 0.60% : 0.000001s : 6: predicate.replace_old_param 0.42% : 0.000001s : 3: predicate.reset_defer_inline 0.86% : 0.000001s : 8: predicate.reshape_eliminate 0.68% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 1.12% : 0.000002s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.06% : 0.000002s : 6: predicate.shard_identity_eliminate 0.88% : 0.000001s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.15% : 0.000002s : 11: predicate.switch_defer_inline 1.76% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.57% : 0.000007s : 38: predicate.switch_simplify 0.81% : 
0.000001s : 8: predicate.tile_eliminate 0.87% : 0.000001s : 8: predicate.transpose_eliminate 1.47% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.56% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.95% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.64% : 0.000001s : 3: predicate.value_based_eliminate 0.81% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.92% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000378 7 34.95% : 0.000132s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.05% : 0.000246s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.040370 196 0.01% : 0.000004s : 1: ForceFp32Comm 8.84% : 0.003568s : 1: add_attr 8.80% : 0.003553s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.14% : 0.000057s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.17% : 0.000069s : 1: auto_monad 0.06% : 0.000024s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.22% : 0.000492s : 1: bootstrap 0.09% : 0.000036s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000019s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000010s : 1: environ_conv 0.05% : 0.000022s : 1: event_method 0.04% : 0.000016s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000009s : 1: label_micro_interleaved_index 1.29% : 0.000520s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.79% : 0.000723s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000018s : 1: opt.transform.mutable_eliminate 2.27% : 0.000916s : 78: opt.transform.opt_a 0.06% : 0.000026s : 1: opt.transform.opt_after_cconv 0.06% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.23% : 0.000093s : 28: opt.transform.opt_b 0.11% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000040s : 4: opt.transform.symbol_engine_opt 8.60% : 0.003472s : 1: opt_a 0.28% : 0.000113s : 1: opt_after_cconv 1.52% : 0.000612s : 1: opt_after_jit_grad 0.53% : 0.000213s : 1: opt_b 14.55% : 0.005876s : 1: optimize 0.05% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000037s : 1: pre_auto_parallel 0.07% : 0.000030s : 1: py_interpret_to_execute 0.04% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000020s : 1: remove_dup_value 0.97% : 0.000390s : 1: renormalize.infer 0.80% : 0.000322s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000042s : 1: rewriter_after_opt_a 0.16% : 0.000066s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000088s : 1: symbol_engine_optimizer 27.70% : 0.011183s : 1: task_emit 0.29% : 0.000119s : 1: 
tuple_transform 17.46% : 0.007050s : 1: type_inference 0.21% : 0.000086s : 1: validate TotalTime = 0.138114, [24] [bootstrap]: 0.00050463 [type_inference]: 0.0172277 [event_method]: 5.585e-05 [auto_monad]: 0.00014023 [graph_reusing]: 9.40001e-06 [inline]: 2.58998e-06 [add_attr]: 0.00380668, [1] [add_attr_with_inline]: 0.00379425, [1] [Cycle 1]: 8.527e-05, [2] [tag_attr]: 3.918e-05 [meta_addattr_fg_expand]: 1.017e-05 [parallel-infer-symbol]: 3.51999e-06 [pre_auto_parallel]: 5.539e-05 [insert-virtual-dataset]: 2.93e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.77999e-06 [optimize]: 0.108753, [53] [py_interpret_to_execute]: 4.387e-05 [rewriter_before_opt_a]: 0.00016126 [opt_a]: 0.105032, [3] [Cycle 1]: 0.101299, [45] [expand_dump_flag]: 5.71e-06 [switch_simplify]: 7.481e-05 [loop_unroll]: 6.034e-05 [a_1]: 0.0248084 [with_stream_mark]: 3.928e-05 [recompute_prepare]: 3.061e-05 [updatestate_depend_eliminate]: 1.089e-05 [updatestate_assign_eliminate]: 7.71999e-06 [updatestate_loads_eliminate]: 7.18e-06 [parameter_eliminate]: 3.93999e-06 [a_2]: 0.00025604 [accelerated_algorithm]: 3.755e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 6.36998e-06 [shard_inline]: 1.695e-05 [merge_send_recv]: 1.873e-05 [auto_parallel]: 1.578e-05 [parallel]: 2.335e-05 [flash_sp]: 1.565e-05 [merge_comm]: 1.171e-05 [allreduce_fusion]: 8.97e-06 [matmul_add_comm_reduction]: 3.672e-05 [allreduce_slice_to_reducescatter]: 9.89996e-07 [virtual_shard_identity]: 2.169e-05 [virtual_dataset]: 1.731e-05 [get_grad_eliminate_]: 1.547e-05 [virtual_output]: 1.53e-05 [merge_forward]: 9.32999e-06 [cell_reuse_recompute_pass]: 1.77999e-06 [offload_activation]: 2.111e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.358e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 3.069e-05 [set_forward_comm_id_for_comm_node_pass]: 9.81e-06 [meta_fg_expand]: 0.00203362 [flash_sp_send_recv_attached]: 5.09e-06 [receive_attached]: 2.48e-06 [after_resolve]: 8.05e-05 
[a_after_grad]: 9.931e-05 [renormalize]: 0.0722961 [add_forward_monad_depend]: 1.416e-05 [auto_monad_grad]: 6.84999e-06 [auto_monad_eliminator]: 5.787e-05 [cse]: 0.00021666 [a_3]: 0.00035883 [Cycle 2]: 0.00302519, [45] [expand_dump_flag]: 3.19001e-06 [switch_simplify]: 4.72e-05 [loop_unroll]: 4.248e-05 [a_1]: 0.00142475 [with_stream_mark]: 1.735e-05 [recompute_prepare]: 9.10001e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 4.14002e-06 [updatestate_loads_eliminate]: 3.91999e-06 [parameter_eliminate]: 1.86998e-06 [a_2]: 9.03e-05 [accelerated_algorithm]: 1.253e-05 [shard]: 2.46e-06 [meta_shard_fg_expand]: 2.71999e-06 [shard_inline]: 7.33999e-06 [merge_send_recv]: 1.082e-05 [auto_parallel]: 1.149e-05 [parallel]: 1.013e-05 [flash_sp]: 4.23999e-06 [merge_comm]: 4.45999e-06 [allreduce_fusion]: 3.86999e-06 [matmul_add_comm_reduction]: 1.031e-05 [allreduce_slice_to_reducescatter]: 1.00999e-06 [virtual_shard_identity]: 9.07999e-06 [virtual_dataset]: 7.08e-06 [get_grad_eliminate_]: 6.48e-06 [virtual_output]: 6.38998e-06 [merge_forward]: 4.50999e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 1.106e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.412e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 1.244e-05 [set_forward_comm_id_for_comm_node_pass]: 4.25e-06 [meta_fg_expand]: 8.156e-05 [flash_sp_send_recv_attached]: 1.99999e-06 [receive_attached]: 2.32999e-06 [after_resolve]: 1.206e-05 [a_after_grad]: 1.051e-05 [renormalize]: 0.00076396 [add_forward_monad_depend]: 4.38999e-06 [auto_monad_grad]: 1.32999e-06 [auto_monad_eliminator]: 1.2e-05 [cse]: 2.401e-05 [a_3]: 4.878e-05 [Cycle 3]: 0.0006896, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 7.92998e-06 [loop_unroll]: 6.86001e-06 [a_1]: 0.00014988 [with_stream_mark]: 8.90001e-06 [recompute_prepare]: 6.82002e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 2.75997e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 
9.49978e-07 [a_2]: 8.597e-05 [accelerated_algorithm]: 9.68997e-06 [shard]: 1.14003e-06 [meta_shard_fg_expand]: 1.35999e-06 [shard_inline]: 6.94999e-06 [merge_send_recv]: 5.27999e-06 [auto_parallel]: 6.41e-06 [parallel]: 5.30001e-06 [flash_sp]: 9.29984e-07 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 6.04001e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 7.61999e-06 [virtual_dataset]: 6.24001e-06 [get_grad_eliminate_]: 6.14001e-06 [virtual_output]: 6.22001e-06 [merge_forward]: 3.11999e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 7.1e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.246e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.059e-05 [set_forward_comm_id_for_comm_node_pass]: 3.75e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 9.57001e-06 [a_after_grad]: 1.079e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.39e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 8.13999e-06 [cse]: 1.724e-05 [a_3]: 3.992e-05 [py_interpret_to_execute_after_opt_a]: 1.361e-05 [slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 4.604e-05 [convert_after_rewriter]: 7.58001e-06 [order_py_execute_after_rewriter]: 5.81e-06 [mutable_eliminate]: 0.00193143 [opt_b]: 0.00023265, [1] [Cycle 1]: 0.00022393, [7] [b_1]: 0.00013895 [b_2]: 8.82e-06 [updatestate_depend_eliminate]: 7.27002e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.89001e-06 [renormalize]: 6.49976e-07 [cse]: 2.377e-05 [optimize_parallel_all_gather_comm]: 1.935e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.919e-05 [loop_unroll]: 0.00048464 [opt_after_cconv]: 0.0001166, [1] [Cycle 1]: 0.00011035, [7] [c_1]: 3.239e-05 [parameter_eliminate]: 3.23e-06 [updatestate_depend_eliminate]: 5.93998e-06 [updatestate_assign_eliminate]: 2.84001e-06 [updatestate_loads_eliminate]: 
2.64999e-06 [cse]: 2.145e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.669e-05 [tuple_transform]: 8.199e-05, [1] [Cycle 1]: 7.714e-05, [4] [d_1]: 4.775e-05 [none_parameter_eliminate]: 1.79998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 8.12e-06 [partial_unused_args_eliminate]: 1.96998e-06 [add_recomputation]: 5.553e-05 [cse_after_recomputation]: 2.548e-05, [1] [Cycle 1]: 2.015e-05, [1] [cse]: 1.448e-05 [environ_conv]: 9.71e-06 [swap_dp_allreduce_reducescatter]: 6.66e-06 [bias_add_comm_swap]: 2.91e-06 [label_micro_interleaved_index]: 4.63001e-06 [label_fine_grained_interleaved_index]: 2.65002e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.20999e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.23002e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.39998e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 2.24999e-06 [control_data_broadcast_order]: 1.458e-05 [grouped_pairwise_exchange_alltoall]: 1.42999e-06 [offloading_packed_experts]: 4.21001e-06 [overlap_recompute_and_grad_model_parallel]: 5.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.89e-06 [overlap_grad_flash_sp]: 2.424e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.54999e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 1.57001e-06 [symbol_engine_optimizer]: 9.301e-05, [1] [Cycle 1]: 8.777e-05, [6] [build]: 1.036e-05 [elim_shapecalc]: 1.216e-05 [elim_not_effective]: 1.498e-05 [opt_reshape]: 7.55e-06 [fold_const_symbol]: 1.217e-05 [renormalize]: 2.19996e-07 [detach_backward]: 
2.56998e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 2.154e-05 [get_jit_bprop_graph]: 1.79e-06 [rewriter_after_jit_bprop_graph]: 4.05e-06 [opt_after_jit_grad]: 0.00048127 [validate]: 4.826e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00673995 [execute]: 8.70001e-06 Sums bootstrap : 0.000505s : 0.38% type_inference : 0.017228s : 12.97% event_method : 0.000056s : 0.04% auto_monad : 0.000140s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000039s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000055s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000044s : 0.03% optimize.rewriter_before_opt_a : 0.000161s : 0.12% optimize.opt_a.expand_dump_flag : 0.000010s : 0.01% optimize.opt_a.switch_simplify : 0.000130s : 0.10% optimize.opt_a.loop_unroll : 0.000110s : 0.08% optimize.opt_a.a_1 : 0.026383s : 19.87% optimize.opt_a.with_stream_mark : 0.000066s : 0.05% optimize.opt_a.recompute_prepare : 0.000047s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.01% optimize.opt_a.parameter_eliminate : 0.000007s : 0.01% optimize.opt_a.a_2 : 0.000432s : 0.33% optimize.opt_a.accelerated_algorithm : 0.000060s : 0.05% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.01% optimize.opt_a.shard_inline : 0.000031s : 0.02% optimize.opt_a.merge_send_recv : 0.000035s : 0.03% optimize.opt_a.auto_parallel : 0.000034s : 0.03% optimize.opt_a.parallel : 0.000039s : 0.03% optimize.opt_a.flash_sp : 0.000021s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000016s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000053s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000031s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.02% optimize.opt_a.virtual_output : 0.000028s : 0.02% optimize.opt_a.merge_forward : 0.000017s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000039s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000054s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.01% optimize.opt_a.meta_fg_expand : 0.002117s : 1.59% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000102s : 0.08% optimize.opt_a.a_after_grad : 0.000121s : 0.09% optimize.opt_a.renormalize : 0.073060s : 55.02% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000078s : 0.06% optimize.opt_a.cse : 0.000258s : 0.19% optimize.opt_a.a_3 : 0.000448s : 0.34% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.001931s : 1.45% optimize.opt_b.b_1 : 0.000139s : 0.10% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000024s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000029s : 0.02% optimize.loop_unroll : 0.000485s : 0.36% optimize.opt_after_cconv.c_1 : 0.000032s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.01% optimize.tuple_transform.d_1 : 0.000048s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.04% optimize.cse_after_recomputation.cse : 0.000014s : 0.01% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000481s : 0.36% validate : 0.000048s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.006740s : 5.08% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.001763 159 3.61% : 0.000064s : 7: substitution.arithmetic_simplify 0.13% : 0.000002s : 3: substitution.elim_not_effective 0.34% : 0.000006s : 5: substitution.float_depend_g_call 0.24% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.11% : 0.000002s : 3: substitution.fold_const_symbol 0.37% : 0.000007s : 4: substitution.graph_param_transform 0.25% : 0.000004s : 2: substitution.incorporate_call 0.13% : 0.000002s : 2: substitution.incorporate_call_switch 79.37% : 0.001399s : 17: substitution.inline 1.39% : 0.000025s : 2: substitution.inline_without_move 0.67% : 0.000012s : 15: substitution.j_node_and_user_rematch 1.14% : 0.000020s : 3: substitution.less_batch_normalization 0.69% : 0.000012s : 7: substitution.minmaximum_grad 0.43% : 0.000008s : 5: substitution.partial_eliminate 0.73% : 0.000013s : 15: substitution.remove_not_recompute_node 1.97% : 0.000035s : 10: substitution.replace_applicator 0.68% : 0.000012s : 10: substitution.replace_old_param 0.25% : 0.000004s : 1: substitution.set_cell_output_no_recompute 1.42% : 0.000025s : 7: substitution.tuple_list_convert_item_index_to_positive 0.66% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 0.87% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 3.62% : 0.000064s : 18: substitution.tuple_list_get_item_eliminator 0.92% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.017124 2 88.93% : 0.015228s : 1: type_inference.infer 11.07% : 0.001896s : 1: type_inference.specialize ------[replace.] 0.022561 26 99.67% : 0.022487s : 17: replace.inline 0.33% : 0.000074s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.001417 26 97.91% : 0.001387s : 17: match.inline 2.09% : 0.000030s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000718 4180 1.20% : 0.000009s : 52: predicate.accumulaten_eliminater 0.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.11% : 0.000008s : 52: predicate.addn_zero_filter 1.03% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 73: predicate.arithmetic_simplify 1.20% : 0.000009s : 52: predicate.cast_eliminate 1.08% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000003s : 21: predicate.depend_value_elim 1.10% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.29% : 0.000009s : 52: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 4: predicate.elim_not_effective 0.15% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.16% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_depend_swap 1.61% : 0.000012s : 77: predicate.environ_get_eliminate 1.16% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.77% : 0.000020s : 78: predicate.float_depend_g_call 0.48% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000004s : 21: predicate.get_grad_eliminate 0.09% : 0.000001s : 4: predicate.graph_param_transform 0.54% : 0.000004s : 21: predicate.incorporate_call 0.44% : 0.000003s : 21: predicate.incorporate_call_switch 5.92% : 0.000042s : 180: predicate.inline 1.44% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.54% : 0.000018s : 121: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.44% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 60: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.04% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.06% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 52: predicate.minmaximum_grad 0.32% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.84% : 0.000020s : 78: predicate.partial_defer_inline 1.61% : 0.000012s : 65: predicate.partial_eliminate 1.13% : 0.000008s : 52: predicate.print_const_string_wrapper 0.49% : 0.000004s : 21: predicate.reduce_all_const_elim 1.47% : 0.000011s : 52: predicate.reduce_eliminate 2.50% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.29% : 0.000002s : 21: predicate.remove_not_recompute_node 1.82% : 0.000013s : 111: predicate.replace_applicator 0.70% : 0.000005s : 45: predicate.replace_old_param 0.12% : 0.000001s : 4: predicate.reset_defer_inline 1.11% : 0.000008s : 52: predicate.reshape_eliminate 1.07% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.30% : 0.000009s : 50: predicate.same_eliminate 0.32% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 21: predicate.shard_identity_eliminate 0.23% : 0.000002s : 8: predicate.special_op_eliminate 0.58% : 0.000004s : 21: predicate.specialize_transform 1.31% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.25% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.84% : 0.000013s : 78: predicate.switch_defer_inline 2.84% : 0.000020s : 128: predicate.switch_layer_defer_inline 4.99% : 0.000036s : 213: 
predicate.switch_simplify 1.13% : 0.000008s : 52: predicate.tile_eliminate 1.08% : 0.000008s : 52: predicate.transpose_eliminate 1.45% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000012s : 60: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000010s : 60: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000020s : 90: predicate.tuple_list_get_item_eliminator 1.45% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 81: predicate.tuple_list_set_item_eliminator 1.57% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.48% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.05% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 4: predicate.value_based_eliminate 0.54% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.49% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002259 35 54.94% : 0.001241s : 14: func_graph_cloner_run.FuncGraphClonerGraph 45.06% : 0.001018s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.351866 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.08% : 0.003812s : 1: add_attr 1.08% : 0.003799s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000148s : 1: auto_monad 0.01% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.15% : 0.000539s : 1: bootstrap 0.01% : 0.000033s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.02% : 0.000064s : 1: event_method 0.00% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000014s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.14% : 0.000494s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.001944s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 7.94% : 0.027942s : 117: opt.transform.opt_a 0.01% : 0.000031s : 1: opt.transform.opt_after_cconv 0.01% : 0.000027s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000117s : 28: opt.transform.opt_b 0.02% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000043s : 4: opt.transform.symbol_engine_opt 29.85% : 0.105035s : 1: opt_a 0.03% : 0.000120s : 1: opt_after_cconv 0.14% : 0.000492s : 1: opt_after_jit_grad 0.07% : 0.000236s : 1: opt_b 30.91% : 0.108759s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000060s : 1: pre_auto_parallel 0.01% : 0.000048s : 1: py_interpret_to_execute 0.00% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000021s : 1: remove_dup_value 20.23% : 0.071168s : 2: renormalize.infer 0.53% : 0.001871s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000050s : 1: rewriter_after_opt_a 0.05% : 0.000166s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.03% : 0.000096s : 1: symbol_engine_optimizer 1.92% : 0.006753s : 1: task_emit 0.02% : 0.000085s : 1: 
tuple_transform 4.90% : 0.017254s : 1: type_inference 0.02% : 0.000083s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x3-kbk],max_mem:6.0M .. TotalTime = 19.5812, [24] [bootstrap]: 0.00058915 [type_inference]: 0.00811987 [event_method]: 1.527e-05 [auto_monad]: 6.095e-05 [graph_reusing]: 5.97999e-06 [inline]: 2.68e-06 [add_attr]: 0.00427589, [1] [add_attr_with_inline]: 0.00425961, [1] [Cycle 1]: 6.145e-05, [2] [tag_attr]: 1.983e-05 [meta_addattr_fg_expand]: 4.88001e-06 [parallel-infer-symbol]: 3.7e-06 [pre_auto_parallel]: 3.061e-05 [insert-virtual-dataset]: 2.68998e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.85001e-06 [optimize]: 0.00491575, [53] [py_interpret_to_execute]: 2.497e-05 [rewriter_before_opt_a]: 7.033e-05 [opt_a]: 0.00259859, [2] [Cycle 1]: 0.00188897, [45] [expand_dump_flag]: 3.14001e-06 [switch_simplify]: 3.426e-05 [loop_unroll]: 2.045e-05 [a_1]: 0.00047182 [with_stream_mark]: 1.69e-05 [recompute_prepare]: 8.75001e-06 [updatestate_depend_eliminate]: 4.2e-06 [updatestate_assign_eliminate]: 3.83999e-06 [updatestate_loads_eliminate]: 3.31999e-06 [parameter_eliminate]: 1.93002e-06 [a_2]: 8.706e-05 [accelerated_algorithm]: 7.91001e-06 [shard]: 2.32001e-06 [meta_shard_fg_expand]: 1.91998e-06 [shard_inline]: 6.79001e-06 [merge_send_recv]: 8.67e-06 [auto_parallel]: 6.99001e-06 [parallel]: 2.765e-05 [flash_sp]: 8.70001e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 3.98001e-06 [matmul_add_comm_reduction]: 9.70002e-06 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 8.07e-06 [virtual_dataset]: 5.95002e-06 [get_grad_eliminate_]: 5.91e-06 [virtual_output]: 6.25002e-06 [merge_forward]: 4.40999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.062e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.224e-05 
[merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 1.023e-05 [set_forward_comm_id_for_comm_node_pass]: 3.70998e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 2.43e-06 [after_resolve]: 9.44e-06 [a_after_grad]: 8.79998e-06 [renormalize]: 0.000694 [add_forward_monad_depend]: 9.79e-06 [auto_monad_grad]: 2.43e-06 [auto_monad_eliminator]: 1.483e-05 [cse]: 3.103e-05 [a_3]: 4.532e-05 [Cycle 2]: 0.00069758, [45] [expand_dump_flag]: 1.62999e-06 [switch_simplify]: 7.19001e-06 [loop_unroll]: 5.96998e-06 [a_1]: 0.0001504 [with_stream_mark]: 1.169e-05 [recompute_prepare]: 7.30998e-06 [updatestate_depend_eliminate]: 3.41001e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.73998e-06 [parameter_eliminate]: 1.09998e-06 [a_2]: 7.356e-05 [accelerated_algorithm]: 6.06e-06 [shard]: 1.50999e-06 [meta_shard_fg_expand]: 1.61998e-06 [shard_inline]: 5.78997e-06 [merge_send_recv]: 5.43002e-06 [auto_parallel]: 6.67002e-06 [parallel]: 6.43e-06 [flash_sp]: 3.75998e-06 [merge_comm]: 3.79002e-06 [allreduce_fusion]: 3.63999e-06 [matmul_add_comm_reduction]: 7.95e-06 [allreduce_slice_to_reducescatter]: 9.09989e-07 [virtual_shard_identity]: 8.59002e-06 [virtual_dataset]: 5.57001e-06 [get_grad_eliminate_]: 5.94e-06 [virtual_output]: 5.15001e-06 [merge_forward]: 4.02e-06 [cell_reuse_recompute_pass]: 2.60002e-06 [offload_activation]: 8.00999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.224e-05 [merge_recompute_call_nodes]: 8.90024e-07 [before_grad]: 9.67999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.23001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 1.19e-06 [receive_attached]: 1.82999e-06 [after_resolve]: 9.26998e-06 [a_after_grad]: 8.18999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 2.26e-06 [auto_monad_grad]: 2.01998e-06 [auto_monad_eliminator]: 9.59e-06 [cse]: 2.214e-05 [a_3]: 3.413e-05 [py_interpret_to_execute_after_opt_a]: 1.329e-05 
[slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.918e-05 [convert_after_rewriter]: 7.61001e-06 [order_py_execute_after_rewriter]: 5.56002e-06 [mutable_eliminate]: 0.00067531 [opt_b]: 0.00020283, [1] [Cycle 1]: 0.00019436, [7] [b_1]: 0.00011197 [b_2]: 8.24002e-06 [updatestate_depend_eliminate]: 7.11999e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 2.54999e-06 [renormalize]: 8.2e-07 [cse]: 2.233e-05 [optimize_parallel_all_gather_comm]: 1.858e-05 [overlap_param_gather]: 2.01998e-06 [cconv]: 3.039e-05 [loop_unroll]: 0.00049716 [opt_after_cconv]: 0.00010679, [1] [Cycle 1]: 9.999e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 3.66001e-06 [updatestate_depend_eliminate]: 7.26999e-06 [updatestate_assign_eliminate]: 2.77002e-06 [updatestate_loads_eliminate]: 3.08e-06 [cse]: 2.005e-05 [renormalize]: 6.59988e-07 [remove_dup_value]: 1.606e-05 [tuple_transform]: 7.358e-05, [1] [Cycle 1]: 6.856e-05, [4] [d_1]: 4.148e-05 [none_parameter_eliminate]: 1.71002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 2.34999e-06 [add_recomputation]: 5.497e-05 [cse_after_recomputation]: 2.354e-05, [1] [Cycle 1]: 1.825e-05, [1] [cse]: 1.221e-05 [environ_conv]: 9.10999e-06 [swap_dp_allreduce_reducescatter]: 5.24e-06 [bias_add_comm_swap]: 3.66001e-06 [label_micro_interleaved_index]: 5.27999e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.36998e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 1.00001e-06 [remove_cast_before_assign_add]: 1.01997e-06 [full_micro_interleaved_order_control]: 2.35997e-06 [reorder_send_recv_between_fp_bp]: 3.26001e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 
[overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.466e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.07999e-06 [overlap_recompute_and_grad_model_parallel]: 5.72999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27e-06 [overlap_recompute_comm]: 2.41e-06 [overlap_grad_ring_attention]: 5.18002e-06 [overlap_grad_flash_sp]: 2.019e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 8.609e-05, [1] [Cycle 1]: 8.035e-05, [6] [build]: 3.41001e-06 [elim_shapecalc]: 1.322e-05 [elim_not_effective]: 1.464e-05 [opt_reshape]: 6.32001e-06 [fold_const_symbol]: 9.57999e-06 [renormalize]: 1.99972e-07 [detach_backward]: 2.39001e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.942e-05 [get_jit_bprop_graph]: 1.77999e-06 [rewriter_after_jit_bprop_graph]: 4.17003e-06 [opt_after_jit_grad]: 0.00053516 [validate]: 4.301e-05 [backend_pass]: 8.09989e-07 [task_emit]: 19.5623 [execute]: 8.40001e-06 Sums bootstrap : 0.000589s : 0.00% type_inference : 0.008120s : 0.04% event_method : 0.000015s : 0.00% auto_monad : 0.000061s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000031s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000025s : 0.00% optimize.rewriter_before_opt_a : 0.000070s : 0.00% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% 
optimize.opt_a.a_1 : 0.000622s : 0.00% optimize.opt_a.with_stream_mark : 0.000029s : 0.00% optimize.opt_a.recompute_prepare : 0.000016s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000161s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000014s : 0.00% optimize.opt_a.parallel : 0.000034s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000694s : 0.00% 
optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.00% optimize.opt_a.cse : 0.000053s : 0.00% optimize.opt_a.a_3 : 0.000079s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000039s : 0.00% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000675s : 0.00% optimize.opt_b.b_1 : 0.000112s : 0.00% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000022s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.00% optimize.loop_unroll : 0.000497s : 0.00% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.00% optimize.cse_after_recomputation.cse 
: 0.000012s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000019s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000535s : 0.00% validate : 0.000043s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 19.562332s : 99.93% execute : 0.000008s : 0.00% Time group info: ------[substitution.] 0.000223 26 29.74% : 0.000066s : 5: substitution.arithmetic_simplify 0.85% : 0.000002s : 2: substitution.elim_not_effective 0.67% : 0.000001s : 2: substitution.fold_const_symbol 2.93% : 0.000007s : 3: substitution.graph_param_transform 56.00% : 0.000125s : 3: substitution.inline 1.78% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.17% : 0.000005s : 4: substitution.remove_not_recompute_node 1.78% : 0.000004s : 2: substitution.replace_old_param 4.10% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.008050 2 90.56% : 0.007290s : 1: type_inference.infer 9.44% : 0.000760s : 1: type_inference.specialize ------[replace.] 0.000040 4 79.54% : 0.000032s : 3: replace.inline 20.46% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000132 4 93.55% : 0.000123s : 3: match.inline 6.45% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000169 883 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 9: predicate.addn_zero_filter 0.77% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000004s : 15: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.60% : 0.000001s : 6: predicate.check_bprop_eliminate 0.55% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.85% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 3: predicate.elim_not_effective 0.55% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.01% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 12: predicate.environ_get_depend_swap 1.78% : 0.000003s : 18: predicate.environ_get_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 13: predicate.float_depend_g_call 0.53% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.66% : 0.000001s : 6: predicate.get_grad_eliminate 0.34% : 0.000001s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.61% : 0.000011s : 40: predicate.inline 1.00% : 0.000002s : 6: predicate.inline_without_move 0.47% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.23% : 0.000002s : 6: predicate.less_batch_normalization 1.63% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000004s : 25: predicate.load_eliminater 1.48% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.05% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.90% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 3: predicate.mutable_eliminate 0.32% : 0.000001s : 3: predicate.opt_reshape 0.41% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.36% : 0.000002s : 13: predicate.partial_eliminate 0.87% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 6: predicate.reduce_all_const_elim 1.11% : 0.000002s : 9: predicate.reduce_eliminate 2.27% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 6: predicate.remove_not_recompute_node 1.34% : 0.000002s : 16: predicate.replace_applicator 0.53% : 0.000001s : 6: predicate.replace_old_param 0.38% : 0.000001s : 3: predicate.reset_defer_inline 0.96% : 0.000002s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.53% : 0.000001s : 3: predicate.row_tensor_eliminate 0.90% : 0.000002s : 6: predicate.same_eliminate 0.60% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000002s : 6: predicate.shard_identity_eliminate 0.73% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 1.14% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.27% : 0.000002s : 13: predicate.switch_defer_inline 1.87% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.72% : 0.000008s : 43: predicate.switch_simplify 0.89% : 
0.000002s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.42% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.25% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.55% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.20% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.05% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 3: predicate.value_based_eliminate 0.58% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.61% : 0.000001s : 6: predicate.virtual_output_eliminate 0.27% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000530 8 45.32% : 0.000240s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.68% : 0.000290s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
19.592274 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.02% : 0.004283s : 1: add_attr 0.02% : 0.004264s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000067s : 1: auto_monad 0.00% : 0.000023s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.00% : 0.000630s : 1: bootstrap 0.00% : 0.000035s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000019s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000027s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.00% : 0.000022s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.00% : 0.000509s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.00% : 0.000690s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.01% : 0.001014s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000040s : 4: opt.transform.symbol_engine_opt 0.01% : 0.002602s : 1: opt_a 0.00% : 0.000111s : 1: opt_after_cconv 0.00% : 0.000548s : 1: opt_after_jit_grad 0.00% : 0.000206s : 1: opt_b 0.03% : 0.004921s : 1: optimize 0.00% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000035s : 1: pre_auto_parallel 0.00% : 0.000029s : 1: py_interpret_to_execute 0.00% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000020s : 1: remove_dup_value 0.00% : 0.000357s : 1: renormalize.infer 0.00% : 0.000328s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000045s : 1: rewriter_after_opt_a 0.00% : 0.000075s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000089s : 1: symbol_engine_optimizer 99.85% : 19.562355s : 1: task_emit 0.00% : 0.000076s : 1: 
tuple_transform 0.04% : 0.008141s : 1: type_inference 0.00% : 0.000074s : 1: validate . TotalTime = 0.57285, [24] [bootstrap]: 0.0004527 [type_inference]: 0.018993 [event_method]: 1.736e-05 [auto_monad]: 7.131e-05 [graph_reusing]: 6.41e-06 [inline]: 3.51999e-06 [add_attr]: 0.00381056, [1] [add_attr_with_inline]: 0.00379786, [1] [Cycle 1]: 6.428e-05, [2] [tag_attr]: 1.663e-05 [meta_addattr_fg_expand]: 4.22e-06 [parallel-infer-symbol]: 3.66001e-06 [pre_auto_parallel]: 3.466e-05 [insert-virtual-dataset]: 2.89999e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00483457, [53] [py_interpret_to_execute]: 2.295e-05 [rewriter_before_opt_a]: 5.957e-05 [opt_a]: 0.00248353, [2] [Cycle 1]: 0.00179058, [45] [expand_dump_flag]: 3.51001e-06 [switch_simplify]: 3.056e-05 [loop_unroll]: 1.791e-05 [a_1]: 0.00037171 [with_stream_mark]: 2.164e-05 [recompute_prepare]: 9.79e-06 [updatestate_depend_eliminate]: 4.42e-06 [updatestate_assign_eliminate]: 3.65e-06 [updatestate_loads_eliminate]: 3.36001e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 8.623e-05 [accelerated_algorithm]: 7.92e-06 [shard]: 2.17999e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 6.98e-06 [merge_send_recv]: 9.46e-06 [auto_parallel]: 7.05e-06 [parallel]: 2.062e-05 [flash_sp]: 1.044e-05 [merge_comm]: 4.2e-06 [allreduce_fusion]: 4.12e-06 [matmul_add_comm_reduction]: 1.11e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 8.82e-06 [virtual_dataset]: 6.62002e-06 [get_grad_eliminate_]: 6.26e-06 [virtual_output]: 5.97999e-06 [merge_forward]: 4.22998e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 1.131e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.308e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 1.047e-05 [set_forward_comm_id_for_comm_node_pass]: 3.99002e-06 [meta_fg_expand]: 2.83e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 2.76e-06 [after_resolve]: 9.81e-06 
[a_after_grad]: 8.75999e-06 [renormalize]: 0.00067603 [add_forward_monad_depend]: 6.51e-06 [auto_monad_grad]: 2.94999e-06 [auto_monad_eliminator]: 1.623e-05 [cse]: 3.275e-05 [a_3]: 5.018e-05 [Cycle 2]: 0.0006807, [45] [expand_dump_flag]: 1.69e-06 [switch_simplify]: 7.28999e-06 [loop_unroll]: 5.84e-06 [a_1]: 0.00012339 [with_stream_mark]: 1.374e-05 [recompute_prepare]: 6.68003e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 3.26999e-06 [parameter_eliminate]: 1.14998e-06 [a_2]: 7.891e-05 [accelerated_algorithm]: 7.01999e-06 [shard]: 1.81e-06 [meta_shard_fg_expand]: 2.19999e-06 [shard_inline]: 6.20002e-06 [merge_send_recv]: 7.34002e-06 [auto_parallel]: 7.97e-06 [parallel]: 6.96001e-06 [flash_sp]: 3.99002e-06 [merge_comm]: 3.49001e-06 [allreduce_fusion]: 3.49001e-06 [matmul_add_comm_reduction]: 1.106e-05 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.91001e-06 [virtual_dataset]: 5.46998e-06 [get_grad_eliminate_]: 5.41002e-06 [virtual_output]: 5.20001e-06 [merge_forward]: 3.30998e-06 [cell_reuse_recompute_pass]: 2.19001e-06 [offload_activation]: 9.56998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.187e-05 [merge_recompute_call_nodes]: 1.06002e-06 [before_grad]: 1.013e-05 [set_forward_comm_id_for_comm_node_pass]: 4.14002e-06 [meta_fg_expand]: 2.32001e-06 [flash_sp_send_recv_attached]: 1.09e-06 [receive_attached]: 1.48002e-06 [after_resolve]: 9.71e-06 [a_after_grad]: 7.92e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.76e-06 [auto_monad_grad]: 1.49e-06 [auto_monad_eliminator]: 8.60001e-06 [cse]: 1.84e-05 [a_3]: 3.411e-05 [py_interpret_to_execute_after_opt_a]: 1.57e-05 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 4.024e-05 [convert_after_rewriter]: 7.24001e-06 [order_py_execute_after_rewriter]: 5.81e-06 [mutable_eliminate]: 0.00067302 [opt_b]: 0.00025016, [1] [Cycle 1]: 0.00024118, [7] [b_1]: 0.00012412 [b_2]: 9.67999e-06 
[updatestate_depend_eliminate]: 8.79e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.80997e-06 [renormalize]: 4.69998e-07 [cse]: 2.596e-05 [optimize_parallel_all_gather_comm]: 1.935e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 3.269e-05 [loop_unroll]: 0.00050009 [opt_after_cconv]: 0.00010786, [1] [Cycle 1]: 0.00010114, [7] [c_1]: 2.743e-05 [parameter_eliminate]: 5.30999e-06 [updatestate_depend_eliminate]: 6.12001e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 2.106e-05 [renormalize]: 8.99978e-07 [remove_dup_value]: 1.709e-05 [tuple_transform]: 7.549e-05, [1] [Cycle 1]: 7.023e-05, [4] [d_1]: 4.094e-05 [none_parameter_eliminate]: 1.63002e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 7.02002e-06 [partial_unused_args_eliminate]: 1.91003e-06 [add_recomputation]: 5.293e-05 [cse_after_recomputation]: 2.35e-05, [1] [Cycle 1]: 1.819e-05, [1] [cse]: 1.228e-05 [environ_conv]: 7.08e-06 [swap_dp_allreduce_reducescatter]: 5.17e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 6.07999e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.74e-06 [slice_recompute_activation]: 2.49999e-06 [micro_interleaved_order_control]: 3.06001e-06 [assign_add_opt]: 1.76e-06 [ForceFp32Comm]: 9.10019e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 3.13998e-06 [comm_op_add_attrs]: 1.42e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.91998e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.45999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.357e-05 [grouped_pairwise_exchange_alltoall]: 2.27999e-06 [offloading_packed_experts]: 4.55001e-06 [overlap_recompute_and_grad_model_parallel]: 5.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.90001e-06 [overlap_recompute_comm]: 2.58e-06 [overlap_grad_ring_attention]: 4.45e-06 [overlap_grad_flash_sp]: 2.251e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 2.21998e-06 [handle_group_info]: 1.16002e-06 [symbol_engine_optimizer]: 8.328e-05, [1] [Cycle 1]: 7.831e-05, [6] [build]: 3.73999e-06 [elim_shapecalc]: 1.08e-05 [elim_not_effective]: 1.411e-05 [opt_reshape]: 7.46999e-06 [fold_const_symbol]: 1.059e-05 [renormalize]: 2.29978e-07 [detach_backward]: 2.83998e-06 [pipeline_parallel_scheduler]: 1.66002e-06 [auto_monad_reorder]: 1.775e-05 [get_jit_bprop_graph]: 1.70001e-06 [rewriter_after_jit_bprop_graph]: 4.32e-06 [opt_after_jit_grad]: 0.00056586 [validate]: 4.774e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.543682 [execute]: 1.03e-05 Sums bootstrap : 0.000453s : 0.08% type_inference : 0.018993s : 3.34% event_method : 0.000017s : 0.00% auto_monad : 0.000071s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000004s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000035s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.00% optimize.rewriter_before_opt_a : 0.000060s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.01% optimize.opt_a.loop_unroll : 0.000024s : 0.00% optimize.opt_a.a_1 : 0.000495s : 0.09% optimize.opt_a.with_stream_mark : 0.000035s : 0.01% optimize.opt_a.recompute_prepare : 0.000016s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000165s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000017s : 0.00% optimize.opt_a.auto_parallel : 0.000015s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000014s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000022s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000021s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000676s : 0.12% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.00% optimize.opt_a.cse : 0.000051s : 0.01% optimize.opt_a.a_3 : 0.000084s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000673s : 0.12% optimize.opt_b.b_1 : 0.000124s : 0.02% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000026s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000033s : 0.01% optimize.loop_unroll : 0.000500s : 0.09% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000007s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000566s : 0.10% validate : 0.000048s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.543682s : 95.74% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000163 24 23.84% : 0.000039s : 4: substitution.arithmetic_simplify 1.33% : 0.000002s : 2: substitution.elim_not_effective 1.11% : 0.000002s : 2: substitution.fold_const_symbol 4.07% : 0.000007s : 3: substitution.graph_param_transform 61.71% : 0.000101s : 3: substitution.inline 2.68% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.98% : 0.000005s : 4: substitution.remove_not_recompute_node 2.28% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.018910 2 32.08% : 0.006067s : 1: type_inference.infer 67.92% : 0.012843s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000099 3 100.00% : 0.000099s : 3: match.inline ------[predicate.] 
0.000161 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 1.08% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 8: predicate.addn_zero_filter 0.76% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.50% : 0.000004s : 14: predicate.arithmetic_simplify 0.97% : 0.000002s : 8: predicate.cast_eliminate 0.83% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.76% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.34% : 0.000001s : 3: predicate.elim_not_effective 0.48% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 11: predicate.environ_get_depend_swap 1.70% : 0.000003s : 17: predicate.environ_get_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.08% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.40% : 0.000004s : 11: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.98% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.83% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.67% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.21% : 0.000010s : 37: predicate.inline 0.94% : 0.000002s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 6: predicate.less_batch_normalization 1.68% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 22: predicate.load_eliminater 1.07% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.94% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.57% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 8: predicate.minmaximum_grad 1.87% : 0.000003s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.43% : 0.000001s : 3: predicate.parallel_virtual_node 1.60% : 0.000003s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 11: predicate.partial_eliminate 0.75% : 0.000001s : 8: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.26% : 0.000002s : 8: predicate.reduce_eliminate 2.08% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.84% : 0.000001s : 6: predicate.remove_not_recompute_node 1.14% : 0.000002s : 14: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.82% : 0.000001s : 8: predicate.reshape_eliminate 0.60% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.55% : 0.000001s : 3: predicate.row_tensor_eliminate 0.82% : 0.000001s : 6: predicate.same_eliminate 0.59% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 1.04% : 0.000002s : 6: predicate.special_op_eliminate 1.51% : 0.000002s : 6: predicate.specialize_transform 1.07% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.70% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.12% : 0.000002s : 11: predicate.switch_defer_inline 1.73% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.69% : 0.000008s : 38: predicate.switch_simplify 0.80% : 
0.000001s : 8: predicate.tile_eliminate 0.88% : 0.000001s : 8: predicate.transpose_eliminate 1.64% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.42% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator 1.68% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.43% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.87% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.60% : 0.000001s : 3: predicate.value_based_eliminate 0.86% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000415 7 36.08% : 0.000150s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.92% : 0.000265s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.583223 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.65% : 0.003817s : 1: add_attr 0.65% : 0.003802s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000058s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000077s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.08% : 0.000492s : 1: bootstrap 0.01% : 0.000037s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000017s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000027s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000011s : 1: environ_conv 0.00% : 0.000026s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.09% : 0.000512s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.12% : 0.000688s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000019s : 1: opt.transform.mutable_eliminate 0.15% : 0.000887s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000102s : 28: opt.transform.opt_b 0.01% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000039s : 4: opt.transform.symbol_engine_opt 0.43% : 0.002488s : 1: opt_a 0.02% : 0.000111s : 1: opt_after_cconv 0.10% : 0.000579s : 1: opt_after_jit_grad 0.04% : 0.000254s : 1: opt_b 0.83% : 0.004841s : 1: optimize 0.00% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000039s : 1: pre_auto_parallel 0.00% : 0.000027s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000021s : 1: remove_dup_value 0.06% : 0.000340s : 1: renormalize.infer 0.06% : 0.000327s : 1: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000045s : 1: rewriter_after_opt_a 0.01% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000086s : 1: symbol_engine_optimizer 93.22% : 0.543708s : 1: task_emit 0.01% : 0.000079s : 1: 
tuple_transform 3.26% : 0.019031s : 1: type_inference 0.01% : 0.000083s : 1: validate TotalTime = 0.545433, [24] [bootstrap]: 0.00045845 [type_inference]: 0.0147902 [event_method]: 1.665e-05 [auto_monad]: 6.525e-05 [graph_reusing]: 5.70001e-06 [inline]: 2.74001e-06 [add_attr]: 0.00365633, [1] [add_attr_with_inline]: 0.00364588, [1] [Cycle 1]: 6.484e-05, [2] [tag_attr]: 1.743e-05 [meta_addattr_fg_expand]: 4.65001e-06 [parallel-infer-symbol]: 3.56999e-06 [pre_auto_parallel]: 3.37e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.37999e-06 [pipeline_split]: 1.91998e-06 [optimize]: 0.00480389, [53] [py_interpret_to_execute]: 2.527e-05 [rewriter_before_opt_a]: 6.895e-05 [opt_a]: 0.00259566, [2] [Cycle 1]: 0.00194185, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 3.178e-05 [loop_unroll]: 2.073e-05 [a_1]: 0.00043484 [with_stream_mark]: 1.524e-05 [recompute_prepare]: 8.62e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.5e-06 [updatestate_loads_eliminate]: 3.39001e-06 [parameter_eliminate]: 2.32999e-06 [a_2]: 8.08e-05 [accelerated_algorithm]: 7.28999e-06 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 5.97001e-06 [merge_send_recv]: 8.92999e-06 [auto_parallel]: 7.92e-06 [parallel]: 2.012e-05 [flash_sp]: 8.87999e-06 [merge_comm]: 4.2e-06 [allreduce_fusion]: 3.8e-06 [matmul_add_comm_reduction]: 9.51e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 8.05e-06 [virtual_dataset]: 6.68e-06 [get_grad_eliminate_]: 6.21998e-06 [virtual_output]: 6.10002e-06 [merge_forward]: 4.19002e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.172e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.257e-05 [merge_recompute_call_nodes]: 2.12001e-06 [before_grad]: 1.057e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88001e-06 [meta_fg_expand]: 2.74999e-06 [flash_sp_send_recv_attached]: 2.78e-06 [receive_attached]: 2.43e-06 
[after_resolve]: 1.017e-05 [a_after_grad]: 8.77e-06 [renormalize]: 0.00079309 [add_forward_monad_depend]: 5.69999e-06 [auto_monad_grad]: 2.88e-06 [auto_monad_eliminator]: 1.503e-05 [cse]: 3.203e-05 [a_3]: 4.556e-05 [Cycle 2]: 0.00064186, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 7.76001e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00012057 [with_stream_mark]: 1.347e-05 [recompute_prepare]: 6.16998e-06 [updatestate_depend_eliminate]: 3.59002e-06 [updatestate_assign_eliminate]: 2.76999e-06 [updatestate_loads_eliminate]: 3.31999e-06 [parameter_eliminate]: 1.19e-06 [a_2]: 7.287e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.76998e-06 [merge_send_recv]: 5.39e-06 [auto_parallel]: 6.61999e-06 [parallel]: 6.26e-06 [flash_sp]: 3.81999e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 6.91001e-06 [allreduce_slice_to_reducescatter]: 6.70028e-07 [virtual_shard_identity]: 6.81001e-06 [virtual_dataset]: 5.17999e-06 [get_grad_eliminate_]: 5.41998e-06 [virtual_output]: 5.29e-06 [merge_forward]: 3.43e-06 [cell_reuse_recompute_pass]: 1.49998e-06 [offload_activation]: 8.08999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.12e-06 [before_grad]: 9.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.01001e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 1.17999e-06 [receive_attached]: 1.38002e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.23999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.59998e-06 [auto_monad_grad]: 1.33002e-06 [auto_monad_eliminator]: 7.77e-06 [cse]: 1.47e-05 [a_3]: 3.315e-05 [py_interpret_to_execute_after_opt_a]: 1.251e-05 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.842e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 4.98001e-06 [mutable_eliminate]: 0.00065489 [opt_b]: 0.00019958, [1] [Cycle 1]: 0.00019167, [7] [b_1]: 
0.00011232 [b_2]: 7.78001e-06 [updatestate_depend_eliminate]: 6.76e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 8.09989e-07 [cse]: 2.135e-05 [optimize_parallel_all_gather_comm]: 1.899e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.559e-05 [loop_unroll]: 0.0004673 [opt_after_cconv]: 0.00010493, [1] [Cycle 1]: 9.812e-05, [7] [c_1]: 2.708e-05 [parameter_eliminate]: 3.33e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.56998e-06 [cse]: 2.078e-05 [renormalize]: 6.89994e-07 [remove_dup_value]: 1.618e-05 [tuple_transform]: 7.25e-05, [1] [Cycle 1]: 6.755e-05, [4] [d_1]: 3.862e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.06999e-06 [partial_unused_args_eliminate]: 2.30002e-06 [add_recomputation]: 4.869e-05 [cse_after_recomputation]: 2.224e-05, [1] [Cycle 1]: 1.696e-05, [1] [cse]: 1.16e-05 [environ_conv]: 5.87001e-06 [swap_dp_allreduce_reducescatter]: 5.40001e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 5.31002e-06 [label_fine_grained_interleaved_index]: 2.86999e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.92002e-06 [reorder_send_recv_between_fp_bp]: 3.09001e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.325e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.94002e-06 [overlap_recompute_and_grad_model_parallel]: 5.22999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.72001e-06 [overlap_recompute_comm]: 2.39999e-06 [overlap_grad_ring_attention]: 4.38001e-06 [overlap_grad_flash_sp]: 2.162e-05 [begin_end_overlap_inline]: 5.70028e-07 [split_matmul_comm_elemetwise]: 2.61e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 7.634e-05, [1] [Cycle 1]: 7.123e-05, [6] [build]: 3.8e-06 [elim_shapecalc]: 9.03002e-06 [elim_not_effective]: 1.253e-05 [opt_reshape]: 6.54999e-06 [fold_const_symbol]: 1.014e-05 [renormalize]: 2.60014e-07 [detach_backward]: 2.59001e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 1.708e-05 [get_jit_bprop_graph]: 2.16e-06 [rewriter_after_jit_bprop_graph]: 4.80999e-06 [opt_after_jit_grad]: 0.0005507 [validate]: 5.882e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.52065 [execute]: 1.068e-05 Sums bootstrap : 0.000458s : 0.08% type_inference : 0.014790s : 2.74% event_method : 0.000017s : 0.00% auto_monad : 0.000065s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000034s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000025s : 0.00% optimize.rewriter_before_opt_a : 0.000069s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000555s : 0.10% optimize.opt_a.with_stream_mark : 0.000029s : 0.01% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate 
: 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000015s : 0.00% optimize.opt_a.parallel : 0.000026s : 0.00% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000793s : 0.15% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000047s : 0.01% optimize.opt_a.a_3 : 0.000079s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000038s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000655s : 0.12% optimize.opt_b.b_1 : 0.000112s : 0.02% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000467s : 0.09% optimize.opt_after_cconv.c_1 : 0.000027s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000551s : 0.10% validate : 0.000059s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.520650s : 96.30% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000168 26 22.27% : 0.000037s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000002s : 2: substitution.fold_const_symbol 3.54% : 0.000006s : 3: substitution.graph_param_transform 59.22% : 0.000099s : 3: substitution.inline 2.20% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.10% : 0.000005s : 4: substitution.remove_not_recompute_node 2.29% : 0.000004s : 2: substitution.replace_old_param 5.32% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.014732 2 95.43% : 0.014058s : 1: type_inference.infer 4.57% : 0.000674s : 1: type_inference.specialize ------[replace.] 0.000035 4 76.58% : 0.000026s : 3: replace.inline 23.42% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000105 4 92.31% : 0.000097s : 3: match.inline 7.69% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 883 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 0.80% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.91% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000004s : 15: predicate.arithmetic_simplify 0.94% : 0.000002s : 9: predicate.cast_eliminate 0.59% : 0.000001s : 6: predicate.check_bprop_eliminate 0.55% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.59% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.99% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.34% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_depend_swap 1.71% : 0.000003s : 18: predicate.environ_get_eliminate 1.18% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.18% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.84% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.75% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 5.86% : 0.000010s : 40: predicate.inline 0.92% : 0.000002s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.42% : 0.000002s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000004s : 25: predicate.load_eliminater 1.12% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.04% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.85% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.61% : 0.000003s : 13: predicate.partial_defer_inline 1.37% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.15% : 0.000002s : 9: predicate.reduce_eliminate 2.41% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.44% : 0.000002s : 16: predicate.replace_applicator 0.71% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.79% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 6: predicate.shard_identity_eliminate 0.87% : 0.000001s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 13: predicate.switch_defer_inline 2.06% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.63% : 0.000008s : 43: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.85% : 0.000001s : 9: predicate.transpose_eliminate 1.82% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.71% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.53% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.84% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.21% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.00% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000406 8 41.90% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.10% : 0.000236s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.555761 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.66% : 0.003663s : 1: add_attr 0.66% : 0.003650s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000071s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.09% : 0.000499s : 1: bootstrap 0.01% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000024s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.09% : 0.000477s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.12% : 0.000665s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.17% : 0.000933s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000090s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.47% : 0.002599s : 1: opt_a 0.02% : 0.000108s : 1: opt_after_cconv 0.10% : 0.000583s : 1: opt_after_jit_grad 0.04% : 0.000203s : 1: opt_b 0.87% : 0.004810s : 1: optimize 0.00% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000040s : 1: pre_auto_parallel 0.01% : 0.000030s : 1: py_interpret_to_execute 0.00% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000020s : 1: remove_dup_value 0.09% : 0.000474s : 1: renormalize.infer 0.06% : 0.000311s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000043s : 1: rewriter_after_opt_a 0.01% : 0.000074s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000079s : 1: symbol_engine_optimizer 93.69% : 0.520677s : 1: task_emit 0.01% : 0.000076s : 1: 
tuple_transform 2.67% : 0.014816s : 1: type_inference 0.02% : 0.000096s : 1: validate TotalTime = 0.628538, [24] [bootstrap]: 0.00052028 [type_inference]: 0.0132788 [event_method]: 5.541e-05 [auto_monad]: 0.0001423 [graph_reusing]: 8.45001e-06 [inline]: 2.26e-06 [add_attr]: 0.00405381, [1] [add_attr_with_inline]: 0.00404094, [1] [Cycle 1]: 0.00010936, [2] [tag_attr]: 4.385e-05 [meta_addattr_fg_expand]: 1.048e-05 [parallel-infer-symbol]: 4.76002e-06 [pre_auto_parallel]: 6.541e-05 [insert-virtual-dataset]: 2.62001e-06 [parallel-infer-symbol-second]: 9.10019e-07 [dataset_repeat_opt]: 2.42001e-06 [pipeline_split]: 1.94e-06 [optimize]: 0.0218819, [53] [py_interpret_to_execute]: 4.73e-05 [rewriter_before_opt_a]: 0.00017988 [opt_a]: 0.0192624, [3] [Cycle 1]: 0.0151256, [45] [expand_dump_flag]: 5.91e-06 [switch_simplify]: 8.054e-05 [loop_unroll]: 6.784e-05 [a_1]: 0.00168532 [with_stream_mark]: 3.651e-05 [recompute_prepare]: 2.91e-05 [updatestate_depend_eliminate]: 1.053e-05 [updatestate_assign_eliminate]: 7.82998e-06 [updatestate_loads_eliminate]: 7.51001e-06 [parameter_eliminate]: 3.73999e-06 [a_2]: 0.00025163 [accelerated_algorithm]: 3.587e-05 [shard]: 2.43e-06 [meta_shard_fg_expand]: 4.84e-06 [shard_inline]: 1.654e-05 [merge_send_recv]: 1.897e-05 [auto_parallel]: 1.522e-05 [parallel]: 2.259e-05 [flash_sp]: 1.374e-05 [merge_comm]: 9.41e-06 [allreduce_fusion]: 8.62e-06 [matmul_add_comm_reduction]: 3.531e-05 [allreduce_slice_to_reducescatter]: 9.29984e-07 [virtual_shard_identity]: 2.023e-05 [virtual_dataset]: 1.583e-05 [get_grad_eliminate_]: 1.481e-05 [virtual_output]: 1.49e-05 [merge_forward]: 9.63002e-06 [cell_reuse_recompute_pass]: 1.77999e-06 [offload_activation]: 1.806e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.161e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 2.876e-05 [set_forward_comm_id_for_comm_node_pass]: 9.87001e-06 [meta_fg_expand]: 0.00233381 [flash_sp_send_recv_attached]: 4.61002e-06 [receive_attached]: 2.44999e-06 [after_resolve]: 
7.648e-05 [a_after_grad]: 9.425e-05 [renormalize]: 0.00905309 [add_forward_monad_depend]: 1.509e-05 [auto_monad_grad]: 7.50998e-06 [auto_monad_eliminator]: 5.987e-05 [cse]: 0.00020814 [a_3]: 0.00034888 [Cycle 2]: 0.00338675, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 4.674e-05 [loop_unroll]: 4.259e-05 [a_1]: 0.00145066 [with_stream_mark]: 2.212e-05 [recompute_prepare]: 1.169e-05 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 2.29001e-06 [a_2]: 9.568e-05 [accelerated_algorithm]: 1.276e-05 [shard]: 2.03002e-06 [meta_shard_fg_expand]: 3.05002e-06 [shard_inline]: 7.28e-06 [merge_send_recv]: 1.082e-05 [auto_parallel]: 1.119e-05 [parallel]: 9.86998e-06 [flash_sp]: 4.74e-06 [merge_comm]: 4.2e-06 [allreduce_fusion]: 4e-06 [matmul_add_comm_reduction]: 1.112e-05 [allreduce_slice_to_reducescatter]: 6.99976e-07 [virtual_shard_identity]: 9.20999e-06 [virtual_dataset]: 6.91999e-06 [get_grad_eliminate_]: 6.76999e-06 [virtual_output]: 6.24999e-06 [merge_forward]: 4.75001e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.132e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.527e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 1.17e-05 [set_forward_comm_id_for_comm_node_pass]: 4.65999e-06 [meta_fg_expand]: 0.00013292 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.69999e-06 [after_resolve]: 1.537e-05 [a_after_grad]: 1.058e-05 [renormalize]: 0.000987 [add_forward_monad_depend]: 6.54001e-06 [auto_monad_grad]: 2.54999e-06 [auto_monad_eliminator]: 1.661e-05 [cse]: 3.739e-05 [a_3]: 5.507e-05 [Cycle 3]: 0.00072761, [45] [expand_dump_flag]: 1.87999e-06 [switch_simplify]: 9.09e-06 [loop_unroll]: 7.3e-06 [a_1]: 0.00015905 [with_stream_mark]: 1.094e-05 [recompute_prepare]: 7.03e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 
1.02998e-06 [a_2]: 8.646e-05 [accelerated_algorithm]: 1.046e-05 [shard]: 1.69e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 7.02002e-06 [merge_send_recv]: 6.99001e-06 [auto_parallel]: 7.60998e-06 [parallel]: 6.63e-06 [flash_sp]: 1.29e-06 [merge_comm]: 4.13001e-06 [allreduce_fusion]: 4.28999e-06 [matmul_add_comm_reduction]: 7.16001e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.97003e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 6.36e-06 [virtual_output]: 6.14001e-06 [merge_forward]: 4.62e-06 [cell_reuse_recompute_pass]: 1.84e-06 [offload_activation]: 8.72e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.327e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 1.181e-05 [set_forward_comm_id_for_comm_node_pass]: 4.13001e-06 [meta_fg_expand]: 2.81999e-06 [flash_sp_send_recv_attached]: 1.27e-06 [receive_attached]: 1.88997e-06 [after_resolve]: 9.44e-06 [a_after_grad]: 9.69e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 8.17e-06 [cse]: 1.889e-05 [a_3]: 3.872e-05 [py_interpret_to_execute_after_opt_a]: 1.342e-05 [slice_cell_reuse_recomputed_activation]: 2.46e-06 [rewriter_after_opt_a]: 3.799e-05 [convert_after_rewriter]: 9.76e-06 [order_py_execute_after_rewriter]: 5.97001e-06 [mutable_eliminate]: 0.00073336 [opt_b]: 0.00026079, [1] [Cycle 1]: 0.00023617, [7] [b_1]: 0.00014051 [b_2]: 8.93002e-06 [updatestate_depend_eliminate]: 7.62002e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 3.3e-06 [renormalize]: 5.3001e-07 [cse]: 3.207e-05 [optimize_parallel_all_gather_comm]: 2.198e-05 [overlap_param_gather]: 2.00002e-06 [cconv]: 3.314e-05 [loop_unroll]: 0.00049548 [opt_after_cconv]: 0.00012411, [1] [Cycle 1]: 0.000117, [7] [c_1]: 3.528e-05 [parameter_eliminate]: 4.43999e-06 [updatestate_depend_eliminate]: 6.62002e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 2.94001e-06 [cse]: 
2.467e-05 [renormalize]: 5.60016e-07 [remove_dup_value]: 1.694e-05 [tuple_transform]: 8.488e-05, [1] [Cycle 1]: 8.013e-05, [4] [d_1]: 5.076e-05 [none_parameter_eliminate]: 1.69998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 8.40001e-06 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 6.268e-05 [cse_after_recomputation]: 2.68e-05, [1] [Cycle 1]: 2.188e-05, [1] [cse]: 1.588e-05 [environ_conv]: 1.151e-05 [swap_dp_allreduce_reducescatter]: 6.26e-06 [bias_add_comm_swap]: 2.89001e-06 [label_micro_interleaved_index]: 5.40999e-06 [label_fine_grained_interleaved_index]: 2.85998e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.72999e-06 [ForceFp32Comm]: 1.14e-06 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 3.03e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.34998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.485e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 5.16002e-06 [overlap_recompute_and_grad_model_parallel]: 5.72999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.62001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.41e-06 [overlap_grad_ring_attention]: 4.93001e-06 [overlap_grad_flash_sp]: 2.599e-05 [begin_end_overlap_inline]: 8.09989e-07 [split_matmul_comm_elemetwise]: 2.69999e-06 [split_layernorm_comm]: 2.09999e-06 [handle_group_info]: 1.40999e-06 [symbol_engine_optimizer]: 9.32e-05, [1] [Cycle 1]: 8.854e-05, [6] [build]: 1.156e-05 [elim_shapecalc]: 1.091e-05 [elim_not_effective]: 1.54e-05 [opt_reshape]: 7.40998e-06 [fold_const_symbol]: 1.255e-05 [renormalize]: 2.20025e-07 [detach_backward]: 2.33998e-06 
[pipeline_parallel_scheduler]: 1.89e-06 [auto_monad_reorder]: 2.228e-05 [get_jit_bprop_graph]: 2.01e-06 [rewriter_after_jit_bprop_graph]: 5.59e-06 [opt_after_jit_grad]: 0.00050433 [validate]: 5.258e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.587662 [execute]: 1.073e-05 Sums bootstrap : 0.000520s : 0.08% type_inference : 0.013279s : 2.13% event_method : 0.000055s : 0.01% auto_monad : 0.000142s : 0.02% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000044s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000005s : 0.00% pre_auto_parallel : 0.000065s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000047s : 0.01% optimize.rewriter_before_opt_a : 0.000180s : 0.03% optimize.opt_a.expand_dump_flag : 0.000011s : 0.00% optimize.opt_a.switch_simplify : 0.000136s : 0.02% optimize.opt_a.loop_unroll : 0.000118s : 0.02% optimize.opt_a.a_1 : 0.003295s : 0.53% optimize.opt_a.with_stream_mark : 0.000070s : 0.01% optimize.opt_a.recompute_prepare : 0.000048s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000434s : 0.07% optimize.opt_a.accelerated_algorithm : 0.000059s : 0.01% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.00% optimize.opt_a.shard_inline : 0.000031s : 0.00% optimize.opt_a.merge_send_recv : 0.000037s : 0.01% optimize.opt_a.auto_parallel : 0.000034s : 0.01% optimize.opt_a.parallel : 0.000039s : 0.01% optimize.opt_a.flash_sp : 0.000020s : 0.00% optimize.opt_a.merge_comm : 0.000018s : 0.00% optimize.opt_a.allreduce_fusion : 
0.000017s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000054s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.01% optimize.opt_a.virtual_dataset : 0.000029s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.00% optimize.opt_a.virtual_output : 0.000027s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000052s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.00% optimize.opt_a.meta_fg_expand : 0.002470s : 0.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000101s : 0.02% optimize.opt_a.a_after_grad : 0.000115s : 0.02% optimize.opt_a.renormalize : 0.010040s : 1.61% optimize.opt_a.add_forward_monad_depend : 0.000023s : 0.00% optimize.opt_a.auto_monad_grad : 0.000011s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.01% optimize.opt_a.cse : 0.000264s : 0.04% optimize.opt_a.a_3 : 0.000443s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000038s : 0.01% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000733s : 0.12% optimize.opt_b.b_1 : 0.000141s : 0.02% optimize.opt_b.b_2 : 0.000009s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% 
optimize.opt_b.cse : 0.000032s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000033s : 0.01% optimize.loop_unroll : 0.000495s : 0.08% optimize.opt_after_cconv.c_1 : 0.000035s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000025s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.00% optimize.tuple_transform.d_1 : 0.000051s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000063s : 0.01% optimize.cse_after_recomputation.cse : 0.000016s : 0.00% optimize.environ_conv : 0.000012s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000504s : 0.08% validate : 0.000053s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.587662s : 94.33% execute : 0.000011s : 0.00%
Time group info:
------[substitution.] 0.000966 161 7.17% : 0.000069s : 8: substitution.arithmetic_simplify 0.25% : 0.000002s : 3: substitution.elim_not_effective 0.57% : 0.000006s : 5: substitution.float_depend_g_call 0.47% : 0.000005s : 2: substitution.float_tuple_getitem_switch 0.22% : 0.000002s : 3: substitution.fold_const_symbol 0.75% : 0.000007s : 4: substitution.graph_param_transform 0.45% : 0.000004s : 2: substitution.incorporate_call 0.22% : 0.000002s : 2: substitution.incorporate_call_switch 61.19% : 0.000591s : 17: substitution.inline 2.29% : 0.000022s : 2: substitution.inline_without_move 1.16% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.08% : 0.000020s : 3: substitution.less_batch_normalization 1.40% : 0.000014s : 7: substitution.minmaximum_grad 0.83% : 0.000008s : 5: substitution.partial_eliminate 1.39% : 0.000013s : 15: substitution.remove_not_recompute_node 3.64% : 0.000035s : 10: substitution.replace_applicator 1.31% : 0.000013s : 10: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.58% : 0.000025s : 7: substitution.tuple_list_convert_item_index_to_positive 1.20% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 1.65% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 7.12% : 0.000069s : 19: substitution.tuple_list_get_item_eliminator 1.69% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator
------[type_inference.] 0.013182 2 86.55% : 0.011409s : 1: type_inference.infer 13.45% : 0.001773s : 1: type_inference.specialize
------[replace.] 0.000246 27 67.70% : 0.000167s : 17: replace.inline 32.30% : 0.000080s : 10: replace.tuple_list_get_item_eliminator
------[match.] 0.000615 27 94.30% : 0.000579s : 17: match.inline 5.70% : 0.000035s : 10: match.tuple_list_get_item_eliminator
------[predicate.]
0.000718 4248 1.21% : 0.000009s : 53: predicate.accumulaten_eliminater 0.25% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.43% : 0.000003s : 21: predicate.addn_check_dump 1.10% : 0.000008s : 53: predicate.addn_zero_filter 1.07% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 74: predicate.arithmetic_simplify 1.23% : 0.000009s : 53: predicate.cast_eliminate 1.09% : 0.000008s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.08% : 0.000001s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.18% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.24% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 57: predicate.environ_get_add_eliminate 1.17% : 0.000008s : 57: predicate.environ_get_depend_swap 1.69% : 0.000012s : 78: predicate.environ_get_eliminate 1.15% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.77% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.68% : 0.000019s : 80: predicate.float_depend_g_call 0.44% : 0.000003s : 21: predicate.float_environ_get_switch 0.54% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.51% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.49% : 0.000004s : 21: predicate.incorporate_call 0.44% : 0.000003s : 21: predicate.incorporate_call_switch 5.97% : 0.000043s : 183: predicate.inline 1.41% : 0.000010s : 45: predicate.inline_without_move 0.27% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.72% : 0.000005s : 21: predicate.less_batch_normalization 1.58% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.58% : 0.000019s : 124: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.60% : 0.000019s : 113: predicate.loop_unroll_before_grad 1.36% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.45% : 0.000003s : 21: predicate.merge_addn 1.06% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 53: predicate.minmaximum_grad 0.40% : 0.000003s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.40% : 0.000017s : 80: predicate.partial_defer_inline 1.65% : 0.000012s : 67: predicate.partial_eliminate 1.14% : 0.000008s : 53: predicate.print_const_string_wrapper 0.49% : 0.000004s : 21: predicate.reduce_all_const_elim 1.43% : 0.000010s : 53: predicate.reduce_eliminate 2.57% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 21: predicate.remove_not_recompute_node 1.84% : 0.000013s : 113: predicate.replace_applicator 0.77% : 0.000006s : 45: predicate.replace_old_param 0.07% : 0.000001s : 4: predicate.reset_defer_inline 1.14% : 0.000008s : 53: predicate.reshape_eliminate 1.08% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.19% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 21: predicate.shard_identity_eliminate 0.24% : 0.000002s : 8: predicate.special_op_eliminate 0.57% : 0.000004s : 21: predicate.specialize_transform 1.39% : 0.000010s : 50: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 80: predicate.switch_defer_inline 2.90% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.20% : 0.000037s : 218: 
predicate.switch_simplify 1.14% : 0.000008s : 53: predicate.tile_eliminate 1.09% : 0.000008s : 53: predicate.transpose_eliminate 1.40% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000021s : 92: predicate.tuple_list_get_item_eliminator 1.52% : 0.000011s : 61: predicate.tuple_list_get_set_item_eliminator 1.94% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.51% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.48% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.04% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.49% : 0.000003s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 4: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.002203 36 58.49% : 0.001288s : 15: func_graph_cloner_run.FuncGraphClonerGraph 41.51% : 0.000914s : 21: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.669558 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.61% : 0.004061s : 1: add_attr 0.60% : 0.004045s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000067s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000151s : 1: auto_monad 0.00% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.08% : 0.000546s : 1: bootstrap 0.01% : 0.000037s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000018s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000030s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000015s : 1: environ_conv 0.01% : 0.000065s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.08% : 0.000506s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.11% : 0.000746s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.73% : 0.004856s : 117: opt.transform.opt_a 0.01% : 0.000034s : 1: opt.transform.opt_after_cconv 0.00% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000117s : 28: opt.transform.opt_b 0.01% : 0.000057s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000043s : 4: opt.transform.symbol_engine_opt 2.88% : 0.019266s : 1: opt_a 0.02% : 0.000128s : 1: opt_after_cconv 0.08% : 0.000516s : 1: opt_after_jit_grad 0.04% : 0.000265s : 1: opt_b 3.27% : 0.021888s : 1: optimize 0.00% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000072s : 1: pre_auto_parallel 0.01% : 0.000052s : 1: py_interpret_to_execute 0.00% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000021s : 1: remove_dup_value 1.20% : 0.008002s : 2: renormalize.infer 0.30% : 0.002015s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000043s : 1: rewriter_after_opt_a 0.03% : 0.000189s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000096s : 1: symbol_engine_optimizer 87.77% : 0.587686s : 1: task_emit 0.01% : 0.000088s : 1: 
tuple_transform 1.99% : 0.013306s : 1: type_inference 0.01% : 0.000085s : 1: validate
TotalTime = 0.536497, [24] [bootstrap]: 0.00054582 [type_inference]: 0.00782615 [event_method]: 1.387e-05 [auto_monad]: 6.804e-05 [graph_reusing]: 7.73001e-06 [inline]: 3.70998e-06 [add_attr]: 0.00406627, [1] [add_attr_with_inline]: 0.00405404, [1] [Cycle 1]: 5.466e-05, [2] [tag_attr]: 1.704e-05 [meta_addattr_fg_expand]: 3.83999e-06 [parallel-infer-symbol]: 4.68999e-06 [pre_auto_parallel]: 3.182e-05 [insert-virtual-dataset]: 3.17002e-06 [parallel-infer-symbol-second]: 9.10019e-07 [dataset_repeat_opt]: 2.16998e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00579299, [53] [py_interpret_to_execute]: 2.66e-05 [rewriter_before_opt_a]: 6.285e-05 [opt_a]: 0.00282818, [2] [Cycle 1]: 0.00203568, [45] [expand_dump_flag]: 1.86e-06 [switch_simplify]: 2.45e-05 [loop_unroll]: 1.807e-05 [a_1]: 0.00038432 [with_stream_mark]: 2.044e-05 [recompute_prepare]: 8.68001e-06 [updatestate_depend_eliminate]: 4.06001e-06 [updatestate_assign_eliminate]: 3.68e-06 [updatestate_loads_eliminate]: 3.30998e-06 [parameter_eliminate]: 2.84999e-06 [a_2]: 8.998e-05 [accelerated_algorithm]: 8.32e-06 [shard]: 2.85002e-06 [meta_shard_fg_expand]: 2.06e-06 [shard_inline]: 6.89999e-06 [merge_send_recv]: 9.92001e-06 [auto_parallel]: 9.24e-06 [parallel]: 2.207e-05 [flash_sp]: 1.037e-05 [merge_comm]: 4.38001e-06 [allreduce_fusion]: 4e-06 [matmul_add_comm_reduction]: 1.194e-05 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 9.74999e-06 [virtual_dataset]: 6.53e-06 [get_grad_eliminate_]: 6.41998e-06 [virtual_output]: 7.06999e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.16e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.534e-05 [merge_recompute_call_nodes]: 1.79998e-06 [before_grad]: 1.276e-05 [set_forward_comm_id_for_comm_node_pass]: 4.20999e-06 [meta_fg_expand]: 2.66e-06 [flash_sp_send_recv_attached]: 3.26999e-06 [receive_attached]: 2.61999e-06
[after_resolve]: 1.116e-05 [a_after_grad]: 9.96998e-06 [renormalize]: 0.00085705 [add_forward_monad_depend]: 7.08998e-06 [auto_monad_grad]: 2.68e-06 [auto_monad_eliminator]: 2.033e-05 [cse]: 3.343e-05 [a_3]: 5.6e-05 [Cycle 2]: 0.00077749, [45] [expand_dump_flag]: 2.05002e-06 [switch_simplify]: 7.9e-06 [loop_unroll]: 6.93e-06 [a_1]: 0.00014214 [with_stream_mark]: 1.474e-05 [recompute_prepare]: 6.33e-06 [updatestate_depend_eliminate]: 3.07002e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 3.41999e-06 [parameter_eliminate]: 1.10001e-06 [a_2]: 7.53e-05 [accelerated_algorithm]: 5.81e-06 [shard]: 2.54999e-06 [meta_shard_fg_expand]: 2.32001e-06 [shard_inline]: 6.27001e-06 [merge_send_recv]: 7.83001e-06 [auto_parallel]: 8.60999e-06 [parallel]: 7.71999e-06 [flash_sp]: 5.16998e-06 [merge_comm]: 3.76001e-06 [allreduce_fusion]: 3.66001e-06 [matmul_add_comm_reduction]: 1.622e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.28999e-06 [virtual_dataset]: 6.06e-06 [get_grad_eliminate_]: 6.07999e-06 [virtual_output]: 6.19999e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 3.51001e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.25e-05 [merge_recompute_call_nodes]: 1.17999e-06 [before_grad]: 1.133e-05 [set_forward_comm_id_for_comm_node_pass]: 4.89e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 1.55999e-06 [receive_attached]: 1.81e-06 [after_resolve]: 1.126e-05 [a_after_grad]: 9.02e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.81003e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 9.89001e-06 [cse]: 2.529e-05 [a_3]: 3.939e-05 [py_interpret_to_execute_after_opt_a]: 1.587e-05 [slice_cell_reuse_recomputed_activation]: 2.76e-06 [rewriter_after_opt_a]: 4.289e-05 [convert_after_rewriter]: 7.26999e-06 [order_py_execute_after_rewriter]: 6.00002e-06 [mutable_eliminate]: 0.00089901 [opt_b]: 0.0002551, [1] [Cycle 1]: 0.00024304, [7] [b_1]: 
0.00013461 [b_2]: 9.92999e-06 [updatestate_depend_eliminate]: 1.006e-05 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 2.86999e-06 [renormalize]: 1.45999e-06 [cse]: 3.817e-05 [optimize_parallel_all_gather_comm]: 2.484e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 4.334e-05 [loop_unroll]: 0.00073682 [opt_after_cconv]: 0.00012868, [1] [Cycle 1]: 0.00011858, [7] [c_1]: 3.212e-05 [parameter_eliminate]: 4.58999e-06 [updatestate_depend_eliminate]: 7.74002e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 3.58e-06 [cse]: 2.735e-05 [renormalize]: 1.15001e-06 [remove_dup_value]: 2.17e-05 [tuple_transform]: 8.837e-05, [1] [Cycle 1]: 8.278e-05, [4] [d_1]: 5.146e-05 [none_parameter_eliminate]: 1.90001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 7.25e-06 [partial_unused_args_eliminate]: 2.42001e-06 [add_recomputation]: 6.009e-05 [cse_after_recomputation]: 2.995e-05, [1] [Cycle 1]: 2.334e-05, [1] [cse]: 1.59e-05 [environ_conv]: 8.18001e-06 [swap_dp_allreduce_reducescatter]: 6.64001e-06 [bias_add_comm_swap]: 3.76001e-06 [label_micro_interleaved_index]: 7.65e-06 [label_fine_grained_interleaved_index]: 3.15998e-06 [merge_cast_opt]: 1.81e-06 [slice_recompute_activation]: 3.08e-06 [micro_interleaved_order_control]: 3.11001e-06 [assign_add_opt]: 1.55001e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.82002e-06 [reorder_send_recv_between_fp_bp]: 3.85e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 1.16997e-06 [interleave_split_concat_branches]: 1.45001e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.54e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97001e-06 [control_data_broadcast_order]: 1.773e-05 [grouped_pairwise_exchange_alltoall]: 2.11e-06 [offloading_packed_experts]: 4.92e-06 [overlap_recompute_and_grad_model_parallel]: 6.94999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42999e-06 
[overlap_recompute_allgather_and_fa_grad]: 2.00002e-06 [overlap_recompute_comm]: 2.66e-06 [overlap_grad_ring_attention]: 5.66e-06 [overlap_grad_flash_sp]: 2.449e-05 [begin_end_overlap_inline]: 7.29982e-07 [split_matmul_comm_elemetwise]: 2.63e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 9.916e-05, [1] [Cycle 1]: 9.287e-05, [6] [build]: 4.38999e-06 [elim_shapecalc]: 1.63e-05 [elim_not_effective]: 1.755e-05 [opt_reshape]: 7.91001e-06 [fold_const_symbol]: 1.103e-05 [renormalize]: 6.29982e-07 [detach_backward]: 2.83998e-06 [pipeline_parallel_scheduler]: 1.66998e-06 [auto_monad_reorder]: 2.253e-05 [get_jit_bprop_graph]: 2.56e-06 [rewriter_after_jit_bprop_graph]: 6.23e-06 [opt_after_jit_grad]: 0.0171386 [validate]: 6.199e-05 [backend_pass]: 1.16997e-06 [task_emit]: 0.500562 [execute]: 1.018e-05
Sums
bootstrap : 0.000546s : 0.10% type_inference : 0.007826s : 1.47% event_method : 0.000014s : 0.00% auto_monad : 0.000068s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000004s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000005s : 0.00% pre_auto_parallel : 0.000032s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000027s : 0.01% optimize.rewriter_before_opt_a : 0.000063s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.01% optimize.opt_a.loop_unroll : 0.000025s : 0.00% optimize.opt_a.a_1 : 0.000526s : 0.10% optimize.opt_a.with_stream_mark : 0.000035s : 0.01% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate :
0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000165s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000018s : 0.00% optimize.opt_a.auto_parallel : 0.000018s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.01% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000028s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000013s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000028s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000024s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.000857s : 0.16% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000030s : 0.01% optimize.opt_a.cse : 0.000059s : 0.01% optimize.opt_a.a_3 : 0.000095s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000899s : 0.17% optimize.opt_b.b_1 : 0.000135s : 0.03% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000038s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000043s : 0.01% optimize.loop_unroll : 0.000737s : 0.14% optimize.opt_after_cconv.c_1 : 0.000032s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000027s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000022s : 0.00% optimize.tuple_transform.d_1 : 0.000051s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.01% optimize.cse_after_recomputation.cse : 0.000016s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000004s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000001s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000023s : 0.00% get_jit_bprop_graph : 0.000003s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.017139s : 3.23% validate : 0.000062s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.500562s : 94.25% execute : 0.000010s : 0.00%
Time group info:
------[substitution.] 0.000176 24 20.99% : 0.000037s : 4: substitution.arithmetic_simplify 1.54% : 0.000003s : 2: substitution.elim_not_effective 0.89% : 0.000002s : 2: substitution.fold_const_symbol 3.73% : 0.000007s : 3: substitution.graph_param_transform 64.13% : 0.000113s : 3: substitution.inline 2.76% : 0.000005s : 4: substitution.j_node_and_user_rematch 3.18% : 0.000006s : 4: substitution.remove_not_recompute_node 2.79% : 0.000005s : 2: substitution.replace_old_param
------[type_inference.] 0.007764 2 92.27% : 0.007164s : 1: type_inference.infer 7.73% : 0.000600s : 1: type_inference.specialize
------[replace.] 0.000033 3 100.00% : 0.000033s : 3: replace.inline
------[match.] 0.000111 3 100.00% : 0.000111s : 3: match.inline
------[predicate.]
0.000177 815 0.91% : 0.000002s : 8: predicate.accumulaten_eliminater 2.02% : 0.000004s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.80% : 0.000001s : 8: predicate.addn_zero_filter 0.65% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.42% : 0.000004s : 14: predicate.arithmetic_simplify 0.91% : 0.000002s : 8: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.74% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 1.04% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 8: predicate.dict_set_item_eliminator 2.32% : 0.000004s : 6: predicate.dumpgradient_eliminate 0.34% : 0.000001s : 3: predicate.elim_not_effective 0.64% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.25% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 11: predicate.environ_get_depend_swap 1.70% : 0.000003s : 17: predicate.environ_get_eliminate 1.04% : 0.000002s : 11: predicate.environ_get_set_eliminate 0.99% : 0.000002s : 11: predicate.exchange_switch_depend_value 1.96% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.34% : 0.000001s : 3: predicate.graph_param_transform 0.64% : 0.000001s : 6: predicate.incorporate_call 0.54% : 0.000001s : 6: predicate.incorporate_call_switch 5.84% : 0.000010s : 37: predicate.inline 1.13% : 0.000002s : 6: predicate.inline_without_move 0.33% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.89% : 0.000002s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.01% : 0.000004s : 22: predicate.load_eliminater 1.31% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.73% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.72% : 0.000001s : 6: predicate.micro_step_allgather_replace 1.00% : 0.000002s : 6: predicate.mini_step_allgather_replace 0.65% : 0.000001s : 8: predicate.minmaximum_grad 1.57% : 0.000003s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.10% : 0.000002s : 11: predicate.partial_eliminate 0.79% : 0.000001s : 8: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.49% : 0.000003s : 8: predicate.reduce_eliminate 2.10% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 6: predicate.remove_not_recompute_node 1.36% : 0.000002s : 14: predicate.replace_applicator 0.46% : 0.000001s : 6: predicate.replace_old_param 0.44% : 0.000001s : 3: predicate.reset_defer_inline 1.07% : 0.000002s : 8: predicate.reshape_eliminate 0.69% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 3: predicate.row_tensor_eliminate 0.95% : 0.000002s : 6: predicate.same_eliminate 0.62% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 1.14% : 0.000002s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 1.24% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.05% : 0.000002s : 11: predicate.switch_defer_inline 1.83% : 0.000003s : 17: predicate.switch_layer_defer_inline 3.79% : 0.000007s : 38: predicate.switch_simplify 0.74% : 
0.000001s : 8: predicate.tile_eliminate 0.89% : 0.000002s : 8: predicate.transpose_eliminate 1.53% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.96% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.75% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.10% : 0.000004s : 22: predicate.updatestate_pure_node_eliminater 2.64% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.62% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.27% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000457 7 34.66% : 0.000159s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.34% : 0.000299s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.548357 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.74% : 0.004074s : 1: add_attr 0.74% : 0.004058s : 1: add_attr_with_inline 0.00% : 0.000005s : 1: add_comm_op_reuse_tag 0.01% : 0.000067s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000074s : 1: auto_monad 0.01% : 0.000027s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.11% : 0.000594s : 1: bootstrap 0.01% : 0.000047s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000022s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000033s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000011s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000011s : 1: label_micro_interleaved_index 0.14% : 0.000750s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.17% : 0.000919s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000022s : 1: opt.transform.mutable_eliminate 0.17% : 0.000930s : 78: opt.transform.opt_a 0.01% : 0.000030s : 1: opt.transform.opt_after_cconv 0.01% : 0.000056s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000107s : 28: opt.transform.opt_b 0.01% : 0.000056s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000047s : 4: opt.transform.symbol_engine_opt 0.52% : 0.002832s : 1: opt_a 0.02% : 0.000133s : 1: opt_after_cconv 3.13% : 0.017177s : 1: opt_after_jit_grad 0.05% : 0.000259s : 1: opt_b 1.06% : 0.005801s : 1: optimize 0.01% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.01% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000005s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000036s : 1: pre_auto_parallel 0.01% : 0.000031s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000025s : 1: remove_dup_value 0.08% : 0.000461s : 1: renormalize.infer 0.07% : 0.000386s : 1: renormalize.specialize 0.00% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000048s : 1: rewriter_after_opt_a 0.01% : 0.000067s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000007s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000102s : 1: symbol_engine_optimizer 91.29% : 0.500588s : 1: task_emit 0.02% : 0.000092s : 1: 
tuple_transform 1.43% : 0.007852s : 1: type_inference 0.02% : 0.000103s : 1: validate TotalTime = 0.70145, [24] [bootstrap]: 0.00049305 [type_inference]: 0.0420365 [event_method]: 5.639e-05 [auto_monad]: 0.00013801 [graph_reusing]: 9.25999e-06 [inline]: 3.5e-06 [add_attr]: 0.00360561, [1] [add_attr_with_inline]: 0.0035945, [1] [Cycle 1]: 8.871e-05, [2] [tag_attr]: 3.815e-05 [meta_addattr_fg_expand]: 9.15001e-06 [parallel-infer-symbol]: 3.95e-06 [pre_auto_parallel]: 5.508e-05 [insert-virtual-dataset]: 2.93998e-06 [parallel-infer-symbol-second]: 9.79984e-07 [dataset_repeat_opt]: 2.41998e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.0321386, [53] [py_interpret_to_execute]: 4.049e-05 [rewriter_before_opt_a]: 0.00015688 [opt_a]: 0.0292112, [3] [Cycle 1]: 0.0251317, [45] [expand_dump_flag]: 4.88001e-06 [switch_simplify]: 7.563e-05 [loop_unroll]: 5.97e-05 [a_1]: 0.00145418 [with_stream_mark]: 2.892e-05 [recompute_prepare]: 2.493e-05 [updatestate_depend_eliminate]: 9.96998e-06 [updatestate_assign_eliminate]: 7.94002e-06 [updatestate_loads_eliminate]: 7.35e-06 [parameter_eliminate]: 3.08e-06 [a_2]: 0.00024606 [accelerated_algorithm]: 3.22e-05 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 3.84002e-06 [shard_inline]: 1.615e-05 [merge_send_recv]: 1.89e-05 [auto_parallel]: 1.157e-05 [parallel]: 2.211e-05 [flash_sp]: 1.333e-05 [merge_comm]: 9.34e-06 [allreduce_fusion]: 8.62e-06 [matmul_add_comm_reduction]: 3.105e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.843e-05 [virtual_dataset]: 1.571e-05 [get_grad_eliminate_]: 1.511e-05 [virtual_output]: 1.515e-05 [merge_forward]: 9.43002e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.833e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.07e-05 [merge_recompute_call_nodes]: 1.84e-06 [before_grad]: 2.857e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89001e-06 [meta_fg_expand]: 0.00167204 [flash_sp_send_recv_attached]: 5.20999e-06 [receive_attached]: 2.66999e-06 [after_resolve]: 
7.295e-05 [a_after_grad]: 9.376e-05 [renormalize]: 0.0199673 [add_forward_monad_depend]: 1.571e-05 [auto_monad_grad]: 6.86001e-06 [auto_monad_eliminator]: 6.008e-05 [cse]: 0.00021994 [a_3]: 0.00036121 [Cycle 2]: 0.00325789, [45] [expand_dump_flag]: 3.38999e-06 [switch_simplify]: 4.761e-05 [loop_unroll]: 4.289e-05 [a_1]: 0.00141874 [with_stream_mark]: 2.215e-05 [recompute_prepare]: 1.135e-05 [updatestate_depend_eliminate]: 5.73002e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 3.73001e-06 [parameter_eliminate]: 2.27001e-06 [a_2]: 9.275e-05 [accelerated_algorithm]: 1.279e-05 [shard]: 2.51e-06 [meta_shard_fg_expand]: 2.58003e-06 [shard_inline]: 7.14001e-06 [merge_send_recv]: 9.89001e-06 [auto_parallel]: 1.14e-05 [parallel]: 1.148e-05 [flash_sp]: 4e-06 [merge_comm]: 4.27998e-06 [allreduce_fusion]: 4.01001e-06 [matmul_add_comm_reduction]: 1.006e-05 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 9.82999e-06 [virtual_dataset]: 6.64001e-06 [get_grad_eliminate_]: 6.66e-06 [virtual_output]: 6.33002e-06 [merge_forward]: 4.66002e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 1.197e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.404e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 1.288e-05 [set_forward_comm_id_for_comm_node_pass]: 5.65001e-06 [meta_fg_expand]: 9.137e-05 [flash_sp_send_recv_attached]: 1.93002e-06 [receive_attached]: 2.12999e-06 [after_resolve]: 1.482e-05 [a_after_grad]: 1.079e-05 [renormalize]: 0.00092701 [add_forward_monad_depend]: 6.78998e-06 [auto_monad_grad]: 2.53e-06 [auto_monad_eliminator]: 1.572e-05 [cse]: 3.475e-05 [a_3]: 5.387e-05 [Cycle 3]: 0.00079982, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 8.27e-06 [loop_unroll]: 6.94001e-06 [a_1]: 0.00015964 [with_stream_mark]: 1.166e-05 [recompute_prepare]: 7.11001e-06 [updatestate_depend_eliminate]: 4.50001e-06 [updatestate_assign_eliminate]: 2.89999e-06 [updatestate_loads_eliminate]: 2.93998e-06 
[parameter_eliminate]: 1.21997e-06 [a_2]: 0.00010125 [accelerated_algorithm]: 1.267e-05 [shard]: 1.71e-06 [meta_shard_fg_expand]: 2.01998e-06 [shard_inline]: 7.38e-06 [merge_send_recv]: 7.85e-06 [auto_parallel]: 8e-06 [parallel]: 7.43999e-06 [flash_sp]: 1.40999e-06 [merge_comm]: 4.10998e-06 [allreduce_fusion]: 3.65e-06 [matmul_add_comm_reduction]: 9.14998e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 8.68001e-06 [virtual_dataset]: 6.76e-06 [get_grad_eliminate_]: 7.61999e-06 [virtual_output]: 6.33e-06 [merge_forward]: 6.13002e-06 [cell_reuse_recompute_pass]: 3.03e-06 [offload_activation]: 9.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.499e-05 [merge_recompute_call_nodes]: 1.86e-06 [before_grad]: 1.216e-05 [set_forward_comm_id_for_comm_node_pass]: 4.99998e-06 [meta_fg_expand]: 2.76999e-06 [flash_sp_send_recv_attached]: 1.38002e-06 [receive_attached]: 1.96998e-06 [after_resolve]: 1.356e-05 [a_after_grad]: 1.164e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 2.53e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.203e-05 [cse]: 2.665e-05 [a_3]: 4.344e-05 [py_interpret_to_execute_after_opt_a]: 1.944e-05 [slice_cell_reuse_recomputed_activation]: 2.54999e-06 [rewriter_after_opt_a]: 5.333e-05 [convert_after_rewriter]: 9.37001e-06 [order_py_execute_after_rewriter]: 5.91e-06 [mutable_eliminate]: 0.00081656 [opt_b]: 0.00025461, [1] [Cycle 1]: 0.00024399, [7] [b_1]: 0.00014251 [b_2]: 9.98998e-06 [updatestate_depend_eliminate]: 1.059e-05 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.08e-06 [renormalize]: 9.09989e-07 [cse]: 3.171e-05 [optimize_parallel_all_gather_comm]: 2.454e-05 [overlap_param_gather]: 2.81999e-06 [cconv]: 3.489e-05 [loop_unroll]: 0.00059406 [opt_after_cconv]: 0.00015014, [1] [Cycle 1]: 0.00014124, [7] [c_1]: 4.148e-05 [parameter_eliminate]: 5.29e-06 [updatestate_depend_eliminate]: 9.79e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 
4.19002e-06 [cse]: 3.392e-05 [renormalize]: 7.7e-07 [remove_dup_value]: 1.871e-05 [tuple_transform]: 9.34e-05, [1] [Cycle 1]: 8.766e-05, [4] [d_1]: 5.631e-05 [none_parameter_eliminate]: 2.02001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 8.12e-06 [partial_unused_args_eliminate]: 2.18998e-06 [add_recomputation]: 7.258e-05 [cse_after_recomputation]: 3.404e-05, [1] [Cycle 1]: 2.757e-05, [1] [cse]: 1.916e-05 [environ_conv]: 1.236e-05 [swap_dp_allreduce_reducescatter]: 7.16999e-06 [bias_add_comm_swap]: 3.19001e-06 [label_micro_interleaved_index]: 6.79999e-06 [label_fine_grained_interleaved_index]: 2.93998e-06 [merge_cast_opt]: 1.55001e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 3.48e-06 [assign_add_opt]: 1.62999e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.57999e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.09003e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 1.14998e-06 [overlap_opt_shard_in_pipeline]: 1.69998e-06 [overlap_opt_shard_grad_in_pipeline]: 2.10002e-06 [control_data_broadcast_order]: 2.021e-05 [grouped_pairwise_exchange_alltoall]: 1.76998e-06 [offloading_packed_experts]: 6.02001e-06 [overlap_recompute_and_grad_model_parallel]: 5.79e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.68002e-06 [overlap_recompute_comm]: 2.81999e-06 [overlap_grad_ring_attention]: 5.64e-06 [overlap_grad_flash_sp]: 2.679e-05 [begin_end_overlap_inline]: 7.49977e-07 [split_matmul_comm_elemetwise]: 2.53e-06 [split_layernorm_comm]: 2.22001e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 0.00011848, [1] [Cycle 1]: 0.00011147, [6] [build]: 1.215e-05 [elim_shapecalc]: 1.593e-05 [elim_not_effective]: 1.711e-05 [opt_reshape]: 8.79e-06 [fold_const_symbol]: 1.447e-05 [renormalize]: 2.89991e-07 [detach_backward]: 
2.82002e-06 [pipeline_parallel_scheduler]: 1.96e-06 [auto_monad_reorder]: 2.656e-05 [get_jit_bprop_graph]: 2.76e-06 [rewriter_after_jit_bprop_graph]: 6.89001e-06 [opt_after_jit_grad]: 0.00065385 [validate]: 0.0121447 [backend_pass]: 1.64998e-06 [task_emit]: 0.609703 [execute]: 9.82001e-06 Sums bootstrap : 0.000493s : 0.07% type_inference : 0.042036s : 6.04% event_method : 0.000056s : 0.01% auto_monad : 0.000138s : 0.02% graph_reusing : 0.000009s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000038s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000055s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.01% optimize.rewriter_before_opt_a : 0.000157s : 0.02% optimize.opt_a.expand_dump_flag : 0.000010s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.02% optimize.opt_a.loop_unroll : 0.000110s : 0.02% optimize.opt_a.a_1 : 0.003033s : 0.44% optimize.opt_a.with_stream_mark : 0.000063s : 0.01% optimize.opt_a.recompute_prepare : 0.000043s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000440s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.01% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000031s : 0.00% optimize.opt_a.merge_send_recv : 0.000037s : 0.01% optimize.opt_a.auto_parallel : 0.000031s : 0.00% optimize.opt_a.parallel : 0.000041s : 0.01% optimize.opt_a.flash_sp : 0.000019s : 0.00% optimize.opt_a.merge_comm : 0.000018s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000016s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000050s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.01% optimize.opt_a.virtual_dataset : 0.000029s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.00% optimize.opt_a.virtual_output : 0.000028s : 0.00% optimize.opt_a.merge_forward : 0.000020s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000040s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000054s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001766s : 0.25% optimize.opt_a.flash_sp_send_recv_attached : 0.000009s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000101s : 0.01% optimize.opt_a.a_after_grad : 0.000116s : 0.02% optimize.opt_a.renormalize : 0.020894s : 3.00% optimize.opt_a.add_forward_monad_depend : 0.000025s : 0.00% optimize.opt_a.auto_monad_grad : 0.000011s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000088s : 0.01% optimize.opt_a.cse : 0.000281s : 0.04% optimize.opt_a.a_3 : 0.000459s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000053s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000817s : 0.12% optimize.opt_b.b_1 : 0.000143s : 0.02% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.00% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000035s : 0.01% optimize.loop_unroll : 0.000594s : 0.09% optimize.opt_after_cconv.c_1 : 0.000041s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000034s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.00% optimize.tuple_transform.d_1 : 0.000056s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000073s : 0.01% optimize.cse_after_recomputation.cse : 0.000019s : 0.00% optimize.environ_conv : 0.000012s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000007s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000027s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.00% get_jit_bprop_graph : 0.000003s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.00% opt_after_jit_grad : 0.000654s : 0.09% validate : 0.012145s : 1.74% backend_pass : 0.000002s : 0.00% task_emit : 0.609703s : 87.58% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000840 159 7.71% : 0.000065s : 7: substitution.arithmetic_simplify 0.33% : 0.000003s : 3: substitution.elim_not_effective 0.60% : 0.000005s : 5: substitution.float_depend_g_call 0.50% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.38% : 0.000003s : 3: substitution.fold_const_symbol 0.91% : 0.000008s : 4: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 58.41% : 0.000491s : 17: substitution.inline 2.53% : 0.000021s : 2: substitution.inline_without_move 1.37% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.20% : 0.000018s : 3: substitution.less_batch_normalization 1.43% : 0.000012s : 7: substitution.minmaximum_grad 0.82% : 0.000007s : 5: substitution.partial_eliminate 1.56% : 0.000013s : 15: substitution.remove_not_recompute_node 3.94% : 0.000033s : 10: substitution.replace_applicator 1.45% : 0.000012s : 10: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.95% : 0.000025s : 7: substitution.tuple_list_convert_item_index_to_positive 1.35% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.80% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 6.87% : 0.000058s : 18: substitution.tuple_list_get_item_eliminator 1.89% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.041933 2 95.81% : 0.040175s : 1: type_inference.infer 4.19% : 0.001759s : 1: type_inference.specialize ------[replace.] 0.000205 26 66.70% : 0.000137s : 17: replace.inline 33.30% : 0.000068s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000507 26 94.74% : 0.000481s : 17: match.inline 5.26% : 0.000027s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000715 4180 1.12% : 0.000008s : 52: predicate.accumulaten_eliminater 0.45% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.44% : 0.000003s : 21: predicate.addn_check_dump 1.08% : 0.000008s : 52: predicate.addn_zero_filter 1.07% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 1.97% : 0.000014s : 73: predicate.arithmetic_simplify 1.16% : 0.000008s : 52: predicate.cast_eliminate 1.15% : 0.000008s : 50: predicate.check_bprop_eliminate 0.44% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.10% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.14% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.47% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.19% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 56: predicate.environ_add_const_eliminate 1.12% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.14% : 0.000008s : 56: predicate.environ_get_depend_swap 1.64% : 0.000012s : 77: predicate.environ_get_eliminate 1.16% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.77% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.49% : 0.000018s : 78: predicate.float_depend_g_call 0.44% : 0.000003s : 21: predicate.float_environ_get_switch 0.55% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.51% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.51% : 0.000004s : 21: predicate.incorporate_call 0.44% : 0.000003s : 21: predicate.incorporate_call_switch 5.86% : 0.000042s : 180: predicate.inline 1.42% : 0.000010s : 45: predicate.inline_without_move 0.32% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 21: predicate.less_batch_normalization 1.50% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.66% : 0.000019s : 121: predicate.load_eliminater 0.56% : 0.000004s : 4: predicate.loop_unroll_after_grad 2.44% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.31% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.11% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.08% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 52: predicate.minmaximum_grad 0.67% : 0.000005s : 4: predicate.mutable_eliminate 0.18% : 0.000001s : 4: predicate.opt_reshape 0.14% : 0.000001s : 4: predicate.parallel_virtual_node 2.25% : 0.000016s : 78: predicate.partial_defer_inline 1.65% : 0.000012s : 65: predicate.partial_eliminate 1.08% : 0.000008s : 52: predicate.print_const_string_wrapper 0.51% : 0.000004s : 21: predicate.reduce_all_const_elim 1.37% : 0.000010s : 52: predicate.reduce_eliminate 2.54% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.92% : 0.000014s : 111: predicate.replace_applicator 0.86% : 0.000006s : 45: predicate.replace_old_param 0.11% : 0.000001s : 4: predicate.reset_defer_inline 1.12% : 0.000008s : 52: predicate.reshape_eliminate 1.15% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.36% : 0.000010s : 50: predicate.same_eliminate 0.36% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.56% : 0.000004s : 21: predicate.shard_identity_eliminate 0.29% : 0.000002s : 8: predicate.special_op_eliminate 0.58% : 0.000004s : 21: predicate.specialize_transform 1.39% : 0.000010s : 50: predicate.split_environ_get_set_with_tuple_value 1.25% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.89% : 0.000013s : 78: predicate.switch_defer_inline 2.91% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.03% : 0.000036s : 213: 
predicate.switch_simplify 1.16% : 0.000008s : 52: predicate.tile_eliminate 1.09% : 0.000008s : 52: predicate.transpose_eliminate 1.42% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.71% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.45% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.53% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.50% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.12% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 4: predicate.value_based_eliminate 0.54% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.49% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002156 35 57.80% : 0.001246s : 14: func_graph_cloner_run.FuncGraphClonerGraph 42.20% : 0.000910s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.762899 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.47% : 0.003612s : 1: add_attr 0.47% : 0.003599s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000082s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000148s : 1: auto_monad 0.00% : 0.000034s : 1: auto_monad_reorder 0.00% : 0.000010s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000531s : 1: bootstrap 0.01% : 0.000041s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000024s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000016s : 1: environ_conv 0.01% : 0.000067s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000008s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000010s : 1: label_micro_interleaved_index 0.08% : 0.000609s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000007s : 1: micro_interleaved_order_control 0.11% : 0.000835s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000021s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000024s : 1: opt.transform.mutable_eliminate 0.60% : 0.004597s : 117: opt.transform.opt_a 0.01% : 0.000039s : 1: opt.transform.opt_after_cconv 0.00% : 0.000038s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000121s : 28: opt.transform.opt_b 0.01% : 0.000062s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000052s : 4: opt.transform.symbol_engine_opt 3.83% : 0.029215s : 1: opt_a 0.02% : 0.000154s : 1: opt_after_cconv 0.09% : 0.000672s : 1: opt_after_jit_grad 0.03% : 0.000258s : 1: opt_b 4.21% : 0.032144s : 1: optimize 0.00% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000007s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000060s : 1: pre_auto_parallel 0.01% : 0.000045s : 1: py_interpret_to_execute 0.00% : 0.000024s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000005s : 1: remove_cast_before_assign_add 0.00% : 0.000022s : 1: remove_dup_value 2.47% : 0.018854s : 2: renormalize.infer 0.26% : 0.002019s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000011s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000061s : 1: rewriter_after_opt_a 0.02% : 0.000162s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000121s : 1: symbol_engine_optimizer 79.92% : 0.609728s : 1: task_emit 0.01% : 0.000097s : 1: 
tuple_transform 5.51% : 0.042067s : 1: type_inference 1.60% : 0.012231s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x3-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x4-pynative],max_mem:6.0M TotalTime = 0.0260737, [24] [bootstrap]: 0.00055886 [type_inference]: 0.00752834 [event_method]: 1.592e-05 [auto_monad]: 6.709e-05 [graph_reusing]: 6.68e-06 [inline]: 3.66999e-06 [add_attr]: 0.0045496, [1] [add_attr_with_inline]: 0.00453389, [1] [Cycle 1]: 6.629e-05, [2] [tag_attr]: 1.989e-05 [meta_addattr_fg_expand]: 4.53999e-06 [parallel-infer-symbol]: 3.63e-06 [pre_auto_parallel]: 3.425e-05 [insert-virtual-dataset]: 2.69001e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 2.26998e-06 [pipeline_split]: 3.52002e-06 [optimize]: 0.0050216, [53] [py_interpret_to_execute]: 2.715e-05 [rewriter_before_opt_a]: 7.379e-05 [opt_a]: 0.00263787, [2] [Cycle 1]: 0.0019869, [45] [expand_dump_flag]: 3.94002e-06 [switch_simplify]: 3.508e-05 [loop_unroll]: 2.105e-05 [a_1]: 0.00048683 [with_stream_mark]: 1.91e-05 [recompute_prepare]: 8.65001e-06 [updatestate_depend_eliminate]: 3.91001e-06 [updatestate_assign_eliminate]: 3.44001e-06 [updatestate_loads_eliminate]: 3.51999e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 8.023e-05 [accelerated_algorithm]: 6.74999e-06 [shard]: 2.92002e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 6.23e-06 [merge_send_recv]: 9.33002e-06 [auto_parallel]: 7.11999e-06 [parallel]: 2.951e-05 [flash_sp]: 9.09e-06 [merge_comm]: 4.03001e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 1.045e-05 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 8.16002e-06 [virtual_dataset]: 6.06e-06 [get_grad_eliminate_]: 5.76998e-06 
[virtual_output]: 5.88002e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 1.122e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.283e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 1.128e-05 [set_forward_comm_id_for_comm_node_pass]: 4.17003e-06 [meta_fg_expand]: 2.89999e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 1.039e-05 [a_after_grad]: 9.67001e-06 [renormalize]: 0.00076423 [add_forward_monad_depend]: 1.06e-05 [auto_monad_grad]: 2.78003e-06 [auto_monad_eliminator]: 1.69e-05 [cse]: 3.327e-05 [a_3]: 4.512e-05 [Cycle 2]: 0.00063994, [45] [expand_dump_flag]: 1.33002e-06 [switch_simplify]: 6.78998e-06 [loop_unroll]: 5.86e-06 [a_1]: 0.00012209 [with_stream_mark]: 1.201e-05 [recompute_prepare]: 6.36e-06 [updatestate_depend_eliminate]: 3.33e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.98998e-06 [parameter_eliminate]: 1.01002e-06 [a_2]: 7.281e-05 [accelerated_algorithm]: 5.89e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.39998e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 5.12e-06 [auto_parallel]: 5.99e-06 [parallel]: 5.99e-06 [flash_sp]: 4.29997e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.08e-06 [matmul_add_comm_reduction]: 7.52998e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.68e-06 [virtual_dataset]: 5.49998e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.12999e-06 [merge_forward]: 3.06999e-06 [cell_reuse_recompute_pass]: 1.65001e-06 [offload_activation]: 7.82e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89999e-06 [merge_recompute_call_nodes]: 1.12e-06 [before_grad]: 9.59999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.35999e-06 [after_resolve]: 9.52999e-06 [a_after_grad]: 8.2e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 
1.07e-06 [auto_monad_eliminator]: 6.84001e-06 [cse]: 2.247e-05 [a_3]: 3.413e-05 [py_interpret_to_execute_after_opt_a]: 9.52999e-06 [slice_cell_reuse_recomputed_activation]: 2.56e-06 [rewriter_after_opt_a]: 3.846e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.56002e-06 [mutable_eliminate]: 0.00069005 [opt_b]: 0.00020408, [1] [Cycle 1]: 0.00019671, [7] [b_1]: 0.00011319 [b_2]: 8.22e-06 [updatestate_depend_eliminate]: 7.48e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 8.30012e-07 [cse]: 2.466e-05 [optimize_parallel_all_gather_comm]: 1.941e-05 [overlap_param_gather]: 1.96998e-06 [cconv]: 3.146e-05 [loop_unroll]: 0.00055756 [opt_after_cconv]: 0.00010875, [1] [Cycle 1]: 0.00010238, [7] [c_1]: 2.741e-05 [parameter_eliminate]: 4.17e-06 [updatestate_depend_eliminate]: 6.47001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 3.16001e-06 [cse]: 2.258e-05 [renormalize]: 6.69999e-07 [remove_dup_value]: 1.684e-05 [tuple_transform]: 7.471e-05, [1] [Cycle 1]: 6.951e-05, [4] [d_1]: 4.147e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.26001e-06 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 5.461e-05 [cse_after_recomputation]: 2.224e-05, [1] [Cycle 1]: 1.744e-05, [1] [cse]: 1.205e-05 [environ_conv]: 9.87001e-06 [swap_dp_allreduce_reducescatter]: 5.42999e-06 [bias_add_comm_swap]: 2.94001e-06 [label_micro_interleaved_index]: 5.39e-06 [label_fine_grained_interleaved_index]: 3.4e-06 [merge_cast_opt]: 1.54998e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.52001e-06 
[interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.77001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 [control_data_broadcast_order]: 1.585e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 4.70001e-06 [overlap_recompute_and_grad_model_parallel]: 5.39998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.26997e-06 [overlap_recompute_comm]: 2.80002e-06 [overlap_grad_ring_attention]: 4.57e-06 [overlap_grad_flash_sp]: 1.974e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.93e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 8.001e-05, [1] [Cycle 1]: 7.506e-05, [6] [build]: 3.51999e-06 [elim_shapecalc]: 1.068e-05 [elim_not_effective]: 1.378e-05 [opt_reshape]: 7.45e-06 [fold_const_symbol]: 1.036e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.21e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.747e-05 [get_jit_bprop_graph]: 2.34001e-06 [rewriter_after_jit_bprop_graph]: 4.84e-06 [opt_after_jit_grad]: 0.00053665 [validate]: 4.696e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00731871 [execute]: 9.42999e-06 Sums bootstrap : 0.000559s : 2.75% type_inference : 0.007528s : 37.00% event_method : 0.000016s : 0.08% auto_monad : 0.000067s : 0.33% graph_reusing : 0.000007s : 0.03% inline : 0.000004s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000034s : 0.17% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000004s : 0.02% optimize.py_interpret_to_execute : 0.000027s : 0.13% optimize.rewriter_before_opt_a : 0.000074s : 0.36% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 
0.000042s : 0.21% optimize.opt_a.loop_unroll : 0.000027s : 0.13% optimize.opt_a.a_1 : 0.000609s : 2.99% optimize.opt_a.with_stream_mark : 0.000031s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000153s : 0.75% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.06% optimize.opt_a.parallel : 0.000036s : 0.17% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000019s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000021s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.10% optimize.opt_a.a_after_grad : 
0.000018s : 0.09% optimize.opt_a.renormalize : 0.000764s : 3.76% optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.06% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.12% optimize.opt_a.cse : 0.000056s : 0.27% optimize.opt_a.a_3 : 0.000079s : 0.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000038s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000690s : 3.39% optimize.opt_b.b_1 : 0.000113s : 0.56% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000031s : 0.15% optimize.loop_unroll : 0.000558s : 2.74% optimize.opt_after_cconv.c_1 : 0.000027s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000023s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% 
optimize.add_recomputation : 0.000055s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000010s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000002s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000016s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.02% opt_after_jit_grad : 0.000537s : 2.64% validate : 0.000047s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.007319s : 35.97% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000206 26 18.32% : 0.000038s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000002s : 2: substitution.fold_const_symbol 3.03% : 0.000006s : 3: substitution.graph_param_transform 66.09% : 0.000136s : 3: substitution.inline 1.82% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.27% : 0.000005s : 4: substitution.remove_not_recompute_node 1.85% : 0.000004s : 2: substitution.replace_old_param 4.72% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007453 2 89.87% : 0.006697s : 1: type_inference.infer 10.13% : 0.000755s : 1: type_inference.specialize ------[replace.] 0.000041 4 80.62% : 0.000033s : 3: replace.inline 19.38% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000143 4 93.96% : 0.000135s : 3: match.inline 6.04% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 883 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 1.18% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.52% : 0.000001s : 6: predicate.addn_check_dump 0.83% : 0.000001s : 9: predicate.addn_zero_filter 0.79% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 15: predicate.arithmetic_simplify 1.13% : 0.000002s : 9: predicate.cast_eliminate 0.59% : 0.000001s : 6: predicate.check_bprop_eliminate 0.55% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.56% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_depend_swap 1.75% : 0.000003s : 18: predicate.environ_get_eliminate 1.05% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.18% : 0.000004s : 13: predicate.float_depend_g_call 0.53% : 0.000001s : 6: predicate.float_environ_get_switch 0.86% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.67% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.18% : 0.000010s : 40: predicate.inline 1.05% : 0.000002s : 6: predicate.inline_without_move 0.64% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 6: predicate.less_batch_normalization 1.81% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.32% : 0.000004s : 25: predicate.load_eliminater 1.32% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 9: predicate.minmaximum_grad 1.42% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.59% : 0.000003s : 13: predicate.partial_defer_inline 1.38% : 0.000002s : 13: predicate.partial_eliminate 1.07% : 0.000002s : 9: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.38% : 0.000002s : 9: predicate.reduce_eliminate 2.34% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.42% : 0.000002s : 16: predicate.replace_applicator 0.57% : 0.000001s : 6: predicate.replace_old_param 0.41% : 0.000001s : 3: predicate.reset_defer_inline 0.93% : 0.000002s : 9: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 0.83% : 0.000001s : 6: predicate.special_op_eliminate 0.74% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 13: predicate.switch_defer_inline 1.87% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.78% : 0.000008s : 43: predicate.switch_simplify 0.83% : 
0.000001s : 9: predicate.tile_eliminate 1.05% : 0.000002s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.21% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.02% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.64% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.64% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000522 8 46.40% : 0.000242s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.60% : 0.000280s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.037477 196 0.01% : 0.000004s : 1: ForceFp32Comm 12.16% : 0.004556s : 1: add_attr 12.11% : 0.004538s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000072s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.60% : 0.000600s : 1: bootstrap 0.09% : 0.000035s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000019s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000013s : 1: environ_conv 0.06% : 0.000022s : 1: event_method 0.04% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000011s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000007s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000007s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.52% : 0.000568s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 1.87% : 0.000702s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000019s : 1: opt.transform.mutable_eliminate 2.65% : 0.000992s : 78: opt.transform.opt_a 0.07% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.24% : 0.000091s : 28: opt.transform.opt_b 0.12% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000038s : 4: opt.transform.symbol_engine_opt 7.05% : 0.002641s : 1: opt_a 0.30% : 0.000113s : 1: opt_after_cconv 1.46% : 0.000548s : 1: opt_after_jit_grad 0.55% : 0.000208s : 1: opt_b 13.41% : 0.005027s : 1: optimize 0.06% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.03% : 0.000011s : 1: pipeline_split 0.10% : 0.000039s : 1: pre_auto_parallel 0.08% : 0.000032s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000021s : 1: remove_dup_value 1.10% : 0.000413s : 1: renormalize.infer 0.92% : 0.000343s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000043s : 1: rewriter_after_opt_a 0.21% : 0.000078s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000006s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000083s : 1: symbol_engine_optimizer 19.58% : 0.007338s : 1: task_emit 0.21% : 0.000078s : 1: 
tuple_transform 20.16% : 0.007554s : 1: type_inference 0.23% : 0.000087s : 1: validate TotalTime = 0.022535, [24] [bootstrap]: 0.00049593 [type_inference]: 0.00675266 [event_method]: 1.389e-05 [auto_monad]: 6.381e-05 [graph_reusing]: 5.65001e-06 [inline]: 2.39001e-06 [add_attr]: 0.00325681, [1] [add_attr_with_inline]: 0.00324704, [1] [Cycle 1]: 5.547e-05, [2] [tag_attr]: 1.489e-05 [meta_addattr_fg_expand]: 3.99002e-06 [parallel-infer-symbol]: 3.32002e-06 [pre_auto_parallel]: 2.883e-05 [insert-virtual-dataset]: 2.82002e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.13998e-06 [pipeline_split]: 1.94e-06 [optimize]: 0.00451746, [53] [py_interpret_to_execute]: 2.173e-05 [rewriter_before_opt_a]: 5.648e-05 [opt_a]: 0.00231331, [2] [Cycle 1]: 0.00167052, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 3.062e-05 [loop_unroll]: 1.749e-05 [a_1]: 0.00037081 [with_stream_mark]: 1.643e-05 [recompute_prepare]: 7.56001e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.99e-06 [a_2]: 8.236e-05 [accelerated_algorithm]: 6.31e-06 [shard]: 2.58998e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.25e-06 [auto_parallel]: 6.44999e-06 [parallel]: 2.123e-05 [flash_sp]: 8.22003e-06 [merge_comm]: 3.85998e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 9.92001e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.64002e-06 [virtual_dataset]: 6.07001e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.72999e-06 [merge_forward]: 3.98001e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 1.068e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.247e-05 [merge_recompute_call_nodes]: 1.85001e-06 [before_grad]: 1.047e-05 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 2.74999e-06 [flash_sp_send_recv_attached]: 2.58e-06 [receive_attached]: 
2.39999e-06 [after_resolve]: 1.012e-05 [a_after_grad]: 8.50999e-06 [renormalize]: 0.00061005 [add_forward_monad_depend]: 5.75001e-06 [auto_monad_grad]: 3.28e-06 [auto_monad_eliminator]: 1.437e-05 [cse]: 3.427e-05 [a_3]: 4.511e-05 [Cycle 2]: 0.00063201, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 8.13999e-06 [loop_unroll]: 5.84e-06 [a_1]: 0.00011917 [with_stream_mark]: 1.194e-05 [recompute_prepare]: 6.02999e-06 [updatestate_depend_eliminate]: 3.04999e-06 [updatestate_assign_eliminate]: 2.77002e-06 [updatestate_loads_eliminate]: 3.66001e-06 [parameter_eliminate]: 1.27e-06 [a_2]: 7.258e-05 [accelerated_algorithm]: 5.92001e-06 [shard]: 1.44998e-06 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 5.60001e-06 [auto_parallel]: 5.87999e-06 [parallel]: 5.79999e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 3.27002e-06 [matmul_add_comm_reduction]: 6.55002e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 6.79001e-06 [virtual_dataset]: 5.53002e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.22002e-06 [cell_reuse_recompute_pass]: 1.60999e-06 [offload_activation]: 6.66999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.101e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 8.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.69002e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.28002e-06 [after_resolve]: 9.05999e-06 [a_after_grad]: 8.02998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 1.07998e-06 [auto_monad_eliminator]: 7.06001e-06 [cse]: 1.883e-05 [a_3]: 3.332e-05 [py_interpret_to_execute_after_opt_a]: 9.91e-06 [slice_cell_reuse_recomputed_activation]: 2.05002e-06 [rewriter_after_opt_a]: 3.739e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 5.97001e-06 [mutable_eliminate]: 0.0006227 [opt_b]: 0.0001957, [1] [Cycle 1]: 
0.00018828, [7] [b_1]: 0.00011146 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 6.68998e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.79999e-06 [renormalize]: 5.69999e-07 [cse]: 2.046e-05 [optimize_parallel_all_gather_comm]: 1.707e-05 [overlap_param_gather]: 2.26998e-06 [cconv]: 2.935e-05 [loop_unroll]: 0.00051236 [opt_after_cconv]: 0.00010876, [1] [Cycle 1]: 0.00010155, [7] [c_1]: 2.963e-05 [parameter_eliminate]: 3.29001e-06 [updatestate_depend_eliminate]: 6.51999e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.61e-06 [cse]: 2.039e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.61e-05 [tuple_transform]: 7.21e-05, [1] [Cycle 1]: 6.673e-05, [4] [d_1]: 3.897e-05 [none_parameter_eliminate]: 2.15002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.66999e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 5.047e-05 [cse_after_recomputation]: 2.323e-05, [1] [Cycle 1]: 1.818e-05, [1] [cse]: 1.223e-05 [environ_conv]: 5.54e-06 [swap_dp_allreduce_reducescatter]: 6.34001e-06 [bias_add_comm_swap]: 2.89999e-06 [label_micro_interleaved_index]: 4.58999e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.63e-06 [micro_interleaved_order_control]: 2.79999e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.59999e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.32e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.24998e-06 [overlap_opt_shard_in_pipeline]: 1.59e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01e-06 [control_data_broadcast_order]: 1.364e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 4.58999e-06 [overlap_recompute_and_grad_model_parallel]: 5.51e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.57001e-06 [overlap_grad_ring_attention]: 4.74002e-06 [overlap_grad_flash_sp]: 2.116e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 7.489e-05, [1] [Cycle 1]: 7.05e-05, [6] [build]: 2.94001e-06 [elim_shapecalc]: 9.54e-06 [elim_not_effective]: 1.249e-05 [opt_reshape]: 6.11998e-06 [fold_const_symbol]: 1.01e-05 [renormalize]: 1.79978e-07 [detach_backward]: 2.33998e-06 [pipeline_parallel_scheduler]: 1.66998e-06 [auto_monad_reorder]: 1.737e-05 [get_jit_bprop_graph]: 1.49998e-06 [rewriter_after_jit_bprop_graph]: 3.86001e-06 [opt_after_jit_grad]: 0.00050309 [validate]: 4.312e-05 [backend_pass]: 1.30999e-06 [task_emit]: 0.00656345 [execute]: 8.95999e-06 Sums bootstrap : 0.000496s : 2.72% type_inference : 0.006753s : 37.08% event_method : 0.000014s : 0.08% auto_monad : 0.000064s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.12% optimize.rewriter_before_opt_a : 0.000056s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.21% optimize.opt_a.loop_unroll : 0.000023s : 0.13% optimize.opt_a.a_1 : 0.000490s : 2.69% optimize.opt_a.with_stream_mark : 0.000028s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.15% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000610s : 3.35% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000053s : 0.29% optimize.opt_a.a_3 : 0.000078s : 0.43% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000623s : 3.42% optimize.opt_b.b_1 : 0.000111s : 0.61% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000029s : 0.16% optimize.loop_unroll : 0.000512s : 2.81% optimize.opt_after_cconv.c_1 : 0.000030s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000021s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000503s : 2.76% validate : 0.000043s : 0.24% backend_pass : 0.000001s : 0.01% task_emit : 0.006563s : 36.04% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000156 24 20.48% : 0.000032s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000001s : 2: substitution.fold_const_symbol 3.60% : 0.000006s : 3: substitution.graph_param_transform 65.83% : 0.000103s : 3: substitution.inline 2.14% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.14% : 0.000005s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006699 2 92.43% : 0.006192s : 1: type_inference.infer 7.57% : 0.000507s : 1: type_inference.specialize ------[replace.] 0.000031 3 100.00% : 0.000031s : 3: replace.inline ------[match.] 0.000101 3 100.00% : 0.000101s : 3: match.inline ------[predicate.] 
0.000153 815 0.97% : 0.000001s : 8: predicate.accumulaten_eliminater 0.93% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 6: predicate.addn_check_dump 0.84% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.36% : 0.000004s : 14: predicate.arithmetic_simplify 0.89% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.48% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_depend_swap 1.96% : 0.000003s : 17: predicate.environ_get_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.88% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 3: predicate.fold_const_symbol 0.73% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.31% : 0.000010s : 37: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 6: predicate.less_batch_normalization 1.67% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.28% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 8: predicate.minmaximum_grad 1.39% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.46% : 0.000001s : 3: predicate.parallel_virtual_node 1.50% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 11: predicate.partial_eliminate 1.07% : 0.000002s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.04% : 0.000002s : 8: predicate.reduce_eliminate 2.25% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.21% : 0.000002s : 14: predicate.replace_applicator 0.60% : 0.000001s : 6: predicate.replace_old_param 0.31% : 0.000000s : 3: predicate.reset_defer_inline 0.86% : 0.000001s : 8: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 3: predicate.row_tensor_eliminate 0.79% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.87% : 0.000001s : 6: predicate.special_op_eliminate 0.90% : 0.000001s : 6: predicate.specialize_transform 1.14% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.21% : 0.000002s : 11: predicate.switch_defer_inline 1.83% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 38: predicate.switch_simplify 0.84% : 
0.000001s : 8: predicate.tile_eliminate 0.83% : 0.000001s : 8: predicate.transpose_eliminate 1.59% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.58% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.45% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.80% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000331 7 34.02% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.98% : 0.000218s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031931 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.22% : 0.003262s : 1: add_attr 10.18% : 0.003251s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000069s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.65% : 0.000528s : 1: bootstrap 0.10% : 0.000033s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.05% : 0.000016s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.64% : 0.000522s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.98% : 0.000633s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.05% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.70% : 0.000861s : 78: opt.transform.opt_a 0.09% : 0.000028s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000035s : 4: opt.transform.symbol_engine_opt 7.26% : 0.002317s : 1: opt_a 0.35% : 0.000113s : 1: opt_after_cconv 1.61% : 0.000514s : 1: opt_after_jit_grad 0.62% : 0.000199s : 1: opt_b 14.16% : 0.004522s : 1: optimize 0.06% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 1.07% : 0.000340s : 1: renormalize.infer 0.82% : 0.000262s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000041s : 1: rewriter_after_opt_a 0.19% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000078s : 1: symbol_engine_optimizer 20.61% : 0.006583s : 1: task_emit 0.23% : 0.000075s : 1: 
tuple_transform 21.22% : 0.006774s : 1: type_inference 0.26% : 0.000084s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x4-kbk],max_mem:6.0M .. TotalTime = 20.5633, [24] [bootstrap]: 0.00053371 [type_inference]: 0.00705806 [event_method]: 1.372e-05 [auto_monad]: 6.327e-05 [graph_reusing]: 5.71e-06 [inline]: 2.57001e-06 [add_attr]: 0.00411693, [1] [add_attr_with_inline]: 0.00410492, [1] [Cycle 1]: 4.877e-05, [2] [tag_attr]: 1.611e-05 [meta_addattr_fg_expand]: 4.37e-06 [parallel-infer-symbol]: 3.31999e-06 [pre_auto_parallel]: 2.692e-05 [insert-virtual-dataset]: 2.55002e-06 [parallel-infer-symbol-second]: 8.69972e-07 [dataset_repeat_opt]: 2.34001e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.00447169, [53] [py_interpret_to_execute]: 2.312e-05 [rewriter_before_opt_a]: 6.548e-05 [opt_a]: 0.00242885, [2] [Cycle 1]: 0.00175691, [45] [expand_dump_flag]: 2.91e-06 [switch_simplify]: 3.307e-05 [loop_unroll]: 2.017e-05 [a_1]: 0.00045562 [with_stream_mark]: 1.49e-05 [recompute_prepare]: 8.74e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.21999e-06 [parameter_eliminate]: 2.22001e-06 [a_2]: 8.031e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 2.51e-06 [meta_shard_fg_expand]: 2.26e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 9.27001e-06 [auto_parallel]: 6.94999e-06 [parallel]: 2.648e-05 [flash_sp]: 8.27e-06 [merge_comm]: 4.23999e-06 [allreduce_fusion]: 4.09002e-06 [matmul_add_comm_reduction]: 9.17999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.25998e-06 [virtual_dataset]: 6.21e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.77999e-06 [merge_forward]: 4.12998e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.153e-05 [merge_recompute_call_nodes]: 
1.45999e-06 [before_grad]: 9.99001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 2.61999e-06 [flash_sp_send_recv_attached]: 2.93998e-06 [receive_attached]: 2.57001e-06 [after_resolve]: 9.54e-06 [a_after_grad]: 8.45999e-06 [renormalize]: 0.00060211 [add_forward_monad_depend]: 1.289e-05 [auto_monad_grad]: 2.65997e-06 [auto_monad_eliminator]: 1.567e-05 [cse]: 3.046e-05 [a_3]: 4.414e-05 [Cycle 2]: 0.00066188, [45] [expand_dump_flag]: 1.15999e-06 [switch_simplify]: 8.22e-06 [loop_unroll]: 6.86999e-06 [a_1]: 0.0001273 [with_stream_mark]: 1.148e-05 [recompute_prepare]: 6.94001e-06 [updatestate_depend_eliminate]: 3.28998e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.94001e-06 [parameter_eliminate]: 1.15999e-06 [a_2]: 7.712e-05 [accelerated_algorithm]: 6.59001e-06 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.36002e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 5.04998e-06 [auto_parallel]: 5.82999e-06 [parallel]: 5.09e-06 [flash_sp]: 3.63999e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 6.85002e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.13998e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.18e-06 [cell_reuse_recompute_pass]: 1.74998e-06 [offload_activation]: 6.86001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.173e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 9.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.78001e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.28002e-06 [after_resolve]: 9.03002e-06 [a_after_grad]: 7.95998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 1.19998e-06 [auto_monad_eliminator]: 6.93e-06 [cse]: 1.556e-05 [a_3]: 3.723e-05 [py_interpret_to_execute_after_opt_a]: 9.09e-06 [slice_cell_reuse_recomputed_activation]: 2.29999e-06 
[rewriter_after_opt_a]: 3.393e-05 [convert_after_rewriter]: 7.51999e-06 [order_py_execute_after_rewriter]: 6.11998e-06 [mutable_eliminate]: 0.00051267 [opt_b]: 0.00019573, [1] [Cycle 1]: 0.00018906, [7] [b_1]: 0.00011077 [b_2]: 7.66001e-06 [updatestate_depend_eliminate]: 6.74999e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 3.3e-06 [renormalize]: 4.19997e-07 [cse]: 1.948e-05 [optimize_parallel_all_gather_comm]: 1.725e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.643e-05 [loop_unroll]: 0.00046416 [opt_after_cconv]: 0.00010129, [1] [Cycle 1]: 9.526e-05, [7] [c_1]: 2.671e-05 [parameter_eliminate]: 2.58e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.53e-06 [cse]: 1.847e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.695e-05 [tuple_transform]: 7.067e-05, [1] [Cycle 1]: 6.62e-05, [4] [d_1]: 3.912e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.63e-06 [partial_unused_args_eliminate]: 2.12999e-06 [add_recomputation]: 4.973e-05 [cse_after_recomputation]: 2.245e-05, [1] [Cycle 1]: 1.752e-05, [1] [cse]: 1.166e-05 [environ_conv]: 9.48997e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 3.51001e-06 [label_micro_interleaved_index]: 4.65999e-06 [label_fine_grained_interleaved_index]: 2.99999e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.50002e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.25999e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 3.09999e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.09998e-06 [interleave_split_concat_branches]: 1.47999e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.75001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.359e-05 
[grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 4.58001e-06 [overlap_recompute_and_grad_model_parallel]: 5.24e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.84999e-06 [overlap_grad_ring_attention]: 4.80001e-06 [overlap_grad_flash_sp]: 1.787e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 2.17001e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 7.534e-05, [1] [Cycle 1]: 7.1e-05, [6] [build]: 3.61001e-06 [elim_shapecalc]: 9.84999e-06 [elim_not_effective]: 1.263e-05 [opt_reshape]: 6.54001e-06 [fold_const_symbol]: 9.86e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.92999e-06 [pipeline_parallel_scheduler]: 1.59998e-06 [auto_monad_reorder]: 1.701e-05 [get_jit_bprop_graph]: 1.91998e-06 [rewriter_after_jit_bprop_graph]: 3.95e-06 [opt_after_jit_grad]: 0.00049965 [validate]: 4.289e-05 [backend_pass]: 1.00999e-06 [task_emit]: 20.5462 [execute]: 6.31e-06 Sums bootstrap : 0.000534s : 0.00% type_inference : 0.007058s : 0.03% event_method : 0.000014s : 0.00% auto_monad : 0.000063s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.00% optimize.rewriter_before_opt_a : 0.000065s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000583s : 0.00% optimize.opt_a.with_stream_mark : 0.000026s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000016s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000157s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000016s : 0.00% optimize.opt_a.renormalize : 0.000602s : 0.00% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000046s : 0.00% optimize.opt_a.a_3 : 0.000081s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.00% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000513s : 0.00% optimize.opt_b.b_1 : 0.000111s : 0.00% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000464s : 0.00% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.00% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000009s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000500s : 0.00% validate : 0.000043s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 20.546173s : 99.94% execute : 0.000006s : 0.00% Time group info: ------[substitution.] 0.000177 26 19.26% : 0.000034s : 5: substitution.arithmetic_simplify 1.29% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.31% : 0.000006s : 3: substitution.graph_param_transform 63.87% : 0.000113s : 3: substitution.inline 1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.73% : 0.000005s : 4: substitution.remove_not_recompute_node 2.12% : 0.000004s : 2: substitution.replace_old_param 4.81% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007003 2 90.78% : 0.006358s : 1: type_inference.infer 9.22% : 0.000645s : 1: type_inference.specialize ------[replace.] 0.000038 4 79.28% : 0.000030s : 3: replace.inline 20.72% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 4 93.47% : 0.000111s : 3: match.inline 6.53% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 883 0.94% : 0.000002s : 9: predicate.accumulaten_eliminater 0.85% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.90% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 15: predicate.arithmetic_simplify 0.95% : 0.000002s : 9: predicate.cast_eliminate 0.72% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.11% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.00% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_depend_swap 1.79% : 0.000003s : 18: predicate.environ_get_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.86% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.30% : 0.000010s : 40: predicate.inline 0.83% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 6: predicate.less_batch_normalization 1.68% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.62% : 0.000004s : 25: predicate.load_eliminater 1.06% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.09% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 0.98% : 0.000002s : 3: predicate.mutable_eliminate 0.41% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.52% : 0.000003s : 13: predicate.partial_defer_inline 1.39% : 0.000002s : 13: predicate.partial_eliminate 0.87% : 0.000001s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 6: predicate.reduce_all_const_elim 1.15% : 0.000002s : 9: predicate.reduce_eliminate 2.39% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.40% : 0.000002s : 16: predicate.replace_applicator 0.52% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.98% : 0.000002s : 9: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 3: predicate.row_tensor_eliminate 0.90% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.92% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 13: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.93% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000002s : 9: predicate.tile_eliminate 0.95% : 0.000002s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.74% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.66% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.55% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.04% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.65% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.28% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.39% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000426 8 47.78% : 0.000204s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.22% : 0.000223s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
20.573570 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.02% : 0.004122s : 1: add_attr 0.02% : 0.004108s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000069s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.00% : 0.000562s : 1: bootstrap 0.00% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000017s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000012s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.00% : 0.000473s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.00% : 0.000522s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.00% : 0.000969s : 78: opt.transform.opt_a 0.00% : 0.000025s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000089s : 28: opt.transform.opt_b 0.00% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.01% : 0.002432s : 1: opt_a 0.00% : 0.000105s : 1: opt_after_cconv 0.00% : 0.000509s : 1: opt_after_jit_grad 0.00% : 0.000199s : 1: opt_b 0.02% : 0.004476s : 1: optimize 0.00% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000027s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000021s : 1: remove_dup_value 0.00% : 0.000329s : 1: renormalize.infer 0.00% : 0.000265s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000038s : 1: rewriter_after_opt_a 0.00% : 0.000069s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000078s : 1: symbol_engine_optimizer 99.87% : 20.546190s : 1: task_emit 0.00% : 0.000074s : 1: 
tuple_transform 0.03% : 0.007075s : 1: type_inference 0.00% : 0.000073s : 1: validate . TotalTime = 0.354221, [24] [bootstrap]: 0.00045051 [type_inference]: 0.00667612 [event_method]: 1.415e-05 [auto_monad]: 6.396e-05 [graph_reusing]: 5.75001e-06 [inline]: 2.88e-06 [add_attr]: 0.00366221, [1] [add_attr_with_inline]: 0.00365024, [1] [Cycle 1]: 6.885e-05, [2] [tag_attr]: 1.835e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 4.05998e-06 [pre_auto_parallel]: 3.295e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 1.86998e-06 [pipeline_split]: 1.97001e-06 [optimize]: 0.00491384, [53] [py_interpret_to_execute]: 2.834e-05 [rewriter_before_opt_a]: 6.582e-05 [opt_a]: 0.00249801, [2] [Cycle 1]: 0.0017796, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 3.132e-05 [loop_unroll]: 1.73e-05 [a_1]: 0.0004067 [with_stream_mark]: 2.074e-05 [recompute_prepare]: 9.36e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.43999e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 2.31e-06 [a_2]: 8.475e-05 [accelerated_algorithm]: 6.76e-06 [shard]: 2.59999e-06 [meta_shard_fg_expand]: 2.01998e-06 [shard_inline]: 6.49999e-06 [merge_send_recv]: 9.67999e-06 [auto_parallel]: 8.1e-06 [parallel]: 2.038e-05 [flash_sp]: 9.72001e-06 [merge_comm]: 4.33999e-06 [allreduce_fusion]: 4e-06 [matmul_add_comm_reduction]: 1.05e-05 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 9.76e-06 [virtual_dataset]: 6.58e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.79e-06 [merge_forward]: 3.95998e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 1.077e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.325e-05 [merge_recompute_call_nodes]: 1.58002e-06 [before_grad]: 1.144e-05 [set_forward_comm_id_for_comm_node_pass]: 4.17e-06 [meta_fg_expand]: 2.88998e-06 [flash_sp_send_recv_attached]: 3.27002e-06 [receive_attached]: 3.04001e-06 [after_resolve]: 
1.065e-05 [a_after_grad]: 8.89e-06 [renormalize]: 0.00062407 [add_forward_monad_depend]: 7.35e-06 [auto_monad_grad]: 2.91e-06 [auto_monad_eliminator]: 1.785e-05 [cse]: 3.494e-05 [a_3]: 4.955e-05 [Cycle 2]: 0.00070424, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 7.93001e-06 [loop_unroll]: 6.11998e-06 [a_1]: 0.00013148 [with_stream_mark]: 1.964e-05 [recompute_prepare]: 6.76e-06 [updatestate_depend_eliminate]: 3.23998e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.389e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 1.35001e-06 [meta_shard_fg_expand]: 2.24999e-06 [shard_inline]: 6.28998e-06 [merge_send_recv]: 7.1e-06 [auto_parallel]: 7.37997e-06 [parallel]: 7.31001e-06 [flash_sp]: 4.08001e-06 [merge_comm]: 4.08001e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 7.37002e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 8.27998e-06 [virtual_dataset]: 6.14001e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.22e-06 [merge_forward]: 4.25999e-06 [cell_reuse_recompute_pass]: 2.21998e-06 [offload_activation]: 8.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.324e-05 [merge_recompute_call_nodes]: 1.14003e-06 [before_grad]: 9.82999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.28001e-06 [meta_fg_expand]: 2.47001e-06 [flash_sp_send_recv_attached]: 1.23002e-06 [receive_attached]: 2.07999e-06 [after_resolve]: 1.044e-05 [a_after_grad]: 8.37e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.79e-06 [auto_monad_grad]: 1.97001e-06 [auto_monad_eliminator]: 1.024e-05 [cse]: 1.988e-05 [a_3]: 3.427e-05 [py_interpret_to_execute_after_opt_a]: 1.504e-05 [slice_cell_reuse_recomputed_activation]: 2.94999e-06 [rewriter_after_opt_a]: 4.328e-05 [convert_after_rewriter]: 8.00999e-06 [order_py_execute_after_rewriter]: 5.41002e-06 [mutable_eliminate]: 0.00078007 [opt_b]: 0.00020547, [1] [Cycle 1]: 0.00019473, [7] [b_1]: 0.00011469 [b_2]: 
7.89997e-06 [updatestate_depend_eliminate]: 5.89999e-06 [updatestate_assign_eliminate]: 3.00998e-06 [updatestate_loads_eliminate]: 2.95002e-06 [renormalize]: 7.80012e-07 [cse]: 2.176e-05 [optimize_parallel_all_gather_comm]: 2.025e-05 [overlap_param_gather]: 1.91998e-06 [cconv]: 2.82e-05 [loop_unroll]: 0.00050705 [opt_after_cconv]: 0.00010283, [1] [Cycle 1]: 9.653e-05, [7] [c_1]: 2.769e-05 [parameter_eliminate]: 3.27002e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.53e-06 [cse]: 1.99e-05 [renormalize]: 6.10016e-07 [remove_dup_value]: 1.624e-05 [tuple_transform]: 7.256e-05, [1] [Cycle 1]: 6.777e-05, [4] [d_1]: 3.978e-05 [none_parameter_eliminate]: 1.48002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.78e-06 [partial_unused_args_eliminate]: 2.02001e-06 [add_recomputation]: 4.931e-05 [cse_after_recomputation]: 2.26e-05, [1] [Cycle 1]: 1.762e-05, [1] [cse]: 1.196e-05 [environ_conv]: 6.24999e-06 [swap_dp_allreduce_reducescatter]: 5.57999e-06 [bias_add_comm_swap]: 3.18e-06 [label_micro_interleaved_index]: 6.04001e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 3.00002e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 3.03e-06 [comm_op_add_attrs]: 1.32999e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.38002e-06 [interleave_parallel_branches]: 1.19e-06 [overlap_opt_shard_in_pipeline]: 1.30999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.07999e-06 [control_data_broadcast_order]: 1.432e-05 [grouped_pairwise_exchange_alltoall]: 2.00002e-06 [offloading_packed_experts]: 4.60999e-06 [overlap_recompute_and_grad_model_parallel]: 5.61e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.54e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 5.12e-06 [overlap_grad_flash_sp]: 2.342e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 3.04001e-06 [split_layernorm_comm]: 2.49001e-06 [handle_group_info]: 1.11997e-06 [symbol_engine_optimizer]: 7.922e-05, [1] [Cycle 1]: 7.407e-05, [6] [build]: 3.53999e-06 [elim_shapecalc]: 9.32001e-06 [elim_not_effective]: 1.327e-05 [opt_reshape]: 7.39002e-06 [fold_const_symbol]: 9.94999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.09999e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 1.681e-05 [get_jit_bprop_graph]: 1.54e-06 [rewriter_after_jit_bprop_graph]: 4.60001e-06 [opt_after_jit_grad]: 0.00049468 [validate]: 4.849e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.337565 [execute]: 8.90999e-06 Sums bootstrap : 0.000451s : 0.13% type_inference : 0.006676s : 1.91% event_method : 0.000014s : 0.00% auto_monad : 0.000064s : 0.02% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000033s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000028s : 0.01% optimize.rewriter_before_opt_a : 0.000066s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.01% optimize.opt_a.loop_unroll : 0.000023s : 0.01% optimize.opt_a.a_1 : 0.000538s : 0.15% optimize.opt_a.with_stream_mark : 0.000040s : 0.01% optimize.opt_a.recompute_prepare : 0.000016s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000159s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000017s : 0.00% optimize.opt_a.auto_parallel : 0.000015s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.01% optimize.opt_a.flash_sp : 0.000014s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000018s : 0.01% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000624s : 0.18% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.00% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000028s : 0.01% optimize.opt_a.cse : 0.000055s : 0.02% optimize.opt_a.a_3 : 0.000084s : 0.02% 
optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000780s : 0.22% optimize.opt_b.b_1 : 0.000115s : 0.03% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000022s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000028s : 0.01% optimize.loop_unroll : 0.000507s : 0.15% optimize.opt_after_cconv.c_1 : 0.000028s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000495s : 0.14% validate : 0.000048s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.337565s : 96.61% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000194 24 20.97% : 0.000041s : 4: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000002s : 2: substitution.fold_const_symbol 3.40% : 0.000007s : 3: substitution.graph_param_transform 66.39% : 0.000129s : 3: substitution.inline 2.26% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 2.43% : 0.000005s : 2: substitution.replace_old_param ------[type_inference.] 0.006623 2 91.76% : 0.006078s : 1: type_inference.infer 8.24% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000034 3 100.00% : 0.000034s : 3: replace.inline ------[match.] 0.000127 3 100.00% : 0.000127s : 3: match.inline ------[predicate.] 
0.000162 815 0.80% : 0.000001s : 8: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.82% : 0.000001s : 8: predicate.addn_zero_filter 0.74% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.18% : 0.000004s : 14: predicate.arithmetic_simplify 0.83% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.79% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.53% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.35% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.19% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 11: predicate.environ_get_depend_swap 1.65% : 0.000003s : 17: predicate.environ_get_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.27% : 0.000004s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.72% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.19% : 0.000010s : 37: predicate.inline 1.05% : 0.000002s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.07% : 0.000002s : 6: predicate.less_batch_normalization 1.97% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000004s : 22: predicate.load_eliminater 1.09% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.82% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.94% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 8: predicate.minmaximum_grad 1.93% : 0.000003s : 3: predicate.mutable_eliminate 0.53% : 0.000001s : 3: predicate.opt_reshape 0.45% : 0.000001s : 3: predicate.parallel_virtual_node 1.40% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 11: predicate.partial_eliminate 0.96% : 0.000002s : 8: predicate.print_const_string_wrapper 0.85% : 0.000001s : 6: predicate.reduce_all_const_elim 1.32% : 0.000002s : 8: predicate.reduce_eliminate 2.42% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 6: predicate.remove_not_recompute_node 1.07% : 0.000002s : 14: predicate.replace_applicator 1.02% : 0.000002s : 6: predicate.replace_old_param 0.41% : 0.000001s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 3: predicate.row_tensor_eliminate 0.78% : 0.000001s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 1.08% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.16% : 0.000002s : 11: predicate.switch_defer_inline 1.76% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.47% : 0.000007s : 38: predicate.switch_simplify 0.91% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.45% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.63% : 0.000003s : 14: predicate.tuple_list_get_item_depend_reorder 3.59% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.02% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.78% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000399 7 36.42% : 0.000145s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.58% : 0.000254s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.364511 196 0.00% : 0.000004s : 1: ForceFp32Comm 1.01% : 0.003669s : 1: add_attr 1.00% : 0.003655s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000070s : 1: auto_monad 0.01% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.13% : 0.000486s : 1: bootstrap 0.01% : 0.000032s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000018s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000010s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.00% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.14% : 0.000517s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.22% : 0.000796s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000024s : 1: opt.transform.mutable_eliminate 0.26% : 0.000931s : 78: opt.transform.opt_a 0.01% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000091s : 28: opt.transform.opt_b 0.01% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000036s : 4: opt.transform.symbol_engine_opt 0.69% : 0.002502s : 1: opt_a 0.03% : 0.000106s : 1: opt_after_cconv 0.14% : 0.000507s : 1: opt_after_jit_grad 0.06% : 0.000209s : 1: opt_b 1.35% : 0.004919s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000038s : 1: pre_auto_parallel 0.01% : 0.000033s : 1: py_interpret_to_execute 0.01% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000020s : 1: remove_dup_value 0.08% : 0.000298s : 1: renormalize.infer 0.09% : 0.000317s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000050s : 1: rewriter_after_opt_a 0.02% : 0.000071s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000082s : 1: symbol_engine_optimizer 92.61% : 0.337585s : 1: task_emit 0.02% : 0.000076s : 1: 
tuple_transform 1.84% : 0.006700s : 1: type_inference 0.02% : 0.000082s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x4-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x5-pynative],max_mem:6.0M TotalTime = 0.0249007, [24] [bootstrap]: 0.0005138 [type_inference]: 0.00692571 [event_method]: 1.549e-05 [auto_monad]: 6.166e-05 [graph_reusing]: 5.64e-06 [inline]: 3.39001e-06 [add_attr]: 0.00429093, [1] [add_attr_with_inline]: 0.00427602, [1] [Cycle 1]: 6.966e-05, [2] [tag_attr]: 2.072e-05 [meta_addattr_fg_expand]: 4.32e-06 [parallel-infer-symbol]: 3.4e-06 [pre_auto_parallel]: 3.494e-05 [insert-virtual-dataset]: 2.57001e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 2.29999e-06 [pipeline_split]: 1.97001e-06 [optimize]: 0.00492755, [53] [py_interpret_to_execute]: 2.814e-05 [rewriter_before_opt_a]: 7.441e-05 [opt_a]: 0.00260122, [2] [Cycle 1]: 0.0019151, [45] [expand_dump_flag]: 2.99001e-06 [switch_simplify]: 3.591e-05 [loop_unroll]: 2.142e-05 [a_1]: 0.00046399 [with_stream_mark]: 1.747e-05 [recompute_prepare]: 9.57001e-06 [updatestate_depend_eliminate]: 3.98999e-06 [updatestate_assign_eliminate]: 3.66999e-06 [updatestate_loads_eliminate]: 3.14001e-06 [parameter_eliminate]: 2.43998e-06 [a_2]: 8.453e-05 [accelerated_algorithm]: 7.12002e-06 [shard]: 2.21e-06 [meta_shard_fg_expand]: 2.03997e-06 [shard_inline]: 7.01001e-06 [merge_send_recv]: 9.66e-06 [auto_parallel]: 7.37997e-06 [parallel]: 2.94e-05 [flash_sp]: 9.72001e-06 [merge_comm]: 4.49998e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 1.017e-05 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 9.31e-06 [virtual_dataset]: 6.49001e-06 [get_grad_eliminate_]: 5.74e-06 
[virtual_output]: 6.56999e-06 [merge_forward]: 3.95998e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.035e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.425e-05 [merge_recompute_call_nodes]: 1.87001e-06 [before_grad]: 1.182e-05 [set_forward_comm_id_for_comm_node_pass]: 4.31002e-06 [meta_fg_expand]: 2.84999e-06 [flash_sp_send_recv_attached]: 2.75997e-06 [receive_attached]: 2.73e-06 [after_resolve]: 1.032e-05 [a_after_grad]: 9.13002e-06 [renormalize]: 0.00068482 [add_forward_monad_depend]: 1.091e-05 [auto_monad_grad]: 3.14001e-06 [auto_monad_eliminator]: 1.461e-05 [cse]: 3.183e-05 [a_3]: 4.912e-05 [Cycle 2]: 0.00067312, [45] [expand_dump_flag]: 1.73997e-06 [switch_simplify]: 7.14001e-06 [loop_unroll]: 6.28e-06 [a_1]: 0.00012738 [with_stream_mark]: 1.302e-05 [recompute_prepare]: 6.32001e-06 [updatestate_depend_eliminate]: 3.46999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 3.01999e-06 [parameter_eliminate]: 1.61002e-06 [a_2]: 7.379e-05 [accelerated_algorithm]: 6.04999e-06 [shard]: 1.21002e-06 [meta_shard_fg_expand]: 1.98002e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 6.12001e-06 [auto_parallel]: 7.78001e-06 [parallel]: 7.26001e-06 [flash_sp]: 3.68e-06 [merge_comm]: 3.40003e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.27e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 7.20998e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.89999e-06 [merge_forward]: 3.67998e-06 [cell_reuse_recompute_pass]: 2.32999e-06 [offload_activation]: 9.19998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.323e-05 [merge_recompute_call_nodes]: 1.26997e-06 [before_grad]: 1.051e-05 [set_forward_comm_id_for_comm_node_pass]: 4.03001e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 1.20999e-06 [receive_attached]: 1.31002e-06 [after_resolve]: 9.44998e-06 [a_after_grad]: 7.61001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 
1.57999e-06 [auto_monad_grad]: 1.35999e-06 [auto_monad_eliminator]: 6.93998e-06 [cse]: 2.092e-05 [a_3]: 3.471e-05 [py_interpret_to_execute_after_opt_a]: 1.291e-05 [slice_cell_reuse_recomputed_activation]: 2.74001e-06 [rewriter_after_opt_a]: 3.906e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.89999e-06 [mutable_eliminate]: 0.00066475 [opt_b]: 0.00020582, [1] [Cycle 1]: 0.00019664, [7] [b_1]: 0.0001138 [b_2]: 8.05999e-06 [updatestate_depend_eliminate]: 6.63998e-06 [updatestate_assign_eliminate]: 3.04001e-06 [updatestate_loads_eliminate]: 2.94999e-06 [renormalize]: 5.8001e-07 [cse]: 2.269e-05 [optimize_parallel_all_gather_comm]: 1.84e-05 [overlap_param_gather]: 2.15002e-06 [cconv]: 3.289e-05 [loop_unroll]: 0.00052737 [opt_after_cconv]: 0.00010446, [1] [Cycle 1]: 9.775e-05, [7] [c_1]: 2.583e-05 [parameter_eliminate]: 3.86001e-06 [updatestate_depend_eliminate]: 5.54e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.30002e-06 [cse]: 2.123e-05 [renormalize]: 7.2e-07 [remove_dup_value]: 1.533e-05 [tuple_transform]: 7.224e-05, [1] [Cycle 1]: 6.715e-05, [4] [d_1]: 3.964e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 6.47001e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 5.645e-05 [cse_after_recomputation]: 2.389e-05, [1] [Cycle 1]: 1.894e-05, [1] [cse]: 1.282e-05 [environ_conv]: 1.03e-05 [swap_dp_allreduce_reducescatter]: 5.49e-06 [bias_add_comm_swap]: 3.01001e-06 [label_micro_interleaved_index]: 5.82001e-06 [label_fine_grained_interleaved_index]: 2.63998e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.04e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 3.16999e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.37e-06 
[interleave_split_concat_branches]: 1.44e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 2.02999e-06 [control_data_broadcast_order]: 1.423e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 4.58999e-06 [overlap_recompute_and_grad_model_parallel]: 4.95999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 2.15e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.31998e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 1.13001e-06 [symbol_engine_optimizer]: 7.869e-05, [1] [Cycle 1]: 7.36e-05, [6] [build]: 3.6e-06 [elim_shapecalc]: 9.77001e-06 [elim_not_effective]: 1.346e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 1.023e-05 [renormalize]: 4.2998e-07 [detach_backward]: 2.41e-06 [pipeline_parallel_scheduler]: 1.60001e-06 [auto_monad_reorder]: 1.738e-05 [get_jit_bprop_graph]: 1.94999e-06 [rewriter_after_jit_bprop_graph]: 4.17e-06 [opt_after_jit_grad]: 0.00049938 [validate]: 4.267e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00730847 [execute]: 8.59e-06 Sums bootstrap : 0.000514s : 2.63% type_inference : 0.006926s : 35.50% event_method : 0.000015s : 0.08% auto_monad : 0.000062s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000035s : 0.18% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000028s : 0.14% optimize.rewriter_before_opt_a : 0.000074s : 0.38% optimize.opt_a.expand_dump_flag : 
0.000005s : 0.02% optimize.opt_a.switch_simplify : 0.000043s : 0.22% optimize.opt_a.loop_unroll : 0.000028s : 0.14% optimize.opt_a.a_1 : 0.000591s : 3.03% optimize.opt_a.with_stream_mark : 0.000030s : 0.16% optimize.opt_a.recompute_prepare : 0.000016s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000004s : 0.02% optimize.opt_a.a_2 : 0.000158s : 0.81% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.07% optimize.opt_a.merge_send_recv : 0.000016s : 0.08% optimize.opt_a.auto_parallel : 0.000015s : 0.08% optimize.opt_a.parallel : 0.000037s : 0.19% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.06% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000020s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000027s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000022s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000020s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000685s : 3.51% optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.06% optimize.opt_a.auto_monad_grad : 0.000005s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.11% optimize.opt_a.cse : 0.000053s : 0.27% optimize.opt_a.a_3 : 0.000084s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000039s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000665s : 3.41% optimize.opt_b.b_1 : 0.000114s : 0.58% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000033s : 0.17% optimize.loop_unroll : 0.000527s : 2.70% optimize.opt_after_cconv.c_1 : 0.000026s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000021s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.29% optimize.cse_after_recomputation.cse : 0.000013s : 0.07% optimize.environ_conv : 0.000010s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000006s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000021s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000499s : 2.56% validate : 0.000043s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.007308s : 37.46% execute : 0.000009s : 0.04% Time group info: ------[substitution.] 0.000194 26 20.37% : 0.000039s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.36% : 0.000007s : 3: substitution.graph_param_transform 62.33% : 0.000121s : 3: substitution.inline 2.44% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.94% : 0.000006s : 4: substitution.remove_not_recompute_node 1.98% : 0.000004s : 2: substitution.replace_old_param 4.78% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006859 2 89.76% : 0.006157s : 1: type_inference.infer 10.24% : 0.000702s : 1: type_inference.specialize ------[replace.] 0.000039 4 77.98% : 0.000030s : 3: replace.inline 22.02% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000127 4 93.36% : 0.000119s : 3: match.inline 6.64% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 883 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 6: predicate.addn_check_dump 0.93% : 0.000002s : 9: predicate.addn_zero_filter 0.79% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 15: predicate.arithmetic_simplify 0.93% : 0.000002s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.70% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.99% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000002s : 9: predicate.dict_set_item_eliminator 0.97% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.51% : 0.000001s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.34% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_depend_swap 1.69% : 0.000003s : 18: predicate.environ_get_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.40% : 0.000004s : 13: predicate.float_depend_g_call 0.63% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.66% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.63% : 0.000001s : 6: predicate.incorporate_call 0.54% : 0.000001s : 6: predicate.incorporate_call_switch 6.22% : 0.000010s : 40: predicate.inline 0.87% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 6: predicate.less_batch_normalization 1.62% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 25: predicate.load_eliminater 1.02% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.17% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.53% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 9: predicate.minmaximum_grad 1.13% : 0.000002s : 3: predicate.mutable_eliminate 0.32% : 0.000001s : 3: predicate.opt_reshape 0.43% : 0.000001s : 3: predicate.parallel_virtual_node 1.50% : 0.000002s : 13: predicate.partial_defer_inline 1.43% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 6: predicate.reduce_all_const_elim 1.11% : 0.000002s : 9: predicate.reduce_eliminate 2.46% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 6: predicate.remove_not_recompute_node 1.49% : 0.000002s : 16: predicate.replace_applicator 0.70% : 0.000001s : 6: predicate.replace_old_param 0.39% : 0.000001s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.53% : 0.000001s : 3: predicate.row_tensor_eliminate 0.79% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000002s : 6: predicate.shard_identity_eliminate 0.76% : 0.000001s : 6: predicate.special_op_eliminate 0.77% : 0.000001s : 6: predicate.specialize_transform 0.88% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 13: predicate.switch_defer_inline 1.84% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.27% : 0.000009s : 43: predicate.switch_simplify 0.86% : 
0.000001s : 9: predicate.tile_eliminate 0.85% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.24% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.99% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.62% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000486 8 43.50% : 0.000211s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.50% : 0.000275s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.035937 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.96% : 0.004297s : 1: add_attr 11.91% : 0.004280s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000061s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.19% : 0.000067s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.51% : 0.000544s : 1: bootstrap 0.10% : 0.000037s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000018s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000027s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.04% : 0.000014s : 1: environ_conv 0.06% : 0.000022s : 1: event_method 0.04% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000007s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000009s : 1: label_micro_interleaved_index 1.49% : 0.000537s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.88% : 0.000676s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000017s : 1: opt.transform.mutable_eliminate 2.75% : 0.000989s : 78: opt.transform.opt_a 0.07% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000091s : 28: opt.transform.opt_b 0.12% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000036s : 4: opt.transform.symbol_engine_opt 7.25% : 0.002605s : 1: opt_a 0.30% : 0.000108s : 1: opt_after_cconv 1.42% : 0.000510s : 1: opt_after_jit_grad 0.58% : 0.000209s : 1: opt_b 13.73% : 0.004933s : 1: optimize 0.06% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000005s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000040s : 1: pre_auto_parallel 0.09% : 0.000033s : 1: py_interpret_to_execute 0.05% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000019s : 1: remove_dup_value 0.99% : 0.000355s : 1: renormalize.infer 0.89% : 0.000321s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000044s : 1: rewriter_after_opt_a 0.22% : 0.000079s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000082s : 1: symbol_engine_optimizer 20.38% : 0.007322s : 1: task_emit 0.21% : 0.000075s : 1: 
tuple_transform 19.33% : 0.006947s : 1: type_inference 0.20% : 0.000073s : 1: validate TotalTime = 0.0242307, [24] [bootstrap]: 0.00044972 [type_inference]: 0.0069453 [event_method]: 1.573e-05 [auto_monad]: 6.873e-05 [graph_reusing]: 5.91998e-06 [inline]: 2.70997e-06 [add_attr]: 0.0036481, [1] [add_attr_with_inline]: 0.00363495, [1] [Cycle 1]: 6.73e-05, [2] [tag_attr]: 1.807e-05 [meta_addattr_fg_expand]: 4.38999e-06 [parallel-infer-symbol]: 4.05e-06 [pre_auto_parallel]: 3.483e-05 [insert-virtual-dataset]: 3.54002e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.29001e-06 [pipeline_split]: 1.80001e-06 [optimize]: 0.00494572, [53] [py_interpret_to_execute]: 2.586e-05 [rewriter_before_opt_a]: 6.514e-05 [opt_a]: 0.0025411, [2] [Cycle 1]: 0.0018855, [45] [expand_dump_flag]: 3.29001e-06 [switch_simplify]: 3.144e-05 [loop_unroll]: 1.713e-05 [a_1]: 0.00041159 [with_stream_mark]: 2.228e-05 [recompute_prepare]: 9.58002e-06 [updatestate_depend_eliminate]: 4.52e-06 [updatestate_assign_eliminate]: 3.54002e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 2.14999e-06 [a_2]: 9.031e-05 [accelerated_algorithm]: 7.00998e-06 [shard]: 2.65002e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 6.24001e-06 [merge_send_recv]: 8.85001e-06 [auto_parallel]: 8.15999e-06 [parallel]: 2.103e-05 [flash_sp]: 9.36998e-06 [merge_comm]: 4.22998e-06 [allreduce_fusion]: 3.86999e-06 [matmul_add_comm_reduction]: 1.049e-05 [allreduce_slice_to_reducescatter]: 7.60017e-07 [virtual_shard_identity]: 9.19e-06 [virtual_dataset]: 7.26001e-06 [get_grad_eliminate_]: 6.27001e-06 [virtual_output]: 6.12999e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.198e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.312e-05 [merge_recompute_call_nodes]: 1.63002e-06 [before_grad]: 1.103e-05 [set_forward_comm_id_for_comm_node_pass]: 4.05e-06 [meta_fg_expand]: 2.91e-06 [flash_sp_send_recv_attached]: 2.71e-06 [receive_attached]: 2.26e-06 
[after_resolve]: 1.074e-05 [a_after_grad]: 9.94001e-06 [renormalize]: 0.00072794 [add_forward_monad_depend]: 6.67002e-06 [auto_monad_grad]: 3.03e-06 [auto_monad_eliminator]: 1.549e-05 [cse]: 3.447e-05 [a_3]: 4.975e-05 [Cycle 2]: 0.00064371, [45] [expand_dump_flag]: 1.34e-06 [switch_simplify]: 7.80998e-06 [loop_unroll]: 6.20002e-06 [a_1]: 0.00012582 [with_stream_mark]: 1.309e-05 [recompute_prepare]: 7.26001e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.23002e-06 [a_2]: 7.106e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 5.64e-06 [auto_parallel]: 5.95002e-06 [parallel]: 6.34999e-06 [flash_sp]: 3.42002e-06 [merge_comm]: 3.72002e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 5.84999e-06 [allreduce_slice_to_reducescatter]: 4.89992e-07 [virtual_shard_identity]: 6.33e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.74999e-06 [virtual_output]: 5.36998e-06 [merge_forward]: 3.25e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 7.64002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.165e-05 [merge_recompute_call_nodes]: 9.39996e-07 [before_grad]: 9.64e-06 [set_forward_comm_id_for_comm_node_pass]: 4.17003e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.96e-06 [after_resolve]: 9.37001e-06 [a_after_grad]: 8.06001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.84e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 7.43e-06 [cse]: 1.519e-05 [a_3]: 3.309e-05 [py_interpret_to_execute_after_opt_a]: 1.049e-05 [slice_cell_reuse_recomputed_activation]: 2.98e-06 [rewriter_after_opt_a]: 3.937e-05 [convert_after_rewriter]: 7.03998e-06 [order_py_execute_after_rewriter]: 5.48002e-06 [mutable_eliminate]: 0.00068307 [opt_b]: 0.00022128, [1] [Cycle 1]: 0.00021282, [7] [b_1]: 
0.0001214 [b_2]: 8.35999e-06 [updatestate_depend_eliminate]: 8.47998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 3.06001e-06 [renormalize]: 8.09989e-07 [cse]: 2.797e-05 [optimize_parallel_all_gather_comm]: 1.945e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 3.388e-05 [loop_unroll]: 0.00056548 [opt_after_cconv]: 0.00010882, [1] [Cycle 1]: 0.00010172, [7] [c_1]: 2.759e-05 [parameter_eliminate]: 4.48001e-06 [updatestate_depend_eliminate]: 6.51e-06 [updatestate_assign_eliminate]: 2.67001e-06 [updatestate_loads_eliminate]: 2.99001e-06 [cse]: 2.077e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.655e-05 [tuple_transform]: 7.814e-05, [1] [Cycle 1]: 7.308e-05, [4] [d_1]: 4.277e-05 [none_parameter_eliminate]: 1.78997e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.66001e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 5.482e-05 [cse_after_recomputation]: 2.339e-05, [1] [Cycle 1]: 1.821e-05, [1] [cse]: 1.233e-05 [environ_conv]: 6.58e-06 [swap_dp_allreduce_reducescatter]: 5.91998e-06 [bias_add_comm_swap]: 3.01001e-06 [label_micro_interleaved_index]: 5.56e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.79e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.46e-06 [reorder_send_recv_between_fp_bp]: 3.18998e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.66002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.13002e-06 [control_data_broadcast_order]: 1.49e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 4.87e-06 [overlap_recompute_and_grad_model_parallel]: 5.29e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.63002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.57001e-06 [overlap_grad_ring_attention]: 4.66002e-06 [overlap_grad_flash_sp]: 2.294e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 1.20999e-06 [symbol_engine_optimizer]: 8e-05, [1] [Cycle 1]: 7.524e-05, [6] [build]: 3.79002e-06 [elim_shapecalc]: 1.149e-05 [elim_not_effective]: 1.364e-05 [opt_reshape]: 6.27001e-06 [fold_const_symbol]: 9.96e-06 [renormalize]: 1.69995e-07 [detach_backward]: 2.15002e-06 [pipeline_parallel_scheduler]: 1.73002e-06 [auto_monad_reorder]: 1.731e-05 [get_jit_bprop_graph]: 2.19001e-06 [rewriter_after_jit_bprop_graph]: 5.50001e-06 [opt_after_jit_grad]: 0.00053037 [validate]: 4.681e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00721562 [execute]: 1.055e-05 Sums bootstrap : 0.000450s : 2.31% type_inference : 0.006945s : 35.72% event_method : 0.000016s : 0.08% auto_monad : 0.000069s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000035s : 0.18% insert-virtual-dataset : 0.000004s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000026s : 0.13% optimize.rewriter_before_opt_a : 0.000065s : 0.33% optimize.opt_a.expand_dump_flag : 0.000005s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.20% optimize.opt_a.loop_unroll : 0.000023s : 0.12% optimize.opt_a.a_1 : 0.000537s : 2.76% optimize.opt_a.with_stream_mark : 0.000035s : 0.18% optimize.opt_a.recompute_prepare : 0.000017s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000161s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.07% optimize.opt_a.auto_parallel : 0.000014s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.14% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.08% optimize.opt_a.virtual_dataset : 0.000013s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000020s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000021s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.10% optimize.opt_a.a_after_grad : 0.000018s : 0.09% optimize.opt_a.renormalize : 0.000728s : 3.74% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.12% optimize.opt_a.cse : 0.000050s : 0.26% optimize.opt_a.a_3 : 0.000083s : 0.43% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000039s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000683s : 3.51% optimize.opt_b.b_1 : 0.000121s : 0.62% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000028s : 0.14% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000034s : 0.17% optimize.loop_unroll : 0.000565s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000021s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.09% optimize.tuple_transform.d_1 : 0.000043s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000055s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000007s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000006s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000006s : 0.03% opt_after_jit_grad : 0.000530s : 2.73% validate : 0.000047s : 0.24% backend_pass : 0.000001s : 0.01% task_emit : 0.007216s : 37.11% execute : 0.000011s : 0.05%
Time group info:
------[substitution.] 0.000184 24 19.47% : 0.000036s : 4: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.37% : 0.000006s : 3: substitution.graph_param_transform 68.06% : 0.000125s : 3: substitution.inline 2.35% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.82% : 0.000005s : 4: substitution.remove_not_recompute_node 1.99% : 0.000004s : 2: substitution.replace_old_param
------[type_inference.] 0.006881 2 91.35% : 0.006285s : 1: type_inference.infer 8.65% : 0.000595s : 1: type_inference.specialize
------[replace.] 0.000032 3 100.00% : 0.000032s : 3: replace.inline
------[match.] 0.000123 3 100.00% : 0.000123s : 3: match.inline
------[predicate.]
0.000163 815 0.88% : 0.000001s : 8: predicate.accumulaten_eliminater 0.82% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 8: predicate.addn_zero_filter 0.95% : 0.000002s : 8: predicate.adjust_all_reduce_mul_add 2.78% : 0.000005s : 14: predicate.arithmetic_simplify 1.07% : 0.000002s : 8: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.59% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.81% : 0.000001s : 6: predicate.depend_value_elim 0.78% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.51% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_add_eliminate 0.99% : 0.000002s : 11: predicate.environ_get_depend_swap 1.76% : 0.000003s : 17: predicate.environ_get_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.14% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.87% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.81% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.47% : 0.000011s : 37: predicate.inline 0.97% : 0.000002s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 6: predicate.less_batch_normalization 1.50% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 22: predicate.load_eliminater 1.15% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.89% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 8: predicate.minmaximum_grad 1.94% : 0.000003s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.41% : 0.000001s : 3: predicate.parallel_virtual_node 1.41% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.23% : 0.000002s : 8: predicate.reduce_eliminate 2.28% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 6: predicate.remove_not_recompute_node 1.09% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.97% : 0.000002s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 3: predicate.row_tensor_eliminate 0.87% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 6: predicate.shard_identity_eliminate 0.79% : 0.000001s : 6: predicate.special_op_eliminate 1.17% : 0.000002s : 6: predicate.specialize_transform 1.03% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.22% : 0.000002s : 11: predicate.switch_defer_inline 1.98% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.51% : 0.000007s : 38: predicate.switch_simplify 0.95% : 
0.000002s : 8: predicate.tile_eliminate 0.82% : 0.000001s : 8: predicate.transpose_eliminate 1.60% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.58% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.90% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000397 7 33.61% : 0.000133s : 2: func_graph_cloner_run.FuncGraphClonerGraph 66.39% : 0.000263s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.034635 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.55% : 0.003656s : 1: add_attr 10.51% : 0.003640s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000059s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.21% : 0.000074s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.41% : 0.000489s : 1: bootstrap 0.11% : 0.000038s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000018s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000027s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000010s : 1: environ_conv 0.07% : 0.000024s : 1: event_method 0.05% : 0.000018s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000009s : 1: label_micro_interleaved_index 1.67% : 0.000578s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.02% : 0.000699s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000018s : 1: opt.transform.mutable_eliminate 2.68% : 0.000927s : 78: opt.transform.opt_a 0.07% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000097s : 28: opt.transform.opt_b 0.14% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000037s : 4: opt.transform.symbol_engine_opt 7.35% : 0.002544s : 1: opt_a 0.32% : 0.000112s : 1: opt_after_cconv 1.56% : 0.000541s : 1: opt_after_jit_grad 0.65% : 0.000225s : 1: opt_b 14.30% : 0.004951s : 1: optimize 0.07% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.12% : 0.000040s : 1: pre_auto_parallel 0.09% : 0.000030s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000021s : 1: remove_dup_value 1.17% : 0.000404s : 1: renormalize.infer 0.91% : 0.000314s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000044s : 1: rewriter_after_opt_a 0.20% : 0.000070s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000083s : 1: symbol_engine_optimizer 20.89% : 0.007237s : 1: task_emit 0.23% : 0.000081s : 1: 
tuple_transform 20.14% : 0.006977s : 1: type_inference 0.25% : 0.000087s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x5-kbk],max_mem:6.0M
TotalTime = 0.388749, [24] [bootstrap]: 0.00062621 [type_inference]: 0.00790236 [event_method]: 1.777e-05 [auto_monad]: 6.53e-05 [graph_reusing]: 5.72001e-06 [inline]: 2.88e-06 [add_attr]: 0.0042524, [1] [add_attr_with_inline]: 0.00423586, [1] [Cycle 1]: 7.175e-05, [2] [tag_attr]: 2.18e-05 [meta_addattr_fg_expand]: 4.46002e-06 [parallel-infer-symbol]: 4.23001e-06 [pre_auto_parallel]: 3.657e-05 [insert-virtual-dataset]: 2.93e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.26998e-06 [pipeline_split]: 1.76998e-06 [optimize]: 0.00521077, [53] [py_interpret_to_execute]: 2.841e-05 [rewriter_before_opt_a]: 7.48e-05 [opt_a]: 0.00268424, [2] [Cycle 1]: 0.00201036, [45] [expand_dump_flag]: 3.06001e-06 [switch_simplify]: 3.592e-05 [loop_unroll]: 2.048e-05 [a_1]: 0.0004897 [with_stream_mark]: 1.825e-05 [recompute_prepare]: 8.26002e-06 [updatestate_depend_eliminate]: 4.08001e-06 [updatestate_assign_eliminate]: 3.29001e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 8.129e-05 [accelerated_algorithm]: 6.54999e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.76003e-06 [shard_inline]: 6.60002e-06 [merge_send_recv]: 8.30999e-06 [auto_parallel]: 7.83001e-06 [parallel]: 2.959e-05 [flash_sp]: 9.81003e-06 [merge_comm]: 4.18001e-06 [allreduce_fusion]: 3.65e-06 [matmul_add_comm_reduction]: 1.025e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 8.88002e-06 [virtual_dataset]: 6.32001e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 5.80002e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.065e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.305e-05
[merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 1.119e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86001e-06 [meta_fg_expand]: 2.79999e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.26e-06 [after_resolve]: 9.62999e-06 [a_after_grad]: 9.00999e-06 [renormalize]: 0.00078965 [add_forward_monad_depend]: 1.082e-05 [auto_monad_grad]: 2.56e-06 [auto_monad_eliminator]: 1.599e-05 [cse]: 3.369e-05 [a_3]: 4.482e-05 [Cycle 2]: 0.00066113, [45] [expand_dump_flag]: 1.33002e-06 [switch_simplify]: 7.37002e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012027 [with_stream_mark]: 1.159e-05 [recompute_prepare]: 6.19001e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 3.29001e-06 [parameter_eliminate]: 1.12e-06 [a_2]: 7.193e-05 [accelerated_algorithm]: 6.15002e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 5.40999e-06 [auto_parallel]: 7.39002e-06 [parallel]: 6.46e-06 [flash_sp]: 3.45998e-06 [merge_comm]: 3.72002e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 6.36e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 7.22002e-06 [virtual_dataset]: 5.46002e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.19e-06 [merge_forward]: 3.28998e-06 [cell_reuse_recompute_pass]: 1.84998e-06 [offload_activation]: 8.76002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.057e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.10998e-06 [meta_fg_expand]: 2.10002e-06 [flash_sp_send_recv_attached]: 1.40001e-06 [receive_attached]: 1.67001e-06 [after_resolve]: 9.96998e-06 [a_after_grad]: 8.25e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.42e-06 [auto_monad_grad]: 1.59998e-06 [auto_monad_eliminator]: 7.91001e-06 [cse]: 1.559e-05 [a_3]: 4.873e-05 [py_interpret_to_execute_after_opt_a]: 1.331e-05 
[slice_cell_reuse_recomputed_activation]: 2.58e-06 [rewriter_after_opt_a]: 3.885e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 5.07999e-06 [mutable_eliminate]: 0.00081747 [opt_b]: 0.00021828, [1] [Cycle 1]: 0.00020947, [7] [b_1]: 0.00011737 [b_2]: 7.94002e-06 [updatestate_depend_eliminate]: 1.017e-05 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 3.48e-06 [renormalize]: 8.2e-07 [cse]: 2.726e-05 [optimize_parallel_all_gather_comm]: 2.058e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 3.547e-05 [loop_unroll]: 0.00051825 [opt_after_cconv]: 0.00010893, [1] [Cycle 1]: 0.00010233, [7] [c_1]: 2.745e-05 [parameter_eliminate]: 5.97999e-06 [updatestate_depend_eliminate]: 6.97002e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.941e-05 [renormalize]: 7.00005e-07 [remove_dup_value]: 1.754e-05 [tuple_transform]: 7.663e-05, [1] [Cycle 1]: 7.172e-05, [4] [d_1]: 4.491e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 6.118e-05 [cse_after_recomputation]: 2.247e-05, [1] [Cycle 1]: 1.745e-05, [1] [cse]: 1.191e-05 [environ_conv]: 8.84e-06 [swap_dp_allreduce_reducescatter]: 5.46998e-06 [bias_add_comm_swap]: 3.53999e-06 [label_micro_interleaved_index]: 5.91e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.67001e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.99001e-06 [assign_add_opt]: 1.76998e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.39e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.52999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.41e-06 
[control_data_broadcast_order]: 1.482e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 4.84e-06 [overlap_recompute_and_grad_model_parallel]: 6.04001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.57001e-06 [overlap_grad_ring_attention]: 4.99e-06 [overlap_grad_flash_sp]: 2.286e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 1.97999e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 8.337e-05, [1] [Cycle 1]: 7.668e-05, [6] [build]: 4.26001e-06 [elim_shapecalc]: 1.232e-05 [elim_not_effective]: 1.294e-05 [opt_reshape]: 6.56e-06 [fold_const_symbol]: 1.029e-05 [renormalize]: 1.99972e-07 [detach_backward]: 2.51998e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 1.733e-05 [get_jit_bprop_graph]: 2.22999e-06 [rewriter_after_jit_bprop_graph]: 5.77001e-06 [opt_after_jit_grad]: 0.00055294 [validate]: 4.715e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.369719 [execute]: 9.44e-06
Sums
bootstrap : 0.000626s : 0.16% type_inference : 0.007902s : 2.06% event_method : 0.000018s : 0.00% auto_monad : 0.000065s : 0.02% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000022s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000037s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000028s : 0.01% optimize.rewriter_before_opt_a : 0.000075s : 0.02% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000043s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.01% optimize.opt_a.a_1 : 0.000610s : 0.16% optimize.opt_a.with_stream_mark :
0.000030s : 0.01% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000153s : 0.04% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000015s : 0.00% optimize.opt_a.parallel : 0.000036s : 0.01% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000790s : 0.21% optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.01% optimize.opt_a.cse : 0.000049s : 0.01% optimize.opt_a.a_3 : 0.000094s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000039s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000817s : 0.21% optimize.opt_b.b_1 : 0.000117s : 0.03% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000027s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.01% optimize.loop_unroll : 0.000518s : 0.14% optimize.opt_after_cconv.c_1 : 0.000027s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.00% optimize.tuple_transform.d_1 : 0.000045s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.02% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000009s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000553s : 0.14% validate : 0.000047s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.369719s : 96.44% execute : 0.000009s : 0.00%
Time group info:
------[substitution.] 0.000213 26 18.73% : 0.000040s : 5: substitution.arithmetic_simplify 1.00% : 0.000002s : 2: substitution.elim_not_effective 0.88% : 0.000002s : 2: substitution.fold_const_symbol 2.82% : 0.000006s : 3: substitution.graph_param_transform 65.47% : 0.000139s : 3: substitution.inline 2.17% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.32% : 0.000005s : 4: substitution.remove_not_recompute_node 2.16% : 0.000005s : 2: substitution.replace_old_param 4.44% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.007831 2 90.08% : 0.007055s : 1: type_inference.infer 9.92% : 0.000777s : 1: type_inference.specialize
------[replace.] 0.000040 4 79.39% : 0.000032s : 3: replace.inline 20.61% : 0.000008s : 1: replace.tuple_list_get_item_eliminator
------[match.] 0.000145 4 94.04% : 0.000137s : 3: match.inline 5.96% : 0.000009s : 1: match.tuple_list_get_item_eliminator
------[predicate.]
0.000169 883 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.52% : 0.000001s : 6: predicate.addn_check_dump 0.85% : 0.000001s : 9: predicate.addn_zero_filter 0.77% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 15: predicate.arithmetic_simplify 0.90% : 0.000002s : 9: predicate.cast_eliminate 0.61% : 0.000001s : 6: predicate.check_bprop_eliminate 0.60% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.59% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.60% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_depend_swap 1.75% : 0.000003s : 18: predicate.environ_get_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.80% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.82% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.63% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.34% : 0.000011s : 40: predicate.inline 0.88% : 0.000001s : 6: predicate.inline_without_move 0.35% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 6: predicate.less_batch_normalization 1.58% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.29% : 0.000004s : 25: predicate.load_eliminater 1.21% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.02% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 9: predicate.minmaximum_grad 2.56% : 0.000004s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.55% : 0.000001s : 3: predicate.parallel_virtual_node 1.48% : 0.000002s : 13: predicate.partial_defer_inline 1.36% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.28% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.27% : 0.000002s : 16: predicate.replace_applicator 0.61% : 0.000001s : 6: predicate.replace_old_param 0.39% : 0.000001s : 3: predicate.reset_defer_inline 0.89% : 0.000002s : 9: predicate.reshape_eliminate 0.59% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.73% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.89% : 0.000002s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.74% : 0.000001s : 6: predicate.specialize_transform 1.24% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 13: predicate.switch_defer_inline 1.87% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.79% : 0.000008s : 43: predicate.switch_simplify 1.00% : 
0.000002s : 9: predicate.tile_eliminate 0.85% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.83% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.57% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.20% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.99% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 3: predicate.value_based_eliminate 0.64% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.65% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000499 8 45.17% : 0.000226s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.83% : 0.000274s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.400149 196 0.00% : 0.000004s : 1: ForceFp32Comm 1.06% : 0.004259s : 1: add_attr 1.06% : 0.004240s : 1: add_attr_with_inline 0.00% : 0.000005s : 1: add_comm_op_reuse_tag 0.02% : 0.000066s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000071s : 1: auto_monad 0.01% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.17% : 0.000662s : 1: bootstrap 0.01% : 0.000039s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000018s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000024s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.13% : 0.000530s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.21% : 0.000837s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000023s : 1: opt.transform.mutable_eliminate 0.25% : 0.000993s : 78: opt.transform.opt_a 0.01% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000093s : 28: opt.transform.opt_b 0.01% : 0.000049s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000038s : 4: opt.transform.symbol_engine_opt 0.67% : 0.002688s : 1: opt_a 0.03% : 0.000112s : 1: opt_after_cconv 0.14% : 0.000567s : 1: opt_after_jit_grad 0.06% : 0.000222s : 1: opt_b 1.30% : 0.005216s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000041s : 1: pre_auto_parallel 0.01% : 0.000032s : 1: py_interpret_to_execute 0.00% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000022s : 1: remove_dup_value 0.11% : 0.000436s : 1: renormalize.infer 0.09% : 0.000345s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000044s : 1: rewriter_after_opt_a 0.02% : 0.000080s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000086s : 1: symbol_engine_optimizer 92.40% : 0.369742s : 1: task_emit 0.02% : 0.000081s : 1: 
tuple_transform 1.98% : 0.007930s : 1: type_inference 0.02% : 0.000081s : 1: validate TotalTime = 0.178017, [24] [bootstrap]: 0.00047669 [type_inference]: 0.0069856 [event_method]: 1.64e-05 [auto_monad]: 6.824e-05 [graph_reusing]: 6.37001e-06 [inline]: 2.73e-06 [add_attr]: 0.00387807, [1] [add_attr_with_inline]: 0.00386386, [1] [Cycle 1]: 6.777e-05, [2] [tag_attr]: 1.9e-05 [meta_addattr_fg_expand]: 4.3e-06 [parallel-infer-symbol]: 3.78999e-06 [pre_auto_parallel]: 3.526e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 2.01e-06 [optimize]: 0.00500695, [53] [py_interpret_to_execute]: 3.044e-05 [rewriter_before_opt_a]: 6.639e-05 [opt_a]: 0.00259694, [2] [Cycle 1]: 0.00192598, [45] [expand_dump_flag]: 2.88998e-06 [switch_simplify]: 2.986e-05 [loop_unroll]: 1.741e-05 [a_1]: 0.00047395 [with_stream_mark]: 2.247e-05 [recompute_prepare]: 9.50001e-06 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 3.71999e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 8.569e-05 [accelerated_algorithm]: 7.24001e-06 [shard]: 2.80002e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.07001e-06 [merge_send_recv]: 1.003e-05 [auto_parallel]: 6.92002e-06 [parallel]: 1.974e-05 [flash_sp]: 1.051e-05 [merge_comm]: 4.25e-06 [allreduce_fusion]: 3.54002e-06 [matmul_add_comm_reduction]: 1.094e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.78001e-06 [virtual_dataset]: 5.89999e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 6.11e-06 [merge_forward]: 3.98001e-06 [cell_reuse_recompute_pass]: 1.58002e-06 [offload_activation]: 1.082e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.313e-05 [merge_recompute_call_nodes]: 2.23998e-06 [before_grad]: 1.085e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83999e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.32999e-06 
[after_resolve]: 1.006e-05 [a_after_grad]: 9.29998e-06 [renormalize]: 0.00072013 [add_forward_monad_depend]: 6.33002e-06 [auto_monad_grad]: 2.64999e-06 [auto_monad_eliminator]: 1.678e-05 [cse]: 3.336e-05 [a_3]: 4.766e-05 [Cycle 2]: 0.00065702, [45] [expand_dump_flag]: 2.57001e-06 [switch_simplify]: 7.82e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.00012281 [with_stream_mark]: 1.552e-05 [recompute_prepare]: 6.79999e-06 [updatestate_depend_eliminate]: 3.26999e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.78003e-06 [parameter_eliminate]: 1.26002e-06 [a_2]: 7.185e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 2.14999e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 6.34001e-06 [auto_parallel]: 7.09001e-06 [parallel]: 5.85002e-06 [flash_sp]: 3.86001e-06 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.50998e-06 [matmul_add_comm_reduction]: 8.09002e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.45002e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.35001e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.74e-06 [offload_activation]: 9.39998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.187e-05 [merge_recompute_call_nodes]: 1.14003e-06 [before_grad]: 9.87001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.65999e-06 [meta_fg_expand]: 1.97001e-06 [flash_sp_send_recv_attached]: 1.47001e-06 [receive_attached]: 1.81e-06 [after_resolve]: 9.39998e-06 [a_after_grad]: 8.00999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.32999e-06 [auto_monad_grad]: 1.15999e-06 [auto_monad_eliminator]: 8.32e-06 [cse]: 1.766e-05 [a_3]: 3.467e-05 [py_interpret_to_execute_after_opt_a]: 1.299e-05 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 4.223e-05 [convert_after_rewriter]: 8.18999e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.0007011 [opt_b]: 0.00022119, [1] [Cycle 1]: 
0.00021174, [7] [b_1]: 0.00011687 [b_2]: 9.12001e-06 [updatestate_depend_eliminate]: 1.017e-05 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 3.08e-06 [renormalize]: 5.69999e-07 [cse]: 2.81e-05 [optimize_parallel_all_gather_comm]: 2.212e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 3.595e-05 [loop_unroll]: 0.00051086 [opt_after_cconv]: 0.00014608, [1] [Cycle 1]: 0.00013876, [7] [c_1]: 2.679e-05 [parameter_eliminate]: 4.99e-06 [updatestate_depend_eliminate]: 4.01e-05 [updatestate_assign_eliminate]: 2.71999e-06 [updatestate_loads_eliminate]: 3.04001e-06 [cse]: 2.244e-05 [renormalize]: 6.59988e-07 [remove_dup_value]: 1.816e-05 [tuple_transform]: 7.834e-05, [1] [Cycle 1]: 7.247e-05, [4] [d_1]: 4.393e-05 [none_parameter_eliminate]: 1.82001e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 6.77002e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 5.138e-05 [cse_after_recomputation]: 2.228e-05, [1] [Cycle 1]: 1.729e-05, [1] [cse]: 1.155e-05 [environ_conv]: 6.46999e-06 [swap_dp_allreduce_reducescatter]: 5.01997e-06 [bias_add_comm_swap]: 3.21001e-06 [label_micro_interleaved_index]: 5.30999e-06 [label_fine_grained_interleaved_index]: 2.53003e-06 [merge_cast_opt]: 1.67001e-06 [slice_recompute_activation]: 2.58e-06 [micro_interleaved_order_control]: 2.33002e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.30999e-06 [full_micro_interleaved_order_control]: 2.56998e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.09998e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.24998e-06 [overlap_opt_shard_grad_in_pipeline]: 2.06998e-06 [control_data_broadcast_order]: 1.312e-05 [grouped_pairwise_exchange_alltoall]: 1.71998e-06 [offloading_packed_experts]: 4.01001e-06 [overlap_recompute_and_grad_model_parallel]: 6.34001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.38002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.69001e-06 [overlap_grad_ring_attention]: 4.65001e-06 [overlap_grad_flash_sp]: 2.206e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 1.22999e-06 [symbol_engine_optimizer]: 7.825e-05, [1] [Cycle 1]: 7.368e-05, [6] [build]: 3.75e-06 [elim_shapecalc]: 9.60001e-06 [elim_not_effective]: 1.297e-05 [opt_reshape]: 7.08e-06 [fold_const_symbol]: 1.009e-05 [renormalize]: 3.09985e-07 [detach_backward]: 2.19999e-06 [pipeline_parallel_scheduler]: 1.72999e-06 [auto_monad_reorder]: 1.698e-05 [get_jit_bprop_graph]: 1.85001e-06 [rewriter_after_jit_bprop_graph]: 6.44999e-06 [opt_after_jit_grad]: 0.00052487 [validate]: 4.464e-05 [backend_pass]: 1.18001e-06 [task_emit]: 0.16066 [execute]: 9.79e-06 Sums bootstrap : 0.000477s : 0.28% type_inference : 0.006986s : 4.04% event_method : 0.000016s : 0.01% auto_monad : 0.000068s : 0.04% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000035s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000030s : 0.02% optimize.rewriter_before_opt_a : 0.000066s : 0.04% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.02% optimize.opt_a.loop_unroll : 0.000023s : 0.01% optimize.opt_a.a_1 : 0.000597s : 0.34% optimize.opt_a.with_stream_mark : 0.000038s : 0.02% optimize.opt_a.recompute_prepare : 0.000016s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000158s : 0.09% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000016s : 0.01% optimize.opt_a.auto_parallel : 0.000014s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.01% optimize.opt_a.flash_sp : 0.000014s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000720s : 0.42% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.01% optimize.opt_a.cse : 0.000051s : 0.03% optimize.opt_a.a_3 : 0.000082s : 0.05% 
optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000042s : 0.02% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000701s : 0.41% optimize.opt_b.b_1 : 0.000117s : 0.07% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000028s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000036s : 0.02% optimize.loop_unroll : 0.000511s : 0.30% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000040s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.01% optimize.tuple_transform.d_1 : 0.000044s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.03% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000525s : 0.30% validate : 0.000045s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.160660s : 92.86% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000200 24 18.70% : 0.000037s : 4: substitution.arithmetic_simplify 0.97% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000002s : 2: substitution.fold_const_symbol 2.94% : 0.000006s : 3: substitution.graph_param_transform 69.46% : 0.000139s : 3: substitution.inline 2.33% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.73% : 0.000005s : 4: substitution.remove_not_recompute_node 2.10% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006919 2 91.38% : 0.006323s : 1: type_inference.infer 8.62% : 0.000596s : 1: type_inference.specialize ------[replace.] 0.000032 3 100.00% : 0.000032s : 3: replace.inline ------[match.] 0.000137 3 100.00% : 0.000137s : 3: match.inline ------[predicate.] 
0.000160 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 1.23% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.78% : 0.000001s : 8: predicate.addn_zero_filter 0.73% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 14: predicate.arithmetic_simplify 0.81% : 0.000001s : 8: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.78% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.34% : 0.000001s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.34% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.01% : 0.000002s : 11: predicate.environ_get_depend_swap 1.67% : 0.000003s : 17: predicate.environ_get_eliminate 1.04% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.06% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 11: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 1.12% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.69% : 0.000001s : 6: predicate.get_grad_eliminate 0.39% : 0.000001s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.38% : 0.000010s : 37: predicate.inline 1.02% : 0.000002s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 6: predicate.less_batch_normalization 1.53% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.04% : 0.000003s : 22: predicate.load_eliminater 1.90% : 0.000003s : 3: predicate.loop_unroll_after_grad 1.90% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.58% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 2.21% : 0.000004s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.46% : 0.000001s : 3: predicate.parallel_virtual_node 1.40% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 11: predicate.partial_eliminate 0.80% : 0.000001s : 8: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.20% : 0.000002s : 8: predicate.reduce_eliminate 2.22% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.83% : 0.000001s : 6: predicate.remove_not_recompute_node 1.14% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.49% : 0.000001s : 3: predicate.reset_defer_inline 0.82% : 0.000001s : 8: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 3: predicate.row_tensor_eliminate 0.98% : 0.000002s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.82% : 0.000001s : 6: predicate.specialize_transform 1.25% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.13% : 0.000002s : 11: predicate.switch_defer_inline 1.71% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.61% : 0.000007s : 38: predicate.switch_simplify 0.85% : 
0.000001s : 8: predicate.tile_eliminate 0.78% : 0.000001s : 8: predicate.transpose_eliminate 1.38% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.66% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 1.99% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.76% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 3: predicate.value_based_eliminate 0.66% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.70% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000425 7 37.07% : 0.000157s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.93% : 0.000267s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.188761 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.06% : 0.003884s : 1: add_attr 2.05% : 0.003868s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000056s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.04% : 0.000074s : 1: auto_monad 0.01% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.27% : 0.000508s : 1: bootstrap 0.02% : 0.000040s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.01% : 0.000024s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000011s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.28% : 0.000524s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000718s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000021s : 1: opt.transform.mutable_eliminate 0.52% : 0.000978s : 78: opt.transform.opt_a 0.01% : 0.000025s : 1: opt.transform.opt_after_cconv 0.01% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000095s : 28: opt.transform.opt_b 0.03% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000036s : 4: opt.transform.symbol_engine_opt 1.38% : 0.002600s : 1: opt_a 0.08% : 0.000150s : 1: opt_after_cconv 0.29% : 0.000538s : 1: opt_after_jit_grad 0.12% : 0.000225s : 1: opt_b 2.66% : 0.005012s : 1: optimize 0.01% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000040s : 1: pre_auto_parallel 0.02% : 0.000034s : 1: py_interpret_to_execute 0.01% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000023s : 1: remove_dup_value 0.22% : 0.000406s : 1: renormalize.infer 0.16% : 0.000305s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000047s : 1: rewriter_after_opt_a 0.04% : 0.000071s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000081s : 1: symbol_engine_optimizer 85.13% : 0.160685s : 1: task_emit 0.04% : 0.000081s : 1: 
tuple_transform 3.72% : 0.007018s : 1: type_inference 0.04% : 0.000080s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x5-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x6-pynative],max_mem:6.0M TotalTime = 0.0262328, [24] [bootstrap]: 0.00059671 [type_inference]: 0.00785393 [event_method]: 1.75e-05 [auto_monad]: 6.449e-05 [graph_reusing]: 6.17999e-06 [inline]: 2.93998e-06 [add_attr]: 0.00429422, [1] [add_attr_with_inline]: 0.00427799, [1] [Cycle 1]: 7.307e-05, [2] [tag_attr]: 2.004e-05 [meta_addattr_fg_expand]: 4.37e-06 [parallel-infer-symbol]: 3.33e-06 [pre_auto_parallel]: 3.748e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.44999e-06 [pipeline_split]: 1.75001e-06 [optimize]: 0.00504108, [53] [py_interpret_to_execute]: 2.789e-05 [rewriter_before_opt_a]: 7.526e-05 [opt_a]: 0.00268299, [2] [Cycle 1]: 0.00202329, [45] [expand_dump_flag]: 3.21999e-06 [switch_simplify]: 3.623e-05 [loop_unroll]: 2.078e-05 [a_1]: 0.00049881 [with_stream_mark]: 1.903e-05 [recompute_prepare]: 9.37999e-06 [updatestate_depend_eliminate]: 4.15e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 3.34001e-06 [parameter_eliminate]: 2.11e-06 [a_2]: 8.21e-05 [accelerated_algorithm]: 6.79999e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 6.39999e-06 [merge_send_recv]: 9.07999e-06 [auto_parallel]: 7.18e-06 [parallel]: 2.852e-05 [flash_sp]: 9.81e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 1.052e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 7.57002e-06 [virtual_dataset]: 6.33002e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 
6.04999e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 1.027e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.245e-05 [merge_recompute_call_nodes]: 1.72999e-06 [before_grad]: 1.084e-05 [set_forward_comm_id_for_comm_node_pass]: 3.93999e-06 [meta_fg_expand]: 3.11999e-06 [flash_sp_send_recv_attached]: 3.22002e-06 [receive_attached]: 2.51e-06 [after_resolve]: 9.85002e-06 [a_after_grad]: 8.80001e-06 [renormalize]: 0.0007704 [add_forward_monad_depend]: 1.161e-05 [auto_monad_grad]: 2.64001e-06 [auto_monad_eliminator]: 1.75e-05 [cse]: 3.411e-05 [a_3]: 4.612e-05 [Cycle 2]: 0.00064605, [45] [expand_dump_flag]: 1.79e-06 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012179 [with_stream_mark]: 1.242e-05 [recompute_prepare]: 5.84e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.84001e-06 [parameter_eliminate]: 1.16002e-06 [a_2]: 7.153e-05 [accelerated_algorithm]: 6.36998e-06 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 6.12001e-06 [merge_send_recv]: 6.11e-06 [auto_parallel]: 6.68e-06 [parallel]: 6.04001e-06 [flash_sp]: 3.58999e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.65998e-06 [matmul_add_comm_reduction]: 7.50998e-06 [allreduce_slice_to_reducescatter]: 5.09986e-07 [virtual_shard_identity]: 7.51999e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.12999e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 2.32001e-06 [offload_activation]: 8.23001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.117e-05 [merge_recompute_call_nodes]: 1.15999e-06 [before_grad]: 9.57999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.95001e-06 [meta_fg_expand]: 2.04999e-06 [flash_sp_send_recv_attached]: 1.22999e-06 [receive_attached]: 1.71e-06 [after_resolve]: 9.72001e-06 [a_after_grad]: 8.07e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.86e-06 [auto_monad_grad]: 
1.60001e-06 [auto_monad_eliminator]: 7e-06 [cse]: 1.677e-05 [a_3]: 3.251e-05 [py_interpret_to_execute_after_opt_a]: 1.41e-05 [slice_cell_reuse_recomputed_activation]: 2.16998e-06 [rewriter_after_opt_a]: 4.024e-05 [convert_after_rewriter]: 6.73e-06 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.00069844 [opt_b]: 0.0002132, [1] [Cycle 1]: 0.0002044, [7] [b_1]: 0.00011429 [b_2]: 1.224e-05 [updatestate_depend_eliminate]: 8.33999e-06 [updatestate_assign_eliminate]: 2.79999e-06 [updatestate_loads_eliminate]: 2.68003e-06 [renormalize]: 4.39992e-07 [cse]: 2.484e-05 [optimize_parallel_all_gather_comm]: 2.017e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 3.177e-05 [loop_unroll]: 0.00050162 [opt_after_cconv]: 0.00010822, [1] [Cycle 1]: 0.00010084, [7] [c_1]: 2.726e-05 [parameter_eliminate]: 4.45e-06 [updatestate_depend_eliminate]: 6.44001e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 2.18e-05 [renormalize]: 2.49973e-07 [remove_dup_value]: 1.618e-05 [tuple_transform]: 7.413e-05, [1] [Cycle 1]: 6.941e-05, [4] [d_1]: 4.252e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.39001e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 5.958e-05 [cse_after_recomputation]: 2.202e-05, [1] [Cycle 1]: 1.749e-05, [1] [cse]: 1.176e-05 [environ_conv]: 1.032e-05 [swap_dp_allreduce_reducescatter]: 5.67999e-06 [bias_add_comm_swap]: 2.94001e-06 [label_micro_interleaved_index]: 5.52999e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.73e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.36998e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.26997e-06 [full_micro_interleaved_order_control]: 2.33998e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.33002e-06 [add_comm_op_reuse_tag]: 1.34998e-06 [interleave_split_concat_branches]: 1.57001e-06 
[interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.508e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 4.42003e-06 [overlap_recompute_and_grad_model_parallel]: 5.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.73e-06 [overlap_grad_ring_attention]: 4.42e-06 [overlap_grad_flash_sp]: 2.114e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.35002e-06 [split_layernorm_comm]: 1.88002e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 8.039e-05, [1] [Cycle 1]: 7.502e-05, [6] [build]: 4.32e-06 [elim_shapecalc]: 1.072e-05 [elim_not_effective]: 1.33e-05 [opt_reshape]: 6.89999e-06 [fold_const_symbol]: 1.002e-05 [renormalize]: 2.3999e-07 [detach_backward]: 2.50997e-06 [pipeline_parallel_scheduler]: 1.87999e-06 [auto_monad_reorder]: 1.741e-05 [get_jit_bprop_graph]: 2.34001e-06 [rewriter_after_jit_bprop_graph]: 5.74999e-06 [opt_after_jit_grad]: 0.0005175 [validate]: 4.643e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00743706 [execute]: 1.038e-05 Sums bootstrap : 0.000597s : 2.87% type_inference : 0.007854s : 37.75% event_method : 0.000017s : 0.08% auto_monad : 0.000064s : 0.31% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000037s : 0.18% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000028s : 0.13% optimize.rewriter_before_opt_a : 0.000075s : 0.36% optimize.opt_a.expand_dump_flag : 0.000005s : 0.02% optimize.opt_a.switch_simplify : 
0.000043s : 0.21% optimize.opt_a.loop_unroll : 0.000026s : 0.13% optimize.opt_a.a_1 : 0.000621s : 2.98% optimize.opt_a.with_stream_mark : 0.000031s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000154s : 0.74% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.06% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.06% optimize.opt_a.merge_send_recv : 0.000015s : 0.07% optimize.opt_a.auto_parallel : 0.000014s : 0.07% optimize.opt_a.parallel : 0.000035s : 0.17% optimize.opt_a.flash_sp : 0.000013s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.09% optimize.opt_a.a_after_grad : 
0.000017s : 0.08% optimize.opt_a.renormalize : 0.000770s : 3.70% optimize.opt_a.add_forward_monad_depend : 0.000013s : 0.06% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.12% optimize.opt_a.cse : 0.000051s : 0.24% optimize.opt_a.a_3 : 0.000079s : 0.38% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000040s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.03% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000698s : 3.36% optimize.opt_b.b_1 : 0.000114s : 0.55% optimize.opt_b.b_2 : 0.000012s : 0.06% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000025s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000032s : 0.15% optimize.loop_unroll : 0.000502s : 2.41% optimize.opt_after_cconv.c_1 : 0.000027s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000022s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.08% optimize.tuple_transform.d_1 : 0.000043s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% 
optimize.add_recomputation : 0.000060s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000010s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000006s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000002s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000021s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000006s : 0.03% opt_after_jit_grad : 0.000517s : 2.49% validate : 0.000046s : 0.22% backend_pass : 0.000001s : 0.00% task_emit : 0.007437s : 35.75% execute : 0.000010s : 0.05% Time group info: ------[substitution.] 0.000221 26 18.71% : 0.000041s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.64% : 0.000001s : 2: substitution.fold_const_symbol 2.80% : 0.000006s : 3: substitution.graph_param_transform 66.22% : 0.000146s : 3: substitution.inline 1.87% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.13% : 0.000005s : 4: substitution.remove_not_recompute_node 1.98% : 0.000004s : 2: substitution.replace_old_param 4.61% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007744 2 90.86% : 0.007036s : 1: type_inference.infer 9.14% : 0.000708s : 1: type_inference.specialize ------[replace.] 0.000043 4 79.11% : 0.000034s : 3: replace.inline 20.89% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000154 4 93.89% : 0.000144s : 3: match.inline 6.11% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000168 883 0.91% : 0.000002s : 9: predicate.accumulaten_eliminater 0.88% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.02% : 0.000003s : 15: predicate.arithmetic_simplify 0.90% : 0.000002s : 9: predicate.cast_eliminate 0.59% : 0.000001s : 6: predicate.check_bprop_eliminate 0.53% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.74% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.45% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.29% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.01% : 0.000002s : 12: predicate.environ_get_depend_swap 1.74% : 0.000003s : 18: predicate.environ_get_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 13: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.63% : 0.000011s : 40: predicate.inline 0.95% : 0.000002s : 6: predicate.inline_without_move 0.37% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 6: predicate.less_batch_normalization 1.74% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 25: predicate.load_eliminater 1.25% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.17% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 6: predicate.merge_addn 0.55% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.78% : 0.000003s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.32% : 0.000001s : 3: predicate.parallel_virtual_node 1.50% : 0.000003s : 13: predicate.partial_defer_inline 1.40% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.14% : 0.000002s : 9: predicate.reduce_eliminate 2.31% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.40% : 0.000001s : 6: predicate.remove_not_recompute_node 1.17% : 0.000002s : 16: predicate.replace_applicator 0.75% : 0.000001s : 6: predicate.replace_old_param 0.39% : 0.000001s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 9: predicate.reshape_eliminate 0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.94% : 0.000002s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.30% : 0.000002s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 1.00% : 0.000002s : 6: predicate.specialize_transform 1.39% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.34% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 13: predicate.switch_defer_inline 1.85% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.84% : 0.000008s : 43: predicate.switch_simplify 0.84% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.18% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.91% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 3: predicate.value_based_eliminate 0.93% : 0.000002s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000469 8 46.92% : 0.000220s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.08% : 0.000249s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.037495 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.47% : 0.004300s : 1: add_attr 11.42% : 0.004282s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000071s : 1: auto_monad 0.06% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.71% : 0.000642s : 1: bootstrap 0.10% : 0.000036s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000019s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.04% : 0.000014s : 1: environ_conv 0.07% : 0.000024s : 1: event_method 0.05% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.37% : 0.000512s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.90% : 0.000713s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000019s : 1: opt.transform.mutable_eliminate 2.68% : 0.001005s : 78: opt.transform.opt_a 0.07% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000096s : 28: opt.transform.opt_b 0.12% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000037s : 4: opt.transform.symbol_engine_opt 7.16% : 0.002686s : 1: opt_a 0.30% : 0.000112s : 1: opt_after_cconv 1.41% : 0.000529s : 1: opt_after_jit_grad 0.58% : 0.000217s : 1: opt_b 13.46% : 0.005047s : 1: optimize 0.06% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000042s : 1: pre_auto_parallel 0.09% : 0.000032s : 1: py_interpret_to_execute 0.05% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000020s : 1: remove_dup_value 1.14% : 0.000429s : 1: renormalize.infer 0.89% : 0.000332s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000045s : 1: rewriter_after_opt_a 0.21% : 0.000080s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000083s : 1: symbol_engine_optimizer 19.89% : 0.007458s : 1: task_emit 0.21% : 0.000077s : 1: 
tuple_transform 21.01% : 0.007879s : 1: type_inference 0.23% : 0.000086s : 1: validate TotalTime = 0.0309761, [24] [bootstrap]: 0.00049296 [type_inference]: 0.00808807 [event_method]: 1.677e-05 [auto_monad]: 6.833e-05 [graph_reusing]: 5.84e-06 [inline]: 3.61001e-06 [add_attr]: 0.00393232, [1] [add_attr_with_inline]: 0.0039197, [1] [Cycle 1]: 7.097e-05, [2] [tag_attr]: 1.973e-05 [meta_addattr_fg_expand]: 4e-06 [parallel-infer-symbol]: 3.76001e-06 [pre_auto_parallel]: 3.565e-05 [insert-virtual-dataset]: 2.60002e-06 [parallel-infer-symbol-second]: 1.12e-06 [dataset_repeat_opt]: 2.49999e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.00910839, [53] [py_interpret_to_execute]: 2.735e-05 [rewriter_before_opt_a]: 6.239e-05 [opt_a]: 0.00627375, [2] [Cycle 1]: 0.00546816, [45] [expand_dump_flag]: 2.84999e-06 [switch_simplify]: 3.083e-05 [loop_unroll]: 1.813e-05 [a_1]: 0.00039369 [with_stream_mark]: 2.094e-05 [recompute_prepare]: 8.2e-06 [updatestate_depend_eliminate]: 4.3e-06 [updatestate_assign_eliminate]: 3.66999e-06 [updatestate_loads_eliminate]: 3.79002e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 8.263e-05 [accelerated_algorithm]: 7.68999e-06 [shard]: 2.89001e-06 [meta_shard_fg_expand]: 1.83002e-06 [shard_inline]: 6.53003e-06 [merge_send_recv]: 9.37001e-06 [auto_parallel]: 7.5e-06 [parallel]: 1.992e-05 [flash_sp]: 1.078e-05 [merge_comm]: 4.20999e-06 [allreduce_fusion]: 3.53999e-06 [matmul_add_comm_reduction]: 1.061e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 8.63001e-06 [virtual_dataset]: 6.01998e-06 [get_grad_eliminate_]: 6.23e-06 [virtual_output]: 5.98002e-06 [merge_forward]: 4.56002e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.137e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.329e-05 [merge_recompute_call_nodes]: 1.86e-06 [before_grad]: 1.091e-05 [set_forward_comm_id_for_comm_node_pass]: 4.24002e-06 [meta_fg_expand]: 2.64001e-06 [flash_sp_send_recv_attached]: 2.86999e-06 [receive_attached]: 2.31998e-06 
[after_resolve]: 1.02e-05 [a_after_grad]: 8.37998e-06 [renormalize]: 0.00424075 [add_forward_monad_depend]: 1.724e-05 [auto_monad_grad]: 3.14001e-06 [auto_monad_eliminator]: 4.21e-05 [cse]: 4.278e-05 [a_3]: 6.516e-05 [Cycle 2]: 0.00079005, [45] [expand_dump_flag]: 3.48999e-06 [switch_simplify]: 8.52998e-06 [loop_unroll]: 6.26998e-06 [a_1]: 0.00015186 [with_stream_mark]: 3.497e-05 [recompute_prepare]: 6.93998e-06 [updatestate_depend_eliminate]: 4.49998e-06 [updatestate_assign_eliminate]: 3.46001e-06 [updatestate_loads_eliminate]: 4.85001e-06 [parameter_eliminate]: 2.41e-06 [a_2]: 7.749e-05 [accelerated_algorithm]: 7.05002e-06 [shard]: 2.94999e-06 [meta_shard_fg_expand]: 2.72001e-06 [shard_inline]: 6.55002e-06 [merge_send_recv]: 8.84e-06 [auto_parallel]: 1.012e-05 [parallel]: 1.205e-05 [flash_sp]: 4.07e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 1.247e-05 [allreduce_slice_to_reducescatter]: 9.10019e-07 [virtual_shard_identity]: 8.08001e-06 [virtual_dataset]: 5.84e-06 [get_grad_eliminate_]: 5.51998e-06 [virtual_output]: 5.71998e-06 [merge_forward]: 4.83001e-06 [cell_reuse_recompute_pass]: 3.09001e-06 [offload_activation]: 1.21e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.255e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 1.004e-05 [set_forward_comm_id_for_comm_node_pass]: 5.05001e-06 [meta_fg_expand]: 3.18e-06 [flash_sp_send_recv_attached]: 1.97001e-06 [receive_attached]: 2.71999e-06 [after_resolve]: 1.348e-05 [a_after_grad]: 8.83001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.79e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 9.97999e-06 [cse]: 1.995e-05 [a_3]: 3.524e-05 [py_interpret_to_execute_after_opt_a]: 2.25e-05 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 4.801e-05 [convert_after_rewriter]: 7.53999e-06 [order_py_execute_after_rewriter]: 6.76e-06 [mutable_eliminate]: 0.00095549 [opt_b]: 0.00023947, [1] [Cycle 1]: 0.00022786, [7] [b_1]: 
0.00012029 [b_2]: 9.99001e-06 [updatestate_depend_eliminate]: 1.188e-05 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 3.06999e-06 [renormalize]: 8.79983e-07 [cse]: 3.529e-05 [optimize_parallel_all_gather_comm]: 2.647e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 3.974e-05 [loop_unroll]: 0.0005946 [opt_after_cconv]: 0.00012867, [1] [Cycle 1]: 0.00011895, [7] [c_1]: 2.832e-05 [parameter_eliminate]: 6.91001e-06 [updatestate_depend_eliminate]: 1.06e-05 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 2.99999e-06 [cse]: 2.836e-05 [renormalize]: 5.79981e-07 [remove_dup_value]: 1.945e-05 [tuple_transform]: 8.031e-05, [1] [Cycle 1]: 7.505e-05, [4] [d_1]: 4.602e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.82002e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 5.896e-05 [cse_after_recomputation]: 2.573e-05, [1] [Cycle 1]: 1.974e-05, [1] [cse]: 1.369e-05 [environ_conv]: 7.6e-06 [swap_dp_allreduce_reducescatter]: 5.67001e-06 [bias_add_comm_swap]: 3.09001e-06 [label_micro_interleaved_index]: 7.3e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.56002e-06 [slice_recompute_activation]: 2.30002e-06 [micro_interleaved_order_control]: 3.28e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 3.13e-06 [comm_op_add_attrs]: 1.18001e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.29998e-06 [interleave_parallel_branches]: 1.34e-06 [overlap_opt_shard_in_pipeline]: 1.76e-06 [overlap_opt_shard_grad_in_pipeline]: 2.06998e-06 [control_data_broadcast_order]: 1.672e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 5.04e-06 [overlap_recompute_and_grad_model_parallel]: 5.59e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.50997e-06 [overlap_grad_ring_attention]: 4.66002e-06 [overlap_grad_flash_sp]: 2.508e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.46998e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 1.04998e-06 [symbol_engine_optimizer]: 9.264e-05, [1] [Cycle 1]: 8.629e-05, [6] [build]: 5.14998e-06 [elim_shapecalc]: 1.375e-05 [elim_not_effective]: 1.503e-05 [opt_reshape]: 7.31999e-06 [fold_const_symbol]: 1.052e-05 [renormalize]: 2.3999e-07 [detach_backward]: 2.41e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 2.021e-05 [get_jit_bprop_graph]: 2.24999e-06 [rewriter_after_jit_bprop_graph]: 7.95e-06 [opt_after_jit_grad]: 0.00076158 [validate]: 5.287e-05 [backend_pass]: 1.17e-06 [task_emit]: 0.00804698 [execute]: 1.007e-05 Sums bootstrap : 0.000493s : 1.91% type_inference : 0.008088s : 31.40% event_method : 0.000017s : 0.07% auto_monad : 0.000068s : 0.27% graph_reusing : 0.000006s : 0.02% inline : 0.000004s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000036s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000027s : 0.11% optimize.rewriter_before_opt_a : 0.000062s : 0.24% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.15% optimize.opt_a.loop_unroll : 0.000024s : 0.09% optimize.opt_a.a_1 : 0.000546s : 2.12% optimize.opt_a.with_stream_mark : 0.000056s : 0.22% optimize.opt_a.recompute_prepare : 0.000015s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000009s : 0.03% optimize.opt_a.parameter_eliminate : 0.000004s : 0.02% optimize.opt_a.a_2 : 0.000160s : 0.62% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.06% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.05% optimize.opt_a.merge_send_recv : 0.000018s : 0.07% optimize.opt_a.auto_parallel : 0.000018s : 0.07% optimize.opt_a.parallel : 0.000032s : 0.12% optimize.opt_a.flash_sp : 0.000015s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.03% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000023s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.06% optimize.opt_a.virtual_dataset : 0.000012s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.05% optimize.opt_a.virtual_output : 0.000012s : 0.05% optimize.opt_a.merge_forward : 0.000009s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000023s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.10% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000021s : 0.08% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.04% optimize.opt_a.meta_fg_expand : 0.000006s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000024s : 0.09% optimize.opt_a.a_after_grad : 0.000017s : 0.07% optimize.opt_a.renormalize : 0.004241s : 16.46% optimize.opt_a.add_forward_monad_depend : 0.000019s : 0.07% optimize.opt_a.auto_monad_grad : 0.000005s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000052s : 0.20% optimize.opt_a.cse : 0.000063s : 0.24% optimize.opt_a.a_3 : 0.000100s : 0.39% 
optimize.py_interpret_to_execute_after_opt_a : 0.000022s : 0.09% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.19% optimize.convert_after_rewriter : 0.000008s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.03% optimize.mutable_eliminate : 0.000955s : 3.71% optimize.opt_b.b_1 : 0.000120s : 0.47% optimize.opt_b.b_2 : 0.000010s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000012s : 0.05% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000035s : 0.14% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000040s : 0.15% optimize.loop_unroll : 0.000595s : 2.31% optimize.opt_after_cconv.c_1 : 0.000028s : 0.11% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.03% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000011s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000028s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000019s : 0.08% optimize.tuple_transform.d_1 : 0.000046s : 0.18% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000059s : 0.23% optimize.cse_after_recomputation.cse : 0.000014s : 0.05% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000007s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000020s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000008s : 0.03% opt_after_jit_grad : 0.000762s : 2.96% validate : 0.000053s : 0.21% backend_pass : 0.000001s : 0.00% task_emit : 0.008047s : 31.24% execute : 0.000010s : 0.04% Time group info: ------[substitution.] 0.000192 24 22.10% : 0.000042s : 4: substitution.arithmetic_simplify 1.19% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000002s : 2: substitution.fold_const_symbol 3.40% : 0.000007s : 3: substitution.graph_param_transform 63.40% : 0.000121s : 3: substitution.inline 2.70% : 0.000005s : 4: substitution.j_node_and_user_rematch 3.01% : 0.000006s : 4: substitution.remove_not_recompute_node 3.37% : 0.000006s : 2: substitution.replace_old_param ------[type_inference.] 0.008021 2 92.80% : 0.007444s : 1: type_inference.infer 7.20% : 0.000578s : 1: type_inference.specialize ------[replace.] 0.000032 3 100.00% : 0.000032s : 3: replace.inline ------[match.] 0.000119 3 100.00% : 0.000119s : 3: match.inline ------[predicate.] 
0.000175 815 0.84% : 0.000001s : 8: predicate.accumulaten_eliminater 1.73% : 0.000003s : 3: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 6: predicate.addn_check_dump 0.77% : 0.000001s : 8: predicate.addn_zero_filter 0.70% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.57% : 0.000005s : 14: predicate.arithmetic_simplify 0.86% : 0.000002s : 8: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.53% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 8: predicate.dict_set_item_eliminator 2.05% : 0.000004s : 6: predicate.dumpgradient_eliminate 0.54% : 0.000001s : 3: predicate.elim_not_effective 0.89% : 0.000002s : 3: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 11: predicate.environ_add_const_eliminate 0.95% : 0.000002s : 11: predicate.environ_get_add_eliminate 0.93% : 0.000002s : 11: predicate.environ_get_depend_swap 1.59% : 0.000003s : 17: predicate.environ_get_eliminate 0.99% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.12% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.04% : 0.000004s : 11: predicate.float_depend_g_call 0.54% : 0.000001s : 6: predicate.float_environ_get_switch 0.74% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.67% : 0.000001s : 6: predicate.get_grad_eliminate 0.33% : 0.000001s : 3: predicate.graph_param_transform 0.70% : 0.000001s : 6: predicate.incorporate_call 0.54% : 0.000001s : 6: predicate.incorporate_call_switch 6.16% : 0.000011s : 37: predicate.inline 0.87% : 0.000002s : 6: predicate.inline_without_move 0.34% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.07% : 0.000002s : 6: predicate.less_batch_normalization 1.44% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.03% : 0.000004s : 22: predicate.load_eliminater 2.10% : 0.000004s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000004s : 18: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.54% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 8: predicate.minmaximum_grad 2.61% : 0.000005s : 3: predicate.mutable_eliminate 0.58% : 0.000001s : 3: predicate.opt_reshape 0.51% : 0.000001s : 3: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.09% : 0.000002s : 11: predicate.partial_eliminate 0.82% : 0.000001s : 8: predicate.print_const_string_wrapper 0.70% : 0.000001s : 6: predicate.reduce_all_const_elim 1.22% : 0.000002s : 8: predicate.reduce_eliminate 2.05% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.46% : 0.000003s : 14: predicate.replace_applicator 0.71% : 0.000001s : 6: predicate.replace_old_param 0.50% : 0.000001s : 3: predicate.reset_defer_inline 0.80% : 0.000001s : 8: predicate.reshape_eliminate 0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 3: predicate.row_tensor_eliminate 1.13% : 0.000002s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.76% : 0.000001s : 6: predicate.shard_identity_eliminate 0.88% : 0.000002s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 1.28% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.04% : 0.000002s : 11: predicate.switch_defer_inline 1.65% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.50% : 0.000008s : 38: predicate.switch_simplify 0.95% : 
0.000002s : 8: predicate.tile_eliminate 0.83% : 0.000001s : 8: predicate.transpose_eliminate 1.49% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000006s : 20: predicate.tuple_list_get_item_eliminator 1.51% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.46% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 1.79% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.57% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 3: predicate.value_based_eliminate 0.66% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.65% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000408 7 34.92% : 0.000143s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.08% : 0.000266s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.049389 196 0.01% : 0.000005s : 1: ForceFp32Comm 7.97% : 0.003938s : 1: add_attr 7.94% : 0.003924s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.13% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.15% : 0.000074s : 1: auto_monad 0.05% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 1.08% : 0.000536s : 1: bootstrap 0.09% : 0.000045s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.06% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.05% : 0.000024s : 1: event_method 0.04% : 0.000019s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000007s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000005s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000010s : 1: label_micro_interleaved_index 1.24% : 0.000611s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 1.98% : 0.000977s : 1: mutable_eliminate 0.02% : 0.000009s : 1: offloading_packed_experts 0.04% : 0.000021s : 1: opt.transform.loop_unroll_optimizer 0.06% : 0.000029s : 1: opt.transform.mutable_eliminate 1.92% : 0.000947s : 78: opt.transform.opt_a 0.05% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000038s : 1: opt.transform.opt_after_jit_grad 0.20% : 0.000098s : 28: opt.transform.opt_b 0.10% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000042s : 4: opt.transform.symbol_engine_opt 12.71% : 0.006278s : 1: opt_a 0.27% : 0.000133s : 1: opt_after_cconv 1.59% : 0.000785s : 1: opt_after_jit_grad 0.49% : 0.000244s : 1: opt_b 18.45% : 0.009115s : 1: optimize 0.07% : 0.000033s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.06% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000006s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000041s : 1: pre_auto_parallel 0.06% : 0.000031s : 1: py_interpret_to_execute 0.05% : 0.000026s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000024s : 1: remove_dup_value 0.83% : 0.000410s : 1: renormalize.infer 7.73% : 0.003816s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000012s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000053s : 1: rewriter_after_opt_a 0.13% : 0.000067s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.19% : 0.000096s : 1: symbol_engine_optimizer 16.34% : 0.008070s : 1: task_emit 0.17% : 0.000084s : 1: 
tuple_transform 16.45% : 0.008123s : 1: type_inference 0.21% : 0.000103s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x6-kbk],max_mem:6.0M . TotalTime = 10.3408, [24] [bootstrap]: 0.00049608 [type_inference]: 0.00897662 [event_method]: 1.541e-05 [auto_monad]: 6.497e-05 [graph_reusing]: 6.43e-06 [inline]: 3.34001e-06 [add_attr]: 0.00455815, [1] [add_attr_with_inline]: 0.00454315, [1] [Cycle 1]: 6.529e-05, [2] [tag_attr]: 2e-05 [meta_addattr_fg_expand]: 4.33999e-06 [parallel-infer-symbol]: 3.6e-06 [pre_auto_parallel]: 3.405e-05 [insert-virtual-dataset]: 2.71999e-06 [parallel-infer-symbol-second]: 9.70002e-07 [dataset_repeat_opt]: 2.48e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00587379, [53] [py_interpret_to_execute]: 2.346e-05 [rewriter_before_opt_a]: 7.439e-05 [opt_a]: 0.00351251, [2] [Cycle 1]: 0.00281107, [45] [expand_dump_flag]: 3.13e-06 [switch_simplify]: 3.525e-05 [loop_unroll]: 2.046e-05 [a_1]: 0.00127457 [with_stream_mark]: 2.187e-05 [recompute_prepare]: 1.102e-05 [updatestate_depend_eliminate]: 4.35999e-06 [updatestate_assign_eliminate]: 4.19002e-06 [updatestate_loads_eliminate]: 3.35e-06 [parameter_eliminate]: 2.05002e-06 [a_2]: 8.427e-05 [accelerated_algorithm]: 6.75002e-06 [shard]: 2.43e-06 [meta_shard_fg_expand]: 2.15002e-06 [shard_inline]: 6.53e-06 [merge_send_recv]: 9.69e-06 [auto_parallel]: 9.51e-06 [parallel]: 3.121e-05 [flash_sp]: 1.12e-05 [merge_comm]: 4.26001e-06 [allreduce_fusion]: 3.75e-06 [matmul_add_comm_reduction]: 1.001e-05 [allreduce_slice_to_reducescatter]: 7.90023e-07 [virtual_shard_identity]: 7.9e-06 [virtual_dataset]: 6.26e-06 [get_grad_eliminate_]: 5.64998e-06 [virtual_output]: 6.12001e-06 [merge_forward]: 4.42e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 1.093e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.199e-05 [merge_recompute_call_nodes]: 1.74998e-06 
[before_grad]: 1.081e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86001e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 2.48e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 1.096e-05 [a_after_grad]: 9.04998e-06 [renormalize]: 0.00074371 [add_forward_monad_depend]: 1.567e-05 [auto_monad_grad]: 2.42001e-06 [auto_monad_eliminator]: 1.669e-05 [cse]: 3.597e-05 [a_3]: 5.148e-05 [Cycle 2]: 0.00068733, [45] [expand_dump_flag]: 2.36e-06 [switch_simplify]: 8.96998e-06 [loop_unroll]: 6.10002e-06 [a_1]: 0.00012872 [with_stream_mark]: 1.657e-05 [recompute_prepare]: 6.62002e-06 [updatestate_depend_eliminate]: 3.44001e-06 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 3.78999e-06 [parameter_eliminate]: 1.26997e-06 [a_2]: 7.96e-05 [accelerated_algorithm]: 6.80998e-06 [shard]: 1.49998e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 6.46999e-06 [merge_send_recv]: 7.53999e-06 [auto_parallel]: 8.21002e-06 [parallel]: 7.97e-06 [flash_sp]: 4.55999e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.09999e-06 [matmul_add_comm_reduction]: 8.20999e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.88998e-06 [virtual_dataset]: 5.33002e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 3.65998e-06 [cell_reuse_recompute_pass]: 2.16e-06 [offload_activation]: 9.00001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.196e-05 [merge_recompute_call_nodes]: 1.64998e-06 [before_grad]: 9.49e-06 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 1.30001e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 9.99001e-06 [a_after_grad]: 7.99002e-06 [renormalize]: 5.9983e-08 [add_forward_monad_depend]: 1.81e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 7.58999e-06 [cse]: 1.753e-05 [a_3]: 3.333e-05 [py_interpret_to_execute_after_opt_a]: 1.349e-05 [slice_cell_reuse_recomputed_activation]: 2.37999e-06 
[rewriter_after_opt_a]: 3.806e-05 [convert_after_rewriter]: 7.32002e-06 [order_py_execute_after_rewriter]: 5.89e-06 [mutable_eliminate]: 0.00072953 [opt_b]: 0.00021342, [1] [Cycle 1]: 0.00020321, [7] [b_1]: 0.00011808 [b_2]: 8.13999e-06 [updatestate_depend_eliminate]: 6.79999e-06 [updatestate_assign_eliminate]: 2.74001e-06 [updatestate_loads_eliminate]: 2.75002e-06 [renormalize]: 8.49977e-07 [cse]: 2.501e-05 [optimize_parallel_all_gather_comm]: 2.162e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 3.295e-05 [loop_unroll]: 0.00047676 [opt_after_cconv]: 0.00010746, [1] [Cycle 1]: 9.962e-05, [7] [c_1]: 2.799e-05 [parameter_eliminate]: 3.53999e-06 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 3.00002e-06 [updatestate_loads_eliminate]: 2.59999e-06 [cse]: 2.018e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.803e-05 [tuple_transform]: 7.321e-05, [1] [Cycle 1]: 6.846e-05, [4] [d_1]: 4.049e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.56e-06 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 5.891e-05 [cse_after_recomputation]: 2.245e-05, [1] [Cycle 1]: 1.708e-05, [1] [cse]: 1.156e-05 [environ_conv]: 1.004e-05 [swap_dp_allreduce_reducescatter]: 5.71e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 4.90001e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.66e-06 [slice_recompute_activation]: 2.54001e-06 [micro_interleaved_order_control]: 2.71999e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.84999e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.20999e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.395e-05 
[grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 4.28999e-06 [overlap_recompute_and_grad_model_parallel]: 5.40001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.68e-06 [overlap_grad_ring_attention]: 4.46002e-06 [overlap_grad_flash_sp]: 2.247e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 7.782e-05, [1] [Cycle 1]: 7.265e-05, [6] [build]: 4.02e-06 [elim_shapecalc]: 9.59e-06 [elim_not_effective]: 1.261e-05 [opt_reshape]: 6.60002e-06 [fold_const_symbol]: 1.032e-05 [renormalize]: 2.09984e-07 [detach_backward]: 2.39001e-06 [pipeline_parallel_scheduler]: 1.97999e-06 [auto_monad_reorder]: 1.772e-05 [get_jit_bprop_graph]: 2.02001e-06 [rewriter_after_jit_bprop_graph]: 3.97e-06 [opt_after_jit_grad]: 0.00050523 [validate]: 4.323e-05 [backend_pass]: 1.27999e-06 [task_emit]: 10.32 [execute]: 9.87001e-06 Sums bootstrap : 0.000496s : 0.00% type_inference : 0.008977s : 0.09% event_method : 0.000015s : 0.00% auto_monad : 0.000065s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000034s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.00% optimize.rewriter_before_opt_a : 0.000074s : 0.00% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000044s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.001403s : 0.01% optimize.opt_a.with_stream_mark : 0.000038s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000018s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000164s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000017s : 0.00% optimize.opt_a.auto_parallel : 0.000018s : 0.00% optimize.opt_a.parallel : 0.000039s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000744s : 0.01% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.00% optimize.opt_a.cse : 0.000054s : 0.00% optimize.opt_a.a_3 : 0.000085s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000038s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000730s : 0.01% optimize.opt_b.b_1 : 0.000118s : 0.00% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000033s : 0.00% optimize.loop_unroll : 0.000477s : 0.00% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000018s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.00% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000010s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000505s : 0.00% validate : 0.000043s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 10.319955s : 99.85% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000247 26 17.58% : 0.000043s : 5: substitution.arithmetic_simplify 0.91% : 0.000002s : 2: substitution.elim_not_effective 0.69% : 0.000002s : 2: substitution.fold_const_symbol 2.38% : 0.000006s : 3: substitution.graph_param_transform 68.63% : 0.000169s : 3: substitution.inline 1.59% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.19% : 0.000005s : 4: substitution.remove_not_recompute_node 1.83% : 0.000005s : 2: substitution.replace_old_param 4.20% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.008913 2 91.06% : 0.008116s : 1: type_inference.infer 8.94% : 0.000797s : 1: type_inference.specialize ------[replace.] 0.000052 4 79.39% : 0.000041s : 3: replace.inline 20.61% : 0.000011s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000177 4 94.65% : 0.000167s : 3: match.inline 5.35% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000177 883 0.86% : 0.000002s : 9: predicate.accumulaten_eliminater 0.92% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000002s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.36% : 0.000004s : 15: predicate.arithmetic_simplify 0.93% : 0.000002s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000002s : 9: predicate.dict_set_item_eliminator 0.95% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.18% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.01% : 0.000002s : 12: predicate.environ_get_depend_swap 1.62% : 0.000003s : 18: predicate.environ_get_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.59% : 0.000005s : 13: predicate.float_depend_g_call 0.51% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.21% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.52% : 0.000001s : 6: predicate.incorporate_call_switch 6.37% : 0.000011s : 40: predicate.inline 0.81% : 0.000001s : 6: predicate.inline_without_move 0.46% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 6: predicate.less_batch_normalization 1.49% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 25: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.98% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.95% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 9: predicate.minmaximum_grad 1.16% : 0.000002s : 3: predicate.mutable_eliminate 0.33% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 2.61% : 0.000005s : 13: predicate.partial_defer_inline 1.34% : 0.000002s : 13: predicate.partial_eliminate 0.84% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.59% : 0.000003s : 9: predicate.reduce_eliminate 2.21% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.60% : 0.000001s : 6: predicate.remove_not_recompute_node 1.19% : 0.000002s : 16: predicate.replace_applicator 0.71% : 0.000001s : 6: predicate.replace_old_param 0.36% : 0.000001s : 3: predicate.reset_defer_inline 0.93% : 0.000002s : 9: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.80% : 0.000001s : 6: predicate.same_eliminate 0.41% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.94% : 0.000002s : 6: predicate.shard_identity_eliminate 0.89% : 0.000002s : 6: predicate.special_op_eliminate 0.80% : 0.000001s : 6: predicate.specialize_transform 1.13% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 13: predicate.switch_defer_inline 1.87% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.79% : 0.000008s : 43: predicate.switch_simplify 1.46% : 
0.000003s : 9: predicate.tile_eliminate 0.89% : 0.000002s : 9: predicate.transpose_eliminate 1.42% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.90% : 0.000007s : 22: predicate.tuple_list_get_item_eliminator 1.30% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.53% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.13% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.94% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.63% : 0.000001s : 6: predicate.virtual_output_eliminate 0.28% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000507 8 39.75% : 0.000202s : 3: func_graph_cloner_run.FuncGraphClonerGraph 60.25% : 0.000306s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
10.353945 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.04% : 0.004565s : 1: add_attr 0.04% : 0.004547s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000071s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.01% : 0.000527s : 1: bootstrap 0.00% : 0.000037s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000017s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000014s : 1: environ_conv 0.00% : 0.000022s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.00% : 0.000487s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.01% : 0.000743s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000019s : 1: opt.transform.mutable_eliminate 0.02% : 0.001804s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000095s : 28: opt.transform.opt_b 0.00% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.03% : 0.003516s : 1: opt_a 0.00% : 0.000111s : 1: opt_after_cconv 0.00% : 0.000515s : 1: opt_after_jit_grad 0.00% : 0.000218s : 1: opt_b 0.06% : 0.005879s : 1: optimize 0.00% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000038s : 1: pre_auto_parallel 0.00% : 0.000028s : 1: py_interpret_to_execute 0.00% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000022s : 1: remove_dup_value 0.00% : 0.000366s : 1: renormalize.infer 0.00% : 0.000369s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000043s : 1: rewriter_after_opt_a 0.00% : 0.000079s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000081s : 1: symbol_engine_optimizer 99.67% : 10.319982s : 1: task_emit 0.00% : 0.000076s : 1: 
tuple_transform 0.09% : 0.008997s : 1: type_inference 0.00% : 0.000070s : 1: validate TotalTime = 0.0979126, [24] [bootstrap]: 0.00046326 [type_inference]: 0.00630784 [event_method]: 1.302e-05 [auto_monad]: 6.094e-05 [graph_reusing]: 6.23998e-06 [inline]: 2.44001e-06 [add_attr]: 0.00315025, [1] [add_attr_with_inline]: 0.0031413, [1] [Cycle 1]: 4.904e-05, [2] [tag_attr]: 1.478e-05 [meta_addattr_fg_expand]: 3.97002e-06 [parallel-infer-symbol]: 3.3e-06 [pre_auto_parallel]: 2.598e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 8.60018e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.97999e-06 [optimize]: 0.00405492, [53] [py_interpret_to_execute]: 2.084e-05 [rewriter_before_opt_a]: 5.241e-05 [opt_a]: 0.00211737, [2] [Cycle 1]: 0.00150032, [45] [expand_dump_flag]: 3.20002e-06 [switch_simplify]: 3e-05 [loop_unroll]: 1.691e-05 [a_1]: 0.00035874 [with_stream_mark]: 1.573e-05 [recompute_prepare]: 7.63999e-06 [updatestate_depend_eliminate]: 3.75998e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.76998e-06 [a_2]: 8.044e-05 [accelerated_algorithm]: 6.96001e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 6.36e-06 [merge_send_recv]: 9.13002e-06 [auto_parallel]: 6.16e-06 [parallel]: 1.938e-05 [flash_sp]: 7.27997e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.95998e-06 [matmul_add_comm_reduction]: 9.07999e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.56999e-06 [virtual_dataset]: 5.84999e-06 [get_grad_eliminate_]: 5.66e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.97001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.185e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 9.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.16001e-06 [meta_fg_expand]: 2.92002e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.27001e-06 
[after_resolve]: 9.19e-06 [a_after_grad]: 8.57998e-06 [renormalize]: 0.00047401 [add_forward_monad_depend]: 4.62e-06 [auto_monad_grad]: 2.36998e-06 [auto_monad_eliminator]: 1.475e-05 [cse]: 3.145e-05 [a_3]: 4.2e-05 [Cycle 2]: 0.00060692, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 7.00002e-06 [loop_unroll]: 5.67001e-06 [a_1]: 0.00011548 [with_stream_mark]: 1.27e-05 [recompute_prepare]: 5.82001e-06 [updatestate_depend_eliminate]: 3.45e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 7.032e-05 [accelerated_algorithm]: 5.74e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.35999e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 5.07e-06 [auto_parallel]: 5.35001e-06 [parallel]: 4.30999e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 2.98e-06 [matmul_add_comm_reduction]: 5.09003e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.27001e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 4.96997e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 6.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.22998e-06 [a_after_grad]: 7.78001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.42e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.84001e-06 [cse]: 1.571e-05 [a_3]: 3.27e-05 [py_interpret_to_execute_after_opt_a]: 8.22998e-06 [slice_cell_reuse_recomputed_activation]: 2.46e-06 [rewriter_after_opt_a]: 3.367e-05 [convert_after_rewriter]: 6.16e-06 [order_py_execute_after_rewriter]: 4.63001e-06 [mutable_eliminate]: 0.00051261 [opt_b]: 0.00018996, [1] [Cycle 1]: 0.00018364, [7] [b_1]: 0.00011153 [b_2]: 
7.62002e-06 [updatestate_depend_eliminate]: 5.82001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.4002e-07 [cse]: 1.836e-05 [optimize_parallel_all_gather_comm]: 1.675e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.433e-05 [loop_unroll]: 0.00042611 [opt_after_cconv]: 9.734e-05, [1] [Cycle 1]: 9.113e-05, [7] [c_1]: 2.634e-05 [parameter_eliminate]: 2.45002e-06 [updatestate_depend_eliminate]: 5.34003e-06 [updatestate_assign_eliminate]: 2.73998e-06 [updatestate_loads_eliminate]: 2.39999e-06 [cse]: 1.77e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.586e-05 [tuple_transform]: 6.887e-05, [1] [Cycle 1]: 6.434e-05, [4] [d_1]: 3.749e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.58998e-06 [partial_unused_args_eliminate]: 1.61998e-06 [add_recomputation]: 4.005e-05 [cse_after_recomputation]: 2.094e-05, [1] [Cycle 1]: 1.664e-05, [1] [cse]: 1.139e-05 [environ_conv]: 5.72001e-06 [swap_dp_allreduce_reducescatter]: 5.12999e-06 [bias_add_comm_swap]: 2.58003e-06 [label_micro_interleaved_index]: 4.35999e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 8.70001e-07 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09003e-06 [overlap_opt_shard_in_pipeline]: 1.59998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.90001e-06 [control_data_broadcast_order]: 1.346e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 4.18001e-06 [overlap_recompute_and_grad_model_parallel]: 5.03002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.82999e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 4.44002e-06 [overlap_grad_flash_sp]: 1.843e-05 [begin_end_overlap_inline]: 7.00005e-07 [split_matmul_comm_elemetwise]: 2.78e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 7.079e-05, [1] [Cycle 1]: 6.649e-05, [6] [build]: 2.93998e-06 [elim_shapecalc]: 8.55001e-06 [elim_not_effective]: 1.167e-05 [opt_reshape]: 5.90002e-06 [fold_const_symbol]: 9.32001e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.51002e-06 [auto_monad_reorder]: 1.63e-05 [get_jit_bprop_graph]: 1.79998e-06 [rewriter_after_jit_bprop_graph]: 3.85e-06 [opt_after_jit_grad]: 0.00046425 [validate]: 3.891e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.0830518 [execute]: 1.033e-05 Sums bootstrap : 0.000463s : 0.49% type_inference : 0.006308s : 6.73% event_method : 0.000013s : 0.01% auto_monad : 0.000061s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000052s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000037s : 0.04% optimize.opt_a.loop_unroll : 0.000023s : 0.02% optimize.opt_a.a_1 : 0.000474s : 0.51% optimize.opt_a.with_stream_mark : 0.000028s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000151s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000017s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000474s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000047s : 0.05% optimize.opt_a.a_3 : 0.000075s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000513s : 0.55% optimize.opt_b.b_1 : 0.000112s : 0.12% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000426s : 0.45% optimize.opt_after_cconv.c_1 : 0.000026s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000037s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000040s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000464s : 0.50% validate : 0.000039s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.083052s : 88.60% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000147 24 19.78% : 0.000029s : 4: substitution.arithmetic_simplify 1.28% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 3.62% : 0.000005s : 3: substitution.graph_param_transform 66.79% : 0.000098s : 3: substitution.inline 2.32% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.13% : 0.000005s : 4: substitution.remove_not_recompute_node 2.12% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006259 2 92.23% : 0.005773s : 1: type_inference.infer 7.77% : 0.000486s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000096 3 100.00% : 0.000096s : 3: match.inline ------[predicate.] 
0.000150 815 0.83% : 0.000001s : 8: predicate.accumulaten_eliminater 0.95% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 14: predicate.arithmetic_simplify 1.06% : 0.000002s : 8: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.98% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.75% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.14% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.37% : 0.000001s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.43% : 0.000010s : 37: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.56% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 6: predicate.less_batch_normalization 1.52% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 22: predicate.load_eliminater 1.10% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.93% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.22% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.59% : 0.000001s : 3: predicate.parallel_virtual_node 1.49% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 11: predicate.partial_eliminate 0.82% : 0.000001s : 8: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.41% : 0.000002s : 8: predicate.reduce_eliminate 2.18% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 14: predicate.replace_applicator 0.69% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 8: predicate.reshape_eliminate 0.81% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 1.05% : 0.000002s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.93% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.21% : 0.000002s : 11: predicate.switch_defer_inline 1.84% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.78% : 0.000007s : 38: predicate.switch_simplify 1.08% : 
0.000002s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.58% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.51% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.93% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.54% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.93% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.60% : 0.000001s : 3: predicate.value_based_eliminate 0.87% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000309 7 39.97% : 0.000123s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.03% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.106575 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.96% : 0.003155s : 1: add_attr 2.95% : 0.003145s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000044s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000066s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.47% : 0.000500s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000009s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000018s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.49% : 0.000523s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.78% : 0.000837s : 78: opt.transform.opt_a 0.02% : 0.000025s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.99% : 0.002120s : 1: opt_a 0.09% : 0.000101s : 1: opt_after_cconv 0.44% : 0.000474s : 1: opt_after_jit_grad 0.18% : 0.000193s : 1: opt_b 3.81% : 0.004059s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.24% : 0.000254s : 1: renormalize.infer 0.20% : 0.000213s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000074s : 1: symbol_engine_optimizer 77.95% : 0.083075s : 1: task_emit 0.07% : 0.000072s : 1: 
tuple_transform 5.94% : 0.006325s : 1: type_inference 0.06% : 0.000065s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x6-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x7-pynative],max_mem:6.0M TotalTime = 0.0230848, [24] [bootstrap]: 0.00049743 [type_inference]: 0.00647823 [event_method]: 1.417e-05 [auto_monad]: 6.058e-05 [graph_reusing]: 5.74999e-06 [inline]: 2.54001e-06 [add_attr]: 0.0039442, [1] [add_attr_with_inline]: 0.00393069, [1] [Cycle 1]: 6.175e-05, [2] [tag_attr]: 1.773e-05 [meta_addattr_fg_expand]: 4.3e-06 [parallel-infer-symbol]: 3.91999e-06 [pre_auto_parallel]: 3.115e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.41998e-06 [pipeline_split]: 2.04999e-06 [optimize]: 0.00466039, [53] [py_interpret_to_execute]: 2.335e-05 [rewriter_before_opt_a]: 6.757e-05 [opt_a]: 0.00259439, [2] [Cycle 1]: 0.0018688, [45] [expand_dump_flag]: 2.67001e-06 [switch_simplify]: 3.527e-05 [loop_unroll]: 2.082e-05 [a_1]: 0.00047671 [with_stream_mark]: 1.6e-05 [recompute_prepare]: 9.85002e-06 [updatestate_depend_eliminate]: 4.46002e-06 [updatestate_assign_eliminate]: 4.25999e-06 [updatestate_loads_eliminate]: 3.59002e-06 [parameter_eliminate]: 1.96003e-06 [a_2]: 8.293e-05 [accelerated_algorithm]: 7.77e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 2.16e-06 [shard_inline]: 6.35002e-06 [merge_send_recv]: 9.51e-06 [auto_parallel]: 7.3e-06 [parallel]: 3.018e-05 [flash_sp]: 9.34998e-06 [merge_comm]: 5.05999e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 9.91e-06 [allreduce_slice_to_reducescatter]: 9.50007e-07 [virtual_shard_identity]: 8.76002e-06 [virtual_dataset]: 6.42001e-06 [get_grad_eliminate_]: 6.30002e-06 
[virtual_output]: 6.61e-06 [merge_forward]: 4.54998e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.104e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.373e-05 [merge_recompute_call_nodes]: 1.64998e-06 [before_grad]: 1.063e-05 [set_forward_comm_id_for_comm_node_pass]: 4.35999e-06 [meta_fg_expand]: 3.23e-06 [flash_sp_send_recv_attached]: 2.69999e-06 [receive_attached]: 2.11e-06 [after_resolve]: 1.063e-05 [a_after_grad]: 9.15999e-06 [renormalize]: 0.00063513 [add_forward_monad_depend]: 1.241e-05 [auto_monad_grad]: 2.69999e-06 [auto_monad_eliminator]: 1.568e-05 [cse]: 3.381e-05 [a_3]: 4.452e-05 [Cycle 2]: 0.00071413, [45] [expand_dump_flag]: 2.04e-06 [switch_simplify]: 8.3e-06 [loop_unroll]: 5.75001e-06 [a_1]: 0.00017517 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.18998e-06 [updatestate_depend_eliminate]: 3.65e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.83998e-06 [parameter_eliminate]: 1.25001e-06 [a_2]: 7.355e-05 [accelerated_algorithm]: 6.51e-06 [shard]: 9.99979e-07 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 5.30999e-06 [auto_parallel]: 6.68e-06 [parallel]: 5.51e-06 [flash_sp]: 3.56999e-06 [merge_comm]: 3.38999e-06 [allreduce_fusion]: 2.95002e-06 [matmul_add_comm_reduction]: 5.52999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.54001e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.32999e-06 [merge_forward]: 3.43e-06 [cell_reuse_recompute_pass]: 1.54998e-06 [offload_activation]: 7.42002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.143e-05 [merge_recompute_call_nodes]: 1.25999e-06 [before_grad]: 9.24e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.39e-06 [after_resolve]: 1.035e-05 [a_after_grad]: 8.25e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.57001e-06 [auto_monad_grad]: 
1.40001e-06 [auto_monad_eliminator]: 9.29e-06 [cse]: 2.234e-05 [a_3]: 3.444e-05 [py_interpret_to_execute_after_opt_a]: 1.03e-05 [slice_cell_reuse_recomputed_activation]: 2.35002e-06 [rewriter_after_opt_a]: 3.763e-05 [convert_after_rewriter]: 6.54999e-06 [order_py_execute_after_rewriter]: 4.85001e-06 [mutable_eliminate]: 0.00057069 [opt_b]: 0.00019747, [1] [Cycle 1]: 0.00019053, [7] [b_1]: 0.00011176 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 6.56e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.71999e-06 [renormalize]: 5.69999e-07 [cse]: 2.202e-05 [optimize_parallel_all_gather_comm]: 1.722e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.416e-05 [loop_unroll]: 0.00042937 [opt_after_cconv]: 9.929e-05, [1] [Cycle 1]: 9.284e-05, [7] [c_1]: 2.648e-05 [parameter_eliminate]: 2.89001e-06 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 1.791e-05 [renormalize]: 5.10016e-07 [remove_dup_value]: 1.579e-05 [tuple_transform]: 7.25e-05, [1] [Cycle 1]: 6.782e-05, [4] [d_1]: 3.974e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.89999e-06 [partial_unused_args_eliminate]: 2.02999e-06 [add_recomputation]: 5.107e-05 [cse_after_recomputation]: 2.191e-05, [1] [Cycle 1]: 1.738e-05, [1] [cse]: 1.169e-05 [environ_conv]: 8.18999e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 3.44001e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.49999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.19e-06 
[interleave_parallel_branches]: 1.24998e-06 [overlap_opt_shard_in_pipeline]: 1.21002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.308e-05 [grouped_pairwise_exchange_alltoall]: 1.72999e-06 [offloading_packed_experts]: 3.97998e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 4.49002e-06 [overlap_grad_flash_sp]: 1.956e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.58e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.07998e-06 [symbol_engine_optimizer]: 7.194e-05, [1] [Cycle 1]: 6.742e-05, [6] [build]: 2.79999e-06 [elim_shapecalc]: 9.49e-06 [elim_not_effective]: 1.218e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 9.30001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.62e-05 [get_jit_bprop_graph]: 1.60001e-06 [rewriter_after_jit_bprop_graph]: 3.7e-06 [opt_after_jit_grad]: 0.00046202 [validate]: 4e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00663489 [execute]: 7.47002e-06 Sums bootstrap : 0.000497s : 2.75% type_inference : 0.006478s : 35.84% event_method : 0.000014s : 0.08% auto_monad : 0.000061s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000031s : 0.17% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.13% optimize.rewriter_before_opt_a : 0.000068s : 0.37% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% 
optimize.opt_a.switch_simplify : 0.000044s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000652s : 3.61% optimize.opt_a.with_stream_mark : 0.000030s : 0.16% optimize.opt_a.recompute_prepare : 0.000017s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000156s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000015s : 0.08% optimize.opt_a.auto_parallel : 0.000014s : 0.08% optimize.opt_a.parallel : 0.000036s : 0.20% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000018s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000020s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.12% 
optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000635s : 3.51% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.08% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.14% optimize.opt_a.cse : 0.000056s : 0.31% optimize.opt_a.a_3 : 0.000079s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000038s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000571s : 3.16% optimize.opt_b.b_1 : 0.000112s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000022s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.13% optimize.loop_unroll : 0.000429s : 2.38% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000008s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000462s : 2.56% validate : 0.000040s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006635s : 36.71% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000190 26 19.79% : 0.000038s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.31% : 0.000006s : 3: substitution.graph_param_transform 63.80% : 0.000121s : 3: substitution.inline 1.90% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.39% : 0.000005s : 4: substitution.remove_not_recompute_node 2.15% : 0.000004s : 2: substitution.replace_old_param 4.80% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006430 2 90.19% : 0.005799s : 1: type_inference.infer 9.81% : 0.000630s : 1: type_inference.specialize ------[replace.] 0.000042 4 77.61% : 0.000033s : 3: replace.inline 22.39% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000127 4 93.45% : 0.000119s : 3: match.inline 6.55% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 883 0.92% : 0.000002s : 9: predicate.accumulaten_eliminater 0.85% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000004s : 15: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.59% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.93% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.97% : 0.000002s : 9: predicate.dict_set_item_eliminator 0.94% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.72% : 0.000003s : 18: predicate.environ_get_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.47% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.81% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.65% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.58% : 0.000011s : 40: predicate.inline 0.89% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.75% : 0.000001s : 6: predicate.less_batch_normalization 1.72% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 25: predicate.load_eliminater 1.12% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.08% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.14% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.42% : 0.000001s : 3: predicate.parallel_virtual_node 1.78% : 0.000003s : 13: predicate.partial_defer_inline 1.41% : 0.000002s : 13: predicate.partial_eliminate 1.01% : 0.000002s : 9: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.14% : 0.000002s : 9: predicate.reduce_eliminate 2.26% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 6: predicate.remove_not_recompute_node 1.27% : 0.000002s : 16: predicate.replace_applicator 0.71% : 0.000001s : 6: predicate.replace_old_param 0.26% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000002s : 9: predicate.reshape_eliminate 0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.82% : 0.000001s : 6: predicate.same_eliminate 0.43% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 0.89% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 1.19% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 13: predicate.switch_defer_inline 1.90% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.80% : 0.000008s : 43: predicate.switch_simplify 0.93% : 
0.000002s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.46% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.03% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000380 8 41.38% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.62% : 0.000223s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.033452 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.81% : 0.003949s : 1: add_attr 11.76% : 0.003935s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000066s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.57% : 0.000525s : 1: bootstrap 0.08% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000012s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.31% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000581s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.96% : 0.000989s : 78: opt.transform.opt_a 0.07% : 0.000025s : 1: opt.transform.opt_after_cconv 0.06% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000090s : 28: opt.transform.opt_b 0.13% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.77% : 0.002598s : 1: opt_a 0.31% : 0.000103s : 1: opt_after_cconv 1.41% : 0.000472s : 1: opt_after_jit_grad 0.60% : 0.000201s : 1: opt_b 13.94% : 0.004665s : 1: optimize 0.06% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000036s : 1: pre_auto_parallel 0.08% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 1.00% : 0.000335s : 1: renormalize.infer 0.87% : 0.000290s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000044s : 1: rewriter_after_opt_a 0.21% : 0.000072s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000075s : 1: symbol_engine_optimizer 19.87% : 0.006648s : 1: task_emit 0.23% : 0.000076s : 1: 
tuple_transform 19.42% : 0.006496s : 1: type_inference 0.20% : 0.000068s : 1: validate TotalTime = 0.0215634, [24] [bootstrap]: 0.00046328 [type_inference]: 0.00626475 [event_method]: 1.316e-05 [auto_monad]: 6.128e-05 [graph_reusing]: 5.52999e-06 [inline]: 2.29999e-06 [add_attr]: 0.00314258, [1] [add_attr_with_inline]: 0.00313323, [1] [Cycle 1]: 5.43e-05, [2] [tag_attr]: 1.512e-05 [meta_addattr_fg_expand]: 4.53999e-06 [parallel-infer-symbol]: 3.48e-06 [pre_auto_parallel]: 2.774e-05 [insert-virtual-dataset]: 3.2e-06 [parallel-infer-symbol-second]: 9.89996e-07 [dataset_repeat_opt]: 2.36998e-06 [pipeline_split]: 1.85001e-06 [optimize]: 0.00425945, [53] [py_interpret_to_execute]: 2.29e-05 [rewriter_before_opt_a]: 5.282e-05 [opt_a]: 0.00227356, [2] [Cycle 1]: 0.00164779, [45] [expand_dump_flag]: 3.06999e-06 [switch_simplify]: 2.942e-05 [loop_unroll]: 1.778e-05 [a_1]: 0.00038203 [with_stream_mark]: 1.83e-05 [recompute_prepare]: 7.64002e-06 [updatestate_depend_eliminate]: 4.22e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.13998e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 8.09e-05 [accelerated_algorithm]: 6.71999e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 9.12999e-06 [auto_parallel]: 6.38998e-06 [parallel]: 1.953e-05 [flash_sp]: 9.05999e-06 [merge_comm]: 3.97998e-06 [allreduce_fusion]: 3.55e-06 [matmul_add_comm_reduction]: 9.02999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 8.02003e-06 [virtual_dataset]: 6.33e-06 [get_grad_eliminate_]: 5.84e-06 [virtual_output]: 5.96998e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 1.128e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.165e-05 [merge_recompute_call_nodes]: 2.02999e-06 [before_grad]: 9.86998e-06 [set_forward_comm_id_for_comm_node_pass]: 4e-06 [meta_fg_expand]: 2.68e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 2.04e-06 
[after_resolve]: 9.47001e-06 [a_after_grad]: 8.45999e-06 [renormalize]: 0.00058889 [add_forward_monad_depend]: 4.94003e-06 [auto_monad_grad]: 2.04999e-06 [auto_monad_eliminator]: 1.572e-05 [cse]: 3.167e-05 [a_3]: 4.26e-05 [Cycle 2]: 0.00061538, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 7e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00011611 [with_stream_mark]: 1.109e-05 [recompute_prepare]: 6.33e-06 [updatestate_depend_eliminate]: 3.22002e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 7.157e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.95001e-06 [auto_parallel]: 6.02999e-06 [parallel]: 5.15001e-06 [flash_sp]: 4.27e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.43e-06 [matmul_add_comm_reduction]: 8.32e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 6.41e-06 [virtual_dataset]: 5.72001e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 5.12e-06 [merge_forward]: 3.09999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 7.16001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.078e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.65999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.52002e-06 [meta_fg_expand]: 1.97001e-06 [flash_sp_send_recv_attached]: 8.40024e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.49998e-06 [a_after_grad]: 7.94002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.66999e-06 [cse]: 1.466e-05 [a_3]: 3.28e-05 [py_interpret_to_execute_after_opt_a]: 8.00999e-06 [slice_cell_reuse_recomputed_activation]: 2.39001e-06 [rewriter_after_opt_a]: 3.731e-05 [convert_after_rewriter]: 6.69999e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.00053103 [opt_b]: 0.00019097, [1] [Cycle 1]: 0.00018415, [7] [b_1]: 
0.00011063 [b_2]: 7.01001e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 4.19997e-07 [cse]: 1.941e-05 [optimize_parallel_all_gather_comm]: 1.701e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.561e-05 [loop_unroll]: 0.00043155 [opt_after_cconv]: 9.85e-05, [1] [Cycle 1]: 9.264e-05, [7] [c_1]: 2.622e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.78998e-06 [cse]: 1.825e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.62e-05 [tuple_transform]: 6.997e-05, [1] [Cycle 1]: 6.51e-05, [4] [d_1]: 3.833e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 2.21e-06 [add_recomputation]: 4.668e-05 [cse_after_recomputation]: 2.168e-05, [1] [Cycle 1]: 1.71e-05, [1] [cse]: 1.163e-05 [environ_conv]: 5.97001e-06 [swap_dp_allreduce_reducescatter]: 5.32999e-06 [bias_add_comm_swap]: 2.93e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.55999e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.91999e-06 [assign_add_opt]: 1.43002e-06 [ForceFp32Comm]: 1.25001e-06 [remove_cast_before_assign_add]: 1.16002e-06 [full_micro_interleaved_order_control]: 2.41998e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.281e-05 [grouped_pairwise_exchange_alltoall]: 2.17999e-06 [offloading_packed_experts]: 3.78001e-06 [overlap_recompute_and_grad_model_parallel]: 5.10999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 4.25e-06 [overlap_grad_flash_sp]: 1.846e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.46e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 7.263e-05, [1] [Cycle 1]: 6.823e-05, [6] [build]: 2.69001e-06 [elim_shapecalc]: 8.80999e-06 [elim_not_effective]: 1.245e-05 [opt_reshape]: 6.36e-06 [fold_const_symbol]: 9.46e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.86e-06 [auto_monad_reorder]: 7.66e-05 [get_jit_bprop_graph]: 2.06e-06 [rewriter_after_jit_bprop_graph]: 4.05e-06 [opt_after_jit_grad]: 0.00047662 [validate]: 4.224e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00645779 [execute]: 9.79999e-06 Sums bootstrap : 0.000463s : 2.67% type_inference : 0.006265s : 36.04% event_method : 0.000013s : 0.08% auto_monad : 0.000061s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.13% optimize.rewriter_before_opt_a : 0.000053s : 0.30% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.21% optimize.opt_a.loop_unroll : 0.000023s : 0.13% optimize.opt_a.a_1 : 0.000498s : 2.87% optimize.opt_a.with_stream_mark : 0.000029s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000152s : 0.88% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000025s : 0.14% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000018s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.10% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000589s : 3.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.13% optimize.opt_a.cse : 0.000046s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000531s : 3.06% optimize.opt_b.b_1 : 0.000111s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.15% optimize.loop_unroll : 0.000432s : 2.48% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000077s : 0.44% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000477s : 2.74% validate : 0.000042s : 0.24% backend_pass : 0.000001s : 0.01% task_emit : 0.006458s : 37.15% execute : 0.000010s : 0.06% Time group info: ------[substitution.] 0.000163 24 18.55% : 0.000030s : 4: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.71% : 0.000006s : 3: substitution.graph_param_transform 69.07% : 0.000113s : 3: substitution.inline 1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.79% : 0.000005s : 4: substitution.remove_not_recompute_node 2.06% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006215 2 91.50% : 0.005687s : 1: type_inference.infer 8.50% : 0.000528s : 1: type_inference.specialize ------[replace.] 0.000030 3 100.00% : 0.000030s : 3: replace.inline ------[match.] 0.000111 3 100.00% : 0.000111s : 3: match.inline ------[predicate.] 
0.000151 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 1.13% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 14: predicate.arithmetic_simplify 0.85% : 0.000001s : 8: predicate.cast_eliminate 0.73% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.60% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.42% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.74% : 0.000003s : 17: predicate.environ_get_eliminate 1.13% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.27% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.79% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.64% : 0.000001s : 6: predicate.incorporate_call_switch 6.11% : 0.000009s : 37: predicate.inline 0.97% : 0.000001s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 6: predicate.less_batch_normalization 1.58% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 22: predicate.load_eliminater 1.13% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.24% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.37% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.54% : 0.000001s : 3: predicate.parallel_virtual_node 1.43% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 11: predicate.partial_eliminate 0.85% : 0.000001s : 8: predicate.print_const_string_wrapper 0.98% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 8: predicate.reduce_eliminate 2.26% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 14: predicate.replace_applicator 0.68% : 0.000001s : 6: predicate.replace_old_param 0.31% : 0.000000s : 3: predicate.reset_defer_inline 1.13% : 0.000002s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 0.87% : 0.000001s : 6: predicate.special_op_eliminate 0.93% : 0.000001s : 6: predicate.specialize_transform 0.97% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.20% : 0.000002s : 11: predicate.switch_defer_inline 1.97% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.73% : 0.000007s : 38: predicate.switch_simplify 0.89% : 
0.000001s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.50% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.61% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.95% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 3: predicate.value_based_eliminate 0.78% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000320 7 35.34% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.66% : 0.000207s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030564 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.30% : 0.003148s : 1: add_attr 10.26% : 0.003137s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000067s : 1: auto_monad 0.26% : 0.000081s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.63% : 0.000497s : 1: bootstrap 0.10% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.05% : 0.000017s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.44% : 0.000440s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.77% : 0.000540s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.82% : 0.000863s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.45% : 0.002277s : 1: opt_a 0.33% : 0.000102s : 1: opt_after_cconv 1.59% : 0.000487s : 1: opt_after_jit_grad 0.64% : 0.000194s : 1: opt_b 13.95% : 0.004264s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 1.07% : 0.000328s : 1: renormalize.infer 0.83% : 0.000253s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000041s : 1: rewriter_after_opt_a 0.19% : 0.000057s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000075s : 1: symbol_engine_optimizer 21.20% : 0.006478s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.57% : 0.006286s : 1: type_inference 0.26% : 0.000079s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x7-kbk],max_mem:6.0M TotalTime = 0.894305, [24] [bootstrap]: 0.00061334 [type_inference]: 0.00668981 [event_method]: 1.557e-05 [auto_monad]: 6.147e-05 [graph_reusing]: 5.97001e-06 [inline]: 2.21998e-06 [add_attr]: 0.00383246, [1] [add_attr_with_inline]: 0.00381903, [1] [Cycle 1]: 4.992e-05, [2] [tag_attr]: 1.495e-05 [meta_addattr_fg_expand]: 4.74e-06 [parallel-infer-symbol]: 3.88001e-06 [pre_auto_parallel]: 3.011e-05 [insert-virtual-dataset]: 2.94001e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.00444288, [53] [py_interpret_to_execute]: 2.561e-05 [rewriter_before_opt_a]: 6.607e-05 [opt_a]: 0.0023638, [2] [Cycle 1]: 0.00171402, [45] [expand_dump_flag]: 2.92002e-06 [switch_simplify]: 3.375e-05 [loop_unroll]: 2.079e-05 [a_1]: 0.00045332 [with_stream_mark]: 1.708e-05 [recompute_prepare]: 7.93999e-06 [updatestate_depend_eliminate]: 3.91999e-06 [updatestate_assign_eliminate]: 3.81999e-06 [updatestate_loads_eliminate]: 3.16999e-06 [parameter_eliminate]: 2.17999e-06 [a_2]: 7.923e-05 [accelerated_algorithm]: 6.93e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 6.32001e-06 [merge_send_recv]: 8.20999e-06 [auto_parallel]: 7.16999e-06 [parallel]: 2.688e-05 [flash_sp]: 7.85998e-06 [merge_comm]: 4.24997e-06 [allreduce_fusion]: 3.71999e-06 [matmul_add_comm_reduction]: 9.39998e-06 [allreduce_slice_to_reducescatter]: 8.69972e-07 [virtual_shard_identity]: 7.52002e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 5.66e-06 [virtual_output]: 5.81998e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.072e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.208e-05 
[merge_recompute_call_nodes]: 1.81e-06 [before_grad]: 1.049e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86999e-06 [meta_fg_expand]: 2.48998e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 2.00002e-06 [after_resolve]: 9.49e-06 [a_after_grad]: 8.47e-06 [renormalize]: 0.00056058 [add_forward_monad_depend]: 9.15999e-06 [auto_monad_grad]: 2.87002e-06 [auto_monad_eliminator]: 1.522e-05 [cse]: 3.239e-05 [a_3]: 4.402e-05 [Cycle 2]: 0.00063905, [45] [expand_dump_flag]: 1.34998e-06 [switch_simplify]: 7.46999e-06 [loop_unroll]: 5.79e-06 [a_1]: 0.00011973 [with_stream_mark]: 1.126e-05 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 3.28998e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.93998e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 7.352e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 1.42e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 6.40002e-06 [merge_send_recv]: 5.89e-06 [auto_parallel]: 7.11001e-06 [parallel]: 4.87e-06 [flash_sp]: 4e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.04999e-06 [matmul_add_comm_reduction]: 6.29999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 7.92998e-06 [virtual_dataset]: 6.07001e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.82999e-06 [merge_forward]: 3.13e-06 [cell_reuse_recompute_pass]: 1.36002e-06 [offload_activation]: 7.45e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.036e-05 [merge_recompute_call_nodes]: 1.00001e-06 [before_grad]: 8.69e-06 [set_forward_comm_id_for_comm_node_pass]: 3.59002e-06 [meta_fg_expand]: 1.93002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.00007e-07 [after_resolve]: 9.15001e-06 [a_after_grad]: 7.75e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.57001e-06 [auto_monad_grad]: 1.34e-06 [auto_monad_eliminator]: 7.36999e-06 [cse]: 1.469e-05 [a_3]: 3.304e-05 [py_interpret_to_execute_after_opt_a]: 9.08002e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 
[rewriter_after_opt_a]: 3.582e-05 [convert_after_rewriter]: 7.11999e-06 [order_py_execute_after_rewriter]: 5.98002e-06 [mutable_eliminate]: 0.00056017 [opt_b]: 0.00019635, [1] [Cycle 1]: 0.00018905, [7] [b_1]: 0.00011024 [b_2]: 7.64002e-06 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.86e-06 [renormalize]: 5.69999e-07 [cse]: 1.937e-05 [optimize_parallel_all_gather_comm]: 1.791e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.492e-05 [loop_unroll]: 0.00045533 [opt_after_cconv]: 0.0001033, [1] [Cycle 1]: 9.685e-05, [7] [c_1]: 2.681e-05 [parameter_eliminate]: 2.91e-06 [updatestate_depend_eliminate]: 6.22001e-06 [updatestate_assign_eliminate]: 2.99999e-06 [updatestate_loads_eliminate]: 2.69999e-06 [cse]: 1.859e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.61e-05 [tuple_transform]: 7.153e-05, [1] [Cycle 1]: 6.69e-05, [4] [d_1]: 3.95e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 2.70025e-07 [switch_simplify]: 6.59999e-06 [partial_unused_args_eliminate]: 1.84998e-06 [add_recomputation]: 4.943e-05 [cse_after_recomputation]: 2.225e-05, [1] [Cycle 1]: 1.732e-05, [1] [cse]: 1.184e-05 [environ_conv]: 8.21002e-06 [swap_dp_allreduce_reducescatter]: 4.88001e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 3.27002e-06 [merge_cast_opt]: 1.60999e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.15001e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.244e-05 
[grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.93001e-06 [overlap_recompute_and_grad_model_parallel]: 4.85001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.79999e-06 [overlap_grad_ring_attention]: 4.27e-06 [overlap_grad_flash_sp]: 1.916e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.35002e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.16997e-06 [symbol_engine_optimizer]: 7.453e-05, [1] [Cycle 1]: 7.002e-05, [6] [build]: 3.19001e-06 [elim_shapecalc]: 9.17001e-06 [elim_not_effective]: 1.301e-05 [opt_reshape]: 6.38998e-06 [fold_const_symbol]: 9.89001e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.83002e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.652e-05 [get_jit_bprop_graph]: 1.71e-06 [rewriter_after_jit_bprop_graph]: 3.83999e-06 [opt_after_jit_grad]: 0.00046927 [validate]: 3.786e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.877809 [execute]: 8.57e-06 Sums bootstrap : 0.000613s : 0.07% type_inference : 0.006690s : 0.75% event_method : 0.000016s : 0.00% auto_monad : 0.000061s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000030s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000026s : 0.00% optimize.rewriter_before_opt_a : 0.000066s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000573s : 0.06% optimize.opt_a.with_stream_mark : 0.000028s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000153s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000014s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000016s : 0.00% optimize.opt_a.renormalize : 0.000561s : 0.06% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000047s : 0.01% optimize.opt_a.a_3 : 0.000077s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000560s : 0.06% optimize.opt_b.b_1 : 0.000110s : 0.01% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.00% optimize.loop_unroll : 0.000455s : 0.05% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000008s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000469s : 0.05% validate : 0.000038s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.877809s : 98.70% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000180 26 19.50% : 0.000035s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000002s : 2: substitution.fold_const_symbol 3.08% : 0.000006s : 3: substitution.graph_param_transform 64.18% : 0.000116s : 3: substitution.inline 1.86% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.78% : 0.000005s : 4: substitution.remove_not_recompute_node 1.94% : 0.000003s : 2: substitution.replace_old_param 4.73% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006636 2 90.72% : 0.006021s : 1: type_inference.infer 9.28% : 0.000616s : 1: type_inference.specialize ------[replace.] 0.000039 4 78.60% : 0.000031s : 3: replace.inline 21.40% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 4 93.62% : 0.000113s : 3: match.inline 6.38% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 0.82% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 15: predicate.arithmetic_simplify 0.93% : 0.000002s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.55% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.99% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.57% : 0.000001s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_depend_swap 1.99% : 0.000003s : 18: predicate.environ_get_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.77% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.25% : 0.000010s : 40: predicate.inline 0.87% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.93% : 0.000002s : 6: predicate.less_batch_normalization 1.59% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 25: predicate.load_eliminater 1.12% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.12% : 0.000002s : 3: predicate.mutable_eliminate 0.31% : 0.000001s : 3: predicate.opt_reshape 0.56% : 0.000001s : 3: predicate.parallel_virtual_node 1.59% : 0.000003s : 13: predicate.partial_defer_inline 1.42% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.44% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.27% : 0.000002s : 16: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 1.03% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.78% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.02% : 0.000002s : 6: predicate.shard_identity_eliminate 0.94% : 0.000002s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 1.00% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.71% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 13: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 43: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000405 8 46.39% : 0.000188s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.61% : 0.000217s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.904239 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.42% : 0.003837s : 1: add_attr 0.42% : 0.003823s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000067s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000648s : 1: bootstrap 0.00% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000022s : 1: event_method 0.00% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000465s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000571s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000016s : 1: opt.transform.mutable_eliminate 0.11% : 0.000951s : 78: opt.transform.opt_a 0.00% : 0.000025s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.26% : 0.002367s : 1: opt_a 0.01% : 0.000107s : 1: opt_after_cconv 0.05% : 0.000480s : 1: opt_after_jit_grad 0.02% : 0.000200s : 1: opt_b 0.49% : 0.004447s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000035s : 1: pre_auto_parallel 0.00% : 0.000030s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000020s : 1: remove_dup_value 0.03% : 0.000299s : 1: renormalize.infer 0.03% : 0.000254s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000040s : 1: rewriter_after_opt_a 0.01% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000077s : 1: symbol_engine_optimizer 97.08% : 0.877855s : 1: task_emit 0.01% : 0.000074s : 1: 
tuple_transform 0.74% : 0.006707s : 1: type_inference 0.01% : 0.000065s : 1: validate TotalTime = 0.0800513, [24] [bootstrap]: 0.00051107 [type_inference]: 0.00648214 [event_method]: 1.365e-05 [auto_monad]: 6.11e-05 [graph_reusing]: 6.33998e-06 [inline]: 2.33998e-06 [add_attr]: 0.00323632, [1] [add_attr_with_inline]: 0.00322728, [1] [Cycle 1]: 5.294e-05, [2] [tag_attr]: 1.427e-05 [meta_addattr_fg_expand]: 3.65e-06 [parallel-infer-symbol]: 3.82998e-06 [pre_auto_parallel]: 2.853e-05 [insert-virtual-dataset]: 2.84999e-06 [parallel-infer-symbol-second]: 1.10001e-06 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.00439149, [53] [py_interpret_to_execute]: 2.238e-05 [rewriter_before_opt_a]: 5.557e-05 [opt_a]: 0.00228626, [2] [Cycle 1]: 0.00163891, [45] [expand_dump_flag]: 3.19001e-06 [switch_simplify]: 2.914e-05 [loop_unroll]: 1.678e-05 [a_1]: 0.00038597 [with_stream_mark]: 1.761e-05 [recompute_prepare]: 7.97998e-06 [updatestate_depend_eliminate]: 4.45e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 3.86001e-06 [parameter_eliminate]: 1.96e-06 [a_2]: 8.046e-05 [accelerated_algorithm]: 7.16001e-06 [shard]: 2.45997e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 6.01998e-06 [merge_send_recv]: 9.35001e-06 [auto_parallel]: 7.36001e-06 [parallel]: 1.922e-05 [flash_sp]: 7.77998e-06 [merge_comm]: 4.18001e-06 [allreduce_fusion]: 3.51999e-06 [matmul_add_comm_reduction]: 9.52001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 8.00999e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.66003e-06 [virtual_output]: 5.71998e-06 [merge_forward]: 4.23001e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.69999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.227e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 1.001e-05 [set_forward_comm_id_for_comm_node_pass]: 3.95e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 
2.29001e-06 [after_resolve]: 1.059e-05 [a_after_grad]: 8.59e-06 [renormalize]: 0.00057149 [add_forward_monad_depend]: 6.03002e-06 [auto_monad_grad]: 2.66999e-06 [auto_monad_eliminator]: 1.581e-05 [cse]: 3.045e-05 [a_3]: 4.286e-05 [Cycle 2]: 0.00063706, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 7.45e-06 [loop_unroll]: 5.64998e-06 [a_1]: 0.00011828 [with_stream_mark]: 1.315e-05 [recompute_prepare]: 6.09999e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.85002e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 7.202e-05 [accelerated_algorithm]: 5.87999e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 6.13002e-06 [auto_parallel]: 6.66e-06 [parallel]: 4.75001e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 5.66998e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 6.19999e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.23002e-06 [virtual_output]: 5.09e-06 [merge_forward]: 3.37002e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 7.2e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 9.31998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.62998e-06 [meta_fg_expand]: 1.92001e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.71e-06 [a_after_grad]: 8.59998e-06 [renormalize]: 5.9983e-08 [add_forward_monad_depend]: 1.51998e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 7.73001e-06 [cse]: 1.668e-05 [a_3]: 3.765e-05 [py_interpret_to_execute_after_opt_a]: 1.213e-05 [slice_cell_reuse_recomputed_activation]: 2.99001e-06 [rewriter_after_opt_a]: 3.874e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.59e-06 [mutable_eliminate]: 0.00057728 [opt_b]: 0.00019291, [1] [Cycle 1]: 0.00018633, [7] 
[b_1]: 0.00011208 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 6.58e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 6.10016e-07 [cse]: 1.871e-05 [optimize_parallel_all_gather_comm]: 1.784e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.631e-05 [loop_unroll]: 0.00044853 [opt_after_cconv]: 0.00010187, [1] [Cycle 1]: 9.561e-05, [7] [c_1]: 2.66e-05 [parameter_eliminate]: 3.4e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.916e-05 [renormalize]: 8.39995e-07 [remove_dup_value]: 1.619e-05 [tuple_transform]: 7.2e-05, [1] [Cycle 1]: 6.713e-05, [4] [d_1]: 3.946e-05 [none_parameter_eliminate]: 2.10002e-06 [renormalize]: 3.89991e-07 [switch_simplify]: 6.57002e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.563e-05 [cse_after_recomputation]: 2.36e-05, [1] [Cycle 1]: 1.864e-05, [1] [cse]: 1.278e-05 [environ_conv]: 5.64e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.82e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.45002e-06 [micro_interleaved_order_control]: 2.57001e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.54001e-06 [reorder_send_recv_between_fp_bp]: 2.99001e-06 [comm_op_add_attrs]: 1.18001e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.36998e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.304e-05 [grouped_pairwise_exchange_alltoall]: 2.29001e-06 [offloading_packed_experts]: 4.72998e-06 [overlap_recompute_and_grad_model_parallel]: 5.23002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.76003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.49999e-06 [overlap_grad_ring_attention]: 4.74e-06 [overlap_grad_flash_sp]: 2.024e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 2.42001e-06 [handle_group_info]: 1.23002e-06 [symbol_engine_optimizer]: 7.645e-05, [1] [Cycle 1]: 7.134e-05, [6] [build]: 2.64999e-06 [elim_shapecalc]: 9.25001e-06 [elim_not_effective]: 1.313e-05 [opt_reshape]: 6.45002e-06 [fold_const_symbol]: 9.68002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 2.44999e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.697e-05 [get_jit_bprop_graph]: 2.25002e-06 [rewriter_after_jit_bprop_graph]: 4.42e-06 [opt_after_jit_grad]: 0.00049345 [validate]: 3.952e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0644948 [execute]: 1.073e-05 Sums bootstrap : 0.000511s : 0.67% type_inference : 0.006482s : 8.56% event_method : 0.000014s : 0.02% auto_monad : 0.000061s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000029s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.03% optimize.rewriter_before_opt_a : 0.000056s : 0.07% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000037s : 0.05% optimize.opt_a.loop_unroll : 0.000022s : 0.03% optimize.opt_a.a_1 : 0.000504s : 0.67% optimize.opt_a.with_stream_mark : 0.000031s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000152s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000015s : 0.02% optimize.opt_a.auto_parallel : 0.000014s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000572s : 0.75% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.03% optimize.opt_a.cse : 0.000047s : 0.06% optimize.opt_a.a_3 : 0.000081s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000039s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000577s : 0.76% optimize.opt_b.b_1 : 0.000112s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.03% optimize.loop_unroll : 0.000449s : 0.59% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000013s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000493s : 0.65% validate : 0.000040s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.064495s : 85.17% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 0.000174 24 18.61% : 0.000032s : 4: substitution.arithmetic_simplify 1.18% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000006s : 3: substitution.graph_param_transform 68.99% : 0.000120s : 3: substitution.inline 1.98% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.82% : 0.000005s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006434 2 91.83% : 0.005908s : 1: type_inference.infer 8.17% : 0.000526s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000118 3 100.00% : 0.000118s : 3: match.inline ------[predicate.] 
0.000151 815 0.97% : 0.000001s : 8: predicate.accumulaten_eliminater 1.25% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.33% : 0.000004s : 14: predicate.arithmetic_simplify 0.85% : 0.000001s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.48% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 11: predicate.environ_get_depend_swap 1.77% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.13% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.39% : 0.000004s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.24% : 0.000009s : 37: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.46% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 6: predicate.less_batch_normalization 1.56% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.22% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.01% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 8: predicate.minmaximum_grad 1.30% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.48% : 0.000001s : 3: predicate.parallel_virtual_node 1.46% : 0.000002s : 11: predicate.partial_defer_inline 1.32% : 0.000002s : 11: predicate.partial_eliminate 0.83% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.10% : 0.000002s : 8: predicate.reduce_eliminate 2.26% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.60% : 0.000001s : 6: predicate.remove_not_recompute_node 1.19% : 0.000002s : 14: predicate.replace_applicator 0.80% : 0.000001s : 6: predicate.replace_old_param 0.42% : 0.000001s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 8: predicate.reshape_eliminate 0.70% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 1.00% : 0.000002s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.21% : 0.000002s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.66% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.87% : 0.000001s : 8: predicate.transpose_eliminate 1.43% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.70% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.25% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.99% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.36% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 7 36.64% : 0.000125s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.36% : 0.000216s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.089276 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.63% : 0.003241s : 1: add_attr 3.62% : 0.003231s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000067s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.60% : 0.000539s : 1: bootstrap 0.03% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000020s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.03% : 0.000028s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.51% : 0.000458s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.66% : 0.000588s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 0.98% : 0.000876s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000035s : 4: opt.transform.symbol_engine_opt 2.56% : 0.002290s : 1: opt_a 0.12% : 0.000106s : 1: opt_after_cconv 0.56% : 0.000504s : 1: opt_after_jit_grad 0.22% : 0.000196s : 1: opt_b 4.92% : 0.004396s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000005s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000026s : 1: py_interpret_to_execute 0.02% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 0.35% : 0.000311s : 1: renormalize.infer 0.28% : 0.000252s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000043s : 1: rewriter_after_opt_a 0.07% : 0.000060s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000079s : 1: symbol_engine_optimizer 72.28% : 0.064531s : 1: task_emit 0.08% : 0.000075s : 1: 
tuple_transform 7.28% : 0.006502s : 1: type_inference 0.08% : 0.000068s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x7-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x8-pynative],max_mem:6.0M TotalTime = 0.0232458, [24] [bootstrap]: 0.00056276 [type_inference]: 0.00679703 [event_method]: 1.523e-05 [auto_monad]: 6.431e-05 [graph_reusing]: 6.15997e-06 [inline]: 1.92001e-06 [add_attr]: 0.0038054, [1] [add_attr_with_inline]: 0.00379327, [1] [Cycle 1]: 5.019e-05, [2] [tag_attr]: 1.579e-05 [meta_addattr_fg_expand]: 4.70001e-06 [parallel-infer-symbol]: 3.32997e-06 [pre_auto_parallel]: 3.107e-05 [insert-virtual-dataset]: 2.79001e-06 [parallel-infer-symbol-second]: 9.09989e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00440034, [53] [py_interpret_to_execute]: 2.287e-05 [rewriter_before_opt_a]: 6.814e-05 [opt_a]: 0.00236117, [2] [Cycle 1]: 0.00173129, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 3.493e-05 [loop_unroll]: 2.243e-05 [a_1]: 0.00046348 [with_stream_mark]: 1.591e-05 [recompute_prepare]: 8.57998e-06 [updatestate_depend_eliminate]: 3.97e-06 [updatestate_assign_eliminate]: 3.70998e-06 [updatestate_loads_eliminate]: 3.31999e-06 [parameter_eliminate]: 2.39001e-06 [a_2]: 8.522e-05 [accelerated_algorithm]: 7.03998e-06 [shard]: 1.93002e-06 [meta_shard_fg_expand]: 2.17999e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 8.34002e-06 [auto_parallel]: 6.66e-06 [parallel]: 2.522e-05 [flash_sp]: 8.14002e-06 [merge_comm]: 3.93001e-06 [allreduce_fusion]: 3.6e-06 [matmul_add_comm_reduction]: 9.85002e-06 [allreduce_slice_to_reducescatter]: 6.99976e-07 [virtual_shard_identity]: 7.77e-06 [virtual_dataset]: 7.00998e-06 [get_grad_eliminate_]: 5.82999e-06 
[virtual_output]: 6.23e-06 [merge_forward]: 4.05998e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 1.014e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.24e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 1.306e-05 [set_forward_comm_id_for_comm_node_pass]: 4.13999e-06 [meta_fg_expand]: 2.76e-06 [flash_sp_send_recv_attached]: 2.84999e-06 [receive_attached]: 2.11e-06 [after_resolve]: 9.89999e-06 [a_after_grad]: 8.66002e-06 [renormalize]: 0.00055747 [add_forward_monad_depend]: 9.42999e-06 [auto_monad_grad]: 2.33002e-06 [auto_monad_eliminator]: 1.432e-05 [cse]: 2.843e-05 [a_3]: 4.446e-05 [Cycle 2]: 0.00061891, [45] [expand_dump_flag]: 1.34e-06 [switch_simplify]: 7.29001e-06 [loop_unroll]: 5.89e-06 [a_1]: 0.00011594 [with_stream_mark]: 1.092e-05 [recompute_prepare]: 6.24999e-06 [updatestate_depend_eliminate]: 3.43e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 1.14e-06 [a_2]: 7.162e-05 [accelerated_algorithm]: 5.91998e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.35999e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 4.67998e-06 [auto_parallel]: 5.91e-06 [parallel]: 4.42e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 5.43002e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.94001e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.17e-06 [merge_forward]: 3.72998e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 7.35e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 [merge_recompute_call_nodes]: 9.79984e-07 [before_grad]: 8.87e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.89e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.60019e-07 [after_resolve]: 9.29e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.26002e-06 [auto_monad_grad]: 
9.50007e-07 [auto_monad_eliminator]: 6.49001e-06 [cse]: 1.477e-05 [a_3]: 3.227e-05 [py_interpret_to_execute_after_opt_a]: 1.063e-05 [slice_cell_reuse_recomputed_activation]: 2.29001e-06 [rewriter_after_opt_a]: 5.665e-05 [convert_after_rewriter]: 8.38001e-06 [order_py_execute_after_rewriter]: 5.54998e-06 [mutable_eliminate]: 0.00052792 [opt_b]: 0.00019435, [1] [Cycle 1]: 0.00018656, [7] [b_1]: 0.00011103 [b_2]: 7.52998e-06 [updatestate_depend_eliminate]: 6.08998e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.66e-06 [renormalize]: 6.39993e-07 [cse]: 1.976e-05 [optimize_parallel_all_gather_comm]: 1.715e-05 [overlap_param_gather]: 2.26e-06 [cconv]: 2.447e-05 [loop_unroll]: 0.00043178 [opt_after_cconv]: 9.982e-05, [1] [Cycle 1]: 9.355e-05, [7] [c_1]: 2.643e-05 [parameter_eliminate]: 3.08e-06 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 2.78003e-06 [updatestate_loads_eliminate]: 2.50002e-06 [cse]: 1.84e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.52e-05 [tuple_transform]: 7.071e-05, [1] [Cycle 1]: 6.625e-05, [4] [d_1]: 3.905e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.46e-06 [partial_unused_args_eliminate]: 2.02001e-06 [add_recomputation]: 5.162e-05 [cse_after_recomputation]: 2.243e-05, [1] [Cycle 1]: 1.751e-05, [1] [cse]: 1.208e-05 [environ_conv]: 8.3e-06 [swap_dp_allreduce_reducescatter]: 5.39e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.84e-06 [label_fine_grained_interleaved_index]: 2.58003e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.24999e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 8.29983e-07 [remove_cast_before_assign_add]: 1.22999e-06 [full_micro_interleaved_order_control]: 2.25002e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.19e-06 
[interleave_parallel_branches]: 1.50999e-06 [overlap_opt_shard_in_pipeline]: 1.45001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.73002e-06 [control_data_broadcast_order]: 1.245e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 3.91999e-06 [overlap_recompute_and_grad_model_parallel]: 4.87998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.72001e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.704e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 7.589e-05, [1] [Cycle 1]: 7.128e-05, [6] [build]: 2.58e-06 [elim_shapecalc]: 9.96e-06 [elim_not_effective]: 1.351e-05 [opt_reshape]: 6.86001e-06 [fold_const_symbol]: 9.79e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.627e-05 [get_jit_bprop_graph]: 1.82001e-06 [rewriter_after_jit_bprop_graph]: 4.38001e-06 [opt_after_jit_grad]: 0.00048966 [validate]: 3.932e-05 [backend_pass]: 1.15999e-06 [task_emit]: 0.00677918 [execute]: 7.53e-06 Sums bootstrap : 0.000563s : 3.06% type_inference : 0.006797s : 36.92% event_method : 0.000015s : 0.08% auto_monad : 0.000064s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000031s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.12% optimize.rewriter_before_opt_a : 0.000068s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 
0.000042s : 0.23% optimize.opt_a.loop_unroll : 0.000028s : 0.15% optimize.opt_a.a_1 : 0.000579s : 3.15% optimize.opt_a.with_stream_mark : 0.000027s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000004s : 0.02% optimize.opt_a.a_2 : 0.000157s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000022s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.10% optimize.opt_a.a_after_grad : 
0.000016s : 0.09% optimize.opt_a.renormalize : 0.000558s : 3.03% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.11% optimize.opt_a.cse : 0.000043s : 0.23% optimize.opt_a.a_3 : 0.000077s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000057s : 0.31% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000528s : 2.87% optimize.opt_b.b_1 : 0.000111s : 0.60% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.13% optimize.loop_unroll : 0.000432s : 2.35% optimize.opt_after_cconv.c_1 : 0.000026s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% 
optimize.add_recomputation : 0.000052s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000008s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000002s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000490s : 2.66% validate : 0.000039s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006779s : 36.83% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000181 26 18.48% : 0.000034s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000006s : 3: substitution.graph_param_transform 63.09% : 0.000114s : 3: substitution.inline 3.29% : 0.000006s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000005s : 4: substitution.remove_not_recompute_node 2.16% : 0.000004s : 2: substitution.replace_old_param 5.22% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006746 2 90.46% : 0.006102s : 1: type_inference.infer 9.54% : 0.000644s : 1: type_inference.specialize ------[replace.] 0.000041 4 80.02% : 0.000032s : 3: replace.inline 19.98% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 4 93.05% : 0.000112s : 3: match.inline 6.95% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 883 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.10% : 0.000003s : 15: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.92% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.94% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.33% : 0.000001s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_depend_swap 1.94% : 0.000003s : 18: predicate.environ_get_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.35% : 0.000010s : 40: predicate.inline 0.84% : 0.000001s : 6: predicate.inline_without_move 0.53% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.07% : 0.000002s : 6: predicate.less_batch_normalization 1.66% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 25: predicate.load_eliminater 1.02% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.28% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 9: predicate.minmaximum_grad 1.02% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.59% : 0.000003s : 13: predicate.partial_defer_inline 1.39% : 0.000002s : 13: predicate.partial_eliminate 0.86% : 0.000001s : 9: predicate.print_const_string_wrapper 0.63% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.48% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000002s : 16: predicate.replace_applicator 0.68% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.94% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 0.73% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 13: predicate.switch_defer_inline 1.91% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.77% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.59% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.58% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.05% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 3: predicate.value_based_eliminate 0.70% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000411 8 45.35% : 0.000187s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.65% : 0.000225s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.033126 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.50% : 0.003811s : 1: add_attr 11.46% : 0.003797s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000070s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.79% : 0.000592s : 1: bootstrap 0.08% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000012s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000012s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.33% : 0.000441s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.62% : 0.000537s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 2.93% : 0.000969s : 78: opt.transform.opt_a 0.07% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000089s : 28: opt.transform.opt_b 0.13% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000036s : 4: opt.transform.symbol_engine_opt 7.14% : 0.002364s : 1: opt_a 0.31% : 0.000104s : 1: opt_after_cconv 1.51% : 0.000500s : 1: opt_after_jit_grad 0.60% : 0.000198s : 1: opt_b 13.30% : 0.004405s : 1: optimize 0.06% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000035s : 1: pre_auto_parallel 0.08% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.90% : 0.000297s : 1: renormalize.infer 0.76% : 0.000253s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.19% : 0.000061s : 1: rewriter_after_opt_a 0.22% : 0.000073s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000079s : 1: symbol_engine_optimizer 20.50% : 0.006791s : 1: task_emit 0.22% : 0.000074s : 1: 
tuple_transform 20.57% : 0.006816s : 1: type_inference 0.21% : 0.000068s : 1: validate TotalTime = 0.0217999, [24] [bootstrap]: 0.00050736 [type_inference]: 0.00642542 [event_method]: 1.481e-05 [auto_monad]: 6.47e-05 [graph_reusing]: 6.51e-06 [inline]: 2.71e-06 [add_attr]: 0.00327116, [1] [add_attr_with_inline]: 0.00326128, [1] [Cycle 1]: 5.482e-05, [2] [tag_attr]: 1.514e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 3.48e-06 [pre_auto_parallel]: 2.774e-05 [insert-virtual-dataset]: 3.09001e-06 [parallel-infer-symbol-second]: 1.09e-06 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 2.02001e-06 [optimize]: 0.00429844, [53] [py_interpret_to_execute]: 2.376e-05 [rewriter_before_opt_a]: 5.572e-05 [opt_a]: 0.002246, [2] [Cycle 1]: 0.00160551, [45] [expand_dump_flag]: 2.65997e-06 [switch_simplify]: 3.004e-05 [loop_unroll]: 1.719e-05 [a_1]: 0.00037447 [with_stream_mark]: 1.867e-05 [recompute_prepare]: 8.38001e-06 [updatestate_depend_eliminate]: 4.38001e-06 [updatestate_assign_eliminate]: 3.44001e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.97001e-06 [a_2]: 8.378e-05 [accelerated_algorithm]: 6.69001e-06 [shard]: 2.69001e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 6.12999e-06 [merge_send_recv]: 8.79003e-06 [auto_parallel]: 6.86001e-06 [parallel]: 1.886e-05 [flash_sp]: 8.72e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.63999e-06 [matmul_add_comm_reduction]: 9.71e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 8.32e-06 [virtual_dataset]: 6.43e-06 [get_grad_eliminate_]: 5.82001e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.118e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.308e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 1.08e-05 [set_forward_comm_id_for_comm_node_pass]: 4.18001e-06 [meta_fg_expand]: 2.74999e-06 [flash_sp_send_recv_attached]: 2.66e-06 [receive_attached]: 2.58e-06 [after_resolve]: 
1.02e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00053984 [add_forward_monad_depend]: 4.61002e-06 [auto_monad_grad]: 2.30002e-06 [auto_monad_eliminator]: 1.542e-05 [cse]: 3.115e-05 [a_3]: 4.326e-05 [Cycle 2]: 0.00062992, [45] [expand_dump_flag]: 1.37e-06 [switch_simplify]: 7.42998e-06 [loop_unroll]: 5.93998e-06 [a_1]: 0.00011779 [with_stream_mark]: 1.143e-05 [recompute_prepare]: 6.17001e-06 [updatestate_depend_eliminate]: 3.03e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.87002e-06 [parameter_eliminate]: 1.19e-06 [a_2]: 7.247e-05 [accelerated_algorithm]: 5.88002e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 6.07001e-06 [merge_send_recv]: 5.46e-06 [auto_parallel]: 6.09001e-06 [parallel]: 4.48001e-06 [flash_sp]: 3.70998e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 5.47999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 7.03e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.21002e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.71e-06 [offload_activation]: 7.33999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.094e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 9.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.09002e-06 [meta_fg_expand]: 2.02999e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.30999e-06 [after_resolve]: 9.56e-06 [a_after_grad]: 7.98001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.43002e-06 [auto_monad_grad]: 1.36002e-06 [auto_monad_eliminator]: 7.09001e-06 [cse]: 1.432e-05 [a_3]: 3.478e-05 [py_interpret_to_execute_after_opt_a]: 9.82001e-06 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 3.649e-05 [convert_after_rewriter]: 6.69001e-06 [order_py_execute_after_rewriter]: 5.07e-06 [mutable_eliminate]: 0.00052115 [opt_b]: 0.00019429, [1] [Cycle 1]: 0.00018777, [7] [b_1]: 0.00011224 
[b_2]: 7.62002e-06 [updatestate_depend_eliminate]: 6.13998e-06 [updatestate_assign_eliminate]: 2.68998e-06 [updatestate_loads_eliminate]: 2.61e-06 [renormalize]: 5.60016e-07 [cse]: 1.967e-05 [optimize_parallel_all_gather_comm]: 1.768e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.642e-05 [loop_unroll]: 0.00044758 [opt_after_cconv]: 0.00013241, [1] [Cycle 1]: 0.0001257, [7] [c_1]: 5.428e-05 [parameter_eliminate]: 2.94001e-06 [updatestate_depend_eliminate]: 5.84e-06 [updatestate_assign_eliminate]: 2.84001e-06 [updatestate_loads_eliminate]: 2.49001e-06 [cse]: 1.824e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 1.499e-05 [tuple_transform]: 7.286e-05, [1] [Cycle 1]: 6.804e-05, [4] [d_1]: 4.081e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.57002e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 4.818e-05 [cse_after_recomputation]: 2.23e-05, [1] [Cycle 1]: 1.725e-05, [1] [cse]: 1.153e-05 [environ_conv]: 5.86998e-06 [swap_dp_allreduce_reducescatter]: 5.49998e-06 [bias_add_comm_swap]: 2.83998e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.73998e-06 [micro_interleaved_order_control]: 2.78003e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.28002e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 3.26001e-06 [comm_op_add_attrs]: 1.11997e-06 [add_comm_op_reuse_tag]: 1.08001e-06 [interleave_split_concat_branches]: 1.31002e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.55999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.07999e-06 [control_data_broadcast_order]: 1.31e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 4.33001e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.79001e-06 [overlap_grad_ring_attention]: 4.97e-06 [overlap_grad_flash_sp]: 1.846e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 7.68e-05, [1] [Cycle 1]: 7.18e-05, [6] [build]: 2.98e-06 [elim_shapecalc]: 9.92001e-06 [elim_not_effective]: 1.214e-05 [opt_reshape]: 6.46999e-06 [fold_const_symbol]: 1.023e-05 [renormalize]: 2.69996e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.61998e-06 [auto_monad_reorder]: 1.775e-05 [get_jit_bprop_graph]: 1.64e-06 [rewriter_after_jit_bprop_graph]: 3.76001e-06 [opt_after_jit_grad]: 0.00051937 [validate]: 3.923e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00634312 [execute]: 8.42e-06 Sums bootstrap : 0.000507s : 2.90% type_inference : 0.006425s : 36.79% event_method : 0.000015s : 0.08% auto_monad : 0.000065s : 0.37% graph_reusing : 0.000007s : 0.04% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.14% optimize.rewriter_before_opt_a : 0.000056s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.21% optimize.opt_a.loop_unroll : 0.000023s : 0.13% optimize.opt_a.a_1 : 0.000492s : 2.82% optimize.opt_a.with_stream_mark : 0.000030s : 0.17% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000156s : 0.89% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.13% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000019s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000540s : 3.09% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.13% optimize.opt_a.cse : 0.000045s : 0.26% optimize.opt_a.a_3 : 0.000078s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000521s : 2.98% optimize.opt_b.b_1 : 0.000112s : 0.64% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.15% optimize.loop_unroll : 0.000448s : 2.56% optimize.opt_after_cconv.c_1 : 0.000054s : 0.31% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000519s : 2.97% validate : 0.000039s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006343s : 36.32% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000157 24 19.76% : 0.000031s : 4: substitution.arithmetic_simplify 1.23% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000001s : 2: substitution.fold_const_symbol 3.65% : 0.000006s : 3: substitution.graph_param_transform 67.06% : 0.000105s : 3: substitution.inline 2.13% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.08% : 0.000005s : 4: substitution.remove_not_recompute_node 2.19% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006375 2 92.43% : 0.005892s : 1: type_inference.infer 7.57% : 0.000483s : 1: type_inference.specialize ------[replace.] 0.000030 3 100.00% : 0.000030s : 3: replace.inline ------[match.] 0.000103 3 100.00% : 0.000103s : 3: match.inline ------[predicate.] 
0.000152 815 0.91% : 0.000001s : 8: predicate.accumulaten_eliminater 0.92% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.74% : 0.000001s : 6: predicate.check_bprop_eliminate 0.65% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.77% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.76% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.59% : 0.000010s : 37: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.45% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 6: predicate.less_batch_normalization 1.67% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.10% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.62% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 8: predicate.minmaximum_grad 1.19% : 0.000002s : 3: predicate.mutable_eliminate 0.43% : 0.000001s : 3: predicate.opt_reshape 0.41% : 0.000001s : 3: predicate.parallel_virtual_node 1.55% : 0.000002s : 11: predicate.partial_defer_inline 1.34% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.70% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 8: predicate.reduce_eliminate 2.33% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.85% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 14: predicate.replace_applicator 0.65% : 0.000001s : 6: predicate.replace_old_param 0.26% : 0.000000s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 8: predicate.reshape_eliminate 0.65% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 0.91% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.75% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.60% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.47% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.76% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000311 7 37.32% : 0.000116s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.68% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030967 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.58% : 0.003276s : 1: add_attr 10.55% : 0.003266s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000070s : 1: auto_monad 0.07% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.76% : 0.000545s : 1: bootstrap 0.10% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000022s : 1: event_method 0.04% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000011s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.48% : 0.000458s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.72% : 0.000531s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 2.81% : 0.000869s : 78: opt.transform.opt_a 0.17% : 0.000053s : 1: opt.transform.opt_after_cconv 0.08% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000035s : 4: opt.transform.symbol_engine_opt 7.26% : 0.002249s : 1: opt_a 0.44% : 0.000136s : 1: opt_after_cconv 1.71% : 0.000531s : 1: opt_after_jit_grad 0.64% : 0.000198s : 1: opt_b 13.90% : 0.004303s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.09% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.97% : 0.000300s : 1: renormalize.infer 0.75% : 0.000232s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000041s : 1: rewriter_after_opt_a 0.19% : 0.000060s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000006s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000080s : 1: symbol_engine_optimizer 20.53% : 0.006357s : 1: task_emit 0.25% : 0.000076s : 1: 
tuple_transform 20.83% : 0.006450s : 1: type_inference 0.24% : 0.000073s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x8-kbk],max_mem:6.0M TotalTime = 0.924483, [24] [bootstrap]: 0.00064989 [type_inference]: 0.00688405 [event_method]: 1.586e-05 [auto_monad]: 6.265e-05 [graph_reusing]: 5.66e-06 [inline]: 2.32999e-06 [add_attr]: 0.00392253, [1] [add_attr_with_inline]: 0.00390918, [1] [Cycle 1]: 6.136e-05, [2] [tag_attr]: 1.803e-05 [meta_addattr_fg_expand]: 4.58999e-06 [parallel-infer-symbol]: 3.58999e-06 [pre_auto_parallel]: 3.122e-05 [insert-virtual-dataset]: 2.77002e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.87999e-06 [optimize]: 0.00452984, [53] [py_interpret_to_execute]: 2.357e-05 [rewriter_before_opt_a]: 7.096e-05 [opt_a]: 0.0024944, [2] [Cycle 1]: 0.00186005, [45] [expand_dump_flag]: 3.13998e-06 [switch_simplify]: 3.557e-05 [loop_unroll]: 2.129e-05 [a_1]: 0.0004934 [with_stream_mark]: 1.739e-05 [recompute_prepare]: 8.46002e-06 [updatestate_depend_eliminate]: 4.22e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 8.191e-05 [accelerated_algorithm]: 7.38e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 6.49001e-06 [merge_send_recv]: 8.32e-06 [auto_parallel]: 7.00998e-06 [parallel]: 2.858e-05 [flash_sp]: 8.75999e-06 [merge_comm]: 4.16001e-06 [allreduce_fusion]: 3.71999e-06 [matmul_add_comm_reduction]: 9.64e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.92e-06 [virtual_dataset]: 6.48e-06 [get_grad_eliminate_]: 5.82999e-06 [virtual_output]: 5.83002e-06 [merge_forward]: 4.30999e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 1.042e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.247e-05 [merge_recompute_call_nodes]: 
1.93002e-06 [before_grad]: 1.099e-05 [set_forward_comm_id_for_comm_node_pass]: 4.16001e-06 [meta_fg_expand]: 2.94001e-06 [flash_sp_send_recv_attached]: 2.63003e-06 [receive_attached]: 2.23002e-06 [after_resolve]: 1.356e-05 [a_after_grad]: 9.00999e-06 [renormalize]: 0.00064728 [add_forward_monad_depend]: 9.71e-06 [auto_monad_grad]: 2.94999e-06 [auto_monad_eliminator]: 1.504e-05 [cse]: 2.902e-05 [a_3]: 4.426e-05 [Cycle 2]: 0.00062257, [45] [expand_dump_flag]: 1.44e-06 [switch_simplify]: 7.11999e-06 [loop_unroll]: 5.73002e-06 [a_1]: 0.00011761 [with_stream_mark]: 1.242e-05 [recompute_prepare]: 5.78002e-06 [updatestate_depend_eliminate]: 3.04999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 7.241e-05 [accelerated_algorithm]: 5.80002e-06 [shard]: 1.34e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.91998e-06 [merge_send_recv]: 4.64002e-06 [auto_parallel]: 6.49999e-06 [parallel]: 4.74e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.37002e-06 [allreduce_fusion]: 3.04999e-06 [matmul_add_comm_reduction]: 4.90001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.82002e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.16002e-06 [merge_forward]: 3.02002e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 6.90998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.04e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.82999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 1.97999e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.53001e-06 [a_after_grad]: 8e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.82999e-06 [auto_monad_grad]: 1.29e-06 [auto_monad_eliminator]: 8.23001e-06 [cse]: 1.493e-05 [a_3]: 3.391e-05 [py_interpret_to_execute_after_opt_a]: 9.08002e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 
[rewriter_after_opt_a]: 3.611e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.67999e-06 [mutable_eliminate]: 0.0005207 [opt_b]: 0.00019413, [1] [Cycle 1]: 0.00018689, [7] [b_1]: 0.00011482 [b_2]: 7.35998e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 5.8001e-07 [cse]: 1.782e-05 [optimize_parallel_all_gather_comm]: 1.829e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.579e-05 [loop_unroll]: 0.00043575 [opt_after_cconv]: 9.81e-05, [1] [Cycle 1]: 9.192e-05, [7] [c_1]: 2.669e-05 [parameter_eliminate]: 3.08e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.753e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.648e-05 [tuple_transform]: 7.249e-05, [1] [Cycle 1]: 6.789e-05, [4] [d_1]: 3.91e-05 [none_parameter_eliminate]: 2.04e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.59999e-06 [partial_unused_args_eliminate]: 1.93997e-06 [add_recomputation]: 5.115e-05 [cse_after_recomputation]: 2.258e-05, [1] [Cycle 1]: 1.753e-05, [1] [cse]: 1.207e-05 [environ_conv]: 9.32999e-06 [swap_dp_allreduce_reducescatter]: 5.57001e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.33999e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.39998e-06 [slice_recompute_activation]: 2.49999e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.31002e-06 [ForceFp32Comm]: 1.11997e-06 [remove_cast_before_assign_add]: 1.37e-06 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.27999e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.50999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 [control_data_broadcast_order]: 1.369e-05 
[grouped_pairwise_exchange_alltoall]: 1.39998e-06 [offloading_packed_experts]: 1.337e-05 [overlap_recompute_and_grad_model_parallel]: 5.94999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.68002e-06 [overlap_recompute_comm]: 2.78998e-06 [overlap_grad_ring_attention]: 4.58001e-06 [overlap_grad_flash_sp]: 1.778e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.07998e-06 [symbol_engine_optimizer]: 7.516e-05, [1] [Cycle 1]: 7.035e-05, [6] [build]: 2.69001e-06 [elim_shapecalc]: 1.023e-05 [elim_not_effective]: 1.253e-05 [opt_reshape]: 6.91999e-06 [fold_const_symbol]: 9.61e-06 [renormalize]: 3.69997e-07 [detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.673e-05 [get_jit_bprop_graph]: 1.50999e-06 [rewriter_after_jit_bprop_graph]: 4.07e-06 [opt_after_jit_grad]: 0.0004706 [validate]: 3.721e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.907578 [execute]: 9.71003e-06 Sums bootstrap : 0.000650s : 0.07% type_inference : 0.006884s : 0.75% event_method : 0.000016s : 0.00% auto_monad : 0.000063s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000031s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.00% optimize.rewriter_before_opt_a : 0.000071s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000043s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000611s : 0.07% optimize.opt_a.with_stream_mark : 0.000030s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000014s : 0.00% optimize.opt_a.parallel : 0.000033s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000647s : 0.07% optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000044s : 0.00% optimize.opt_a.a_3 : 0.000078s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000521s : 0.06% optimize.opt_b.b_1 : 0.000115s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000436s : 0.05% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000009s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000013s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000471s : 0.05% validate : 0.000037s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.907578s : 98.70% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000196 26 18.17% : 0.000036s : 5: substitution.arithmetic_simplify 0.98% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.02% : 0.000006s : 3: substitution.graph_param_transform 66.65% : 0.000131s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.32% : 0.000005s : 4: substitution.remove_not_recompute_node 1.69% : 0.000003s : 2: substitution.replace_old_param 4.79% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006830 2 90.74% : 0.006197s : 1: type_inference.infer 9.26% : 0.000633s : 1: type_inference.specialize ------[replace.] 0.000039 4 78.65% : 0.000031s : 3: replace.inline 21.35% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000136 4 93.76% : 0.000128s : 3: match.inline 6.24% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 883 0.93% : 0.000002s : 9: predicate.accumulaten_eliminater 0.80% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.95% : 0.000002s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.19% : 0.000004s : 15: predicate.arithmetic_simplify 0.93% : 0.000002s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.02% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.35% : 0.000001s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.71% : 0.000003s : 18: predicate.environ_get_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.34% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.70% : 0.000011s : 40: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 6: predicate.less_batch_normalization 1.68% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 25: predicate.load_eliminater 1.02% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 9: predicate.minmaximum_grad 1.07% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.50% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.52% : 0.000002s : 13: predicate.partial_eliminate 0.90% : 0.000001s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 6: predicate.reduce_all_const_elim 1.31% : 0.000002s : 9: predicate.reduce_eliminate 2.44% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.44% : 0.000001s : 6: predicate.remove_not_recompute_node 1.24% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000002s : 9: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.79% : 0.000001s : 6: predicate.special_op_eliminate 0.82% : 0.000001s : 6: predicate.specialize_transform 0.94% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.41% : 0.000002s : 13: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 43: predicate.switch_simplify 0.97% : 
0.000002s : 9: predicate.tile_eliminate 0.88% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000426 8 45.63% : 0.000194s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.37% : 0.000231s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.934719 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.42% : 0.003928s : 1: add_attr 0.42% : 0.003913s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000068s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000686s : 1: bootstrap 0.00% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000017s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.00% : 0.000022s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000444s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.06% : 0.000530s : 1: mutable_eliminate 0.00% : 0.000017s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.11% : 0.000996s : 78: opt.transform.opt_a 0.00% : 0.000025s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000092s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000036s : 4: opt.transform.symbol_engine_opt 0.27% : 0.002498s : 1: opt_a 0.01% : 0.000102s : 1: opt_after_cconv 0.05% : 0.000481s : 1: opt_after_jit_grad 0.02% : 0.000198s : 1: opt_b 0.49% : 0.004534s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000036s : 1: pre_auto_parallel 0.00% : 0.000028s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000020s : 1: remove_dup_value 0.04% : 0.000354s : 1: renormalize.infer 0.03% : 0.000286s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000040s : 1: rewriter_after_opt_a 0.01% : 0.000076s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000078s : 1: symbol_engine_optimizer 97.10% : 0.907610s : 1: task_emit 0.01% : 0.000075s : 1: 
tuple_transform 0.74% : 0.006903s : 1: type_inference 0.01% : 0.000067s : 1: validate . TotalTime = 0.0856536, [24] [bootstrap]: 0.00042324 [type_inference]: 0.00617291 [event_method]: 1.405e-05 [auto_monad]: 6.146e-05 [graph_reusing]: 6.03002e-06 [inline]: 2.16e-06 [add_attr]: 0.00311603, [1] [add_attr_with_inline]: 0.00310711, [1] [Cycle 1]: 4.889e-05, [2] [tag_attr]: 1.397e-05 [meta_addattr_fg_expand]: 4.08999e-06 [parallel-infer-symbol]: 3.58e-06 [pre_auto_parallel]: 2.642e-05 [insert-virtual-dataset]: 2.88e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 2.07001e-06 [optimize]: 0.0041984, [53] [py_interpret_to_execute]: 2.039e-05 [rewriter_before_opt_a]: 5.221e-05 [opt_a]: 0.0021648, [2] [Cycle 1]: 0.00151148, [45] [expand_dump_flag]: 3.16999e-06 [switch_simplify]: 2.96e-05 [loop_unroll]: 1.766e-05 [a_1]: 0.00037997 [with_stream_mark]: 1.456e-05 [recompute_prepare]: 8.15e-06 [updatestate_depend_eliminate]: 4.17e-06 [updatestate_assign_eliminate]: 3.50998e-06 [updatestate_loads_eliminate]: 3.55003e-06 [parameter_eliminate]: 2.04999e-06 [a_2]: 8.041e-05 [accelerated_algorithm]: 6.93e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 6.21e-06 [merge_send_recv]: 8.82e-06 [auto_parallel]: 6.93e-06 [parallel]: 1.928e-05 [flash_sp]: 7.45998e-06 [merge_comm]: 4.06001e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.89001e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 8.13999e-06 [virtual_dataset]: 6.11e-06 [get_grad_eliminate_]: 5.87999e-06 [virtual_output]: 5.76998e-06 [merge_forward]: 3.93999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.68002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.207e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 1.078e-05 [set_forward_comm_id_for_comm_node_pass]: 3.59002e-06 [meta_fg_expand]: 2.79001e-06 [flash_sp_send_recv_attached]: 3.11001e-06 [receive_attached]: 2.44999e-06 
[after_resolve]: 1.012e-05 [a_after_grad]: 8.93002e-06 [renormalize]: 0.00045626 [add_forward_monad_depend]: 5.46002e-06 [auto_monad_grad]: 2.19999e-06 [auto_monad_eliminator]: 1.515e-05 [cse]: 2.943e-05 [a_3]: 4.485e-05 [Cycle 2]: 0.00064276, [45] [expand_dump_flag]: 1.33002e-06 [switch_simplify]: 7.51001e-06 [loop_unroll]: 5.71e-06 [a_1]: 0.00011861 [with_stream_mark]: 1.094e-05 [recompute_prepare]: 6.59999e-06 [updatestate_depend_eliminate]: 3.08998e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 7.32e-05 [accelerated_algorithm]: 6.12001e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 5.71998e-06 [auto_parallel]: 6.63998e-06 [parallel]: 4.63999e-06 [flash_sp]: 3.58999e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 2.99001e-06 [matmul_add_comm_reduction]: 5.79e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.73e-06 [virtual_dataset]: 5.91998e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 3.4e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 7.26999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.186e-05 [merge_recompute_call_nodes]: 1.13001e-06 [before_grad]: 1.072e-05 [set_forward_comm_id_for_comm_node_pass]: 4.55999e-06 [meta_fg_expand]: 2.39999e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.17e-06 [after_resolve]: 9.67001e-06 [a_after_grad]: 7.95e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.50001e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 7.92e-06 [cse]: 1.641e-05 [a_3]: 3.315e-05 [py_interpret_to_execute_after_opt_a]: 1.049e-05 [slice_cell_reuse_recomputed_activation]: 2.46e-06 [rewriter_after_opt_a]: 3.673e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.64998e-06 [mutable_eliminate]: 0.00051371 [opt_b]: 0.00019776, [1] [Cycle 1]: 0.0001897, [7] 
[b_1]: 0.00011306 [b_2]: 8.04002e-06 [updatestate_depend_eliminate]: 6.77002e-06 [updatestate_assign_eliminate]: 2.73998e-06 [updatestate_loads_eliminate]: 2.76999e-06 [renormalize]: 3.70026e-07 [cse]: 1.944e-05 [optimize_parallel_all_gather_comm]: 1.749e-05 [overlap_param_gather]: 1.96998e-06 [cconv]: 2.494e-05 [loop_unroll]: 0.00046615 [opt_after_cconv]: 0.00010604, [1] [Cycle 1]: 9.955e-05, [7] [c_1]: 2.685e-05 [parameter_eliminate]: 3.06001e-06 [updatestate_depend_eliminate]: 6.61e-06 [updatestate_assign_eliminate]: 2.78e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 2.019e-05 [renormalize]: 6.30011e-07 [remove_dup_value]: 1.73e-05 [tuple_transform]: 7.324e-05, [1] [Cycle 1]: 6.851e-05, [4] [d_1]: 4.099e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.53e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 4.669e-05 [cse_after_recomputation]: 2.43e-05, [1] [Cycle 1]: 1.903e-05, [1] [cse]: 1.283e-05 [environ_conv]: 5.38002e-06 [swap_dp_allreduce_reducescatter]: 5.70001e-06 [bias_add_comm_swap]: 2.48e-06 [label_micro_interleaved_index]: 4.52998e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.73e-06 [micro_interleaved_order_control]: 2.26998e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 1.04998e-06 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.33998e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.239e-05 [grouped_pairwise_exchange_alltoall]: 1.88002e-06 [offloading_packed_experts]: 3.88001e-06 [overlap_recompute_and_grad_model_parallel]: 4.70001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.66e-06 [overlap_grad_ring_attention]: 4.79e-06 [overlap_grad_flash_sp]: 1.727e-05 [begin_end_overlap_inline]: 7.7e-07 [split_matmul_comm_elemetwise]: 2.78998e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 7.734e-05, [1] [Cycle 1]: 7.213e-05, [6] [build]: 3.15002e-06 [elim_shapecalc]: 1.009e-05 [elim_not_effective]: 1.286e-05 [opt_reshape]: 6.92002e-06 [fold_const_symbol]: 9.64e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.18002e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 2.949e-05 [get_jit_bprop_graph]: 1.57001e-06 [rewriter_after_jit_bprop_graph]: 4.23999e-06 [opt_after_jit_grad]: 0.00049138 [validate]: 3.921e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0708298 [execute]: 8.13001e-06 Sums bootstrap : 0.000423s : 0.52% type_inference : 0.006173s : 7.58% event_method : 0.000014s : 0.02% auto_monad : 0.000061s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000026s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000052s : 0.06% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000037s : 0.05% optimize.opt_a.loop_unroll : 0.000023s : 0.03% optimize.opt_a.a_1 : 0.000499s : 0.61% optimize.opt_a.with_stream_mark : 0.000025s : 0.03% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000015s : 0.02% optimize.opt_a.auto_parallel : 0.000014s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000456s : 0.56% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.03% optimize.opt_a.cse : 0.000046s : 0.06% optimize.opt_a.a_3 : 0.000078s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000514s : 0.63% optimize.opt_b.b_1 : 0.000113s : 0.14% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.03% optimize.loop_unroll : 0.000466s : 0.57% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.02% optimize.tuple_transform.d_1 : 0.000041s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.06% optimize.cse_after_recomputation.cse : 0.000013s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000029s : 0.04% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000491s : 0.60% validate : 0.000039s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.070830s : 86.92% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000166 24 19.31% : 0.000032s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 0.86% : 0.000001s : 2: substitution.fold_const_symbol 3.30% : 0.000005s : 3: substitution.graph_param_transform 67.21% : 0.000111s : 3: substitution.inline 2.26% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.98% : 0.000005s : 4: substitution.remove_not_recompute_node 2.67% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006124 2 91.10% : 0.005579s : 1: type_inference.infer 8.90% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000109 3 100.00% : 0.000109s : 3: match.inline ------[predicate.] 
0.000155 815 0.81% : 0.000001s : 8: predicate.accumulaten_eliminater 1.08% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 8: predicate.addn_zero_filter 0.76% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.25% : 0.000003s : 14: predicate.arithmetic_simplify 1.07% : 0.000002s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.94% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 11: predicate.environ_get_depend_swap 1.77% : 0.000003s : 17: predicate.environ_get_eliminate 1.05% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.13% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.29% : 0.000010s : 37: predicate.inline 1.01% : 0.000002s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 6: predicate.less_batch_normalization 1.74% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 22: predicate.load_eliminater 1.15% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.94% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.20% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.41% : 0.000001s : 3: predicate.parallel_virtual_node 1.52% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.08% : 0.000002s : 8: predicate.reduce_eliminate 2.12% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 14: predicate.replace_applicator 0.84% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.86% : 0.000001s : 8: predicate.reshape_eliminate 0.74% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.92% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.03% : 0.000002s : 6: predicate.shard_identity_eliminate 0.84% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 1.08% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 38: predicate.switch_simplify 0.88% : 
0.000001s : 8: predicate.tile_eliminate 0.82% : 0.000001s : 8: predicate.transpose_eliminate 1.59% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.47% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.89% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000321 7 39.05% : 0.000125s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.95% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.094463 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.30% : 0.003121s : 1: add_attr 3.29% : 0.003111s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000067s : 1: auto_monad 0.04% : 0.000034s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.48% : 0.000451s : 1: bootstrap 0.03% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000476s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000524s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 0.93% : 0.000877s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000036s : 4: opt.transform.symbol_engine_opt 2.30% : 0.002168s : 1: opt_a 0.12% : 0.000110s : 1: opt_after_cconv 0.53% : 0.000502s : 1: opt_after_jit_grad 0.21% : 0.000201s : 1: opt_b 4.45% : 0.004203s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000031s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000021s : 1: remove_dup_value 0.25% : 0.000236s : 1: renormalize.infer 0.23% : 0.000213s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000041s : 1: rewriter_after_opt_a 0.06% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000080s : 1: symbol_engine_optimizer 75.00% : 0.070850s : 1: task_emit 0.08% : 0.000076s : 1: 
tuple_transform 6.55% : 0.006192s : 1: type_inference 0.07% : 0.000063s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x8-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x9-pynative],max_mem:6.0M TotalTime = 0.0226973, [24] [bootstrap]: 0.00057612 [type_inference]: 0.00657894 [event_method]: 1.534e-05 [auto_monad]: 6.115e-05 [graph_reusing]: 6.57002e-06 [inline]: 2.24999e-06 [add_attr]: 0.00372427, [1] [add_attr_with_inline]: 0.00371171, [1] [Cycle 1]: 5.275e-05, [2] [tag_attr]: 1.578e-05 [meta_addattr_fg_expand]: 4.85001e-06 [parallel-infer-symbol]: 3.31001e-06 [pre_auto_parallel]: 2.778e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.40024e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 2.43002e-06 [optimize]: 0.00427816, [53] [py_interpret_to_execute]: 2.241e-05 [rewriter_before_opt_a]: 6.426e-05 [opt_a]: 0.00229874, [2] [Cycle 1]: 0.00167211, [45] [expand_dump_flag]: 2.63998e-06 [switch_simplify]: 3.343e-05 [loop_unroll]: 2.109e-05 [a_1]: 0.00045698 [with_stream_mark]: 1.482e-05 [recompute_prepare]: 8.47e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.22002e-06 [parameter_eliminate]: 1.97001e-06 [a_2]: 8.272e-05 [accelerated_algorithm]: 6.71999e-06 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 6.08998e-06 [merge_send_recv]: 8.86002e-06 [auto_parallel]: 6.91001e-06 [parallel]: 2.865e-05 [flash_sp]: 8e-06 [merge_comm]: 3.94002e-06 [allreduce_fusion]: 3.68e-06 [matmul_add_comm_reduction]: 9.61998e-06 [allreduce_slice_to_reducescatter]: 7.60017e-07 [virtual_shard_identity]: 7.93999e-06 [virtual_dataset]: 6.17999e-06 [get_grad_eliminate_]: 6.06003e-06 
[virtual_output]: 5.92999e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.132e-05 [merge_recompute_call_nodes]: 1.90001e-06 [before_grad]: 1.003e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88001e-06 [meta_fg_expand]: 2.79999e-06 [flash_sp_send_recv_attached]: 3.04999e-06 [receive_attached]: 2.58e-06 [after_resolve]: 9.12001e-06 [a_after_grad]: 8.92e-06 [renormalize]: 0.00051584 [add_forward_monad_depend]: 8.08999e-06 [auto_monad_grad]: 2.47001e-06 [auto_monad_eliminator]: 1.5e-05 [cse]: 3.063e-05 [a_3]: 4.391e-05 [Cycle 2]: 0.00061589, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.21001e-06 [loop_unroll]: 5.97999e-06 [a_1]: 0.00011703 [with_stream_mark]: 1.108e-05 [recompute_prepare]: 6.63998e-06 [updatestate_depend_eliminate]: 3.21999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 8.99978e-07 [a_2]: 7.2e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 4.55999e-06 [auto_parallel]: 6.26e-06 [parallel]: 4.68001e-06 [flash_sp]: 3.28e-06 [merge_comm]: 3.38999e-06 [allreduce_fusion]: 3.03e-06 [matmul_add_comm_reduction]: 5.61e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.33e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.82001e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.93e-06 [cell_reuse_recompute_pass]: 1.50001e-06 [offload_activation]: 6.34999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.026e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.56002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.40998e-06 [meta_fg_expand]: 1.77999e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.89e-06 [a_after_grad]: 7.95e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.29998e-06 
[auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.24999e-06 [cse]: 1.439e-05 [a_3]: 3.309e-05 [py_interpret_to_execute_after_opt_a]: 8.54998e-06 [slice_cell_reuse_recomputed_activation]: 2.05002e-06 [rewriter_after_opt_a]: 3.575e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 5.14998e-06 [mutable_eliminate]: 0.0004904 [opt_b]: 0.00019299, [1] [Cycle 1]: 0.00018572, [7] [b_1]: 0.00011146 [b_2]: 7.84002e-06 [updatestate_depend_eliminate]: 6.02999e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.50002e-06 [renormalize]: 5.8001e-07 [cse]: 1.852e-05 [optimize_parallel_all_gather_comm]: 1.785e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.52e-05 [loop_unroll]: 0.00043139 [opt_after_cconv]: 0.00010064, [1] [Cycle 1]: 9.433e-05, [7] [c_1]: 2.636e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.54998e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.49001e-06 [cse]: 1.856e-05 [renormalize]: 4.59986e-07 [remove_dup_value]: 1.537e-05 [tuple_transform]: 7.115e-05, [1] [Cycle 1]: 6.625e-05, [4] [d_1]: 3.837e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.89001e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 5.388e-05 [cse_after_recomputation]: 2.218e-05, [1] [Cycle 1]: 1.742e-05, [1] [cse]: 1.172e-05 [environ_conv]: 9.74e-06 [swap_dp_allreduce_reducescatter]: 5.21998e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.12003e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.52001e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.46998e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.78998e-06 [reorder_send_recv_between_fp_bp]: 3.16001e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 1.05999e-06 
[interleave_split_concat_branches]: 1.45001e-06 [interleave_parallel_branches]: 1.50999e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.319e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4.19002e-06 [overlap_recompute_and_grad_model_parallel]: 5.07999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.77999e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 4.15999e-06 [overlap_grad_flash_sp]: 1.809e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.20002e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 7.4e-05, [1] [Cycle 1]: 6.963e-05, [6] [build]: 3.39001e-06 [elim_shapecalc]: 9.47999e-06 [elim_not_effective]: 1.256e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 9.76e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.654e-05 [get_jit_bprop_graph]: 1.92001e-06 [rewriter_after_jit_bprop_graph]: 3.93999e-06 [opt_after_jit_grad]: 0.00047051 [validate]: 3.829e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00666277 [execute]: 7.36999e-06 Sums bootstrap : 0.000576s : 3.21% type_inference : 0.006579s : 36.67% event_method : 0.000015s : 0.09% auto_monad : 0.000061s : 0.34% graph_reusing : 0.000007s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.12% optimize.rewriter_before_opt_a : 0.000064s : 0.36% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000574s : 3.20% optimize.opt_a.with_stream_mark : 0.000026s : 0.14% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000033s : 0.19% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000018s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000516s : 2.88% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000045s : 0.25% optimize.opt_a.a_3 : 0.000077s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000490s : 2.73% optimize.opt_b.b_1 : 0.000111s : 0.62% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.14% optimize.loop_unroll : 0.000431s : 2.40% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000054s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000010s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000002s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000471s : 2.62% validate : 0.000038s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006663s : 37.14% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000175 26 19.01% : 0.000033s : 5: substitution.arithmetic_simplify 1.23% : 0.000002s : 2: substitution.elim_not_effective 0.86% : 0.000001s : 2: substitution.fold_const_symbol 3.03% : 0.000005s : 3: substitution.graph_param_transform 64.57% : 0.000113s : 3: substitution.inline 1.86% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 1.82% : 0.000003s : 2: substitution.replace_old_param 5.10% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006524 2 89.83% : 0.005860s : 1: type_inference.infer 10.17% : 0.000664s : 1: type_inference.specialize ------[replace.] 0.000038 4 80.31% : 0.000030s : 3: replace.inline 19.69% : 0.000007s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 4 93.15% : 0.000111s : 3: match.inline 6.85% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 0.83% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.07% : 0.000003s : 15: predicate.arithmetic_simplify 0.95% : 0.000002s : 9: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.91% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.00% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.15% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.87% : 0.000003s : 18: predicate.environ_get_eliminate 1.16% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.52% : 0.000004s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.29% : 0.000010s : 40: predicate.inline 0.88% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 6: predicate.less_batch_normalization 1.68% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 25: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.10% : 0.000002s : 3: predicate.mutable_eliminate 0.41% : 0.000001s : 3: predicate.opt_reshape 0.53% : 0.000001s : 3: predicate.parallel_virtual_node 1.57% : 0.000003s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.90% : 0.000001s : 9: predicate.print_const_string_wrapper 0.87% : 0.000001s : 6: predicate.reduce_all_const_elim 1.32% : 0.000002s : 9: predicate.reduce_eliminate 2.36% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 16: predicate.replace_applicator 0.65% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 1.01% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 3: predicate.row_tensor_eliminate 0.82% : 0.000001s : 6: predicate.same_eliminate 0.53% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.76% : 0.000001s : 6: predicate.shard_identity_eliminate 0.81% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 0.89% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.70% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.94% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 43: predicate.switch_simplify 0.95% : 
0.000002s : 9: predicate.tile_eliminate 0.93% : 0.000002s : 9: predicate.transpose_eliminate 1.56% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.56% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000386 8 43.48% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.52% : 0.000218s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032311 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.54% : 0.003729s : 1: add_attr 11.50% : 0.003716s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000067s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.87% : 0.000605s : 1: bootstrap 0.09% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000013s : 1: environ_conv 0.07% : 0.000022s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000440s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.55% : 0.000500s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 2.94% : 0.000951s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000091s : 28: opt.transform.opt_b 0.13% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.12% : 0.002302s : 1: opt_a 0.32% : 0.000104s : 1: opt_after_cconv 1.49% : 0.000480s : 1: opt_after_jit_grad 0.61% : 0.000196s : 1: opt_b 13.26% : 0.004283s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000006s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.85% : 0.000275s : 1: renormalize.infer 0.72% : 0.000234s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000040s : 1: rewriter_after_opt_a 0.21% : 0.000069s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000077s : 1: symbol_engine_optimizer 20.66% : 0.006675s : 1: task_emit 0.23% : 0.000074s : 1: 
tuple_transform 20.42% : 0.006598s : 1: type_inference 0.21% : 0.000068s : 1: validate TotalTime = 0.0207082, [24] [bootstrap]: 0.00047915 [type_inference]: 0.00609841 [event_method]: 1.303e-05 [auto_monad]: 6.059e-05 [graph_reusing]: 5.54998e-06 [inline]: 1.97999e-06 [add_attr]: 0.00303953, [1] [add_attr_with_inline]: 0.0030312, [1] [Cycle 1]: 4.365e-05, [2] [tag_attr]: 1.333e-05 [meta_addattr_fg_expand]: 4.05e-06 [parallel-infer-symbol]: 3.66999e-06 [pre_auto_parallel]: 2.554e-05 [insert-virtual-dataset]: 3.13998e-06 [parallel-infer-symbol-second]: 1.07998e-06 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.87001e-06 [optimize]: 0.00401758, [53] [py_interpret_to_execute]: 2.004e-05 [rewriter_before_opt_a]: 5.109e-05 [opt_a]: 0.00208975, [2] [Cycle 1]: 0.00147111, [45] [expand_dump_flag]: 1.57001e-06 [switch_simplify]: 2.372e-05 [loop_unroll]: 1.687e-05 [a_1]: 0.00034706 [with_stream_mark]: 1.508e-05 [recompute_prepare]: 8.05999e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.79002e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 1.96998e-06 [a_2]: 7.974e-05 [accelerated_algorithm]: 6.76e-06 [shard]: 2.36998e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 6.37001e-06 [merge_send_recv]: 8.33999e-06 [auto_parallel]: 6.56e-06 [parallel]: 1.871e-05 [flash_sp]: 8.10999e-06 [merge_comm]: 4.13001e-06 [allreduce_fusion]: 3.89002e-06 [matmul_add_comm_reduction]: 9.86e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.48e-06 [virtual_dataset]: 6.14999e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 6.01e-06 [merge_forward]: 4.15999e-06 [cell_reuse_recompute_pass]: 1.21997e-06 [offload_activation]: 1.02e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.179e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 1.029e-05 [set_forward_comm_id_for_comm_node_pass]: 3.93999e-06 [meta_fg_expand]: 2.67001e-06 [flash_sp_send_recv_attached]: 2.40002e-06 [receive_attached]: 2.21e-06 
[after_resolve]: 9.19e-06 [a_after_grad]: 8.95001e-06 [renormalize]: 0.00046934 [add_forward_monad_depend]: 4.89998e-06 [auto_monad_grad]: 2.32999e-06 [auto_monad_eliminator]: 1.364e-05 [cse]: 2.766e-05 [a_3]: 4.245e-05 [Cycle 2]: 0.0006091, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 7.03998e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00011529 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 5.87001e-06 [updatestate_depend_eliminate]: 2.98e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.94001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 7.152e-05 [accelerated_algorithm]: 5.77999e-06 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.56e-06 [parallel]: 4.24002e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.09999e-06 [matmul_add_comm_reduction]: 7.55e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.75002e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 5.32001e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 6.69001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 [merge_recompute_call_nodes]: 9.00007e-07 [before_grad]: 9.03002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.77998e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 9.09989e-07 [after_resolve]: 8.50001e-06 [a_after_grad]: 7.73999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.24001e-06 [cse]: 1.408e-05 [a_3]: 3.335e-05 [py_interpret_to_execute_after_opt_a]: 7.99002e-06 [slice_cell_reuse_recomputed_activation]: 1.98002e-06 [rewriter_after_opt_a]: 3.331e-05 [convert_after_rewriter]: 7.5e-06 [order_py_execute_after_rewriter]: 4.56002e-06 [mutable_eliminate]: 0.0004851 [opt_b]: 0.00019019, [1] [Cycle 1]: 0.00018411, [7] [b_1]: 
0.00011179 [b_2]: 7.52998e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.54999e-06 [renormalize]: 6.10016e-07 [cse]: 1.782e-05 [optimize_parallel_all_gather_comm]: 1.674e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 2.188e-05 [loop_unroll]: 0.00042218 [opt_after_cconv]: 9.765e-05, [1] [Cycle 1]: 9.196e-05, [7] [c_1]: 2.617e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.58e-06 [cse]: 1.813e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.494e-05 [tuple_transform]: 6.972e-05, [1] [Cycle 1]: 6.506e-05, [4] [d_1]: 3.834e-05 [none_parameter_eliminate]: 1.45001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.49999e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 4.643e-05 [cse_after_recomputation]: 2.223e-05, [1] [Cycle 1]: 1.777e-05, [1] [cse]: 1.216e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.14997e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 3.23e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.41002e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.73002e-06 [control_data_broadcast_order]: 1.319e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.82002e-06 [overlap_recompute_and_grad_model_parallel]: 4.94998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 4.52e-06 [overlap_grad_flash_sp]: 1.853e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.34999e-06 [split_layernorm_comm]: 1.88002e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.587e-05, [1] [Cycle 1]: 7.147e-05, [6] [build]: 3.11999e-06 [elim_shapecalc]: 8.95001e-06 [elim_not_effective]: 1.32e-05 [opt_reshape]: 6.86999e-06 [fold_const_symbol]: 9.64e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 1.569e-05 [get_jit_bprop_graph]: 1.27999e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00045756 [validate]: 3.734e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00622205 [execute]: 8.65999e-06 Sums bootstrap : 0.000479s : 2.88% type_inference : 0.006098s : 36.63% event_method : 0.000013s : 0.08% auto_monad : 0.000061s : 0.36% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000051s : 0.31% optimize.opt_a.expand_dump_flag : 0.000002s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.18% optimize.opt_a.loop_unroll : 0.000022s : 0.13% optimize.opt_a.a_1 : 0.000462s : 2.78% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000469s : 2.82% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000042s : 0.25% optimize.opt_a.a_3 : 0.000076s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000485s : 2.91% optimize.opt_b.b_1 : 0.000112s : 0.67% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000422s : 2.54% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000458s : 2.75% validate : 0.000037s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006222s : 37.37% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000138 24 20.92% : 0.000029s : 4: substitution.arithmetic_simplify 1.63% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000006s : 3: substitution.graph_param_transform 64.48% : 0.000089s : 3: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000005s : 4: substitution.remove_not_recompute_node 2.06% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006052 2 92.25% : 0.005583s : 1: type_inference.infer 7.75% : 0.000469s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000087 3 100.00% : 0.000087s : 3: match.inline ------[predicate.] 
0.000146 815 0.85% : 0.000001s : 8: predicate.accumulaten_eliminater 1.01% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 14: predicate.arithmetic_simplify 0.99% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.66% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 1.81% : 0.000003s : 17: predicate.environ_get_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.94% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.44% : 0.000009s : 37: predicate.inline 1.03% : 0.000002s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 6: predicate.less_batch_normalization 1.60% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 22: predicate.load_eliminater 1.11% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.68% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.17% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.44% : 0.000002s : 11: predicate.partial_defer_inline 1.37% : 0.000002s : 11: predicate.partial_eliminate 0.87% : 0.000001s : 8: predicate.print_const_string_wrapper 0.68% : 0.000001s : 6: predicate.reduce_all_const_elim 1.14% : 0.000002s : 8: predicate.reduce_eliminate 2.23% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 8: predicate.reshape_eliminate 0.71% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 3: predicate.row_tensor_eliminate 0.88% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 6: predicate.shard_identity_eliminate 0.81% : 0.000001s : 6: predicate.special_op_eliminate 0.90% : 0.000001s : 6: predicate.specialize_transform 1.01% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.29% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.64% : 0.000007s : 38: predicate.switch_simplify 0.91% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.66% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.51% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.53% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.79% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 6: predicate.virtual_output_eliminate 0.36% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000294 7 36.84% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.16% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029214 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.42% : 0.003044s : 1: add_attr 10.39% : 0.003035s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000066s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.73% : 0.000504s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000017s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000431s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000495s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.82% : 0.000824s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000035s : 4: opt.transform.symbol_engine_opt 7.16% : 0.002093s : 1: opt_a 0.35% : 0.000101s : 1: opt_after_cconv 1.60% : 0.000468s : 1: opt_after_jit_grad 0.66% : 0.000194s : 1: opt_b 13.77% : 0.004022s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.05% : 0.000016s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.87% : 0.000255s : 1: renormalize.infer 0.71% : 0.000207s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000079s : 1: symbol_engine_optimizer 21.35% : 0.006236s : 1: task_emit 0.25% : 0.000073s : 1: 
tuple_transform 20.94% : 0.006117s : 1: type_inference 0.23% : 0.000068s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x9-kbk],max_mem:6.0M TotalTime = 0.93385, [24] [bootstrap]: 0.0006972 [type_inference]: 0.00714711 [event_method]: 1.508e-05 [auto_monad]: 6.294e-05 [graph_reusing]: 6.11e-06 [inline]: 2.32001e-06 [add_attr]: 0.003843, [1] [add_attr_with_inline]: 0.00383035, [1] [Cycle 1]: 5.284e-05, [2] [tag_attr]: 1.652e-05 [meta_addattr_fg_expand]: 4.38999e-06 [parallel-infer-symbol]: 3.28e-06 [pre_auto_parallel]: 2.831e-05 [insert-virtual-dataset]: 2.64999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.07999e-06 [pipeline_split]: 1.75001e-06 [optimize]: 0.0043196, [53] [py_interpret_to_execute]: 2.49e-05 [rewriter_before_opt_a]: 6.596e-05 [opt_a]: 0.00231024, [2] [Cycle 1]: 0.00167885, [45] [expand_dump_flag]: 2.97002e-06 [switch_simplify]: 3.417e-05 [loop_unroll]: 2.098e-05 [a_1]: 0.00045342 [with_stream_mark]: 1.422e-05 [recompute_prepare]: 9.00999e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 3.53e-06 [parameter_eliminate]: 2.08002e-06 [a_2]: 7.999e-05 [accelerated_algorithm]: 7.13e-06 [shard]: 1.86e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 9.16998e-06 [auto_parallel]: 6.34999e-06 [parallel]: 2.636e-05 [flash_sp]: 7.75e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.62002e-06 [matmul_add_comm_reduction]: 9.83002e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.40998e-06 [virtual_dataset]: 6.04001e-06 [get_grad_eliminate_]: 6.21e-06 [virtual_output]: 5.86998e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.005e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.172e-05 [merge_recompute_call_nodes]: 1.60001e-06 
[before_grad]: 1.014e-05 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.79001e-06 [flash_sp_send_recv_attached]: 2.55997e-06 [receive_attached]: 2.73e-06 [after_resolve]: 1.038e-05 [a_after_grad]: 8.61002e-06 [renormalize]: 0.00052546 [add_forward_monad_depend]: 9.87001e-06 [auto_monad_grad]: 2.54999e-06 [auto_monad_eliminator]: 1.471e-05 [cse]: 3.172e-05 [a_3]: 4.337e-05 [Cycle 2]: 0.00062113, [45] [expand_dump_flag]: 1.74998e-06 [switch_simplify]: 7.36001e-06 [loop_unroll]: 6.37001e-06 [a_1]: 0.00011635 [with_stream_mark]: 1.235e-05 [recompute_prepare]: 6.15002e-06 [updatestate_depend_eliminate]: 3.20998e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.76999e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 7.051e-05 [accelerated_algorithm]: 5.97999e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 5.59998e-06 [auto_parallel]: 5.97001e-06 [parallel]: 4.52003e-06 [flash_sp]: 4e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.92002e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.33998e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.12999e-06 [virtual_output]: 5.30999e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 7.01999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 9.29984e-07 [before_grad]: 8.96002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.32002e-06 [meta_fg_expand]: 1.89999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.60019e-07 [after_resolve]: 9.39e-06 [a_after_grad]: 8e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.98e-06 [cse]: 1.393e-05 [a_3]: 3.275e-05 [py_interpret_to_execute_after_opt_a]: 9.51e-06 [slice_cell_reuse_recomputed_activation]: 2.64999e-06 [rewriter_after_opt_a]: 
3.488e-05 [convert_after_rewriter]: 6.76999e-06 [order_py_execute_after_rewriter]: 5.32001e-06 [mutable_eliminate]: 0.00052706 [opt_b]: 0.00019301, [1] [Cycle 1]: 0.00018601, [7] [b_1]: 0.00011174 [b_2]: 7.40998e-06 [updatestate_depend_eliminate]: 6.16e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.53e-06 [renormalize]: 5.90022e-07 [cse]: 1.932e-05 [optimize_parallel_all_gather_comm]: 1.656e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.544e-05 [loop_unroll]: 0.00043218 [opt_after_cconv]: 9.883e-05, [1] [Cycle 1]: 9.27e-05, [7] [c_1]: 2.596e-05 [parameter_eliminate]: 3.03998e-06 [updatestate_depend_eliminate]: 5.61e-06 [updatestate_assign_eliminate]: 2.82002e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 1.746e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.485e-05 [tuple_transform]: 7.116e-05, [1] [Cycle 1]: 6.671e-05, [4] [d_1]: 3.902e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.68e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 5.075e-05 [cse_after_recomputation]: 2.184e-05, [1] [Cycle 1]: 1.737e-05, [1] [cse]: 1.193e-05 [environ_conv]: 8.41002e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.75002e-06 [label_micro_interleaved_index]: 4.47e-06 [label_fine_grained_interleaved_index]: 3.06001e-06 [merge_cast_opt]: 1.36998e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 8.70001e-07 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 3.06001e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.34e-06 [overlap_opt_shard_grad_in_pipeline]: 2.23002e-06 [control_data_broadcast_order]: 1.276e-05 [grouped_pairwise_exchange_alltoall]: 
1.55001e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.90001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.46002e-06 [overlap_recompute_comm]: 2.68e-06 [overlap_grad_ring_attention]: 4.07e-06 [overlap_grad_flash_sp]: 1.938e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.39001e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.381e-05, [1] [Cycle 1]: 6.906e-05, [6] [build]: 2.64999e-06 [elim_shapecalc]: 9.32001e-06 [elim_not_effective]: 1.25e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 9.51e-06 [renormalize]: 2.59985e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.662e-05 [get_jit_bprop_graph]: 1.62999e-06 [rewriter_after_jit_bprop_graph]: 3.78001e-06 [opt_after_jit_grad]: 0.00048676 [validate]: 3.894e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.916909 [execute]: 1.751e-05 Sums bootstrap : 0.000697s : 0.08% type_inference : 0.007147s : 0.77% event_method : 0.000015s : 0.00% auto_monad : 0.000063s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000025s : 0.00% optimize.rewriter_before_opt_a : 0.000066s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000570s : 0.06% optimize.opt_a.with_stream_mark : 0.000027s : 0.00% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% 
optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000015s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000031s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000526s : 0.06% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 
0.000022s : 0.00% optimize.opt_a.cse : 0.000046s : 0.00% optimize.opt_a.a_3 : 0.000076s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000527s : 0.06% optimize.opt_b.b_1 : 0.000112s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.00% optimize.loop_unroll : 0.000432s : 0.05% optimize.opt_after_cconv.c_1 : 0.000026s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% 
optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% 
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000487s : 0.05% validate : 0.000039s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.916909s : 98.70% execute : 0.000018s : 0.00% Time group info: ------[substitution.] 0.000177 26 18.79% : 0.000033s : 5: substitution.arithmetic_simplify 1.16% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000002s : 2: substitution.fold_const_symbol 3.36% : 0.000006s : 3: substitution.graph_param_transform 64.12% : 0.000113s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000005s : 4: substitution.remove_not_recompute_node 2.18% : 0.000004s : 2: substitution.replace_old_param 4.94% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007089 2 90.98% : 0.006449s : 1: type_inference.infer 9.02% : 0.000640s : 1: type_inference.specialize ------[replace.] 0.000039 4 78.55% : 0.000031s : 3: replace.inline 21.45% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 4 93.32% : 0.000111s : 3: match.inline 6.68% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 883 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 0.86% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.04% : 0.000003s : 15: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.59% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.97% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 3: predicate.elim_not_effective 0.36% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_depend_swap 1.83% : 0.000003s : 18: predicate.environ_get_eliminate 1.16% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.45% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.42% : 0.000010s : 40: predicate.inline 0.93% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 6: predicate.less_batch_normalization 1.61% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 25: predicate.load_eliminater 1.11% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.53% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 6: predicate.merge_addn 0.83% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.56% : 0.000001s : 3: predicate.parallel_virtual_node 1.61% : 0.000003s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.86% : 0.000001s : 9: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.14% : 0.000002s : 9: predicate.reduce_eliminate 2.32% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 6: predicate.remove_not_recompute_node 1.34% : 0.000002s : 16: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.24% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.74% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.79% : 0.000001s : 6: predicate.specialize_transform 1.07% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 13: predicate.switch_defer_inline 1.95% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.93% : 0.000008s : 43: predicate.switch_simplify 0.89% : 
0.000001s : 9: predicate.tile_eliminate 0.90% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.04% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000422 8 46.03% : 0.000194s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.97% : 0.000228s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.943627 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.41% : 0.003848s : 1: add_attr 0.41% : 0.003834s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000068s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.08% : 0.000743s : 1: bootstrap 0.00% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000021s : 1: event_method 0.00% : 0.000025s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000537s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.10% : 0.000947s : 78: opt.transform.opt_a 0.00% : 0.000025s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000089s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.25% : 0.002313s : 1: opt_a 0.01% : 0.000102s : 1: opt_after_cconv 0.05% : 0.000497s : 1: opt_after_jit_grad 0.02% : 0.000197s : 1: opt_b 0.46% : 0.004324s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000029s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000019s : 1: remove_dup_value 0.03% : 0.000279s : 1: renormalize.infer 0.03% : 0.000239s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000039s : 1: rewriter_after_opt_a 0.01% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000077s : 1: symbol_engine_optimizer 97.17% : 0.916930s : 1: task_emit 0.01% : 0.000074s : 1: 
tuple_transform 0.76% : 0.007168s : 1: type_inference 0.01% : 0.000066s : 1: validate TotalTime = 0.0776573, [24] [bootstrap]: 0.00049106 [type_inference]: 0.00623872 [event_method]: 1.248e-05 [auto_monad]: 6.325e-05 [graph_reusing]: 6.17001e-06 [inline]: 2.40002e-06 [add_attr]: 0.00311871, [1] [add_attr_with_inline]: 0.00310988, [1] [Cycle 1]: 5.232e-05, [2] [tag_attr]: 1.556e-05 [meta_addattr_fg_expand]: 4.03001e-06 [parallel-infer-symbol]: 3.21001e-06 [pre_auto_parallel]: 2.553e-05 [insert-virtual-dataset]: 2.89001e-06 [parallel-infer-symbol-second]: 9.10019e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.00407888, [53] [py_interpret_to_execute]: 2.174e-05 [rewriter_before_opt_a]: 5.307e-05 [opt_a]: 0.00215613, [2] [Cycle 1]: 0.00152742, [45] [expand_dump_flag]: 2.67001e-06 [switch_simplify]: 2.888e-05 [loop_unroll]: 1.677e-05 [a_1]: 0.00037787 [with_stream_mark]: 1.523e-05 [recompute_prepare]: 7.73001e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.66001e-06 [updatestate_loads_eliminate]: 3.31999e-06 [parameter_eliminate]: 2.04e-06 [a_2]: 8.07e-05 [accelerated_algorithm]: 7.06001e-06 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 6.26998e-06 [merge_send_recv]: 9.46e-06 [auto_parallel]: 6.87002e-06 [parallel]: 1.79e-05 [flash_sp]: 7.22002e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.86001e-06 [matmul_add_comm_reduction]: 9.67001e-06 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 7.5e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.58002e-06 [merge_forward]: 4.12e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 1.016e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.153e-05 [merge_recompute_call_nodes]: 1.84e-06 [before_grad]: 1.038e-05 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.58e-06 [flash_sp_send_recv_attached]: 2.61999e-06 [receive_attached]: 1.98997e-06 
[after_resolve]: 9.49999e-06 [a_after_grad]: 8.40999e-06 [renormalize]: 0.00048692 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 2.15002e-06 [auto_monad_eliminator]: 1.429e-05 [cse]: 2.957e-05 [a_3]: 4.291e-05 [Cycle 2]: 0.00061844, [45] [expand_dump_flag]: 8.09989e-07 [switch_simplify]: 7.00998e-06 [loop_unroll]: 5.81e-06 [a_1]: 0.00011578 [with_stream_mark]: 1.289e-05 [recompute_prepare]: 6.02999e-06 [updatestate_depend_eliminate]: 3.26001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 7.175e-05 [accelerated_algorithm]: 5.81e-06 [shard]: 1.06997e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.60001e-06 [merge_send_recv]: 4.87e-06 [auto_parallel]: 5.64e-06 [parallel]: 3.70998e-06 [flash_sp]: 3.82002e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.10998e-06 [matmul_add_comm_reduction]: 5.39998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.87002e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.96999e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 6.39999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.94e-06 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 1.76998e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 8.57e-06 [a_after_grad]: 7.97e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 6.76999e-06 [cse]: 1.514e-05 [a_3]: 3.513e-05 [py_interpret_to_execute_after_opt_a]: 8.03999e-06 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 3.418e-05 [convert_after_rewriter]: 6.93998e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00047843 [opt_b]: 0.00018982, [1] [Cycle 1]: 0.00018358, [7] [b_1]: 0.00011072 
[b_2]: 7.62998e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.44999e-06 [renormalize]: 4.19997e-07 [cse]: 1.898e-05 [optimize_parallel_all_gather_comm]: 1.654e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.378e-05 [loop_unroll]: 0.00043421 [opt_after_cconv]: 9.825e-05, [1] [Cycle 1]: 9.25e-05, [7] [c_1]: 2.609e-05 [parameter_eliminate]: 2.40002e-06 [updatestate_depend_eliminate]: 5.44998e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.846e-05 [renormalize]: 4.59986e-07 [remove_dup_value]: 1.596e-05 [tuple_transform]: 6.982e-05, [1] [Cycle 1]: 6.515e-05, [4] [d_1]: 3.819e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.47001e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.47e-05 [cse_after_recomputation]: 2.197e-05, [1] [Cycle 1]: 1.732e-05, [1] [cse]: 1.201e-05 [environ_conv]: 5.54998e-06 [swap_dp_allreduce_reducescatter]: 5.37999e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.59001e-06 [assign_add_opt]: 1.67001e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.73e-06 [reorder_send_recv_between_fp_bp]: 2.98e-06 [comm_op_add_attrs]: 1.38002e-06 [add_comm_op_reuse_tag]: 1.29998e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.24003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.232e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.95001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 4.89e-06 [overlap_grad_flash_sp]: 1.72e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.83998e-06 [split_layernorm_comm]: 1.66998e-06 [handle_group_info]: 1.05999e-06 [symbol_engine_optimizer]: 7.211e-05, [1] [Cycle 1]: 6.774e-05, [6] [build]: 2.66e-06 [elim_shapecalc]: 9.09e-06 [elim_not_effective]: 1.23e-05 [opt_reshape]: 6.38e-06 [fold_const_symbol]: 9.03002e-06 [renormalize]: 2.70025e-07 [detach_backward]: 2.01e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.626e-05 [get_jit_bprop_graph]: 1.09003e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00049233 [validate]: 3.62e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.0628199 [execute]: 9.78998e-06 Sums bootstrap : 0.000491s : 0.67% type_inference : 0.006239s : 8.49% event_method : 0.000012s : 0.02% auto_monad : 0.000063s : 0.09% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.03% optimize.rewriter_before_opt_a : 0.000053s : 0.07% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000036s : 0.05% optimize.opt_a.loop_unroll : 0.000023s : 0.03% optimize.opt_a.a_1 : 0.000494s : 0.67% optimize.opt_a.with_stream_mark : 0.000028s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000152s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.02% optimize.opt_a.auto_parallel : 0.000013s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000487s : 0.66% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000045s : 0.06% optimize.opt_a.a_3 : 0.000078s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000478s : 0.65% optimize.opt_b.b_1 : 0.000111s : 0.15% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000434s : 0.59% optimize.opt_after_cconv.c_1 : 0.000026s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.06% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000492s : 0.67% validate : 0.000036s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.062820s : 85.46% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000164 24 18.37% : 0.000030s : 4: substitution.arithmetic_simplify 1.24% : 0.000002s : 2: substitution.elim_not_effective 0.84% : 0.000001s : 2: substitution.fold_const_symbol 3.59% : 0.000006s : 3: substitution.graph_param_transform 69.09% : 0.000113s : 3: substitution.inline 2.08% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.81% : 0.000005s : 4: substitution.remove_not_recompute_node 1.97% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006191 2 91.16% : 0.005644s : 1: type_inference.infer 8.84% : 0.000547s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000111 3 100.00% : 0.000111s : 3: match.inline ------[predicate.] 
0.000150 815 0.85% : 0.000001s : 8: predicate.accumulaten_eliminater 0.95% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.25% : 0.000003s : 14: predicate.arithmetic_simplify 0.91% : 0.000001s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.67% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.96% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.97% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.16% : 0.000002s : 11: predicate.environ_get_depend_swap 1.86% : 0.000003s : 17: predicate.environ_get_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.31% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.75% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.21% : 0.000009s : 37: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.05% : 0.000002s : 6: predicate.less_batch_normalization 1.71% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.04% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.99% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 0.99% : 0.000001s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.47% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 11: predicate.partial_eliminate 0.93% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.27% : 0.000002s : 8: predicate.reduce_eliminate 2.34% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 6: predicate.remove_not_recompute_node 1.29% : 0.000002s : 14: predicate.replace_applicator 0.83% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 8: predicate.reshape_eliminate 0.85% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.97% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 6: predicate.shard_identity_eliminate 0.85% : 0.000001s : 6: predicate.special_op_eliminate 0.93% : 0.000001s : 6: predicate.specialize_transform 1.01% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.20% : 0.000002s : 11: predicate.switch_defer_inline 1.87% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.84% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.56% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.52% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.96% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000306 7 39.12% : 0.000120s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.88% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.086348 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.62% : 0.003124s : 1: add_attr 3.61% : 0.003114s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000068s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000528s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000018s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.51% : 0.000443s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000488s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.00% : 0.000860s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.50% : 0.002159s : 1: opt_a 0.12% : 0.000102s : 1: opt_after_cconv 0.58% : 0.000503s : 1: opt_after_jit_grad 0.22% : 0.000193s : 1: opt_b 4.73% : 0.004083s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 0.30% : 0.000263s : 1: renormalize.infer 0.25% : 0.000217s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000038s : 1: rewriter_after_opt_a 0.07% : 0.000057s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000075s : 1: symbol_engine_optimizer 72.78% : 0.062843s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 7.25% : 0.006256s : 1: type_inference 0.07% : 0.000062s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2.0-dtype_x9-ge]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x9-ge],max_mem:6.0M
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x0-pynative]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x0-pynative],max_mem:6.0M
TotalTime = 0.0223442, [24] [bootstrap]: 0.00050425 [type_inference]: 0.00648144 [event_method]: 1.463e-05 [auto_monad]: 5.903e-05 [graph_reusing]: 6.39999e-06 [inline]: 1.96e-06 [add_attr]: 0.00369526, [1] [add_attr_with_inline]: 0.00368478, [1] [Cycle 1]: 4.352e-05, [2] [tag_attr]: 1.359e-05 [meta_addattr_fg_expand]: 4.49998e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 2.505e-05 [insert-virtual-dataset]: 2.52001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00417971, [53] [py_interpret_to_execute]: 2.147e-05 [rewriter_before_opt_a]: 6.322e-05 [opt_a]: 0.00222886, [2] [Cycle 1]: 0.00159868, [45] [expand_dump_flag]: 3.21001e-06 [switch_simplify]: 3.379e-05 [loop_unroll]: 2.086e-05 [a_1]: 0.00043377 [with_stream_mark]: 1.286e-05 [recompute_prepare]: 8.58001e-06 [updatestate_depend_eliminate]: 3.85998e-06 [updatestate_assign_eliminate]: 3.17002e-06 [updatestate_loads_eliminate]: 3.03998e-06 [parameter_eliminate]: 1.40999e-06 [a_2]: 8.299e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 1.92001e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 6.29001e-06 [merge_send_recv]: 7.38e-06 [auto_parallel]: 6.24001e-06 [parallel]: 2.484e-05 [flash_sp]: 6.88e-06 [merge_comm]: 4.02e-06 [allreduce_fusion]: 3.55e-06 [matmul_add_comm_reduction]: 8.48001e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.68999e-06 [virtual_dataset]: 6.11998e-06 [get_grad_eliminate_]: 5.66e-06
[virtual_output]: 5.82001e-06 [merge_forward]: 3.64002e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 8.07e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.201e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 1.027e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86999e-06 [meta_fg_expand]: 2.74999e-06 [flash_sp_send_recv_attached]: 2.87002e-06 [receive_attached]: 2.25002e-06 [after_resolve]: 9.19998e-06 [a_after_grad]: 8.77e-06 [renormalize]: 0.00048595 [add_forward_monad_depend]: 9.54e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.438e-05 [cse]: 2.848e-05 [a_3]: 4.413e-05 [Cycle 2]: 0.00062069, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 7.21999e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00011363 [with_stream_mark]: 9.96e-06 [recompute_prepare]: 5.96e-06 [updatestate_depend_eliminate]: 2.99001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.84001e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 7.876e-05 [accelerated_algorithm]: 5.99e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 5.22999e-06 [auto_parallel]: 5.89999e-06 [parallel]: 4.73001e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.48e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 5.10999e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.61e-06 [virtual_dataset]: 5.57999e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.56e-06 [merge_forward]: 2.81999e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.062e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 9.20999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.22e-06 [after_resolve]: 8.53001e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 
9.30013e-07 [auto_monad_eliminator]: 6.74001e-06 [cse]: 1.432e-05 [a_3]: 3.557e-05 [py_interpret_to_execute_after_opt_a]: 8.23001e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.149e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.04998e-06 [mutable_eliminate]: 0.00046328 [opt_b]: 0.00019296, [1] [Cycle 1]: 0.00018668, [7] [b_1]: 0.00011407 [b_2]: 7.64002e-06 [updatestate_depend_eliminate]: 5.73997e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.52001e-06 [renormalize]: 5.29981e-07 [cse]: 1.746e-05 [optimize_parallel_all_gather_comm]: 1.71e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 2.16e-05 [loop_unroll]: 0.00042821 [opt_after_cconv]: 9.842e-05, [1] [Cycle 1]: 9.244e-05, [7] [c_1]: 2.588e-05 [parameter_eliminate]: 2.48e-06 [updatestate_depend_eliminate]: 5.49e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.825e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.488e-05 [tuple_transform]: 9.854e-05, [1] [Cycle 1]: 9.347e-05, [4] [d_1]: 6.573e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.57002e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 5.118e-05 [cse_after_recomputation]: 2.279e-05, [1] [Cycle 1]: 1.773e-05, [1] [cse]: 1.235e-05 [environ_conv]: 7.74002e-06 [swap_dp_allreduce_reducescatter]: 5.21002e-06 [bias_add_comm_swap]: 2.83998e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.60997e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.60999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.39e-06 [full_micro_interleaved_order_control]: 2.64001e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.52001e-06 
[interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.281e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 3.98001e-06 [overlap_recompute_and_grad_model_parallel]: 5.12e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 4.51002e-06 [overlap_grad_flash_sp]: 1.653e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.91998e-06 [handle_group_info]: 1.40999e-06 [symbol_engine_optimizer]: 7.315e-05, [1] [Cycle 1]: 6.876e-05, [6] [build]: 2.58998e-06 [elim_shapecalc]: 9.44e-06 [elim_not_effective]: 1.206e-05 [opt_reshape]: 6.71e-06 [fold_const_symbol]: 9.87001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.91e-06 [auto_monad_reorder]: 1.688e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 0.00012917 [opt_after_jit_grad]: 0.00047299 [validate]: 3.493e-05 [backend_pass]: 1.20001e-06 [task_emit]: 0.00649577 [execute]: 8.03999e-06 Sums bootstrap : 0.000504s : 2.86% type_inference : 0.006481s : 36.75% event_method : 0.000015s : 0.08% auto_monad : 0.000059s : 0.33% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000063s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 
0.000041s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000547s : 3.10% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000162s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000014s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.10% optimize.opt_a.a_after_grad : 
0.000017s : 0.09% optimize.opt_a.renormalize : 0.000486s : 2.76% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000043s : 0.24% optimize.opt_a.a_3 : 0.000080s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000463s : 2.63% optimize.opt_b.b_1 : 0.000114s : 0.65% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000428s : 2.43% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.08% optimize.tuple_transform.d_1 : 0.000066s : 0.37% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% 
optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000008s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000002s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000129s : 0.73% opt_after_jit_grad : 0.000473s : 2.68% validate : 0.000035s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006496s : 36.83% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000187 26 16.73% : 0.000031s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 17.79% : 0.000033s : 3: substitution.graph_param_transform 52.99% : 0.000099s : 3: substitution.inline 1.89% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.91% : 0.000005s : 4: substitution.remove_not_recompute_node 1.75% : 0.000003s : 2: substitution.replace_old_param 4.09% : 0.000008s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006431 2 89.34% : 0.005746s : 1: type_inference.infer 10.66% : 0.000686s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.32% : 0.000029s : 3: replace.inline 20.68% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000104 4 93.41% : 0.000097s : 3: match.inline 6.59% : 0.000007s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 883 1.07% : 0.000002s : 9: predicate.accumulaten_eliminater 0.86% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 6: predicate.addn_check_dump 0.97% : 0.000002s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 15: predicate.arithmetic_simplify 0.97% : 0.000002s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.38% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.78% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 13: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.31% : 0.000010s : 40: predicate.inline 0.83% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 6: predicate.less_batch_normalization 1.66% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 25: predicate.load_eliminater 0.94% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.68% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.06% : 0.000002s : 3: predicate.mutable_eliminate 0.41% : 0.000001s : 3: predicate.opt_reshape 0.34% : 0.000001s : 3: predicate.parallel_virtual_node 1.60% : 0.000003s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.63% : 0.000001s : 6: predicate.reduce_all_const_elim 1.25% : 0.000002s : 9: predicate.reduce_eliminate 2.47% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.30% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.94% : 0.000002s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.80% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 0.85% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.94% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.34% : 0.000009s : 43: predicate.switch_simplify 0.87% : 
0.000001s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 15: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 2.96% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 3: predicate.value_based_eliminate 0.77% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000372 8 43.67% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.33% : 0.000210s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031815 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.63% : 0.003700s : 1: add_attr 11.59% : 0.003688s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000064s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.68% : 0.000536s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000011s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000437s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.48% : 0.000472s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000933s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000093s : 28: opt.transform.opt_b 0.22% : 0.000070s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.02% : 0.002232s : 1: opt_a 0.32% : 0.000102s : 1: opt_after_cconv 1.52% : 0.000483s : 1: opt_after_jit_grad 0.62% : 0.000196s : 1: opt_b 13.15% : 0.004184s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.80% : 0.000256s : 1: renormalize.infer 0.70% : 0.000223s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.42% : 0.000135s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000067s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000076s : 1: symbol_engine_optimizer 20.45% : 0.006508s : 1: task_emit 0.32% : 0.000102s : 1: 
tuple_transform 20.42% : 0.006496s : 1: type_inference 0.20% : 0.000064s : 1: validate TotalTime = 0.0207438, [24] [bootstrap]: 0.00046356 [type_inference]: 0.006035 [event_method]: 1.232e-05 [auto_monad]: 6.303e-05 [graph_reusing]: 5.69999e-06 [inline]: 1.89e-06 [add_attr]: 0.00311702, [1] [add_attr_with_inline]: 0.00310849, [1] [Cycle 1]: 4.597e-05, [2] [tag_attr]: 1.399e-05 [meta_addattr_fg_expand]: 4.32e-06 [parallel-infer-symbol]: 3.35e-06 [pre_auto_parallel]: 2.592e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.36e-06 [pipeline_split]: 1.69998e-06 [optimize]: 0.00413314, [53] [py_interpret_to_execute]: 2.07e-05 [rewriter_before_opt_a]: 5.229e-05 [opt_a]: 0.00218298, [2] [Cycle 1]: 0.00153449, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 2.966e-05 [loop_unroll]: 1.765e-05 [a_1]: 0.00036577 [with_stream_mark]: 1.526e-05 [recompute_prepare]: 8.35999e-06 [updatestate_depend_eliminate]: 4.21001e-06 [updatestate_assign_eliminate]: 3.28998e-06 [updatestate_loads_eliminate]: 3.36001e-06 [parameter_eliminate]: 2.22999e-06 [a_2]: 8.336e-05 [accelerated_algorithm]: 7.56999e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 2.22001e-06 [shard_inline]: 6.32001e-06 [merge_send_recv]: 8.72e-06 [auto_parallel]: 6.24001e-06 [parallel]: 1.821e-05 [flash_sp]: 7.88001e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.66999e-06 [matmul_add_comm_reduction]: 9.77001e-06 [allreduce_slice_to_reducescatter]: 1.00001e-06 [virtual_shard_identity]: 8.37e-06 [virtual_dataset]: 7.28999e-06 [get_grad_eliminate_]: 6.23e-06 [virtual_output]: 5.77001e-06 [merge_forward]: 4.04002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.188e-05 [merge_recompute_call_nodes]: 1.59998e-06 [before_grad]: 1.068e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83999e-06 [meta_fg_expand]: 2.71999e-06 [flash_sp_send_recv_attached]: 2.49001e-06 [receive_attached]: 2.26e-06 
[after_resolve]: 1.022e-05 [a_after_grad]: 9.36002e-06 [renormalize]: 0.00048692 [add_forward_monad_depend]: 4.85001e-06 [auto_monad_grad]: 2.28998e-06 [auto_monad_eliminator]: 1.483e-05 [cse]: 2.885e-05 [a_3]: 4.605e-05 [Cycle 2]: 0.00063776, [45] [expand_dump_flag]: 1.49998e-06 [switch_simplify]: 7.50998e-06 [loop_unroll]: 6.08002e-06 [a_1]: 0.00012051 [with_stream_mark]: 1.057e-05 [recompute_prepare]: 6.92002e-06 [updatestate_depend_eliminate]: 3.31001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.09998e-06 [a_2]: 7.505e-05 [accelerated_algorithm]: 6.45002e-06 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.38002e-06 [shard_inline]: 6.58998e-06 [merge_send_recv]: 4.94e-06 [auto_parallel]: 6.49001e-06 [parallel]: 5.09e-06 [flash_sp]: 3.88001e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.13998e-06 [matmul_add_comm_reduction]: 5.59e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.88998e-06 [virtual_dataset]: 5.68002e-06 [get_grad_eliminate_]: 5.72001e-06 [virtual_output]: 5.68002e-06 [merge_forward]: 3.26999e-06 [cell_reuse_recompute_pass]: 1.86e-06 [offload_activation]: 7.26001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.102e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 9.57001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.01e-06 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.45001e-06 [a_after_grad]: 7.93001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 1.25001e-06 [auto_monad_eliminator]: 6.34999e-06 [cse]: 1.337e-05 [a_3]: 3.379e-05 [py_interpret_to_execute_after_opt_a]: 8.40999e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.559e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.46e-06 [mutable_eliminate]: 0.00050481 [opt_b]: 0.0001939, [1] [Cycle 1]: 0.00018744, [7] 
[b_1]: 0.00011431 [b_2]: 7.31001e-06 [updatestate_depend_eliminate]: 6.07999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 3.59985e-07 [cse]: 1.825e-05 [optimize_parallel_all_gather_comm]: 1.595e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.49e-05 [loop_unroll]: 0.00043433 [opt_after_cconv]: 9.886e-05, [1] [Cycle 1]: 9.316e-05, [7] [c_1]: 2.647e-05 [parameter_eliminate]: 2.47001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.76e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.765e-05 [renormalize]: 5.3001e-07 [remove_dup_value]: 1.571e-05 [tuple_transform]: 6.908e-05, [1] [Cycle 1]: 6.445e-05, [4] [d_1]: 3.767e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.47001e-06 [partial_unused_args_eliminate]: 1.97999e-06 [add_recomputation]: 4.296e-05 [cse_after_recomputation]: 2.137e-05, [1] [Cycle 1]: 1.683e-05, [1] [cse]: 1.141e-05 [environ_conv]: 5.59998e-06 [swap_dp_allreduce_reducescatter]: 5.44e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.37003e-06 [label_fine_grained_interleaved_index]: 3.01999e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.54999e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.62999e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.57999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.78002e-06 [control_data_broadcast_order]: 1.281e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.87002e-06 [overlap_recompute_and_grad_model_parallel]: 4.70999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.27999e-06 [overlap_recompute_comm]: 2.58998e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.623e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.88002e-06 [handle_group_info]: 1.18001e-06 [symbol_engine_optimizer]: 7.269e-05, [1] [Cycle 1]: 6.84e-05, [6] [build]: 2.30002e-06 [elim_shapecalc]: 8.95001e-06 [elim_not_effective]: 1.302e-05 [opt_reshape]: 6.41e-06 [fold_const_symbol]: 9.24e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.609e-05 [get_jit_bprop_graph]: 1.29e-06 [rewriter_after_jit_bprop_graph]: 4.00998e-06 [opt_after_jit_grad]: 0.0004733 [validate]: 3.892e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00612602 [execute]: 7.31001e-06 Sums bootstrap : 0.000464s : 2.79% type_inference : 0.006035s : 36.34% event_method : 0.000012s : 0.07% auto_monad : 0.000063s : 0.38% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000052s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000037s : 0.22% optimize.opt_a.loop_unroll : 0.000024s : 0.14% optimize.opt_a.a_1 : 0.000486s : 2.93% optimize.opt_a.with_stream_mark : 0.000026s : 0.16% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000158s : 0.95% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.08% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000013s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000487s : 2.93% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000042s : 0.25% optimize.opt_a.a_3 : 0.000080s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000505s : 3.04% optimize.opt_b.b_1 : 0.000114s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.15% optimize.loop_unroll : 0.000434s : 2.62% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.26% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000002s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000473s : 2.85% validate : 0.000039s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.006126s : 36.89% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000151 24 19.84% : 0.000030s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 0.87% : 0.000001s : 2: substitution.fold_const_symbol 3.63% : 0.000005s : 3: substitution.graph_param_transform 66.72% : 0.000101s : 3: substitution.inline 2.39% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.00% : 0.000005s : 4: substitution.remove_not_recompute_node 2.04% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005988 2 92.20% : 0.005521s : 1: type_inference.infer 7.80% : 0.000467s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000099 3 100.00% : 0.000099s : 3: match.inline ------[predicate.] 
0.000151 815 0.85% : 0.000001s : 8: predicate.accumulaten_eliminater 0.89% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 1.03% : 0.000002s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 14: predicate.arithmetic_simplify 0.88% : 0.000001s : 8: predicate.cast_eliminate 0.73% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_depend_swap 1.85% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.28% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.94% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.80% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.75% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.39% : 0.000010s : 37: predicate.inline 1.01% : 0.000002s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.18% : 0.000002s : 6: predicate.less_batch_normalization 1.59% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 22: predicate.load_eliminater 0.95% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.76% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.25% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.44% : 0.000001s : 3: predicate.parallel_virtual_node 1.50% : 0.000002s : 11: predicate.partial_defer_inline 1.32% : 0.000002s : 11: predicate.partial_eliminate 1.03% : 0.000002s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 8: predicate.reduce_eliminate 2.24% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.58% : 0.000001s : 6: predicate.remove_not_recompute_node 1.19% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 8: predicate.reshape_eliminate 0.73% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.96% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 6: predicate.shard_identity_eliminate 0.89% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 0.93% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.85% : 0.000007s : 38: predicate.switch_simplify 0.89% : 
0.000001s : 8: predicate.tile_eliminate 0.87% : 0.000001s : 8: predicate.transpose_eliminate 1.59% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.61% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.93% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.81% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000289 7 37.33% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.67% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029505 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.58% : 0.003122s : 1: add_attr 10.55% : 0.003112s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000068s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.66% : 0.000489s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000443s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000514s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.95% : 0.000870s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.41% : 0.002186s : 1: opt_a 0.35% : 0.000102s : 1: opt_after_cconv 1.64% : 0.000484s : 1: opt_after_jit_grad 0.67% : 0.000197s : 1: opt_b 14.02% : 0.004137s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000019s : 1: remove_dup_value 0.91% : 0.000268s : 1: renormalize.infer 0.72% : 0.000211s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000040s : 1: rewriter_after_opt_a 0.19% : 0.000056s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 20.81% : 0.006139s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.51% : 0.006050s : 1: type_inference 0.24% : 0.000071s : 1: validate TotalTime = 0.0224249, [24] [bootstrap]: 0.0004744 [type_inference]: 0.00596396 [event_method]: 1.407e-05 [auto_monad]: 6.156e-05 [graph_reusing]: 5.35001e-06 [inline]: 1.76e-06 [add_attr]: 0.00335639, [1] [add_attr_with_inline]: 0.00334496, [1] [Cycle 1]: 6.879e-05, [2] [tag_attr]: 1.921e-05 [meta_addattr_fg_expand]: 4.62e-06 [parallel-infer-symbol]: 3.8e-06 [pre_auto_parallel]: 3.153e-05 [insert-virtual-dataset]: 2.62001e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.94e-06 [optimize]: 0.00508739, [53] [py_interpret_to_execute]: 2.761e-05 [rewriter_before_opt_a]: 7.321e-05 [opt_a]: 0.00271708, [2] [Cycle 1]: 0.00203969, [45] [expand_dump_flag]: 2.96001e-06 [switch_simplify]: 3.691e-05 [loop_unroll]: 2.068e-05 [a_1]: 0.00048687 [with_stream_mark]: 1.875e-05 [recompute_prepare]: 9.31e-06 [updatestate_depend_eliminate]: 4.42e-06 [updatestate_assign_eliminate]: 3.34001e-06 [updatestate_loads_eliminate]: 3.18e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 8.312e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 1.91e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 6.17999e-06 [merge_send_recv]: 8.21002e-06 [auto_parallel]: 8.04997e-06 [parallel]: 1.957e-05 [flash_sp]: 1.009e-05 [merge_comm]: 4.22998e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 1.085e-05 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 9.49999e-06 [virtual_dataset]: 5.77e-05 [get_grad_eliminate_]: 6.34001e-06 [virtual_output]: 6.49001e-06 [merge_forward]: 4.00998e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.127e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.488e-05 [merge_recompute_call_nodes]: 2.02999e-06 [before_grad]: 1.163e-05 [set_forward_comm_id_for_comm_node_pass]: 4.02e-06 [meta_fg_expand]: 3.35003e-06 [flash_sp_send_recv_attached]: 3.04999e-06 [receive_attached]: 2.26998e-06 
[after_resolve]: 1.024e-05 [a_after_grad]: 8.89998e-06 [renormalize]: 0.00074859 [add_forward_monad_depend]: 6.07999e-06 [auto_monad_grad]: 2.92002e-06 [auto_monad_eliminator]: 1.689e-05 [cse]: 3.434e-05 [a_3]: 4.976e-05 [Cycle 2]: 0.00066376, [45] [expand_dump_flag]: 1.63002e-06 [switch_simplify]: 7.48e-06 [loop_unroll]: 6.12001e-06 [a_1]: 0.00012257 [with_stream_mark]: 1.315e-05 [recompute_prepare]: 6.78e-06 [updatestate_depend_eliminate]: 4.12e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.15999e-06 [a_2]: 7.346e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 1.45999e-06 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 5.28002e-06 [auto_parallel]: 7.8e-06 [parallel]: 7.40998e-06 [flash_sp]: 3.68999e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 8.05e-06 [allreduce_slice_to_reducescatter]: 4.70027e-07 [virtual_shard_identity]: 6.53e-06 [virtual_dataset]: 5.85002e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 3.85998e-06 [cell_reuse_recompute_pass]: 1.93002e-06 [offload_activation]: 8.88002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.207e-05 [merge_recompute_call_nodes]: 1.07e-06 [before_grad]: 9.69999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.92998e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 1.28002e-06 [receive_attached]: 1.98002e-06 [after_resolve]: 1.001e-05 [a_after_grad]: 9.00999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.44e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 9.76e-06 [cse]: 1.648e-05 [a_3]: 3.341e-05 [py_interpret_to_execute_after_opt_a]: 1.386e-05 [slice_cell_reuse_recomputed_activation]: 2.37999e-06 [rewriter_after_opt_a]: 3.99e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.27001e-06 [mutable_eliminate]: 0.00071885 [opt_b]: 0.00020872, [1] [Cycle 1]: 0.00020042, [7] [b_1]: 
0.00011647 [b_2]: 8.74998e-06 [updatestate_depend_eliminate]: 6.83e-06 [updatestate_assign_eliminate]: 3.32997e-06 [updatestate_loads_eliminate]: 2.81999e-06 [renormalize]: 6.60017e-07 [cse]: 2.282e-05 [optimize_parallel_all_gather_comm]: 1.904e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.908e-05 [loop_unroll]: 0.00051368 [opt_after_cconv]: 0.00010679, [1] [Cycle 1]: 9.924e-05, [7] [c_1]: 2.756e-05 [parameter_eliminate]: 3.78001e-06 [updatestate_depend_eliminate]: 6.38e-06 [updatestate_assign_eliminate]: 2.80002e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.934e-05 [renormalize]: 7.39994e-07 [remove_dup_value]: 1.602e-05 [tuple_transform]: 7.412e-05, [1] [Cycle 1]: 6.909e-05, [4] [d_1]: 4.08e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 6.74999e-06 [partial_unused_args_eliminate]: 1.91998e-06 [add_recomputation]: 5.14e-05 [cse_after_recomputation]: 2.299e-05, [1] [Cycle 1]: 1.739e-05, [1] [cse]: 1.176e-05 [environ_conv]: 6.49001e-06 [swap_dp_allreduce_reducescatter]: 5.81003e-06 [bias_add_comm_swap]: 2.55002e-06 [label_micro_interleaved_index]: 4.65001e-06 [label_fine_grained_interleaved_index]: 2.63998e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.68e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.69001e-06 [reorder_send_recv_between_fp_bp]: 3.00998e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 1.16002e-06 [overlap_opt_shard_in_pipeline]: 1.66e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.442e-05 [grouped_pairwise_exchange_alltoall]: 1.51002e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 5.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.24999e-06 [overlap_grad_ring_attention]: 4.47e-06 [overlap_grad_flash_sp]: 2.102e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.25002e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 9.087e-05, [1] [Cycle 1]: 8.604e-05, [6] [build]: 3.48e-06 [elim_shapecalc]: 1.07e-05 [elim_not_effective]: 2.367e-05 [opt_reshape]: 6.81999e-06 [fold_const_symbol]: 9.99999e-06 [renormalize]: 2.29978e-07 [detach_backward]: 2.29999e-06 [pipeline_parallel_scheduler]: 2.01e-06 [auto_monad_reorder]: 1.715e-05 [get_jit_bprop_graph]: 1.86998e-06 [rewriter_after_jit_bprop_graph]: 5.44e-06 [opt_after_jit_grad]: 0.00053778 [validate]: 4.441e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00656342 [execute]: 8.99e-06 Sums bootstrap : 0.000474s : 2.64% type_inference : 0.005964s : 33.21% event_method : 0.000014s : 0.08% auto_monad : 0.000062s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000032s : 0.18% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000028s : 0.15% optimize.rewriter_before_opt_a : 0.000073s : 0.41% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000044s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000609s : 3.39% optimize.opt_a.with_stream_mark : 0.000032s : 0.18% optimize.opt_a.recompute_prepare : 0.000016s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000157s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000016s : 0.09% optimize.opt_a.parallel : 0.000027s : 0.15% optimize.opt_a.flash_sp : 0.000014s : 0.08% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.09% optimize.opt_a.virtual_dataset : 0.000064s : 0.35% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.07% optimize.opt_a.virtual_output : 0.000012s : 0.06% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000020s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000027s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000021s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000018s : 0.10% optimize.opt_a.renormalize : 0.000749s : 4.17% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000027s : 0.15% optimize.opt_a.cse : 0.000051s : 0.28% optimize.opt_a.a_3 : 0.000083s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.08% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000040s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000719s : 4.00% optimize.opt_b.b_1 : 0.000116s : 0.65% optimize.opt_b.b_2 : 0.000009s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.13% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000029s : 0.16% optimize.loop_unroll : 0.000514s : 2.86% optimize.opt_after_cconv.c_1 : 0.000028s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000021s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000024s : 0.13% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000538s : 2.99% validate : 0.000044s : 0.25% backend_pass : 0.000001s : 0.01% task_emit : 0.006563s : 36.55% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000206 26 19.62% : 0.000040s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000002s : 2: substitution.fold_const_symbol 2.99% : 0.000006s : 3: substitution.graph_param_transform 64.33% : 0.000132s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.45% : 0.000005s : 4: substitution.remove_not_recompute_node 2.03% : 0.000004s : 2: substitution.replace_old_param 5.01% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005915 2 89.83% : 0.005313s : 1: type_inference.infer 10.17% : 0.000601s : 1: type_inference.specialize ------[replace.] 0.000043 4 79.26% : 0.000034s : 3: replace.inline 20.74% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000139 4 93.26% : 0.000130s : 3: match.inline 6.74% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000170 883 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.19% : 0.000004s : 15: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.58% : 0.000001s : 6: predicate.check_bprop_eliminate 0.53% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.91% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.34% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 12: predicate.environ_get_depend_swap 1.64% : 0.000003s : 18: predicate.environ_get_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.32% : 0.000004s : 13: predicate.float_depend_g_call 0.52% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.21% : 0.000000s : 3: predicate.graph_param_transform 0.61% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.28% : 0.000011s : 40: predicate.inline 0.90% : 0.000002s : 6: predicate.inline_without_move 0.36% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.94% : 0.000002s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.32% : 0.000004s : 25: predicate.load_eliminater 1.46% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.56% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.46% : 0.000001s : 3: predicate.parallel_virtual_node 1.50% : 0.000003s : 13: predicate.partial_defer_inline 1.40% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.24% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 6: predicate.remove_not_recompute_node 1.27% : 0.000002s : 16: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.42% : 0.000001s : 3: predicate.reset_defer_inline 0.99% : 0.000002s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 3: predicate.row_tensor_eliminate 1.19% : 0.000002s : 6: predicate.same_eliminate 0.55% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000002s : 6: predicate.shard_identity_eliminate 0.80% : 0.000001s : 6: predicate.special_op_eliminate 0.76% : 0.000001s : 6: predicate.specialize_transform 1.23% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.30% : 0.000002s : 13: predicate.switch_defer_inline 1.91% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.83% : 0.000008s : 43: predicate.switch_simplify 0.85% : 
0.000001s : 9: predicate.tile_eliminate 0.93% : 0.000002s : 9: predicate.transpose_eliminate 1.53% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.61% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.56% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.17% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.93% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.80% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000400 8 43.41% : 0.000174s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.59% : 0.000226s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032795 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.25% : 0.003362s : 1: add_attr 10.21% : 0.003349s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000067s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.56% : 0.000511s : 1: bootstrap 0.10% : 0.000033s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000018s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000010s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.60% : 0.000524s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.23% : 0.000732s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000017s : 1: opt.transform.mutable_eliminate 3.08% : 0.001010s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000095s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.14% : 0.000047s : 4: opt.transform.symbol_engine_opt 8.30% : 0.002721s : 1: opt_a 0.34% : 0.000110s : 1: opt_after_cconv 1.67% : 0.000548s : 1: opt_after_jit_grad 0.65% : 0.000212s : 1: opt_b 15.53% : 0.005093s : 1: optimize 0.07% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000006s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000036s : 1: pre_auto_parallel 0.10% : 0.000032s : 1: py_interpret_to_execute 0.05% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 1.23% : 0.000404s : 1: renormalize.infer 1.02% : 0.000334s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000045s : 1: rewriter_after_opt_a 0.24% : 0.000078s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.29% : 0.000094s : 1: symbol_engine_optimizer 20.06% : 0.006580s : 1: task_emit 0.24% : 0.000077s : 1: 
tuple_transform 18.24% : 0.005980s : 1: type_inference 0.26% : 0.000084s : 1: validate TotalTime = 0.0394331, [24] [bootstrap]: 0.00042461 [type_inference]: 0.0113027 [event_method]: 4.626e-05 [auto_monad]: 0.00013714 [graph_reusing]: 9.59e-06 [inline]: 2.39001e-06 [add_attr]: 0.00308733, [1] [add_attr_with_inline]: 0.00307842, [1] [Cycle 1]: 7.669e-05, [2] [tag_attr]: 3.461e-05 [meta_addattr_fg_expand]: 1.046e-05 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 4.949e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.43e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.0168205, [53] [py_interpret_to_execute]: 3.978e-05 [rewriter_before_opt_a]: 0.000157 [opt_a]: 0.0145902, [3] [Cycle 1]: 0.0110948, [45] [expand_dump_flag]: 3.87002e-06 [switch_simplify]: 7.774e-05 [loop_unroll]: 6.387e-05 [a_1]: 0.00144483 [with_stream_mark]: 2.435e-05 [recompute_prepare]: 2.286e-05 [updatestate_depend_eliminate]: 8.57998e-06 [updatestate_assign_eliminate]: 7.73001e-06 [updatestate_loads_eliminate]: 6.86999e-06 [parameter_eliminate]: 2.69999e-06 [a_2]: 0.00024471 [accelerated_algorithm]: 3.138e-05 [shard]: 2.25002e-06 [meta_shard_fg_expand]: 3.75e-06 [shard_inline]: 1.647e-05 [merge_send_recv]: 1.792e-05 [auto_parallel]: 1.065e-05 [parallel]: 1.867e-05 [flash_sp]: 1.098e-05 [merge_comm]: 9.59e-06 [allreduce_fusion]: 8.62e-06 [matmul_add_comm_reduction]: 2.804e-05 [allreduce_slice_to_reducescatter]: 8.00006e-07 [virtual_shard_identity]: 1.792e-05 [virtual_dataset]: 1.585e-05 [get_grad_eliminate_]: 1.543e-05 [virtual_output]: 1.507e-05 [merge_forward]: 8.67e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.806e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.992e-05 [merge_recompute_call_nodes]: 1.63002e-06 [before_grad]: 2.987e-05 [set_forward_comm_id_for_comm_node_pass]: 9.62999e-06 [meta_fg_expand]: 0.00155918 [flash_sp_send_recv_attached]: 4.32998e-06 [receive_attached]: 2.04e-06 [after_resolve]: 
6.471e-05 [a_after_grad]: 8.85e-05 [renormalize]: 0.00626074 [add_forward_monad_depend]: 9.74999e-06 [auto_monad_grad]: 6.66e-06 [auto_monad_eliminator]: 5.2e-05 [cse]: 0.00018237 [a_3]: 0.00033944 [Cycle 2]: 0.00276886, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 4.621e-05 [loop_unroll]: 4.314e-05 [a_1]: 0.00135895 [with_stream_mark]: 1.224e-05 [recompute_prepare]: 9.17999e-06 [updatestate_depend_eliminate]: 4.08001e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 1.46002e-06 [a_2]: 8.995e-05 [accelerated_algorithm]: 1.088e-05 [shard]: 1.81e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 6.89999e-06 [merge_send_recv]: 6.84999e-06 [auto_parallel]: 7.07002e-06 [parallel]: 7.10998e-06 [flash_sp]: 3.85e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 3.75e-06 [matmul_add_comm_reduction]: 8.35999e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 8.2e-06 [virtual_dataset]: 6.67002e-06 [get_grad_eliminate_]: 7.43e-06 [virtual_output]: 6.65998e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 8.21002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.335e-05 [merge_recompute_call_nodes]: 1.25999e-06 [before_grad]: 1.131e-05 [set_forward_comm_id_for_comm_node_pass]: 4.4e-06 [meta_fg_expand]: 8.43e-05 [flash_sp_send_recv_attached]: 1.54e-06 [receive_attached]: 1.92999e-06 [after_resolve]: 1.251e-05 [a_after_grad]: 1.053e-05 [renormalize]: 0.00060785 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 1.84998e-06 [auto_monad_eliminator]: 1.151e-05 [cse]: 2.213e-05 [a_3]: 4.971e-05 [Cycle 3]: 0.00070962, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 8.17998e-06 [loop_unroll]: 6.68e-06 [a_1]: 0.00015845 [with_stream_mark]: 9.05999e-06 [recompute_prepare]: 7.1e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 2.96999e-06 [updatestate_loads_eliminate]: 2.49999e-06 
[parameter_eliminate]: 1.13001e-06 [a_2]: 9.212e-05 [accelerated_algorithm]: 9.77999e-06 [shard]: 9.49978e-07 [meta_shard_fg_expand]: 1.35999e-06 [shard_inline]: 7.13e-06 [merge_send_recv]: 5.54e-06 [auto_parallel]: 6.20002e-06 [parallel]: 5.14e-06 [flash_sp]: 9.70002e-07 [merge_comm]: 4.03999e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 5.90002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 7.82e-06 [virtual_dataset]: 6.46e-06 [get_grad_eliminate_]: 6.36998e-06 [virtual_output]: 6.21e-06 [merge_forward]: 3.26999e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.83e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.256e-05 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 1.105e-05 [set_forward_comm_id_for_comm_node_pass]: 3.95998e-06 [meta_fg_expand]: 2.25002e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 9.39996e-07 [after_resolve]: 9.27999e-06 [a_after_grad]: 9.46e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 1.25001e-06 [auto_monad_eliminator]: 7.77998e-06 [cse]: 1.6e-05 [a_3]: 4.071e-05 [py_interpret_to_execute_after_opt_a]: 1.067e-05 [slice_cell_reuse_recomputed_activation]: 2.31998e-06 [rewriter_after_opt_a]: 4.116e-05 [convert_after_rewriter]: 7.61001e-06 [order_py_execute_after_rewriter]: 5.54e-06 [mutable_eliminate]: 0.00053681 [opt_b]: 0.000225, [1] [Cycle 1]: 0.00021784, [7] [b_1]: 0.00014006 [b_2]: 8.79e-06 [updatestate_depend_eliminate]: 5.77001e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.86e-06 [renormalize]: 5.3001e-07 [cse]: 2.068e-05 [optimize_parallel_all_gather_comm]: 1.742e-05 [overlap_param_gather]: 2.00002e-06 [cconv]: 2.189e-05 [loop_unroll]: 0.00046578 [opt_after_cconv]: 0.00010758, [1] [Cycle 1]: 0.00010128, [7] [c_1]: 3.186e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 5.96e-06 [updatestate_assign_eliminate]: 2.89999e-06 
[updatestate_loads_eliminate]: 2.76e-06 [cse]: 2.057e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.528e-05 [tuple_transform]: 7.833e-05, [1] [Cycle 1]: 7.38e-05, [4] [d_1]: 4.567e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.51001e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.989e-05 [cse_after_recomputation]: 2.429e-05, [1] [Cycle 1]: 1.956e-05, [1] [cse]: 1.402e-05 [environ_conv]: 7.92e-06 [swap_dp_allreduce_reducescatter]: 5.77001e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 4.77e-06 [label_fine_grained_interleaved_index]: 2.65002e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.46998e-06 [ForceFp32Comm]: 9.10019e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.15001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.14003e-06 [overlap_opt_shard_in_pipeline]: 1.56998e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01e-06 [control_data_broadcast_order]: 1.412e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 4.23999e-06 [overlap_recompute_and_grad_model_parallel]: 4.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.74e-06 [overlap_recompute_comm]: 2.41e-06 [overlap_grad_ring_attention]: 4.67e-06 [overlap_grad_flash_sp]: 2.102e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 1.29998e-06 [symbol_engine_optimizer]: 8.561e-05, [1] [Cycle 1]: 8.117e-05, [6] [build]: 8.99e-06 [elim_shapecalc]: 1.048e-05 [elim_not_effective]: 1.419e-05 [opt_reshape]: 7.49002e-06 [fold_const_symbol]: 1.168e-05 [renormalize]: 2.30008e-07 
[detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.09e-05 [get_jit_bprop_graph]: 1.60001e-06 [rewriter_after_jit_bprop_graph]: 3.61999e-06 [opt_after_jit_grad]: 0.00047627 [validate]: 3.944e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.00676949 [execute]: 7.35e-06 Sums bootstrap : 0.000425s : 1.21% type_inference : 0.011303s : 32.27% event_method : 0.000046s : 0.13% auto_monad : 0.000137s : 0.39% graph_reusing : 0.000010s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.11% optimize.rewriter_before_opt_a : 0.000157s : 0.45% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.38% optimize.opt_a.loop_unroll : 0.000114s : 0.32% optimize.opt_a.a_1 : 0.002962s : 8.46% optimize.opt_a.with_stream_mark : 0.000046s : 0.13% optimize.opt_a.recompute_prepare : 0.000039s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.03% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000427s : 1.22% optimize.opt_a.accelerated_algorithm : 0.000052s : 0.15% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.09% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000024s : 0.07% optimize.opt_a.parallel : 0.000031s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000018s : 
0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.10% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.08% optimize.opt_a.virtual_output : 0.000028s : 0.08% optimize.opt_a.merge_forward : 0.000016s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000033s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.05% optimize.opt_a.meta_fg_expand : 0.001646s : 4.70% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000086s : 0.25% optimize.opt_a.a_after_grad : 0.000108s : 0.31% optimize.opt_a.renormalize : 0.006869s : 19.61% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.04% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000071s : 0.20% optimize.opt_a.cse : 0.000220s : 0.63% optimize.opt_a.a_3 : 0.000430s : 1.23% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000041s : 0.12% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000537s : 1.53% optimize.opt_b.b_1 : 0.000140s : 0.40% optimize.opt_b.b_2 : 0.000009s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.06% optimize.loop_unroll : 0.000466s : 1.33% optimize.opt_after_cconv.c_1 : 0.000032s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000021s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.04% optimize.tuple_transform.d_1 : 0.000046s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.14% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.06% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000476s : 1.36% validate : 0.000039s : 0.11% backend_pass : 0.000001s : 0.00% task_emit : 0.006769s : 19.33% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000717 161 7.50% : 0.000054s : 8: substitution.arithmetic_simplify 0.33% : 0.000002s : 3: substitution.elim_not_effective 0.67% : 0.000005s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 3: substitution.fold_const_symbol 0.83% : 0.000006s : 4: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 56.70% : 0.000407s : 17: substitution.inline 2.38% : 0.000017s : 2: substitution.inline_without_move 1.40% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.22% : 0.000016s : 3: substitution.less_batch_normalization 1.50% : 0.000011s : 7: substitution.minmaximum_grad 0.79% : 0.000006s : 5: substitution.partial_eliminate 1.71% : 0.000012s : 15: substitution.remove_not_recompute_node 3.86% : 0.000028s : 10: substitution.replace_applicator 1.31% : 0.000009s : 10: substitution.replace_old_param 0.48% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.10% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.98% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.92% : 0.000057s : 19: substitution.tuple_list_get_item_eliminator 2.23% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011221 2 85.57% : 0.009602s : 1: type_inference.infer 14.43% : 0.001619s : 1: type_inference.specialize ------[replace.] 0.000207 27 64.80% : 0.000134s : 17: replace.inline 35.20% : 0.000073s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000426 27 93.34% : 0.000397s : 17: match.inline 6.66% : 0.000028s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000700 4248 1.16% : 0.000008s : 53: predicate.accumulaten_eliminater 0.23% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.14% : 0.000008s : 53: predicate.addn_zero_filter 1.10% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.99% : 0.000014s : 74: predicate.arithmetic_simplify 1.16% : 0.000008s : 53: predicate.cast_eliminate 1.11% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000003s : 21: predicate.depend_value_elim 1.19% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.26% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.18% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.19% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.19% : 0.000008s : 57: predicate.environ_get_depend_swap 1.69% : 0.000012s : 78: predicate.environ_get_eliminate 1.20% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.82% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.50% : 0.000018s : 80: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.50% : 0.000004s : 21: predicate.get_grad_eliminate 0.08% : 0.000001s : 4: predicate.graph_param_transform 0.51% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.81% : 0.000041s : 183: predicate.inline 1.46% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.59% : 0.000004s : 21: predicate.less_batch_normalization 1.55% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 124: predicate.load_eliminater 0.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.61% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.11% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.17% : 0.000008s : 53: predicate.minmaximum_grad 0.29% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.10% : 0.000001s : 4: predicate.parallel_virtual_node 2.13% : 0.000015s : 80: predicate.partial_defer_inline 1.71% : 0.000012s : 67: predicate.partial_eliminate 1.14% : 0.000008s : 53: predicate.print_const_string_wrapper 0.47% : 0.000003s : 21: predicate.reduce_all_const_elim 1.42% : 0.000010s : 53: predicate.reduce_eliminate 2.65% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.29% : 0.000002s : 21: predicate.remove_not_recompute_node 1.89% : 0.000013s : 113: predicate.replace_applicator 0.68% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.14% : 0.000008s : 53: predicate.reshape_eliminate 1.09% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.21% : 0.000008s : 50: predicate.same_eliminate 0.33% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.61% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.59% : 0.000004s : 21: predicate.specialize_transform 1.28% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.96% : 0.000014s : 80: predicate.switch_defer_inline 3.00% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.22% : 0.000037s : 218: 
predicate.switch_simplify 1.12% : 0.000008s : 53: predicate.tile_eliminate 1.14% : 0.000008s : 53: predicate.transpose_eliminate 1.49% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.73% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.49% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.57% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.63% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.18% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.49% : 0.000003s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001744 36 58.21% : 0.001015s : 15: func_graph_cloner_run.FuncGraphClonerGraph 41.79% : 0.000729s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.070881 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.36% : 0.003092s : 1: add_attr 4.35% : 0.003082s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000145s : 1: auto_monad 0.03% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.64% : 0.000452s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000054s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000014s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.67% : 0.000474s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000546s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000022s : 1: opt.transform.mutable_eliminate 6.31% : 0.004473s : 117: opt.transform.opt_a 0.04% : 0.000030s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.17% : 0.000117s : 28: opt.transform.opt_b 0.07% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000040s : 4: opt.transform.symbol_engine_opt 20.59% : 0.014594s : 1: opt_a 0.16% : 0.000111s : 1: opt_after_cconv 0.69% : 0.000486s : 1: opt_after_jit_grad 0.32% : 0.000229s : 1: opt_b 23.74% : 0.016825s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000054s : 1: pre_auto_parallel 0.06% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 7.55% : 0.005352s : 2: renormalize.infer 2.12% : 0.001503s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000045s : 1: rewriter_after_opt_a 0.23% : 0.000162s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000088s : 1: symbol_engine_optimizer 9.57% : 0.006787s : 1: task_emit 0.11% : 0.000081s : 1: 
tuple_transform 15.97% : 0.011322s : 1: type_inference 0.10% : 0.000068s : 1: validate TotalTime = 0.021296, [24] [bootstrap]: 0.00051999 [type_inference]: 0.00603133 [event_method]: 1.219e-05 [auto_monad]: 5.936e-05 [graph_reusing]: 5.94e-06 [inline]: 2.17001e-06 [add_attr]: 0.00313669, [1] [add_attr_with_inline]: 0.00312794, [1] [Cycle 1]: 5.721e-05, [2] [tag_attr]: 1.493e-05 [meta_addattr_fg_expand]: 4e-06 [parallel-infer-symbol]: 3.47997e-06 [pre_auto_parallel]: 2.86e-05 [insert-virtual-dataset]: 3.53e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.71002e-06 [optimize]: 0.00422636, [53] [py_interpret_to_execute]: 2.159e-05 [rewriter_before_opt_a]: 5.276e-05 [opt_a]: 0.00223924, [2] [Cycle 1]: 0.00161639, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.078e-05 [loop_unroll]: 1.704e-05 [a_1]: 0.00036083 [with_stream_mark]: 1.641e-05 [recompute_prepare]: 8.33999e-06 [updatestate_depend_eliminate]: 4.35999e-06 [updatestate_assign_eliminate]: 3.49001e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 8.203e-05 [accelerated_algorithm]: 6.57002e-06 [shard]: 2.14e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 6.33e-06 [merge_send_recv]: 8.82e-06 [auto_parallel]: 6.39999e-06 [parallel]: 1.986e-05 [flash_sp]: 8.08999e-06 [merge_comm]: 4.04002e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 9.51998e-06 [allreduce_slice_to_reducescatter]: 8.60018e-07 [virtual_shard_identity]: 7.85e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 5.71e-06 [merge_forward]: 4.10998e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.196e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 1.06e-05 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 2.69001e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.08998e-06 
[after_resolve]: 9.25999e-06 [a_after_grad]: 8.45001e-06 [renormalize]: 0.00052002 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 2.66999e-06 [auto_monad_eliminator]: 1.308e-05 [cse]: 3.066e-05 [a_3]: 0.00010348 [Cycle 2]: 0.00061283, [45] [expand_dump_flag]: 1.18001e-06 [switch_simplify]: 7.19001e-06 [loop_unroll]: 5.87001e-06 [a_1]: 0.00011467 [with_stream_mark]: 1.268e-05 [recompute_prepare]: 5.82001e-06 [updatestate_depend_eliminate]: 3.33e-06 [updatestate_assign_eliminate]: 2.78e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 7.266e-05 [accelerated_algorithm]: 5.69999e-06 [shard]: 1.27e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 4.72998e-06 [auto_parallel]: 5.54e-06 [parallel]: 4.79e-06 [flash_sp]: 3.58e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 3.03998e-06 [matmul_add_comm_reduction]: 5.67001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.49999e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.71998e-06 [merge_forward]: 3.35e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 6.10002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.106e-05 [merge_recompute_call_nodes]: 8.49977e-07 [before_grad]: 8.74998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.22002e-06 [meta_fg_expand]: 1.84998e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.09998e-06 [after_resolve]: 8.52e-06 [a_after_grad]: 7.78001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 7.23999e-06 [cse]: 1.52e-05 [a_3]: 3.29e-05 [py_interpret_to_execute_after_opt_a]: 8.46002e-06 [slice_cell_reuse_recomputed_activation]: 2.25002e-06 [rewriter_after_opt_a]: 3.45e-05 [convert_after_rewriter]: 7.01001e-06 [order_py_execute_after_rewriter]: 5.30999e-06 [mutable_eliminate]: 0.00054011 [opt_b]: 0.00019307, [1] [Cycle 1]: 0.00018612, 
[7] [b_1]: 0.00011342 [b_2]: 7.63999e-06 [updatestate_depend_eliminate]: 6.45002e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.49999e-06 [renormalize]: 4.50003e-07 [cse]: 1.765e-05 [optimize_parallel_all_gather_comm]: 1.702e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.704e-05 [loop_unroll]: 0.00042771 [opt_after_cconv]: 9.715e-05, [1] [Cycle 1]: 9.081e-05, [7] [c_1]: 2.593e-05 [parameter_eliminate]: 2.59999e-06 [updatestate_depend_eliminate]: 5.72001e-06 [updatestate_assign_eliminate]: 2.65002e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.688e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.506e-05 [tuple_transform]: 6.86e-05, [1] [Cycle 1]: 6.343e-05, [4] [d_1]: 3.699e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.79979e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 2.09e-06 [add_recomputation]: 4.813e-05 [cse_after_recomputation]: 2.168e-05, [1] [Cycle 1]: 1.687e-05, [1] [cse]: 1.148e-05 [environ_conv]: 5.59e-06 [swap_dp_allreduce_reducescatter]: 5.44e-06 [bias_add_comm_swap]: 2.99001e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.89999e-06 [assign_add_opt]: 1.67999e-06 [ForceFp32Comm]: 1.18001e-06 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.56998e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.60999e-06 [add_comm_op_reuse_tag]: 1.19e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.16997e-06 [overlap_opt_shard_in_pipeline]: 1.54e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.27e-05 [grouped_pairwise_exchange_alltoall]: 1.40001e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 4.68999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.66002e-06 [overlap_recompute_comm]: 2.88e-06 [overlap_grad_ring_attention]: 4.64002e-06 [overlap_grad_flash_sp]: 1.903e-05 [begin_end_overlap_inline]: 7.40023e-07 [split_matmul_comm_elemetwise]: 2.26998e-06 [split_layernorm_comm]: 2.28998e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 7.203e-05, [1] [Cycle 1]: 6.758e-05, [6] [build]: 2.63e-06 [elim_shapecalc]: 9.15001e-06 [elim_not_effective]: 1.176e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 9.59e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.06e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.632e-05 [get_jit_bprop_graph]: 1.74e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00046349 [validate]: 3.929e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00649179 [execute]: 9.15001e-06 Sums bootstrap : 0.000520s : 3.04% type_inference : 0.006031s : 35.22% event_method : 0.000012s : 0.07% auto_monad : 0.000059s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000004s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000053s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.13% optimize.opt_a.a_1 : 0.000475s : 2.78% optimize.opt_a.with_stream_mark : 0.000029s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000025s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.10% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000520s : 3.04% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000046s : 0.27% optimize.opt_a.a_3 : 0.000136s : 0.80% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000540s : 3.15% optimize.opt_b.b_1 : 0.000113s : 0.66% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.16% optimize.loop_unroll : 0.000428s : 2.50% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000002s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000463s : 2.71% validate : 0.000039s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.006492s : 37.91% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000152 24 20.21% : 0.000031s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.08% : 0.000002s : 2: substitution.fold_const_symbol 3.59% : 0.000005s : 3: substitution.graph_param_transform 65.71% : 0.000100s : 3: substitution.inline 2.37% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000005s : 4: substitution.remove_not_recompute_node 2.10% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005982 2 92.43% : 0.005529s : 1: type_inference.infer 7.57% : 0.000453s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000098 3 100.00% : 0.000098s : 3: match.inline ------[predicate.] 
0.000147 815 1.14% : 0.000002s : 8: predicate.accumulaten_eliminater 0.92% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.25% : 0.000003s : 14: predicate.arithmetic_simplify 0.88% : 0.000001s : 8: predicate.cast_eliminate 0.71% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.70% : 0.000001s : 6: predicate.depend_value_elim 0.92% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.03% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.29% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.89% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.77% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.30% : 0.000009s : 37: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 6: predicate.less_batch_normalization 1.59% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.29% : 0.000003s : 22: predicate.load_eliminater 1.21% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.07% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 8: predicate.minmaximum_grad 1.79% : 0.000003s : 3: predicate.mutable_eliminate 0.41% : 0.000001s : 3: predicate.opt_reshape 0.44% : 0.000001s : 3: predicate.parallel_virtual_node 1.43% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.11% : 0.000002s : 8: predicate.reduce_eliminate 2.16% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 14: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.48% : 0.000001s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 8: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 3: predicate.row_tensor_eliminate 1.13% : 0.000002s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 6: predicate.shard_identity_eliminate 0.85% : 0.000001s : 6: predicate.special_op_eliminate 0.89% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 1.84% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.76% : 0.000007s : 38: predicate.switch_simplify 0.84% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.44% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 3: predicate.value_based_eliminate 0.81% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000291 7 37.21% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.79% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030238 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.39% : 0.003141s : 1: add_attr 10.36% : 0.003132s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000065s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.87% : 0.000564s : 1: bootstrap 0.10% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.44% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.82% : 0.000549s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 3.00% : 0.000908s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.41% : 0.002242s : 1: opt_a 0.33% : 0.000101s : 1: opt_after_cconv 1.57% : 0.000474s : 1: opt_after_jit_grad 0.65% : 0.000196s : 1: opt_b 13.99% : 0.004230s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.95% : 0.000288s : 1: renormalize.infer 0.74% : 0.000225s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000039s : 1: rewriter_after_opt_a 0.19% : 0.000057s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000075s : 1: symbol_engine_optimizer 21.52% : 0.006508s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.01% : 0.006050s : 1: type_inference 0.24% : 0.000073s : 1: validate TotalTime = 0.0406474, [24] [bootstrap]: 0.00048981 [type_inference]: 0.0122253 [event_method]: 4.272e-05 [auto_monad]: 0.00012975 [graph_reusing]: 8.17998e-06 [inline]: 2.51e-06 [add_attr]: 0.00313362, [1] [add_attr_with_inline]: 0.00312502, [1] [Cycle 1]: 0.0001218, [2] [tag_attr]: 7.859e-05 [meta_addattr_fg_expand]: 9.67001e-06 [parallel-infer-symbol]: 3.58e-06 [pre_auto_parallel]: 4.973e-05 [insert-virtual-dataset]: 2.77002e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.016997, [53] [py_interpret_to_execute]: 3.874e-05 [rewriter_before_opt_a]: 0.00014579 [opt_a]: 0.0146682, [3] [Cycle 1]: 0.0110597, [45] [expand_dump_flag]: 4.93001e-06 [switch_simplify]: 7.235e-05 [loop_unroll]: 5.943e-05 [a_1]: 0.00137605 [with_stream_mark]: 2.646e-05 [recompute_prepare]: 2.234e-05 [updatestate_depend_eliminate]: 8.32e-06 [updatestate_assign_eliminate]: 7.93999e-06 [updatestate_loads_eliminate]: 6.98e-06 [parameter_eliminate]: 2.80997e-06 [a_2]: 0.00024878 [accelerated_algorithm]: 3.237e-05 [shard]: 1.81998e-06 [meta_shard_fg_expand]: 3.91999e-06 [shard_inline]: 1.621e-05 [merge_send_recv]: 1.622e-05 [auto_parallel]: 1.118e-05 [parallel]: 2.005e-05 [flash_sp]: 1.301e-05 [merge_comm]: 9.32001e-06 [allreduce_fusion]: 8.74e-06 [matmul_add_comm_reduction]: 2.865e-05 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 1.794e-05 [virtual_dataset]: 1.551e-05 [get_grad_eliminate_]: 1.518e-05 [virtual_output]: 1.512e-05 [merge_forward]: 9.64e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.791e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.013e-05 [merge_recompute_call_nodes]: 1.89e-06 [before_grad]: 2.856e-05 [set_forward_comm_id_for_comm_node_pass]: 9.19e-06 [meta_fg_expand]: 0.00154787 [flash_sp_send_recv_attached]: 4.13001e-06 [receive_attached]: 2.88e-06 [after_resolve]: 
6.576e-05 [a_after_grad]: 8.853e-05 [renormalize]: 0.00629213 [add_forward_monad_depend]: 1.131e-05 [auto_monad_grad]: 6.61e-06 [auto_monad_eliminator]: 5.222e-05 [cse]: 0.00018487 [a_3]: 0.0003397 [Cycle 2]: 0.00290021, [45] [expand_dump_flag]: 2.54999e-06 [switch_simplify]: 4.619e-05 [loop_unroll]: 4.235e-05 [a_1]: 0.00138053 [with_stream_mark]: 1.617e-05 [recompute_prepare]: 1.052e-05 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 3.94002e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 2.02999e-06 [a_2]: 9.13e-05 [accelerated_algorithm]: 1.23e-05 [shard]: 1.75001e-06 [meta_shard_fg_expand]: 2.72001e-06 [shard_inline]: 7.23999e-06 [merge_send_recv]: 8.94e-06 [auto_parallel]: 9.67999e-06 [parallel]: 9.57999e-06 [flash_sp]: 3.83999e-06 [merge_comm]: 4.18001e-06 [allreduce_fusion]: 3.74002e-06 [matmul_add_comm_reduction]: 9.24998e-06 [allreduce_slice_to_reducescatter]: 1.40001e-06 [virtual_shard_identity]: 8.02e-06 [virtual_dataset]: 6.63e-06 [get_grad_eliminate_]: 6.31e-06 [virtual_output]: 6.11e-06 [merge_forward]: 4.67e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 1.102e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.414e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 1.111e-05 [set_forward_comm_id_for_comm_node_pass]: 4.38001e-06 [meta_fg_expand]: 6.284e-05 [flash_sp_send_recv_attached]: 2.16998e-06 [receive_attached]: 2.11e-06 [after_resolve]: 1.301e-05 [a_after_grad]: 1.005e-05 [renormalize]: 0.00070192 [add_forward_monad_depend]: 5.09e-06 [auto_monad_grad]: 2.03002e-06 [auto_monad_eliminator]: 1.334e-05 [cse]: 2.833e-05 [a_3]: 4.952e-05 [Cycle 3]: 0.00069071, [45] [expand_dump_flag]: 1.45999e-06 [switch_simplify]: 7.98001e-06 [loop_unroll]: 6.74999e-06 [a_1]: 0.00014825 [with_stream_mark]: 9.86e-06 [recompute_prepare]: 6.77002e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 2.89999e-06 [updatestate_loads_eliminate]: 2.41998e-06 [parameter_eliminate]: 
1.12999e-06 [a_2]: 8.681e-05 [accelerated_algorithm]: 1.014e-05 [shard]: 8.70001e-07 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 7.18e-06 [merge_send_recv]: 5.57999e-06 [auto_parallel]: 6.24001e-06 [parallel]: 5.89e-06 [flash_sp]: 9.49978e-07 [merge_comm]: 3.77998e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 6.31998e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 7.77e-06 [virtual_dataset]: 6.46e-06 [get_grad_eliminate_]: 6.27001e-06 [virtual_output]: 6.19001e-06 [merge_forward]: 3.08e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 7.48999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.297e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 1.086e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.46e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 9.12001e-06 [a_after_grad]: 9.36e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 7.37002e-06 [cse]: 1.76e-05 [a_3]: 3.906e-05 [py_interpret_to_execute_after_opt_a]: 1.284e-05 [slice_cell_reuse_recomputed_activation]: 1.98997e-06 [rewriter_after_opt_a]: 4.368e-05 [convert_after_rewriter]: 7.65e-06 [order_py_execute_after_rewriter]: 5.71e-06 [mutable_eliminate]: 0.00061576 [opt_b]: 0.00025673, [1] [Cycle 1]: 0.00024843, [7] [b_1]: 0.00013471 [b_2]: 8.37e-06 [updatestate_depend_eliminate]: 6.36998e-06 [updatestate_assign_eliminate]: 2.95002e-06 [updatestate_loads_eliminate]: 2.58e-06 [renormalize]: 4.50003e-07 [cse]: 5.182e-05 [optimize_parallel_all_gather_comm]: 1.934e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 2.569e-05 [loop_unroll]: 0.00044149 [opt_after_cconv]: 0.0001102, [1] [Cycle 1]: 0.0001037, [7] [c_1]: 3.306e-05 [parameter_eliminate]: 2.86999e-06 [updatestate_depend_eliminate]: 6.06998e-06 [updatestate_assign_eliminate]: 3.05002e-06 [updatestate_loads_eliminate]: 2.89999e-06 [cse]: 
2.071e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.541e-05 [tuple_transform]: 7.856e-05, [1] [Cycle 1]: 7.354e-05, [4] [d_1]: 4.547e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 7.28e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 5.356e-05 [cse_after_recomputation]: 2.552e-05, [1] [Cycle 1]: 2.041e-05, [1] [cse]: 1.487e-05 [environ_conv]: 8.99998e-06 [swap_dp_allreduce_reducescatter]: 6.09999e-06 [bias_add_comm_swap]: 2.74001e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 3.11999e-06 [comm_op_add_attrs]: 1.27e-06 [add_comm_op_reuse_tag]: 1.20001e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.05002e-06 [control_data_broadcast_order]: 1.448e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 4.48001e-06 [overlap_recompute_and_grad_model_parallel]: 5.44e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.51998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.25999e-06 [overlap_recompute_comm]: 2.46e-06 [overlap_grad_ring_attention]: 4.95999e-06 [overlap_grad_flash_sp]: 2.224e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 2.17001e-06 [handle_group_info]: 1.31998e-06 [symbol_engine_optimizer]: 8.699e-05, [1] [Cycle 1]: 8.22e-05, [6] [build]: 9.66e-06 [elim_shapecalc]: 1.017e-05 [elim_not_effective]: 1.41e-05 [opt_reshape]: 7.55e-06 [fold_const_symbol]: 1.144e-05 [renormalize]: 1.80007e-07 [detach_backward]: 2.29999e-06 
[pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 2.098e-05 [get_jit_bprop_graph]: 2.12999e-06 [rewriter_after_jit_bprop_graph]: 3.73001e-06 [opt_after_jit_grad]: 0.00047032 [validate]: 4.499e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00676776 [execute]: 8.54e-06 Sums bootstrap : 0.000490s : 1.35% type_inference : 0.012225s : 33.77% event_method : 0.000043s : 0.12% auto_monad : 0.000130s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000079s : 0.22% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000050s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000146s : 0.40% optimize.opt_a.expand_dump_flag : 0.000009s : 0.02% optimize.opt_a.switch_simplify : 0.000127s : 0.35% optimize.opt_a.loop_unroll : 0.000109s : 0.30% optimize.opt_a.a_1 : 0.002905s : 8.02% optimize.opt_a.with_stream_mark : 0.000052s : 0.14% optimize.opt_a.recompute_prepare : 0.000040s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.03% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000427s : 1.18% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.15% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000031s : 0.08% optimize.opt_a.merge_send_recv : 0.000031s : 0.08% optimize.opt_a.auto_parallel : 0.000027s : 0.07% optimize.opt_a.parallel : 0.000036s : 0.10% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 0.000017s : 0.05% 
optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.09% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000027s : 0.08% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000051s : 0.14% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000017s : 0.05% optimize.opt_a.meta_fg_expand : 0.001613s : 4.46% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.24% optimize.opt_a.a_after_grad : 0.000108s : 0.30% optimize.opt_a.renormalize : 0.006994s : 19.32% optimize.opt_a.add_forward_monad_depend : 0.000018s : 0.05% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000073s : 0.20% optimize.opt_a.cse : 0.000231s : 0.64% optimize.opt_a.a_3 : 0.000428s : 1.18% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000044s : 0.12% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000616s : 1.70% optimize.opt_b.b_1 : 0.000135s : 0.37% optimize.opt_b.b_2 : 0.000008s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000052s : 0.14% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.07% optimize.loop_unroll : 0.000441s : 1.22% optimize.opt_after_cconv.c_1 : 0.000033s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000021s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.04% optimize.tuple_transform.d_1 : 0.000045s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000054s : 0.15% optimize.cse_after_recomputation.cse : 0.000015s : 0.04% optimize.environ_conv : 0.000009s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000022s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.06% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000470s : 1.30% validate : 0.000045s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006768s : 18.69% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 
0.000736 159 7.17% : 0.000053s : 7: substitution.arithmetic_simplify 0.31% : 0.000002s : 3: substitution.elim_not_effective 0.61% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.82% : 0.000006s : 4: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.36% : 0.000003s : 2: substitution.incorporate_call_switch 58.09% : 0.000427s : 17: substitution.inline 2.29% : 0.000017s : 2: substitution.inline_without_move 1.41% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.32% : 0.000017s : 3: substitution.less_batch_normalization 1.54% : 0.000011s : 7: substitution.minmaximum_grad 0.89% : 0.000007s : 5: substitution.partial_eliminate 1.69% : 0.000012s : 15: substitution.remove_not_recompute_node 3.83% : 0.000028s : 10: substitution.replace_applicator 1.33% : 0.000010s : 10: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.90% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.92% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.44% : 0.000055s : 18: substitution.tuple_list_get_item_eliminator 2.07% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012143 2 87.89% : 0.010672s : 1: type_inference.infer 12.11% : 0.001471s : 1: type_inference.specialize ------[replace.] 0.000194 26 66.40% : 0.000129s : 17: replace.inline 33.60% : 0.000065s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000444 26 94.06% : 0.000418s : 17: match.inline 5.94% : 0.000026s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000683 4180 1.12% : 0.000008s : 52: predicate.accumulaten_eliminater 0.25% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.11% : 0.000008s : 52: predicate.addn_zero_filter 1.09% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 2.02% : 0.000014s : 73: predicate.arithmetic_simplify 1.14% : 0.000008s : 52: predicate.cast_eliminate 1.12% : 0.000008s : 50: predicate.check_bprop_eliminate 0.47% : 0.000003s : 21: predicate.compare_switch_simplify 0.07% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.14% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.21% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.15% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.20% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.21% : 0.000008s : 56: predicate.environ_get_depend_swap 1.68% : 0.000011s : 77: predicate.environ_get_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.83% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.61% : 0.000018s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.57% : 0.000004s : 21: predicate.get_grad_eliminate 0.09% : 0.000001s : 4: predicate.graph_param_transform 0.51% : 0.000003s : 21: predicate.incorporate_call 0.46% : 0.000003s : 21: predicate.incorporate_call_switch 5.89% : 0.000040s : 180: predicate.inline 1.46% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.62% : 0.000004s : 21: predicate.less_batch_normalization 1.49% : 
0.000010s : 69: predicate.list_to_tuple_eliminator_ 2.60% : 0.000018s : 121: predicate.load_eliminater 0.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.57% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.36% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.09% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 52: predicate.minmaximum_grad 0.36% : 0.000002s : 4: predicate.mutable_eliminate 0.13% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.16% : 0.000015s : 78: predicate.partial_defer_inline 1.70% : 0.000012s : 65: predicate.partial_eliminate 1.11% : 0.000008s : 52: predicate.print_const_string_wrapper 0.48% : 0.000003s : 21: predicate.reduce_all_const_elim 1.32% : 0.000009s : 52: predicate.reduce_eliminate 2.61% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 21: predicate.remove_not_recompute_node 1.93% : 0.000013s : 111: predicate.replace_applicator 0.73% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000008s : 52: predicate.reshape_eliminate 1.12% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.26% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.55% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.63% : 0.000004s : 21: predicate.specialize_transform 1.25% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.93% : 0.000013s : 78: predicate.switch_defer_inline 3.03% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.19% : 0.000035s : 213: 
predicate.switch_simplify 1.12% : 0.000008s : 52: predicate.tile_eliminate 1.12% : 0.000008s : 52: predicate.transpose_eliminate 1.44% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.44% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000013s : 81: predicate.tuple_list_set_item_eliminator 1.55% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.59% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.20% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.54% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001748 35 60.49% : 0.001058s : 14: func_graph_cloner_run.FuncGraphClonerGraph 39.51% : 0.000691s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072364 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.34% : 0.003138s : 1: add_attr 4.32% : 0.003129s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000137s : 1: auto_monad 0.03% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.73% : 0.000526s : 1: bootstrap 0.04% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.07% : 0.000050s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.62% : 0.000450s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.86% : 0.000626s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.08% : 0.004403s : 117: opt.transform.opt_a 0.04% : 0.000031s : 1: opt.transform.opt_after_cconv 0.03% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000115s : 28: opt.transform.opt_b 0.07% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000039s : 4: opt.transform.symbol_engine_opt 20.27% : 0.014671s : 1: opt_a 0.16% : 0.000114s : 1: opt_after_cconv 0.66% : 0.000481s : 1: opt_after_jit_grad 0.36% : 0.000260s : 1: opt_b 23.49% : 0.017002s : 1: optimize 0.03% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000054s : 1: pre_auto_parallel 0.06% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 7.50% : 0.005426s : 2: renormalize.infer 2.14% : 0.001551s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000048s : 1: rewriter_after_opt_a 0.21% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000090s : 1: symbol_engine_optimizer 9.37% : 0.006784s : 1: task_emit 0.11% : 0.000082s : 1: 
tuple_transform 16.93% : 0.012249s : 1: type_inference 0.11% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x0-kbk],max_mem:6.0M TotalTime = 0.0718754, [24] [bootstrap]: 0.0005039 [type_inference]: 0.00614155 [event_method]: 1.413e-05 [auto_monad]: 5.778e-05 [graph_reusing]: 6.38e-06 [inline]: 1.75001e-06 [add_attr]: 0.00357698, [1] [add_attr_with_inline]: 0.00356615, [1] [Cycle 1]: 4.79e-05, [2] [tag_attr]: 1.586e-05 [meta_addattr_fg_expand]: 4.24002e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 2.462e-05 [insert-virtual-dataset]: 2.76e-06 [parallel-infer-symbol-second]: 8.40024e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.00418046, [53] [py_interpret_to_execute]: 1.951e-05 [rewriter_before_opt_a]: 6.304e-05 [opt_a]: 0.00221386, [2] [Cycle 1]: 0.00160079, [45] [expand_dump_flag]: 3.25e-06 [switch_simplify]: 3.444e-05 [loop_unroll]: 2.107e-05 [a_1]: 0.00043541 [with_stream_mark]: 1.432e-05 [recompute_prepare]: 8.54e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.44001e-06 [updatestate_loads_eliminate]: 3.21999e-06 [parameter_eliminate]: 1.81998e-06 [a_2]: 8.347e-05 [accelerated_algorithm]: 6.66e-06 [shard]: 1.84e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 6.20002e-06 [merge_send_recv]: 8.32e-06 [auto_parallel]: 6.21e-06 [parallel]: 2.652e-05 [flash_sp]: 7.30998e-06 [merge_comm]: 4.05e-06 [allreduce_fusion]: 3.85e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 1.17e-06 [virtual_shard_identity]: 7.98999e-06 [virtual_dataset]: 6.12001e-06 [get_grad_eliminate_]: 6.04001e-06 [virtual_output]: 5.76e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.32001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.249e-05 [merge_recompute_call_nodes]: 1.46998e-06 
[before_grad]: 1.039e-05 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 2.86e-06 [flash_sp_send_recv_attached]: 2.74999e-06 [receive_attached]: 2.34999e-06 [after_resolve]: 9.75002e-06 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00045983 [add_forward_monad_depend]: 8.52e-06 [auto_monad_grad]: 2.32999e-06 [auto_monad_eliminator]: 1.46e-05 [cse]: 3.192e-05 [a_3]: 4.256e-05 [Cycle 2]: 0.00060386, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 7.26001e-06 [loop_unroll]: 5.81e-06 [a_1]: 0.00011498 [with_stream_mark]: 1.024e-05 [recompute_prepare]: 5.96998e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 7.197e-05 [accelerated_algorithm]: 6.16e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.35001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 4.49002e-06 [auto_parallel]: 5.42001e-06 [parallel]: 4.86997e-06 [flash_sp]: 3.72998e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 3.14999e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 6.53e-06 [virtual_dataset]: 5.54998e-06 [get_grad_eliminate_]: 5.17e-06 [virtual_output]: 5.11002e-06 [merge_forward]: 3.06999e-06 [cell_reuse_recompute_pass]: 1.58002e-06 [offload_activation]: 6.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 8.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18998e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.23002e-06 [after_resolve]: 8e-06 [a_after_grad]: 7.93001e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 9.39996e-07 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 6.16998e-06 [cse]: 1.365e-05 [a_3]: 3.331e-05 [py_interpret_to_execute_after_opt_a]: 8.33001e-06 [slice_cell_reuse_recomputed_activation]: 1.93002e-06 [rewriter_after_opt_a]: 
3.387e-05 [convert_after_rewriter]: 6.53e-06 [order_py_execute_after_rewriter]: 4.72e-06 [mutable_eliminate]: 0.00049709 [opt_b]: 0.00019233, [1] [Cycle 1]: 0.00018587, [7] [b_1]: 0.0001137 [b_2]: 7.53999e-06 [updatestate_depend_eliminate]: 5.47999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 4.99975e-07 [cse]: 1.783e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 1.81003e-06 [cconv]: 2.441e-05 [loop_unroll]: 0.00043056 [opt_after_cconv]: 9.581e-05, [1] [Cycle 1]: 8.96e-05, [7] [c_1]: 2.59e-05 [parameter_eliminate]: 2.53e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.44999e-06 [cse]: 1.701e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.405e-05 [tuple_transform]: 6.94e-05, [1] [Cycle 1]: 6.494e-05, [4] [d_1]: 3.745e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.56e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.982e-05 [cse_after_recomputation]: 2.246e-05, [1] [Cycle 1]: 1.737e-05, [1] [cse]: 1.192e-05 [environ_conv]: 9.81e-06 [swap_dp_allreduce_reducescatter]: 5.42999e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.37998e-06 [label_fine_grained_interleaved_index]: 2.96999e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.14999e-06 [micro_interleaved_order_control]: 2.64999e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.59999e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.35999e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.44e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.21997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.212e-05 [grouped_pairwise_exchange_alltoall]: 1.39998e-06 
[offloading_packed_experts]: 4.15999e-06 [overlap_recompute_and_grad_model_parallel]: 5.07e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.36002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27e-06 [overlap_recompute_comm]: 2.36998e-06 [overlap_grad_ring_attention]: 4.45e-06 [overlap_grad_flash_sp]: 1.706e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 8.778e-05, [1] [Cycle 1]: 8.32e-05, [6] [build]: 2.51998e-06 [elim_shapecalc]: 8.15e-06 [elim_not_effective]: 1.402e-05 [opt_reshape]: 6.52001e-06 [fold_const_symbol]: 9.97001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.63e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.61999e-06 [opt_after_jit_grad]: 0.00046233 [validate]: 3.651e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0566181 [execute]: 7.96001e-06 Sums bootstrap : 0.000504s : 0.75% type_inference : 0.006142s : 9.13% event_method : 0.000014s : 0.02% auto_monad : 0.000058s : 0.09% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000063s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000042s : 0.06% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000550s : 0.82% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% 
optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.23% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000031s : 0.05% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000460s : 0.68% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 
0.000021s : 0.03% optimize.opt_a.cse : 0.000046s : 0.07% optimize.opt_a.a_3 : 0.000076s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000497s : 0.74% optimize.opt_b.b_1 : 0.000114s : 0.17% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000431s : 0.64% optimize.opt_after_cconv.c_1 : 0.000026s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000037s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% 
optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% 
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000462s : 0.69% validate : 0.000037s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.056618s : 84.17% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000162 26 20.03% : 0.000033s : 5: substitution.arithmetic_simplify 1.30% : 0.000002s : 2: substitution.elim_not_effective 0.88% : 0.000001s : 2: substitution.fold_const_symbol 3.36% : 0.000005s : 3: substitution.graph_param_transform 60.86% : 0.000099s : 3: substitution.inline 2.08% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.28% : 0.000005s : 4: substitution.remove_not_recompute_node 2.20% : 0.000004s : 2: substitution.replace_old_param 6.01% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006090 2 90.10% : 0.005488s : 1: type_inference.infer 9.90% : 0.000603s : 1: type_inference.specialize ------[replace.] 0.000037 4 77.14% : 0.000028s : 3: replace.inline 22.86% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000106 4 91.51% : 0.000097s : 3: match.inline 8.49% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 0.75% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.93% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 15: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.00% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.30% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_depend_swap 1.76% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 13: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.34% : 0.000001s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.19% : 0.000010s : 40: predicate.inline 1.07% : 0.000002s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 6: predicate.less_batch_normalization 1.64% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 25: predicate.load_eliminater 1.19% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.21% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.67% : 0.000003s : 13: predicate.partial_defer_inline 1.42% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.24% : 0.000002s : 9: predicate.reduce_eliminate 2.35% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.31% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 6: predicate.shard_identity_eliminate 0.72% : 0.000001s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 0.88% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.68% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.95% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.90% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.27% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.71% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 43.22% : 0.000147s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.78% : 0.000193s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081175 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.41% : 0.003583s : 1: add_attr 4.40% : 0.003570s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000063s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000542s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.62% : 0.000506s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.15% : 0.000932s : 78: opt.transform.opt_a 0.03% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000035s : 4: opt.transform.symbol_engine_opt 2.73% : 0.002217s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.58% : 0.000472s : 1: opt_after_jit_grad 0.24% : 0.000196s : 1: opt_b 5.15% : 0.004184s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.30% : 0.000240s : 1: renormalize.infer 0.26% : 0.000213s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000038s : 1: rewriter_after_opt_a 0.08% : 0.000067s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000091s : 1: symbol_engine_optimizer 69.77% : 0.056634s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 7.58% : 0.006155s : 1: type_inference 0.07% : 0.000061s : 1: validate TotalTime = 0.0574017, [24] [bootstrap]: 0.00048356 [type_inference]: 0.00606388 [event_method]: 1.284e-05 [auto_monad]: 6.069e-05 [graph_reusing]: 5.27001e-06 [inline]: 2.22001e-06 [add_attr]: 0.00308533, [1] [add_attr_with_inline]: 0.00307719, [1] [Cycle 1]: 4.761e-05, [2] [tag_attr]: 1.388e-05 [meta_addattr_fg_expand]: 3.92998e-06 [parallel-infer-symbol]: 2.98998e-06 [pre_auto_parallel]: 2.366e-05 [insert-virtual-dataset]: 2.78e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.00398073, [53] [py_interpret_to_execute]: 1.959e-05 [rewriter_before_opt_a]: 5.272e-05 [opt_a]: 0.0020803, [2] [Cycle 1]: 0.00146755, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 2.965e-05 [loop_unroll]: 1.711e-05 [a_1]: 0.00036604 [with_stream_mark]: 1.489e-05 [recompute_prepare]: 7.93001e-06 [updatestate_depend_eliminate]: 3.85998e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 3.34001e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 8.161e-05 [accelerated_algorithm]: 6.79999e-06 [shard]: 2.61e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 6.38003e-06 [merge_send_recv]: 8.45999e-06 [auto_parallel]: 6.43e-06 [parallel]: 1.878e-05 [flash_sp]: 7.37997e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.82002e-06 [matmul_add_comm_reduction]: 9.91e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.5e-06 [virtual_dataset]: 6.20002e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.92998e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.186e-05 [merge_recompute_call_nodes]: 1.88002e-06 [before_grad]: 1.012e-05 [set_forward_comm_id_for_comm_node_pass]: 3.78001e-06 [meta_fg_expand]: 3.06001e-06 [flash_sp_send_recv_attached]: 2.50002e-06 [receive_attached]: 
2.27999e-06 [after_resolve]: 1.009e-05 [a_after_grad]: 8.57e-06 [renormalize]: 0.00043332 [add_forward_monad_depend]: 5.22e-06 [auto_monad_grad]: 1.97001e-06 [auto_monad_eliminator]: 1.391e-05 [cse]: 3.121e-05 [a_3]: 4.318e-05 [Cycle 2]: 0.00060311, [45] [expand_dump_flag]: 1.09998e-06 [switch_simplify]: 6.77002e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00011297 [with_stream_mark]: 9.78002e-06 [recompute_prepare]: 5.94999e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 7.262e-05 [accelerated_algorithm]: 5.84999e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.24998e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 4.47998e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.51001e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.82002e-06 [matmul_add_comm_reduction]: 5.31002e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.07001e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.17e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 6.78e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.049e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.50001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 1.97001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.42e-06 [a_after_grad]: 7.58999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.07998e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.96001e-06 [cse]: 1.446e-05 [a_3]: 3.26e-05 [py_interpret_to_execute_after_opt_a]: 7.31001e-06 [slice_cell_reuse_recomputed_activation]: 2.11998e-06 [rewriter_after_opt_a]: 3.296e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 4.87998e-06 [mutable_eliminate]: 0.00047661 [opt_b]: 0.00019153, [1] [Cycle 1]: 
0.00018498, [7] [b_1]: 0.00011244 [b_2]: 7.53999e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.53e-06 [renormalize]: 3.89991e-07 [cse]: 1.87e-05 [optimize_parallel_all_gather_comm]: 1.696e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 2.416e-05 [loop_unroll]: 0.00042427 [opt_after_cconv]: 9.698e-05, [1] [Cycle 1]: 9.118e-05, [7] [c_1]: 2.579e-05 [parameter_eliminate]: 2.35002e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.797e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.417e-05 [tuple_transform]: 6.809e-05, [1] [Cycle 1]: 6.335e-05, [4] [d_1]: 3.708e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.326e-05 [cse_after_recomputation]: 2.151e-05, [1] [Cycle 1]: 1.702e-05, [1] [cse]: 1.159e-05 [environ_conv]: 5.75001e-06 [swap_dp_allreduce_reducescatter]: 4.97e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.48002e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.17999e-06 [add_comm_op_reuse_tag]: 1.29998e-06 [interleave_split_concat_branches]: 1.26002e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.30001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.17999e-06 [control_data_broadcast_order]: 1.229e-05 [grouped_pairwise_exchange_alltoall]: 1.44998e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 5.54e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.50999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 4.48999e-06 [overlap_grad_flash_sp]: 1.775e-05 [begin_end_overlap_inline]: 6.09987e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 7.129e-05, [1] [Cycle 1]: 6.694e-05, [6] [build]: 2.61e-06 [elim_shapecalc]: 8.77e-06 [elim_not_effective]: 1.229e-05 [opt_reshape]: 6.36e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.587e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 4.00998e-06 [opt_after_jit_grad]: 0.00046894 [validate]: 3.754e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0429056 [execute]: 1.013e-05 Sums bootstrap : 0.000484s : 0.91% type_inference : 0.006064s : 11.38% event_method : 0.000013s : 0.02% auto_monad : 0.000061s : 0.11% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.04% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000053s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000036s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000479s : 0.90% optimize.opt_a.with_stream_mark : 0.000025s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000433s : 0.81% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000046s : 0.09% optimize.opt_a.a_3 : 0.000076s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000477s : 0.89% optimize.opt_b.b_1 : 0.000112s : 0.21% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.05% optimize.loop_unroll : 0.000424s : 0.80% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.08% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000469s : 0.88% validate : 0.000038s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.042906s : 80.51% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000148 24 19.90% : 0.000029s : 4: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000001s : 2: substitution.fold_const_symbol 3.89% : 0.000006s : 3: substitution.graph_param_transform 66.45% : 0.000098s : 3: substitution.inline 2.09% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.15% : 0.000005s : 4: substitution.remove_not_recompute_node 2.30% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006017 2 92.14% : 0.005544s : 1: type_inference.infer 7.86% : 0.000473s : 1: type_inference.specialize ------[replace.] 0.000030 3 100.00% : 0.000030s : 3: replace.inline ------[match.] 0.000096 3 100.00% : 0.000096s : 3: match.inline ------[predicate.] 
0.000148 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 1.00% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.74% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 1.01% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 1.80% : 0.000003s : 17: predicate.environ_get_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 1.02% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.72% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.34% : 0.000009s : 37: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 6: predicate.less_batch_normalization 1.52% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.30% : 0.000003s : 22: predicate.load_eliminater 1.10% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.96% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 8: predicate.minmaximum_grad 1.26% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.44% : 0.000001s : 3: predicate.parallel_virtual_node 1.51% : 0.000002s : 11: predicate.partial_defer_inline 1.31% : 0.000002s : 11: predicate.partial_eliminate 0.85% : 0.000001s : 8: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.11% : 0.000002s : 8: predicate.reduce_eliminate 2.30% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.85% : 0.000001s : 6: predicate.remove_not_recompute_node 1.19% : 0.000002s : 14: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.34% : 0.000001s : 3: predicate.reset_defer_inline 0.93% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.89% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 0.91% : 0.000001s : 6: predicate.specialize_transform 1.08% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.27% : 0.000002s : 11: predicate.switch_defer_inline 1.96% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.85% : 0.000007s : 38: predicate.switch_simplify 0.92% : 
0.000001s : 8: predicate.tile_eliminate 0.87% : 0.000001s : 8: predicate.transpose_eliminate 1.56% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.63% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.38% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.60% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.19% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.77% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 6: predicate.virtual_output_eliminate 0.41% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000306 7 39.89% : 0.000122s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.11% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.065899 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.69% : 0.003090s : 1: add_attr 4.68% : 0.003081s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000066s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.79% : 0.000523s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.66% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.74% : 0.000487s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.28% : 0.000846s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.04% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.16% : 0.002083s : 1: opt_a 0.15% : 0.000100s : 1: opt_after_cconv 0.73% : 0.000479s : 1: opt_after_jit_grad 0.30% : 0.000195s : 1: opt_b 6.05% : 0.003985s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.04% : 0.000023s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000018s : 1: remove_dup_value 0.35% : 0.000233s : 1: renormalize.infer 0.29% : 0.000193s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.09% : 0.000057s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000074s : 1: symbol_engine_optimizer 65.15% : 0.042932s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 9.22% : 0.006079s : 1: type_inference 0.10% : 0.000064s : 1: validate TotalTime = 0.0554434, [24] [bootstrap]: 0.00036497 [type_inference]: 0.00533355 [event_method]: 1.418e-05 [auto_monad]: 6.018e-05 [graph_reusing]: 5.32999e-06 [inline]: 1.64e-06 [add_attr]: 0.00298576, [1] [add_attr_with_inline]: 0.00297811, [1] [Cycle 1]: 4.819e-05, [2] [tag_attr]: 1.478e-05 [meta_addattr_fg_expand]: 4.48999e-06 [parallel-infer-symbol]: 3.35e-06 [pre_auto_parallel]: 2.612e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 9.29984e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 2.39999e-06 [optimize]: 0.00405626, [53] [py_interpret_to_execute]: 2.181e-05 [rewriter_before_opt_a]: 6.28e-05 [opt_a]: 0.00216558, [2] [Cycle 1]: 0.00155552, [45] [expand_dump_flag]: 3.09001e-06 [switch_simplify]: 3.29e-05 [loop_unroll]: 2.067e-05 [a_1]: 0.00043892 [with_stream_mark]: 1.375e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.91001e-06 [updatestate_loads_eliminate]: 3.88001e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 8.2e-05 [accelerated_algorithm]: 6.93e-06 [shard]: 2.18998e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 6.12999e-06 [merge_send_recv]: 8.35001e-06 [auto_parallel]: 5.88998e-06 [parallel]: 1.884e-05 [flash_sp]: 8.02e-06 [merge_comm]: 3.64002e-06 [allreduce_fusion]: 3.93999e-06 [matmul_add_comm_reduction]: 9.02999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.54002e-06 [virtual_dataset]: 5.95002e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.63997e-06 [merge_forward]: 3.76999e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.154e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 9.84001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.98999e-06 [meta_fg_expand]: 2.89999e-06 [flash_sp_send_recv_attached]: 2.61e-06 [receive_attached]: 
2.28998e-06 [after_resolve]: 9.42999e-06 [a_after_grad]: 8.55999e-06 [renormalize]: 0.00044911 [add_forward_monad_depend]: 4.85001e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.33e-05 [cse]: 2.942e-05 [a_3]: 4.203e-05 [Cycle 2]: 0.0005997, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.00998e-06 [loop_unroll]: 5.74e-06 [a_1]: 0.00011341 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 5.97999e-06 [updatestate_depend_eliminate]: 3.00002e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 7.176e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 4.63999e-06 [auto_parallel]: 5.47999e-06 [parallel]: 4.03999e-06 [flash_sp]: 3.46001e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 2.97002e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.27999e-06 [virtual_output]: 5.22e-06 [merge_forward]: 2.75002e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.012e-05 [merge_recompute_call_nodes]: 7.79983e-07 [before_grad]: 8.74998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.07e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.00002e-06 [cse]: 1.368e-05 [a_3]: 3.243e-05 [py_interpret_to_execute_after_opt_a]: 7.51999e-06 [slice_cell_reuse_recomputed_activation]: 2.06998e-06 [rewriter_after_opt_a]: 3.304e-05 [convert_after_rewriter]: 6.46e-06 [order_py_execute_after_rewriter]: 5.37999e-06 [mutable_eliminate]: 0.00045848 [opt_b]: 0.00019279, [1] [Cycle 1]: 
0.00018641, [7] [b_1]: 0.00011433 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 3.4002e-07 [cse]: 1.78e-05 [optimize_parallel_all_gather_comm]: 1.713e-05 [overlap_param_gather]: 2.06998e-06 [cconv]: 2.43e-05 [loop_unroll]: 0.00041766 [opt_after_cconv]: 9.629e-05, [1] [Cycle 1]: 9.017e-05, [7] [c_1]: 2.563e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.807e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.52e-05 [tuple_transform]: 6.782e-05, [1] [Cycle 1]: 6.281e-05, [4] [d_1]: 3.624e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.43e-06 [partial_unused_args_eliminate]: 1.86998e-06 [add_recomputation]: 4.573e-05 [cse_after_recomputation]: 2.147e-05, [1] [Cycle 1]: 1.67e-05, [1] [cse]: 1.136e-05 [environ_conv]: 5.61998e-06 [swap_dp_allreduce_reducescatter]: 5.40999e-06 [bias_add_comm_swap]: 2.89999e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.53002e-06 [slice_recompute_activation]: 1.99999e-06 [micro_interleaved_order_control]: 2.65002e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 8.49977e-07 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.93998e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.53002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76998e-06 [control_data_broadcast_order]: 1.22e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 5.62999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.55999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.865e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.58998e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 1.29998e-06 [symbol_engine_optimizer]: 7.176e-05, [1] [Cycle 1]: 6.724e-05, [6] [build]: 2.48998e-06 [elim_shapecalc]: 8.99e-06 [elim_not_effective]: 1.187e-05 [opt_reshape]: 6.27001e-06 [fold_const_symbol]: 9.61998e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.674e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.00045173 [validate]: 3.593e-05 [backend_pass]: 1.07998e-06 [task_emit]: 0.0418619 [execute]: 8.55001e-06 Sums bootstrap : 0.000365s : 0.71% type_inference : 0.005334s : 10.36% event_method : 0.000014s : 0.03% auto_monad : 0.000060s : 0.12% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000026s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.04% optimize.rewriter_before_opt_a : 0.000063s : 0.12% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.08% optimize.opt_a.loop_unroll : 0.000026s : 0.05% optimize.opt_a.a_1 : 0.000552s : 1.07% optimize.opt_a.with_stream_mark : 0.000024s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000154s : 0.30% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.03% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000017s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000449s : 0.87% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.04% optimize.opt_a.cse : 0.000043s : 0.08% optimize.opt_a.a_3 : 0.000074s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.89% optimize.opt_b.b_1 : 0.000114s : 0.22% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.05% optimize.loop_unroll : 0.000418s : 0.81% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000036s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000452s : 0.88% validate : 0.000036s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.041862s : 81.35% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000168 26 19.27% : 0.000032s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.15% : 0.000005s : 3: substitution.graph_param_transform 64.20% : 0.000108s : 3: substitution.inline 1.83% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000004s : 4: substitution.remove_not_recompute_node 1.70% : 0.000003s : 2: substitution.replace_old_param 5.26% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005293 2 88.89% : 0.004705s : 1: type_inference.infer 11.11% : 0.000588s : 1: type_inference.specialize ------[replace.] 0.000036 4 78.59% : 0.000029s : 3: replace.inline 21.41% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 4 92.91% : 0.000106s : 3: match.inline 7.09% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 0.84% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.95% : 0.000001s : 9: predicate.addn_zero_filter 0.85% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 15: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_depend_swap 1.79% : 0.000003s : 18: predicate.environ_get_eliminate 1.17% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.65% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.04% : 0.000010s : 40: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.09% : 0.000002s : 6: predicate.less_batch_normalization 1.64% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 25: predicate.load_eliminater 1.22% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.57% : 0.000002s : 15: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 6: predicate.merge_addn 0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.03% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.60% : 0.000003s : 13: predicate.partial_defer_inline 1.43% : 0.000002s : 13: predicate.partial_eliminate 0.94% : 0.000001s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 6: predicate.reduce_all_const_elim 1.27% : 0.000002s : 9: predicate.reduce_eliminate 2.38% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 6: predicate.remove_not_recompute_node 1.31% : 0.000002s : 16: predicate.replace_applicator 0.68% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.95% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 0.86% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.70% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.42% : 0.000002s : 13: predicate.switch_defer_inline 1.97% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 43: predicate.switch_simplify 0.91% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 21: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.36% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.70% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.37% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 45.53% : 0.000154s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.47% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064010 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.67% : 0.002990s : 1: add_attr 4.66% : 0.002982s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000066s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.61% : 0.000390s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000020s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.67% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.73% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.44% : 0.000925s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000040s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.39% : 0.002168s : 1: opt_a 0.16% : 0.000100s : 1: opt_after_cconv 0.72% : 0.000461s : 1: opt_after_jit_grad 0.31% : 0.000196s : 1: opt_b 6.34% : 0.004060s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000030s : 1: pre_auto_parallel 0.04% : 0.000026s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.36% : 0.000233s : 1: renormalize.infer 0.33% : 0.000210s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.10% : 0.000067s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000075s : 1: symbol_engine_optimizer 65.43% : 0.041884s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 8.35% : 0.005346s : 1: type_inference 0.09% : 0.000059s : 1: validate TotalTime = 0.0862398, [24] [bootstrap]: 0.00049925 [type_inference]: 0.0120779 [event_method]: 4.825e-05 [auto_monad]: 0.00013261 [graph_reusing]: 8.47998e-06 [inline]: 2.01e-06 [add_attr]: 0.00313342, [1] [add_attr_with_inline]: 0.00312407, [1] [Cycle 1]: 7.903e-05, [2] [tag_attr]: 3.53e-05 [meta_addattr_fg_expand]: 1.04e-05 [parallel-infer-symbol]: 3.19001e-06 [pre_auto_parallel]: 4.992e-05 [insert-virtual-dataset]: 2.56998e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 2.06998e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.0176216, [53] [py_interpret_to_execute]: 3.873e-05 [rewriter_before_opt_a]: 0.00016338 [opt_a]: 0.0152763, [3] [Cycle 1]: 0.0116596, [45] [expand_dump_flag]: 4.27e-06 [switch_simplify]: 0.00013184 [loop_unroll]: 6.817e-05 [a_1]: 0.00144434 [with_stream_mark]: 2.593e-05 [recompute_prepare]: 2.28e-05 [updatestate_depend_eliminate]: 8.84e-06 [updatestate_assign_eliminate]: 7.54002e-06 [updatestate_loads_eliminate]: 6.81001e-06 [parameter_eliminate]: 3.27002e-06 [a_2]: 0.00024633 [accelerated_algorithm]: 3.201e-05 [shard]: 2.10002e-06 [meta_shard_fg_expand]: 4.08999e-06 [shard_inline]: 1.656e-05 [merge_send_recv]: 1.737e-05 [auto_parallel]: 1.067e-05 [parallel]: 2.016e-05 [flash_sp]: 1.191e-05 [merge_comm]: 9.35001e-06 [allreduce_fusion]: 8.47998e-06 [matmul_add_comm_reduction]: 2.844e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.896e-05 [virtual_dataset]: 1.582e-05 [get_grad_eliminate_]: 1.551e-05 [virtual_output]: 1.531e-05 [merge_forward]: 8.84e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 1.797e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.001e-05 [merge_recompute_call_nodes]: 1.82999e-06 [before_grad]: 2.965e-05 [set_forward_comm_id_for_comm_node_pass]: 9.54999e-06 [meta_fg_expand]: 0.00154484 [flash_sp_send_recv_attached]: 4.43999e-06 [receive_attached]: 2.83e-06 
[after_resolve]: 6.99e-05 [a_after_grad]: 9.159e-05 [renormalize]: 0.00671582 [add_forward_monad_depend]: 1.223e-05 [auto_monad_grad]: 6.61e-06 [auto_monad_eliminator]: 5.335e-05 [cse]: 0.00019006 [a_3]: 0.00034458 [Cycle 2]: 0.00290384, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 4.65e-05 [loop_unroll]: 4.193e-05 [a_1]: 0.00136652 [with_stream_mark]: 1.579e-05 [recompute_prepare]: 1.05e-05 [updatestate_depend_eliminate]: 4.76002e-06 [updatestate_assign_eliminate]: 3.99002e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 9.099e-05 [accelerated_algorithm]: 1.239e-05 [shard]: 1.78002e-06 [meta_shard_fg_expand]: 2.27001e-06 [shard_inline]: 6.84001e-06 [merge_send_recv]: 9.23002e-06 [auto_parallel]: 9.60001e-06 [parallel]: 7.94002e-06 [flash_sp]: 3.83999e-06 [merge_comm]: 4.24002e-06 [allreduce_fusion]: 3.85e-06 [matmul_add_comm_reduction]: 8.57998e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 7.73999e-06 [virtual_dataset]: 7.11999e-06 [get_grad_eliminate_]: 6.99001e-06 [virtual_output]: 6.68e-06 [merge_forward]: 4.89e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 1.046e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.395e-05 [merge_recompute_call_nodes]: 1.97001e-06 [before_grad]: 1.185e-05 [set_forward_comm_id_for_comm_node_pass]: 4.45e-06 [meta_fg_expand]: 8.968e-05 [flash_sp_send_recv_attached]: 1.82999e-06 [receive_attached]: 2.50997e-06 [after_resolve]: 1.365e-05 [a_after_grad]: 1.015e-05 [renormalize]: 0.00069312 [add_forward_monad_depend]: 5.25999e-06 [auto_monad_grad]: 2.12001e-06 [auto_monad_eliminator]: 1.298e-05 [cse]: 2.702e-05 [a_3]: 5.064e-05 [Cycle 3]: 0.0006947, [45] [expand_dump_flag]: 1.42e-06 [switch_simplify]: 8.71002e-06 [loop_unroll]: 6.67002e-06 [a_1]: 0.00014972 [with_stream_mark]: 8.97e-06 [recompute_prepare]: 6.78998e-06 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.71e-06 
[parameter_eliminate]: 1.42e-06 [a_2]: 8.653e-05 [accelerated_algorithm]: 9.94001e-06 [shard]: 9.10019e-07 [meta_shard_fg_expand]: 2.24999e-06 [shard_inline]: 6.96001e-06 [merge_send_recv]: 5.93998e-06 [auto_parallel]: 6.49999e-06 [parallel]: 5.62001e-06 [flash_sp]: 1.05001e-06 [merge_comm]: 4.12e-06 [allreduce_fusion]: 3.65e-06 [matmul_add_comm_reduction]: 6.23e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 8.08001e-06 [virtual_dataset]: 6.97002e-06 [get_grad_eliminate_]: 6.21998e-06 [virtual_output]: 6.18002e-06 [merge_forward]: 2.99999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.22e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.259e-05 [merge_recompute_call_nodes]: 1.19e-06 [before_grad]: 1.095e-05 [set_forward_comm_id_for_comm_node_pass]: 4.03001e-06 [meta_fg_expand]: 2.26998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.30013e-07 [after_resolve]: 9.66e-06 [a_after_grad]: 9.69e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 8.33999e-06 [cse]: 1.776e-05 [a_3]: 4.04e-05 [py_interpret_to_execute_after_opt_a]: 1.398e-05 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 4.372e-05 [convert_after_rewriter]: 7.66001e-06 [order_py_execute_after_rewriter]: 5.81998e-06 [mutable_eliminate]: 0.0006214 [opt_b]: 0.00024798, [1] [Cycle 1]: 0.00023973, [7] [b_1]: 0.00015379 [b_2]: 8.95999e-06 [updatestate_depend_eliminate]: 6.93e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 3.04999e-06 [renormalize]: 6.50005e-07 [cse]: 2.409e-05 [optimize_parallel_all_gather_comm]: 1.937e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.841e-05 [loop_unroll]: 0.00043913 [opt_after_cconv]: 0.00011106, [1] [Cycle 1]: 0.00010472, [7] [c_1]: 3.189e-05 [parameter_eliminate]: 3.63e-06 [updatestate_depend_eliminate]: 6.33e-06 [updatestate_assign_eliminate]: 2.90002e-06 
[updatestate_loads_eliminate]: 2.66e-06 [cse]: 2.23e-05 [renormalize]: 5.3001e-07 [remove_dup_value]: 1.471e-05 [tuple_transform]: 7.97e-05, [1] [Cycle 1]: 7.491e-05, [4] [d_1]: 4.743e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 7.24001e-06 [partial_unused_args_eliminate]: 2.15002e-06 [add_recomputation]: 5.316e-05 [cse_after_recomputation]: 2.536e-05, [1] [Cycle 1]: 2.01e-05, [1] [cse]: 1.471e-05 [environ_conv]: 8.27e-06 [swap_dp_allreduce_reducescatter]: 5.79999e-06 [bias_add_comm_swap]: 3.48999e-06 [label_micro_interleaved_index]: 4.32998e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.62001e-06 [micro_interleaved_order_control]: 2.38002e-06 [assign_add_opt]: 1.51002e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 3.16999e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.10001e-06 [interleave_split_concat_branches]: 1.38002e-06 [interleave_parallel_branches]: 1.19e-06 [overlap_opt_shard_in_pipeline]: 1.34e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.476e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 4.53001e-06 [overlap_recompute_and_grad_model_parallel]: 5.04e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49998e-06 [overlap_recompute_comm]: 2.63e-06 [overlap_grad_ring_attention]: 4.63001e-06 [overlap_grad_flash_sp]: 2.278e-05 [begin_end_overlap_inline]: 5.79981e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 1.11002e-06 [symbol_engine_optimizer]: 8.711e-05, [1] [Cycle 1]: 8.241e-05, [6] [build]: 9.27001e-06 [elim_shapecalc]: 1.087e-05 [elim_not_effective]: 1.497e-05 [opt_reshape]: 7.48999e-06 [fold_const_symbol]: 1.136e-05 [renormalize]: 
2.79979e-07 [detach_backward]: 2.46e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 2.118e-05 [get_jit_bprop_graph]: 1.97999e-06 [rewriter_after_jit_bprop_graph]: 4.02e-06 [opt_after_jit_grad]: 0.00047992 [validate]: 4.53e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.051848 [execute]: 1.01e-05 Sums bootstrap : 0.000499s : 0.61% type_inference : 0.012078s : 14.78% event_method : 0.000048s : 0.06% auto_monad : 0.000133s : 0.16% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.06% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.05% optimize.rewriter_before_opt_a : 0.000163s : 0.20% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000187s : 0.23% optimize.opt_a.loop_unroll : 0.000117s : 0.14% optimize.opt_a.a_1 : 0.002961s : 3.62% optimize.opt_a.with_stream_mark : 0.000051s : 0.06% optimize.opt_a.recompute_prepare : 0.000040s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.02% optimize.opt_a.parameter_eliminate : 0.000006s : 0.01% optimize.opt_a.a_2 : 0.000424s : 0.52% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.07% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.04% optimize.opt_a.merge_send_recv : 0.000033s : 0.04% optimize.opt_a.auto_parallel : 0.000027s : 0.03% optimize.opt_a.parallel : 0.000034s : 0.04% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 
0.000018s : 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.04% optimize.opt_a.virtual_dataset : 0.000030s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.04% optimize.opt_a.virtual_output : 0.000028s : 0.03% optimize.opt_a.merge_forward : 0.000017s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001637s : 2.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000093s : 0.11% optimize.opt_a.a_after_grad : 0.000111s : 0.14% optimize.opt_a.renormalize : 0.007409s : 9.07% optimize.opt_a.add_forward_monad_depend : 0.000019s : 0.02% optimize.opt_a.auto_monad_grad : 0.000010s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000075s : 0.09% optimize.opt_a.cse : 0.000235s : 0.29% optimize.opt_a.a_3 : 0.000436s : 0.53% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000044s : 0.05% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000621s : 0.76% optimize.opt_b.b_1 : 0.000154s : 0.19% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000024s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000028s : 0.03% optimize.loop_unroll : 0.000439s : 0.54% optimize.opt_after_cconv.c_1 : 0.000032s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000047s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000480s : 0.59% validate : 0.000045s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.051848s : 63.44% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000760 161 7.71% : 0.000059s : 8: substitution.arithmetic_simplify 0.31% : 0.000002s : 3: substitution.elim_not_effective 0.64% : 0.000005s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 3: substitution.fold_const_symbol 0.94% : 0.000007s : 4: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 57.62% : 0.000438s : 17: substitution.inline 2.44% : 0.000019s : 2: substitution.inline_without_move 1.45% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.35% : 0.000018s : 3: substitution.less_batch_normalization 1.47% : 0.000011s : 7: substitution.minmaximum_grad 0.85% : 0.000006s : 5: substitution.partial_eliminate 1.60% : 0.000012s : 15: substitution.remove_not_recompute_node 3.80% : 0.000029s : 10: substitution.replace_applicator 1.49% : 0.000011s : 10: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.93% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.51% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 2.01% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 7.18% : 0.000055s : 19: substitution.tuple_list_get_item_eliminator 1.95% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011992 2 85.53% : 0.010256s : 1: type_inference.infer 14.47% : 0.001735s : 1: type_inference.specialize ------[replace.] 0.000201 27 64.17% : 0.000129s : 17: replace.inline 35.83% : 0.000072s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000454 27 94.27% : 0.000428s : 17: match.inline 5.73% : 0.000026s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 4248 1.06% : 0.000008s : 53: predicate.accumulaten_eliminater 0.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.41% : 0.000003s : 21: predicate.addn_check_dump 1.05% : 0.000008s : 53: predicate.addn_zero_filter 1.00% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.86% : 0.000014s : 74: predicate.arithmetic_simplify 1.07% : 0.000008s : 53: predicate.cast_eliminate 1.03% : 0.000008s : 50: predicate.check_bprop_eliminate 0.42% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.44% : 0.000003s : 21: predicate.depend_value_elim 1.08% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.14% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.11% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.11% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.09% : 0.000008s : 57: predicate.environ_get_depend_swap 1.54% : 0.000011s : 78: predicate.environ_get_eliminate 1.10% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.35% : 0.000018s : 80: predicate.float_depend_g_call 0.42% : 0.000003s : 21: predicate.float_environ_get_switch 0.52% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.05% : 0.000000s : 4: predicate.fold_const_symbol 0.47% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.48% : 0.000004s : 21: predicate.incorporate_call 0.42% : 0.000003s : 21: predicate.incorporate_call_switch 5.48% : 0.000041s : 183: predicate.inline 1.32% : 0.000010s : 45: predicate.inline_without_move 0.31% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 21: predicate.less_batch_normalization 1.45% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.43% : 0.000018s : 124: predicate.load_eliminater 0.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.48% : 0.000019s : 113: predicate.loop_unroll_before_grad 1.28% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.44% : 0.000003s : 21: predicate.merge_addn 0.99% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.03% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 53: predicate.minmaximum_grad 0.32% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.13% : 0.000001s : 4: predicate.parallel_virtual_node 2.00% : 0.000015s : 80: predicate.partial_defer_inline 1.59% : 0.000012s : 67: predicate.partial_eliminate 1.04% : 0.000008s : 53: predicate.print_const_string_wrapper 0.44% : 0.000003s : 21: predicate.reduce_all_const_elim 1.31% : 0.000010s : 53: predicate.reduce_eliminate 2.46% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 21: predicate.remove_not_recompute_node 1.78% : 0.000013s : 113: predicate.replace_applicator 0.66% : 0.000005s : 45: predicate.replace_old_param 0.07% : 0.000001s : 4: predicate.reset_defer_inline 1.06% : 0.000008s : 53: predicate.reshape_eliminate 1.02% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.20% : 0.000009s : 50: predicate.same_eliminate 0.32% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.52% : 0.000004s : 21: predicate.shard_identity_eliminate 0.20% : 0.000001s : 8: predicate.special_op_eliminate 0.57% : 0.000004s : 21: predicate.specialize_transform 1.16% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.09% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 80: predicate.switch_defer_inline 2.81% : 0.000021s : 130: predicate.switch_layer_defer_inline 11.78% : 0.000088s : 218: 
predicate.switch_simplify 1.04% : 0.000008s : 53: predicate.tile_eliminate 1.04% : 0.000008s : 53: predicate.transpose_eliminate 1.32% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.26% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.49% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.36% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.84% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.53% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.42% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 2.90% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.10% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000004s : 21: predicate.virtual_output_eliminate 0.08% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.13% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001882 36 59.21% : 0.001114s : 15: func_graph_cloner_run.FuncGraphClonerGraph 40.79% : 0.000768s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119143 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.63% : 0.003138s : 1: add_attr 2.63% : 0.003128s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000140s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.45% : 0.000535s : 1: bootstrap 0.03% : 0.000032s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.05% : 0.000055s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.38% : 0.000447s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000633s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 3.81% : 0.004542s : 117: opt.transform.opt_a 0.03% : 0.000030s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000123s : 28: opt.transform.opt_b 0.04% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000040s : 4: opt.transform.symbol_engine_opt 12.82% : 0.015280s : 1: opt_a 0.10% : 0.000114s : 1: opt_after_cconv 0.41% : 0.000490s : 1: opt_after_jit_grad 0.21% : 0.000251s : 1: opt_b 14.79% : 0.017627s : 1: optimize 0.02% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000055s : 1: pre_auto_parallel 0.04% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 4.82% : 0.005748s : 2: renormalize.infer 1.38% : 0.001645s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000048s : 1: rewriter_after_opt_a 0.14% : 0.000168s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000090s : 1: symbol_engine_optimizer 43.54% : 0.051874s : 1: task_emit 0.07% : 0.000083s : 1: 
tuple_transform 10.15% : 0.012096s : 1: type_inference 0.06% : 0.000073s : 1: validate TotalTime = 0.0594306, [24] [bootstrap]: 0.00050026 [type_inference]: 0.00597686 [event_method]: 1.404e-05 [auto_monad]: 6.304e-05 [graph_reusing]: 5.44e-06 [inline]: 2.49999e-06 [add_attr]: 0.00314578, [1] [add_attr_with_inline]: 0.00313738, [1] [Cycle 1]: 5.291e-05, [2] [tag_attr]: 1.415e-05 [meta_addattr_fg_expand]: 4.16001e-06 [parallel-infer-symbol]: 3.38e-06 [pre_auto_parallel]: 2.531e-05 [insert-virtual-dataset]: 2.81e-06 [parallel-infer-symbol-second]: 1.12999e-06 [dataset_repeat_opt]: 2.03002e-06 [pipeline_split]: 1.92001e-06 [optimize]: 0.00409085, [53] [py_interpret_to_execute]: 2.025e-05 [rewriter_before_opt_a]: 5.196e-05 [opt_a]: 0.00213661, [2] [Cycle 1]: 0.0014985, [45] [expand_dump_flag]: 3.36999e-06 [switch_simplify]: 2.931e-05 [loop_unroll]: 1.725e-05 [a_1]: 0.00036362 [with_stream_mark]: 1.662e-05 [recompute_prepare]: 7.97998e-06 [updatestate_depend_eliminate]: 4.03001e-06 [updatestate_assign_eliminate]: 3.75998e-06 [updatestate_loads_eliminate]: 3.43999e-06 [parameter_eliminate]: 1.97999e-06 [a_2]: 8.258e-05 [accelerated_algorithm]: 6.82002e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 6.20002e-06 [merge_send_recv]: 8.50999e-06 [auto_parallel]: 6.75998e-06 [parallel]: 1.994e-05 [flash_sp]: 7.77e-06 [merge_comm]: 3.66999e-06 [allreduce_fusion]: 3.48e-06 [matmul_add_comm_reduction]: 9.62001e-06 [allreduce_slice_to_reducescatter]: 1.09e-06 [virtual_shard_identity]: 7.5e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.79999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.174e-05 [merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 1.056e-05 [set_forward_comm_id_for_comm_node_pass]: 3.69002e-06 [meta_fg_expand]: 2.77002e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 2.01e-06 
[after_resolve]: 9.96998e-06 [a_after_grad]: 8.56997e-06 [renormalize]: 0.00046116 [add_forward_monad_depend]: 5.37999e-06 [auto_monad_grad]: 2.43e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 3.132e-05 [a_3]: 4.334e-05 [Cycle 2]: 0.00062804, [45] [expand_dump_flag]: 1.15001e-06 [switch_simplify]: 7.23999e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00011632 [with_stream_mark]: 2.531e-05 [recompute_prepare]: 6.69999e-06 [updatestate_depend_eliminate]: 3.38999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 7.424e-05 [accelerated_algorithm]: 5.73002e-06 [shard]: 1.27e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.96998e-06 [merge_send_recv]: 4.94e-06 [auto_parallel]: 5.74e-06 [parallel]: 4.51002e-06 [flash_sp]: 3.10998e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 3.04999e-06 [matmul_add_comm_reduction]: 5.47001e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 6.47001e-06 [virtual_dataset]: 5.51e-06 [get_grad_eliminate_]: 5.16002e-06 [virtual_output]: 5.24003e-06 [merge_forward]: 3.08e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 6.13002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.69002e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.44002e-06 [a_after_grad]: 7.85e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.47001e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.89001e-06 [cse]: 1.537e-05 [a_3]: 3.348e-05 [py_interpret_to_execute_after_opt_a]: 8.60001e-06 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 3.334e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.32001e-06 [mutable_eliminate]: 0.00050059 [opt_b]: 0.00019458, [1] [Cycle 1]: 0.00018711, [7] [b_1]: 
0.00011306 [b_2]: 7.20998e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.70997e-06 [updatestate_loads_eliminate]: 2.48002e-06 [renormalize]: 3.00002e-07 [cse]: 1.894e-05 [optimize_parallel_all_gather_comm]: 1.749e-05 [overlap_param_gather]: 2.03997e-06 [cconv]: 2.443e-05 [loop_unroll]: 0.0004299 [opt_after_cconv]: 9.899e-05, [1] [Cycle 1]: 9.273e-05, [7] [c_1]: 2.638e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.788e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.059e-05 [tuple_transform]: 7.097e-05, [1] [Cycle 1]: 6.607e-05, [4] [d_1]: 3.878e-05 [none_parameter_eliminate]: 1.67001e-06 [renormalize]: 2.79979e-07 [switch_simplify]: 6.46e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.672e-05 [cse_after_recomputation]: 2.142e-05, [1] [Cycle 1]: 1.661e-05, [1] [cse]: 1.126e-05 [environ_conv]: 4.95001e-06 [swap_dp_allreduce_reducescatter]: 5.14998e-06 [bias_add_comm_swap]: 3.13e-06 [label_micro_interleaved_index]: 4.57e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.65001e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.88e-06 [assign_add_opt]: 1.60001e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.85002e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.08001e-06 [interleave_split_concat_branches]: 1.38002e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.28002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.04999e-06 [control_data_broadcast_order]: 1.273e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 4.25999e-06 [overlap_recompute_and_grad_model_parallel]: 4.82e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.35997e-06 [overlap_grad_ring_attention]: 4.52e-06 [overlap_grad_flash_sp]: 1.822e-05 [begin_end_overlap_inline]: 6.39993e-07 [split_matmul_comm_elemetwise]: 2.71e-06 [split_layernorm_comm]: 2.12001e-06 [handle_group_info]: 1.26997e-06 [symbol_engine_optimizer]: 7.207e-05, [1] [Cycle 1]: 6.74e-05, [6] [build]: 2.43998e-06 [elim_shapecalc]: 8.69998e-06 [elim_not_effective]: 1.195e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 9.29e-06 [renormalize]: 2.60014e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.647e-05 [get_jit_bprop_graph]: 1.46002e-06 [rewriter_after_jit_bprop_graph]: 4.00998e-06 [opt_after_jit_grad]: 0.00046181 [validate]: 3.686e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0448419 [execute]: 9.93998e-06 Sums bootstrap : 0.000500s : 0.91% type_inference : 0.005977s : 10.82% event_method : 0.000014s : 0.03% auto_monad : 0.000063s : 0.11% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000052s : 0.09% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000037s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000480s : 0.87% optimize.opt_a.with_stream_mark : 0.000042s : 0.08% optimize.opt_a.recompute_prepare : 0.000015s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000157s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000461s : 0.83% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000047s : 0.08% optimize.opt_a.a_3 : 0.000077s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000501s : 0.91% optimize.opt_b.b_1 : 0.000113s : 0.20% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000430s : 0.78% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000021s : 0.04% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000462s : 0.84% validate : 0.000037s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.044842s : 81.15% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000149 24 21.13% : 0.000031s : 4: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 3.93% : 0.000006s : 3: substitution.graph_param_transform 65.17% : 0.000097s : 3: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.17% : 0.000005s : 4: substitution.remove_not_recompute_node 2.20% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005926 2 91.82% : 0.005441s : 1: type_inference.infer 8.18% : 0.000485s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000095 3 100.00% : 0.000095s : 3: match.inline ------[predicate.] 
0.000148 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 0.91% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 8: predicate.addn_zero_filter 0.84% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.53% : 0.000004s : 14: predicate.arithmetic_simplify 0.88% : 0.000001s : 8: predicate.cast_eliminate 0.74% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.68% : 0.000001s : 6: predicate.depend_value_elim 0.90% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.97% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.80% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.24% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.84% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.41% : 0.000009s : 37: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 6: predicate.less_batch_normalization 1.51% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 22: predicate.load_eliminater 1.08% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.65% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.67% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 8: predicate.minmaximum_grad 1.19% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.60% : 0.000002s : 11: predicate.partial_defer_inline 1.32% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.69% : 0.000001s : 6: predicate.reduce_all_const_elim 1.26% : 0.000002s : 8: predicate.reduce_eliminate 2.31% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 6: predicate.remove_not_recompute_node 1.32% : 0.000002s : 14: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.86% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.87% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.82% : 0.000001s : 6: predicate.special_op_eliminate 0.90% : 0.000001s : 6: predicate.specialize_transform 1.09% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.91% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.88% : 0.000007s : 38: predicate.switch_simplify 0.88% : 
0.000001s : 8: predicate.tile_eliminate 0.90% : 0.000001s : 8: predicate.transpose_eliminate 1.53% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.19% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.72% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000295 7 37.46% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.54% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068132 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.62% : 0.003150s : 1: add_attr 4.61% : 0.003141s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000068s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.78% : 0.000531s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000020s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.64% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.75% : 0.000510s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.25% : 0.000852s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.14% : 0.002140s : 1: opt_a 0.15% : 0.000102s : 1: opt_after_cconv 0.69% : 0.000471s : 1: opt_after_jit_grad 0.29% : 0.000198s : 1: opt_b 6.01% : 0.004095s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.04% : 0.000024s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.04% : 0.000024s : 1: remove_dup_value 0.37% : 0.000251s : 1: renormalize.infer 0.30% : 0.000204s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.08% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000075s : 1: symbol_engine_optimizer 65.85% : 0.044867s : 1: task_emit 0.11% : 0.000074s : 1: 
tuple_transform 8.80% : 0.005995s : 1: type_inference 0.09% : 0.000062s : 1: validate TotalTime = 0.0782709, [24] [bootstrap]: 0.00047623 [type_inference]: 0.0120266 [event_method]: 4.613e-05 [auto_monad]: 0.0001322 [graph_reusing]: 9.27001e-06 [inline]: 2.69001e-06 [add_attr]: 0.00315642, [1] [add_attr_with_inline]: 0.00314772, [1] [Cycle 1]: 7.224e-05, [2] [tag_attr]: 3.126e-05 [meta_addattr_fg_expand]: 1.079e-05 [parallel-infer-symbol]: 3.10002e-06 [pre_auto_parallel]: 4.873e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.91998e-06 [optimize]: 0.016834, [53] [py_interpret_to_execute]: 3.851e-05 [rewriter_before_opt_a]: 0.00014603 [opt_a]: 0.0145425, [3] [Cycle 1]: 0.0110836, [45] [expand_dump_flag]: 4.05998e-06 [switch_simplify]: 7.4e-05 [loop_unroll]: 6.023e-05 [a_1]: 0.00136421 [with_stream_mark]: 2.471e-05 [recompute_prepare]: 2.268e-05 [updatestate_depend_eliminate]: 8.22e-06 [updatestate_assign_eliminate]: 7.6e-06 [updatestate_loads_eliminate]: 6.98e-06 [parameter_eliminate]: 2.61999e-06 [a_2]: 0.00024673 [accelerated_algorithm]: 3.203e-05 [shard]: 1.96e-06 [meta_shard_fg_expand]: 3.49001e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.674e-05 [auto_parallel]: 1.036e-05 [parallel]: 2.09e-05 [flash_sp]: 1.175e-05 [merge_comm]: 9.44998e-06 [allreduce_fusion]: 8.50001e-06 [matmul_add_comm_reduction]: 2.812e-05 [allreduce_slice_to_reducescatter]: 9.89996e-07 [virtual_shard_identity]: 1.714e-05 [virtual_dataset]: 1.54e-05 [get_grad_eliminate_]: 1.485e-05 [virtual_output]: 1.487e-05 [merge_forward]: 8.85999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 1.765e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.952e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 2.954e-05 [set_forward_comm_id_for_comm_node_pass]: 9.42001e-06 [meta_fg_expand]: 0.00153425 [flash_sp_send_recv_attached]: 4.65001e-06 [receive_attached]: 2.11e-06 [after_resolve]: 
6.463e-05 [a_after_grad]: 8.925e-05 [renormalize]: 0.0063266 [add_forward_monad_depend]: 1.023e-05 [auto_monad_grad]: 6.59001e-06 [auto_monad_eliminator]: 5.38e-05 [cse]: 0.00018943 [a_3]: 0.00035636 [Cycle 2]: 0.00275955, [45] [expand_dump_flag]: 2.02001e-06 [switch_simplify]: 4.699e-05 [loop_unroll]: 4.242e-05 [a_1]: 0.00134924 [with_stream_mark]: 1.308e-05 [recompute_prepare]: 9.86998e-06 [updatestate_depend_eliminate]: 4.48001e-06 [updatestate_assign_eliminate]: 3.80998e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 9.034e-05 [accelerated_algorithm]: 1.07e-05 [shard]: 1.32999e-06 [meta_shard_fg_expand]: 2.29001e-06 [shard_inline]: 6.94999e-06 [merge_send_recv]: 7.12002e-06 [auto_parallel]: 8.87e-06 [parallel]: 7.18998e-06 [flash_sp]: 3.86001e-06 [merge_comm]: 3.96001e-06 [allreduce_fusion]: 3.88001e-06 [matmul_add_comm_reduction]: 8.44998e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 7.92e-06 [virtual_dataset]: 6.47001e-06 [get_grad_eliminate_]: 6.51e-06 [virtual_output]: 6.04999e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.022e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.369e-05 [merge_recompute_call_nodes]: 1.20999e-06 [before_grad]: 1.182e-05 [set_forward_comm_id_for_comm_node_pass]: 4.27998e-06 [meta_fg_expand]: 5.85e-05 [flash_sp_send_recv_attached]: 1.30999e-06 [receive_attached]: 1.53002e-06 [after_resolve]: 1.191e-05 [a_after_grad]: 1.011e-05 [renormalize]: 0.00063607 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.60001e-06 [auto_monad_eliminator]: 1.201e-05 [cse]: 2.292e-05 [a_3]: 4.844e-05 [Cycle 3]: 0.00068198, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 8.3e-06 [loop_unroll]: 7.18998e-06 [a_1]: 0.00014784 [with_stream_mark]: 7.78999e-06 [recompute_prepare]: 6.89999e-06 [updatestate_depend_eliminate]: 3.83999e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 
2.55997e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 8.737e-05 [accelerated_algorithm]: 9.72001e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.35001e-06 [shard_inline]: 6.74999e-06 [merge_send_recv]: 5.38002e-06 [auto_parallel]: 5.92999e-06 [parallel]: 4.53999e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 6.11e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.98999e-06 [virtual_dataset]: 6.32001e-06 [get_grad_eliminate_]: 6.29001e-06 [virtual_output]: 6.09001e-06 [merge_forward]: 3.46001e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 7.06999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.268e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.082e-05 [set_forward_comm_id_for_comm_node_pass]: 3.93001e-06 [meta_fg_expand]: 2.49001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.85999e-06 [a_after_grad]: 9.40001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 7.15e-06 [cse]: 1.609e-05 [a_3]: 3.951e-05 [py_interpret_to_execute_after_opt_a]: 1.321e-05 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 4.215e-05 [convert_after_rewriter]: 7.80998e-06 [order_py_execute_after_rewriter]: 5.32001e-06 [mutable_eliminate]: 0.00062956 [opt_b]: 0.00022901, [1] [Cycle 1]: 0.00022202, [7] [b_1]: 0.00014303 [b_2]: 8.70001e-06 [updatestate_depend_eliminate]: 5.92001e-06 [updatestate_assign_eliminate]: 3.30003e-06 [updatestate_loads_eliminate]: 3.06001e-06 [renormalize]: 5.40022e-07 [cse]: 2.118e-05 [optimize_parallel_all_gather_comm]: 1.813e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.371e-05 [loop_unroll]: 0.0004327 [opt_after_cconv]: 0.00010892, [1] [Cycle 1]: 0.00010305, [7] [c_1]: 3.265e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.69999e-06 [updatestate_assign_eliminate]: 
3.07002e-06 [updatestate_loads_eliminate]: 2.86999e-06 [cse]: 2.08e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.627e-05 [tuple_transform]: 7.777e-05, [1] [Cycle 1]: 7.339e-05, [4] [d_1]: 4.553e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.31001e-06 [partial_unused_args_eliminate]: 2.02001e-06 [add_recomputation]: 5.137e-05 [cse_after_recomputation]: 2.507e-05, [1] [Cycle 1]: 2.035e-05, [1] [cse]: 1.489e-05 [environ_conv]: 8.70001e-06 [swap_dp_allreduce_reducescatter]: 6.06e-06 [bias_add_comm_swap]: 2.88e-06 [label_micro_interleaved_index]: 4.44002e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.63002e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.60997e-06 [reorder_send_recv_between_fp_bp]: 2.82002e-06 [comm_op_add_attrs]: 1.62999e-06 [add_comm_op_reuse_tag]: 1.07998e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.30001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.398e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 5.05999e-06 [overlap_recompute_and_grad_model_parallel]: 5.51998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.66e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 4.55001e-06 [overlap_grad_flash_sp]: 2.176e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.31998e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 8.65e-05, [1] [Cycle 1]: 8.215e-05, [6] [build]: 9.83998e-06 [elim_shapecalc]: 1.041e-05 [elim_not_effective]: 1.438e-05 [opt_reshape]: 7.5e-06 [fold_const_symbol]: 1.146e-05 
[renormalize]: 2.20025e-07 [detach_backward]: 2.31e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 2.082e-05 [get_jit_bprop_graph]: 1.66e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.0004764 [validate]: 4.392e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0447346 [execute]: 8.38999e-06 Sums bootstrap : 0.000476s : 0.65% type_inference : 0.012027s : 16.30% event_method : 0.000046s : 0.06% auto_monad : 0.000132s : 0.18% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.07% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.05% optimize.rewriter_before_opt_a : 0.000146s : 0.20% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000129s : 0.18% optimize.opt_a.loop_unroll : 0.000110s : 0.15% optimize.opt_a.a_1 : 0.002861s : 3.88% optimize.opt_a.with_stream_mark : 0.000046s : 0.06% optimize.opt_a.recompute_prepare : 0.000039s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000424s : 0.58% optimize.opt_a.accelerated_algorithm : 0.000052s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.04% optimize.opt_a.merge_send_recv : 0.000029s : 0.04% optimize.opt_a.auto_parallel : 0.000025s : 0.03% optimize.opt_a.parallel : 0.000033s : 0.04% optimize.opt_a.flash_sp : 0.000017s : 0.02% 
optimize.opt_a.merge_comm : 0.000017s : 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.06% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.04% optimize.opt_a.virtual_dataset : 0.000028s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.04% optimize.opt_a.virtual_output : 0.000027s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000052s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001595s : 2.16% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000085s : 0.12% optimize.opt_a.a_after_grad : 0.000109s : 0.15% optimize.opt_a.renormalize : 0.006963s : 9.44% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000073s : 0.10% optimize.opt_a.cse : 0.000228s : 0.31% optimize.opt_a.a_3 : 0.000444s : 0.60% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000042s : 0.06% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000630s : 0.85% optimize.opt_b.b_1 : 0.000143s : 0.19% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000433s : 0.59% optimize.opt_after_cconv.c_1 : 0.000033s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000046s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000002s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000022s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000476s : 0.65% validate : 0.000044s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044735s : 60.63% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000716 159 6.93% : 0.000050s : 7: substitution.arithmetic_simplify 0.33% : 0.000002s : 3: substitution.elim_not_effective 0.63% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 3: substitution.fold_const_symbol 0.86% : 0.000006s : 4: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.33% : 0.000002s : 2: substitution.incorporate_call_switch 57.85% : 0.000414s : 17: substitution.inline 2.40% : 0.000017s : 2: substitution.inline_without_move 1.55% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.26% : 0.000016s : 3: substitution.less_batch_normalization 1.64% : 0.000012s : 7: substitution.minmaximum_grad 0.91% : 0.000006s : 5: substitution.partial_eliminate 1.75% : 0.000013s : 15: substitution.remove_not_recompute_node 3.90% : 0.000028s : 10: substitution.replace_applicator 1.26% : 0.000009s : 10: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.08% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.96% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.20% : 0.000052s : 18: substitution.tuple_list_get_item_eliminator 2.08% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011947 2 87.38% : 0.010439s : 1: type_inference.infer 12.62% : 0.001507s : 1: type_inference.specialize ------[replace.] 0.000191 26 65.77% : 0.000126s : 17: replace.inline 34.23% : 0.000065s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000429 26 94.31% : 0.000405s : 17: match.inline 5.69% : 0.000024s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000680 4180 1.11% : 0.000008s : 52: predicate.accumulaten_eliminater 0.25% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.14% : 0.000008s : 52: predicate.addn_zero_filter 1.08% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 2.03% : 0.000014s : 73: predicate.arithmetic_simplify 1.14% : 0.000008s : 52: predicate.cast_eliminate 1.17% : 0.000008s : 50: predicate.check_bprop_eliminate 0.47% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.18% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.21% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.20% : 0.000008s : 56: predicate.environ_get_depend_swap 1.69% : 0.000011s : 77: predicate.environ_get_eliminate 1.17% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.83% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.48% : 0.000017s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.07% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000001s : 4: predicate.graph_param_transform 0.51% : 0.000004s : 21: predicate.incorporate_call 0.47% : 0.000003s : 21: predicate.incorporate_call_switch 5.88% : 0.000040s : 180: predicate.inline 1.47% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.66% : 0.000004s : 21: predicate.less_batch_normalization 1.55% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.63% : 0.000018s : 121: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.58% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.38% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.13% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 52: predicate.minmaximum_grad 0.31% : 0.000002s : 4: predicate.mutable_eliminate 0.15% : 0.000001s : 4: predicate.opt_reshape 0.10% : 0.000001s : 4: predicate.parallel_virtual_node 2.08% : 0.000014s : 78: predicate.partial_defer_inline 1.72% : 0.000012s : 65: predicate.partial_eliminate 1.11% : 0.000008s : 52: predicate.print_const_string_wrapper 0.50% : 0.000003s : 21: predicate.reduce_all_const_elim 1.37% : 0.000009s : 52: predicate.reduce_eliminate 2.62% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.93% : 0.000013s : 111: predicate.replace_applicator 0.69% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.12% : 0.000008s : 52: predicate.reshape_eliminate 1.12% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.25% : 0.000008s : 50: predicate.same_eliminate 0.35% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.56% : 0.000004s : 21: predicate.shard_identity_eliminate 0.24% : 0.000002s : 8: predicate.special_op_eliminate 0.62% : 0.000004s : 21: predicate.specialize_transform 1.22% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.96% : 0.000013s : 78: predicate.switch_defer_inline 3.04% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.22% : 0.000036s : 213: 
predicate.switch_simplify 1.12% : 0.000008s : 52: predicate.tile_eliminate 1.11% : 0.000008s : 52: predicate.transpose_eliminate 1.41% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000010s : 60: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000018s : 90: predicate.tuple_list_get_item_eliminator 1.55% : 0.000011s : 60: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.52% : 0.000010s : 69: predicate.tuple_to_list_eliminator_ 2.60% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.16% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.10% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000003s : 21: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001751 35 60.32% : 0.001056s : 14: func_graph_cloner_run.FuncGraphClonerGraph 39.68% : 0.000695s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.109794 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.88% : 0.003161s : 1: add_attr 2.87% : 0.003152s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.13% : 0.000140s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.47% : 0.000511s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.05% : 0.000053s : 1: event_method 0.01% : 0.000015s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.40% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000639s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 3.98% : 0.004372s : 117: opt.transform.opt_a 0.03% : 0.000031s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000123s : 28: opt.transform.opt_b 0.05% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000040s : 4: opt.transform.symbol_engine_opt 13.25% : 0.014546s : 1: opt_a 0.10% : 0.000112s : 1: opt_after_cconv 0.44% : 0.000486s : 1: opt_after_jit_grad 0.21% : 0.000232s : 1: opt_b 15.34% : 0.016839s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000054s : 1: pre_auto_parallel 0.04% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 4.90% : 0.005381s : 2: renormalize.infer 1.43% : 0.001567s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000047s : 1: rewriter_after_opt_a 0.14% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000089s : 1: symbol_engine_optimizer 40.77% : 0.044758s : 1: task_emit 0.07% : 0.000081s : 1: 
tuple_transform 10.97% : 0.012049s : 1: type_inference 0.06% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x0-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x1-pynative],max_mem:6.0M TotalTime = 0.0248741, [24] [bootstrap]: 0.00058052 [type_inference]: 0.00732119 [event_method]: 1.477e-05 [auto_monad]: 6.321e-05 [graph_reusing]: 6.08998e-06 [inline]: 2.78e-06 [add_attr]: 0.00415393, [1] [add_attr_with_inline]: 0.00413998, [1] [Cycle 1]: 6.025e-05, [2] [tag_attr]: 1.861e-05 [meta_addattr_fg_expand]: 4.50001e-06 [parallel-infer-symbol]: 3.8e-06 [pre_auto_parallel]: 2.996e-05 [insert-virtual-dataset]: 2.52001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.13998e-06 [pipeline_split]: 2.02999e-06 [optimize]: 0.00464275, [53] [py_interpret_to_execute]: 2.608e-05 [rewriter_before_opt_a]: 6.909e-05 [opt_a]: 0.00257001, [2] [Cycle 1]: 0.00193736, [45] [expand_dump_flag]: 3.02002e-06 [switch_simplify]: 3.551e-05 [loop_unroll]: 2.095e-05 [a_1]: 0.00047274 [with_stream_mark]: 1.748e-05 [recompute_prepare]: 8.61002e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.41001e-06 [updatestate_loads_eliminate]: 3.5e-06 [parameter_eliminate]: 2.47001e-06 [a_2]: 8.359e-05 [accelerated_algorithm]: 7.54002e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 6.48e-06 [merge_send_recv]: 9.05999e-06 [auto_parallel]: 6.73e-06 [parallel]: 2.968e-05 [flash_sp]: 8.88002e-06 [merge_comm]: 4.28001e-06 [allreduce_fusion]: 3.65003e-06 [matmul_add_comm_reduction]: 1.01e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 8.55001e-06 [virtual_dataset]: 6.43e-06 [get_grad_eliminate_]: 5.65001e-06 
[virtual_output]: 5.92001e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.069e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.157e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 1.029e-05 [set_forward_comm_id_for_comm_node_pass]: 4.47e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 2.55002e-06 [receive_attached]: 2.54999e-06 [after_resolve]: 1.03e-05 [a_after_grad]: 9.51003e-06 [renormalize]: 0.0007426 [add_forward_monad_depend]: 1.125e-05 [auto_monad_grad]: 2.49001e-06 [auto_monad_eliminator]: 1.613e-05 [cse]: 3.056e-05 [a_3]: 4.619e-05 [Cycle 2]: 0.00062157, [45] [expand_dump_flag]: 1.47999e-06 [switch_simplify]: 7.03e-06 [loop_unroll]: 5.73002e-06 [a_1]: 0.00011688 [with_stream_mark]: 1.174e-05 [recompute_prepare]: 6.39999e-06 [updatestate_depend_eliminate]: 3.01001e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 7.261e-05 [accelerated_algorithm]: 6.16e-06 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.30001e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 5.35001e-06 [auto_parallel]: 6.58998e-06 [parallel]: 5.35999e-06 [flash_sp]: 3.53e-06 [merge_comm]: 3.31001e-06 [allreduce_fusion]: 2.89999e-06 [matmul_add_comm_reduction]: 6.49999e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 6.29999e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.19e-06 [merge_forward]: 3.40998e-06 [cell_reuse_recompute_pass]: 1.57001e-06 [offload_activation]: 7.45e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.021e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.80999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 1.97999e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.44e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 7.78001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.09e-06 
[auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 7.73001e-06 [cse]: 1.53e-05 [a_3]: 3.293e-05 [py_interpret_to_execute_after_opt_a]: 1.049e-05 [slice_cell_reuse_recomputed_activation]: 2.27001e-06 [rewriter_after_opt_a]: 3.365e-05 [convert_after_rewriter]: 6.69001e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00056189 [opt_b]: 0.00019283, [1] [Cycle 1]: 0.00018531, [7] [b_1]: 0.00011202 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 6.19001e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 6.89994e-07 [cse]: 1.866e-05 [optimize_parallel_all_gather_comm]: 1.772e-05 [overlap_param_gather]: 2.14e-06 [cconv]: 2.562e-05 [loop_unroll]: 0.00044633 [opt_after_cconv]: 9.949e-05, [1] [Cycle 1]: 9.256e-05, [7] [c_1]: 2.616e-05 [parameter_eliminate]: 3.4e-06 [updatestate_depend_eliminate]: 6.18998e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.675e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.614e-05 [tuple_transform]: 7.012e-05, [1] [Cycle 1]: 6.546e-05, [4] [d_1]: 3.875e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.44001e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 5.481e-05 [cse_after_recomputation]: 2.141e-05, [1] [Cycle 1]: 1.665e-05, [1] [cse]: 1.108e-05 [environ_conv]: 9.57999e-06 [swap_dp_allreduce_reducescatter]: 5.17999e-06 [bias_add_comm_swap]: 3.11001e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 3.26999e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.55999e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 
1.28002e-06 [interleave_parallel_branches]: 1.45001e-06 [overlap_opt_shard_in_pipeline]: 1.30999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.93997e-06 [control_data_broadcast_order]: 1.319e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.82998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 4.30999e-06 [overlap_grad_flash_sp]: 1.954e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 7.421e-05, [1] [Cycle 1]: 6.949e-05, [6] [build]: 2.83e-06 [elim_shapecalc]: 1.04e-05 [elim_not_effective]: 1.273e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 9.69e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.26e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.722e-05 [get_jit_bprop_graph]: 1.94e-06 [rewriter_after_jit_bprop_graph]: 0.00014885 [opt_after_jit_grad]: 0.00050327 [validate]: 3.903e-05 [backend_pass]: 1.30999e-06 [task_emit]: 0.00706764 [execute]: 9.63002e-06 Sums bootstrap : 0.000581s : 2.95% type_inference : 0.007321s : 37.26% event_method : 0.000015s : 0.08% auto_monad : 0.000063s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000030s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000026s : 0.13% optimize.rewriter_before_opt_a : 0.000069s : 0.35% optimize.opt_a.expand_dump_flag : 0.000005s : 0.02% 
optimize.opt_a.switch_simplify : 0.000043s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000590s : 3.00% optimize.opt_a.with_stream_mark : 0.000029s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000156s : 0.79% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000035s : 0.18% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000018s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.10% 
optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000743s : 3.78% optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.06% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.12% optimize.opt_a.cse : 0.000046s : 0.23% optimize.opt_a.a_3 : 0.000079s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.03% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000562s : 2.86% optimize.opt_b.b_1 : 0.000112s : 0.57% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.13% optimize.loop_unroll : 0.000446s : 2.27% optimize.opt_after_cconv.c_1 : 0.000026s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000055s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000010s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000149s : 0.76% opt_after_jit_grad : 0.000503s : 2.56% validate : 0.000039s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.007068s : 35.97% execute : 0.000010s : 0.05% Time group info: ------[substitution.] 0.000192 26 18.21% : 0.000035s : 5: substitution.arithmetic_simplify 1.17% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 2.88% : 0.000006s : 3: substitution.graph_param_transform 66.24% : 0.000127s : 3: substitution.inline 1.85% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.33% : 0.000004s : 4: substitution.remove_not_recompute_node 2.01% : 0.000004s : 2: substitution.replace_old_param 4.58% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007258 2 90.62% : 0.006577s : 1: type_inference.infer 9.38% : 0.000681s : 1: type_inference.specialize ------[replace.] 0.000040 4 79.52% : 0.000032s : 3: replace.inline 20.48% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000132 4 93.95% : 0.000124s : 3: match.inline 6.05% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 0.93% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.88% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 15: predicate.arithmetic_simplify 1.01% : 0.000002s : 9: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.56% : 0.000001s : 6: predicate.depend_value_elim 0.92% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_depend_swap 1.73% : 0.000003s : 18: predicate.environ_get_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.32% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.67% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.39% : 0.000010s : 40: predicate.inline 0.83% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 6: predicate.less_batch_normalization 1.66% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.64% : 0.000004s : 25: predicate.load_eliminater 1.02% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.20% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.59% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.70% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.59% : 0.000003s : 13: predicate.partial_defer_inline 1.42% : 0.000002s : 13: predicate.partial_eliminate 1.05% : 0.000002s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.32% : 0.000002s : 9: predicate.reduce_eliminate 2.42% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.34% : 0.000002s : 16: predicate.replace_applicator 0.65% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 0.95% : 0.000002s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.86% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 15: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000464 8 47.38% : 0.000220s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.62% : 0.000244s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.035529 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.71% : 0.004159s : 1: add_attr 11.66% : 0.004144s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000059s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000069s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000008s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.73% : 0.000614s : 1: bootstrap 0.08% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000013s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.05% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.28% : 0.000455s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000572s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000016s : 1: opt.transform.mutable_eliminate 2.74% : 0.000975s : 78: opt.transform.opt_a 0.07% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000091s : 28: opt.transform.opt_b 0.12% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.24% : 0.002574s : 1: opt_a 0.29% : 0.000103s : 1: opt_after_cconv 1.45% : 0.000514s : 1: opt_after_jit_grad 0.55% : 0.000196s : 1: opt_b 13.08% : 0.004647s : 1: optimize 0.06% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000034s : 1: pre_auto_parallel 0.09% : 0.000030s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 1.21% : 0.000430s : 1: renormalize.infer 0.86% : 0.000305s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000155s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000038s : 1: rewriter_after_opt_a 0.21% : 0.000073s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000077s : 1: symbol_engine_optimizer 19.95% : 0.007089s : 1: task_emit 0.21% : 0.000073s : 1: 
tuple_transform 20.66% : 0.007341s : 1: type_inference 0.25% : 0.000088s : 1: validate TotalTime = 0.0221704, [24] [bootstrap]: 0.00042608 [type_inference]: 0.00631588 [event_method]: 1.442e-05 [auto_monad]: 6.347e-05 [graph_reusing]: 6.41998e-06 [inline]: 3.23e-06 [add_attr]: 0.00341005, [1] [add_attr_with_inline]: 0.00340029, [1] [Cycle 1]: 5.957e-05, [2] [tag_attr]: 1.465e-05 [meta_addattr_fg_expand]: 4.17e-06 [parallel-infer-symbol]: 3.48999e-06 [pre_auto_parallel]: 2.817e-05 [insert-virtual-dataset]: 2.79001e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.89999e-06 [optimize]: 0.00438358, [53] [py_interpret_to_execute]: 3.281e-05 [rewriter_before_opt_a]: 5.79e-05 [opt_a]: 0.0022763, [2] [Cycle 1]: 0.00161725, [45] [expand_dump_flag]: 3.26999e-06 [switch_simplify]: 3.038e-05 [loop_unroll]: 1.73e-05 [a_1]: 0.00037481 [with_stream_mark]: 1.637e-05 [recompute_prepare]: 8.57998e-06 [updatestate_depend_eliminate]: 4.23999e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 3.56999e-06 [parameter_eliminate]: 1.93002e-06 [a_2]: 9.143e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 2.12999e-06 [shard_inline]: 6.19001e-06 [merge_send_recv]: 9.12001e-06 [auto_parallel]: 6.57002e-06 [parallel]: 1.967e-05 [flash_sp]: 8.62e-06 [merge_comm]: 4.28001e-06 [allreduce_fusion]: 3.89002e-06 [matmul_add_comm_reduction]: 1.069e-05 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 8.49998e-06 [virtual_dataset]: 7.00998e-06 [get_grad_eliminate_]: 6.26998e-06 [virtual_output]: 6.56999e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 1.058e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.401e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 1.146e-05 [set_forward_comm_id_for_comm_node_pass]: 4.13001e-06 [meta_fg_expand]: 2.88e-06 [flash_sp_send_recv_attached]: 3.04999e-06 [receive_attached]: 2.61e-06 
[after_resolve]: 1.109e-05 [a_after_grad]: 9.37999e-06 [renormalize]: 0.00052925 [add_forward_monad_depend]: 4.63001e-06 [auto_monad_grad]: 2.17001e-06 [auto_monad_eliminator]: 1.393e-05 [cse]: 3.129e-05 [a_3]: 4.667e-05 [Cycle 2]: 0.00064806, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 7.77998e-06 [loop_unroll]: 6.24999e-06 [a_1]: 0.0001254 [with_stream_mark]: 1.073e-05 [recompute_prepare]: 5.74999e-06 [updatestate_depend_eliminate]: 3.04001e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.92002e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 7.427e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 6.28e-06 [merge_send_recv]: 5.44e-06 [auto_parallel]: 6.10002e-06 [parallel]: 5.14e-06 [flash_sp]: 4.07e-06 [merge_comm]: 3.33e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 5.80002e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.91999e-06 [virtual_dataset]: 5.89999e-06 [get_grad_eliminate_]: 5.72001e-06 [virtual_output]: 5.53002e-06 [merge_forward]: 3.01001e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 7.43999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.106e-05 [merge_recompute_call_nodes]: 9.29984e-07 [before_grad]: 9.69999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.28999e-06 [meta_fg_expand]: 1.84998e-06 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.44e-06 [after_resolve]: 9.22999e-06 [a_after_grad]: 8.37e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 7.09001e-06 [cse]: 1.448e-05 [a_3]: 3.651e-05 [py_interpret_to_execute_after_opt_a]: 8.27e-06 [slice_cell_reuse_recomputed_activation]: 2.36998e-06 [rewriter_after_opt_a]: 3.68e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00056682 [opt_b]: 0.00019365, [1] [Cycle 1]: 0.00018709, [7] 
[b_1]: 0.00011355 [b_2]: 7.78001e-06 [updatestate_depend_eliminate]: 5.75001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.17001e-06 [renormalize]: 5.00004e-07 [cse]: 1.934e-05 [optimize_parallel_all_gather_comm]: 1.619e-05 [overlap_param_gather]: 2.30002e-06 [cconv]: 2.568e-05 [loop_unroll]: 0.00046266 [opt_after_cconv]: 0.00010082, [1] [Cycle 1]: 9.439e-05, [7] [c_1]: 2.639e-05 [parameter_eliminate]: 2.74999e-06 [updatestate_depend_eliminate]: 5.84e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.59999e-06 [cse]: 1.93e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 1.571e-05 [tuple_transform]: 7.132e-05, [1] [Cycle 1]: 6.654e-05, [4] [d_1]: 3.888e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 2.70025e-07 [switch_simplify]: 6.60002e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 6.381e-05 [cse_after_recomputation]: 2.346e-05, [1] [Cycle 1]: 1.873e-05, [1] [cse]: 1.288e-05 [environ_conv]: 5.65001e-06 [swap_dp_allreduce_reducescatter]: 5.54998e-06 [bias_add_comm_swap]: 2.49999e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 3.16001e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.57001e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.33998e-06 [reorder_send_recv_between_fp_bp]: 3.33e-06 [comm_op_add_attrs]: 1.06997e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.45999e-06 [interleave_parallel_branches]: 1.20999e-06 [overlap_opt_shard_in_pipeline]: 1.67999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.09e-06 [control_data_broadcast_order]: 1.476e-05 [grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 4.37e-06 [overlap_recompute_and_grad_model_parallel]: 5.19e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.59e-06 [overlap_recompute_comm]: 3.08998e-06 [overlap_grad_ring_attention]: 4.23999e-06 [overlap_grad_flash_sp]: 2.159e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.19998e-06 [symbol_engine_optimizer]: 7.566e-05, [1] [Cycle 1]: 7.093e-05, [6] [build]: 3.08e-06 [elim_shapecalc]: 9.71e-06 [elim_not_effective]: 1.292e-05 [opt_reshape]: 6.55997e-06 [fold_const_symbol]: 9.99999e-06 [renormalize]: 3.19997e-07 [detach_backward]: 2.14e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.684e-05 [get_jit_bprop_graph]: 1.94999e-06 [rewriter_after_jit_bprop_graph]: 3.83001e-06 [opt_after_jit_grad]: 0.00049588 [validate]: 3.971e-05 [backend_pass]: 1.35999e-06 [task_emit]: 0.00670514 [execute]: 8.33999e-06 Sums bootstrap : 0.000426s : 2.41% type_inference : 0.006316s : 35.69% event_method : 0.000014s : 0.08% auto_monad : 0.000063s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000033s : 0.19% optimize.rewriter_before_opt_a : 0.000058s : 0.33% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000024s : 0.13% optimize.opt_a.a_1 : 0.000500s : 2.83% optimize.opt_a.with_stream_mark : 0.000027s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000166s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000015s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000025s : 0.14% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000013s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.07% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000021s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000018s : 0.10% optimize.opt_a.renormalize : 0.000529s : 2.99% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000046s : 0.26% optimize.opt_a.a_3 : 0.000083s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000567s : 3.20% optimize.opt_b.b_1 : 0.000114s : 0.64% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.15% optimize.loop_unroll : 0.000463s : 2.61% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000064s : 0.36% optimize.cse_after_recomputation.cse : 0.000013s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000022s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000496s : 2.80% validate : 0.000040s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006705s : 37.89% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000153 24 21.15% : 0.000032s : 4: substitution.arithmetic_simplify 1.33% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000002s : 2: substitution.fold_const_symbol 3.94% : 0.000006s : 3: substitution.graph_param_transform 64.26% : 0.000098s : 3: substitution.inline 2.31% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000005s : 4: substitution.remove_not_recompute_node 2.46% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006259 2 89.84% : 0.005623s : 1: type_inference.infer 10.16% : 0.000636s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000097 3 100.00% : 0.000097s : 3: match.inline ------[predicate.] 
0.000156 815 0.88% : 0.000001s : 8: predicate.accumulaten_eliminater 0.88% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.95% : 0.000001s : 8: predicate.addn_zero_filter 0.83% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.39% : 0.000004s : 14: predicate.arithmetic_simplify 0.95% : 0.000001s : 8: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.67% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.54% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 11: predicate.environ_get_depend_swap 1.68% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.12% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.06% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.81% : 0.000001s : 6: predicate.get_grad_eliminate 0.27% : 0.000000s : 3: predicate.graph_param_transform 0.76% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.50% : 0.000010s : 37: predicate.inline 1.02% : 0.000002s : 6: predicate.inline_without_move 0.45% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.98% : 0.000002s : 6: predicate.less_batch_normalization 1.55% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 22: predicate.load_eliminater 1.09% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.91% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.20% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.43% : 0.000001s : 3: predicate.parallel_virtual_node 1.39% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.68% : 0.000001s : 6: predicate.reduce_all_const_elim 1.11% : 0.000002s : 8: predicate.reduce_eliminate 2.33% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 6: predicate.remove_not_recompute_node 1.25% : 0.000002s : 14: predicate.replace_applicator 0.79% : 0.000001s : 6: predicate.replace_old_param 0.26% : 0.000000s : 3: predicate.reset_defer_inline 1.14% : 0.000002s : 8: predicate.reshape_eliminate 0.72% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 3: predicate.row_tensor_eliminate 0.93% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 6: predicate.shard_identity_eliminate 0.83% : 0.000001s : 6: predicate.special_op_eliminate 0.90% : 0.000001s : 6: predicate.specialize_transform 1.02% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.84% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.77% : 0.000007s : 38: predicate.switch_simplify 0.86% : 
0.000001s : 8: predicate.tile_eliminate 0.88% : 0.000001s : 8: predicate.transpose_eliminate 1.50% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.99% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 3: predicate.value_based_eliminate 0.90% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000320 7 34.05% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.95% : 0.000211s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031539 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.83% : 0.003415s : 1: add_attr 10.79% : 0.003404s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.22% : 0.000068s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000069s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.44% : 0.000455s : 1: bootstrap 0.09% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000018s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000027s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000022s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000011s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000007s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000472s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.83% : 0.000577s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 2.85% : 0.000898s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000035s : 4: opt.transform.symbol_engine_opt 7.23% : 0.002279s : 1: opt_a 0.33% : 0.000104s : 1: opt_after_cconv 1.60% : 0.000506s : 1: opt_after_jit_grad 0.62% : 0.000197s : 1: opt_b 13.91% : 0.004388s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.12% : 0.000038s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.91% : 0.000288s : 1: renormalize.infer 0.74% : 0.000235s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000041s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000078s : 1: symbol_engine_optimizer 21.32% : 0.006723s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 20.10% : 0.006338s : 1: type_inference 0.24% : 0.000076s : 1: validate TotalTime = 0.0224711, [24] [bootstrap]: 0.00043862 [type_inference]: 0.00604405 [event_method]: 1.453e-05 [auto_monad]: 6.21e-05 [graph_reusing]: 5.99e-06 [inline]: 3.26001e-06 [add_attr]: 0.00345573, [1] [add_attr_with_inline]: 0.00344459, [1] [Cycle 1]: 5.963e-05, [2] [tag_attr]: 1.794e-05 [meta_addattr_fg_expand]: 4.42e-06 [parallel-infer-symbol]: 3.8e-06 [pre_auto_parallel]: 3.034e-05 [insert-virtual-dataset]: 2.74001e-06 [parallel-infer-symbol-second]: 9.30013e-07 [dataset_repeat_opt]: 2.34001e-06 [pipeline_split]: 2.10002e-06 [optimize]: 0.0047942, [53] [py_interpret_to_execute]: 2.443e-05 [rewriter_before_opt_a]: 6.983e-05 [opt_a]: 0.00256782, [2] [Cycle 1]: 0.00192038, [45] [expand_dump_flag]: 3.06001e-06 [switch_simplify]: 3.411e-05 [loop_unroll]: 2.058e-05 [a_1]: 0.00046916 [with_stream_mark]: 1.695e-05 [recompute_prepare]: 7.71001e-06 [updatestate_depend_eliminate]: 4.17998e-06 [updatestate_assign_eliminate]: 3.80998e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 8.222e-05 [accelerated_algorithm]: 6.71e-06 [shard]: 2.41998e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 5.92999e-06 [merge_send_recv]: 8.59e-06 [auto_parallel]: 6.59999e-06 [parallel]: 1.931e-05 [flash_sp]: 8.60999e-06 [merge_comm]: 3.94002e-06 [allreduce_fusion]: 4.17998e-06 [matmul_add_comm_reduction]: 1.018e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.5e-06 [virtual_dataset]: 6.33e-06 [get_grad_eliminate_]: 5.86e-06 [virtual_output]: 6.31998e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.004e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.22e-05 [merge_recompute_call_nodes]: 1.99999e-06 [before_grad]: 1.183e-05 [set_forward_comm_id_for_comm_node_pass]: 4.09002e-06 [meta_fg_expand]: 2.81999e-06 [flash_sp_send_recv_attached]: 2.68e-06 [receive_attached]: 2.07001e-06 
[after_resolve]: 1.044e-05 [a_after_grad]: 5.87e-05 [renormalize]: 0.00069918 [add_forward_monad_depend]: 5.92999e-06 [auto_monad_grad]: 2.53998e-06 [auto_monad_eliminator]: 1.464e-05 [cse]: 3.431e-05 [a_3]: 4.58e-05 [Cycle 2]: 0.00063548, [45] [expand_dump_flag]: 1.25999e-06 [switch_simplify]: 6.87002e-06 [loop_unroll]: 5.77001e-06 [a_1]: 0.00012052 [with_stream_mark]: 1.215e-05 [recompute_prepare]: 6.46e-06 [updatestate_depend_eliminate]: 3.24001e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.85002e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 7.34e-05 [accelerated_algorithm]: 6.16998e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.25001e-06 [shard_inline]: 6.01998e-06 [merge_send_recv]: 5.05001e-06 [auto_parallel]: 5.81998e-06 [parallel]: 5.69e-06 [flash_sp]: 3.64002e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 3.17002e-06 [matmul_add_comm_reduction]: 6.07999e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 6.77002e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.36002e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.16999e-06 [cell_reuse_recompute_pass]: 2.27999e-06 [offload_activation]: 7.82998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.045e-05 [merge_recompute_call_nodes]: 1.06002e-06 [before_grad]: 9.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.23001e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.45999e-06 [after_resolve]: 8.41002e-06 [a_after_grad]: 8.47e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.59e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.35997e-06 [cse]: 1.624e-05 [a_3]: 3.363e-05 [py_interpret_to_execute_after_opt_a]: 1.008e-05 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 3.764e-05 [convert_after_rewriter]: 7.31001e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00064113 [opt_b]: 0.00020114, [1] [Cycle 1]: 0.00019383, 
[7] [b_1]: 0.00011701 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 6.04001e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.92002e-06 [renormalize]: 9.89996e-07 [cse]: 2.035e-05 [optimize_parallel_all_gather_comm]: 1.889e-05 [overlap_param_gather]: 2.50002e-06 [cconv]: 2.859e-05 [loop_unroll]: 0.00049752 [opt_after_cconv]: 0.00010286, [1] [Cycle 1]: 9.575e-05, [7] [c_1]: 2.677e-05 [parameter_eliminate]: 3.11001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.889e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.569e-05 [tuple_transform]: 7.455e-05, [1] [Cycle 1]: 6.961e-05, [4] [d_1]: 4.055e-05 [none_parameter_eliminate]: 1.92001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 7.38999e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 5.008e-05 [cse_after_recomputation]: 2.144e-05, [1] [Cycle 1]: 1.63e-05, [1] [cse]: 1.107e-05 [environ_conv]: 5.87999e-06 [swap_dp_allreduce_reducescatter]: 5.52001e-06 [bias_add_comm_swap]: 3.36001e-06 [label_micro_interleaved_index]: 5.00999e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.67001e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.96001e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 9.5999e-07 [remove_cast_before_assign_add]: 1.44e-06 [full_micro_interleaved_order_control]: 2.49999e-06 [reorder_send_recv_between_fp_bp]: 3.21999e-06 [comm_op_add_attrs]: 1.09998e-06 [add_comm_op_reuse_tag]: 1.07998e-06 [interleave_split_concat_branches]: 1.35999e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.50999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.07999e-06 [control_data_broadcast_order]: 1.266e-05 [grouped_pairwise_exchange_alltoall]: 1.56998e-06 [offloading_packed_experts]: 5.04e-06 [overlap_recompute_and_grad_model_parallel]: 5.09e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.41002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.81999e-06 [overlap_grad_ring_attention]: 4.46002e-06 [overlap_grad_flash_sp]: 2.081e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.16998e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 2.00002e-06 [symbol_engine_optimizer]: 7.562e-05, [1] [Cycle 1]: 7.078e-05, [6] [build]: 3.52997e-06 [elim_shapecalc]: 9.49999e-06 [elim_not_effective]: 1.242e-05 [opt_reshape]: 6.87002e-06 [fold_const_symbol]: 9.81e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.97001e-06 [auto_monad_reorder]: 1.665e-05 [get_jit_bprop_graph]: 2.69999e-06 [rewriter_after_jit_bprop_graph]: 4.58999e-06 [opt_after_jit_grad]: 0.00057718 [validate]: 4.492e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.00671352 [execute]: 1.022e-05 Sums bootstrap : 0.000439s : 2.44% type_inference : 0.006044s : 33.67% event_method : 0.000015s : 0.08% auto_monad : 0.000062s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.14% optimize.rewriter_before_opt_a : 0.000070s : 0.39% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000590s : 3.28% optimize.opt_a.with_stream_mark : 0.000029s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000156s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000025s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000021s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000067s : 0.37% optimize.opt_a.renormalize : 0.000699s : 3.90% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000051s : 0.28% optimize.opt_a.a_3 : 0.000079s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000038s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000641s : 3.57% optimize.opt_b.b_1 : 0.000117s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.01% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.11% optimize.overlap_param_gather : 0.000003s : 0.01% optimize.cconv : 0.000029s : 0.16% optimize.loop_unroll : 0.000498s : 2.77% optimize.opt_after_cconv.c_1 : 0.000027s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000021s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000002s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000003s : 0.02% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000577s : 3.22% validate : 0.000045s : 0.25% backend_pass : 0.000001s : 0.01% task_emit : 0.006714s : 37.40% execute : 0.000010s : 0.06%
Time group info:
------[substitution.] 0.000193 26
18.51% : 0.000036s : 5: substitution.arithmetic_simplify
1.07% : 0.000002s : 2: substitution.elim_not_effective
0.93% : 0.000002s : 2: substitution.fold_const_symbol
3.31% : 0.000006s : 3: substitution.graph_param_transform
64.91% : 0.000126s : 3: substitution.inline
1.86% : 0.000004s : 4: substitution.j_node_and_user_rematch
2.51% : 0.000005s : 4: substitution.remove_not_recompute_node
1.90% : 0.000004s : 2: substitution.replace_old_param
5.00% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005988 2
89.48% : 0.005358s : 1: type_inference.infer
10.52% : 0.000630s : 1: type_inference.specialize
------[replace.] 0.000041 4
79.66% : 0.000032s : 3: replace.inline
20.34% : 0.000008s : 1: replace.tuple_list_get_item_eliminator
------[match.] 0.000132 4
93.25% : 0.000123s : 3: match.inline
6.75% : 0.000009s : 1: match.tuple_list_get_item_eliminator
------[predicate.]
0.000165 883 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000002s : 9: predicate.addn_zero_filter 0.87% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.13% : 0.000004s : 15: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.54% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.58% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.95% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.47% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.17% : 0.000002s : 12: predicate.environ_get_depend_swap 1.75% : 0.000003s : 18: predicate.environ_get_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.13% : 0.000004s : 13: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.79% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.64% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.42% : 0.000011s : 40: predicate.inline 0.84% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.70% : 0.000004s : 25: predicate.load_eliminater 1.21% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.07% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.90% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 9: predicate.minmaximum_grad 1.53% : 0.000003s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.52% : 0.000003s : 13: predicate.partial_defer_inline 1.38% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.24% : 0.000002s : 9: predicate.reduce_eliminate 2.40% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.44% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 16: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 1.10% : 0.000002s : 9: predicate.reshape_eliminate 0.69% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.82% : 0.000001s : 6: predicate.same_eliminate 0.41% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.92% : 0.000002s : 6: predicate.shard_identity_eliminate 1.15% : 0.000002s : 6: predicate.special_op_eliminate 0.75% : 0.000001s : 6: predicate.specialize_transform 0.92% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 13: predicate.switch_defer_inline 1.89% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.81% : 0.000008s : 43: predicate.switch_simplify 0.86% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.01% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000404 8
42.64% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph
57.36% : 0.000232s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.032596 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.62% : 0.003461s : 1: add_attr 10.58% : 0.003449s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000067s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.44% : 0.000470s : 1: bootstrap 0.10% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.05% : 0.000016s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000005s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.56% : 0.000507s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 2.00% : 0.000652s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 3.14% : 0.001024s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000094s : 28: opt.transform.opt_b 0.14% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.89% : 0.002571s : 1: opt_a 0.33% : 0.000106s : 1: opt_after_cconv 1.80% : 0.000588s : 1: opt_after_jit_grad 0.63% : 0.000205s : 1: opt_b 14.72% : 0.004800s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000035s : 1: pre_auto_parallel 0.09% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 1.14% : 0.000370s : 1: renormalize.infer 0.98% : 0.000321s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000042s : 1: rewriter_after_opt_a 0.23% : 0.000074s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000078s : 1: symbol_engine_optimizer 20.66% : 0.006733s : 1: task_emit 0.24% : 0.000078s : 1: 
tuple_transform 18.60% : 0.006064s : 1: type_inference 0.26% : 0.000084s : 1: validate
TotalTime = 0.0434829, [24] [bootstrap]: 0.00055064 [type_inference]: 0.0123946 [event_method]: 5.077e-05 [auto_monad]: 0.00014359 [graph_reusing]: 9.14998e-06 [inline]: 2.60002e-06 [add_attr]: 0.00355211, [1] [add_attr_with_inline]: 0.00354162, [1] [Cycle 1]: 8.535e-05, [2] [tag_attr]: 3.855e-05 [meta_addattr_fg_expand]: 1.019e-05 [parallel-infer-symbol]: 3.42002e-06 [pre_auto_parallel]: 5.597e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 9.60019e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 2.02001e-06 [optimize]: 0.0190352, [53] [py_interpret_to_execute]: 4.491e-05 [rewriter_before_opt_a]: 0.00016742 [opt_a]: 0.0165344, [3] [Cycle 1]: 0.0127878, [45] [expand_dump_flag]: 5.25999e-06 [switch_simplify]: 7.796e-05 [loop_unroll]: 6.464e-05 [a_1]: 0.00154333 [with_stream_mark]: 3.089e-05 [recompute_prepare]: 2.694e-05 [updatestate_depend_eliminate]: 9.02999e-06 [updatestate_assign_eliminate]: 7.5e-06 [updatestate_loads_eliminate]: 7.09001e-06 [parameter_eliminate]: 3.04999e-06 [a_2]: 0.00025047 [accelerated_algorithm]: 3.547e-05 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 4.54002e-06 [shard_inline]: 1.649e-05 [merge_send_recv]: 1.85e-05 [auto_parallel]: 1.23e-05 [parallel]: 2.219e-05 [flash_sp]: 1.304e-05 [merge_comm]: 1.045e-05 [allreduce_fusion]: 8.64998e-06 [matmul_add_comm_reduction]: 3.237e-05 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 1.883e-05 [virtual_dataset]: 1.552e-05 [get_grad_eliminate_]: 1.499e-05 [virtual_output]: 1.538e-05 [merge_forward]: 1.032e-05 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 1.952e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.189e-05 [merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 2.953e-05 [set_forward_comm_id_for_comm_node_pass]: 1.046e-05 [meta_fg_expand]: 0.00177819 [flash_sp_send_recv_attached]: 4.42e-06 [receive_attached]: 2.68e-06 [after_resolve]:
7.095e-05 [a_after_grad]: 9.317e-05 [renormalize]: 0.00745434 [add_forward_monad_depend]: 1.267e-05 [auto_monad_grad]: 6.76e-06 [auto_monad_eliminator]: 5.526e-05 [cse]: 0.00025019 [a_3]: 0.00035074 [Cycle 2]: 0.00298676, [45] [expand_dump_flag]: 2.96999e-06 [switch_simplify]: 4.713e-05 [loop_unroll]: 4.295e-05 [a_1]: 0.00138387 [with_stream_mark]: 1.696e-05 [recompute_prepare]: 1.116e-05 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 4.50999e-06 [updatestate_loads_eliminate]: 3.57997e-06 [parameter_eliminate]: 1.68002e-06 [a_2]: 9.309e-05 [accelerated_algorithm]: 1.294e-05 [shard]: 2.86999e-06 [meta_shard_fg_expand]: 3.03998e-06 [shard_inline]: 6.91001e-06 [merge_send_recv]: 9.67999e-06 [auto_parallel]: 1.042e-05 [parallel]: 9.81998e-06 [flash_sp]: 4.62e-06 [merge_comm]: 4.1e-06 [allreduce_fusion]: 3.93001e-06 [matmul_add_comm_reduction]: 9.25001e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 8.33999e-06 [virtual_dataset]: 6.89999e-06 [get_grad_eliminate_]: 6.56e-06 [virtual_output]: 6.58e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.079e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.381e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 1.241e-05 [set_forward_comm_id_for_comm_node_pass]: 4.76002e-06 [meta_fg_expand]: 0.00010628 [flash_sp_send_recv_attached]: 1.55999e-06 [receive_attached]: 2.14e-06 [after_resolve]: 1.375e-05 [a_after_grad]: 1.08e-05 [renormalize]: 0.00073453 [add_forward_monad_depend]: 4.63999e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.259e-05 [cse]: 2.488e-05 [a_3]: 4.976e-05 [Cycle 3]: 0.00074104, [45] [expand_dump_flag]: 1.59e-06 [switch_simplify]: 8.33999e-06 [loop_unroll]: 6.62002e-06 [a_1]: 0.0001506 [with_stream_mark]: 8.85999e-06 [recompute_prepare]: 7.02002e-06 [updatestate_depend_eliminate]: 4.21001e-06 [updatestate_assign_eliminate]: 2.81999e-06 [updatestate_loads_eliminate]: 2.49001e-06 
[parameter_eliminate]: 8.39995e-07 [a_2]: 8.772e-05 [accelerated_algorithm]: 9.93998e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.30999e-06 [shard_inline]: 6.78e-06 [merge_send_recv]: 5.45001e-06 [auto_parallel]: 7.11001e-06 [parallel]: 5.75001e-06 [flash_sp]: 1.12e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.47997e-06 [matmul_add_comm_reduction]: 6.07999e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 7.45e-06 [virtual_dataset]: 6.83e-06 [get_grad_eliminate_]: 6.41998e-06 [virtual_output]: 6.16e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 7.45e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.328e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 5.437e-05 [set_forward_comm_id_for_comm_node_pass]: 4.87998e-06 [meta_fg_expand]: 2.53003e-06 [flash_sp_send_recv_attached]: 8.49977e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.03e-05 [a_after_grad]: 9.49e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.44998e-06 [auto_monad_grad]: 1.19003e-06 [auto_monad_eliminator]: 8.67e-06 [cse]: 1.673e-05 [a_3]: 3.943e-05 [py_interpret_to_execute_after_opt_a]: 1.37e-05 [slice_cell_reuse_recomputed_activation]: 2.03997e-06 [rewriter_after_opt_a]: 4.663e-05 [convert_after_rewriter]: 7.28999e-06 [order_py_execute_after_rewriter]: 5.74999e-06 [mutable_eliminate]: 0.00072321 [opt_b]: 0.00022924, [1] [Cycle 1]: 0.0002208, [7] [b_1]: 0.00013679 [b_2]: 8.89e-06 [updatestate_depend_eliminate]: 7.48999e-06 [updatestate_assign_eliminate]: 2.84001e-06 [updatestate_loads_eliminate]: 2.89999e-06 [renormalize]: 4.60015e-07 [cse]: 2.542e-05 [optimize_parallel_all_gather_comm]: 2.547e-05 [overlap_param_gather]: 2.16998e-06 [cconv]: 2.655e-05 [loop_unroll]: 0.00047801 [opt_after_cconv]: 0.0001156, [1] [Cycle 1]: 0.00010905, [7] [c_1]: 3.432e-05 [parameter_eliminate]: 3.05002e-06 [updatestate_depend_eliminate]: 6.63e-06 [updatestate_assign_eliminate]: 3.14999e-06 
[updatestate_loads_eliminate]: 3.38e-06 [cse]: 2.187e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.635e-05 [tuple_transform]: 8.1e-05, [1] [Cycle 1]: 7.611e-05, [4] [d_1]: 4.706e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 7.8e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 5.436e-05 [cse_after_recomputation]: 2.531e-05, [1] [Cycle 1]: 2.007e-05, [1] [cse]: 1.465e-05 [environ_conv]: 8.71002e-06 [swap_dp_allreduce_reducescatter]: 6.56e-06 [bias_add_comm_swap]: 2.63e-06 [label_micro_interleaved_index]: 4.43001e-06 [label_fine_grained_interleaved_index]: 2.88998e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.51998e-06 [assign_add_opt]: 1.69998e-06 [ForceFp32Comm]: 8.60018e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.64001e-06 [reorder_send_recv_between_fp_bp]: 3.08998e-06 [comm_op_add_attrs]: 1.42999e-06 [add_comm_op_reuse_tag]: 1.14e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.09003e-06 [overlap_opt_shard_in_pipeline]: 1.39e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.527e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4.40999e-06 [overlap_recompute_and_grad_model_parallel]: 5.32999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.32001e-06 [overlap_grad_ring_attention]: 4.50999e-06 [overlap_grad_flash_sp]: 2.379e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.54999e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 9.056e-05, [1] [Cycle 1]: 8.545e-05, [6] [build]: 9.39998e-06 [elim_shapecalc]: 1.257e-05 [elim_not_effective]: 1.453e-05 [opt_reshape]: 7.56999e-06 [fold_const_symbol]: 1.22e-05 [renormalize]: 
2.30008e-07 [detach_backward]: 2.39999e-06 [pipeline_parallel_scheduler]: 1.94e-06 [auto_monad_reorder]: 2.221e-05 [get_jit_bprop_graph]: 1.89999e-06 [rewriter_after_jit_bprop_graph]: 4.15e-06 [opt_after_jit_grad]: 0.00052078 [validate]: 4.828e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00683305 [execute]: 8.58001e-06
Sums
bootstrap : 0.000551s : 1.43% type_inference : 0.012395s : 32.15% event_method : 0.000051s : 0.13% auto_monad : 0.000144s : 0.37% graph_reusing : 0.000009s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000039s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000056s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000045s : 0.12% optimize.rewriter_before_opt_a : 0.000167s : 0.43% optimize.opt_a.expand_dump_flag : 0.000010s : 0.03% optimize.opt_a.switch_simplify : 0.000133s : 0.35% optimize.opt_a.loop_unroll : 0.000114s : 0.30% optimize.opt_a.a_1 : 0.003078s : 7.98% optimize.opt_a.with_stream_mark : 0.000057s : 0.15% optimize.opt_a.recompute_prepare : 0.000045s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.03% optimize.opt_a.parameter_eliminate : 0.000006s : 0.01% optimize.opt_a.a_2 : 0.000431s : 1.12% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.15% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.08% optimize.opt_a.merge_send_recv : 0.000034s : 0.09% optimize.opt_a.auto_parallel : 0.000030s : 0.08% optimize.opt_a.parallel : 0.000038s : 0.10% optimize.opt_a.flash_sp : 0.000019s : 0.05% optimize.opt_a.merge_comm :
0.000018s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.09% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.07% optimize.opt_a.virtual_output : 0.000028s : 0.07% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000038s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000059s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000096s : 0.25% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.05% optimize.opt_a.meta_fg_expand : 0.001887s : 4.90% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000095s : 0.25% optimize.opt_a.a_after_grad : 0.000113s : 0.29% optimize.opt_a.renormalize : 0.008189s : 21.24% optimize.opt_a.add_forward_monad_depend : 0.000019s : 0.05% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000077s : 0.20% optimize.opt_a.cse : 0.000292s : 0.76% optimize.opt_a.a_3 : 0.000440s : 1.14% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000723s : 1.88% optimize.opt_b.b_1 : 0.000137s : 0.35% optimize.opt_b.b_2 : 0.000009s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000025s : 0.07% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.07% optimize.loop_unroll : 0.000478s : 1.24% optimize.opt_after_cconv.c_1 : 0.000034s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000022s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.04% optimize.tuple_transform.d_1 : 0.000047s : 0.12% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.14% optimize.cse_after_recomputation.cse : 0.000015s : 0.04% optimize.environ_conv : 0.000009s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000022s : 0.06% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000521s : 1.35% validate : 0.000048s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.006833s : 17.73% execute : 0.000009s : 0.02%
Time group info:
------[substitution.] 0.000872 161
7.13% : 0.000062s : 8: substitution.arithmetic_simplify
0.27% : 0.000002s : 3: substitution.elim_not_effective
0.59% : 0.000005s : 5: substitution.float_depend_g_call
0.55% : 0.000005s : 2: substitution.float_tuple_getitem_switch
0.23% : 0.000002s : 3: substitution.fold_const_symbol
0.70% : 0.000006s : 4: substitution.graph_param_transform
0.37% : 0.000003s : 2: substitution.incorporate_call
0.28% : 0.000002s : 2: substitution.incorporate_call_switch
55.96% : 0.000488s : 17: substitution.inline
2.38% : 0.000021s : 2: substitution.inline_without_move
6.07% : 0.000053s : 15: substitution.j_node_and_user_rematch
2.08% : 0.000018s : 3: substitution.less_batch_normalization
1.30% : 0.000011s : 7: substitution.minmaximum_grad
0.82% : 0.000007s : 5: substitution.partial_eliminate
1.47% : 0.000013s : 15: substitution.remove_not_recompute_node
3.65% : 0.000032s : 10: substitution.replace_applicator
1.37% : 0.000012s : 10: substitution.replace_old_param
0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute
2.79% : 0.000024s : 7: substitution.tuple_list_convert_item_index_to_positive
1.29% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator
1.73% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder
6.76% : 0.000059s : 19: substitution.tuple_list_get_item_eliminator
1.81% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator
------[type_inference.] 0.012303 2
86.53% : 0.010646s : 1: type_inference.infer
13.47% : 0.001657s : 1: type_inference.specialize
------[replace.] 0.000234 27
67.00% : 0.000157s : 17: replace.inline
33.00% : 0.000077s : 10: replace.tuple_list_get_item_eliminator
------[match.] 0.000506 27
94.46% : 0.000478s : 17: match.inline
5.54% : 0.000028s : 10: match.tuple_list_get_item_eliminator
------[predicate.]
0.000705 4248 1.18% : 0.000008s : 53: predicate.accumulaten_eliminater 0.26% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.13% : 0.000008s : 53: predicate.addn_zero_filter 1.09% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.07% : 0.000015s : 74: predicate.arithmetic_simplify 1.15% : 0.000008s : 53: predicate.cast_eliminate 1.08% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.16% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.18% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.15% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.15% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.16% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.16% : 0.000008s : 57: predicate.environ_get_depend_swap 1.70% : 0.000012s : 78: predicate.environ_get_eliminate 1.16% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.81% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.52% : 0.000018s : 80: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.51% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.50% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.95% : 0.000042s : 183: predicate.inline 1.41% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.61% : 0.000018s : 124: predicate.load_eliminater 0.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.52% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.06% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.07% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.15% : 0.000008s : 53: predicate.minmaximum_grad 0.35% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000015s : 80: predicate.partial_defer_inline 1.70% : 0.000012s : 67: predicate.partial_eliminate 1.13% : 0.000008s : 53: predicate.print_const_string_wrapper 0.50% : 0.000004s : 21: predicate.reduce_all_const_elim 1.50% : 0.000011s : 53: predicate.reduce_eliminate 2.63% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.39% : 0.000003s : 21: predicate.remove_not_recompute_node 1.90% : 0.000013s : 113: predicate.replace_applicator 0.68% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.15% : 0.000008s : 53: predicate.reshape_eliminate 1.11% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.28% : 0.000009s : 50: predicate.same_eliminate 0.33% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.61% : 0.000004s : 21: predicate.shard_identity_eliminate 0.23% : 0.000002s : 8: predicate.special_op_eliminate 0.60% : 0.000004s : 21: predicate.specialize_transform 1.29% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.25% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.92% : 0.000014s : 80: predicate.switch_defer_inline 2.97% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.24% : 0.000037s : 218: 
predicate.switch_simplify 1.13% : 0.000008s : 53: predicate.tile_eliminate 1.10% : 0.000008s : 53: predicate.transpose_eliminate 1.39% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 1.45% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.55% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.56% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.16% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.13% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001926 36 60.50% : 0.001165s : 15: func_graph_cloner_run.FuncGraphClonerGraph 39.50% : 0.000761s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079123 237 0.00% : 0.000004s : 1: ForceFp32Comm 4.50% : 0.003557s : 1: add_attr 4.48% : 0.003546s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000059s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.19% : 0.000151s : 1: auto_monad 0.03% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.73% : 0.000579s : 1: bootstrap 0.04% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000019s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.07% : 0.000059s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.62% : 0.000487s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.93% : 0.000733s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000018s : 1: opt.transform.mutable_eliminate 5.90% : 0.004667s : 117: opt.transform.opt_a 0.04% : 0.000033s : 1: opt.transform.opt_after_cconv 0.03% : 0.000027s : 1: opt.transform.opt_after_jit_grad 0.15% : 0.000116s : 28: opt.transform.opt_b 0.07% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000042s : 4: opt.transform.symbol_engine_opt 20.90% : 0.016538s : 1: opt_a 0.15% : 0.000119s : 1: opt_after_cconv 0.67% : 0.000530s : 1: opt_after_jit_grad 0.29% : 0.000233s : 1: opt_b 24.06% : 0.019041s : 1: optimize 0.04% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000061s : 1: pre_auto_parallel 0.06% : 0.000049s : 1: py_interpret_to_execute 0.02% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 8.14% : 0.006443s : 2: renormalize.infer 2.18% : 0.001729s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000172s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000093s : 1: symbol_engine_optimizer 8.65% : 0.006847s : 1: task_emit 0.11% : 0.000084s : 1: 
tuple_transform 15.70% : 0.012419s : 1: type_inference 0.11% : 0.000084s : 1: validate TotalTime = 0.0202285, [24] [bootstrap]: 0.00042855 [type_inference]: 0.00571057 [event_method]: 1.288e-05 [auto_monad]: 6.11e-05 [graph_reusing]: 5.88998e-06 [inline]: 2.10002e-06 [add_attr]: 0.00307853, [1] [add_attr_with_inline]: 0.00307021, [1] [Cycle 1]: 4.616e-05, [2] [tag_attr]: 1.41e-05 [meta_addattr_fg_expand]: 3.63999e-06 [parallel-infer-symbol]: 3.58e-06 [pre_auto_parallel]: 2.642e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00403351, [53] [py_interpret_to_execute]: 1.987e-05 [rewriter_before_opt_a]: 5.271e-05 [opt_a]: 0.00212379, [2] [Cycle 1]: 0.001507, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 2.852e-05 [loop_unroll]: 1.686e-05 [a_1]: 0.00034624 [with_stream_mark]: 1.433e-05 [recompute_prepare]: 7.95e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 1.85001e-06 [a_2]: 0.00012512 [accelerated_algorithm]: 6.49999e-06 [shard]: 2.68e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 6.19001e-06 [merge_send_recv]: 8.73001e-06 [auto_parallel]: 6.90002e-06 [parallel]: 1.981e-05 [flash_sp]: 8.55999e-06 [merge_comm]: 3.99002e-06 [allreduce_fusion]: 4.01001e-06 [matmul_add_comm_reduction]: 9.67999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.38999e-06 [virtual_dataset]: 6.10002e-06 [get_grad_eliminate_]: 5.73002e-06 [virtual_output]: 6.21e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.264e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 1.083e-05 [set_forward_comm_id_for_comm_node_pass]: 3.78001e-06 [meta_fg_expand]: 2.56e-06 [flash_sp_send_recv_attached]: 3.33e-06 [receive_attached]: 3.01001e-06 
[after_resolve]: 9.67999e-06 [a_after_grad]: 8.53001e-06 [renormalize]: 0.00045414 [add_forward_monad_depend]: 4.71002e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 2.393e-05 [a_3]: 4.257e-05 [Cycle 2]: 0.00060609, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 7.05e-06 [loop_unroll]: 5.77999e-06 [a_1]: 0.00011386 [with_stream_mark]: 1.327e-05 [recompute_prepare]: 6.09001e-06 [updatestate_depend_eliminate]: 3.02002e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 7.38e-05 [accelerated_algorithm]: 5.68002e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 4.74e-06 [auto_parallel]: 5.72999e-06 [parallel]: 4.32e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.79999e-06 [matmul_add_comm_reduction]: 5.57001e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.84001e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.16002e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 6.46999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.057e-05 [merge_recompute_call_nodes]: 8.09989e-07 [before_grad]: 8.57e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.69e-06 [a_after_grad]: 7.87e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.36e-06 [cse]: 1.339e-05 [a_3]: 3.306e-05 [py_interpret_to_execute_after_opt_a]: 7.85e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.384e-05 [convert_after_rewriter]: 6.56e-06 [order_py_execute_after_rewriter]: 5.08002e-06 [mutable_eliminate]: 0.0004854 [opt_b]: 0.00018878, [1] [Cycle 1]: 0.00018205, [7] [b_1]: 0.00011128 [b_2]: 
7.75998e-06 [updatestate_depend_eliminate]: 5.37999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 4.89992e-07 [cse]: 1.774e-05 [optimize_parallel_all_gather_comm]: 1.682e-05 [overlap_param_gather]: 2.21998e-06 [cconv]: 2.353e-05 [loop_unroll]: 0.00042803 [opt_after_cconv]: 9.604e-05, [1] [Cycle 1]: 8.991e-05, [7] [c_1]: 2.556e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.48002e-06 [updatestate_loads_eliminate]: 2.30002e-06 [cse]: 1.738e-05 [renormalize]: 6.89994e-07 [remove_dup_value]: 1.462e-05 [tuple_transform]: 6.996e-05, [1] [Cycle 1]: 6.506e-05, [4] [d_1]: 3.742e-05 [none_parameter_eliminate]: 2.01e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 6.37001e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.44e-05 [cse_after_recomputation]: 2.17e-05, [1] [Cycle 1]: 1.666e-05, [1] [cse]: 1.136e-05 [environ_conv]: 4.94e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.78003e-06 [label_micro_interleaved_index]: 4.63999e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.29983e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.73998e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 2.04e-06 [control_data_broadcast_order]: 1.237e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.07e-06 [overlap_recompute_and_grad_model_parallel]: 4.93001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.62999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.52999e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.47e-06 [overlap_grad_flash_sp]: 1.732e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 1.98002e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 7.258e-05, [1] [Cycle 1]: 6.784e-05, [6] [build]: 2.68e-06 [elim_shapecalc]: 9.22001e-06 [elim_not_effective]: 1.243e-05 [opt_reshape]: 6.39999e-06 [fold_const_symbol]: 9.55001e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.59e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.78999e-06 [opt_after_jit_grad]: 0.00046327 [validate]: 3.459e-05 [backend_pass]: 1.293e-05 [task_emit]: 0.00611699 [execute]: 7.61001e-06 Sums bootstrap : 0.000429s : 2.65% type_inference : 0.005711s : 35.35% event_method : 0.000013s : 0.08% auto_monad : 0.000061s : 0.38% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000053s : 0.33% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000460s : 2.85% optimize.opt_a.with_stream_mark : 0.000028s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000199s : 1.23% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000012s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000454s : 2.81% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000037s : 0.23% optimize.opt_a.a_3 : 0.000076s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000485s : 3.00% optimize.opt_b.b_1 : 0.000111s : 0.69% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000428s : 2.65% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000463s : 2.87% validate : 0.000035s : 0.21% backend_pass : 0.000013s : 0.08% task_emit : 0.006117s : 37.86% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000182 24 39.06% : 0.000071s : 4: substitution.arithmetic_simplify 1.17% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 2.95% : 0.000005s : 3: substitution.graph_param_transform 49.36% : 0.000090s : 3: substitution.inline 2.01% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.79% : 0.000005s : 4: substitution.remove_not_recompute_node 1.88% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005574 2 92.35% : 0.005147s : 1: type_inference.infer 7.65% : 0.000426s : 1: type_inference.specialize ------[replace.] 0.000026 3 100.00% : 0.000026s : 3: replace.inline ------[match.] 0.000088 3 100.00% : 0.000088s : 3: match.inline ------[predicate.] 
0.000146 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 1.00% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 14: predicate.arithmetic_simplify 0.82% : 0.000001s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_depend_swap 1.73% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.33% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.05% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.76% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.36% : 0.000009s : 37: predicate.inline 0.92% : 0.000001s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 6: predicate.less_batch_normalization 1.75% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 22: predicate.load_eliminater 1.13% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.97% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 6: predicate.merge_addn 0.70% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.13% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.41% : 0.000001s : 3: predicate.parallel_virtual_node 1.41% : 0.000002s : 11: predicate.partial_defer_inline 1.34% : 0.000002s : 11: predicate.partial_eliminate 0.81% : 0.000001s : 8: predicate.print_const_string_wrapper 0.79% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 8: predicate.reduce_eliminate 2.24% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 6: predicate.remove_not_recompute_node 1.36% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.83% : 0.000001s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.09% : 0.000002s : 6: predicate.shard_identity_eliminate 0.86% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 1.02% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.90% : 0.000007s : 38: predicate.switch_simplify 0.89% : 
0.000001s : 8: predicate.tile_eliminate 0.83% : 0.000001s : 8: predicate.transpose_eliminate 1.53% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.60% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.83% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000270 7 33.57% : 0.000091s : 2: func_graph_cloner_run.FuncGraphClonerGraph 66.43% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028823 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.70% : 0.003084s : 1: add_attr 10.66% : 0.003074s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000066s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.06% : 0.000018s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.58% : 0.000454s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.51% : 0.000436s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000494s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.03% : 0.000873s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.38% : 0.002127s : 1: opt_a 0.34% : 0.000099s : 1: opt_after_cconv 1.64% : 0.000472s : 1: opt_after_jit_grad 0.67% : 0.000192s : 1: opt_b 14.01% : 0.004038s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.84% : 0.000243s : 1: renormalize.infer 0.71% : 0.000204s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.20% : 0.000057s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 21.27% : 0.006129s : 1: task_emit 0.25% : 0.000073s : 1: 
tuple_transform 19.87% : 0.005728s : 1: type_inference 0.22% : 0.000065s : 1: validate TotalTime = 0.0400016, [24] [bootstrap]: 0.00049021 [type_inference]: 0.0119444 [event_method]: 4.12e-05 [auto_monad]: 0.00012732 [graph_reusing]: 9.29e-06 [inline]: 2.12999e-06 [add_attr]: 0.0031171, [1] [add_attr_with_inline]: 0.00310796, [1] [Cycle 1]: 7.448e-05, [2] [tag_attr]: 3.374e-05 [meta_addattr_fg_expand]: 9.62001e-06 [parallel-infer-symbol]: 3.55003e-06 [pre_auto_parallel]: 4.795e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.0167643, [53] [py_interpret_to_execute]: 3.974e-05 [rewriter_before_opt_a]: 0.00014608 [opt_a]: 0.0145461, [3] [Cycle 1]: 0.011009, [45] [expand_dump_flag]: 3.88001e-06 [switch_simplify]: 7.414e-05 [loop_unroll]: 6.047e-05 [a_1]: 0.00143025 [with_stream_mark]: 2.633e-05 [recompute_prepare]: 2.352e-05 [updatestate_depend_eliminate]: 9.37999e-06 [updatestate_assign_eliminate]: 7.19001e-06 [updatestate_loads_eliminate]: 7.25e-06 [parameter_eliminate]: 2.89999e-06 [a_2]: 0.00024854 [accelerated_algorithm]: 3.111e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 4.02e-06 [shard_inline]: 1.645e-05 [merge_send_recv]: 1.766e-05 [auto_parallel]: 1.15e-05 [parallel]: 1.965e-05 [flash_sp]: 1.254e-05 [merge_comm]: 9.74999e-06 [allreduce_fusion]: 8.65001e-06 [matmul_add_comm_reduction]: 2.569e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.918e-05 [virtual_dataset]: 1.574e-05 [get_grad_eliminate_]: 1.52e-05 [virtual_output]: 1.55e-05 [merge_forward]: 9.20999e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 1.833e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.074e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 2.938e-05 [set_forward_comm_id_for_comm_node_pass]: 9.39e-06 [meta_fg_expand]: 0.00151054 [flash_sp_send_recv_attached]: 3.64002e-06 [receive_attached]: 2.54999e-06 
[after_resolve]: 6.612e-05 [a_after_grad]: 8.923e-05 [renormalize]: 0.00621072 [add_forward_monad_depend]: 1.03e-05 [auto_monad_grad]: 5.72001e-06 [auto_monad_eliminator]: 5.175e-05 [cse]: 0.0001822 [a_3]: 0.00033799 [Cycle 2]: 0.00282613, [45] [expand_dump_flag]: 2.16e-06 [switch_simplify]: 4.585e-05 [loop_unroll]: 4.258e-05 [a_1]: 0.00135957 [with_stream_mark]: 1.474e-05 [recompute_prepare]: 1.116e-05 [updatestate_depend_eliminate]: 4.36002e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 1.32999e-06 [a_2]: 9.178e-05 [accelerated_algorithm]: 1.062e-05 [shard]: 1.35999e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 6.94001e-06 [merge_send_recv]: 7.50998e-06 [auto_parallel]: 8.04002e-06 [parallel]: 7.3e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 4.1e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 7.65e-06 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 8.15e-06 [virtual_dataset]: 4.906e-05 [get_grad_eliminate_]: 7.03e-06 [virtual_output]: 6.31998e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 9.00001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.527e-05 [merge_recompute_call_nodes]: 1.09e-06 [before_grad]: 1.183e-05 [set_forward_comm_id_for_comm_node_pass]: 4.99e-06 [meta_fg_expand]: 5.969e-05 [flash_sp_send_recv_attached]: 1.22999e-06 [receive_attached]: 1.51002e-06 [after_resolve]: 1.278e-05 [a_after_grad]: 1.081e-05 [renormalize]: 0.00063197 [add_forward_monad_depend]: 4.42998e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.242e-05 [cse]: 2.371e-05 [a_3]: 4.816e-05 [Cycle 3]: 0.00069395, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 8.15e-06 [loop_unroll]: 6.71e-06 [a_1]: 0.00014846 [with_stream_mark]: 9.25001e-06 [recompute_prepare]: 7.05e-06 [updatestate_depend_eliminate]: 3.90998e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.53e-06 
[parameter_eliminate]: 9.09989e-07 [a_2]: 8.772e-05 [accelerated_algorithm]: 1.002e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.34998e-06 [shard_inline]: 6.76999e-06 [merge_send_recv]: 5.90002e-06 [auto_parallel]: 6.66999e-06 [parallel]: 5.12e-06 [flash_sp]: 9.80013e-07 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.66001e-06 [matmul_add_comm_reduction]: 6.16e-06 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 6.36e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.26001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.00999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.292e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 1.065e-05 [set_forward_comm_id_for_comm_node_pass]: 3.81999e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.10999e-06 [a_after_grad]: 9.48002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24998e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 8.92e-06 [cse]: 1.752e-05 [a_3]: 4.137e-05 [py_interpret_to_execute_after_opt_a]: 1.245e-05 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 4.36e-05 [convert_after_rewriter]: 7.38e-06 [order_py_execute_after_rewriter]: 5.54e-06 [mutable_eliminate]: 0.0005292 [opt_b]: 0.00022597, [1] [Cycle 1]: 0.00021892, [7] [b_1]: 0.00013622 [b_2]: 8.42e-06 [updatestate_depend_eliminate]: 7.18998e-06 [updatestate_assign_eliminate]: 2.97002e-06 [updatestate_loads_eliminate]: 2.95002e-06 [renormalize]: 6.00005e-07 [cse]: 2.314e-05 [optimize_parallel_all_gather_comm]: 1.708e-05 [overlap_param_gather]: 2.33998e-06 [cconv]: 2.466e-05 [loop_unroll]: 0.00044964 [opt_after_cconv]: 0.00011291, [1] [Cycle 1]: 0.00010627, [7] [c_1]: 3.366e-05 [parameter_eliminate]: 2.86999e-06 [updatestate_depend_eliminate]: 6.19999e-06 [updatestate_assign_eliminate]: 3.45e-06 
[updatestate_loads_eliminate]: 2.88003e-06 [cse]: 2.06e-05 [renormalize]: 5.3001e-07 [remove_dup_value]: 1.671e-05 [tuple_transform]: 7.921e-05, [1] [Cycle 1]: 7.437e-05, [4] [d_1]: 4.694e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 7.21999e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 5.362e-05 [cse_after_recomputation]: 2.537e-05, [1] [Cycle 1]: 2.019e-05, [1] [cse]: 1.446e-05 [environ_conv]: 8.32e-06 [swap_dp_allreduce_reducescatter]: 5.54e-06 [bias_add_comm_swap]: 3.12002e-06 [label_micro_interleaved_index]: 4.39002e-06 [label_fine_grained_interleaved_index]: 2.60997e-06 [merge_cast_opt]: 1.50999e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.88e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.14998e-06 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.417e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 4.28999e-06 [overlap_recompute_and_grad_model_parallel]: 5.41002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.60999e-06 [overlap_recompute_comm]: 2.29999e-06 [overlap_grad_ring_attention]: 4.67e-06 [overlap_grad_flash_sp]: 2.173e-05 [begin_end_overlap_inline]: 6.10016e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 2.01e-06 [symbol_engine_optimizer]: 8.708e-05, [1] [Cycle 1]: 8.208e-05, [6] [build]: 8.68001e-06 [elim_shapecalc]: 1.068e-05 [elim_not_effective]: 1.493e-05 [opt_reshape]: 7.55e-06 [fold_const_symbol]: 1.164e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 2.1e-05 [get_jit_bprop_graph]: 1.72001e-06 [rewriter_after_jit_bprop_graph]: 3.60998e-06 [opt_after_jit_grad]: 0.00053759 [validate]: 4.401e-05 [backend_pass]: 1.18001e-06 [task_emit]: 0.00660262 [execute]: 7.81001e-06 Sums bootstrap : 0.000490s : 1.38% type_inference : 0.011944s : 33.62% event_method : 0.000041s : 0.12% auto_monad : 0.000127s : 0.36% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000048s : 0.13% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.11% optimize.rewriter_before_opt_a : 0.000146s : 0.41% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000128s : 0.36% optimize.opt_a.loop_unroll : 0.000110s : 0.31% optimize.opt_a.a_1 : 0.002938s : 8.27% optimize.opt_a.with_stream_mark : 0.000050s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000428s : 1.20% optimize.opt_a.accelerated_algorithm : 0.000052s : 0.15% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.08% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.07% optimize.opt_a.parallel : 0.000032s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000018s 
: 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.10% optimize.opt_a.virtual_dataset : 0.000071s : 0.20% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.08% optimize.opt_a.virtual_output : 0.000028s : 0.08% optimize.opt_a.merge_forward : 0.000016s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000059s : 0.17% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.05% optimize.opt_a.meta_fg_expand : 0.001573s : 4.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.25% optimize.opt_a.a_after_grad : 0.000110s : 0.31% optimize.opt_a.renormalize : 0.006843s : 19.26% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000073s : 0.21% optimize.opt_a.cse : 0.000223s : 0.63% optimize.opt_a.a_3 : 0.000428s : 1.20% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000044s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000529s : 1.49% optimize.opt_b.b_1 : 0.000136s : 0.38% optimize.opt_b.b_2 : 0.000008s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.07% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.07% optimize.loop_unroll : 0.000450s : 1.27% optimize.opt_after_cconv.c_1 : 0.000034s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000021s : 0.06% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.05% optimize.tuple_transform.d_1 : 0.000047s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.15% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000022s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.01% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.06% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000538s : 1.51% validate : 0.000044s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006603s : 18.59% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000737 159 6.55% : 0.000048s : 7: substitution.arithmetic_simplify 0.32% : 0.000002s : 3: substitution.elim_not_effective 0.55% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.86% : 0.000006s : 4: substitution.graph_param_transform 0.43% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 59.17% : 0.000436s : 17: substitution.inline 2.36% : 0.000017s : 2: substitution.inline_without_move 1.43% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.14% : 0.000016s : 3: substitution.less_batch_normalization 1.39% : 0.000010s : 7: substitution.minmaximum_grad 0.82% : 0.000006s : 5: substitution.partial_eliminate 1.80% : 0.000013s : 15: substitution.remove_not_recompute_node 3.75% : 0.000028s : 10: substitution.replace_applicator 1.37% : 0.000010s : 10: substitution.replace_old_param 0.41% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.94% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.94% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.30% : 0.000054s : 18: substitution.tuple_list_get_item_eliminator 1.90% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011867 2 87.94% : 0.010436s : 1: type_inference.infer 12.06% : 0.001431s : 1: type_inference.specialize ------[replace.] 0.000200 26 66.91% : 0.000134s : 17: replace.inline 33.09% : 0.000066s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000453 26 94.05% : 0.000426s : 17: match.inline 5.95% : 0.000027s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000683 4180 1.13% : 0.000008s : 52: predicate.accumulaten_eliminater 0.26% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.12% : 0.000008s : 52: predicate.addn_zero_filter 1.08% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 1.99% : 0.000014s : 73: predicate.arithmetic_simplify 1.15% : 0.000008s : 52: predicate.cast_eliminate 1.15% : 0.000008s : 50: predicate.check_bprop_eliminate 0.47% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.18% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.18% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.17% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_depend_swap 1.70% : 0.000012s : 77: predicate.environ_get_eliminate 1.21% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.82% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.50% : 0.000017s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.51% : 0.000004s : 21: predicate.incorporate_call 0.48% : 0.000003s : 21: predicate.incorporate_call_switch 5.86% : 0.000040s : 180: predicate.inline 1.46% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.60% : 0.000004s : 21: predicate.less_batch_normalization 1.57% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.64% : 0.000018s : 121: predicate.load_eliminater 0.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.57% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.37% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.13% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.15% : 0.000008s : 52: predicate.minmaximum_grad 0.31% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.15% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000014s : 78: predicate.partial_defer_inline 1.68% : 0.000011s : 65: predicate.partial_eliminate 1.12% : 0.000008s : 52: predicate.print_const_string_wrapper 0.48% : 0.000003s : 21: predicate.reduce_all_const_elim 1.37% : 0.000009s : 52: predicate.reduce_eliminate 2.63% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 21: predicate.remove_not_recompute_node 1.86% : 0.000013s : 111: predicate.replace_applicator 0.67% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000008s : 52: predicate.reshape_eliminate 1.14% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.31% : 0.000009s : 50: predicate.same_eliminate 0.33% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.57% : 0.000004s : 21: predicate.shard_identity_eliminate 0.23% : 0.000002s : 8: predicate.special_op_eliminate 0.62% : 0.000004s : 21: predicate.specialize_transform 1.23% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.97% : 0.000013s : 78: predicate.switch_defer_inline 3.00% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.20% : 0.000036s : 213: 
predicate.switch_simplify 1.21% : 0.000008s : 52: predicate.tile_eliminate 1.11% : 0.000008s : 52: predicate.transpose_eliminate 1.42% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.44% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000013s : 81: predicate.tuple_list_set_item_eliminator 1.57% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.62% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.15% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.54% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.51% : 0.000003s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001684 35 60.79% : 0.001023s : 14: func_graph_cloner_run.FuncGraphClonerGraph 39.21% : 0.000660s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.071416 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.37% : 0.003122s : 1: add_attr 4.36% : 0.003112s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000135s : 1: auto_monad 0.04% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.74% : 0.000530s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.07% : 0.000048s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000005s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.64% : 0.000459s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.75% : 0.000539s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000016s : 1: opt.transform.mutable_eliminate 6.28% : 0.004487s : 117: opt.transform.opt_a 0.04% : 0.000032s : 1: opt.transform.opt_after_cconv 0.04% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000117s : 28: opt.transform.opt_b 0.07% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000041s : 4: opt.transform.symbol_engine_opt 20.37% : 0.014550s : 1: opt_a 0.16% : 0.000117s : 1: opt_after_cconv 0.77% : 0.000547s : 1: opt_after_jit_grad 0.32% : 0.000230s : 1: opt_b 23.48% : 0.016769s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000053s : 1: pre_auto_parallel 0.06% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 7.46% : 0.005324s : 2: renormalize.infer 2.11% : 0.001504s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000048s : 1: rewriter_after_opt_a 0.21% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.13% : 0.000090s : 1: symbol_engine_optimizer 9.26% : 0.006615s : 1: task_emit 0.12% : 0.000082s : 1: 
tuple_transform 16.75% : 0.011963s : 1: type_inference 0.11% : 0.000075s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x1-kbk]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x1-kbk],max_mem:6.0M
TotalTime = 0.0641985, [24] [bootstrap]: 0.00049141 [type_inference]: 0.00610241 [event_method]: 1.362e-05 [auto_monad]: 5.903e-05 [graph_reusing]: 5.52001e-06 [inline]: 1.77001e-06 [add_attr]: 0.00351427, [1] [add_attr_with_inline]: 0.00350303, [1] [Cycle 1]: 4.556e-05, [2] [tag_attr]: 1.492e-05 [meta_addattr_fg_expand]: 4.3e-06 [parallel-infer-symbol]: 3.33e-06 [pre_auto_parallel]: 2.552e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 2.43e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00406793, [53] [py_interpret_to_execute]: 2.102e-05 [rewriter_before_opt_a]: 6.319e-05 [opt_a]: 0.00217239, [2] [Cycle 1]: 0.00156308, [45] [expand_dump_flag]: 3.18e-06 [switch_simplify]: 3.313e-05 [loop_unroll]: 2.059e-05 [a_1]: 0.0004573 [with_stream_mark]: 1.386e-05 [recompute_prepare]: 8.57998e-06 [updatestate_depend_eliminate]: 4.03001e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 2.14e-06 [a_2]: 8.336e-05 [accelerated_algorithm]: 6.64001e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 7.36999e-06 [auto_parallel]: 7.06001e-06 [parallel]: 2.538e-05 [flash_sp]: 7.75998e-06 [merge_comm]: 4.40999e-06 [allreduce_fusion]: 3.83001e-06 [matmul_add_comm_reduction]: 9.14e-06 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 7.85e-06 [virtual_dataset]: 6.23998e-06 [get_grad_eliminate_]: 5.92001e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 9.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.179e-05 [merge_recompute_call_nodes]:
1.57001e-06 [before_grad]: 1.058e-05 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 2.58003e-06 [flash_sp_send_recv_attached]: 2.58e-06 [receive_attached]: 2.09e-06 [after_resolve]: 1.02e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00042259 [add_forward_monad_depend]: 8.50001e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.307e-05 [cse]: 2.768e-05 [a_3]: 4.226e-05 [Cycle 2]: 0.00059957, [45] [expand_dump_flag]: 8.30012e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.90002e-06 [a_1]: 0.00011479 [with_stream_mark]: 1.012e-05 [recompute_prepare]: 5.92001e-06 [updatestate_depend_eliminate]: 3.16999e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 7.259e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.19e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 4.57e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.47e-06 [flash_sp]: 3.67002e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.93e-06 [matmul_add_comm_reduction]: 5.24998e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 5.99999e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 5.26002e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.025e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.32998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 1.82001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.60019e-07 [after_resolve]: 8.2e-06 [a_after_grad]: 7.77e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 9.50007e-07 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.15002e-06 [cse]: 1.627e-05 [a_3]: 3.359e-05 [py_interpret_to_execute_after_opt_a]: 7.63001e-06 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 
[rewriter_after_opt_a]: 3.185e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 4.81002e-06 [mutable_eliminate]: 0.0004605 [opt_b]: 0.00018985, [1] [Cycle 1]: 0.00018342, [7] [b_1]: 0.00011183 [b_2]: 6.89001e-06 [updatestate_depend_eliminate]: 5.55001e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.68e-06 [renormalize]: 3.4002e-07 [cse]: 1.84e-05 [optimize_parallel_all_gather_comm]: 1.593e-05 [overlap_param_gather]: 2.17001e-06 [cconv]: 2.183e-05 [loop_unroll]: 0.00042291 [opt_after_cconv]: 9.401e-05, [1] [Cycle 1]: 8.825e-05, [7] [c_1]: 2.461e-05 [parameter_eliminate]: 2.44999e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.702e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.458e-05 [tuple_transform]: 6.845e-05, [1] [Cycle 1]: 6.412e-05, [4] [d_1]: 3.707e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.68003e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.965e-05 [cse_after_recomputation]: 2.074e-05, [1] [Cycle 1]: 1.621e-05, [1] [cse]: 1.089e-05 [environ_conv]: 7.46001e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.14002e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.45002e-06 [assign_add_opt]: 1.79e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.54998e-06 [full_micro_interleaved_order_control]: 2.23002e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.70001e-06 [interleave_parallel_branches]: 1.23002e-06 [overlap_opt_shard_in_pipeline]: 1.49e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.245e-05 
[grouped_pairwise_exchange_alltoall]: 1.82001e-06 [offloading_packed_experts]: 4.22e-06 [overlap_recompute_and_grad_model_parallel]: 4.65001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.753e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.23998e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.14003e-06 [symbol_engine_optimizer]: 7.05e-05, [1] [Cycle 1]: 6.62e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.74e-06 [elim_not_effective]: 1.209e-05 [opt_reshape]: 6.33998e-06 [fold_const_symbol]: 9.46e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.88002e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.7e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00048329 [validate]: 3.424e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0491575 [execute]: 8.12998e-06 Sums bootstrap : 0.000491s : 0.82% type_inference : 0.006102s : 10.22% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.10% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.04% optimize.rewriter_before_opt_a : 0.000063s : 0.11% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000572s : 0.96% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000156s : 0.26% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.05% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000423s : 0.71% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.02% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000044s : 0.07% optimize.opt_a.a_3 : 0.000076s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.77% optimize.opt_b.b_1 : 0.000112s : 0.19% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000423s : 0.71% optimize.opt_after_cconv.c_1 : 0.000025s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000037s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000007s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000483s : 0.81% validate : 0.000034s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.049157s : 82.35% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000179 26 17.52% : 0.000031s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000006s : 3: substitution.graph_param_transform 65.82% : 0.000118s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 2.06% : 0.000004s : 2: substitution.replace_old_param 5.04% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006033 2 90.36% : 0.005451s : 1: type_inference.infer 9.64% : 0.000582s : 1: type_inference.specialize ------[replace.] 0.000036 4 78.25% : 0.000028s : 3: replace.inline 21.75% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000124 4 93.29% : 0.000116s : 3: match.inline 6.71% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 0.93% : 0.000001s : 9: predicate.accumulaten_eliminater 0.88% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 15: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_depend_swap 1.76% : 0.000003s : 18: predicate.environ_get_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 13: predicate.float_depend_g_call 0.55% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.80% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.75% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.33% : 0.000010s : 40: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 6: predicate.less_batch_normalization 1.68% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 25: predicate.load_eliminater 1.04% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.18% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.65% : 0.000003s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.31% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.26% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.95% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.69% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.94% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.42% : 0.000002s : 13: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 43: predicate.switch_simplify 0.96% : 
0.000002s : 9: predicate.tile_eliminate 0.90% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.60% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.02% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.69% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000315 8 41.48% : 0.000131s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.52% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.073299 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.80% : 0.003519s : 1: add_attr 4.78% : 0.003506s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000054s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.09% : 0.000064s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.71% : 0.000517s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000005s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.59% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.64% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.30% : 0.000950s : 78: opt.transform.opt_a 0.03% : 0.000023s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.97% : 0.002175s : 1: opt_a 0.13% : 0.000097s : 1: opt_after_cconv 0.67% : 0.000493s : 1: opt_after_jit_grad 0.26% : 0.000193s : 1: opt_b 5.56% : 0.004072s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.29% : 0.000215s : 1: renormalize.infer 0.27% : 0.000201s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000036s : 1: rewriter_after_opt_a 0.09% : 0.000067s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000073s : 1: symbol_engine_optimizer 67.09% : 0.049179s : 1: task_emit 0.10% : 0.000071s : 1: 
tuple_transform 8.34% : 0.006116s : 1: type_inference 0.08% : 0.000056s : 1: validate TotalTime = 0.0598673, [24] [bootstrap]: 0.00043735 [type_inference]: 0.00622766 [event_method]: 1.386e-05 [auto_monad]: 6.027e-05 [graph_reusing]: 5.34e-06 [inline]: 2.17001e-06 [add_attr]: 0.0031794, [1] [add_attr_with_inline]: 0.00317058, [1] [Cycle 1]: 5.151e-05, [2] [tag_attr]: 1.609e-05 [meta_addattr_fg_expand]: 3.73999e-06 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 2.624e-05 [insert-virtual-dataset]: 3.10002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.26998e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.00410858, [53] [py_interpret_to_execute]: 2.174e-05 [rewriter_before_opt_a]: 5.343e-05 [opt_a]: 0.00214192, [2] [Cycle 1]: 0.00152223, [45] [expand_dump_flag]: 2.73998e-06 [switch_simplify]: 2.852e-05 [loop_unroll]: 1.702e-05 [a_1]: 0.00035538 [with_stream_mark]: 1.605e-05 [recompute_prepare]: 7.82e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 3.45e-06 [parameter_eliminate]: 2.04999e-06 [a_2]: 8.228e-05 [accelerated_algorithm]: 6.93e-06 [shard]: 1.94999e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 6.18998e-06 [merge_send_recv]: 9.18002e-06 [auto_parallel]: 6.23e-06 [parallel]: 1.788e-05 [flash_sp]: 7.86001e-06 [merge_comm]: 3.66999e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.39998e-06 [allreduce_slice_to_reducescatter]: 1.30999e-06 [virtual_shard_identity]: 6.99001e-06 [virtual_dataset]: 6.39001e-06 [get_grad_eliminate_]: 5.73002e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.86001e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 1.002e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.202e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 1.06e-05 [set_forward_comm_id_for_comm_node_pass]: 3.68999e-06 [meta_fg_expand]: 2.68e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 2.26e-06 
[after_resolve]: 9.81998e-06 [a_after_grad]: 8.62e-06 [renormalize]: 0.00050164 [add_forward_monad_depend]: 4.94003e-06 [auto_monad_grad]: 2.17001e-06 [auto_monad_eliminator]: 1.353e-05 [cse]: 3.163e-05 [a_3]: 4.226e-05 [Cycle 2]: 0.00060958, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.99e-06 [a_1]: 0.00011367 [with_stream_mark]: 1.068e-05 [recompute_prepare]: 5.96998e-06 [updatestate_depend_eliminate]: 3.04001e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 7.211e-05 [accelerated_algorithm]: 5.66998e-06 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 5.17999e-06 [auto_parallel]: 5.57999e-06 [parallel]: 4.60999e-06 [flash_sp]: 3.49001e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.75001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.28002e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.18002e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 8.09989e-07 [before_grad]: 8.75999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 2.02999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.30013e-07 [after_resolve]: 8.82999e-06 [a_after_grad]: 7.66001e-06 [renormalize]: 5.9983e-08 [add_forward_monad_depend]: 1.42999e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 5.99999e-06 [cse]: 1.508e-05 [a_3]: 3.536e-05 [py_interpret_to_execute_after_opt_a]: 8.47e-06 [slice_cell_reuse_recomputed_activation]: 2.29999e-06 [rewriter_after_opt_a]: 3.427e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 5.55001e-06 [mutable_eliminate]: 0.00051104 [opt_b]: 0.00019199, [1] [Cycle 1]: 
0.00018442, [7] [b_1]: 0.00011122 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 6.01e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.79001e-06 [renormalize]: 4.00003e-07 [cse]: 1.897e-05 [optimize_parallel_all_gather_comm]: 1.76e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.547e-05 [loop_unroll]: 0.00043522 [opt_after_cconv]: 9.779e-05, [1] [Cycle 1]: 9.172e-05, [7] [c_1]: 2.609e-05 [parameter_eliminate]: 3.23e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.51e-06 [cse]: 1.689e-05 [renormalize]: 6.39993e-07 [remove_dup_value]: 1.479e-05 [tuple_transform]: 7.037e-05, [1] [Cycle 1]: 6.584e-05, [4] [d_1]: 3.853e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.46e-06 [partial_unused_args_eliminate]: 2.29001e-06 [add_recomputation]: 4.672e-05 [cse_after_recomputation]: 2.133e-05, [1] [Cycle 1]: 1.596e-05, [1] [cse]: 1.085e-05 [environ_conv]: 6.51999e-06 [swap_dp_allreduce_reducescatter]: 5.40999e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.29002e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.47001e-06 [micro_interleaved_order_control]: 2.56998e-06 [assign_add_opt]: 1.64e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.46998e-06 [reorder_send_recv_between_fp_bp]: 3.12002e-06 [comm_op_add_attrs]: 1.26002e-06 [add_comm_op_reuse_tag]: 1.10001e-06 [interleave_split_concat_branches]: 1.39998e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.47001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.233e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 4.3e-06 [overlap_recompute_and_grad_model_parallel]: 5.16998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.34998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 4.35e-06 [overlap_grad_flash_sp]: 2.014e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.35002e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.72999e-06 [symbol_engine_optimizer]: 7.28e-05, [1] [Cycle 1]: 6.805e-05, [6] [build]: 2.86e-06 [elim_shapecalc]: 9.27999e-06 [elim_not_effective]: 1.207e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 9.14e-06 [renormalize]: 2.29978e-07 [detach_backward]: 2.21e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 1.586e-05 [get_jit_bprop_graph]: 1.72999e-06 [rewriter_after_jit_bprop_graph]: 4.1e-06 [opt_after_jit_grad]: 0.00046406 [validate]: 3.909e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0450345 [execute]: 9.31998e-06 Sums bootstrap : 0.000437s : 0.79% type_inference : 0.006228s : 11.19% event_method : 0.000014s : 0.02% auto_monad : 0.000060s : 0.11% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000026s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.04% optimize.rewriter_before_opt_a : 0.000053s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000035s : 0.06% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000469s : 0.84% optimize.opt_a.with_stream_mark : 0.000027s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000154s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000502s : 0.90% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000047s : 0.08% optimize.opt_a.a_3 : 0.000078s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000511s : 0.92% optimize.opt_b.b_1 : 0.000111s : 0.20% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.05% optimize.loop_unroll : 0.000435s : 0.78% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000464s : 0.83% validate : 0.000039s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.045034s : 80.90% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000145 24 20.29% : 0.000029s : 4: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000001s : 2: substitution.fold_const_symbol 4.00% : 0.000006s : 3: substitution.graph_param_transform 65.86% : 0.000096s : 3: substitution.inline 2.30% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.01% : 0.000004s : 4: substitution.remove_not_recompute_node 2.27% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006181 2 91.80% : 0.005674s : 1: type_inference.infer 8.20% : 0.000507s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000094 3 100.00% : 0.000094s : 3: match.inline ------[predicate.] 
0.000146 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 0.95% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.72% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.81% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.22% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.15% : 0.000009s : 37: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 6: predicate.less_batch_normalization 1.50% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000003s : 22: predicate.load_eliminater 1.16% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.97% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.91% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.35% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.47% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 11: predicate.partial_eliminate 0.94% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.11% : 0.000002s : 8: predicate.reduce_eliminate 2.28% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 6: predicate.remove_not_recompute_node 1.26% : 0.000002s : 14: predicate.replace_applicator 0.75% : 0.000001s : 6: predicate.replace_old_param 0.31% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 3: predicate.row_tensor_eliminate 0.93% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 6: predicate.shard_identity_eliminate 0.82% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.29% : 0.000002s : 11: predicate.switch_defer_inline 1.95% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.84% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 1.03% : 0.000002s : 8: predicate.transpose_eliminate 1.49% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.19% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 3: predicate.value_based_eliminate 0.79% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000346 7 36.60% : 0.000126s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.40% : 0.000219s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068645 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.64% : 0.003184s : 1: add_attr 4.62% : 0.003174s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000065s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.68% : 0.000466s : 1: bootstrap 0.04% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.03% : 0.000020s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000005s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.65% : 0.000444s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000521s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.22% : 0.000839s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.12% : 0.002145s : 1: opt_a 0.15% : 0.000101s : 1: opt_after_cconv 0.69% : 0.000474s : 1: opt_after_jit_grad 0.28% : 0.000196s : 1: opt_b 5.99% : 0.004113s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000031s : 1: pre_auto_parallel 0.04% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000018s : 1: remove_dup_value 0.40% : 0.000272s : 1: renormalize.infer 0.32% : 0.000223s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000038s : 1: rewriter_after_opt_a 0.08% : 0.000058s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000076s : 1: symbol_engine_optimizer 65.64% : 0.045058s : 1: task_emit 0.11% : 0.000073s : 1: 
tuple_transform 9.10% : 0.006247s : 1: type_inference 0.10% : 0.000068s : 1: validate TotalTime = 0.0595699, [24] [bootstrap]: 0.00045545 [type_inference]: 0.00617826 [event_method]: 1.553e-05 [auto_monad]: 6.174e-05 [graph_reusing]: 5.84e-06 [inline]: 3.16999e-06 [add_attr]: 0.0032866, [1] [add_attr_with_inline]: 0.00327913, [1] [Cycle 1]: 4.247e-05, [2] [tag_attr]: 1.317e-05 [meta_addattr_fg_expand]: 4.31002e-06 [parallel-infer-symbol]: 3.27002e-06 [pre_auto_parallel]: 2.775e-05 [insert-virtual-dataset]: 2.55002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.31998e-06 [pipeline_split]: 1.87999e-06 [optimize]: 0.00431539, [53] [py_interpret_to_execute]: 2.227e-05 [rewriter_before_opt_a]: 6.383e-05 [opt_a]: 0.0023757, [2] [Cycle 1]: 0.00167099, [45] [expand_dump_flag]: 2.63998e-06 [switch_simplify]: 2.78e-05 [loop_unroll]: 2.042e-05 [a_1]: 0.00043752 [with_stream_mark]: 1.488e-05 [recompute_prepare]: 7.92998e-06 [updatestate_depend_eliminate]: 3.43e-06 [updatestate_assign_eliminate]: 2.78998e-06 [updatestate_loads_eliminate]: 2.37999e-06 [parameter_eliminate]: 1.15999e-06 [a_2]: 8.025e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 2.16998e-06 [meta_shard_fg_expand]: 1.96e-06 [shard_inline]: 5.96998e-06 [merge_send_recv]: 7.87e-06 [auto_parallel]: 6.31e-06 [parallel]: 1.862e-05 [flash_sp]: 8.79e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 7.00998e-06 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 7.68001e-06 [virtual_dataset]: 6.01e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.86998e-06 [merge_forward]: 3.91001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.274e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 1.042e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.79001e-06 [flash_sp_send_recv_attached]: 2.97002e-06 [receive_attached]: 
1.87999e-06 [after_resolve]: 9.72001e-06 [a_after_grad]: 8.70999e-06 [renormalize]: 0.00057886 [add_forward_monad_depend]: 5.52001e-06 [auto_monad_grad]: 2.72001e-06 [auto_monad_eliminator]: 1.433e-05 [cse]: 1.933e-05 [a_3]: 4.348e-05 [Cycle 2]: 0.00069359, [45] [expand_dump_flag]: 1.29e-06 [switch_simplify]: 7.2e-06 [loop_unroll]: 7.433e-05 [a_1]: 0.00011882 [with_stream_mark]: 1.173e-05 [recompute_prepare]: 6.36e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 1.10999e-06 [a_2]: 7.293e-05 [accelerated_algorithm]: 6.19001e-06 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 6.41e-06 [merge_send_recv]: 4.94998e-06 [auto_parallel]: 5.52999e-06 [parallel]: 5.27999e-06 [flash_sp]: 3.37002e-06 [merge_comm]: 3.52997e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 6.17999e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.22001e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.18002e-06 [merge_forward]: 2.82002e-06 [cell_reuse_recompute_pass]: 1.64998e-06 [offload_activation]: 7.07002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.057e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 9.14e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 1.02998e-06 [receive_attached]: 1.47999e-06 [after_resolve]: 9.10001e-06 [a_after_grad]: 7.68001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.71e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 6.68e-06 [cse]: 1.553e-05 [a_3]: 3.394e-05 [py_interpret_to_execute_after_opt_a]: 9.66998e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.248e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 4.85001e-06 [mutable_eliminate]: 0.00051077 [opt_b]: 0.00019443, [1] [Cycle 1]: 
0.00018791, [7] [b_1]: 0.00011445 [b_2]: 7.32002e-06 [updatestate_depend_eliminate]: 5.61e-06 [updatestate_assign_eliminate]: 2.79999e-06 [updatestate_loads_eliminate]: 2.20002e-06 [renormalize]: 3.4002e-07 [cse]: 1.965e-05 [optimize_parallel_all_gather_comm]: 1.452e-05 [overlap_param_gather]: 1.66002e-06 [cconv]: 2.502e-05 [loop_unroll]: 0.00042892 [opt_after_cconv]: 0.00010018, [1] [Cycle 1]: 9.382e-05, [7] [c_1]: 2.499e-05 [parameter_eliminate]: 3.45e-06 [updatestate_depend_eliminate]: 5.86998e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.53e-06 [cse]: 1.82e-05 [renormalize]: 3.70026e-07 [remove_dup_value]: 1.146e-05 [tuple_transform]: 6.859e-05, [1] [Cycle 1]: 6.339e-05, [4] [d_1]: 3.697e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.54998e-06 [add_recomputation]: 4.074e-05 [cse_after_recomputation]: 2.174e-05, [1] [Cycle 1]: 1.635e-05, [1] [cse]: 1.102e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 4.53999e-06 [bias_add_comm_swap]: 2.14e-06 [label_micro_interleaved_index]: 3.49001e-06 [label_fine_grained_interleaved_index]: 2.28002e-06 [merge_cast_opt]: 7.7e-07 [slice_recompute_activation]: 1.96998e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 8.80013e-07 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 6.40022e-07 [full_micro_interleaved_order_control]: 3.03e-06 [reorder_send_recv_between_fp_bp]: 2.01e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 6.59988e-07 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.04003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 8.80013e-07 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.29002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
9.5999e-07 [overlap_recompute_allgather_and_fa_grad]: 9.50007e-07 [overlap_recompute_comm]: 1.89999e-06 [overlap_grad_ring_attention]: 3.68999e-06 [overlap_grad_flash_sp]: 1.541e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.42999e-06 [split_layernorm_comm]: 1.49e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 7.316e-05, [1] [Cycle 1]: 6.824e-05, [6] [build]: 2.45002e-06 [elim_shapecalc]: 9.25001e-06 [elim_not_effective]: 1.288e-05 [opt_reshape]: 6.06998e-06 [fold_const_symbol]: 9.50001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.37e-06 [pipeline_parallel_scheduler]: 1.19998e-06 [auto_monad_reorder]: 1.348e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00049349 [validate]: 3.462e-05 [backend_pass]: 1.20999e-06 [task_emit]: 0.0444266 [execute]: 9.91e-06 Sums bootstrap : 0.000455s : 0.82% type_inference : 0.006178s : 11.18% event_method : 0.000016s : 0.03% auto_monad : 0.000062s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000028s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.04% optimize.rewriter_before_opt_a : 0.000064s : 0.12% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000035s : 0.06% optimize.opt_a.loop_unroll : 0.000095s : 0.17% optimize.opt_a.a_1 : 0.000556s : 1.01% optimize.opt_a.with_stream_mark : 0.000027s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000153s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000579s : 1.05% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000035s : 0.06% optimize.opt_a.a_3 : 0.000077s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000511s : 0.92% optimize.opt_b.b_1 : 0.000114s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.05% optimize.loop_unroll : 0.000429s : 0.78% optimize.opt_after_cconv.c_1 : 0.000025s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000011s : 0.02% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000041s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000003s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000015s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000001s : 0.00% optimize.split_layernorm_comm : 0.000001s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000013s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000493s : 0.89% validate : 0.000035s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044427s : 80.41% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000171 26 19.35% : 0.000033s : 5: substitution.arithmetic_simplify 1.16% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 2.78% : 0.000005s : 3: substitution.graph_param_transform 63.39% : 0.000108s : 3: substitution.inline 2.16% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.87% : 0.000005s : 4: substitution.remove_not_recompute_node 2.18% : 0.000004s : 2: substitution.replace_old_param 5.30% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006128 2 89.68% : 0.005496s : 1: type_inference.infer 10.32% : 0.000632s : 1: type_inference.specialize ------[replace.] 0.000037 4 80.50% : 0.000030s : 3: replace.inline 19.50% : 0.000007s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 4 92.82% : 0.000106s : 3: match.inline 7.18% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 883 0.96% : 0.000002s : 9: predicate.accumulaten_eliminater 1.03% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 15: predicate.arithmetic_simplify 0.91% : 0.000001s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.03% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.34% : 0.000001s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_depend_swap 1.77% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.07% : 0.000003s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.92% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.28% : 0.000010s : 40: predicate.inline 0.92% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 6: predicate.less_batch_normalization 1.66% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.57% : 0.000004s : 25: predicate.load_eliminater 1.10% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.59% : 0.000002s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 3: predicate.mutable_eliminate 0.42% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.47% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.37% : 0.000002s : 9: predicate.reduce_eliminate 2.43% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.53% : 0.000002s : 16: predicate.replace_applicator 0.71% : 0.000001s : 6: predicate.replace_old_param 0.42% : 0.000001s : 3: predicate.reset_defer_inline 0.94% : 0.000001s : 9: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.87% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 0.80% : 0.000001s : 6: predicate.special_op_eliminate 0.80% : 0.000001s : 6: predicate.specialize_transform 1.03% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.94% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.73% : 0.000007s : 43: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.93% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 16: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.04% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.36% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000398 8 46.07% : 0.000183s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.93% : 0.000215s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068904 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.78% : 0.003291s : 1: add_attr 4.76% : 0.003282s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.07% : 0.000045s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000067s : 1: auto_monad 0.02% : 0.000017s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.71% : 0.000491s : 1: bootstrap 0.04% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000022s : 1: event_method 0.03% : 0.000018s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000007s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000006s : 1: label_micro_interleaved_index 0.64% : 0.000438s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000521s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000016s : 1: opt.transform.mutable_eliminate 1.45% : 0.001000s : 78: opt.transform.opt_a 0.03% : 0.000023s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000093s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.45% : 0.002379s : 1: opt_a 0.15% : 0.000104s : 1: opt_after_cconv 0.73% : 0.000504s : 1: opt_after_jit_grad 0.29% : 0.000198s : 1: opt_b 6.27% : 0.004320s : 1: optimize 0.03% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000032s : 1: pre_auto_parallel 0.04% : 0.000026s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.45% : 0.000313s : 1: renormalize.infer 0.38% : 0.000259s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.10% : 0.000069s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000076s : 1: symbol_engine_optimizer 64.51% : 0.044450s : 1: task_emit 0.10% : 0.000072s : 1: 
tuple_transform 8.99% : 0.006197s : 1: type_inference 0.08% : 0.000057s : 1: validate . TotalTime = 0.0779036, [24] [bootstrap]: 0.00040351 [type_inference]: 0.011273 [event_method]: 4.671e-05 [auto_monad]: 0.00012871 [graph_reusing]: 9.25001e-06 [inline]: 2.26e-06 [add_attr]: 0.0031167, [1] [add_attr_with_inline]: 0.00310824, [1] [Cycle 1]: 7.633e-05, [2] [tag_attr]: 3.514e-05 [meta_addattr_fg_expand]: 1.001e-05 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 4.958e-05 [insert-virtual-dataset]: 2.82002e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.29999e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.0169951, [53] [py_interpret_to_execute]: 4.007e-05 [rewriter_before_opt_a]: 0.00015629 [opt_a]: 0.0148223, [3] [Cycle 1]: 0.0114073, [45] [expand_dump_flag]: 3.71999e-06 [switch_simplify]: 7.717e-05 [loop_unroll]: 6.475e-05 [a_1]: 0.00143553 [with_stream_mark]: 2.442e-05 [recompute_prepare]: 2.209e-05 [updatestate_depend_eliminate]: 8.40999e-06 [updatestate_assign_eliminate]: 7.55998e-06 [updatestate_loads_eliminate]: 6.59999e-06 [parameter_eliminate]: 2.81999e-06 [a_2]: 0.00025558 [accelerated_algorithm]: 3.27e-05 [shard]: 2.01e-06 [meta_shard_fg_expand]: 3.73001e-06 [shard_inline]: 1.64e-05 [merge_send_recv]: 1.738e-05 [auto_parallel]: 1.151e-05 [parallel]: 1.991e-05 [flash_sp]: 1.136e-05 [merge_comm]: 9.05001e-06 [allreduce_fusion]: 8.46002e-06 [matmul_add_comm_reduction]: 2.808e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.764e-05 [virtual_dataset]: 1.564e-05 [get_grad_eliminate_]: 1.502e-05 [virtual_output]: 1.533e-05 [merge_forward]: 9.03002e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 1.881e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.026e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 2.881e-05 [set_forward_comm_id_for_comm_node_pass]: 9.85002e-06 [meta_fg_expand]: 0.00151982 [flash_sp_send_recv_attached]: 3.88001e-06 [receive_attached]: 2.69999e-06 
[after_resolve]: 6.348e-05 [a_after_grad]: 8.724e-05 [renormalize]: 0.00661417 [add_forward_monad_depend]: 9.95002e-06 [auto_monad_grad]: 6.61999e-06 [auto_monad_eliminator]: 5.074e-05 [cse]: 0.00018809 [a_3]: 0.00033314 [Cycle 2]: 0.00272271, [45] [expand_dump_flag]: 2.17999e-06 [switch_simplify]: 4.598e-05 [loop_unroll]: 4.241e-05 [a_1]: 0.00132056 [with_stream_mark]: 1.284e-05 [recompute_prepare]: 8.51997e-06 [updatestate_depend_eliminate]: 4.13001e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 8.935e-05 [accelerated_algorithm]: 1.03e-05 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 2.08998e-06 [shard_inline]: 6.89001e-06 [merge_send_recv]: 6.49001e-06 [auto_parallel]: 7.35e-06 [parallel]: 6.43003e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.97002e-06 [matmul_add_comm_reduction]: 8.00999e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 8.09002e-06 [virtual_dataset]: 6.42001e-06 [get_grad_eliminate_]: 6.18998e-06 [virtual_output]: 6.14999e-06 [merge_forward]: 3.76001e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 8.56002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.395e-05 [merge_recompute_call_nodes]: 1.15999e-06 [before_grad]: 1.194e-05 [set_forward_comm_id_for_comm_node_pass]: 4.19002e-06 [meta_fg_expand]: 8.556e-05 [flash_sp_send_recv_attached]: 1.35001e-06 [receive_attached]: 1.52001e-06 [after_resolve]: 1.168e-05 [a_after_grad]: 1.037e-05 [renormalize]: 0.00061615 [add_forward_monad_depend]: 4.58001e-06 [auto_monad_grad]: 1.29e-06 [auto_monad_eliminator]: 1.115e-05 [cse]: 2.063e-05 [a_3]: 4.788e-05 [Cycle 3]: 0.0006776, [45] [expand_dump_flag]: 8.10018e-07 [switch_simplify]: 8.12e-06 [loop_unroll]: 6.69001e-06 [a_1]: 0.00014557 [with_stream_mark]: 8.32e-06 [recompute_prepare]: 6.86001e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 2.73e-06 
[updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 8.643e-05 [accelerated_algorithm]: 9.81e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 6.94001e-06 [merge_send_recv]: 5.36998e-06 [auto_parallel]: 6.25002e-06 [parallel]: 4.85001e-06 [flash_sp]: 1.07e-06 [merge_comm]: 3.76001e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 5.63002e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 7.55998e-06 [virtual_dataset]: 6.83e-06 [get_grad_eliminate_]: 6.37001e-06 [virtual_output]: 6.01e-06 [merge_forward]: 3.34001e-06 [cell_reuse_recompute_pass]: 1.29003e-06 [offload_activation]: 6.59999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.219e-05 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 1.047e-05 [set_forward_comm_id_for_comm_node_pass]: 4.17e-06 [meta_fg_expand]: 2.33998e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 9.03002e-06 [a_after_grad]: 9.47999e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 8.10018e-07 [auto_monad_eliminator]: 7.93001e-06 [cse]: 1.654e-05 [a_3]: 3.98e-05 [py_interpret_to_execute_after_opt_a]: 1.07e-05 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 4.075e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.74e-06 [mutable_eliminate]: 0.00051334 [opt_b]: 0.00022406, [1] [Cycle 1]: 0.00021766, [7] [b_1]: 0.00013494 [b_2]: 7.97e-06 [updatestate_depend_eliminate]: 5.66998e-06 [updatestate_assign_eliminate]: 3.00002e-06 [updatestate_loads_eliminate]: 2.74001e-06 [renormalize]: 3.50003e-07 [cse]: 2.151e-05 [optimize_parallel_all_gather_comm]: 1.831e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.235e-05 [loop_unroll]: 0.00043114 [opt_after_cconv]: 0.00010992, [1] [Cycle 1]: 0.00010399, [7] [c_1]: 3.286e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 5.99e-06 
[updatestate_assign_eliminate]: 3.14999e-06 [updatestate_loads_eliminate]: 2.70002e-06 [cse]: 2.147e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.601e-05 [tuple_transform]: 7.884e-05, [1] [Cycle 1]: 7.418e-05, [4] [d_1]: 4.579e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 7.36999e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 5.128e-05 [cse_after_recomputation]: 2.482e-05, [1] [Cycle 1]: 2.009e-05, [1] [cse]: 1.469e-05 [environ_conv]: 8.57e-06 [swap_dp_allreduce_reducescatter]: 6.16998e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.50001e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.17001e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.53e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.44e-06 [interleave_parallel_branches]: 1.21002e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.365e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 4.08001e-06 [overlap_recompute_and_grad_model_parallel]: 5.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.55999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69998e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.52998e-06 [overlap_grad_flash_sp]: 1.994e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.58998e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 8.67e-05, [1] [Cycle 1]: 8.229e-05, [6] [build]: 9.02999e-06 [elim_shapecalc]: 1.077e-05 [elim_not_effective]: 1.458e-05 [opt_reshape]: 7.53e-06 
[fold_const_symbol]: 1.152e-05 [renormalize]: 2.10013e-07 [detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.1e-05 [get_jit_bprop_graph]: 1.37999e-06 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00046382 [validate]: 4.274e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0451156 [execute]: 9.26998e-06 Sums bootstrap : 0.000404s : 0.55% type_inference : 0.011273s : 15.34% event_method : 0.000047s : 0.06% auto_monad : 0.000129s : 0.18% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.07% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.05% optimize.rewriter_before_opt_a : 0.000156s : 0.21% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.18% optimize.opt_a.loop_unroll : 0.000114s : 0.15% optimize.opt_a.a_1 : 0.002902s : 3.95% optimize.opt_a.with_stream_mark : 0.000046s : 0.06% optimize.opt_a.recompute_prepare : 0.000037s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000431s : 0.59% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.04% optimize.opt_a.merge_send_recv : 0.000029s : 0.04% optimize.opt_a.auto_parallel : 0.000025s : 0.03% optimize.opt_a.parallel : 0.000031s : 0.04% optimize.opt_a.flash_sp 
: 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000017s : 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.06% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.05% optimize.opt_a.virtual_dataset : 0.000029s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.04% optimize.opt_a.virtual_output : 0.000027s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000051s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001608s : 2.19% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000084s : 0.11% optimize.opt_a.a_after_grad : 0.000107s : 0.15% optimize.opt_a.renormalize : 0.007230s : 9.84% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000070s : 0.10% optimize.opt_a.cse : 0.000225s : 0.31% optimize.opt_a.a_3 : 0.000421s : 0.57% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000513s : 0.70% optimize.opt_b.b_1 : 0.000135s : 0.18% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000431s : 0.59% optimize.opt_after_cconv.c_1 : 0.000033s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000046s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000464s : 0.63% validate : 0.000043s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.045116s : 61.39% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000711 161 7.35% : 0.000052s : 8: substitution.arithmetic_simplify 0.33% : 0.000002s : 3: substitution.elim_not_effective 0.68% : 0.000005s : 5: substitution.float_depend_g_call 0.61% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 3: substitution.fold_const_symbol 0.87% : 0.000006s : 4: substitution.graph_param_transform 0.42% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 57.58% : 0.000409s : 17: substitution.inline 2.35% : 0.000017s : 2: substitution.inline_without_move 1.46% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.28% : 0.000016s : 3: substitution.less_batch_normalization 1.56% : 0.000011s : 7: substitution.minmaximum_grad 0.89% : 0.000006s : 5: substitution.partial_eliminate 1.68% : 0.000012s : 15: substitution.remove_not_recompute_node 3.81% : 0.000027s : 10: substitution.replace_applicator 1.31% : 0.000009s : 10: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.00% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.45% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 2.01% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.46% : 0.000053s : 19: substitution.tuple_list_get_item_eliminator 1.99% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011195 2 85.76% : 0.009601s : 1: type_inference.infer 14.24% : 0.001594s : 1: type_inference.specialize ------[replace.] 0.000197 27 64.13% : 0.000126s : 17: replace.inline 35.87% : 0.000071s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000426 27 93.79% : 0.000400s : 17: match.inline 6.21% : 0.000026s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000685 4248 1.14% : 0.000008s : 53: predicate.accumulaten_eliminater 0.27% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.13% : 0.000008s : 53: predicate.addn_zero_filter 1.10% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.97% : 0.000013s : 74: predicate.arithmetic_simplify 1.15% : 0.000008s : 53: predicate.cast_eliminate 1.09% : 0.000007s : 50: predicate.check_bprop_eliminate 0.49% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.16% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.24% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.18% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.20% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_depend_swap 1.70% : 0.000012s : 78: predicate.environ_get_eliminate 1.20% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.83% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.56% : 0.000018s : 80: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.07% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.52% : 0.000004s : 21: predicate.incorporate_call 0.46% : 0.000003s : 21: predicate.incorporate_call_switch 5.93% : 0.000041s : 183: predicate.inline 1.47% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.63% : 0.000004s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.64% : 0.000018s : 124: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.59% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.35% : 0.000009s : 61: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.07% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.07% : 0.000007s : 50: predicate.mini_step_allgather_replace 1.16% : 0.000008s : 53: predicate.minmaximum_grad 0.26% : 0.000002s : 4: predicate.mutable_eliminate 0.12% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.10% : 0.000014s : 80: predicate.partial_defer_inline 1.76% : 0.000012s : 67: predicate.partial_eliminate 1.11% : 0.000008s : 53: predicate.print_const_string_wrapper 0.47% : 0.000003s : 21: predicate.reduce_all_const_elim 1.35% : 0.000009s : 53: predicate.reduce_eliminate 2.69% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 21: predicate.remove_not_recompute_node 1.93% : 0.000013s : 113: predicate.replace_applicator 0.66% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.15% : 0.000008s : 53: predicate.reshape_eliminate 1.10% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.21% : 0.000008s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 21: predicate.shard_identity_eliminate 0.21% : 0.000001s : 8: predicate.special_op_eliminate 0.61% : 0.000004s : 21: predicate.specialize_transform 1.19% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.13% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.97% : 0.000013s : 80: predicate.switch_defer_inline 3.00% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.24% : 0.000036s : 218: 
predicate.switch_simplify 1.11% : 0.000008s : 53: predicate.tile_eliminate 1.11% : 0.000008s : 53: predicate.transpose_eliminate 1.46% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.48% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.54% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.61% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.13% : 0.000021s : 145: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.54% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001780 36 58.92% : 0.001049s : 15: func_graph_cloner_run.FuncGraphClonerGraph 41.08% : 0.000731s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.109838 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.84% : 0.003121s : 1: add_attr 2.83% : 0.003112s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000136s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.39% : 0.000423s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.05% : 0.000054s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.40% : 0.000440s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.48% : 0.000522s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 4.01% : 0.004401s : 117: opt.transform.opt_a 0.03% : 0.000031s : 1: opt.transform.opt_after_cconv 0.02% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000114s : 28: opt.transform.opt_b 0.05% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000040s : 4: opt.transform.symbol_engine_opt 13.50% : 0.014825s : 1: opt_a 0.10% : 0.000113s : 1: opt_after_cconv 0.43% : 0.000473s : 1: opt_after_jit_grad 0.21% : 0.000227s : 1: opt_b 15.48% : 0.017000s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000054s : 1: pre_auto_parallel 0.04% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 5.11% : 0.005614s : 2: renormalize.infer 1.46% : 0.001601s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000045s : 1: rewriter_after_opt_a 0.15% : 0.000161s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000089s : 1: symbol_engine_optimizer 41.09% : 0.045137s : 1: task_emit 0.07% : 0.000082s : 1: 
tuple_transform 10.28% : 0.011290s : 1: type_inference 0.06% : 0.000067s : 1: validate TotalTime = 0.0596513, [24] [bootstrap]: 0.00045397 [type_inference]: 0.00600594 [event_method]: 1.255e-05 [auto_monad]: 6.292e-05 [graph_reusing]: 5.88998e-06 [inline]: 2.86e-06 [add_attr]: 0.00344851, [1] [add_attr_with_inline]: 0.00343923, [1] [Cycle 1]: 5.213e-05, [2] [tag_attr]: 1.466e-05 [meta_addattr_fg_expand]: 4.1e-06 [parallel-infer-symbol]: 3.31999e-06 [pre_auto_parallel]: 2.773e-05 [insert-virtual-dataset]: 3.08998e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 2.41e-06 [optimize]: 0.00430681, [53] [py_interpret_to_execute]: 2.111e-05 [rewriter_before_opt_a]: 5.536e-05 [opt_a]: 0.00227929, [2] [Cycle 1]: 0.00163718, [45] [expand_dump_flag]: 3.41001e-06 [switch_simplify]: 2.793e-05 [loop_unroll]: 1.71e-05 [a_1]: 0.00036588 [with_stream_mark]: 1.586e-05 [recompute_prepare]: 8.33999e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.02002e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 8.14e-05 [accelerated_algorithm]: 6.79999e-06 [shard]: 1.74e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 6.71999e-06 [merge_send_recv]: 7.53e-06 [auto_parallel]: 6.71e-06 [parallel]: 1.892e-05 [flash_sp]: 8.17e-06 [merge_comm]: 3.93999e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 7.68999e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 8.08999e-06 [virtual_dataset]: 7.06999e-06 [get_grad_eliminate_]: 6.69999e-06 [virtual_output]: 6.67002e-06 [merge_forward]: 4.28999e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 9.05001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.243e-05 [merge_recompute_call_nodes]: 1.26002e-06 [before_grad]: 1.007e-05 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.40002e-06 [flash_sp_send_recv_attached]: 2.96999e-06 [receive_attached]: 1.82001e-06 
[after_resolve]: 9.98998e-06 [a_after_grad]: 9.19e-06 [renormalize]: 0.00058187 [add_forward_monad_depend]: 5.37999e-06 [auto_monad_grad]: 3.09001e-06 [auto_monad_eliminator]: 1.51e-05 [cse]: 3.13e-05 [a_3]: 4.591e-05 [Cycle 2]: 0.00063153, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 7.05e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00012071 [with_stream_mark]: 1.025e-05 [recompute_prepare]: 5.81e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 7.626e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.88002e-06 [merge_send_recv]: 4.99998e-06 [auto_parallel]: 6.08002e-06 [parallel]: 4.68999e-06 [flash_sp]: 3.58e-06 [merge_comm]: 3.38999e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 5.76e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 6.00002e-06 [virtual_dataset]: 5.76998e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.30001e-06 [merge_forward]: 2.93e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 6.11e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.048e-05 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 9.77001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 1.89999e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.40025e-07 [after_resolve]: 9.02999e-06 [a_after_grad]: 7.87e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.33002e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 7.01999e-06 [cse]: 1.552e-05 [a_3]: 3.486e-05 [py_interpret_to_execute_after_opt_a]: 8.57998e-06 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 3.659e-05 [convert_after_rewriter]: 6.51999e-06 [order_py_execute_after_rewriter]: 5.37999e-06 [mutable_eliminate]: 0.00055844 [opt_b]: 0.00019324, [1] [Cycle 1]: 0.00018628, [7] [b_1]: 
0.00011386 [b_2]: 7.26001e-06 [updatestate_depend_eliminate]: 5.54e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.67001e-06 [renormalize]: 4.69998e-07 [cse]: 1.861e-05 [optimize_parallel_all_gather_comm]: 1.599e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.48e-05 [loop_unroll]: 0.00044948 [opt_after_cconv]: 9.813e-05, [1] [Cycle 1]: 9.166e-05, [7] [c_1]: 2.513e-05 [parameter_eliminate]: 2.44001e-06 [updatestate_depend_eliminate]: 5.29003e-06 [updatestate_assign_eliminate]: 2.88e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.813e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.537e-05 [tuple_transform]: 7.194e-05, [1] [Cycle 1]: 6.696e-05, [4] [d_1]: 3.949e-05 [none_parameter_eliminate]: 1.71998e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.88e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 4.361e-05 [cse_after_recomputation]: 2.265e-05, [1] [Cycle 1]: 1.754e-05, [1] [cse]: 1.163e-05 [environ_conv]: 4.95001e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.93e-06 [label_micro_interleaved_index]: 4.72e-06 [label_fine_grained_interleaved_index]: 3.08998e-06 [merge_cast_opt]: 1.62001e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.47001e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.26998e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 [control_data_broadcast_order]: 1.28e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.97e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 4.63999e-06 [overlap_grad_flash_sp]: 1.629e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.427e-05, [1] [Cycle 1]: 6.937e-05, [6] [build]: 3.03e-06 [elim_shapecalc]: 9.51998e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.38e-06 [fold_const_symbol]: 9.99999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.689e-05 [get_jit_bprop_graph]: 1.62001e-06 [rewriter_after_jit_bprop_graph]: 3.83999e-06 [opt_after_jit_grad]: 0.0004862 [validate]: 3.837e-05 [backend_pass]: 1.04998e-06 [task_emit]: 0.0445421 [execute]: 7.94002e-06 Sums bootstrap : 0.000454s : 0.82% type_inference : 0.006006s : 10.89% event_method : 0.000013s : 0.02% auto_monad : 0.000063s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000028s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.04% optimize.rewriter_before_opt_a : 0.000055s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000035s : 0.06% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000487s : 0.88% optimize.opt_a.with_stream_mark : 0.000026s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000158s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000013s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000013s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000013s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000582s : 1.05% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.04% optimize.opt_a.cse : 0.000047s : 0.08% optimize.opt_a.a_3 : 0.000081s : 0.15% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.07% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000558s : 1.01% optimize.opt_b.b_1 : 0.000114s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.04% optimize.loop_unroll : 0.000449s : 0.81% optimize.opt_after_cconv.c_1 : 0.000025s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.08% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000486s : 0.88% validate : 0.000038s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.044542s : 80.75% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000146 24 19.71% : 0.000029s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.14% : 0.000002s : 2: substitution.fold_const_symbol 3.68% : 0.000005s : 3: substitution.graph_param_transform 66.55% : 0.000097s : 3: substitution.inline 2.11% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.30% : 0.000005s : 4: substitution.remove_not_recompute_node 2.13% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005958 2 91.56% : 0.005455s : 1: type_inference.infer 8.44% : 0.000503s : 1: type_inference.specialize ------[replace.] 0.000030 3 100.00% : 0.000030s : 3: replace.inline ------[match.] 0.000095 3 100.00% : 0.000095s : 3: match.inline ------[predicate.] 
0.000155 815 0.91% : 0.000001s : 8: predicate.accumulaten_eliminater 0.94% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.50% : 0.000004s : 14: predicate.arithmetic_simplify 0.85% : 0.000001s : 8: predicate.cast_eliminate 0.74% : 0.000001s : 6: predicate.check_bprop_eliminate 0.59% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.68% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.22% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 11: predicate.environ_get_depend_swap 1.75% : 0.000003s : 17: predicate.environ_get_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.02% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.72% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.08% : 0.000009s : 37: predicate.inline 0.89% : 0.000001s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.09% : 0.000002s : 6: predicate.less_batch_normalization 1.75% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 22: predicate.load_eliminater 1.08% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.87% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.71% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.90% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 8: predicate.minmaximum_grad 1.54% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.48% : 0.000001s : 3: predicate.parallel_virtual_node 1.39% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 11: predicate.partial_eliminate 1.03% : 0.000002s : 8: predicate.print_const_string_wrapper 0.76% : 0.000001s : 6: predicate.reduce_all_const_elim 1.28% : 0.000002s : 8: predicate.reduce_eliminate 2.22% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 14: predicate.replace_applicator 0.72% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.82% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 1.10% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.21% : 0.000002s : 11: predicate.switch_defer_inline 1.79% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.51% : 0.000007s : 38: predicate.switch_simplify 0.94% : 
0.000001s : 8: predicate.tile_eliminate 1.03% : 0.000002s : 8: predicate.transpose_eliminate 1.52% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.58% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.87% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.61% : 0.000001s : 3: predicate.value_based_eliminate 0.77% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000317 7 35.27% : 0.000112s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.73% : 0.000205s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.069010 196 0.01% : 0.000004s : 1: ForceFp32Comm 5.00% : 0.003454s : 1: add_attr 4.99% : 0.003443s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000068s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.69% : 0.000476s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.66% : 0.000458s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.82% : 0.000568s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000016s : 1: opt.transform.mutable_eliminate 1.26% : 0.000867s : 78: opt.transform.opt_a 0.03% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000093s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.31% : 0.002283s : 1: opt_a 0.15% : 0.000103s : 1: opt_after_cconv 0.72% : 0.000497s : 1: opt_after_jit_grad 0.28% : 0.000197s : 1: opt_b 6.25% : 0.004311s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000006s : 1: pipeline_split 0.05% : 0.000032s : 1: pre_auto_parallel 0.04% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.49% : 0.000339s : 1: renormalize.infer 0.34% : 0.000236s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000040s : 1: rewriter_after_opt_a 0.09% : 0.000059s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000077s : 1: symbol_engine_optimizer 64.58% : 0.044565s : 1: task_emit 0.11% : 0.000075s : 1: 
tuple_transform 8.73% : 0.006023s : 1: type_inference 0.09% : 0.000065s : 1: validate TotalTime = 0.0773888, [24] [bootstrap]: 0.00055127 [type_inference]: 0.0118946 [event_method]: 4.399e-05 [auto_monad]: 0.00012716 [graph_reusing]: 8.57e-06 [inline]: 1.88997e-06 [add_attr]: 0.00306612, [1] [add_attr_with_inline]: 0.00305827, [1] [Cycle 1]: 6.862e-05, [2] [tag_attr]: 3.15e-05 [meta_addattr_fg_expand]: 9.49999e-06 [parallel-infer-symbol]: 3.2e-06 [pre_auto_parallel]: 4.695e-05 [insert-virtual-dataset]: 2.44999e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.0166123, [53] [py_interpret_to_execute]: 3.606e-05 [rewriter_before_opt_a]: 0.00014796 [opt_a]: 0.014398, [3] [Cycle 1]: 0.0109727, [45] [expand_dump_flag]: 3.89002e-06 [switch_simplify]: 7.259e-05 [loop_unroll]: 6e-05 [a_1]: 0.00140377 [with_stream_mark]: 2.331e-05 [recompute_prepare]: 2.255e-05 [updatestate_depend_eliminate]: 8.52998e-06 [updatestate_assign_eliminate]: 7.48e-06 [updatestate_loads_eliminate]: 6.83998e-06 [parameter_eliminate]: 2.67001e-06 [a_2]: 0.00024623 [accelerated_algorithm]: 3.19e-05 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 3.58e-06 [shard_inline]: 1.591e-05 [merge_send_recv]: 1.629e-05 [auto_parallel]: 1.035e-05 [parallel]: 2.392e-05 [flash_sp]: 1.193e-05 [merge_comm]: 9.26998e-06 [allreduce_fusion]: 8.48999e-06 [matmul_add_comm_reduction]: 2.59e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.745e-05 [virtual_dataset]: 1.58e-05 [get_grad_eliminate_]: 1.511e-05 [virtual_output]: 1.514e-05 [merge_forward]: 9.28002e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 1.718e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.928e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 2.854e-05 [set_forward_comm_id_for_comm_node_pass]: 9.66998e-06 [meta_fg_expand]: 0.00146152 [flash_sp_send_recv_attached]: 3.73001e-06 [receive_attached]: 2.58998e-06 [after_resolve]: 
6.449e-05 [a_after_grad]: 8.827e-05 [renormalize]: 0.00628569 [add_forward_monad_depend]: 9.61003e-06 [auto_monad_grad]: 6.46999e-06 [auto_monad_eliminator]: 5.114e-05 [cse]: 0.0001867 [a_3]: 0.00033584 [Cycle 2]: 0.00272391, [45] [expand_dump_flag]: 1.67001e-06 [switch_simplify]: 4.6e-05 [loop_unroll]: 4.247e-05 [a_1]: 0.00133519 [with_stream_mark]: 1.241e-05 [recompute_prepare]: 9.00001e-06 [updatestate_depend_eliminate]: 4.25e-06 [updatestate_assign_eliminate]: 3.42002e-06 [updatestate_loads_eliminate]: 2.70002e-06 [parameter_eliminate]: 1.24e-06 [a_2]: 9.074e-05 [accelerated_algorithm]: 1.048e-05 [shard]: 1.52999e-06 [meta_shard_fg_expand]: 2.05002e-06 [shard_inline]: 6.91999e-06 [merge_send_recv]: 7.06001e-06 [auto_parallel]: 7.66001e-06 [parallel]: 5.80002e-06 [flash_sp]: 3.48999e-06 [merge_comm]: 3.90998e-06 [allreduce_fusion]: 3.62998e-06 [matmul_add_comm_reduction]: 7.31001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 7.71999e-06 [virtual_dataset]: 6.36e-06 [get_grad_eliminate_]: 6.26e-06 [virtual_output]: 6.05002e-06 [merge_forward]: 3.45e-06 [cell_reuse_recompute_pass]: 8.50006e-07 [offload_activation]: 8.39998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.335e-05 [merge_recompute_call_nodes]: 1.00001e-06 [before_grad]: 1.232e-05 [set_forward_comm_id_for_comm_node_pass]: 4.57e-06 [meta_fg_expand]: 5.446e-05 [flash_sp_send_recv_attached]: 1.17999e-06 [receive_attached]: 1.44e-06 [after_resolve]: 1.189e-05 [a_after_grad]: 1.007e-05 [renormalize]: 0.00063174 [add_forward_monad_depend]: 4.4e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.134e-05 [cse]: 2.158e-05 [a_3]: 4.832e-05 [Cycle 3]: 0.00068554, [45] [expand_dump_flag]: 8.00006e-07 [switch_simplify]: 8.27e-06 [loop_unroll]: 6.78e-06 [a_1]: 0.00014688 [with_stream_mark]: 8.64998e-06 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.81e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 8.65e-05 [accelerated_algorithm]: 9.81e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.33002e-06 [shard_inline]: 6.79001e-06 [merge_send_recv]: 5.86998e-06 [auto_parallel]: 6.24001e-06 [parallel]: 5.35999e-06 [flash_sp]: 9.09989e-07 [merge_comm]: 3.85e-06 [allreduce_fusion]: 3.65998e-06 [matmul_add_comm_reduction]: 5.71003e-06 [allreduce_slice_to_reducescatter]: 3.99974e-07 [virtual_shard_identity]: 8.35001e-06 [virtual_dataset]: 6.32001e-06 [get_grad_eliminate_]: 6.38e-06 [virtual_output]: 6.06998e-06 [merge_forward]: 3.46001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 7.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.266e-05 [merge_recompute_call_nodes]: 8.09989e-07 [before_grad]: 1.084e-05 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 2.46e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.39996e-07 [after_resolve]: 8.80999e-06 [a_after_grad]: 9.45001e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 7.46999e-06 [cse]: 1.677e-05 [a_3]: 3.995e-05 [py_interpret_to_execute_after_opt_a]: 1.128e-05 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 4.033e-05 [convert_after_rewriter]: 7.83999e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00054711 [opt_b]: 0.00021984, [1] [Cycle 1]: 0.00021308, [7] [b_1]: 0.00013446 [b_2]: 8.47998e-06 [updatestate_depend_eliminate]: 5.86e-06 [updatestate_assign_eliminate]: 2.96999e-06 [updatestate_loads_eliminate]: 2.61e-06 [renormalize]: 5.50004e-07 [cse]: 2.245e-05 [optimize_parallel_all_gather_comm]: 2.378e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.291e-05 [loop_unroll]: 0.00044005 [opt_after_cconv]: 0.00011128, [1] [Cycle 1]: 0.00010532, [7] [c_1]: 3.332e-05 [parameter_eliminate]: 2.59999e-06 [updatestate_depend_eliminate]: 5.60001e-06 [updatestate_assign_eliminate]: 3.14001e-06 
[updatestate_loads_eliminate]: 2.86999e-06 [cse]: 2.117e-05 [renormalize]: 6.60017e-07 [remove_dup_value]: 1.671e-05 [tuple_transform]: 7.757e-05, [1] [Cycle 1]: 7.305e-05, [4] [d_1]: 4.525e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 7.3e-06 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 5.309e-05 [cse_after_recomputation]: 2.614e-05, [1] [Cycle 1]: 2.153e-05, [1] [cse]: 1.557e-05 [environ_conv]: 8.35001e-06 [swap_dp_allreduce_reducescatter]: 6.21e-06 [bias_add_comm_swap]: 3.25e-06 [label_micro_interleaved_index]: 4.50001e-06 [label_fine_grained_interleaved_index]: 3.04001e-06 [merge_cast_opt]: 1.72001e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.88998e-06 [assign_add_opt]: 1.71e-06 [ForceFp32Comm]: 1.10999e-06 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.56998e-06 [reorder_send_recv_between_fp_bp]: 2.78003e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.66e-06 [interleave_split_concat_branches]: 1.48002e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 [control_data_broadcast_order]: 1.421e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 4.47e-06 [overlap_recompute_and_grad_model_parallel]: 5.00999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 4.47e-06 [overlap_grad_flash_sp]: 2.138e-05 [begin_end_overlap_inline]: 8.09989e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 8.687e-05, [1] [Cycle 1]: 8.237e-05, [6] [build]: 9.91e-06 [elim_shapecalc]: 1.057e-05 [elim_not_effective]: 1.46e-05 [opt_reshape]: 7.15998e-06 [fold_const_symbol]: 1.142e-05 [renormalize]: 2.10013e-07 
[detach_backward]: 2.08998e-06 [pipeline_parallel_scheduler]: 1.88002e-06 [auto_monad_reorder]: 2.137e-05 [get_jit_bprop_graph]: 1.58002e-06 [rewriter_after_jit_bprop_graph]: 3.72998e-06 [opt_after_jit_grad]: 0.00047594 [validate]: 4.525e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.0442369 [execute]: 1.001e-05 Sums bootstrap : 0.000551s : 0.76% type_inference : 0.011895s : 16.29% event_method : 0.000044s : 0.06% auto_monad : 0.000127s : 0.17% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.05% optimize.rewriter_before_opt_a : 0.000148s : 0.20% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000127s : 0.17% optimize.opt_a.loop_unroll : 0.000109s : 0.15% optimize.opt_a.a_1 : 0.002886s : 3.95% optimize.opt_a.with_stream_mark : 0.000044s : 0.06% optimize.opt_a.recompute_prepare : 0.000039s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000423s : 0.58% optimize.opt_a.accelerated_algorithm : 0.000052s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.04% optimize.opt_a.merge_send_recv : 0.000029s : 0.04% optimize.opt_a.auto_parallel : 0.000024s : 0.03% optimize.opt_a.parallel : 0.000035s : 0.05% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000017s 
: 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.05% optimize.opt_a.virtual_dataset : 0.000028s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.04% optimize.opt_a.virtual_output : 0.000027s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000033s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000055s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000052s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001518s : 2.08% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000085s : 0.12% optimize.opt_a.a_after_grad : 0.000108s : 0.15% optimize.opt_a.renormalize : 0.006918s : 9.48% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000070s : 0.10% optimize.opt_a.cse : 0.000225s : 0.31% optimize.opt_a.a_3 : 0.000424s : 0.58% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.06% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000547s : 0.75% optimize.opt_b.b_1 : 0.000134s : 0.18% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000022s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000440s : 0.60% optimize.opt_after_cconv.c_1 : 0.000033s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.07% optimize.cse_after_recomputation.cse : 0.000016s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000002s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000476s : 0.65% validate : 0.000045s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044237s : 60.60% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000694 159 7.09% : 0.000049s : 7: substitution.arithmetic_simplify 0.37% : 0.000003s : 3: substitution.elim_not_effective 0.63% : 0.000004s : 5: substitution.float_depend_g_call 0.63% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 3: substitution.fold_const_symbol 0.83% : 0.000006s : 4: substitution.graph_param_transform 0.42% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 57.50% : 0.000399s : 17: substitution.inline 2.34% : 0.000016s : 2: substitution.inline_without_move 1.48% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.23% : 0.000015s : 3: substitution.less_batch_normalization 1.56% : 0.000011s : 7: substitution.minmaximum_grad 0.91% : 0.000006s : 5: substitution.partial_eliminate 1.80% : 0.000013s : 15: substitution.remove_not_recompute_node 3.81% : 0.000026s : 10: substitution.replace_applicator 1.32% : 0.000009s : 10: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.13% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 2.03% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.33% : 0.000051s : 18: substitution.tuple_list_get_item_eliminator 2.11% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011820 2 87.53% : 0.010347s : 1: type_inference.infer 12.47% : 0.001473s : 1: type_inference.specialize ------[replace.] 0.000188 26 65.74% : 0.000124s : 17: replace.inline 34.26% : 0.000065s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000414 26 94.15% : 0.000390s : 17: match.inline 5.85% : 0.000024s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000680 4180 1.11% : 0.000008s : 52: predicate.accumulaten_eliminater 0.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.12% : 0.000008s : 52: predicate.addn_zero_filter 1.09% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 1.96% : 0.000013s : 73: predicate.arithmetic_simplify 1.18% : 0.000008s : 52: predicate.cast_eliminate 1.15% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.16% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.22% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.19% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.11% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_depend_swap 1.71% : 0.000012s : 77: predicate.environ_get_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.84% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.50% : 0.000017s : 78: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.60% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.52% : 0.000004s : 21: predicate.incorporate_call 0.47% : 0.000003s : 21: predicate.incorporate_call_switch 5.76% : 0.000039s : 180: predicate.inline 1.47% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.64% : 0.000004s : 21: predicate.less_batch_normalization 1.52% : 
0.000010s : 69: predicate.list_to_tuple_eliminator_ 2.66% : 0.000018s : 121: predicate.load_eliminater 0.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.55% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.36% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.12% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 52: predicate.minmaximum_grad 0.32% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.08% : 0.000014s : 78: predicate.partial_defer_inline 1.68% : 0.000011s : 65: predicate.partial_eliminate 1.11% : 0.000008s : 52: predicate.print_const_string_wrapper 0.49% : 0.000003s : 21: predicate.reduce_all_const_elim 1.40% : 0.000010s : 52: predicate.reduce_eliminate 2.68% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 21: predicate.remove_not_recompute_node 1.90% : 0.000013s : 111: predicate.replace_applicator 0.66% : 0.000005s : 45: predicate.replace_old_param 0.09% : 0.000001s : 4: predicate.reset_defer_inline 1.17% : 0.000008s : 52: predicate.reshape_eliminate 1.12% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.31% : 0.000009s : 50: predicate.same_eliminate 0.36% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.57% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000001s : 8: predicate.special_op_eliminate 0.63% : 0.000004s : 21: predicate.specialize_transform 1.26% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.95% : 0.000013s : 78: predicate.switch_defer_inline 3.04% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.21% : 0.000035s : 213: 
predicate.switch_simplify 1.12% : 0.000008s : 52: predicate.tile_eliminate 1.13% : 0.000008s : 52: predicate.transpose_eliminate 1.45% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000018s : 90: predicate.tuple_list_get_item_eliminator 1.46% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.56% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.59% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.15% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001717 35 59.69% : 0.001025s : 14: func_graph_cloner_run.FuncGraphClonerGraph 40.31% : 0.000692s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.108553 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.83% : 0.003071s : 1: add_attr 2.82% : 0.003062s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000135s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.54% : 0.000588s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000029s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.05% : 0.000051s : 1: event_method 0.02% : 0.000018s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000449s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.51% : 0.000557s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 4.03% : 0.004374s : 117: opt.transform.opt_a 0.03% : 0.000032s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000115s : 28: opt.transform.opt_b 0.05% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000040s : 4: opt.transform.symbol_engine_opt 13.27% : 0.014401s : 1: opt_a 0.11% : 0.000115s : 1: opt_after_cconv 0.45% : 0.000486s : 1: opt_after_jit_grad 0.21% : 0.000223s : 1: opt_b 15.31% : 0.016617s : 1: optimize 0.03% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000052s : 1: pre_auto_parallel 0.04% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 4.86% : 0.005277s : 2: renormalize.infer 1.50% : 0.001627s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000044s : 1: rewriter_after_opt_a 0.14% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000090s : 1: symbol_engine_optimizer 40.77% : 0.044261s : 1: task_emit 0.07% : 0.000080s : 1: 
tuple_transform 10.97% : 0.011911s : 1: type_inference 0.06% : 0.000069s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x1-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x2-pynative],max_mem:6.0M TotalTime = 0.0222137, [24] [bootstrap]: 0.00055653 [type_inference]: 0.0063664 [event_method]: 1.456e-05 [auto_monad]: 6.043e-05 [graph_reusing]: 6.06e-06 [inline]: 1.74e-06 [add_attr]: 0.00360139, [1] [add_attr_with_inline]: 0.0035903, [1] [Cycle 1]: 4.91e-05, [2] [tag_attr]: 1.633e-05 [meta_addattr_fg_expand]: 4.18999e-06 [parallel-infer-symbol]: 3.68999e-06 [pre_auto_parallel]: 2.592e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 9.30013e-07 [dataset_repeat_opt]: 2.41998e-06 [pipeline_split]: 2.02999e-06 [optimize]: 0.00413561, [53] [py_interpret_to_execute]: 2.121e-05 [rewriter_before_opt_a]: 6.243e-05 [opt_a]: 0.00224997, [2] [Cycle 1]: 0.0016324, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 3.288e-05 [loop_unroll]: 2.099e-05 [a_1]: 0.00044407 [with_stream_mark]: 1.424e-05 [recompute_prepare]: 8.05e-06 [updatestate_depend_eliminate]: 3.78999e-06 [updatestate_assign_eliminate]: 3.66001e-06 [updatestate_loads_eliminate]: 3.52002e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 8.232e-05 [accelerated_algorithm]: 6.64999e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.76e-06 [merge_send_recv]: 8.25999e-06 [auto_parallel]: 6.04001e-06 [parallel]: 2.912e-05 [flash_sp]: 7.78001e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.45003e-06 [matmul_add_comm_reduction]: 9.04998e-06 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 7.84002e-06 [virtual_dataset]: 6.42001e-06 [get_grad_eliminate_]: 6.14001e-06 
[virtual_output]: 6.09999e-06 [merge_forward]: 3.98001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 1.037e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.208e-05 [merge_recompute_call_nodes]: 1.58002e-06 [before_grad]: 1.007e-05 [set_forward_comm_id_for_comm_node_pass]: 4.3e-06 [meta_fg_expand]: 2.74999e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.34999e-06 [after_resolve]: 9.44e-06 [a_after_grad]: 8.67998e-06 [renormalize]: 0.00046813 [add_forward_monad_depend]: 8.65999e-06 [auto_monad_grad]: 1.77001e-06 [auto_monad_eliminator]: 1.448e-05 [cse]: 2.894e-05 [a_3]: 4.315e-05 [Cycle 2]: 0.00060793, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 7.01999e-06 [loop_unroll]: 5.64998e-06 [a_1]: 0.00011516 [with_stream_mark]: 1.053e-05 [recompute_prepare]: 6.04001e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 7.181e-05 [accelerated_algorithm]: 5.83002e-06 [shard]: 1.30999e-06 [meta_shard_fg_expand]: 1.16002e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.80001e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 2.93e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.14999e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.17999e-06 [virtual_output]: 5.12999e-06 [merge_forward]: 2.62001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 5.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.05e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.74e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.1e-06 [a_after_grad]: 8.18999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 
1.21002e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 6.86001e-06 [cse]: 1.456e-05 [a_3]: 3.323e-05 [py_interpret_to_execute_after_opt_a]: 7.7e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 3.014e-05 [convert_after_rewriter]: 6.93e-06 [order_py_execute_after_rewriter]: 5.49998e-06 [mutable_eliminate]: 0.00045925 [opt_b]: 0.0001895, [1] [Cycle 1]: 0.00018296, [7] [b_1]: 0.00011191 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.51e-06 [renormalize]: 3.70026e-07 [cse]: 1.821e-05 [optimize_parallel_all_gather_comm]: 1.638e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 2.338e-05 [loop_unroll]: 0.00041681 [opt_after_cconv]: 9.664e-05, [1] [Cycle 1]: 9.06e-05, [7] [c_1]: 2.588e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.48998e-06 [cse]: 1.803e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.433e-05 [tuple_transform]: 7.028e-05, [1] [Cycle 1]: 6.573e-05, [4] [d_1]: 3.752e-05 [none_parameter_eliminate]: 1.96e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 7.08998e-06 [partial_unused_args_eliminate]: 1.93002e-06 [add_recomputation]: 4.272e-05 [cse_after_recomputation]: 2.16e-05, [1] [Cycle 1]: 1.685e-05, [1] [cse]: 1.165e-05 [environ_conv]: 8.75999e-06 [swap_dp_allreduce_reducescatter]: 5.46002e-06 [bias_add_comm_swap]: 2.30002e-06 [label_micro_interleaved_index]: 4.57e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.24003e-06 [slice_recompute_activation]: 2.14999e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.47999e-06 [ForceFp32Comm]: 1.07998e-06 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.42e-06 [add_comm_op_reuse_tag]: 1.02e-06 
[interleave_split_concat_branches]: 1.40001e-06 [interleave_parallel_branches]: 1.40001e-06 [overlap_opt_shard_in_pipeline]: 1.33002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.23e-05 [grouped_pairwise_exchange_alltoall]: 1.51002e-06 [offloading_packed_experts]: 3.85998e-06 [overlap_recompute_and_grad_model_parallel]: 4.68999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.38998e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.761e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.29999e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 7.169e-05, [1] [Cycle 1]: 6.752e-05, [6] [build]: 2.66e-06 [elim_shapecalc]: 8.70001e-06 [elim_not_effective]: 1.235e-05 [opt_reshape]: 6.34001e-06 [fold_const_symbol]: 9.94001e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 0.00012416 [opt_after_jit_grad]: 0.00045842 [validate]: 3.505e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00657754 [execute]: 7.88001e-06 Sums bootstrap : 0.000557s : 3.17% type_inference : 0.006366s : 36.23% event_method : 0.000015s : 0.08% auto_monad : 0.000060s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000062s : 0.36% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000559s : 3.18% optimize.opt_a.with_stream_mark : 0.000025s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000154s : 0.88% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000034s : 0.19% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000018s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000468s : 2.66% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000076s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000459s : 2.61% optimize.opt_b.b_1 : 0.000112s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000417s : 2.37% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.24% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000009s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000124s : 0.71% opt_after_jit_grad : 0.000458s : 2.61% validate : 0.000035s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006578s : 37.43% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000171 26 18.68% : 0.000032s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000002s : 2: substitution.fold_const_symbol 3.22% : 0.000005s : 3: substitution.graph_param_transform 64.15% : 0.000110s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.01% : 0.000005s : 4: substitution.remove_not_recompute_node 1.81% : 0.000003s : 2: substitution.replace_old_param 5.25% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006317 2 90.11% : 0.005692s : 1: type_inference.infer 9.89% : 0.000625s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.24% : 0.000029s : 3: replace.inline 20.76% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 4 92.93% : 0.000108s : 3: match.inline 7.07% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 883 0.96% : 0.000001s : 9: predicate.accumulaten_eliminater 0.83% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.93% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 15: predicate.arithmetic_simplify 0.91% : 0.000001s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.59% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.96% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_depend_swap 1.79% : 0.000003s : 18: predicate.environ_get_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 13: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.89% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.36% : 0.000010s : 40: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 6: predicate.less_batch_normalization 1.70% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.46% : 0.000004s : 25: predicate.load_eliminater 0.91% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.22% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.12% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.50% : 0.000002s : 13: predicate.partial_eliminate 0.91% : 0.000001s : 9: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.19% : 0.000002s : 9: predicate.reduce_eliminate 2.43% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.42% : 0.000001s : 6: predicate.remove_not_recompute_node 1.29% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.56% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.92% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 6: predicate.shard_identity_eliminate 0.73% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.71% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 13: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.23% : 0.000008s : 43: predicate.switch_simplify 0.98% : 
0.000002s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 15: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000369 8 45.69% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.31% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031504 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.45% : 0.003606s : 1: add_attr 11.41% : 0.003594s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.15% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000065s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.88% : 0.000591s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000012s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000005s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.35% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000468s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.97% : 0.000937s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.15% : 0.002253s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.49% : 0.000469s : 1: opt_after_jit_grad 0.61% : 0.000193s : 1: opt_b 13.14% : 0.004139s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.77% : 0.000242s : 1: renormalize.infer 0.70% : 0.000220s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.41% : 0.000130s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000067s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000074s : 1: symbol_engine_optimizer 20.91% : 0.006589s : 1: task_emit 0.23% : 0.000073s : 1: 
tuple_transform 20.25% : 0.006381s : 1: type_inference 0.22% : 0.000068s : 1: validate TotalTime = 0.0209605, [24] [bootstrap]: 0.00040613 [type_inference]: 0.00631843 [event_method]: 1.254e-05 [auto_monad]: 5.982e-05 [graph_reusing]: 4.82998e-06 [inline]: 2.78e-06 [add_attr]: 0.00307197, [1] [add_attr_with_inline]: 0.00306336, [1] [Cycle 1]: 4.782e-05, [2] [tag_attr]: 1.283e-05 [meta_addattr_fg_expand]: 3.65e-06 [parallel-infer-symbol]: 3.33e-06 [pre_auto_parallel]: 2.273e-05 [insert-virtual-dataset]: 2.49999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.59998e-06 [optimize]: 0.00402536, [53] [py_interpret_to_execute]: 1.987e-05 [rewriter_before_opt_a]: 5.031e-05 [opt_a]: 0.00205471, [2] [Cycle 1]: 0.00143219, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 2.755e-05 [loop_unroll]: 1.681e-05 [a_1]: 0.00034551 [with_stream_mark]: 1.287e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 8.257e-05 [accelerated_algorithm]: 6.54999e-06 [shard]: 1.75001e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 6.48e-06 [merge_send_recv]: 7.18e-06 [auto_parallel]: 6.33e-06 [parallel]: 1.558e-05 [flash_sp]: 6.83e-06 [merge_comm]: 4.42e-06 [allreduce_fusion]: 3.68999e-06 [matmul_add_comm_reduction]: 8.32e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.80002e-06 [virtual_output]: 6.12001e-06 [merge_forward]: 3.33e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 8.75999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.256e-05 [merge_recompute_call_nodes]: 1.15001e-06 [before_grad]: 1.009e-05 [set_forward_comm_id_for_comm_node_pass]: 3.63999e-06 [meta_fg_expand]: 2.59001e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 1.85001e-06 [after_resolve]: 
1.041e-05 [a_after_grad]: 8.64003e-06 [renormalize]: 0.00042375 [add_forward_monad_depend]: 4.74e-06 [auto_monad_grad]: 2.29001e-06 [auto_monad_eliminator]: 1.429e-05 [cse]: 3.071e-05 [a_3]: 5.262e-05 [Cycle 2]: 0.00061182, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 7.13998e-06 [loop_unroll]: 5.66e-06 [a_1]: 0.00011305 [with_stream_mark]: 1.114e-05 [recompute_prepare]: 6.21998e-06 [updatestate_depend_eliminate]: 3.12002e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.78003e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 7.266e-05 [accelerated_algorithm]: 6.02999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 5.07999e-06 [auto_parallel]: 6.01998e-06 [parallel]: 4.28001e-06 [flash_sp]: 3.58999e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 3.18998e-06 [matmul_add_comm_reduction]: 5.33002e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.30001e-06 [virtual_output]: 5.06997e-06 [merge_forward]: 2.78e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 7.01001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.039e-05 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 8.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 8.70001e-07 [after_resolve]: 8.65999e-06 [a_after_grad]: 7.92e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14003e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 7.9e-06 [cse]: 1.397e-05 [a_3]: 3.362e-05 [py_interpret_to_execute_after_opt_a]: 8.45001e-06 [slice_cell_reuse_recomputed_activation]: 2.74999e-06 [rewriter_after_opt_a]: 3.446e-05 [convert_after_rewriter]: 6.64001e-06 [order_py_execute_after_rewriter]: 5.30999e-06 [mutable_eliminate]: 0.00051674 [opt_b]: 0.00019299, [1] [Cycle 1]: 0.0001861, [7] [b_1]: 
0.0001128 [b_2]: 7.45e-06 [updatestate_depend_eliminate]: 5.87999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.40021e-07 [cse]: 1.857e-05 [optimize_parallel_all_gather_comm]: 1.8e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.548e-05 [loop_unroll]: 0.00043281 [opt_after_cconv]: 9.937e-05, [1] [Cycle 1]: 9.179e-05, [7] [c_1]: 2.552e-05 [parameter_eliminate]: 2.93998e-06 [updatestate_depend_eliminate]: 5.60001e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.43998e-06 [cse]: 1.773e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.546e-05 [tuple_transform]: 7.162e-05, [1] [Cycle 1]: 6.724e-05, [4] [d_1]: 3.967e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.61e-06 [partial_unused_args_eliminate]: 2.27001e-06 [add_recomputation]: 4.651e-05 [cse_after_recomputation]: 2.119e-05, [1] [Cycle 1]: 1.668e-05, [1] [cse]: 1.131e-05 [environ_conv]: 5.42999e-06 [swap_dp_allreduce_reducescatter]: 5.40999e-06 [bias_add_comm_swap]: 2.99999e-06 [label_micro_interleaved_index]: 4.12998e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.29999e-06 [micro_interleaved_order_control]: 2.30002e-06 [assign_add_opt]: 1.60001e-06 [ForceFp32Comm]: 1.01002e-06 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.24998e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.308e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.23001e-06 [overlap_recompute_and_grad_model_parallel]: 4.63001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.38999e-06 [overlap_grad_flash_sp]: 2.012e-05 [begin_end_overlap_inline]: 8.39995e-07 [split_matmul_comm_elemetwise]: 2.57001e-06 [split_layernorm_comm]: 1.91998e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 7.248e-05, [1] [Cycle 1]: 6.779e-05, [6] [build]: 3.11001e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.259e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 9.77001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.00002e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.717e-05 [get_jit_bprop_graph]: 1.17e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00047916 [validate]: 3.873e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00626762 [execute]: 7.68001e-06 Sums bootstrap : 0.000406s : 2.41% type_inference : 0.006318s : 37.45% event_method : 0.000013s : 0.07% auto_monad : 0.000060s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.13% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000050s : 0.30% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000035s : 0.21% optimize.opt_a.loop_unroll : 0.000022s : 0.13% optimize.opt_a.a_1 : 0.000459s : 2.72% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000020s : 0.12% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000424s : 2.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.13% optimize.opt_a.cse : 0.000045s : 0.26% optimize.opt_a.a_3 : 0.000086s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000034s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000517s : 3.06% optimize.opt_b.b_1 : 0.000113s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.15% optimize.loop_unroll : 0.000433s : 2.57% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000020s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000479s : 2.84% validate : 0.000039s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.006268s : 37.15% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000136 24 19.72% : 0.000027s : 4: substitution.arithmetic_simplify 1.59% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000002s : 2: substitution.fold_const_symbol 4.41% : 0.000006s : 3: substitution.graph_param_transform 64.78% : 0.000088s : 3: substitution.inline 2.23% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.34% : 0.000005s : 4: substitution.remove_not_recompute_node 2.73% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006268 2 92.41% : 0.005792s : 1: type_inference.infer 7.59% : 0.000476s : 1: type_inference.specialize ------[replace.] 0.000026 3 100.00% : 0.000026s : 3: replace.inline ------[match.] 0.000086 3 100.00% : 0.000086s : 3: match.inline ------[predicate.] 
0.000147 815 0.90% : 0.000001s : 8: predicate.accumulaten_eliminater 0.85% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.80% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.71% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.84% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 1.73% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.26% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.27% : 0.000000s : 3: predicate.graph_param_transform 0.77% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.34% : 0.000009s : 37: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 6: predicate.less_batch_normalization 1.52% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 22: predicate.load_eliminater 1.08% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.99% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.15% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.53% : 0.000001s : 3: predicate.parallel_virtual_node 1.44% : 0.000002s : 11: predicate.partial_defer_inline 1.37% : 0.000002s : 11: predicate.partial_eliminate 0.85% : 0.000001s : 8: predicate.print_const_string_wrapper 0.88% : 0.000001s : 6: predicate.reduce_all_const_elim 1.05% : 0.000002s : 8: predicate.reduce_eliminate 2.22% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 14: predicate.replace_applicator 0.89% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 8: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 6: predicate.shard_identity_eliminate 0.91% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 2.10% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.86% : 0.000007s : 38: predicate.switch_simplify 0.84% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.54% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000296 7 36.64% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.36% : 0.000188s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029477 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.44% : 0.003077s : 1: add_attr 10.40% : 0.003067s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.02% : 0.000005s : 1: assign_add_opt 0.22% : 0.000065s : 1: auto_monad 0.07% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.46% : 0.000432s : 1: bootstrap 0.10% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000442s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.79% : 0.000529s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.84% : 0.000837s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000092s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 6.98% : 0.002058s : 1: opt_a 0.35% : 0.000103s : 1: opt_after_cconv 1.66% : 0.000489s : 1: opt_after_jit_grad 0.67% : 0.000196s : 1: opt_b 13.67% : 0.004030s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.74% : 0.000218s : 1: renormalize.infer 0.68% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000039s : 1: rewriter_after_opt_a 0.18% : 0.000054s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 21.31% : 0.006280s : 1: task_emit 0.25% : 0.000075s : 1: 
tuple_transform 21.50% : 0.006338s : 1: type_inference 0.24% : 0.000069s : 1: validate TotalTime = 0.0209399, [24] [bootstrap]: 0.00047691 [type_inference]: 0.00583744 [event_method]: 1.457e-05 [auto_monad]: 6.127e-05 [graph_reusing]: 5.52001e-06 [inline]: 2.26e-06 [add_attr]: 0.00316431, [1] [add_attr_with_inline]: 0.00315529, [1] [Cycle 1]: 5.435e-05, [2] [tag_attr]: 1.542e-05 [meta_addattr_fg_expand]: 4.64002e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 2.669e-05 [insert-virtual-dataset]: 2.69001e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.43e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00426321, [53] [py_interpret_to_execute]: 2.21e-05 [rewriter_before_opt_a]: 6.532e-05 [opt_a]: 0.00230239, [2] [Cycle 1]: 0.00164858, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.331e-05 [loop_unroll]: 2.065e-05 [a_1]: 0.00044564 [with_stream_mark]: 1.458e-05 [recompute_prepare]: 7.7e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 3.62002e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 8.147e-05 [accelerated_algorithm]: 6.78998e-06 [shard]: 2.34001e-06 [meta_shard_fg_expand]: 1.97999e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 9.37001e-06 [auto_parallel]: 7.01999e-06 [parallel]: 1.88e-05 [flash_sp]: 7.22002e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.65e-06 [matmul_add_comm_reduction]: 9.72999e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 6.84001e-06 [virtual_dataset]: 6.03002e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 6.02999e-06 [merge_forward]: 3.73001e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.045e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.128e-05 [merge_recompute_call_nodes]: 1.89e-06 [before_grad]: 1.023e-05 [set_forward_comm_id_for_comm_node_pass]: 3.71001e-06 [meta_fg_expand]: 2.63e-06 [flash_sp_send_recv_attached]: 2.43998e-06 [receive_attached]: 1.91e-06 
[after_resolve]: 9.64999e-06 [a_after_grad]: 8.23001e-06 [renormalize]: 0.00052973 [add_forward_monad_depend]: 4.82e-06 [auto_monad_grad]: 2.81999e-06 [auto_monad_eliminator]: 1.308e-05 [cse]: 3.109e-05 [a_3]: 4.27e-05 [Cycle 2]: 0.00064292, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 7.06001e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00011467 [with_stream_mark]: 1.01e-05 [recompute_prepare]: 5.97999e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 7.224e-05 [accelerated_algorithm]: 5.67999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 4.97e-06 [auto_parallel]: 5.33002e-06 [parallel]: 4.45999e-06 [flash_sp]: 3.52002e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 3.03e-06 [matmul_add_comm_reduction]: 5.60001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 4.054e-05 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.98e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.042e-05 [merge_recompute_call_nodes]: 8.30012e-07 [before_grad]: 8.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61001e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 1.18001e-06 [after_resolve]: 8.18001e-06 [a_after_grad]: 7.84002e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.85002e-06 [cse]: 1.543e-05 [a_3]: 3.275e-05 [py_interpret_to_execute_after_opt_a]: 8.30999e-06 [slice_cell_reuse_recomputed_activation]: 2.27001e-06 [rewriter_after_opt_a]: 3.435e-05 [convert_after_rewriter]: 7.2e-06 [order_py_execute_after_rewriter]: 5.01002e-06 [mutable_eliminate]: 0.00049312 [opt_b]: 0.00019231, [1] [Cycle 1]: 0.0001851, [7] 
[b_1]: 0.0001125 [b_2]: 7.15e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.51998e-06 [renormalize]: 3.39991e-07 [cse]: 1.845e-05 [optimize_parallel_all_gather_comm]: 1.665e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.545e-05 [loop_unroll]: 0.00044093 [opt_after_cconv]: 9.698e-05, [1] [Cycle 1]: 9.068e-05, [7] [c_1]: 2.579e-05 [parameter_eliminate]: 2.49001e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.732e-05 [renormalize]: 5.89993e-07 [remove_dup_value]: 1.392e-05 [tuple_transform]: 7.044e-05, [1] [Cycle 1]: 6.553e-05, [4] [d_1]: 3.769e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.66e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.617e-05 [cse_after_recomputation]: 2.11e-05, [1] [Cycle 1]: 1.644e-05, [1] [cse]: 1.107e-05 [environ_conv]: 5.71e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.77998e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.39e-06 [full_micro_interleaved_order_control]: 2.54999e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.22e-06 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 1.18001e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.271e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 4.51002e-06 [overlap_recompute_and_grad_model_parallel]: 4.63001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.62999e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.19002e-06 [overlap_grad_flash_sp]: 1.9e-05 [begin_end_overlap_inline]: 9.20001e-07 [split_matmul_comm_elemetwise]: 2.60997e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 9.90025e-07 [symbol_engine_optimizer]: 7.238e-05, [1] [Cycle 1]: 6.724e-05, [6] [build]: 2.90998e-06 [elim_shapecalc]: 8.96002e-06 [elim_not_effective]: 1.191e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 9.62999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.04999e-06 [pipeline_parallel_scheduler]: 1.84e-06 [auto_monad_reorder]: 1.663e-05 [get_jit_bprop_graph]: 1.60999e-06 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00047803 [validate]: 3.8e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00630762 [execute]: 7.85998e-06 Sums bootstrap : 0.000477s : 2.85% type_inference : 0.005837s : 34.85% event_method : 0.000015s : 0.09% auto_monad : 0.000061s : 0.37% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000065s : 0.39% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000560s : 3.34% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000154s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000047s : 0.28% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000530s : 3.16% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000047s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000493s : 2.94% optimize.opt_b.b_1 : 0.000112s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.15% optimize.loop_unroll : 0.000441s : 2.63% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000478s : 2.85% validate : 0.000038s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.006308s : 37.65% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000174 26 18.65% : 0.000033s : 5: substitution.arithmetic_simplify 1.24% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000002s : 2: substitution.fold_const_symbol 3.42% : 0.000006s : 3: substitution.graph_param_transform 64.27% : 0.000112s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000005s : 4: substitution.remove_not_recompute_node 1.81% : 0.000003s : 2: substitution.replace_old_param 5.18% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005786 2 89.80% : 0.005196s : 1: type_inference.infer 10.20% : 0.000590s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.45% : 0.000029s : 3: replace.inline 20.55% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 4 93.16% : 0.000110s : 3: match.inline 6.84% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 0.89% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.10% : 0.000003s : 15: predicate.arithmetic_simplify 0.92% : 0.000001s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.02% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_depend_swap 1.74% : 0.000003s : 18: predicate.environ_get_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.73% : 0.000001s : 6: predicate.get_grad_eliminate 0.20% : 0.000000s : 3: predicate.graph_param_transform 0.67% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.34% : 0.000010s : 40: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 25: predicate.load_eliminater 0.97% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.58% : 0.000002s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 1.00% : 0.000002s : 9: predicate.print_const_string_wrapper 0.63% : 0.000001s : 6: predicate.reduce_all_const_elim 1.14% : 0.000002s : 9: predicate.reduce_eliminate 2.50% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.30% : 0.000002s : 16: predicate.replace_applicator 0.78% : 0.000001s : 6: predicate.replace_old_param 0.26% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 6: predicate.shard_identity_eliminate 0.92% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.71% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 43: predicate.switch_simplify 0.87% : 
0.000001s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.70% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000367 8 45.19% : 0.000166s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.81% : 0.000201s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030015 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.56% : 0.003169s : 1: add_attr 10.53% : 0.003159s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000067s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.71% : 0.000514s : 1: bootstrap 0.10% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.50% : 0.000449s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.68% : 0.000503s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.22% : 0.000967s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.68% : 0.002306s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.62% : 0.000488s : 1: opt_after_jit_grad 0.65% : 0.000196s : 1: opt_b 14.22% : 0.004268s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.91% : 0.000274s : 1: renormalize.infer 0.82% : 0.000247s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.23% : 0.000069s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000075s : 1: symbol_engine_optimizer 21.06% : 0.006322s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 19.51% : 0.005855s : 1: type_inference 0.22% : 0.000067s : 1: validate TotalTime = 0.0428138, [24] [bootstrap]: 0.00049192 [type_inference]: 0.0123115 [event_method]: 5.033e-05 [auto_monad]: 0.0001402 [graph_reusing]: 9.15999e-06 [inline]: 2.29001e-06 [add_attr]: 0.00331282, [1] [add_attr_with_inline]: 0.00330376, [1] [Cycle 1]: 7.886e-05, [2] [tag_attr]: 3.579e-05 [meta_addattr_fg_expand]: 1.001e-05 [parallel-infer-symbol]: 3.87998e-06 [pre_auto_parallel]: 5.151e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.39001e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.0182922, [53] [py_interpret_to_execute]: 4.132e-05 [rewriter_before_opt_a]: 0.00015841 [opt_a]: 0.0157772, [3] [Cycle 1]: 0.0119439, [45] [expand_dump_flag]: 4.68001e-06 [switch_simplify]: 7.713e-05 [loop_unroll]: 6.375e-05 [a_1]: 0.00153268 [with_stream_mark]: 2.722e-05 [recompute_prepare]: 2.31e-05 [updatestate_depend_eliminate]: 8.52e-06 [updatestate_assign_eliminate]: 7.68001e-06 [updatestate_loads_eliminate]: 6.78e-06 [parameter_eliminate]: 2.84001e-06 [a_2]: 0.00024855 [accelerated_algorithm]: 3.324e-05 [shard]: 2.26e-06 [meta_shard_fg_expand]: 3.95e-06 [shard_inline]: 1.611e-05 [merge_send_recv]: 1.737e-05 [auto_parallel]: 1.153e-05 [parallel]: 2.116e-05 [flash_sp]: 1.211e-05 [merge_comm]: 9.44e-06 [allreduce_fusion]: 8.65999e-06 [matmul_add_comm_reduction]: 3.074e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.828e-05 [virtual_dataset]: 1.534e-05 [get_grad_eliminate_]: 1.506e-05 [virtual_output]: 1.534e-05 [merge_forward]: 9.20999e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 1.886e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.985e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 2.915e-05 [set_forward_comm_id_for_comm_node_pass]: 9.86e-06 [meta_fg_expand]: 0.00162372 [flash_sp_send_recv_attached]: 4.77e-06 [receive_attached]: 2.21e-06 [after_resolve]: 
6.805e-05 [a_after_grad]: 8.97e-05 [renormalize]: 0.00689582 [add_forward_monad_depend]: 1.107e-05 [auto_monad_grad]: 6.73e-06 [auto_monad_eliminator]: 5.409e-05 [cse]: 0.00019179 [a_3]: 0.00034583 [Cycle 2]: 0.0031005, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 4.686e-05 [loop_unroll]: 4.332e-05 [a_1]: 0.00141945 [with_stream_mark]: 1.885e-05 [recompute_prepare]: 1.146e-05 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.14001e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 9.608e-05 [accelerated_algorithm]: 1.355e-05 [shard]: 2.34999e-06 [meta_shard_fg_expand]: 2.14e-06 [shard_inline]: 8.23999e-06 [merge_send_recv]: 9.72001e-06 [auto_parallel]: 9.67001e-06 [parallel]: 7.45e-06 [flash_sp]: 4.33999e-06 [merge_comm]: 4.42998e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 9.32001e-06 [allreduce_slice_to_reducescatter]: 7.80012e-07 [virtual_shard_identity]: 9.61e-06 [virtual_dataset]: 6.98998e-06 [get_grad_eliminate_]: 6.62002e-06 [virtual_output]: 6.48e-06 [merge_forward]: 4.24002e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 1.021e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.473e-05 [merge_recompute_call_nodes]: 1.28002e-06 [before_grad]: 1.19e-05 [set_forward_comm_id_for_comm_node_pass]: 4.60001e-06 [meta_fg_expand]: 0.00010608 [flash_sp_send_recv_attached]: 1.73002e-06 [receive_attached]: 2.24001e-06 [after_resolve]: 1.559e-05 [a_after_grad]: 1.065e-05 [renormalize]: 0.00077575 [add_forward_monad_depend]: 5.11002e-06 [auto_monad_grad]: 2.43e-06 [auto_monad_eliminator]: 1.566e-05 [cse]: 3.061e-05 [a_3]: 5.133e-05 [Cycle 3]: 0.00071377, [45] [expand_dump_flag]: 1.53002e-06 [switch_simplify]: 8.60001e-06 [loop_unroll]: 6.63e-06 [a_1]: 0.00015329 [with_stream_mark]: 9.21002e-06 [recompute_prepare]: 7.35e-06 [updatestate_depend_eliminate]: 4.12998e-06 [updatestate_assign_eliminate]: 2.91e-06 [updatestate_loads_eliminate]: 2.76e-06 
[parameter_eliminate]: 1.17e-06 [a_2]: 8.784e-05 [accelerated_algorithm]: 9.97999e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.84001e-06 [merge_send_recv]: 6.96999e-06 [auto_parallel]: 8.40001e-06 [parallel]: 6.23998e-06 [flash_sp]: 1.42e-06 [merge_comm]: 3.80998e-06 [allreduce_fusion]: 3.86999e-06 [matmul_add_comm_reduction]: 6.52001e-06 [allreduce_slice_to_reducescatter]: 4.90021e-07 [virtual_shard_identity]: 8.17e-06 [virtual_dataset]: 6.64001e-06 [get_grad_eliminate_]: 6.14001e-06 [virtual_output]: 6.29001e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 8.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.279e-05 [merge_recompute_call_nodes]: 1.13001e-06 [before_grad]: 1.12e-05 [set_forward_comm_id_for_comm_node_pass]: 4.84e-06 [meta_fg_expand]: 2.86999e-06 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.17999e-06 [after_resolve]: 1.113e-05 [a_after_grad]: 9.47999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 1.39998e-06 [auto_monad_eliminator]: 8.69e-06 [cse]: 1.757e-05 [a_3]: 4.065e-05 [py_interpret_to_execute_after_opt_a]: 1.497e-05 [slice_cell_reuse_recomputed_activation]: 2.36998e-06 [rewriter_after_opt_a]: 4.66e-05 [convert_after_rewriter]: 8.18001e-06 [order_py_execute_after_rewriter]: 5.95002e-06 [mutable_eliminate]: 0.00070662 [opt_b]: 0.00024025, [1] [Cycle 1]: 0.00023085, [7] [b_1]: 0.0001376 [b_2]: 1.516e-05 [updatestate_depend_eliminate]: 7.82998e-06 [updatestate_assign_eliminate]: 3.28998e-06 [updatestate_loads_eliminate]: 3.33998e-06 [renormalize]: 8.49977e-07 [cse]: 2.561e-05 [optimize_parallel_all_gather_comm]: 2.024e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.917e-05 [loop_unroll]: 0.00046938 [opt_after_cconv]: 0.00013238, [1] [Cycle 1]: 0.00012534, [7] [c_1]: 4.802e-05 [parameter_eliminate]: 4.05e-06 [updatestate_depend_eliminate]: 6.83e-06 [updatestate_assign_eliminate]: 3.11999e-06 
[updatestate_loads_eliminate]: 2.63e-06 [cse]: 2.294e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 1.75e-05 [tuple_transform]: 8.405e-05, [1] [Cycle 1]: 7.874e-05, [4] [d_1]: 5.07e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.48999e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 6.05e-05 [cse_after_recomputation]: 2.547e-05, [1] [Cycle 1]: 2.075e-05, [1] [cse]: 1.482e-05 [environ_conv]: 1.032e-05 [swap_dp_allreduce_reducescatter]: 5.61998e-06 [bias_add_comm_swap]: 3.18e-06 [label_micro_interleaved_index]: 5.30999e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.52001e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 8.29983e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.21998e-06 [reorder_send_recv_between_fp_bp]: 2.98003e-06 [comm_op_add_attrs]: 1.43002e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.32999e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.496e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 4.51002e-06 [overlap_recompute_and_grad_model_parallel]: 5.00001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.46002e-06 [overlap_recompute_comm]: 2.70002e-06 [overlap_grad_ring_attention]: 4.89e-06 [overlap_grad_flash_sp]: 2.52e-05 [begin_end_overlap_inline]: 6.19999e-07 [split_matmul_comm_elemetwise]: 2.82002e-06 [split_layernorm_comm]: 2.12001e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 9.208e-05, [1] [Cycle 1]: 8.728e-05, [6] [build]: 1.004e-05 [elim_shapecalc]: 1.135e-05 [elim_not_effective]: 1.545e-05 [opt_reshape]: 7.26001e-06 [fold_const_symbol]: 1.213e-05 [renormalize]: 
2.50002e-07 [detach_backward]: 2.09999e-06 [pipeline_parallel_scheduler]: 1.80001e-06 [auto_monad_reorder]: 2.19e-05 [get_jit_bprop_graph]: 2.12001e-06 [rewriter_after_jit_bprop_graph]: 4.58999e-06 [opt_after_jit_grad]: 0.00053184 [validate]: 5.034e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00724918 [execute]: 1.006e-05 Sums bootstrap : 0.000492s : 1.29% type_inference : 0.012312s : 32.34% event_method : 0.000050s : 0.13% auto_monad : 0.000140s : 0.37% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000052s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.11% optimize.rewriter_before_opt_a : 0.000158s : 0.42% optimize.opt_a.expand_dump_flag : 0.000009s : 0.02% optimize.opt_a.switch_simplify : 0.000133s : 0.35% optimize.opt_a.loop_unroll : 0.000114s : 0.30% optimize.opt_a.a_1 : 0.003105s : 8.16% optimize.opt_a.with_stream_mark : 0.000055s : 0.15% optimize.opt_a.recompute_prepare : 0.000042s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.03% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000432s : 1.14% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.15% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000031s : 0.08% optimize.opt_a.merge_send_recv : 0.000034s : 0.09% optimize.opt_a.auto_parallel : 0.000030s : 0.08% optimize.opt_a.parallel : 0.000035s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm 
: 0.000018s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000047s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.09% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.07% optimize.opt_a.virtual_output : 0.000028s : 0.07% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000038s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.14% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.05% optimize.opt_a.meta_fg_expand : 0.001733s : 4.55% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000095s : 0.25% optimize.opt_a.a_after_grad : 0.000110s : 0.29% optimize.opt_a.renormalize : 0.007672s : 20.15% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.05% optimize.opt_a.auto_monad_grad : 0.000011s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000078s : 0.21% optimize.opt_a.cse : 0.000240s : 0.63% optimize.opt_a.a_3 : 0.000438s : 1.15% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.12% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000707s : 1.86% optimize.opt_b.b_1 : 0.000138s : 0.36% optimize.opt_b.b_2 : 0.000015s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s 
: 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000026s : 0.07% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000029s : 0.08% optimize.loop_unroll : 0.000469s : 1.23% optimize.opt_after_cconv.c_1 : 0.000048s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000023s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000018s : 0.05% optimize.tuple_transform.d_1 : 0.000051s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.16% optimize.cse_after_recomputation.cse : 0.000015s : 0.04% optimize.environ_conv : 0.000010s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.06% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000532s : 1.40% validate : 0.000050s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.007249s : 19.04% execute : 0.000010s : 0.03% Time group info: ------[substitution.] 
0.000836 161 7.62% : 0.000064s : 8: substitution.arithmetic_simplify 0.31% : 0.000003s : 3: substitution.elim_not_effective 0.58% : 0.000005s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.22% : 0.000002s : 3: substitution.fold_const_symbol 0.87% : 0.000007s : 4: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 60.58% : 0.000507s : 17: substitution.inline 2.13% : 0.000018s : 2: substitution.inline_without_move 1.27% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.10% : 0.000018s : 3: substitution.less_batch_normalization 1.36% : 0.000011s : 7: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.54% : 0.000013s : 15: substitution.remove_not_recompute_node 3.52% : 0.000029s : 10: substitution.replace_applicator 1.48% : 0.000012s : 10: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.68% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.35% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.79% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 6.60% : 0.000055s : 19: substitution.tuple_list_get_item_eliminator 1.81% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012217 2 86.71% : 0.010594s : 1: type_inference.infer 13.29% : 0.001623s : 1: type_inference.specialize ------[replace.] 0.000207 27 65.48% : 0.000136s : 17: replace.inline 34.52% : 0.000072s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000525 27 94.72% : 0.000497s : 17: match.inline 5.28% : 0.000028s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000762 4248 1.07% : 0.000008s : 53: predicate.accumulaten_eliminater 0.26% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.43% : 0.000003s : 21: predicate.addn_check_dump 1.06% : 0.000008s : 53: predicate.addn_zero_filter 1.00% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.82% : 0.000014s : 74: predicate.arithmetic_simplify 1.11% : 0.000008s : 53: predicate.cast_eliminate 1.02% : 0.000008s : 50: predicate.check_bprop_eliminate 0.44% : 0.000003s : 21: predicate.compare_switch_simplify 0.05% : 0.000000s : 4: predicate.const_output_eliminate 0.43% : 0.000003s : 21: predicate.depend_value_elim 1.13% : 0.000009s : 53: predicate.dict_get_item_const_eliminator 1.11% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.06% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000009s : 57: predicate.environ_add_const_eliminate 1.09% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.08% : 0.000008s : 57: predicate.environ_get_depend_swap 1.56% : 0.000012s : 78: predicate.environ_get_eliminate 1.11% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.30% : 0.000018s : 80: predicate.float_depend_g_call 0.46% : 0.000004s : 21: predicate.float_environ_get_switch 0.53% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.49% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000001s : 4: predicate.graph_param_transform 0.46% : 0.000004s : 21: predicate.incorporate_call 0.42% : 0.000003s : 21: predicate.incorporate_call_switch 5.53% : 0.000042s : 183: predicate.inline 1.31% : 0.000010s : 45: predicate.inline_without_move 0.26% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.60% : 0.000005s : 21: predicate.less_batch_normalization 1.47% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.48% : 0.000019s : 124: predicate.load_eliminater 0.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.38% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.31% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.45% : 0.000003s : 21: predicate.merge_addn 1.02% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.03% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 53: predicate.minmaximum_grad 0.43% : 0.000003s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 1.92% : 0.000015s : 80: predicate.partial_defer_inline 1.58% : 0.000012s : 67: predicate.partial_eliminate 1.03% : 0.000008s : 53: predicate.print_const_string_wrapper 0.44% : 0.000003s : 21: predicate.reduce_all_const_elim 1.29% : 0.000010s : 53: predicate.reduce_eliminate 2.45% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.29% : 0.000002s : 21: predicate.remove_not_recompute_node 1.77% : 0.000014s : 113: predicate.replace_applicator 0.68% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.05% : 0.000008s : 53: predicate.reshape_eliminate 1.02% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.12% : 0.000009s : 50: predicate.same_eliminate 0.30% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 21: predicate.shard_identity_eliminate 0.21% : 0.000002s : 8: predicate.special_op_eliminate 0.57% : 0.000004s : 21: predicate.specialize_transform 1.24% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 80: predicate.switch_defer_inline 2.78% : 0.000021s : 130: predicate.switch_layer_defer_inline 4.90% : 0.000037s : 218: 
predicate.switch_simplify 1.03% : 0.000008s : 53: predicate.tile_eliminate 1.03% : 0.000008s : 53: predicate.transpose_eliminate 1.35% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.63% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 6.40% : 0.000049s : 61: predicate.tuple_list_get_set_item_eliminator 1.80% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.46% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.42% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 4.60% : 0.000035s : 145: predicate.updatestate_useless_node_eliminater 0.10% : 0.000001s : 4: predicate.value_based_eliminate 0.47% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.48% : 0.000004s : 21: predicate.virtual_output_eliminate 0.08% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001861 36 61.01% : 0.001136s : 15: func_graph_cloner_run.FuncGraphClonerGraph 38.99% : 0.000726s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.076956 237 0.00% : 0.000004s : 1: ForceFp32Comm 4.31% : 0.003319s : 1: add_attr 4.30% : 0.003308s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000148s : 1: auto_monad 0.03% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.69% : 0.000533s : 1: bootstrap 0.04% : 0.000033s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000019s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.04% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.08% : 0.000058s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000014s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000005s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.62% : 0.000480s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.94% : 0.000720s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000020s : 1: opt.transform.mutable_eliminate 6.03% : 0.004642s : 117: opt.transform.opt_a 0.06% : 0.000046s : 1: opt.transform.opt_after_cconv 0.04% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000124s : 28: opt.transform.opt_b 0.07% : 0.000056s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000042s : 4: opt.transform.symbol_engine_opt 20.51% : 0.015780s : 1: opt_a 0.18% : 0.000136s : 1: opt_after_cconv 0.71% : 0.000545s : 1: opt_after_jit_grad 0.32% : 0.000244s : 1: opt_b 23.78% : 0.018298s : 1: optimize 0.03% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000057s : 1: pre_auto_parallel 0.06% : 0.000045s : 1: py_interpret_to_execute 0.02% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000021s : 1: remove_dup_value 7.82% : 0.006020s : 2: renormalize.infer 2.13% : 0.001635s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000051s : 1: rewriter_after_opt_a 0.21% : 0.000164s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000095s : 1: symbol_engine_optimizer 9.45% : 0.007270s : 1: task_emit 0.11% : 0.000087s : 1: 
tuple_transform 16.03% : 0.012336s : 1: type_inference 0.12% : 0.000094s : 1: validate TotalTime = 0.021298, [24] [bootstrap]: 0.00045739 [type_inference]: 0.00589346 [event_method]: 1.198e-05 [auto_monad]: 5.995e-05 [graph_reusing]: 5.79e-06 [inline]: 2.09999e-06 [add_attr]: 0.00316909, [1] [add_attr_with_inline]: 0.0031591, [1] [Cycle 1]: 6.346e-05, [2] [tag_attr]: 1.462e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 3.68999e-06 [pre_auto_parallel]: 2.707e-05 [insert-virtual-dataset]: 2.60002e-06 [parallel-infer-symbol-second]: 8.60018e-07 [dataset_repeat_opt]: 2.06998e-06 [pipeline_split]: 1.94999e-06 [optimize]: 0.00421533, [53] [py_interpret_to_execute]: 2.196e-05 [rewriter_before_opt_a]: 5.486e-05 [opt_a]: 0.00223526, [2] [Cycle 1]: 0.00155854, [45] [expand_dump_flag]: 3.11001e-06 [switch_simplify]: 2.913e-05 [loop_unroll]: 1.73e-05 [a_1]: 0.00036181 [with_stream_mark]: 1.744e-05 [recompute_prepare]: 8.70001e-06 [updatestate_depend_eliminate]: 4.48001e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 8.456e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.90001e-06 [shard_inline]: 6.79999e-06 [merge_send_recv]: 8.41002e-06 [auto_parallel]: 7.21999e-06 [parallel]: 1.882e-05 [flash_sp]: 8.89e-06 [merge_comm]: 4.2e-06 [allreduce_fusion]: 3.66999e-06 [matmul_add_comm_reduction]: 9.50001e-06 [allreduce_slice_to_reducescatter]: 1.16002e-06 [virtual_shard_identity]: 8.43999e-06 [virtual_dataset]: 6.41998e-06 [get_grad_eliminate_]: 5.60001e-06 [virtual_output]: 6.26998e-06 [merge_forward]: 4.97e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 9.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.337e-05 [merge_recompute_call_nodes]: 1.67999e-06 [before_grad]: 1.125e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88001e-06 [meta_fg_expand]: 2.53998e-06 [flash_sp_send_recv_attached]: 3.76001e-06 [receive_attached]: 
2.44001e-06 [after_resolve]: 1.058e-05 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00048557 [add_forward_monad_depend]: 5.47001e-06 [auto_monad_grad]: 2.60002e-06 [auto_monad_eliminator]: 1.591e-05 [cse]: 3.178e-05 [a_3]: 4.4e-05 [Cycle 2]: 0.00066499, [45] [expand_dump_flag]: 1.39998e-06 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00011728 [with_stream_mark]: 1.303e-05 [recompute_prepare]: 6.53e-06 [updatestate_depend_eliminate]: 3.3e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 8.483e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 1.54e-06 [meta_shard_fg_expand]: 1.36998e-06 [shard_inline]: 6.03002e-06 [merge_send_recv]: 5.17e-06 [auto_parallel]: 7.16999e-06 [parallel]: 6.01998e-06 [flash_sp]: 4.11001e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.12002e-06 [matmul_add_comm_reduction]: 6.78998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 7.33e-06 [virtual_dataset]: 5.52999e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 2.11e-06 [offload_activation]: 8.21002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.177e-05 [merge_recompute_call_nodes]: 8.79983e-07 [before_grad]: 8.57e-06 [set_forward_comm_id_for_comm_node_pass]: 4.13999e-06 [meta_fg_expand]: 1.91e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.72001e-06 [after_resolve]: 8.98002e-06 [a_after_grad]: 8.06001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 2.15002e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 9.75002e-06 [cse]: 1.767e-05 [a_3]: 3.329e-05 [py_interpret_to_execute_after_opt_a]: 1.025e-05 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.867e-05 [convert_after_rewriter]: 7.35e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00052878 [opt_b]: 0.00019556, [1] [Cycle 1]: 0.00018822, 
[7] [b_1]: 0.00011146 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 6.83e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 7.50006e-07 [cse]: 2.021e-05 [optimize_parallel_all_gather_comm]: 1.691e-05 [overlap_param_gather]: 1.76998e-06 [cconv]: 2.385e-05 [loop_unroll]: 0.00042613 [opt_after_cconv]: 9.697e-05, [1] [Cycle 1]: 9.107e-05, [7] [c_1]: 2.607e-05 [parameter_eliminate]: 2.86e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.753e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.477e-05 [tuple_transform]: 7.077e-05, [1] [Cycle 1]: 6.573e-05, [4] [d_1]: 3.915e-05 [none_parameter_eliminate]: 1.37999e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.54999e-06 [partial_unused_args_eliminate]: 2.15002e-06 [add_recomputation]: 4.851e-05 [cse_after_recomputation]: 2.069e-05, [1] [Cycle 1]: 1.607e-05, [1] [cse]: 1.083e-05 [environ_conv]: 5.89e-06 [swap_dp_allreduce_reducescatter]: 4.90999e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.36998e-06 [slice_recompute_activation]: 2.47001e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.28002e-06 [reorder_send_recv_between_fp_bp]: 2.88998e-06 [comm_op_add_attrs]: 1.09998e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.44e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.289e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 4.27e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.78e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.952e-05 [begin_end_overlap_inline]: 6.19999e-07 [split_matmul_comm_elemetwise]: 2.43998e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.195e-05, [1] [Cycle 1]: 6.743e-05, [6] [build]: 2.91e-06 [elim_shapecalc]: 8.85999e-06 [elim_not_effective]: 1.189e-05 [opt_reshape]: 6.48e-06 [fold_const_symbol]: 9.41998e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.19999e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.715e-05 [get_jit_bprop_graph]: 1.55001e-06 [rewriter_after_jit_bprop_graph]: 3.78999e-06 [opt_after_jit_grad]: 0.00046481 [validate]: 3.703e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.0066912 [execute]: 6.84001e-06 Sums bootstrap : 0.000457s : 2.68% type_inference : 0.005893s : 34.52% event_method : 0.000012s : 0.07% auto_monad : 0.000060s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000055s : 0.32% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000036s : 0.21% optimize.opt_a.loop_unroll : 0.000023s : 0.13% optimize.opt_a.a_1 : 0.000479s : 2.81% optimize.opt_a.with_stream_mark : 0.000030s : 0.18% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000169s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.08% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000014s : 0.08% optimize.opt_a.parallel : 0.000025s : 0.15% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000009s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000486s : 2.84% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.15% optimize.opt_a.cse : 0.000049s : 0.29% optimize.opt_a.a_3 : 0.000077s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000039s : 0.23% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000529s : 3.10% optimize.opt_b.b_1 : 0.000111s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000426s : 2.50% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000465s : 2.72% validate : 0.000037s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006691s : 39.19% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000151 24 21.51% : 0.000032s : 4: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 4.19% : 0.000006s : 3: substitution.graph_param_transform 64.42% : 0.000097s : 3: substitution.inline 2.30% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.51% : 0.000005s : 4: substitution.remove_not_recompute_node 1.89% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005845 2 92.11% : 0.005384s : 1: type_inference.infer 7.89% : 0.000461s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000095 3 100.00% : 0.000095s : 3: match.inline ------[predicate.] 
0.000152 815 0.86% : 0.000001s : 8: predicate.accumulaten_eliminater 1.01% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.81% : 0.000001s : 8: predicate.addn_zero_filter 0.74% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 14: predicate.arithmetic_simplify 0.85% : 0.000001s : 8: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.68% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 1.92% : 0.000003s : 17: predicate.environ_get_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.25% : 0.000003s : 11: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.47% : 0.000001s : 3: predicate.graph_param_transform 0.87% : 0.000001s : 6: predicate.incorporate_call 0.78% : 0.000001s : 6: predicate.incorporate_call_switch 6.29% : 0.000010s : 37: predicate.inline 1.10% : 0.000002s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 6: predicate.less_batch_normalization 1.53% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 22: predicate.load_eliminater 1.12% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.06% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 8: predicate.minmaximum_grad 1.67% : 0.000003s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.55% : 0.000001s : 3: predicate.parallel_virtual_node 1.54% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 11: predicate.partial_eliminate 0.80% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 8: predicate.reduce_eliminate 2.20% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 6: predicate.remove_not_recompute_node 1.37% : 0.000002s : 14: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.40% : 0.000001s : 3: predicate.reset_defer_inline 0.85% : 0.000001s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 3: predicate.row_tensor_eliminate 0.95% : 0.000001s : 6: predicate.same_eliminate 0.65% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 6: predicate.shard_identity_eliminate 0.72% : 0.000001s : 6: predicate.special_op_eliminate 0.93% : 0.000001s : 6: predicate.specialize_transform 1.16% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.71% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.81% : 0.000001s : 8: predicate.transpose_eliminate 1.50% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.06% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.95% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000297 7 37.82% : 0.000112s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.18% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030190 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.51% : 0.003174s : 1: add_attr 10.48% : 0.003163s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000066s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.63% : 0.000492s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.44% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.78% : 0.000538s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 2.88% : 0.000868s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.41% : 0.002238s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.57% : 0.000475s : 1: opt_after_jit_grad 0.66% : 0.000199s : 1: opt_b 13.98% : 0.004219s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000026s : 1: py_interpret_to_execute 0.05% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.84% : 0.000254s : 1: renormalize.infer 0.74% : 0.000223s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000043s : 1: rewriter_after_opt_a 0.20% : 0.000059s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000075s : 1: symbol_engine_optimizer 22.21% : 0.006707s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 19.59% : 0.005913s : 1: type_inference 0.22% : 0.000067s : 1: validate TotalTime = 0.0402479, [24] [bootstrap]: 0.00047726 [type_inference]: 0.0119282 [event_method]: 4.513e-05 [auto_monad]: 0.00013226 [graph_reusing]: 8.03999e-06 [inline]: 2.01998e-06 [add_attr]: 0.0031484, [1] [add_attr_with_inline]: 0.00313969, [1] [Cycle 1]: 7.139e-05, [2] [tag_attr]: 3.255e-05 [meta_addattr_fg_expand]: 9.72001e-06 [parallel-infer-symbol]: 3.38999e-06 [pre_auto_parallel]: 4.817e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.82001e-06 [optimize]: 0.0171231, [53] [py_interpret_to_execute]: 4.023e-05 [rewriter_before_opt_a]: 0.00015163 [opt_a]: 0.0148563, [3] [Cycle 1]: 0.0113482, [45] [expand_dump_flag]: 4.48999e-06 [switch_simplify]: 7.236e-05 [loop_unroll]: 6.084e-05 [a_1]: 0.00138785 [with_stream_mark]: 2.448e-05 [recompute_prepare]: 2.206e-05 [updatestate_depend_eliminate]: 8.73001e-06 [updatestate_assign_eliminate]: 7.31999e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 2.74001e-06 [a_2]: 0.00024642 [accelerated_algorithm]: 3.185e-05 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 3.93001e-06 [shard_inline]: 1.791e-05 [merge_send_recv]: 1.649e-05 [auto_parallel]: 1.112e-05 [parallel]: 1.866e-05 [flash_sp]: 1.193e-05 [merge_comm]: 9.91e-06 [allreduce_fusion]: 8.77e-06 [matmul_add_comm_reduction]: 3.015e-05 [allreduce_slice_to_reducescatter]: 7.49977e-07 [virtual_shard_identity]: 1.952e-05 [virtual_dataset]: 1.521e-05 [get_grad_eliminate_]: 1.496e-05 [virtual_output]: 1.539e-05 [merge_forward]: 9.17999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.839e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.027e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 2.887e-05 [set_forward_comm_id_for_comm_node_pass]: 1.024e-05 [meta_fg_expand]: 0.00151212 [flash_sp_send_recv_attached]: 3.68e-06 [receive_attached]: 2.44999e-06 
[after_resolve]: 6.496e-05 [a_after_grad]: 8.789e-05 [renormalize]: 0.00657932 [add_forward_monad_depend]: 1.196e-05 [auto_monad_grad]: 6.80002e-06 [auto_monad_eliminator]: 5.348e-05 [cse]: 0.00019035 [a_3]: 0.00034611 [Cycle 2]: 0.00281068, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 4.634e-05 [loop_unroll]: 4.257e-05 [a_1]: 0.00135811 [with_stream_mark]: 1.45e-05 [recompute_prepare]: 9.89001e-06 [updatestate_depend_eliminate]: 4.81002e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 0.00011911 [accelerated_algorithm]: 1.164e-05 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 2.43e-06 [shard_inline]: 7.01999e-06 [merge_send_recv]: 8.80999e-06 [auto_parallel]: 9.62001e-06 [parallel]: 8.69003e-06 [flash_sp]: 4.52e-06 [merge_comm]: 4.15e-06 [allreduce_fusion]: 3.79002e-06 [matmul_add_comm_reduction]: 8.62e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 8.26002e-06 [virtual_dataset]: 6.72002e-06 [get_grad_eliminate_]: 6.65002e-06 [virtual_output]: 6.36e-06 [merge_forward]: 4.14002e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.70002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.362e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 1.148e-05 [set_forward_comm_id_for_comm_node_pass]: 4.13001e-06 [meta_fg_expand]: 6.533e-05 [flash_sp_send_recv_attached]: 1.56002e-06 [receive_attached]: 2.12999e-06 [after_resolve]: 1.226e-05 [a_after_grad]: 1.04e-05 [renormalize]: 0.00063356 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.175e-05 [cse]: 2.125e-05 [a_3]: 4.839e-05 [Cycle 3]: 0.00068027, [45] [expand_dump_flag]: 1.07998e-06 [switch_simplify]: 8.18999e-06 [loop_unroll]: 6.93e-06 [a_1]: 0.00014799 [with_stream_mark]: 8.40001e-06 [recompute_prepare]: 6.87002e-06 [updatestate_depend_eliminate]: 3.67002e-06 [updatestate_assign_eliminate]: 2.81999e-06 [updatestate_loads_eliminate]: 
2.58998e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 8.72e-05 [accelerated_algorithm]: 9.94001e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 6.78e-06 [merge_send_recv]: 5.27999e-06 [auto_parallel]: 6.09001e-06 [parallel]: 5.14998e-06 [flash_sp]: 1.05001e-06 [merge_comm]: 3.62002e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 5.95002e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 6.52001e-06 [virtual_output]: 6.02999e-06 [merge_forward]: 2.93998e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 7.15e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.225e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.074e-05 [set_forward_comm_id_for_comm_node_pass]: 3.78001e-06 [meta_fg_expand]: 2.51e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.06002e-06 [after_resolve]: 8.54002e-06 [a_after_grad]: 9.68002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 7.36001e-06 [cse]: 1.621e-05 [a_3]: 3.95e-05 [py_interpret_to_execute_after_opt_a]: 1.28e-05 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 4.27e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.47999e-06 [mutable_eliminate]: 0.00059926 [opt_b]: 0.00021786, [1] [Cycle 1]: 0.0002105, [7] [b_1]: 0.0001351 [b_2]: 8.28001e-06 [updatestate_depend_eliminate]: 5.86e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.80002e-06 [renormalize]: 5.60016e-07 [cse]: 2.021e-05 [optimize_parallel_all_gather_comm]: 1.738e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.338e-05 [loop_unroll]: 0.00043337 [opt_after_cconv]: 0.0001172, [1] [Cycle 1]: 0.00011096, [7] [c_1]: 3.899e-05 [parameter_eliminate]: 2.49001e-06 [updatestate_depend_eliminate]: 6.06998e-06 [updatestate_assign_eliminate]: 3.36999e-06 
[updatestate_loads_eliminate]: 2.79999e-06 [cse]: 2.097e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.546e-05 [tuple_transform]: 7.877e-05, [1] [Cycle 1]: 7.401e-05, [4] [d_1]: 4.623e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 7.41001e-06 [partial_unused_args_eliminate]: 2.39001e-06 [add_recomputation]: 5.128e-05 [cse_after_recomputation]: 2.48e-05, [1] [Cycle 1]: 2e-05, [1] [cse]: 1.425e-05 [environ_conv]: 7.82002e-06 [swap_dp_allreduce_reducescatter]: 5.75001e-06 [bias_add_comm_swap]: 2.59999e-06 [label_micro_interleaved_index]: 4.45999e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 1.14998e-06 [remove_cast_before_assign_add]: 1.22999e-06 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 3.21999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.37e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.459e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 4.38001e-06 [overlap_recompute_and_grad_model_parallel]: 5.74e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.31998e-06 [overlap_grad_ring_attention]: 4.58001e-06 [overlap_grad_flash_sp]: 2.283e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.23002e-06 [symbol_engine_optimizer]: 8.602e-05, [1] [Cycle 1]: 8.158e-05, [6] [build]: 7.98999e-06 [elim_shapecalc]: 1.049e-05 [elim_not_effective]: 1.484e-05 [opt_reshape]: 7.96001e-06 [fold_const_symbol]: 1.136e-05 
[renormalize]: 2.29978e-07 [detach_backward]: 2.49999e-06 [pipeline_parallel_scheduler]: 1.78002e-06 [auto_monad_reorder]: 2.031e-05 [get_jit_bprop_graph]: 1.78002e-06 [rewriter_after_jit_bprop_graph]: 3.95998e-06 [opt_after_jit_grad]: 0.00047741 [validate]: 4.416e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00650413 [execute]: 7.70998e-06 Sums bootstrap : 0.000477s : 1.34% type_inference : 0.011928s : 33.39% event_method : 0.000045s : 0.13% auto_monad : 0.000132s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.13% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000040s : 0.11% optimize.rewriter_before_opt_a : 0.000152s : 0.42% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000127s : 0.36% optimize.opt_a.loop_unroll : 0.000110s : 0.31% optimize.opt_a.a_1 : 0.002894s : 8.10% optimize.opt_a.with_stream_mark : 0.000047s : 0.13% optimize.opt_a.recompute_prepare : 0.000039s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000453s : 1.27% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.15% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000032s : 0.09% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000027s : 0.08% optimize.opt_a.parallel : 0.000033s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% 
optimize.opt_a.merge_comm : 0.000018s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000045s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.10% optimize.opt_a.virtual_dataset : 0.000028s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000028s : 0.08% optimize.opt_a.merge_forward : 0.000016s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000051s : 0.14% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.05% optimize.opt_a.meta_fg_expand : 0.001580s : 4.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000086s : 0.24% optimize.opt_a.a_after_grad : 0.000108s : 0.30% optimize.opt_a.renormalize : 0.007213s : 20.19% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000073s : 0.20% optimize.opt_a.cse : 0.000228s : 0.64% optimize.opt_a.a_3 : 0.000434s : 1.21% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000043s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000599s : 1.68% optimize.opt_b.b_1 : 0.000135s : 0.38% optimize.opt_b.b_2 : 0.000008s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.07% optimize.loop_unroll : 0.000433s : 1.21% optimize.opt_after_cconv.c_1 : 0.000039s : 0.11% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000021s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.04% optimize.tuple_transform.d_1 : 0.000046s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.14% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.06% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000477s : 1.34% validate : 0.000044s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006504s : 18.21% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000734 159 7.28% : 0.000053s : 7: substitution.arithmetic_simplify 0.37% : 0.000003s : 3: substitution.elim_not_effective 0.66% : 0.000005s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 3: substitution.fold_const_symbol 0.93% : 0.000007s : 4: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.36% : 0.000003s : 2: substitution.incorporate_call_switch 57.32% : 0.000421s : 17: substitution.inline 2.30% : 0.000017s : 2: substitution.inline_without_move 1.40% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.29% : 0.000017s : 3: substitution.less_batch_normalization 1.51% : 0.000011s : 7: substitution.minmaximum_grad 0.87% : 0.000006s : 5: substitution.partial_eliminate 1.68% : 0.000012s : 15: substitution.remove_not_recompute_node 3.97% : 0.000029s : 10: substitution.replace_applicator 1.30% : 0.000010s : 10: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.15% : 0.000023s : 7: substitution.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 2.11% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 7.36% : 0.000054s : 18: substitution.tuple_list_get_item_eliminator 2.09% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011850 2 87.69% : 0.010391s : 1: type_inference.infer 12.31% : 0.001459s : 1: type_inference.specialize ------[replace.] 0.000188 26 66.13% : 0.000124s : 17: replace.inline 33.87% : 0.000064s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000435 26 94.34% : 0.000411s : 17: match.inline 5.66% : 0.000025s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000687 4180 1.10% : 0.000008s : 52: predicate.accumulaten_eliminater 0.32% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.16% : 0.000008s : 52: predicate.addn_zero_filter 1.08% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 1.96% : 0.000013s : 73: predicate.arithmetic_simplify 1.18% : 0.000008s : 52: predicate.cast_eliminate 1.11% : 0.000008s : 50: predicate.check_bprop_eliminate 0.48% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.17% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.20% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_depend_swap 1.71% : 0.000012s : 77: predicate.environ_get_eliminate 1.20% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.81% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.50% : 0.000017s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.54% : 0.000004s : 21: predicate.incorporate_call 0.46% : 0.000003s : 21: predicate.incorporate_call_switch 5.99% : 0.000041s : 180: predicate.inline 1.45% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.61% : 0.000004s : 21: predicate.less_batch_normalization 1.61% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.62% : 0.000018s : 121: predicate.load_eliminater 0.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.60% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.37% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.09% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 52: predicate.minmaximum_grad 0.28% : 0.000002s : 4: predicate.mutable_eliminate 0.15% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.09% : 0.000014s : 78: predicate.partial_defer_inline 1.71% : 0.000012s : 65: predicate.partial_eliminate 1.09% : 0.000008s : 52: predicate.print_const_string_wrapper 0.50% : 0.000003s : 21: predicate.reduce_all_const_elim 1.39% : 0.000010s : 52: predicate.reduce_eliminate 2.57% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.29% : 0.000002s : 21: predicate.remove_not_recompute_node 1.95% : 0.000013s : 111: predicate.replace_applicator 0.69% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.10% : 0.000008s : 52: predicate.reshape_eliminate 1.11% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.23% : 0.000008s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.62% : 0.000004s : 21: predicate.shard_identity_eliminate 0.21% : 0.000001s : 8: predicate.special_op_eliminate 0.62% : 0.000004s : 21: predicate.specialize_transform 1.28% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.91% : 0.000013s : 78: predicate.switch_defer_inline 3.01% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.19% : 0.000036s : 213: 
predicate.switch_simplify 1.12% : 0.000008s : 52: predicate.tile_eliminate 1.08% : 0.000007s : 52: predicate.transpose_eliminate 1.43% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.50% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.96% : 0.000013s : 81: predicate.tuple_list_set_item_eliminator 1.54% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.57% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.18% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.53% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001735 35 61.21% : 0.001062s : 14: func_graph_cloner_run.FuncGraphClonerGraph 38.79% : 0.000673s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072353 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.36% : 0.003153s : 1: add_attr 4.34% : 0.003143s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000140s : 1: auto_monad 0.09% : 0.000068s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.70% : 0.000504s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.07% : 0.000053s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.61% : 0.000442s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.84% : 0.000608s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.11% : 0.004423s : 117: opt.transform.opt_a 0.05% : 0.000038s : 1: opt.transform.opt_after_cconv 0.03% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000114s : 28: opt.transform.opt_b 0.07% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000041s : 4: opt.transform.symbol_engine_opt 20.54% : 0.014859s : 1: opt_a 0.17% : 0.000121s : 1: opt_after_cconv 0.67% : 0.000487s : 1: opt_after_jit_grad 0.31% : 0.000221s : 1: opt_b 23.67% : 0.017128s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000053s : 1: pre_auto_parallel 0.06% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 7.77% : 0.005618s : 2: renormalize.infer 2.18% : 0.001578s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000047s : 1: rewriter_after_opt_a 0.22% : 0.000157s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000089s : 1: symbol_engine_optimizer 9.01% : 0.006518s : 1: task_emit 0.11% : 0.000082s : 1: 
tuple_transform 16.51% : 0.011948s : 1: type_inference 0.10% : 0.000074s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x2-kbk],max_mem:6.0M TotalTime = 0.113269, [24] [bootstrap]: 0.0005764 [type_inference]: 0.00666103 [event_method]: 1.392e-05 [auto_monad]: 5.97e-05 [graph_reusing]: 5.78002e-06 [inline]: 1.92999e-06 [add_attr]: 0.00372788, [1] [add_attr_with_inline]: 0.00371552, [1] [Cycle 1]: 5.461e-05, [2] [tag_attr]: 1.505e-05 [meta_addattr_fg_expand]: 4.42e-06 [parallel-infer-symbol]: 3.62002e-06 [pre_auto_parallel]: 2.747e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.58e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00436871, [53] [py_interpret_to_execute]: 2.307e-05 [rewriter_before_opt_a]: 6.487e-05 [opt_a]: 0.00236743, [2] [Cycle 1]: 0.00168561, [45] [expand_dump_flag]: 3.08e-06 [switch_simplify]: 3.486e-05 [loop_unroll]: 2.041e-05 [a_1]: 0.00045526 [with_stream_mark]: 1.56e-05 [recompute_prepare]: 8.55001e-06 [updatestate_depend_eliminate]: 3.93001e-06 [updatestate_assign_eliminate]: 3.47002e-06 [updatestate_loads_eliminate]: 3.09001e-06 [parameter_eliminate]: 2.09e-06 [a_2]: 8.241e-05 [accelerated_algorithm]: 6.87002e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 6.29001e-06 [merge_send_recv]: 9.11998e-06 [auto_parallel]: 6.94001e-06 [parallel]: 2.674e-05 [flash_sp]: 8.10999e-06 [merge_comm]: 4.1e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 9.54e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.71999e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.82001e-06 [virtual_output]: 6.18002e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.04e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.195e-05 [merge_recompute_call_nodes]: 1.77999e-06 
[before_grad]: 1.026e-05 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 2.46e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 1.94999e-06 [after_resolve]: 1.019e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00053237 [add_forward_monad_depend]: 8.16002e-06 [auto_monad_grad]: 2.20002e-06 [auto_monad_eliminator]: 1.431e-05 [cse]: 3.167e-05 [a_3]: 4.29e-05 [Cycle 2]: 0.00067161, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.00011491 [with_stream_mark]: 1.1e-05 [recompute_prepare]: 5.83002e-06 [updatestate_depend_eliminate]: 2.93998e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.68998e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 7.136e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.67e-06 [auto_parallel]: 5.67999e-06 [parallel]: 4.15999e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.44001e-06 [allreduce_fusion]: 2.98998e-06 [matmul_add_comm_reduction]: 5.79e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.71999e-06 [virtual_dataset]: 5.58002e-06 [get_grad_eliminate_]: 5.18002e-06 [virtual_output]: 5.11002e-06 [merge_forward]: 2.97002e-06 [cell_reuse_recompute_pass]: 1.89e-06 [offload_activation]: 7.45e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.158e-05 [merge_recompute_call_nodes]: 9.79984e-07 [before_grad]: 1.029e-05 [set_forward_comm_id_for_comm_node_pass]: 4.1e-06 [meta_fg_expand]: 2.02999e-06 [flash_sp_send_recv_attached]: 1.01002e-06 [receive_attached]: 1.38002e-06 [after_resolve]: 9.69e-06 [a_after_grad]: 7.95e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 2.28002e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 6.81001e-06 [cse]: 1.512e-05 [a_3]: 3.3e-05 [py_interpret_to_execute_after_opt_a]: 1.085e-05 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.468e-05 
[convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.0005111 [opt_b]: 0.0001938, [1] [Cycle 1]: 0.00018719, [7] [b_1]: 0.00011226 [b_2]: 8.04002e-06 [updatestate_depend_eliminate]: 6.83e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 5.60016e-07 [cse]: 1.893e-05 [optimize_parallel_all_gather_comm]: 1.745e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.56e-05 [loop_unroll]: 0.00044096 [opt_after_cconv]: 9.951e-05, [1] [Cycle 1]: 9.368e-05, [7] [c_1]: 2.598e-05 [parameter_eliminate]: 2.99999e-06 [updatestate_depend_eliminate]: 5.49e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.927e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.472e-05 [tuple_transform]: 7.112e-05, [1] [Cycle 1]: 6.654e-05, [4] [d_1]: 3.874e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.71e-06 [partial_unused_args_eliminate]: 2.22999e-06 [add_recomputation]: 4.601e-05 [cse_after_recomputation]: 2.11e-05, [1] [Cycle 1]: 1.65e-05, [1] [cse]: 1.077e-05 [environ_conv]: 1.082e-05 [swap_dp_allreduce_reducescatter]: 5.49e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 5.22e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 1.42999e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.74e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.272e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 
[offloading_packed_experts]: 4.53001e-06 [overlap_recompute_and_grad_model_parallel]: 5.05001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.35999e-06 [overlap_grad_flash_sp]: 1.873e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.34999e-06 [split_layernorm_comm]: 2.02001e-06 [handle_group_info]: 1.47999e-06 [symbol_engine_optimizer]: 7.338e-05, [1] [Cycle 1]: 6.889e-05, [6] [build]: 2.53003e-06 [elim_shapecalc]: 9.37001e-06 [elim_not_effective]: 1.216e-05 [opt_reshape]: 6.52001e-06 [fold_const_symbol]: 9.66e-06 [renormalize]: 2.70025e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.663e-05 [get_jit_bprop_graph]: 1.67001e-06 [rewriter_after_jit_bprop_graph]: 4.13999e-06 [opt_after_jit_grad]: 0.00048268 [validate]: 3.825e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.0970261 [execute]: 9.89001e-06 Sums bootstrap : 0.000576s : 0.53% type_inference : 0.006661s : 6.14% event_method : 0.000014s : 0.01% auto_monad : 0.000060s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000027s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.02% optimize.rewriter_before_opt_a : 0.000065s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000570s : 0.53% optimize.opt_a.with_stream_mark : 0.000027s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% 
optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000013s : 0.01% optimize.opt_a.parallel : 0.000031s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000532s : 0.49% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 
0.000021s : 0.02% optimize.opt_a.cse : 0.000047s : 0.04% optimize.opt_a.a_3 : 0.000076s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000511s : 0.47% optimize.opt_b.b_1 : 0.000112s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.02% optimize.loop_unroll : 0.000441s : 0.41% optimize.opt_after_cconv.c_1 : 0.000026s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000011s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% 
optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% 
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000483s : 0.45% validate : 0.000038s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.097026s : 89.47% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000180 26 19.37% : 0.000035s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.16% : 0.000006s : 3: substitution.graph_param_transform 64.44% : 0.000116s : 3: substitution.inline 1.81% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.67% : 0.000005s : 4: substitution.remove_not_recompute_node 2.06% : 0.000004s : 2: substitution.replace_old_param 4.66% : 0.000008s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006606 2 90.92% : 0.006006s : 1: type_inference.infer 9.08% : 0.000600s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.85% : 0.000030s : 3: replace.inline 20.15% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 4 93.69% : 0.000114s : 3: match.inline 6.31% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 9: predicate.addn_zero_filter 0.86% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.07% : 0.000003s : 15: predicate.arithmetic_simplify 0.99% : 0.000002s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.59% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.98% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_depend_swap 1.97% : 0.000003s : 18: predicate.environ_get_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.42% : 0.000004s : 13: predicate.float_depend_g_call 0.55% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.69% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.58% : 0.000011s : 40: predicate.inline 0.87% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 6: predicate.less_batch_normalization 1.73% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 25: predicate.load_eliminater 1.18% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.63% : 0.000003s : 13: predicate.partial_defer_inline 1.48% : 0.000002s : 13: predicate.partial_eliminate 0.85% : 0.000001s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 6: predicate.reduce_all_const_elim 1.20% : 0.000002s : 9: predicate.reduce_eliminate 2.40% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 6: predicate.remove_not_recompute_node 1.29% : 0.000002s : 16: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.97% : 0.000002s : 9: predicate.reshape_eliminate 0.56% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.57% : 0.000001s : 3: predicate.row_tensor_eliminate 0.80% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 0.82% : 0.000001s : 6: predicate.specialize_transform 0.89% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.71% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 13: predicate.switch_defer_inline 1.99% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 43: predicate.switch_simplify 0.89% : 
0.000001s : 9: predicate.tile_eliminate 0.93% : 0.000002s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 15: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.57% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.03% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000367 8 46.96% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.04% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.122993 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.04% : 0.003733s : 1: add_attr 3.02% : 0.003719s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000065s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.49% : 0.000607s : 1: bootstrap 0.02% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000014s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000005s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.37% : 0.000451s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.42% : 0.000521s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.77% : 0.000952s : 78: opt.transform.opt_a 0.02% : 0.000025s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000092s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.93% : 0.002371s : 1: opt_a 0.08% : 0.000103s : 1: opt_after_cconv 0.40% : 0.000493s : 1: opt_after_jit_grad 0.16% : 0.000197s : 1: opt_b 3.56% : 0.004373s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000027s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000018s : 1: remove_dup_value 0.23% : 0.000282s : 1: renormalize.infer 0.20% : 0.000243s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000039s : 1: rewriter_after_opt_a 0.06% : 0.000069s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000076s : 1: symbol_engine_optimizer 78.91% : 0.097050s : 1: task_emit 0.06% : 0.000074s : 1: 
tuple_transform 5.43% : 0.006679s : 1: type_inference 0.06% : 0.000071s : 1: validate TotalTime = 0.100713, [24] [bootstrap]: 0.00050373 [type_inference]: 0.00673824 [event_method]: 1.262e-05 [auto_monad]: 6.112e-05 [graph_reusing]: 5.81e-06 [inline]: 2.09e-06 [add_attr]: 0.00321866, [1] [add_attr_with_inline]: 0.00320995, [1] [Cycle 1]: 5.481e-05, [2] [tag_attr]: 1.589e-05 [meta_addattr_fg_expand]: 4.38001e-06 [parallel-infer-symbol]: 3.14001e-06 [pre_auto_parallel]: 2.529e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.0041525, [53] [py_interpret_to_execute]: 2.026e-05 [rewriter_before_opt_a]: 5.49e-05 [opt_a]: 0.00218926, [2] [Cycle 1]: 0.00157127, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 2.848e-05 [loop_unroll]: 1.702e-05 [a_1]: 0.0003628 [with_stream_mark]: 1.612e-05 [recompute_prepare]: 7.88999e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.73001e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 8.175e-05 [accelerated_algorithm]: 6.41998e-06 [shard]: 2.18998e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 9.00999e-06 [auto_parallel]: 6.40002e-06 [parallel]: 1.965e-05 [flash_sp]: 7.93001e-06 [merge_comm]: 3.64002e-06 [allreduce_fusion]: 3.55003e-06 [matmul_add_comm_reduction]: 9.61e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 7.41001e-06 [virtual_dataset]: 6.15002e-06 [get_grad_eliminate_]: 5.72999e-06 [virtual_output]: 5.81e-06 [merge_forward]: 4.26001e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.124e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 1.009e-05 [set_forward_comm_id_for_comm_node_pass]: 3.71999e-06 [meta_fg_expand]: 2.51998e-06 [flash_sp_send_recv_attached]: 2.66999e-06 [receive_attached]: 
2.31e-06 [after_resolve]: 9.88998e-06 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00054149 [add_forward_monad_depend]: 4.51002e-06 [auto_monad_grad]: 2.34001e-06 [auto_monad_eliminator]: 1.412e-05 [cse]: 3.139e-05 [a_3]: 4.344e-05 [Cycle 2]: 0.00060851, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 6.88998e-06 [loop_unroll]: 5.84999e-06 [a_1]: 0.00011388 [with_stream_mark]: 9.60001e-06 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 3.09999e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 7.233e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 1.40001e-06 [meta_shard_fg_expand]: 1.39998e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 5.07999e-06 [auto_parallel]: 5.72001e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.75e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.86e-06 [matmul_add_comm_reduction]: 8.12e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.34999e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.27999e-06 [merge_forward]: 2.91999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.024e-05 [merge_recompute_call_nodes]: 9.09989e-07 [before_grad]: 8.56002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55998e-06 [meta_fg_expand]: 1.92001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.90025e-07 [after_resolve]: 8.33999e-06 [a_after_grad]: 7.92e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.41e-06 [cse]: 1.424e-05 [a_3]: 3.32e-05 [py_interpret_to_execute_after_opt_a]: 8.66997e-06 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 3.451e-05 [convert_after_rewriter]: 6.49999e-06 [order_py_execute_after_rewriter]: 5.24e-06 [mutable_eliminate]: 0.00050945 [opt_b]: 0.00019161, [1] [Cycle 1]: 
0.00018498, [7] [b_1]: 0.00011248 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 6.05002e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 4.50003e-07 [cse]: 1.931e-05 [optimize_parallel_all_gather_comm]: 1.668e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.559e-05 [loop_unroll]: 0.00043843 [opt_after_cconv]: 9.799e-05, [1] [Cycle 1]: 9.194e-05, [7] [c_1]: 2.543e-05 [parameter_eliminate]: 2.84001e-06 [updatestate_depend_eliminate]: 6.01e-06 [updatestate_assign_eliminate]: 2.98e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.815e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.469e-05 [tuple_transform]: 7.016e-05, [1] [Cycle 1]: 6.565e-05, [4] [d_1]: 3.827e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.72002e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 4.686e-05 [cse_after_recomputation]: 2.137e-05, [1] [Cycle 1]: 1.685e-05, [1] [cse]: 1.148e-05 [environ_conv]: 5.36002e-06 [swap_dp_allreduce_reducescatter]: 5.42999e-06 [bias_add_comm_swap]: 2.76999e-06 [label_micro_interleaved_index]: 4.75001e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.52001e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 1.14e-06 [remove_cast_before_assign_add]: 1.37e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.98998e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.10001e-06 [interleave_split_concat_branches]: 1.32999e-06 [interleave_parallel_branches]: 1.34e-06 [overlap_opt_shard_in_pipeline]: 1.44e-06 [overlap_opt_shard_grad_in_pipeline]: 2.09e-06 [control_data_broadcast_order]: 1.27e-05 [grouped_pairwise_exchange_alltoall]: 1.82001e-06 [offloading_packed_experts]: 4.11001e-06 [overlap_recompute_and_grad_model_parallel]: 4.67998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27999e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 4.47998e-06 [overlap_grad_flash_sp]: 1.81e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 2.12001e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 7.28e-05, [1] [Cycle 1]: 6.807e-05, [6] [build]: 2.71e-06 [elim_shapecalc]: 8.87e-06 [elim_not_effective]: 1.253e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 9.42001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.94e-06 [auto_monad_reorder]: 1.604e-05 [get_jit_bprop_graph]: 1.29e-06 [rewriter_after_jit_bprop_graph]: 4.18001e-06 [opt_after_jit_grad]: 0.0004778 [validate]: 3.72e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0851998 [execute]: 1.011e-05 Sums bootstrap : 0.000504s : 0.52% type_inference : 0.006738s : 6.98% event_method : 0.000013s : 0.01% auto_monad : 0.000061s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000055s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000035s : 0.04% optimize.opt_a.loop_unroll : 0.000023s : 0.02% optimize.opt_a.a_1 : 0.000477s : 0.49% optimize.opt_a.with_stream_mark : 0.000026s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000542s : 0.56% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000046s : 0.05% optimize.opt_a.a_3 : 0.000077s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000509s : 0.53% optimize.opt_b.b_1 : 0.000112s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.03% optimize.loop_unroll : 0.000438s : 0.45% optimize.opt_after_cconv.c_1 : 0.000025s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000478s : 0.50% validate : 0.000037s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.085200s : 88.32% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000147 24 20.28% : 0.000030s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000006s : 3: substitution.graph_param_transform 65.93% : 0.000097s : 3: substitution.inline 2.10% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.08% : 0.000005s : 4: substitution.remove_not_recompute_node 1.97% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006685 2 92.57% : 0.006188s : 1: type_inference.infer 7.43% : 0.000497s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000095 3 100.00% : 0.000095s : 3: match.inline ------[predicate.] 
0.000149 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 0.91% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.22% : 0.000003s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.94% : 0.000003s : 17: predicate.environ_get_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.77% : 0.000001s : 6: predicate.get_grad_eliminate 0.27% : 0.000000s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.31% : 0.000009s : 37: predicate.inline 0.88% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 6: predicate.less_batch_normalization 1.69% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 22: predicate.load_eliminater 1.12% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.05% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.83% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.38% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.42% : 0.000002s : 11: predicate.partial_defer_inline 1.34% : 0.000002s : 11: predicate.partial_eliminate 1.05% : 0.000002s : 8: predicate.print_const_string_wrapper 0.74% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 8: predicate.reduce_eliminate 2.33% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 6: predicate.remove_not_recompute_node 1.26% : 0.000002s : 14: predicate.replace_applicator 0.79% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000001s : 8: predicate.reshape_eliminate 0.69% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.90% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 6: predicate.shard_identity_eliminate 0.81% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 1.21% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.89% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.69% : 0.000007s : 38: predicate.switch_simplify 0.85% : 
0.000001s : 8: predicate.tile_eliminate 0.91% : 0.000001s : 8: predicate.transpose_eliminate 1.54% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.62% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.36% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000332 7 39.77% : 0.000132s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.23% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.109618 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.94% : 0.003223s : 1: add_attr 2.93% : 0.003214s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000067s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.50% : 0.000546s : 1: bootstrap 0.03% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.41% : 0.000447s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.47% : 0.000519s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.77% : 0.000843s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 2.00% : 0.002193s : 1: opt_a 0.09% : 0.000101s : 1: opt_after_cconv 0.44% : 0.000487s : 1: opt_after_jit_grad 0.18% : 0.000195s : 1: opt_b 3.79% : 0.004157s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.29% : 0.000315s : 1: renormalize.infer 0.20% : 0.000219s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000039s : 1: rewriter_after_opt_a 0.05% : 0.000059s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000076s : 1: symbol_engine_optimizer 77.75% : 0.085223s : 1: task_emit 0.07% : 0.000073s : 1: 
tuple_transform 6.16% : 0.006756s : 1: type_inference 0.06% : 0.000062s : 1: validate TotalTime = 0.098417, [24] [bootstrap]: 0.00038551 [type_inference]: 0.0054669 [event_method]: 1.438e-05 [auto_monad]: 5.869e-05 [graph_reusing]: 5.39e-06 [inline]: 2.07999e-06 [add_attr]: 0.00305519, [1] [add_attr_with_inline]: 0.00304762, [1] [Cycle 1]: 4.642e-05, [2] [tag_attr]: 1.481e-05 [meta_addattr_fg_expand]: 4.28999e-06 [parallel-infer-symbol]: 3.25e-06 [pre_auto_parallel]: 2.575e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00412128, [53] [py_interpret_to_execute]: 2.236e-05 [rewriter_before_opt_a]: 6.251e-05 [opt_a]: 0.0021978, [2] [Cycle 1]: 0.0015735, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 3.432e-05 [loop_unroll]: 2.07e-05 [a_1]: 0.00043378 [with_stream_mark]: 1.267e-05 [recompute_prepare]: 8.25999e-06 [updatestate_depend_eliminate]: 4.32998e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 1.82999e-06 [a_2]: 8.183e-05 [accelerated_algorithm]: 6.71e-06 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 7.81001e-06 [auto_parallel]: 5.87999e-06 [parallel]: 1.826e-05 [flash_sp]: 7.31999e-06 [merge_comm]: 3.81999e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 9.43002e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 6.05002e-06 [get_grad_eliminate_]: 5.48002e-06 [virtual_output]: 5.99e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.61e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.158e-05 [merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 9.92999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.61999e-06 [flash_sp_send_recv_attached]: 2.96001e-06 [receive_attached]: 2.73998e-06 
[after_resolve]: 9.67999e-06 [a_after_grad]: 8.25e-06 [renormalize]: 0.00047127 [add_forward_monad_depend]: 4.89e-06 [auto_monad_grad]: 2.32999e-06 [auto_monad_eliminator]: 1.35e-05 [cse]: 2.965e-05 [a_3]: 4.222e-05 [Cycle 2]: 0.000614, [45] [expand_dump_flag]: 1.13001e-06 [switch_simplify]: 7.28e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.00011362 [with_stream_mark]: 1.028e-05 [recompute_prepare]: 5.77999e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 7.149e-05 [accelerated_algorithm]: 5.65001e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 5.05001e-06 [auto_parallel]: 6.09999e-06 [parallel]: 4.56002e-06 [flash_sp]: 3.48e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 3.26001e-06 [matmul_add_comm_reduction]: 5.62999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.58e-06 [virtual_dataset]: 5.61003e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.19998e-06 [merge_forward]: 3.13e-06 [cell_reuse_recompute_pass]: 1.63002e-06 [offload_activation]: 6.83e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.147e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 9.88998e-06 [set_forward_comm_id_for_comm_node_pass]: 4.17e-06 [meta_fg_expand]: 2.04e-06 [flash_sp_send_recv_attached]: 7.00005e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.48999e-06 [a_after_grad]: 7.74002e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.98998e-06 [cse]: 1.412e-05 [a_3]: 3.347e-05 [py_interpret_to_execute_after_opt_a]: 8.37998e-06 [slice_cell_reuse_recomputed_activation]: 2.39001e-06 [rewriter_after_opt_a]: 3.503e-05 [convert_after_rewriter]: 6.51e-06 [order_py_execute_after_rewriter]: 5.24998e-06 [mutable_eliminate]: 0.00048039 [opt_b]: 0.00019026, [1] [Cycle 1]: 0.00018365, [7] 
[b_1]: 0.00011215 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 7.59988e-07 [cse]: 1.832e-05 [optimize_parallel_all_gather_comm]: 1.672e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 2.349e-05 [loop_unroll]: 0.00042602 [opt_after_cconv]: 9.528e-05, [1] [Cycle 1]: 8.924e-05, [7] [c_1]: 2.517e-05 [parameter_eliminate]: 2.51e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.43e-06 [cse]: 1.775e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.517e-05 [tuple_transform]: 7.065e-05, [1] [Cycle 1]: 6.611e-05, [4] [d_1]: 3.86e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.88998e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.373e-05 [cse_after_recomputation]: 2.145e-05, [1] [Cycle 1]: 1.677e-05, [1] [cse]: 1.143e-05 [environ_conv]: 5.64998e-06 [swap_dp_allreduce_reducescatter]: 5.12999e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.87e-06 [label_fine_grained_interleaved_index]: 3.08e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 9.60019e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.31002e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.39e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.35001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.284e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.95998e-06 [overlap_recompute_and_grad_model_parallel]: 4.83001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 4.20999e-06 [overlap_grad_flash_sp]: 1.972e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.28998e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 7.168e-05, [1] [Cycle 1]: 6.67e-05, [6] [build]: 2.69999e-06 [elim_shapecalc]: 8.19002e-06 [elim_not_effective]: 1.201e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 9.47999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.57001e-06 [pipeline_parallel_scheduler]: 1.66e-06 [auto_monad_reorder]: 1.553e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.81001e-06 [opt_after_jit_grad]: 0.00045391 [validate]: 3.672e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.084546 [execute]: 8.02003e-06 Sums bootstrap : 0.000386s : 0.41% type_inference : 0.005467s : 5.79% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000063s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000547s : 0.58% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000153s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000471s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000044s : 0.05% optimize.opt_a.a_3 : 0.000076s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000480s : 0.51% optimize.opt_b.b_1 : 0.000112s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000426s : 0.45% optimize.opt_after_cconv.c_1 : 0.000025s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000454s : 0.48% validate : 0.000037s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.084546s : 89.60% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000166 26 19.35% : 0.000032s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.48% : 0.000006s : 3: substitution.graph_param_transform 63.46% : 0.000105s : 3: substitution.inline 1.94% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.85% : 0.000005s : 4: substitution.remove_not_recompute_node 1.92% : 0.000003s : 2: substitution.replace_old_param 5.02% : 0.000008s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005424 2 88.82% : 0.004817s : 1: type_inference.infer 11.18% : 0.000606s : 1: type_inference.specialize ------[replace.] 0.000036 4 78.99% : 0.000029s : 3: replace.inline 21.01% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000111 4 93.15% : 0.000104s : 3: match.inline 6.85% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 883 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 0.75% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 1.01% : 0.000002s : 9: predicate.addn_zero_filter 0.85% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 15: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.28% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 12: predicate.environ_get_depend_swap 1.80% : 0.000003s : 18: predicate.environ_get_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.46% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.86% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.80% : 0.000001s : 6: predicate.get_grad_eliminate 0.27% : 0.000000s : 3: predicate.graph_param_transform 0.70% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.41% : 0.000010s : 40: predicate.inline 0.87% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 25: predicate.load_eliminater 1.12% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.03% : 0.000002s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.57% : 0.000002s : 13: predicate.partial_defer_inline 1.43% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.38% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.31% : 0.000002s : 16: predicate.replace_applicator 0.60% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.80% : 0.000001s : 6: predicate.specialize_transform 0.88% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.66% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 43: predicate.switch_simplify 0.87% : 
0.000001s : 9: predicate.tile_eliminate 0.94% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.47% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000334 8 41.38% : 0.000138s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.62% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.107140 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.86% : 0.003060s : 1: add_attr 2.85% : 0.003051s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000064s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.39% : 0.000415s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.41% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000489s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.86% : 0.000923s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.05% : 0.002201s : 1: opt_a 0.09% : 0.000099s : 1: opt_after_cconv 0.43% : 0.000464s : 1: opt_after_jit_grad 0.18% : 0.000194s : 1: opt_b 3.85% : 0.004126s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.22% : 0.000232s : 1: renormalize.infer 0.22% : 0.000232s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000039s : 1: rewriter_after_opt_a 0.06% : 0.000067s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000074s : 1: symbol_engine_optimizer 78.93% : 0.084563s : 1: task_emit 0.07% : 0.000074s : 1: 
tuple_transform 5.12% : 0.005483s : 1: type_inference 0.06% : 0.000061s : 1: validate TotalTime = 0.127964, [24] [bootstrap]: 0.00048843 [type_inference]: 0.0127665 [event_method]: 5.226e-05 [auto_monad]: 0.00013875 [graph_reusing]: 9.10999e-06 [inline]: 2.94999e-06 [add_attr]: 0.00350571, [1] [add_attr_with_inline]: 0.00349551, [1] [Cycle 1]: 8.611e-05, [2] [tag_attr]: 3.804e-05 [meta_addattr_fg_expand]: 1.08e-05 [parallel-infer-symbol]: 3.65998e-06 [pre_auto_parallel]: 5.578e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.00002e-06 [pipeline_split]: 2.04e-06 [optimize]: 0.0191589, [53] [py_interpret_to_execute]: 4.39e-05 [rewriter_before_opt_a]: 0.000168 [opt_a]: 0.0166827, [3] [Cycle 1]: 0.0128211, [45] [expand_dump_flag]: 4.72e-06 [switch_simplify]: 7.967e-05 [loop_unroll]: 6.496e-05 [a_1]: 0.00155313 [with_stream_mark]: 2.821e-05 [recompute_prepare]: 2.442e-05 [updatestate_depend_eliminate]: 9.18002e-06 [updatestate_assign_eliminate]: 7.97e-06 [updatestate_loads_eliminate]: 7.45998e-06 [parameter_eliminate]: 2.93e-06 [a_2]: 0.00025014 [accelerated_algorithm]: 3.448e-05 [shard]: 2.21e-06 [meta_shard_fg_expand]: 3.83001e-06 [shard_inline]: 1.619e-05 [merge_send_recv]: 1.839e-05 [auto_parallel]: 1.311e-05 [parallel]: 2.046e-05 [flash_sp]: 1.337e-05 [merge_comm]: 9.74e-06 [allreduce_fusion]: 8.82999e-06 [matmul_add_comm_reduction]: 3.072e-05 [allreduce_slice_to_reducescatter]: 1.22999e-06 [virtual_shard_identity]: 1.813e-05 [virtual_dataset]: 1.606e-05 [get_grad_eliminate_]: 1.555e-05 [virtual_output]: 1.52e-05 [merge_forward]: 9.64999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.882e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.085e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 2.939e-05 [set_forward_comm_id_for_comm_node_pass]: 9.91e-06 [meta_fg_expand]: 0.00185061 [flash_sp_send_recv_attached]: 4.49002e-06 [receive_attached]: 2.24999e-06 [after_resolve]: 7.343e-05 
[a_after_grad]: 9.377e-05 [renormalize]: 0.00738543 [add_forward_monad_depend]: 1.337e-05 [auto_monad_grad]: 7.33e-06 [auto_monad_eliminator]: 8.54e-05 [cse]: 0.00021444 [a_3]: 0.0003728 [Cycle 2]: 0.00310162, [45] [expand_dump_flag]: 3.01001e-06 [switch_simplify]: 5.081e-05 [loop_unroll]: 4.621e-05 [a_1]: 0.00141558 [with_stream_mark]: 1.595e-05 [recompute_prepare]: 9.52999e-06 [updatestate_depend_eliminate]: 5.94e-06 [updatestate_assign_eliminate]: 4.91002e-06 [updatestate_loads_eliminate]: 4.27003e-06 [parameter_eliminate]: 1.97001e-06 [a_2]: 9.383e-05 [accelerated_algorithm]: 1.223e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 2.91e-06 [shard_inline]: 7.30998e-06 [merge_send_recv]: 9.77999e-06 [auto_parallel]: 1.122e-05 [parallel]: 8.78001e-06 [flash_sp]: 5.19e-06 [merge_comm]: 4.68999e-06 [allreduce_fusion]: 4.05998e-06 [matmul_add_comm_reduction]: 1.016e-05 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 8.90999e-06 [virtual_dataset]: 7.22002e-06 [get_grad_eliminate_]: 6.56999e-06 [virtual_output]: 6.31e-06 [merge_forward]: 5.40999e-06 [cell_reuse_recompute_pass]: 8.59989e-07 [offload_activation]: 1.135e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.474e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 1.212e-05 [set_forward_comm_id_for_comm_node_pass]: 4.84e-06 [meta_fg_expand]: 0.00010398 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 1.485e-05 [a_after_grad]: 1.089e-05 [renormalize]: 0.00079548 [add_forward_monad_depend]: 5.14e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.33e-05 [cse]: 2.661e-05 [a_3]: 5.108e-05 [Cycle 3]: 0.00074091, [45] [expand_dump_flag]: 1.24e-06 [switch_simplify]: 8.37e-06 [loop_unroll]: 6.94001e-06 [a_1]: 0.00015393 [with_stream_mark]: 1.002e-05 [recompute_prepare]: 7.1e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 2.83998e-06 [updatestate_loads_eliminate]: 3.20998e-06 [parameter_eliminate]: 1.41002e-06 
[a_2]: 9.196e-05 [accelerated_algorithm]: 1.099e-05 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 7e-06 [merge_send_recv]: 5.84e-06 [auto_parallel]: 7.43999e-06 [parallel]: 5.84999e-06 [flash_sp]: 9.29984e-07 [merge_comm]: 4.08001e-06 [allreduce_fusion]: 3.66001e-06 [matmul_add_comm_reduction]: 6.38e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 8.35999e-06 [virtual_dataset]: 6.68e-06 [get_grad_eliminate_]: 6.42001e-06 [virtual_output]: 6.21e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 8.03999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.346e-05 [merge_recompute_call_nodes]: 1.03001e-06 [before_grad]: 1.144e-05 [set_forward_comm_id_for_comm_node_pass]: 4.11001e-06 [meta_fg_expand]: 2.53998e-06 [flash_sp_send_recv_attached]: 1.09e-06 [receive_attached]: 1.02998e-06 [after_resolve]: 1.005e-05 [a_after_grad]: 1.017e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.29003e-06 [auto_monad_grad]: 1.04003e-06 [auto_monad_eliminator]: 8.52e-06 [cse]: 1.859e-05 [a_3]: 4.114e-05 [py_interpret_to_execute_after_opt_a]: 1.369e-05 [slice_cell_reuse_recomputed_activation]: 2.46e-06 [rewriter_after_opt_a]: 4.421e-05 [convert_after_rewriter]: 7.87e-06 [order_py_execute_after_rewriter]: 6.27001e-06 [mutable_eliminate]: 0.00068424 [opt_b]: 0.00023677, [1] [Cycle 1]: 0.00022822, [7] [b_1]: 0.00013752 [b_2]: 8.85999e-06 [updatestate_depend_eliminate]: 7.13998e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 3.09999e-06 [renormalize]: 4.39992e-07 [cse]: 3.095e-05 [optimize_parallel_all_gather_comm]: 1.827e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.552e-05 [loop_unroll]: 0.00047168 [opt_after_cconv]: 0.00011852, [1] [Cycle 1]: 0.00011177, [7] [c_1]: 3.44e-05 [parameter_eliminate]: 3.65003e-06 [updatestate_depend_eliminate]: 6.61999e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 3.05998e-06 [cse]: 
2.368e-05 [renormalize]: 5.89993e-07 [remove_dup_value]: 1.646e-05 [tuple_transform]: 8.273e-05, [1] [Cycle 1]: 7.769e-05, [4] [d_1]: 4.911e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 7.67002e-06 [partial_unused_args_eliminate]: 2.21e-06 [add_recomputation]: 5.679e-05 [cse_after_recomputation]: 2.735e-05, [1] [Cycle 1]: 2.244e-05, [1] [cse]: 1.628e-05 [environ_conv]: 9.97999e-06 [swap_dp_allreduce_reducescatter]: 6.48e-06 [bias_add_comm_swap]: 3.08998e-06 [label_micro_interleaved_index]: 4.58999e-06 [label_fine_grained_interleaved_index]: 2.99999e-06 [merge_cast_opt]: 1.70001e-06 [slice_recompute_activation]: 2.55002e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 1.10001e-06 [remove_cast_before_assign_add]: 1.18001e-06 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 2.96999e-06 [comm_op_add_attrs]: 1.40999e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.567e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 4.42e-06 [overlap_recompute_and_grad_model_parallel]: 5.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.43002e-06 [overlap_grad_ring_attention]: 4.82998e-06 [overlap_grad_flash_sp]: 2.616e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.33998e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.39e-06 [symbol_engine_optimizer]: 9.362e-05, [1] [Cycle 1]: 8.898e-05, [6] [build]: 9.88002e-06 [elim_shapecalc]: 1.207e-05 [elim_not_effective]: 1.476e-05 [opt_reshape]: 7.41001e-06 [fold_const_symbol]: 1.203e-05 [renormalize]: 4.10015e-07 [detach_backward]: 2.44999e-06 
[pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 2.336e-05 [get_jit_bprop_graph]: 1.85001e-06 [rewriter_after_jit_bprop_graph]: 5.05001e-06 [opt_after_jit_grad]: 0.0005304 [validate]: 4.968e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0908872 [execute]: 1.08e-05 Sums bootstrap : 0.000488s : 0.40% type_inference : 0.012767s : 10.38% event_method : 0.000052s : 0.04% auto_monad : 0.000139s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000038s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000056s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000044s : 0.04% optimize.rewriter_before_opt_a : 0.000168s : 0.14% optimize.opt_a.expand_dump_flag : 0.000009s : 0.01% optimize.opt_a.switch_simplify : 0.000139s : 0.11% optimize.opt_a.loop_unroll : 0.000118s : 0.10% optimize.opt_a.a_1 : 0.003123s : 2.54% optimize.opt_a.with_stream_mark : 0.000054s : 0.04% optimize.opt_a.recompute_prepare : 0.000041s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000006s : 0.01% optimize.opt_a.a_2 : 0.000436s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.05% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.02% optimize.opt_a.merge_send_recv : 0.000034s : 0.03% optimize.opt_a.auto_parallel : 0.000032s : 0.03% optimize.opt_a.parallel : 0.000035s : 0.03% optimize.opt_a.flash_sp : 0.000019s : 0.02% optimize.opt_a.merge_comm : 0.000019s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000017s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000047s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.03% optimize.opt_a.virtual_dataset : 0.000030s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.02% optimize.opt_a.virtual_output : 0.000028s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000059s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000053s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.02% optimize.opt_a.meta_fg_expand : 0.001957s : 1.59% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000098s : 0.08% optimize.opt_a.a_after_grad : 0.000115s : 0.09% optimize.opt_a.renormalize : 0.008181s : 6.65% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.02% optimize.opt_a.auto_monad_grad : 0.000010s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000107s : 0.09% optimize.opt_a.cse : 0.000260s : 0.21% optimize.opt_a.a_3 : 0.000465s : 0.38% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000044s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000684s : 0.56% optimize.opt_b.b_1 : 0.000138s : 0.11% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.02% optimize.loop_unroll : 0.000472s : 0.38% optimize.opt_after_cconv.c_1 : 0.000034s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000024s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.01% optimize.tuple_transform.d_1 : 0.000049s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000016s : 0.01% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000023s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000530s : 0.43% validate : 0.000050s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.090887s : 73.89% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 
0.000815 161 7.71% : 0.000063s : 8: substitution.arithmetic_simplify 0.29% : 0.000002s : 3: substitution.elim_not_effective 0.64% : 0.000005s : 5: substitution.float_depend_g_call 0.50% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.22% : 0.000002s : 3: substitution.fold_const_symbol 0.86% : 0.000007s : 4: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 58.15% : 0.000474s : 17: substitution.inline 2.39% : 0.000019s : 2: substitution.inline_without_move 1.38% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.30% : 0.000019s : 3: substitution.less_batch_normalization 1.51% : 0.000012s : 7: substitution.minmaximum_grad 0.92% : 0.000007s : 5: substitution.partial_eliminate 1.57% : 0.000013s : 15: substitution.remove_not_recompute_node 3.90% : 0.000032s : 10: substitution.replace_applicator 1.24% : 0.000010s : 10: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.96% : 0.000024s : 7: substitution.tuple_list_convert_item_index_to_positive 1.44% : 0.000012s : 7: substitution.tuple_list_get_item_const_eliminator 2.01% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 7.15% : 0.000058s : 19: substitution.tuple_list_get_item_eliminator 1.92% : 0.000016s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012669 2 85.51% : 0.010833s : 1: type_inference.infer 14.49% : 0.001836s : 1: type_inference.specialize ------[replace.] 0.000212 27 64.28% : 0.000136s : 17: replace.inline 35.72% : 0.000076s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000491 27 94.37% : 0.000463s : 17: match.inline 5.63% : 0.000028s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000721 4248 1.10% : 0.000008s : 53: predicate.accumulaten_eliminater 0.27% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.11% : 0.000008s : 53: predicate.addn_zero_filter 1.08% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 74: predicate.arithmetic_simplify 1.17% : 0.000008s : 53: predicate.cast_eliminate 1.14% : 0.000008s : 50: predicate.check_bprop_eliminate 0.44% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.17% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.25% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 57: predicate.environ_get_add_eliminate 1.16% : 0.000008s : 57: predicate.environ_get_depend_swap 1.65% : 0.000012s : 78: predicate.environ_get_eliminate 1.16% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.83% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.54% : 0.000018s : 80: predicate.float_depend_g_call 0.48% : 0.000003s : 21: predicate.float_environ_get_switch 0.55% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.50% : 0.000004s : 21: predicate.get_grad_eliminate 0.11% : 0.000001s : 4: predicate.graph_param_transform 0.51% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 6.02% : 0.000043s : 183: predicate.inline 1.48% : 0.000011s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 21: predicate.less_batch_normalization 1.57% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.59% : 0.000019s : 124: predicate.load_eliminater 0.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.59% : 0.000019s : 113: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.11% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 53: predicate.minmaximum_grad 0.34% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000015s : 80: predicate.partial_defer_inline 1.73% : 0.000012s : 67: predicate.partial_eliminate 1.10% : 0.000008s : 53: predicate.print_const_string_wrapper 0.46% : 0.000003s : 21: predicate.reduce_all_const_elim 1.36% : 0.000010s : 53: predicate.reduce_eliminate 2.59% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 21: predicate.remove_not_recompute_node 1.84% : 0.000013s : 113: predicate.replace_applicator 0.69% : 0.000005s : 45: predicate.replace_old_param 0.07% : 0.000001s : 4: predicate.reset_defer_inline 1.11% : 0.000008s : 53: predicate.reshape_eliminate 1.12% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.36% : 0.000010s : 50: predicate.same_eliminate 0.33% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.55% : 0.000004s : 21: predicate.shard_identity_eliminate 0.24% : 0.000002s : 8: predicate.special_op_eliminate 0.61% : 0.000004s : 21: predicate.specialize_transform 1.47% : 0.000011s : 50: predicate.split_environ_get_set_with_tuple_value 1.24% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.94% : 0.000014s : 80: predicate.switch_defer_inline 2.99% : 0.000022s : 130: predicate.switch_layer_defer_inline 5.22% : 0.000038s : 218: 
predicate.switch_simplify 1.14% : 0.000008s : 53: predicate.tile_eliminate 1.09% : 0.000008s : 53: predicate.transpose_eliminate 1.45% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 1.45% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.55% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.56% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.06% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.50% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000004s : 21: predicate.virtual_output_eliminate 0.08% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.13% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002114 36 58.54% : 0.001237s : 15: func_graph_cloner_run.FuncGraphClonerGraph 41.46% : 0.000876s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.163712 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.14% : 0.003511s : 1: add_attr 2.14% : 0.003500s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000146s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.32% : 0.000531s : 1: bootstrap 0.02% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000019s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000030s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000014s : 1: environ_conv 0.04% : 0.000060s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.29% : 0.000482s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.42% : 0.000694s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.87% : 0.004706s : 117: opt.transform.opt_a 0.02% : 0.000033s : 1: opt.transform.opt_after_cconv 0.02% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000118s : 28: opt.transform.opt_b 0.03% : 0.000055s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000042s : 4: opt.transform.symbol_engine_opt 10.19% : 0.016686s : 1: opt_a 0.07% : 0.000122s : 1: opt_after_cconv 0.33% : 0.000542s : 1: opt_after_jit_grad 0.15% : 0.000240s : 1: opt_b 11.71% : 0.019164s : 1: optimize 0.01% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000060s : 1: pre_auto_parallel 0.03% : 0.000048s : 1: py_interpret_to_execute 0.01% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000020s : 1: remove_dup_value 3.89% : 0.006366s : 2: renormalize.infer 1.10% : 0.001795s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000048s : 1: rewriter_after_opt_a 0.11% : 0.000173s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000097s : 1: symbol_engine_optimizer 55.53% : 0.090911s : 1: task_emit 0.05% : 0.000086s : 1: 
tuple_transform 7.81% : 0.012794s : 1: type_inference 0.05% : 0.000081s : 1: validate TotalTime = 0.0998487, [24] [bootstrap]: 0.00040974 [type_inference]: 0.00588858 [event_method]: 1.414e-05 [auto_monad]: 6.395e-05 [graph_reusing]: 5.99e-06 [inline]: 2.49001e-06 [add_attr]: 0.00319659, [1] [add_attr_with_inline]: 0.00318738, [1] [Cycle 1]: 5.96e-05, [2] [tag_attr]: 1.434e-05 [meta_addattr_fg_expand]: 4.28999e-06 [parallel-infer-symbol]: 4.01001e-06 [pre_auto_parallel]: 2.606e-05 [insert-virtual-dataset]: 3.12002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00418865, [53] [py_interpret_to_execute]: 2.064e-05 [rewriter_before_opt_a]: 5.16e-05 [opt_a]: 0.00215815, [2] [Cycle 1]: 0.00148563, [45] [expand_dump_flag]: 3.23e-06 [switch_simplify]: 3.04e-05 [loop_unroll]: 1.693e-05 [a_1]: 0.00036104 [with_stream_mark]: 1.417e-05 [recompute_prepare]: 8.03001e-06 [updatestate_depend_eliminate]: 4.38999e-06 [updatestate_assign_eliminate]: 3.31001e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 1.99e-06 [a_2]: 8.424e-05 [accelerated_algorithm]: 6.81999e-06 [shard]: 1.96998e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 6.63998e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 7.41001e-06 [parallel]: 1.761e-05 [flash_sp]: 7.95e-06 [merge_comm]: 4.23001e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.51998e-06 [allreduce_slice_to_reducescatter]: 6.70028e-07 [virtual_shard_identity]: 8.49998e-06 [virtual_dataset]: 6.09001e-06 [get_grad_eliminate_]: 5.78002e-06 [virtual_output]: 6.08998e-06 [merge_forward]: 3.89002e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.78998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.184e-05 [merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 9.86e-06 [set_forward_comm_id_for_comm_node_pass]: 4.30999e-06 [meta_fg_expand]: 2.64999e-06 [flash_sp_send_recv_attached]: 2.62001e-06 [receive_attached]: 2.09e-06 
[after_resolve]: 1.014e-05 [a_after_grad]: 9.02e-06 [renormalize]: 0.0004489 [add_forward_monad_depend]: 4.47998e-06 [auto_monad_grad]: 2.22999e-06 [auto_monad_eliminator]: 1.501e-05 [cse]: 2.834e-05 [a_3]: 4.364e-05 [Cycle 2]: 0.00066043, [45] [expand_dump_flag]: 1.17e-06 [switch_simplify]: 7.12002e-06 [loop_unroll]: 5.67999e-06 [a_1]: 0.00013094 [with_stream_mark]: 1.192e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.06002e-06 [a_2]: 7.551e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 5.35001e-06 [auto_parallel]: 7.45998e-06 [parallel]: 4.61002e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.11001e-06 [matmul_add_comm_reduction]: 6.24999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 7.09001e-06 [virtual_dataset]: 6.09999e-06 [get_grad_eliminate_]: 6.01e-06 [virtual_output]: 5.94e-06 [merge_forward]: 2.74999e-06 [cell_reuse_recompute_pass]: 1.67999e-06 [offload_activation]: 1.003e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.166e-05 [merge_recompute_call_nodes]: 9.39996e-07 [before_grad]: 8.79e-06 [set_forward_comm_id_for_comm_node_pass]: 3.62002e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 9.29984e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.57e-06 [a_after_grad]: 8.2e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 7.33e-06 [cse]: 1.484e-05 [a_3]: 3.407e-05 [py_interpret_to_execute_after_opt_a]: 8.99e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 3.397e-05 [convert_after_rewriter]: 6.44999e-06 [order_py_execute_after_rewriter]: 5.64e-06 [mutable_eliminate]: 0.00050819 [opt_b]: 0.00020617, [1] [Cycle 1]: 0.00019794, [7] [b_1]: 0.00011958 [b_2]: 
7.41001e-06 [updatestate_depend_eliminate]: 5.57999e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.63998e-06 [renormalize]: 2.79979e-07 [cse]: 2.132e-05 [optimize_parallel_all_gather_comm]: 1.775e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.587e-05 [loop_unroll]: 0.00046654 [opt_after_cconv]: 0.00010183, [1] [Cycle 1]: 9.48e-05, [7] [c_1]: 2.671e-05 [parameter_eliminate]: 2.51998e-06 [updatestate_depend_eliminate]: 5.72999e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.49999e-06 [cse]: 1.928e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.554e-05 [tuple_transform]: 7.615e-05, [1] [Cycle 1]: 7.115e-05, [4] [d_1]: 4.268e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 7.23e-06 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 4.816e-05 [cse_after_recomputation]: 2.298e-05, [1] [Cycle 1]: 1.806e-05, [1] [cse]: 1.212e-05 [environ_conv]: 5.90002e-06 [swap_dp_allreduce_reducescatter]: 5.45001e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.55999e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.73002e-06 [slice_recompute_activation]: 2.30002e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.36002e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.66e-06 [reorder_send_recv_between_fp_bp]: 3.41999e-06 [comm_op_add_attrs]: 1.32e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 2.16e-06 [control_data_broadcast_order]: 1.411e-05 [grouped_pairwise_exchange_alltoall]: 1.43002e-06 [offloading_packed_experts]: 4.13999e-06 [overlap_recompute_and_grad_model_parallel]: 5.20001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.27e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 4.15999e-06 [overlap_grad_flash_sp]: 1.966e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 2.56e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 8.085e-05, [1] [Cycle 1]: 7.628e-05, [6] [build]: 3.36001e-06 [elim_shapecalc]: 9.69e-06 [elim_not_effective]: 1.414e-05 [opt_reshape]: 6.88e-06 [fold_const_symbol]: 1.077e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.27999e-06 [pipeline_parallel_scheduler]: 1.82001e-06 [auto_monad_reorder]: 1.73e-05 [get_jit_bprop_graph]: 1.30001e-06 [rewriter_after_jit_bprop_graph]: 3.73001e-06 [opt_after_jit_grad]: 0.00051667 [validate]: 3.891e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.0852246 [execute]: 8.94e-06 Sums bootstrap : 0.000410s : 0.43% type_inference : 0.005889s : 6.16% event_method : 0.000014s : 0.01% auto_monad : 0.000064s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000026s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000052s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.04% optimize.opt_a.loop_unroll : 0.000023s : 0.02% optimize.opt_a.a_1 : 0.000492s : 0.51% optimize.opt_a.with_stream_mark : 0.000026s : 0.03% optimize.opt_a.recompute_prepare : 0.000016s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.01% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000160s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000015s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.01% optimize.opt_a.virtual_output : 0.000012s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000449s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000043s : 0.05% optimize.opt_a.a_3 : 0.000078s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000508s : 0.53% optimize.opt_b.b_1 : 0.000120s : 0.13% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.03% optimize.loop_unroll : 0.000467s : 0.49% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000043s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.05% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000003s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000517s : 0.54% validate : 0.000039s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.085225s : 89.15% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000147 24 20.55% : 0.000030s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000002s : 2: substitution.fold_const_symbol 3.96% : 0.000006s : 3: substitution.graph_param_transform 65.07% : 0.000096s : 3: substitution.inline 2.14% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.41% : 0.000005s : 4: substitution.remove_not_recompute_node 2.18% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005839 2 91.36% : 0.005335s : 1: type_inference.infer 8.64% : 0.000505s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000093 3 100.00% : 0.000093s : 3: match.inline ------[predicate.] 
0.000154 815 0.91% : 0.000001s : 8: predicate.accumulaten_eliminater 1.29% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.83% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 14: predicate.arithmetic_simplify 0.94% : 0.000001s : 8: predicate.cast_eliminate 0.67% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.67% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 11: predicate.environ_get_depend_swap 1.75% : 0.000003s : 17: predicate.environ_get_eliminate 1.03% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.12% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 11: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.80% : 0.000001s : 6: predicate.get_grad_eliminate 0.27% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.64% : 0.000001s : 6: predicate.incorporate_call_switch 6.28% : 0.000010s : 37: predicate.inline 0.92% : 0.000001s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 6: predicate.less_batch_normalization 1.58% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 22: predicate.load_eliminater 1.50% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.96% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 8: predicate.minmaximum_grad 1.32% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.46% : 0.000001s : 3: predicate.parallel_virtual_node 1.45% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 11: predicate.partial_eliminate 0.82% : 0.000001s : 8: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.15% : 0.000002s : 8: predicate.reduce_eliminate 2.14% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 6: predicate.remove_not_recompute_node 1.15% : 0.000002s : 14: predicate.replace_applicator 0.79% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000001s : 3: predicate.reset_defer_inline 0.82% : 0.000001s : 8: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.97% : 0.000001s : 6: predicate.same_eliminate 0.53% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.01% : 0.000002s : 6: predicate.shard_identity_eliminate 0.86% : 0.000001s : 6: predicate.special_op_eliminate 0.96% : 0.000001s : 6: predicate.specialize_transform 1.18% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.53% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.22% : 0.000002s : 11: predicate.switch_defer_inline 1.80% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.82% : 0.000007s : 38: predicate.switch_simplify 0.88% : 
0.000001s : 8: predicate.tile_eliminate 0.82% : 0.000001s : 8: predicate.transpose_eliminate 1.46% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.58% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.00% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000292 7 33.56% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 66.44% : 0.000194s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.108728 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.94% : 0.003201s : 1: add_attr 2.93% : 0.003191s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000069s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000443s : 1: bootstrap 0.03% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.44% : 0.000476s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.48% : 0.000517s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 0.80% : 0.000874s : 78: opt.transform.opt_a 0.02% : 0.000025s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000097s : 28: opt.transform.opt_b 0.04% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000037s : 4: opt.transform.symbol_engine_opt 1.99% : 0.002161s : 1: opt_a 0.10% : 0.000105s : 1: opt_after_cconv 0.49% : 0.000528s : 1: opt_after_jit_grad 0.19% : 0.000210s : 1: opt_b 3.86% : 0.004193s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000031s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.21% : 0.000232s : 1: renormalize.infer 0.19% : 0.000210s : 1: renormalize.specialize 0.01% : 0.000007s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000038s : 1: rewriter_after_opt_a 0.05% : 0.000056s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000084s : 1: symbol_engine_optimizer 78.40% : 0.085246s : 1: task_emit 0.07% : 0.000079s : 1: 
tuple_transform 5.43% : 0.005909s : 1: type_inference 0.06% : 0.000063s : 1: validate TotalTime = 0.159097, [24] [bootstrap]: 0.00054416 [type_inference]: 0.0126271 [event_method]: 4.552e-05 [auto_monad]: 0.00013228 [graph_reusing]: 8.67e-06 [inline]: 2.73e-06 [add_attr]: 0.00325823, [1] [add_attr_with_inline]: 0.00324768, [1] [Cycle 1]: 8.12e-05, [2] [tag_attr]: 3.387e-05 [meta_addattr_fg_expand]: 9.77999e-06 [parallel-infer-symbol]: 3.74002e-06 [pre_auto_parallel]: 5.195e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.32999e-06 [pipeline_split]: 2.01e-06 [optimize]: 0.0175812, [53] [py_interpret_to_execute]: 4.122e-05 [rewriter_before_opt_a]: 0.00014751 [opt_a]: 0.0152719, [3] [Cycle 1]: 0.0116486, [45] [expand_dump_flag]: 4.14002e-06 [switch_simplify]: 7.395e-05 [loop_unroll]: 6.019e-05 [a_1]: 0.00140078 [with_stream_mark]: 2.888e-05 [recompute_prepare]: 2.632e-05 [updatestate_depend_eliminate]: 9.22001e-06 [updatestate_assign_eliminate]: 7.97e-06 [updatestate_loads_eliminate]: 7.85e-06 [parameter_eliminate]: 3.33e-06 [a_2]: 0.00025048 [accelerated_algorithm]: 3.45e-05 [shard]: 1.96e-06 [meta_shard_fg_expand]: 3.53e-06 [shard_inline]: 1.667e-05 [merge_send_recv]: 1.851e-05 [auto_parallel]: 1.3e-05 [parallel]: 2.157e-05 [flash_sp]: 1.302e-05 [merge_comm]: 1.068e-05 [allreduce_fusion]: 6.134e-05 [matmul_add_comm_reduction]: 2.893e-05 [allreduce_slice_to_reducescatter]: 9.70002e-07 [virtual_shard_identity]: 1.987e-05 [virtual_dataset]: 1.581e-05 [get_grad_eliminate_]: 1.545e-05 [virtual_output]: 1.524e-05 [merge_forward]: 1.015e-05 [cell_reuse_recompute_pass]: 1.56998e-06 [offload_activation]: 1.933e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.256e-05 [merge_recompute_call_nodes]: 1.89999e-06 [before_grad]: 2.869e-05 [set_forward_comm_id_for_comm_node_pass]: 1.001e-05 [meta_fg_expand]: 0.00161326 [flash_sp_send_recv_attached]: 4.48001e-06 [receive_attached]: 2.48e-06 [after_resolve]: 6.75e-05 
[a_after_grad]: 9.008e-05 [renormalize]: 0.00666814 [add_forward_monad_depend]: 1.056e-05 [auto_monad_grad]: 6.06998e-06 [auto_monad_eliminator]: 5.48e-05 [cse]: 0.0001921 [a_3]: 0.00033871 [Cycle 2]: 0.00289403, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 4.537e-05 [loop_unroll]: 4.255e-05 [a_1]: 0.00136234 [with_stream_mark]: 1.656e-05 [recompute_prepare]: 1.194e-05 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 4.24002e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 9.086e-05 [accelerated_algorithm]: 1.263e-05 [shard]: 1.47999e-06 [meta_shard_fg_expand]: 2.45002e-06 [shard_inline]: 8.2e-06 [merge_send_recv]: 7.93001e-06 [auto_parallel]: 8.48999e-06 [parallel]: 7.83999e-06 [flash_sp]: 4.26001e-06 [merge_comm]: 4.23999e-06 [allreduce_fusion]: 3.71001e-06 [matmul_add_comm_reduction]: 7.88001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 8.79e-06 [virtual_dataset]: 6.59999e-06 [get_grad_eliminate_]: 7.27002e-06 [virtual_output]: 6.94999e-06 [merge_forward]: 4.04002e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 1.028e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.579e-05 [merge_recompute_call_nodes]: 1.10001e-06 [before_grad]: 1.141e-05 [set_forward_comm_id_for_comm_node_pass]: 4.10998e-06 [meta_fg_expand]: 6.162e-05 [flash_sp_send_recv_attached]: 1.09e-06 [receive_attached]: 1.77999e-06 [after_resolve]: 1.276e-05 [a_after_grad]: 1.041e-05 [renormalize]: 0.00069904 [add_forward_monad_depend]: 5.34e-06 [auto_monad_grad]: 2.26998e-06 [auto_monad_eliminator]: 1.393e-05 [cse]: 2.597e-05 [a_3]: 5.161e-05 [Cycle 3]: 0.0007116, [45] [expand_dump_flag]: 1.81e-06 [switch_simplify]: 8.69998e-06 [loop_unroll]: 6.87002e-06 [a_1]: 0.00015014 [with_stream_mark]: 9.67001e-06 [recompute_prepare]: 7.21999e-06 [updatestate_depend_eliminate]: 4.67998e-06 [updatestate_assign_eliminate]: 3.09001e-06 [updatestate_loads_eliminate]: 2.51998e-06 
[parameter_eliminate]: 1.05001e-06 [a_2]: 8.723e-05 [accelerated_algorithm]: 1.035e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 7.19001e-06 [merge_send_recv]: 5.89e-06 [auto_parallel]: 6.93e-06 [parallel]: 6.04001e-06 [flash_sp]: 1.04998e-06 [merge_comm]: 3.9e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 7.21999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 8.28999e-06 [virtual_dataset]: 6.46e-06 [get_grad_eliminate_]: 6.46e-06 [virtual_output]: 6.12999e-06 [merge_forward]: 3.33998e-06 [cell_reuse_recompute_pass]: 2.09e-06 [offload_activation]: 8.27e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.38e-05 [merge_recompute_call_nodes]: 1.05999e-06 [before_grad]: 1.077e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.76e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.45999e-06 [after_resolve]: 9.64999e-06 [a_after_grad]: 9.82001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.39998e-06 [auto_monad_grad]: 1.08001e-06 [auto_monad_eliminator]: 8.33001e-06 [cse]: 1.792e-05 [a_3]: 4.088e-05 [py_interpret_to_execute_after_opt_a]: 1.339e-05 [slice_cell_reuse_recomputed_activation]: 2.32001e-06 [rewriter_after_opt_a]: 4.467e-05 [convert_after_rewriter]: 7.93001e-06 [order_py_execute_after_rewriter]: 5.37001e-06 [mutable_eliminate]: 0.00057366 [opt_b]: 0.00023166, [1] [Cycle 1]: 0.0002234, [7] [b_1]: 0.00013598 [b_2]: 9.54e-06 [updatestate_depend_eliminate]: 7.97998e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.03e-06 [renormalize]: 4.2998e-07 [cse]: 2.485e-05 [optimize_parallel_all_gather_comm]: 1.882e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.416e-05 [loop_unroll]: 0.00045768 [opt_after_cconv]: 0.00012028, [1] [Cycle 1]: 0.00011389, [7] [c_1]: 3.231e-05 [parameter_eliminate]: 3.40998e-06 [updatestate_depend_eliminate]: 1.3e-05 [updatestate_assign_eliminate]: 3.42997e-06 [updatestate_loads_eliminate]: 
2.93003e-06 [cse]: 2.262e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.676e-05 [tuple_transform]: 8.315e-05, [1] [Cycle 1]: 7.802e-05, [4] [d_1]: 4.875e-05 [none_parameter_eliminate]: 1.99999e-06 [renormalize]: 2.20025e-07 [switch_simplify]: 7.9e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 5.686e-05 [cse_after_recomputation]: 2.523e-05, [1] [Cycle 1]: 2.052e-05, [1] [cse]: 1.459e-05 [environ_conv]: 9.56998e-06 [swap_dp_allreduce_reducescatter]: 6.21e-06 [bias_add_comm_swap]: 2.84999e-06 [label_micro_interleaved_index]: 4.48001e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.94e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.70997e-06 [reorder_send_recv_between_fp_bp]: 3.16001e-06 [comm_op_add_attrs]: 1.38002e-06 [add_comm_op_reuse_tag]: 1.27999e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.40999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.476e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.15e-06 [overlap_recompute_and_grad_model_parallel]: 5.29e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 4.68999e-06 [overlap_grad_flash_sp]: 2.367e-05 [begin_end_overlap_inline]: 7.60017e-07 [split_matmul_comm_elemetwise]: 2.36998e-06 [split_layernorm_comm]: 2.15002e-06 [handle_group_info]: 1.19998e-06 [symbol_engine_optimizer]: 9.045e-05, [1] [Cycle 1]: 8.555e-05, [6] [build]: 1.017e-05 [elim_shapecalc]: 1.134e-05 [elim_not_effective]: 1.531e-05 [opt_reshape]: 7.58001e-06 [fold_const_symbol]: 1.189e-05 [renormalize]: 2.00002e-07 [detach_backward]: 
2.19001e-06 [pipeline_parallel_scheduler]: 1.72001e-06 [auto_monad_reorder]: 2.095e-05 [get_jit_bprop_graph]: 1.71998e-06 [rewriter_after_jit_bprop_graph]: 4.87998e-06 [opt_after_jit_grad]: 0.00052564 [validate]: 4.764e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.123974 [execute]: 9.52001e-06 Sums bootstrap : 0.000544s : 0.35% type_inference : 0.012627s : 8.18% event_method : 0.000046s : 0.03% auto_monad : 0.000132s : 0.09% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000052s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.03% optimize.rewriter_before_opt_a : 0.000148s : 0.10% optimize.opt_a.expand_dump_flag : 0.000009s : 0.01% optimize.opt_a.switch_simplify : 0.000128s : 0.08% optimize.opt_a.loop_unroll : 0.000110s : 0.07% optimize.opt_a.a_1 : 0.002913s : 1.89% optimize.opt_a.with_stream_mark : 0.000055s : 0.04% optimize.opt_a.recompute_prepare : 0.000045s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.01% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000429s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.04% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000032s : 0.02% optimize.opt_a.merge_send_recv : 0.000032s : 0.02% optimize.opt_a.auto_parallel : 0.000028s : 0.02% optimize.opt_a.parallel : 0.000035s : 0.02% optimize.opt_a.flash_sp : 0.000018s : 0.01% optimize.opt_a.merge_comm : 0.000019s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000068s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.02% optimize.opt_a.virtual_dataset : 0.000029s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.02% optimize.opt_a.virtual_output : 0.000028s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000051s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.01% optimize.opt_a.meta_fg_expand : 0.001678s : 1.09% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.06% optimize.opt_a.a_after_grad : 0.000110s : 0.07% optimize.opt_a.renormalize : 0.007367s : 4.77% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000077s : 0.05% optimize.opt_a.cse : 0.000236s : 0.15% optimize.opt_a.a_3 : 0.000431s : 0.28% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000045s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000574s : 0.37% optimize.opt_b.b_1 : 0.000136s : 0.09% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000025s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000458s : 0.30% optimize.opt_after_cconv.c_1 : 0.000032s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000013s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000023s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.01% optimize.tuple_transform.d_1 : 0.000049s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.04% optimize.cse_after_recomputation.cse : 0.000015s : 0.01% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000526s : 0.34% validate : 0.000048s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.123974s : 80.29% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000754 159 7.34% : 0.000055s : 7: substitution.arithmetic_simplify 0.34% : 0.000003s : 3: substitution.elim_not_effective 0.62% : 0.000005s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 3: substitution.fold_const_symbol 0.88% : 0.000007s : 4: substitution.graph_param_transform 0.43% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 58.10% : 0.000438s : 17: substitution.inline 2.45% : 0.000018s : 2: substitution.inline_without_move 1.39% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.35% : 0.000018s : 3: substitution.less_batch_normalization 1.35% : 0.000010s : 7: substitution.minmaximum_grad 0.82% : 0.000006s : 5: substitution.partial_eliminate 1.75% : 0.000013s : 15: substitution.remove_not_recompute_node 3.76% : 0.000028s : 10: substitution.replace_applicator 1.36% : 0.000010s : 10: substitution.replace_old_param 0.49% : 0.000004s : 1: substitution.set_cell_output_no_recompute 2.94% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.43% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.97% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 7.22% : 0.000054s : 18: substitution.tuple_list_get_item_eliminator 1.96% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012539 2 87.76% : 0.011005s : 1: type_inference.infer 12.24% : 0.001534s : 1: type_inference.specialize ------[replace.] 0.000202 26 66.93% : 0.000135s : 17: replace.inline 33.07% : 0.000067s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000453 26 94.30% : 0.000427s : 17: match.inline 5.70% : 0.000026s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000686 4180 1.10% : 0.000008s : 52: predicate.accumulaten_eliminater 0.26% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.13% : 0.000008s : 52: predicate.addn_zero_filter 1.08% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 2.06% : 0.000014s : 73: predicate.arithmetic_simplify 1.13% : 0.000008s : 52: predicate.cast_eliminate 1.11% : 0.000008s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.14% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.20% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.14% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.17% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.16% : 0.000008s : 56: predicate.environ_get_depend_swap 1.67% : 0.000011s : 77: predicate.environ_get_eliminate 1.16% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.80% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.51% : 0.000017s : 78: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.55% : 0.000004s : 21: predicate.get_grad_eliminate 0.10% : 0.000001s : 4: predicate.graph_param_transform 0.52% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.97% : 0.000041s : 180: predicate.inline 1.48% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 21: predicate.less_batch_normalization 1.54% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.64% : 0.000018s : 121: predicate.load_eliminater 0.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.52% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.37% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.09% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.08% : 0.000007s : 50: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 52: predicate.minmaximum_grad 0.46% : 0.000003s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.12% : 0.000015s : 78: predicate.partial_defer_inline 1.70% : 0.000012s : 65: predicate.partial_eliminate 1.11% : 0.000008s : 52: predicate.print_const_string_wrapper 0.48% : 0.000003s : 21: predicate.reduce_all_const_elim 1.36% : 0.000009s : 52: predicate.reduce_eliminate 2.59% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.39% : 0.000003s : 21: predicate.remove_not_recompute_node 1.87% : 0.000013s : 111: predicate.replace_applicator 0.69% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.12% : 0.000008s : 52: predicate.reshape_eliminate 1.12% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.23% : 0.000008s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.74% : 0.000005s : 21: predicate.shard_identity_eliminate 0.26% : 0.000002s : 8: predicate.special_op_eliminate 0.64% : 0.000004s : 21: predicate.specialize_transform 1.25% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.94% : 0.000013s : 78: predicate.switch_defer_inline 2.99% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.25% : 0.000036s : 213: 
predicate.switch_simplify 1.12% : 0.000008s : 52: predicate.tile_eliminate 1.08% : 0.000007s : 52: predicate.transpose_eliminate 1.41% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000010s : 60: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.48% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.55% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.60% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.11% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.51% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.13% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001841 35 59.98% : 0.001104s : 14: func_graph_cloner_run.FuncGraphClonerGraph 40.02% : 0.000737s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.191937 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.70% : 0.003264s : 1: add_attr 1.69% : 0.003252s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000061s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.07% : 0.000140s : 1: auto_monad 0.01% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.30% : 0.000584s : 1: bootstrap 0.01% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000053s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.24% : 0.000467s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.31% : 0.000586s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.31% : 0.004436s : 117: opt.transform.opt_a 0.02% : 0.000031s : 1: opt.transform.opt_after_cconv 0.01% : 0.000027s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000117s : 28: opt.transform.opt_b 0.03% : 0.000054s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000042s : 4: opt.transform.symbol_engine_opt 7.96% : 0.015275s : 1: opt_a 0.06% : 0.000124s : 1: opt_after_cconv 0.28% : 0.000536s : 1: opt_after_jit_grad 0.12% : 0.000235s : 1: opt_b 9.16% : 0.017586s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000057s : 1: pre_auto_parallel 0.02% : 0.000046s : 1: py_interpret_to_execute 0.01% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000020s : 1: remove_dup_value 2.95% : 0.005657s : 2: renormalize.infer 0.88% : 0.001694s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000049s : 1: rewriter_after_opt_a 0.08% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000093s : 1: symbol_engine_optimizer 64.60% : 0.123998s : 1: task_emit 0.04% : 0.000086s : 1: 
tuple_transform 6.59% : 0.012650s : 1: type_inference 0.04% : 0.000073s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x2-ge],max_mem:8.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x3-pynative],max_mem:8.0M TotalTime = 0.0244302, [24] [bootstrap]: 0.00069297 [type_inference]: 0.00712441 [event_method]: 1.489e-05 [auto_monad]: 6.45e-05 [graph_reusing]: 5.23002e-06 [inline]: 2.48e-06 [add_attr]: 0.00388891, [1] [add_attr_with_inline]: 0.00387622, [1] [Cycle 1]: 5.945e-05, [2] [tag_attr]: 1.731e-05 [meta_addattr_fg_expand]: 4.57e-06 [parallel-infer-symbol]: 3.7e-06 [pre_auto_parallel]: 3.184e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.00465043, [53] [py_interpret_to_execute]: 2.51e-05 [rewriter_before_opt_a]: 6.765e-05 [opt_a]: 0.00250754, [2] [Cycle 1]: 0.00186997, [45] [expand_dump_flag]: 2.78e-06 [switch_simplify]: 3.421e-05 [loop_unroll]: 2.062e-05 [a_1]: 0.00051939 [with_stream_mark]: 1.588e-05 [recompute_prepare]: 9.19998e-06 [updatestate_depend_eliminate]: 4.13001e-06 [updatestate_assign_eliminate]: 3.38999e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 1.90001e-06 [a_2]: 8.369e-05 [accelerated_algorithm]: 7.23999e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 6.22001e-06 [merge_send_recv]: 8.83001e-06 [auto_parallel]: 6.98e-06 [parallel]: 2.785e-05 [flash_sp]: 8.69e-06 [merge_comm]: 4.80999e-06 [allreduce_fusion]: 3.61999e-06 [matmul_add_comm_reduction]: 1.042e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.90998e-06 [virtual_dataset]: 6.48e-06 [get_grad_eliminate_]: 5.89e-06 [virtual_output]: 
6.01998e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 1.065e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.292e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 1.039e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88001e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 2.66e-06 [receive_attached]: 2.34999e-06 [after_resolve]: 1.084e-05 [a_after_grad]: 8.61997e-06 [renormalize]: 0.00062908 [add_forward_monad_depend]: 8.45001e-06 [auto_monad_grad]: 2.33998e-06 [auto_monad_eliminator]: 1.543e-05 [cse]: 3.137e-05 [a_3]: 4.516e-05 [Cycle 2]: 0.00062662, [45] [expand_dump_flag]: 1.36998e-06 [switch_simplify]: 7.39002e-06 [loop_unroll]: 5.75001e-06 [a_1]: 0.00011879 [with_stream_mark]: 1.198e-05 [recompute_prepare]: 6.12999e-06 [updatestate_depend_eliminate]: 3.29001e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.75997e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 7.334e-05 [accelerated_algorithm]: 6.19001e-06 [shard]: 1.24998e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 6.12001e-06 [auto_parallel]: 6.09999e-06 [parallel]: 5.10001e-06 [flash_sp]: 3.76001e-06 [merge_comm]: 3.3e-06 [allreduce_fusion]: 3.05002e-06 [matmul_add_comm_reduction]: 6.44001e-06 [allreduce_slice_to_reducescatter]: 4.2998e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 5.22999e-06 [merge_forward]: 3.11999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 8e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.032e-05 [merge_recompute_call_nodes]: 1.00999e-06 [before_grad]: 8.68001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.64002e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.17999e-06 [a_after_grad]: 7.87003e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.02998e-06 
[auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 7.02002e-06 [cse]: 1.907e-05 [a_3]: 3.393e-05 [py_interpret_to_execute_after_opt_a]: 1.086e-05 [slice_cell_reuse_recomputed_activation]: 1.96998e-06 [rewriter_after_opt_a]: 3.397e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.0005669 [opt_b]: 0.00019845, [1] [Cycle 1]: 0.00019146, [7] [b_1]: 0.00011325 [b_2]: 7.28999e-06 [updatestate_depend_eliminate]: 7.05998e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 7.7e-07 [cse]: 2.083e-05 [optimize_parallel_all_gather_comm]: 1.703e-05 [overlap_param_gather]: 2.01998e-06 [cconv]: 2.649e-05 [loop_unroll]: 0.00045452 [opt_after_cconv]: 0.00010064, [1] [Cycle 1]: 9.475e-05, [7] [c_1]: 2.638e-05 [parameter_eliminate]: 3.5e-06 [updatestate_depend_eliminate]: 6.16e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.71e-06 [cse]: 1.798e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.548e-05 [tuple_transform]: 7.353e-05, [1] [Cycle 1]: 6.888e-05, [4] [d_1]: 4.121e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 4.50003e-07 [switch_simplify]: 6.65998e-06 [partial_unused_args_eliminate]: 2.16e-06 [add_recomputation]: 5.742e-05 [cse_after_recomputation]: 2.246e-05, [1] [Cycle 1]: 1.79e-05, [1] [cse]: 1.241e-05 [environ_conv]: 8.18001e-06 [swap_dp_allreduce_reducescatter]: 5.38002e-06 [bias_add_comm_swap]: 2.90002e-06 [label_micro_interleaved_index]: 5.05001e-06 [label_fine_grained_interleaved_index]: 3.13e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.86e-06 [assign_add_opt]: 1.60001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.59001e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.42e-06 [interleave_split_concat_branches]: 
1.22e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.21002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.354e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 3.72002e-06 [overlap_recompute_and_grad_model_parallel]: 4.97999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.65002e-06 [overlap_grad_ring_attention]: 4.23001e-06 [overlap_grad_flash_sp]: 1.941e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.47001e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 1.11002e-06 [symbol_engine_optimizer]: 0.00011689, [1] [Cycle 1]: 0.00011112, [6] [build]: 2.79999e-06 [elim_shapecalc]: 9.69999e-06 [elim_not_effective]: 1.337e-05 [opt_reshape]: 6.56e-06 [fold_const_symbol]: 9.94001e-06 [renormalize]: 2.29978e-07 [detach_backward]: 2.38002e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 1.701e-05 [get_jit_bprop_graph]: 1.60999e-06 [rewriter_after_jit_bprop_graph]: 0.00017116 [opt_after_jit_grad]: 0.00053249 [validate]: 4.003e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00692774 [execute]: 8.2e-06 Sums bootstrap : 0.000693s : 3.57% type_inference : 0.007124s : 36.66% event_method : 0.000015s : 0.08% auto_monad : 0.000064s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000032s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000025s : 0.13% optimize.rewriter_before_opt_a : 0.000068s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% 
optimize.opt_a.switch_simplify : 0.000042s : 0.21% optimize.opt_a.loop_unroll : 0.000026s : 0.14% optimize.opt_a.a_1 : 0.000638s : 3.28% optimize.opt_a.with_stream_mark : 0.000028s : 0.14% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000157s : 0.81% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000015s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000033s : 0.17% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000019s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.10% 
optimize.opt_a.a_after_grad : 0.000016s : 0.08% optimize.opt_a.renormalize : 0.000629s : 3.24% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.12% optimize.opt_a.cse : 0.000050s : 0.26% optimize.opt_a.a_3 : 0.000079s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000567s : 2.92% optimize.opt_b.b_1 : 0.000113s : 0.58% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.14% optimize.loop_unroll : 0.000455s : 2.34% optimize.opt_after_cconv.c_1 : 0.000026s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.03% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000008s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000171s : 0.88% opt_after_jit_grad : 0.000532s : 2.74% validate : 0.000040s : 0.21% backend_pass : 0.000001s : 0.00% task_emit : 0.006928s : 35.65% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000237 26 35.14% : 0.000083s : 5: substitution.arithmetic_simplify 0.92% : 0.000002s : 2: substitution.elim_not_effective 0.60% : 0.000001s : 2: substitution.fold_const_symbol 2.58% : 0.000006s : 3: substitution.graph_param_transform 51.18% : 0.000121s : 3: substitution.inline 1.44% : 0.000003s : 4: substitution.j_node_and_user_rematch 1.87% : 0.000004s : 4: substitution.remove_not_recompute_node 1.96% : 0.000005s : 2: substitution.replace_old_param 4.30% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007059 2 90.67% : 0.006401s : 1: type_inference.infer 9.33% : 0.000658s : 1: type_inference.specialize ------[replace.] 0.000040 4 79.19% : 0.000032s : 3: replace.inline 20.81% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000129 4 92.74% : 0.000119s : 3: match.inline 7.26% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.03% : 0.000003s : 15: predicate.arithmetic_simplify 0.91% : 0.000001s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.59% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.60% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.47% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_depend_swap 2.11% : 0.000003s : 18: predicate.environ_get_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.25% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.87% : 0.000001s : 6: predicate.get_grad_eliminate 0.21% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.63% : 0.000011s : 40: predicate.inline 0.85% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 25: predicate.load_eliminater 1.02% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 9: predicate.minmaximum_grad 1.43% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.71% : 0.000003s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.25% : 0.000002s : 9: predicate.reduce_eliminate 2.45% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 16: predicate.replace_applicator 0.51% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 9: predicate.reshape_eliminate 0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000001s : 6: predicate.same_eliminate 0.43% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 0.76% : 0.000001s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.95% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.14% : 0.000008s : 43: predicate.switch_simplify 0.95% : 
0.000002s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.24% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 3: predicate.value_based_eliminate 0.66% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000434 8 45.78% : 0.000199s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.22% : 0.000236s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.034777 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.20% : 0.003895s : 1: add_attr 11.16% : 0.003880s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000069s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.11% : 0.000733s : 1: bootstrap 0.09% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000011s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.34% : 0.000464s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.66% : 0.000577s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000016s : 1: opt.transform.mutable_eliminate 2.95% : 0.001025s : 78: opt.transform.opt_a 0.07% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000092s : 28: opt.transform.opt_b 0.13% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000036s : 4: opt.transform.symbol_engine_opt 7.22% : 0.002511s : 1: opt_a 0.30% : 0.000104s : 1: opt_after_cconv 1.57% : 0.000545s : 1: opt_after_jit_grad 0.58% : 0.000202s : 1: opt_b 13.39% : 0.004656s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000036s : 1: pre_auto_parallel 0.08% : 0.000029s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000019s : 1: remove_dup_value 0.95% : 0.000331s : 1: renormalize.infer 0.83% : 0.000289s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.51% : 0.000177s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000038s : 1: rewriter_after_opt_a 0.21% : 0.000072s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.34% : 0.000120s : 1: symbol_engine_optimizer 19.97% : 0.006944s : 1: task_emit 0.22% : 0.000076s : 1: 
tuple_transform 20.55% : 0.007146s : 1: type_inference 0.22% : 0.000076s : 1: validate TotalTime = 0.0206209, [24] [bootstrap]: 0.00047828 [type_inference]: 0.00588261 [event_method]: 1.276e-05 [auto_monad]: 6.096e-05 [graph_reusing]: 6.59999e-06 [inline]: 2.02001e-06 [add_attr]: 0.00311725, [1] [add_attr_with_inline]: 0.00310821, [1] [Cycle 1]: 5.514e-05, [2] [tag_attr]: 1.357e-05 [meta_addattr_fg_expand]: 4.18001e-06 [parallel-infer-symbol]: 3.53e-06 [pre_auto_parallel]: 2.713e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00406035, [53] [py_interpret_to_execute]: 2.055e-05 [rewriter_before_opt_a]: 5.292e-05 [opt_a]: 0.00213794, [2] [Cycle 1]: 0.00151888, [45] [expand_dump_flag]: 2.76999e-06 [switch_simplify]: 2.929e-05 [loop_unroll]: 1.726e-05 [a_1]: 0.00035687 [with_stream_mark]: 1.513e-05 [recompute_prepare]: 8.04997e-06 [updatestate_depend_eliminate]: 4.20999e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 3.21999e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 8.216e-05 [accelerated_algorithm]: 6.67002e-06 [shard]: 2.10002e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 6.41e-06 [merge_send_recv]: 8.31002e-06 [auto_parallel]: 6.38e-06 [parallel]: 1.841e-05 [flash_sp]: 7.95e-06 [merge_comm]: 3.66999e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.50001e-06 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 7.75998e-06 [virtual_dataset]: 6.01e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 1.04e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.207e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 1.048e-05 [set_forward_comm_id_for_comm_node_pass]: 3.63999e-06 [meta_fg_expand]: 2.79001e-06 [flash_sp_send_recv_attached]: 2.68998e-06 [receive_attached]: 
2.43e-06 [after_resolve]: 9.98998e-06 [a_after_grad]: 9.52999e-06 [renormalize]: 0.00045308 [add_forward_monad_depend]: 5.16002e-06 [auto_monad_grad]: 2.07001e-06 [auto_monad_eliminator]: 1.316e-05 [cse]: 3.071e-05 [a_3]: 4.321e-05 [Cycle 2]: 0.00060894, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 7.24001e-06 [loop_unroll]: 5.78002e-06 [a_1]: 0.00011438 [with_stream_mark]: 1.053e-05 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 2.85998e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 7.271e-05 [accelerated_algorithm]: 5.94e-06 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.39998e-06 [shard_inline]: 5.83002e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.79e-06 [parallel]: 4.62e-06 [flash_sp]: 3.21999e-06 [merge_comm]: 3.37002e-06 [allreduce_fusion]: 2.98e-06 [matmul_add_comm_reduction]: 5.86e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.29001e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.29998e-06 [virtual_output]: 5.19e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.033e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 8.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.85998e-06 [meta_fg_expand]: 1.84998e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.12e-06 [after_resolve]: 8.24998e-06 [a_after_grad]: 7.76001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 9.99979e-07 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.375e-05 [a_3]: 3.388e-05 [py_interpret_to_execute_after_opt_a]: 7.91001e-06 [slice_cell_reuse_recomputed_activation]: 2.39001e-06 [rewriter_after_opt_a]: 3.443e-05 [convert_after_rewriter]: 6.57002e-06 [order_py_execute_after_rewriter]: 5.11002e-06 [mutable_eliminate]: 0.00049098 [opt_b]: 0.00019013, [1] [Cycle 1]: 
0.00018333, [7] [b_1]: 0.00011269 [b_2]: 7.31001e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.00002e-07 [cse]: 1.777e-05 [optimize_parallel_all_gather_comm]: 1.65e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.369e-05 [loop_unroll]: 0.00042452 [opt_after_cconv]: 9.616e-05, [1] [Cycle 1]: 9.032e-05, [7] [c_1]: 2.636e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.66e-06 [cse]: 1.729e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.496e-05 [tuple_transform]: 6.745e-05, [1] [Cycle 1]: 6.306e-05, [4] [d_1]: 3.589e-05 [none_parameter_eliminate]: 1.86003e-06 [renormalize]: 2.9002e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 1.79998e-06 [add_recomputation]: 4.581e-05 [cse_after_recomputation]: 2.193e-05, [1] [Cycle 1]: 1.725e-05, [1] [cse]: 1.173e-05 [environ_conv]: 5.49e-06 [swap_dp_allreduce_reducescatter]: 5.29998e-06 [bias_add_comm_swap]: 2.88998e-06 [label_micro_interleaved_index]: 4.36002e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.78003e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.94001e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.17999e-06 [add_comm_op_reuse_tag]: 1.19998e-06 [interleave_split_concat_branches]: 1.29998e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.50999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.344e-05 [grouped_pairwise_exchange_alltoall]: 1.85001e-06 [offloading_packed_experts]: 4.75001e-06 [overlap_recompute_and_grad_model_parallel]: 5.49e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.56002e-06 [overlap_grad_flash_sp]: 1.778e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.424e-05, [1] [Cycle 1]: 6.978e-05, [6] [build]: 3.01001e-06 [elim_shapecalc]: 9.29998e-06 [elim_not_effective]: 1.286e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 9.94999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 2.09e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.718e-05 [get_jit_bprop_graph]: 1.27999e-06 [rewriter_after_jit_bprop_graph]: 3.85998e-06 [opt_after_jit_grad]: 0.00046046 [validate]: 3.548e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.0062219 [execute]: 7.19001e-06 Sums bootstrap : 0.000478s : 2.91% type_inference : 0.005883s : 35.76% event_method : 0.000013s : 0.08% auto_monad : 0.000061s : 0.37% graph_reusing : 0.000007s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000053s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000471s : 2.86% optimize.opt_a.with_stream_mark : 0.000026s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000453s : 2.75% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000044s : 0.27% optimize.opt_a.a_3 : 0.000077s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000491s : 2.98% optimize.opt_b.b_1 : 0.000113s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000425s : 2.58% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000036s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000460s : 2.80% validate : 0.000035s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006222s : 37.82% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000142 24 20.43% : 0.000029s : 4: substitution.arithmetic_simplify 1.57% : 0.000002s : 2: substitution.elim_not_effective 1.18% : 0.000002s : 2: substitution.fold_const_symbol 3.55% : 0.000005s : 3: substitution.graph_param_transform 65.98% : 0.000094s : 3: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.08% : 0.000004s : 4: substitution.remove_not_recompute_node 2.01% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005836 2 91.96% : 0.005367s : 1: type_inference.infer 8.04% : 0.000469s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000092 3 100.00% : 0.000092s : 3: match.inline ------[predicate.] 
0.000148 815 1.13% : 0.000002s : 8: predicate.accumulaten_eliminater 0.96% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 8: predicate.addn_zero_filter 0.76% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.65% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.51% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.32% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 2.01% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.34% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.19% : 0.000009s : 37: predicate.inline 0.99% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 6: predicate.less_batch_normalization 1.51% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.30% : 0.000003s : 22: predicate.load_eliminater 1.19% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.27% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.45% : 0.000001s : 3: predicate.parallel_virtual_node 1.40% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 11: predicate.partial_eliminate 1.03% : 0.000002s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.22% : 0.000002s : 8: predicate.reduce_eliminate 2.17% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 6: predicate.remove_not_recompute_node 1.34% : 0.000002s : 14: predicate.replace_applicator 0.79% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 8: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.99% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 6: predicate.shard_identity_eliminate 0.85% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.85% : 0.000007s : 38: predicate.switch_simplify 0.88% : 
0.000001s : 8: predicate.tile_eliminate 0.87% : 0.000001s : 8: predicate.transpose_eliminate 1.53% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.23% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.72% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000265 7 33.31% : 0.000088s : 2: func_graph_cloner_run.FuncGraphClonerGraph 66.69% : 0.000177s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029246 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.68% : 0.003122s : 1: add_attr 10.64% : 0.003112s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000066s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.76% : 0.000513s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.71% : 0.000500s : 1: mutable_eliminate 0.03% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.88% : 0.000842s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000040s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.32% : 0.002141s : 1: opt_a 0.34% : 0.000099s : 1: opt_after_cconv 1.60% : 0.000469s : 1: opt_after_jit_grad 0.66% : 0.000193s : 1: opt_b 13.90% : 0.004065s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.81% : 0.000238s : 1: renormalize.infer 0.71% : 0.000208s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.19% : 0.000057s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000077s : 1: symbol_engine_optimizer 21.32% : 0.006234s : 1: task_emit 0.24% : 0.000070s : 1: 
tuple_transform 20.18% : 0.005902s : 1: type_inference 0.22% : 0.000063s : 1: validate TotalTime = 0.0211209, [24] [bootstrap]: 0.00051849 [type_inference]: 0.0059788 [event_method]: 1.536e-05 [auto_monad]: 6.178e-05 [graph_reusing]: 5.57001e-06 [inline]: 2.46e-06 [add_attr]: 0.0032275, [1] [add_attr_with_inline]: 0.00321891, [1] [Cycle 1]: 5.511e-05, [2] [tag_attr]: 1.528e-05 [meta_addattr_fg_expand]: 4.4e-06 [parallel-infer-symbol]: 3.16999e-06 [pre_auto_parallel]: 2.875e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 2.07999e-06 [pipeline_split]: 2.06e-06 [optimize]: 0.00432615, [53] [py_interpret_to_execute]: 2.29e-05 [rewriter_before_opt_a]: 6.524e-05 [opt_a]: 0.00234428, [2] [Cycle 1]: 0.00171967, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 3.308e-05 [loop_unroll]: 2.087e-05 [a_1]: 0.00044862 [with_stream_mark]: 1.497e-05 [recompute_prepare]: 8.2e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 3.36999e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 8.065e-05 [accelerated_algorithm]: 6.94999e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 6.21e-06 [merge_send_recv]: 8.15999e-06 [auto_parallel]: 6.45002e-06 [parallel]: 1.934e-05 [flash_sp]: 7.97e-06 [merge_comm]: 4.04002e-06 [allreduce_fusion]: 3.77002e-06 [matmul_add_comm_reduction]: 9.93002e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.9e-06 [virtual_dataset]: 6.41e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.048e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.187e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 1.058e-05 [set_forward_comm_id_for_comm_node_pass]: 3.81999e-06 [meta_fg_expand]: 2.74001e-06 [flash_sp_send_recv_attached]: 2.58e-06 [receive_attached]: 2.63e-06 [after_resolve]: 
1.077e-05 [a_after_grad]: 9.13002e-06 [renormalize]: 0.0005856 [add_forward_monad_depend]: 4.78001e-06 [auto_monad_grad]: 2.02001e-06 [auto_monad_eliminator]: 1.337e-05 [cse]: 2.964e-05 [a_3]: 4.378e-05 [Cycle 2]: 0.0006134, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 7.31001e-06 [loop_unroll]: 5.79e-06 [a_1]: 0.00011575 [with_stream_mark]: 1.034e-05 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 3.26001e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 7.307e-05 [accelerated_algorithm]: 5.89e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 4.99e-06 [auto_parallel]: 5.50001e-06 [parallel]: 4.74002e-06 [flash_sp]: 3.28998e-06 [merge_comm]: 3.32002e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 5.94999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.18998e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.11997e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.33998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.06e-05 [merge_recompute_call_nodes]: 1.14e-06 [before_grad]: 8.65001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.78001e-06 [meta_fg_expand]: 1.84998e-06 [flash_sp_send_recv_attached]: 7.00005e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.07999e-06 [a_after_grad]: 8.02e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 1.30001e-06 [auto_monad_eliminator]: 6.46e-06 [cse]: 1.724e-05 [a_3]: 3.327e-05 [py_interpret_to_execute_after_opt_a]: 9.34998e-06 [slice_cell_reuse_recomputed_activation]: 2.13998e-06 [rewriter_after_opt_a]: 3.407e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.28002e-06 [mutable_eliminate]: 0.00051601 [opt_b]: 0.00019436, [1] [Cycle 1]: 0.00018749, [7] [b_1]: 
0.00011181 [b_2]: 7.07002e-06 [updatestate_depend_eliminate]: 5.99e-06 [updatestate_assign_eliminate]: 2.73998e-06 [updatestate_loads_eliminate]: 2.59999e-06 [renormalize]: 5.3001e-07 [cse]: 1.954e-05 [optimize_parallel_all_gather_comm]: 1.613e-05 [overlap_param_gather]: 2.06998e-06 [cconv]: 2.527e-05 [loop_unroll]: 0.00043471 [opt_after_cconv]: 9.876e-05, [1] [Cycle 1]: 9.272e-05, [7] [c_1]: 2.653e-05 [parameter_eliminate]: 3.01001e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.60997e-06 [cse]: 1.723e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 1.486e-05 [tuple_transform]: 6.968e-05, [1] [Cycle 1]: 6.486e-05, [4] [d_1]: 3.798e-05 [none_parameter_eliminate]: 1.82001e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.51e-06 [partial_unused_args_eliminate]: 2.13998e-06 [add_recomputation]: 4.505e-05 [cse_after_recomputation]: 2.116e-05, [1] [Cycle 1]: 1.68e-05, [1] [cse]: 1.154e-05 [environ_conv]: 5.18002e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 2.91e-06 [label_micro_interleaved_index]: 4.77998e-06 [label_fine_grained_interleaved_index]: 3.08e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.94001e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.43002e-06 [full_micro_interleaved_order_control]: 2.93998e-06 [reorder_send_recv_between_fp_bp]: 2.90002e-06 [comm_op_add_attrs]: 1.29003e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.32999e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.38002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76003e-06 [control_data_broadcast_order]: 1.371e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 4.08999e-06 [overlap_recompute_and_grad_model_parallel]: 5.20999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.41998e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.869e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 7.242e-05, [1] [Cycle 1]: 6.798e-05, [6] [build]: 2.73998e-06 [elim_shapecalc]: 8.67998e-06 [elim_not_effective]: 1.231e-05 [opt_reshape]: 6.35002e-06 [fold_const_symbol]: 9.91998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 2.09999e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.641e-05 [get_jit_bprop_graph]: 1.59e-06 [rewriter_after_jit_bprop_graph]: 4.84003e-06 [opt_after_jit_grad]: 0.00047171 [validate]: 3.733e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.00618011 [execute]: 8.02e-06 Sums bootstrap : 0.000518s : 3.07% type_inference : 0.005979s : 35.45% event_method : 0.000015s : 0.09% auto_monad : 0.000062s : 0.37% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.14% optimize.rewriter_before_opt_a : 0.000065s : 0.39% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000564s : 3.35% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000154s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000586s : 3.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000047s : 0.28% optimize.opt_a.a_3 : 0.000077s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000516s : 3.06% optimize.opt_b.b_1 : 0.000112s : 0.66% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.15% optimize.loop_unroll : 0.000435s : 2.58% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000472s : 2.80% validate : 0.000037s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006180s : 36.65% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000175 26 18.27% : 0.000032s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000002s : 2: substitution.fold_const_symbol 3.26% : 0.000006s : 3: substitution.graph_param_transform 64.27% : 0.000112s : 3: substitution.inline 1.96% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000005s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 2: substitution.replace_old_param 5.18% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005930 2 89.81% : 0.005326s : 1: type_inference.infer 10.19% : 0.000604s : 1: type_inference.specialize ------[replace.] 0.000037 4 80.15% : 0.000030s : 3: replace.inline 19.85% : 0.000007s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 4 93.01% : 0.000110s : 3: match.inline 6.99% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 883 0.95% : 0.000001s : 9: predicate.accumulaten_eliminater 0.85% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 9: predicate.addn_zero_filter 0.86% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 15: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.91% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.87% : 0.000003s : 18: predicate.environ_get_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.37% : 0.000010s : 40: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.37% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.00% : 0.000002s : 6: predicate.less_batch_normalization 1.78% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 25: predicate.load_eliminater 1.09% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 9: predicate.minmaximum_grad 1.17% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.64% : 0.000003s : 13: predicate.partial_defer_inline 1.48% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.39% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.25% : 0.000002s : 16: predicate.replace_applicator 0.68% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.80% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 6: predicate.shard_identity_eliminate 0.76% : 0.000001s : 6: predicate.special_op_eliminate 0.91% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.97% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 43: predicate.switch_simplify 0.94% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.54% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 21: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000364 8 45.40% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.60% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030358 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.65% : 0.003232s : 1: add_attr 10.62% : 0.003223s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000067s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.84% : 0.000560s : 1: bootstrap 0.09% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000022s : 1: event_method 0.04% : 0.000014s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.46% : 0.000443s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.73% : 0.000526s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.11% : 0.000944s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.73% : 0.002347s : 1: opt_a 0.34% : 0.000102s : 1: opt_after_cconv 1.59% : 0.000482s : 1: opt_after_jit_grad 0.65% : 0.000198s : 1: opt_b 14.26% : 0.004331s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000006s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.09% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 1.11% : 0.000338s : 1: renormalize.infer 0.79% : 0.000240s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.23% : 0.000070s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000075s : 1: symbol_engine_optimizer 20.40% : 0.006192s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 19.75% : 0.005997s : 1: type_inference 0.23% : 0.000069s : 1: validate TotalTime = 0.0411869, [24] [bootstrap]: 0.00054898 [type_inference]: 0.0120144 [event_method]: 4.956e-05 [auto_monad]: 0.00013823 [graph_reusing]: 9.44998e-06 [inline]: 2.37999e-06 [add_attr]: 0.00329625, [1] [add_attr_with_inline]: 0.00328713, [1] [Cycle 1]: 8.312e-05, [2] [tag_attr]: 3.52e-05 [meta_addattr_fg_expand]: 9.91e-06 [parallel-infer-symbol]: 3.72998e-06 [pre_auto_parallel]: 5.208e-05 [insert-virtual-dataset]: 2.73998e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.31e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.0175832, [53] [py_interpret_to_execute]: 4.296e-05 [rewriter_before_opt_a]: 0.00015874 [opt_a]: 0.0152841, [3] [Cycle 1]: 0.0117064, [45] [expand_dump_flag]: 4.12e-06 [switch_simplify]: 7.813e-05 [loop_unroll]: 6.39e-05 [a_1]: 0.0014831 [with_stream_mark]: 2.841e-05 [recompute_prepare]: 2.319e-05 [updatestate_depend_eliminate]: 9.10001e-06 [updatestate_assign_eliminate]: 7.78999e-06 [updatestate_loads_eliminate]: 6.91999e-06 [parameter_eliminate]: 2.47001e-06 [a_2]: 0.00024759 [accelerated_algorithm]: 3.309e-05 [shard]: 1.82999e-06 [meta_shard_fg_expand]: 3.46999e-06 [shard_inline]: 1.605e-05 [merge_send_recv]: 1.712e-05 [auto_parallel]: 1.096e-05 [parallel]: 2.002e-05 [flash_sp]: 1.268e-05 [merge_comm]: 9.92001e-06 [allreduce_fusion]: 9.25999e-06 [matmul_add_comm_reduction]: 2.947e-05 [allreduce_slice_to_reducescatter]: 1.05999e-06 [virtual_shard_identity]: 1.82e-05 [virtual_dataset]: 1.607e-05 [get_grad_eliminate_]: 1.533e-05 [virtual_output]: 1.53e-05 [merge_forward]: 8.80001e-06 [cell_reuse_recompute_pass]: 1.62001e-06 [offload_activation]: 1.854e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.062e-05 [merge_recompute_call_nodes]: 1.71002e-06 [before_grad]: 2.951e-05 [set_forward_comm_id_for_comm_node_pass]: 9.59e-06 [meta_fg_expand]: 0.00164632 [flash_sp_send_recv_attached]: 4.01001e-06 [receive_attached]: 2.33002e-06 
[after_resolve]: 6.667e-05 [a_after_grad]: 8.98e-05 [renormalize]: 0.00666191 [add_forward_monad_depend]: 9.81e-06 [auto_monad_grad]: 6.34999e-06 [auto_monad_eliminator]: 8.363e-05 [cse]: 0.00018953 [a_3]: 0.00034526 [Cycle 2]: 0.00281867, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 4.611e-05 [loop_unroll]: 4.245e-05 [a_1]: 0.00134716 [with_stream_mark]: 1.402e-05 [recompute_prepare]: 9.18002e-06 [updatestate_depend_eliminate]: 4.45999e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 3.36999e-06 [parameter_eliminate]: 1.84998e-06 [a_2]: 9.119e-05 [accelerated_algorithm]: 1.187e-05 [shard]: 1.83002e-06 [meta_shard_fg_expand]: 2.19001e-06 [shard_inline]: 7.00002e-06 [merge_send_recv]: 8.18001e-06 [auto_parallel]: 7.54002e-06 [parallel]: 7.25e-06 [flash_sp]: 4.08001e-06 [merge_comm]: 4.26001e-06 [allreduce_fusion]: 3.80998e-06 [matmul_add_comm_reduction]: 8.19002e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 8.48999e-06 [virtual_dataset]: 6.75002e-06 [get_grad_eliminate_]: 6.39001e-06 [virtual_output]: 6.07001e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 8.50006e-07 [offload_activation]: 9.08002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.343e-05 [merge_recompute_call_nodes]: 1.14e-06 [before_grad]: 1.161e-05 [set_forward_comm_id_for_comm_node_pass]: 4.33999e-06 [meta_fg_expand]: 8.988e-05 [flash_sp_send_recv_attached]: 1.10999e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 1.271e-05 [a_after_grad]: 1.036e-05 [renormalize]: 0.00064757 [add_forward_monad_depend]: 4.97999e-06 [auto_monad_grad]: 1.90001e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.542e-05 [a_3]: 4.889e-05 [Cycle 3]: 0.00074233, [45] [expand_dump_flag]: 1.25999e-06 [switch_simplify]: 8.13001e-06 [loop_unroll]: 6.71e-06 [a_1]: 0.00015046 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 6.81001e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.04001e-06 
[updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 8.737e-05 [accelerated_algorithm]: 1.027e-05 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 7.11001e-06 [merge_send_recv]: 5.69e-06 [auto_parallel]: 6.14001e-06 [parallel]: 5.30001e-06 [flash_sp]: 1.20001e-06 [merge_comm]: 3.85e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 6.51e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 7.84002e-06 [virtual_dataset]: 6.66e-06 [get_grad_eliminate_]: 6.15002e-06 [virtual_output]: 6.11e-06 [merge_forward]: 3.23e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 7.38e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.271e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.11e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88999e-06 [meta_fg_expand]: 4.491e-05 [flash_sp_send_recv_attached]: 9.29984e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.014e-05 [a_after_grad]: 1.005e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.42e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 8.27998e-06 [cse]: 1.808e-05 [a_3]: 4.142e-05 [py_interpret_to_execute_after_opt_a]: 1.278e-05 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 4.348e-05 [convert_after_rewriter]: 7.11999e-06 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.00059498 [opt_b]: 0.00022309, [1] [Cycle 1]: 0.00021556, [7] [b_1]: 0.00013745 [b_2]: 8.65001e-06 [updatestate_depend_eliminate]: 6.46999e-06 [updatestate_assign_eliminate]: 2.86e-06 [updatestate_loads_eliminate]: 2.59999e-06 [renormalize]: 5.79981e-07 [cse]: 2.098e-05 [optimize_parallel_all_gather_comm]: 1.824e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 2.347e-05 [loop_unroll]: 0.00045136 [opt_after_cconv]: 0.00011187, [1] [Cycle 1]: 0.0001057, [7] [c_1]: 3.421e-05 [parameter_eliminate]: 2.33002e-06 [updatestate_depend_eliminate]: 6.17999e-06 [updatestate_assign_eliminate]: 
3.37002e-06 [updatestate_loads_eliminate]: 3.05998e-06 [cse]: 2.074e-05 [renormalize]: 2.40019e-07 [remove_dup_value]: 1.673e-05 [tuple_transform]: 8.056e-05, [1] [Cycle 1]: 7.605e-05, [4] [d_1]: 4.742e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 7.56001e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 5.381e-05 [cse_after_recomputation]: 2.477e-05, [1] [Cycle 1]: 2.003e-05, [1] [cse]: 1.438e-05 [environ_conv]: 8.55001e-06 [swap_dp_allreduce_reducescatter]: 6.31e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.74998e-06 [label_fine_grained_interleaved_index]: 3.60998e-06 [merge_cast_opt]: 1.55999e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 1.43002e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 3.05998e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.07998e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.34998e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.429e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.49998e-06 [overlap_recompute_and_grad_model_parallel]: 5.19e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.55999e-06 [overlap_recompute_comm]: 2.48002e-06 [overlap_grad_ring_attention]: 4.55999e-06 [overlap_grad_flash_sp]: 2.255e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.68e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 8.729e-05, [1] [Cycle 1]: 8.297e-05, [6] [build]: 9.72001e-06 [elim_shapecalc]: 1.04e-05 [elim_not_effective]: 1.495e-05 [opt_reshape]: 7.18998e-06 [fold_const_symbol]: 1.157e-05 
[renormalize]: 2.30008e-07 [detach_backward]: 2.48e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 2.216e-05 [get_jit_bprop_graph]: 1.76e-06 [rewriter_after_jit_bprop_graph]: 3.65998e-06 [opt_after_jit_grad]: 0.00047954 [validate]: 4.394e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00668257 [execute]: 8.25e-06 Sums bootstrap : 0.000549s : 1.50% type_inference : 0.012014s : 32.89% event_method : 0.000050s : 0.14% auto_monad : 0.000138s : 0.38% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000052s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000043s : 0.12% optimize.rewriter_before_opt_a : 0.000159s : 0.43% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.36% optimize.opt_a.loop_unroll : 0.000113s : 0.31% optimize.opt_a.a_1 : 0.002981s : 8.16% optimize.opt_a.with_stream_mark : 0.000052s : 0.14% optimize.opt_a.recompute_prepare : 0.000039s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000426s : 1.17% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.15% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.08% optimize.opt_a.merge_send_recv : 0.000031s : 0.08% optimize.opt_a.auto_parallel : 0.000025s : 0.07% optimize.opt_a.parallel : 0.000033s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% 
optimize.opt_a.merge_comm : 0.000018s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.09% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000027s : 0.08% optimize.opt_a.merge_forward : 0.000016s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.14% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.05% optimize.opt_a.meta_fg_expand : 0.001781s : 4.88% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000090s : 0.25% optimize.opt_a.a_after_grad : 0.000110s : 0.30% optimize.opt_a.renormalize : 0.007310s : 20.01% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.04% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000105s : 0.29% optimize.opt_a.cse : 0.000233s : 0.64% optimize.opt_a.a_3 : 0.000436s : 1.19% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000043s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000595s : 1.63% optimize.opt_b.b_1 : 0.000137s : 0.38% optimize.opt_b.b_2 : 0.000009s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.06% optimize.loop_unroll : 0.000451s : 1.24% optimize.opt_after_cconv.c_1 : 0.000034s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000021s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.05% optimize.tuple_transform.d_1 : 0.000047s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.15% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000009s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000004s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.06% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000480s : 1.31% validate : 0.000044s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006683s : 18.29% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000751 161 7.34% : 0.000055s : 8: substitution.arithmetic_simplify 0.35% : 0.000003s : 3: substitution.elim_not_effective 0.56% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 3: substitution.fold_const_symbol 0.91% : 0.000007s : 4: substitution.graph_param_transform 0.43% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 58.03% : 0.000436s : 17: substitution.inline 2.28% : 0.000017s : 2: substitution.inline_without_move 1.39% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.35% : 0.000018s : 3: substitution.less_batch_normalization 1.44% : 0.000011s : 7: substitution.minmaximum_grad 0.79% : 0.000006s : 5: substitution.partial_eliminate 1.74% : 0.000013s : 15: substitution.remove_not_recompute_node 3.87% : 0.000029s : 10: substitution.replace_applicator 1.31% : 0.000010s : 10: substitution.replace_old_param 0.43% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.97% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.38% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 1.99% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 7.43% : 0.000056s : 19: substitution.tuple_list_get_item_eliminator 1.94% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011929 2 86.53% : 0.010323s : 1: type_inference.infer 13.47% : 0.001606s : 1: type_inference.specialize ------[replace.] 0.000202 27 64.38% : 0.000130s : 17: replace.inline 35.62% : 0.000072s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000454 27 93.94% : 0.000426s : 17: match.inline 6.06% : 0.000027s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000704 4248 1.17% : 0.000008s : 53: predicate.accumulaten_eliminater 0.23% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.16% : 0.000008s : 53: predicate.addn_zero_filter 1.10% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.00% : 0.000014s : 74: predicate.arithmetic_simplify 1.17% : 0.000008s : 53: predicate.cast_eliminate 1.10% : 0.000008s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000003s : 21: predicate.depend_value_elim 1.18% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.25% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.17% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.19% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_depend_swap 1.67% : 0.000012s : 78: predicate.environ_get_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.82% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.52% : 0.000018s : 80: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.07% : 0.000000s : 4: predicate.fold_const_symbol 0.50% : 0.000004s : 21: predicate.get_grad_eliminate 0.08% : 0.000001s : 4: predicate.graph_param_transform 0.52% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 6.04% : 0.000043s : 183: predicate.inline 1.40% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 21: predicate.less_batch_normalization 1.54% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 124: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.51% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.36% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.08% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.07% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 53: predicate.minmaximum_grad 0.29% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.12% : 0.000015s : 80: predicate.partial_defer_inline 1.72% : 0.000012s : 67: predicate.partial_eliminate 1.11% : 0.000008s : 53: predicate.print_const_string_wrapper 0.47% : 0.000003s : 21: predicate.reduce_all_const_elim 1.35% : 0.000009s : 53: predicate.reduce_eliminate 2.60% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.85% : 0.000013s : 113: predicate.replace_applicator 0.75% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.16% : 0.000008s : 53: predicate.reshape_eliminate 1.11% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.29% : 0.000009s : 50: predicate.same_eliminate 0.36% : 0.000003s : 21: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 21: predicate.shard_identity_eliminate 0.23% : 0.000002s : 8: predicate.special_op_eliminate 0.60% : 0.000004s : 21: predicate.specialize_transform 1.24% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.95% : 0.000014s : 80: predicate.switch_defer_inline 3.01% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.23% : 0.000037s : 218: 
predicate.switch_simplify 1.12% : 0.000008s : 53: predicate.tile_eliminate 1.10% : 0.000008s : 53: predicate.transpose_eliminate 1.48% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 1.47% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.57% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.61% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.13% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001774 36 60.85% : 0.001080s : 15: func_graph_cloner_run.FuncGraphClonerGraph 39.15% : 0.000695s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.074070 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.46% : 0.003301s : 1: add_attr 4.44% : 0.003291s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000146s : 1: auto_monad 0.04% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.79% : 0.000588s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000058s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000014s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000007s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.62% : 0.000460s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.82% : 0.000604s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.08% : 0.004500s : 117: opt.transform.opt_a 0.04% : 0.000032s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000117s : 28: opt.transform.opt_b 0.07% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000041s : 4: opt.transform.symbol_engine_opt 20.64% : 0.015288s : 1: opt_a 0.16% : 0.000115s : 1: opt_after_cconv 0.66% : 0.000489s : 1: opt_after_jit_grad 0.31% : 0.000227s : 1: opt_b 23.75% : 0.017588s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000057s : 1: pre_auto_parallel 0.06% : 0.000047s : 1: py_interpret_to_execute 0.02% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 7.76% : 0.005751s : 2: renormalize.infer 2.08% : 0.001544s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000048s : 1: rewriter_after_opt_a 0.22% : 0.000163s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000090s : 1: symbol_engine_optimizer 9.04% : 0.006697s : 1: task_emit 0.11% : 0.000084s : 1: 
tuple_transform 16.25% : 0.012036s : 1: type_inference 0.10% : 0.000076s : 1: validate TotalTime = 0.0211415, [24] [bootstrap]: 0.00057398 [type_inference]: 0.00607983 [event_method]: 1.34e-05 [auto_monad]: 6.118e-05 [graph_reusing]: 5.57001e-06 [inline]: 1.83997e-06 [add_attr]: 0.00319027, [1] [add_attr_with_inline]: 0.00318138, [1] [Cycle 1]: 5.934e-05, [2] [tag_attr]: 1.465e-05 [meta_addattr_fg_expand]: 4.24002e-06 [parallel-infer-symbol]: 3.04001e-06 [pre_auto_parallel]: 2.61e-05 [insert-virtual-dataset]: 2.69001e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.68e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00404549, [53] [py_interpret_to_execute]: 2.062e-05 [rewriter_before_opt_a]: 5.271e-05 [opt_a]: 0.00211883, [2] [Cycle 1]: 0.00146499, [45] [expand_dump_flag]: 3.04999e-06 [switch_simplify]: 2.966e-05 [loop_unroll]: 1.734e-05 [a_1]: 0.00036076 [with_stream_mark]: 1.559e-05 [recompute_prepare]: 8.25e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.35998e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 2.31e-06 [a_2]: 8.146e-05 [accelerated_algorithm]: 6.94001e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 6.24999e-06 [merge_send_recv]: 8.54e-06 [auto_parallel]: 6.21e-06 [parallel]: 1.873e-05 [flash_sp]: 8.15999e-06 [merge_comm]: 4.12e-06 [allreduce_fusion]: 3.61999e-06 [matmul_add_comm_reduction]: 9.81e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 6.09001e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.76998e-06 [merge_forward]: 3.97998e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.014e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.266e-05 [merge_recompute_call_nodes]: 1.92001e-06 [before_grad]: 1.011e-05 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 2.79999e-06 [flash_sp_send_recv_attached]: 2.61999e-06 [receive_attached]: 
2.24001e-06 [after_resolve]: 9.93998e-06 [a_after_grad]: 9.04998e-06 [renormalize]: 0.00043608 [add_forward_monad_depend]: 4.99003e-06 [auto_monad_grad]: 2.44999e-06 [auto_monad_eliminator]: 1.354e-05 [cse]: 2.882e-05 [a_3]: 4.164e-05 [Cycle 2]: 0.00064289, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 7e-06 [loop_unroll]: 5.79e-06 [a_1]: 0.00011229 [with_stream_mark]: 1.015e-05 [recompute_prepare]: 6.24001e-06 [updatestate_depend_eliminate]: 3.01001e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.63998e-06 [parameter_eliminate]: 1.04998e-06 [a_2]: 7.183e-05 [accelerated_algorithm]: 5.92001e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 4.46002e-06 [auto_parallel]: 5.19003e-06 [parallel]: 4.52998e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 5.37001e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.38e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.25999e-06 [merge_forward]: 2.71999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.16998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.79998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 2.06003e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.17e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.1e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.59e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 7.10998e-06 [cse]: 1.33e-05 [a_3]: 3.29e-05 [py_interpret_to_execute_after_opt_a]: 8.40001e-06 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 3.464e-05 [convert_after_rewriter]: 6.44001e-06 [order_py_execute_after_rewriter]: 5.52001e-06 [mutable_eliminate]: 0.0004931 [opt_b]: 0.0001888, [1] [Cycle 1]: 0.00018198, 
[7] [b_1]: 0.0001115 [b_2]: 6.88e-06 [updatestate_depend_eliminate]: 5.61e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 4.30009e-07 [cse]: 1.78e-05 [optimize_parallel_all_gather_comm]: 1.644e-05 [overlap_param_gather]: 2.08998e-06 [cconv]: 2.352e-05 [loop_unroll]: 0.00042655 [opt_after_cconv]: 9.842e-05, [1] [Cycle 1]: 9.167e-05, [7] [c_1]: 2.587e-05 [parameter_eliminate]: 2.56e-06 [updatestate_depend_eliminate]: 5.36002e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.714e-05 [renormalize]: 6.69999e-07 [remove_dup_value]: 1.543e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.374e-05, [4] [d_1]: 3.712e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.41998e-06 [partial_unused_args_eliminate]: 2.09999e-06 [add_recomputation]: 4.563e-05 [cse_after_recomputation]: 2.225e-05, [1] [Cycle 1]: 1.722e-05, [1] [cse]: 1.127e-05 [environ_conv]: 5.57999e-06 [swap_dp_allreduce_reducescatter]: 5.21998e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.61002e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.03997e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.11997e-06 [full_micro_interleaved_order_control]: 2.54001e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.18001e-06 [add_comm_op_reuse_tag]: 1.20001e-06 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.45999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.222e-05 [grouped_pairwise_exchange_alltoall]: 2.07001e-06 [offloading_packed_experts]: 4.13001e-06 [overlap_recompute_and_grad_model_parallel]: 5.02e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.93e-06 [overlap_grad_ring_attention]: 4.38001e-06 [overlap_grad_flash_sp]: 1.834e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.46998e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 7.261e-05, [1] [Cycle 1]: 6.805e-05, [6] [build]: 2.51e-06 [elim_shapecalc]: 8.75001e-06 [elim_not_effective]: 1.195e-05 [opt_reshape]: 6.32001e-06 [fold_const_symbol]: 9.44e-06 [renormalize]: 4.09986e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 2.02999e-06 [auto_monad_reorder]: 1.615e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.81999e-06 [opt_after_jit_grad]: 0.00046796 [validate]: 3.572e-05 [backend_pass]: 1.19e-06 [task_emit]: 0.00637877 [execute]: 7.24001e-06 Sums bootstrap : 0.000574s : 3.40% type_inference : 0.006080s : 35.99% event_method : 0.000013s : 0.08% auto_monad : 0.000061s : 0.36% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000003s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000053s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000473s : 2.80% optimize.opt_a.with_stream_mark : 0.000026s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000153s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000436s : 2.58% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000042s : 0.25% optimize.opt_a.a_3 : 0.000075s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.21% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000493s : 2.92% optimize.opt_b.b_1 : 0.000112s : 0.66% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000427s : 2.53% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000468s : 2.77% validate : 0.000036s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006379s : 37.76% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000146 24 19.95% : 0.000029s : 4: substitution.arithmetic_simplify 1.26% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 3.63% : 0.000005s : 3: substitution.graph_param_transform 66.21% : 0.000097s : 3: substitution.inline 2.19% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.64% : 0.000005s : 4: substitution.remove_not_recompute_node 2.14% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006032 2 92.16% : 0.005560s : 1: type_inference.infer 7.84% : 0.000473s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000095 3 100.00% : 0.000095s : 3: match.inline ------[predicate.] 
0.000146 815 0.86% : 0.000001s : 8: predicate.accumulaten_eliminater 0.96% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.40% : 0.000004s : 14: predicate.arithmetic_simplify 1.01% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_depend_swap 1.75% : 0.000003s : 17: predicate.environ_get_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.88% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.28% : 0.000009s : 37: predicate.inline 1.03% : 0.000002s : 6: predicate.inline_without_move 0.46% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 6: predicate.less_batch_normalization 1.54% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.31% : 0.000003s : 22: predicate.load_eliminater 1.14% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.04% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.11% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.42% : 0.000001s : 3: predicate.parallel_virtual_node 1.42% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 11: predicate.partial_eliminate 0.88% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.23% : 0.000002s : 8: predicate.reduce_eliminate 2.21% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.87% : 0.000001s : 6: predicate.remove_not_recompute_node 1.19% : 0.000002s : 14: predicate.replace_applicator 0.69% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 0.86% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 6: predicate.shard_identity_eliminate 0.93% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 1.08% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.89% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.54% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.69% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.73% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000299 7 38.36% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.64% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029810 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.72% : 0.003195s : 1: add_attr 10.68% : 0.003185s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000066s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.06% : 0.000613s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.46% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000502s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.83% : 0.000843s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.12% : 0.002122s : 1: opt_a 0.34% : 0.000102s : 1: opt_after_cconv 1.60% : 0.000477s : 1: opt_after_jit_grad 0.64% : 0.000192s : 1: opt_b 13.59% : 0.004050s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.76% : 0.000225s : 1: renormalize.infer 0.68% : 0.000203s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.19% : 0.000057s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000075s : 1: symbol_engine_optimizer 21.44% : 0.006390s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.47% : 0.006102s : 1: type_inference 0.22% : 0.000066s : 1: validate TotalTime = 0.0389686, [24] [bootstrap]: 0.00049927 [type_inference]: 0.0118092 [event_method]: 4.2e-05 [auto_monad]: 0.00012167 [graph_reusing]: 8.2e-06 [inline]: 1.65001e-06 [add_attr]: 0.00308839, [1] [add_attr_with_inline]: 0.00308052, [1] [Cycle 1]: 6.625e-05, [2] [tag_attr]: 3.028e-05 [meta_addattr_fg_expand]: 8.85999e-06 [parallel-infer-symbol]: 2.90998e-06 [pre_auto_parallel]: 4.591e-05 [insert-virtual-dataset]: 2.05002e-06 [parallel-infer-symbol-second]: 9.09989e-07 [dataset_repeat_opt]: 1.50999e-06 [pipeline_split]: 1.44e-06 [optimize]: 0.0163245, [53] [py_interpret_to_execute]: 3.781e-05 [rewriter_before_opt_a]: 0.00014064 [opt_a]: 0.014274, [3] [Cycle 1]: 0.0108399, [45] [expand_dump_flag]: 3.33e-06 [switch_simplify]: 6.994e-05 [loop_unroll]: 6.089e-05 [a_1]: 0.00142963 [with_stream_mark]: 2.29e-05 [recompute_prepare]: 2.205e-05 [updatestate_depend_eliminate]: 8.56002e-06 [updatestate_assign_eliminate]: 6.85002e-06 [updatestate_loads_eliminate]: 7.16999e-06 [parameter_eliminate]: 2.19001e-06 [a_2]: 0.00024597 [accelerated_algorithm]: 3.24e-05 [shard]: 1.54e-06 [meta_shard_fg_expand]: 3.7e-06 [shard_inline]: 1.625e-05 [merge_send_recv]: 1.447e-05 [auto_parallel]: 1.034e-05 [parallel]: 1.447e-05 [flash_sp]: 1.07e-05 [merge_comm]: 9.56e-06 [allreduce_fusion]: 8.92e-06 [matmul_add_comm_reduction]: 2.543e-05 [allreduce_slice_to_reducescatter]: 8.79983e-07 [virtual_shard_identity]: 1.746e-05 [virtual_dataset]: 1.569e-05 [get_grad_eliminate_]: 1.538e-05 [virtual_output]: 1.516e-05 [merge_forward]: 9.07001e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 1.696e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.04e-05 [merge_recompute_call_nodes]: 1.26002e-06 [before_grad]: 2.833e-05 [set_forward_comm_id_for_comm_node_pass]: 1.013e-05 [meta_fg_expand]: 0.00149018 [flash_sp_send_recv_attached]: 3.88999e-06 [receive_attached]: 1.72001e-06 [after_resolve]: 
6.472e-05 [a_after_grad]: 9.004e-05 [renormalize]: 0.00614468 [add_forward_monad_depend]: 8.62998e-06 [auto_monad_grad]: 5.14e-06 [auto_monad_eliminator]: 4.789e-05 [cse]: 0.00016648 [a_3]: 0.00033394 [Cycle 2]: 0.00273082, [45] [expand_dump_flag]: 1.96e-06 [switch_simplify]: 4.631e-05 [loop_unroll]: 4.264e-05 [a_1]: 0.0013389 [with_stream_mark]: 1.19e-05 [recompute_prepare]: 9.12999e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 8.899e-05 [accelerated_algorithm]: 1.08e-05 [shard]: 1.52001e-06 [meta_shard_fg_expand]: 2.03002e-06 [shard_inline]: 6.98e-06 [merge_send_recv]: 6.23e-06 [auto_parallel]: 6.83e-06 [parallel]: 5.53002e-06 [flash_sp]: 2.66e-06 [merge_comm]: 4.07e-06 [allreduce_fusion]: 3.76001e-06 [matmul_add_comm_reduction]: 6.09001e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 8.02e-06 [virtual_dataset]: 6.52001e-06 [get_grad_eliminate_]: 6.33998e-06 [virtual_output]: 6.04999e-06 [merge_forward]: 3.38999e-06 [cell_reuse_recompute_pass]: 7.50006e-07 [offload_activation]: 7.7e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.279e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 1.103e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 5.65e-05 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.14998e-06 [after_resolve]: 1.143e-05 [a_after_grad]: 1.059e-05 [renormalize]: 0.00064708 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.191e-05 [cse]: 2.199e-05 [a_3]: 4.978e-05 [Cycle 3]: 0.00068924, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 8.82999e-06 [loop_unroll]: 6.73998e-06 [a_1]: 0.00015375 [with_stream_mark]: 8.41002e-06 [recompute_prepare]: 7.02997e-06 [updatestate_depend_eliminate]: 3.84002e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.61e-06 
[parameter_eliminate]: 8.70001e-07 [a_2]: 8.665e-05 [accelerated_algorithm]: 9.62001e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.33002e-06 [shard_inline]: 6.92002e-06 [merge_send_recv]: 5.24998e-06 [auto_parallel]: 6.02999e-06 [parallel]: 4.94e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 3.68999e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 5.67001e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 7.41001e-06 [virtual_dataset]: 6.46999e-06 [get_grad_eliminate_]: 6.31e-06 [virtual_output]: 6.40002e-06 [merge_forward]: 2.94999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 7.43e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.308e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.088e-05 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.10001e-06 [a_after_grad]: 9.66998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 7.34002e-06 [cse]: 1.602e-05 [a_3]: 3.956e-05 [py_interpret_to_execute_after_opt_a]: 1.021e-05 [slice_cell_reuse_recomputed_activation]: 1.49e-06 [rewriter_after_opt_a]: 4.002e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.56e-06 [mutable_eliminate]: 0.00048101 [opt_b]: 0.00021757, [1] [Cycle 1]: 0.00021149, [7] [b_1]: 0.00013486 [b_2]: 8.62998e-06 [updatestate_depend_eliminate]: 6.21e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.56e-06 [renormalize]: 6.00005e-07 [cse]: 2.047e-05 [optimize_parallel_all_gather_comm]: 1.334e-05 [overlap_param_gather]: 1.57999e-06 [cconv]: 1.588e-05 [loop_unroll]: 0.00042801 [opt_after_cconv]: 0.00010771, [1] [Cycle 1]: 0.00010177, [7] [c_1]: 3.248e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.79e-06 [updatestate_assign_eliminate]: 2.99001e-06 
[updatestate_loads_eliminate]: 2.70002e-06 [cse]: 2.01e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 9.74999e-06 [tuple_transform]: 8.182e-05, [1] [Cycle 1]: 7.226e-05, [4] [d_1]: 4.432e-05 [none_parameter_eliminate]: 1.29e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.79002e-06 [partial_unused_args_eliminate]: 1.37999e-06 [add_recomputation]: 4.066e-05 [cse_after_recomputation]: 2.608e-05, [1] [Cycle 1]: 2.119e-05, [1] [cse]: 1.537e-05 [environ_conv]: 5.92001e-06 [swap_dp_allreduce_reducescatter]: 4.89998e-06 [bias_add_comm_swap]: 1.72001e-06 [label_micro_interleaved_index]: 3.04999e-06 [label_fine_grained_interleaved_index]: 1.57999e-06 [merge_cast_opt]: 8.90024e-07 [slice_recompute_activation]: 1.59e-06 [micro_interleaved_order_control]: 1.32999e-06 [assign_add_opt]: 8.09989e-07 [ForceFp32Comm]: 4.09986e-07 [remove_cast_before_assign_add]: 4.89992e-07 [full_micro_interleaved_order_control]: 1.49e-06 [reorder_send_recv_between_fp_bp]: 1.41002e-06 [comm_op_add_attrs]: 6.70028e-07 [add_comm_op_reuse_tag]: 5.00004e-07 [interleave_split_concat_branches]: 9.5999e-07 [interleave_parallel_branches]: 9.39996e-07 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.12999e-06 [control_data_broadcast_order]: 1.202e-05 [grouped_pairwise_exchange_alltoall]: 7.89994e-07 [offloading_packed_experts]: 3.38999e-06 [overlap_recompute_and_grad_model_parallel]: 4.15e-06 [overlap_grad_matmul_and_grad_allreduce]: 8.2e-07 [overlap_recompute_allgather_and_fa_grad]: 1.06002e-06 [overlap_recompute_comm]: 1.91998e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.634e-05 [begin_end_overlap_inline]: 3.50003e-07 [split_matmul_comm_elemetwise]: 1.17e-06 [split_layernorm_comm]: 1.12999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 8.307e-05, [1] [Cycle 1]: 7.867e-05, [6] [build]: 4.92999e-06 [elim_shapecalc]: 1.112e-05 [elim_not_effective]: 1.493e-05 [opt_reshape]: 7.46999e-06 [fold_const_symbol]: 1.159e-05 
[renormalize]: 2.60014e-07 [detach_backward]: 1.32999e-06 [pipeline_parallel_scheduler]: 1.17999e-06 [auto_monad_reorder]: 1.48e-05 [get_jit_bprop_graph]: 1.30001e-06 [rewriter_after_jit_bprop_graph]: 3.46999e-06 [opt_after_jit_grad]: 0.00046687 [validate]: 3.577e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00628112 [execute]: 8.57e-06 Sums bootstrap : 0.000499s : 1.44% type_inference : 0.011809s : 34.14% event_method : 0.000042s : 0.12% auto_monad : 0.000122s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000030s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.13% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000141s : 0.41% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.36% optimize.opt_a.loop_unroll : 0.000110s : 0.32% optimize.opt_a.a_1 : 0.002922s : 8.45% optimize.opt_a.with_stream_mark : 0.000043s : 0.12% optimize.opt_a.recompute_prepare : 0.000038s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000012s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000422s : 1.22% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.15% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.09% optimize.opt_a.merge_send_recv : 0.000026s : 0.08% optimize.opt_a.auto_parallel : 0.000023s : 0.07% optimize.opt_a.parallel : 0.000025s : 0.07% optimize.opt_a.flash_sp : 0.000014s : 0.04% 
optimize.opt_a.merge_comm : 0.000017s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000037s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.10% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000028s : 0.08% optimize.opt_a.merge_forward : 0.000015s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000032s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000050s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.05% optimize.opt_a.meta_fg_expand : 0.001549s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000085s : 0.25% optimize.opt_a.a_after_grad : 0.000110s : 0.32% optimize.opt_a.renormalize : 0.006792s : 19.63% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000067s : 0.19% optimize.opt_a.cse : 0.000204s : 0.59% optimize.opt_a.a_3 : 0.000423s : 1.22% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000001s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000481s : 1.39% optimize.opt_b.b_1 : 0.000135s : 0.39% optimize.opt_b.b_2 : 0.000009s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000013s : 0.04% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000016s : 0.05% optimize.loop_unroll : 0.000428s : 1.24% optimize.opt_after_cconv.c_1 : 0.000032s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000010s : 0.03% optimize.tuple_transform.d_1 : 0.000044s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.02% optimize.partial_unused_args_eliminate : 0.000001s : 0.00% optimize.add_recomputation : 0.000041s : 0.12% optimize.cse_after_recomputation.cse : 0.000015s : 0.04% optimize.environ_conv : 0.000006s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000003s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000001s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000000s : 0.00% optimize.remove_cast_before_assign_add : 0.000000s : 0.00% optimize.full_micro_interleaved_order_control : 0.000001s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000001s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.05% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000001s : 0.00% optimize.split_layernorm_comm : 0.000001s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.04% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000467s : 1.35% validate : 0.000036s : 0.10% backend_pass : 0.000001s : 0.00% task_emit : 0.006281s : 18.16% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 
0.000686 159 6.69% : 0.000046s : 7: substitution.arithmetic_simplify 0.30% : 0.000002s : 3: substitution.elim_not_effective 0.70% : 0.000005s : 5: substitution.float_depend_g_call 0.64% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 3: substitution.fold_const_symbol 0.71% : 0.000005s : 4: substitution.graph_param_transform 0.41% : 0.000003s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 57.50% : 0.000394s : 17: substitution.inline 2.70% : 0.000019s : 2: substitution.inline_without_move 1.40% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.44% : 0.000017s : 3: substitution.less_batch_normalization 1.59% : 0.000011s : 7: substitution.minmaximum_grad 0.88% : 0.000006s : 5: substitution.partial_eliminate 1.81% : 0.000012s : 15: substitution.remove_not_recompute_node 3.83% : 0.000026s : 10: substitution.replace_applicator 1.32% : 0.000009s : 10: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.04% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.58% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 2.09% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.34% : 0.000050s : 18: substitution.tuple_list_get_item_eliminator 2.09% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011739 2 87.88% : 0.010317s : 1: type_inference.infer 12.12% : 0.001423s : 1: type_inference.specialize ------[replace.] 0.000189 26 66.15% : 0.000125s : 17: replace.inline 33.85% : 0.000064s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000410 26 94.03% : 0.000385s : 17: match.inline 5.97% : 0.000024s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000681 4180 1.12% : 0.000008s : 52: predicate.accumulaten_eliminater 0.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.14% : 0.000008s : 52: predicate.addn_zero_filter 1.09% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 1.96% : 0.000013s : 73: predicate.arithmetic_simplify 1.16% : 0.000008s : 52: predicate.cast_eliminate 1.10% : 0.000008s : 50: predicate.check_bprop_eliminate 0.47% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.17% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.20% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.16% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.20% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_depend_swap 1.67% : 0.000011s : 77: predicate.environ_get_eliminate 1.20% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.85% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.47% : 0.000017s : 78: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.59% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.07% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000004s : 21: predicate.get_grad_eliminate 0.08% : 0.000001s : 4: predicate.graph_param_transform 0.50% : 0.000003s : 21: predicate.incorporate_call 0.46% : 0.000003s : 21: predicate.incorporate_call_switch 5.85% : 0.000040s : 180: predicate.inline 1.50% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.65% : 0.000004s : 21: predicate.less_batch_normalization 1.64% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.66% : 0.000018s : 121: predicate.load_eliminater 0.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.54% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.11% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 52: predicate.minmaximum_grad 0.32% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.10% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000014s : 78: predicate.partial_defer_inline 1.70% : 0.000012s : 65: predicate.partial_eliminate 1.09% : 0.000007s : 52: predicate.print_const_string_wrapper 0.47% : 0.000003s : 21: predicate.reduce_all_const_elim 1.36% : 0.000009s : 52: predicate.reduce_eliminate 2.64% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 21: predicate.remove_not_recompute_node 1.87% : 0.000013s : 111: predicate.replace_applicator 0.67% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.12% : 0.000008s : 52: predicate.reshape_eliminate 1.11% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.29% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.62% : 0.000004s : 21: predicate.specialize_transform 1.22% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.98% : 0.000013s : 78: predicate.switch_defer_inline 3.04% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.32% : 0.000036s : 213: 
predicate.switch_simplify 1.14% : 0.000008s : 52: predicate.tile_eliminate 1.12% : 0.000008s : 52: predicate.transpose_eliminate 1.44% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000010s : 60: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.46% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.53% : 0.000010s : 69: predicate.tuple_to_list_eliminator_ 2.64% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.17% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000003s : 21: predicate.virtual_dataset_eliminate 0.49% : 0.000003s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001674 35 59.59% : 0.000998s : 14: func_graph_cloner_run.FuncGraphClonerGraph 40.41% : 0.000677s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.069784 237 0.00% : 0.000003s : 1: ForceFp32Comm 4.43% : 0.003093s : 1: add_attr 4.42% : 0.003084s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.06% : 0.000045s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000129s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.75% : 0.000526s : 1: bootstrap 0.03% : 0.000020s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.07% : 0.000050s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000004s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000006s : 1: label_micro_interleaved_index 0.63% : 0.000437s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000004s : 1: micro_interleaved_order_control 0.70% : 0.000490s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.32% : 0.004411s : 117: opt.transform.opt_a 0.04% : 0.000031s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.17% : 0.000116s : 28: opt.transform.opt_b 0.07% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000041s : 4: opt.transform.symbol_engine_opt 20.46% : 0.014277s : 1: opt_a 0.16% : 0.000111s : 1: opt_after_cconv 0.68% : 0.000476s : 1: opt_after_jit_grad 0.32% : 0.000221s : 1: opt_b 23.40% : 0.016329s : 1: optimize 0.02% : 0.000017s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000051s : 1: pre_auto_parallel 0.06% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000013s : 1: remove_dup_value 7.49% : 0.005230s : 2: renormalize.infer 2.22% : 0.001547s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000044s : 1: rewriter_after_opt_a 0.21% : 0.000145s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000004s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000086s : 1: symbol_engine_optimizer 9.02% : 0.006294s : 1: task_emit 0.12% : 0.000085s : 1: 
tuple_transform 16.95% : 0.011826s : 1: type_inference 0.09% : 0.000064s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x3-kbk],max_mem:8.0M TotalTime = 0.0662096, [24] [bootstrap]: 0.00070658 [type_inference]: 0.00695457 [event_method]: 1.412e-05 [auto_monad]: 6.244e-05 [graph_reusing]: 5.90002e-06 [inline]: 2.00002e-06 [add_attr]: 0.00376457, [1] [add_attr_with_inline]: 0.00375254, [1] [Cycle 1]: 5.6e-05, [2] [tag_attr]: 1.703e-05 [meta_addattr_fg_expand]: 4.56002e-06 [parallel-infer-symbol]: 3.42002e-06 [pre_auto_parallel]: 2.902e-05 [insert-virtual-dataset]: 2.81999e-06 [parallel-infer-symbol-second]: 1.10001e-06 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00441258, [53] [py_interpret_to_execute]: 2.363e-05 [rewriter_before_opt_a]: 6.574e-05 [opt_a]: 0.00232337, [2] [Cycle 1]: 0.00168442, [45] [expand_dump_flag]: 2.99001e-06 [switch_simplify]: 3.428e-05 [loop_unroll]: 2.094e-05 [a_1]: 0.00045402 [with_stream_mark]: 1.594e-05 [recompute_prepare]: 8.63001e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.40998e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 2.08002e-06 [a_2]: 8.17e-05 [accelerated_algorithm]: 7.43999e-06 [shard]: 2.49999e-06 [meta_shard_fg_expand]: 2.02999e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 8.34002e-06 [auto_parallel]: 8.10999e-06 [parallel]: 2.655e-05 [flash_sp]: 9.07001e-06 [merge_comm]: 4.54002e-06 [allreduce_fusion]: 3.98001e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 8.2e-06 [virtual_dataset]: 6.39001e-06 [get_grad_eliminate_]: 6.03998e-06 [virtual_output]: 5.99e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 1.016e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.216e-05 
[merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 1.019e-05 [set_forward_comm_id_for_comm_node_pass]: 3.93001e-06 [meta_fg_expand]: 2.63e-06 [flash_sp_send_recv_attached]: 2.67001e-06 [receive_attached]: 2.43e-06 [after_resolve]: 9.90002e-06 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00051797 [add_forward_monad_depend]: 8.97e-06 [auto_monad_grad]: 2.22999e-06 [auto_monad_eliminator]: 1.548e-05 [cse]: 3.04e-05 [a_3]: 4.42e-05 [Cycle 2]: 0.00062818, [45] [expand_dump_flag]: 1.60999e-06 [switch_simplify]: 7.26001e-06 [loop_unroll]: 5.80002e-06 [a_1]: 0.00011662 [with_stream_mark]: 1.181e-05 [recompute_prepare]: 6.07999e-06 [updatestate_depend_eliminate]: 3.47002e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 7.349e-05 [accelerated_algorithm]: 5.53002e-06 [shard]: 1.23002e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 5.67999e-06 [merge_send_recv]: 4.53999e-06 [auto_parallel]: 6.29001e-06 [parallel]: 4.95001e-06 [flash_sp]: 3.33e-06 [merge_comm]: 3.24001e-06 [allreduce_fusion]: 2.99999e-06 [matmul_add_comm_reduction]: 6.21998e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.87002e-06 [virtual_dataset]: 5.98002e-06 [get_grad_eliminate_]: 5.30999e-06 [virtual_output]: 5.26002e-06 [merge_forward]: 3.35e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 6.74001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.156e-05 [merge_recompute_call_nodes]: 8.60018e-07 [before_grad]: 9.39e-06 [set_forward_comm_id_for_comm_node_pass]: 3.81999e-06 [meta_fg_expand]: 2.06998e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.41002e-06 [after_resolve]: 8.64e-06 [a_after_grad]: 7.85e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.64e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 8.3e-06 [cse]: 1.553e-05 [a_3]: 3.418e-05 [py_interpret_to_execute_after_opt_a]: 9.92999e-06 
[slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.753e-05 [convert_after_rewriter]: 7.39002e-06 [order_py_execute_after_rewriter]: 5.39998e-06 [mutable_eliminate]: 0.00052327 [opt_b]: 0.00019461, [1] [Cycle 1]: 0.0001875, [7] [b_1]: 0.00011278 [b_2]: 6.97002e-06 [updatestate_depend_eliminate]: 7.00998e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.30008e-07 [cse]: 1.919e-05 [optimize_parallel_all_gather_comm]: 1.858e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.43e-05 [loop_unroll]: 0.00050537 [opt_after_cconv]: 0.00010299, [1] [Cycle 1]: 9.65e-05, [7] [c_1]: 2.665e-05 [parameter_eliminate]: 3.31001e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.89001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.795e-05 [renormalize]: 5.89993e-07 [remove_dup_value]: 1.525e-05 [tuple_transform]: 6.835e-05, [1] [Cycle 1]: 6.351e-05, [4] [d_1]: 3.762e-05 [none_parameter_eliminate]: 1.46998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.229e-05 [cse_after_recomputation]: 2.147e-05, [1] [Cycle 1]: 1.712e-05, [1] [cse]: 1.161e-05 [environ_conv]: 8.23999e-06 [swap_dp_allreduce_reducescatter]: 5.72999e-06 [bias_add_comm_swap]: 2.66999e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.48998e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 3.08e-06 [comm_op_add_attrs]: 1.27999e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.32999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.75001e-06 [control_data_broadcast_order]: 1.341e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.50999e-06 [overlap_recompute_and_grad_model_parallel]: 5.27999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.41998e-06 [overlap_recompute_comm]: 2.63003e-06 [overlap_grad_ring_attention]: 5.08002e-06 [overlap_grad_flash_sp]: 1.897e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.48998e-06 [split_layernorm_comm]: 1.83002e-06 [handle_group_info]: 1.23002e-06 [symbol_engine_optimizer]: 7.461e-05, [1] [Cycle 1]: 6.971e-05, [6] [build]: 3.04001e-06 [elim_shapecalc]: 1.045e-05 [elim_not_effective]: 1.21e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 9.27001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.88997e-06 [pipeline_parallel_scheduler]: 1.92001e-06 [auto_monad_reorder]: 1.821e-05 [get_jit_bprop_graph]: 1.36998e-06 [rewriter_after_jit_bprop_graph]: 3.78999e-06 [opt_after_jit_grad]: 0.0005008 [validate]: 3.962e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.0494438 [execute]: 9.99001e-06 Sums bootstrap : 0.000707s : 1.15% type_inference : 0.006955s : 11.33% event_method : 0.000014s : 0.02% auto_monad : 0.000062s : 0.10% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000029s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.04% optimize.rewriter_before_opt_a : 0.000066s : 0.11% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000042s : 0.07% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000571s : 0.93% 
optimize.opt_a.with_stream_mark : 0.000028s : 0.05% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.25% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000014s : 0.02% optimize.opt_a.parallel : 0.000032s : 0.05% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000518s : 0.84% optimize.opt_a.add_forward_monad_depend : 
0.000011s : 0.02% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.04% optimize.opt_a.cse : 0.000046s : 0.07% optimize.opt_a.a_3 : 0.000078s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000038s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000523s : 0.85% optimize.opt_b.b_1 : 0.000113s : 0.18% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000505s : 0.82% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv 
: 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000501s : 0.82% validate : 0.000040s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.049444s : 80.54% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000180 26 19.24% : 0.000035s : 5: substitution.arithmetic_simplify 1.02% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.06% : 0.000006s : 3: substitution.graph_param_transform 64.47% : 0.000116s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.74% : 0.000005s : 4: substitution.remove_not_recompute_node 1.82% : 0.000003s : 2: substitution.replace_old_param 5.05% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006896 2 90.56% : 0.006245s : 1: type_inference.infer 9.44% : 0.000651s : 1: type_inference.specialize ------[replace.] 0.000037 4 77.99% : 0.000029s : 3: replace.inline 22.01% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 4 93.10% : 0.000113s : 3: match.inline 6.90% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 883 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 1.34% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 6: predicate.addn_check_dump 1.05% : 0.000002s : 9: predicate.addn_zero_filter 0.80% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 15: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.74% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.58% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.63% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.64% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_depend_swap 1.68% : 0.000003s : 18: predicate.environ_get_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 13: predicate.float_depend_g_call 0.52% : 0.000001s : 6: predicate.float_environ_get_switch 0.79% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.66% : 0.000001s : 6: predicate.get_grad_eliminate 0.33% : 0.000001s : 3: predicate.graph_param_transform 0.63% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.37% : 0.000011s : 40: predicate.inline 0.88% : 0.000001s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.98% : 0.000002s : 6: predicate.less_batch_normalization 1.68% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000004s : 25: predicate.load_eliminater 1.78% : 0.000003s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.54% : 0.000001s : 6: predicate.merge_addn 0.57% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 9: predicate.minmaximum_grad 1.53% : 0.000003s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.43% : 0.000001s : 3: predicate.parallel_virtual_node 1.64% : 0.000003s : 13: predicate.partial_defer_inline 1.40% : 0.000002s : 13: predicate.partial_eliminate 0.85% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 6: predicate.reduce_all_const_elim 1.08% : 0.000002s : 9: predicate.reduce_eliminate 2.30% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 6: predicate.remove_not_recompute_node 1.36% : 0.000002s : 16: predicate.replace_applicator 0.60% : 0.000001s : 6: predicate.replace_old_param 0.40% : 0.000001s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.77% : 0.000001s : 6: predicate.same_eliminate 0.57% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.74% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.70% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 13: predicate.switch_defer_inline 1.89% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 43: predicate.switch_simplify 0.86% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 2.99% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.23% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.65% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.65% : 0.000001s : 6: predicate.virtual_output_eliminate 0.28% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000398 8 47.46% : 0.000189s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.54% : 0.000209s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.076009 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.96% : 0.003769s : 1: add_attr 4.94% : 0.003756s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000068s : 1: auto_monad 0.03% : 0.000022s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.97% : 0.000740s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.68% : 0.000515s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.70% : 0.000533s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 1.26% : 0.000955s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.06% : 0.002326s : 1: opt_a 0.14% : 0.000106s : 1: opt_after_cconv 0.67% : 0.000511s : 1: opt_after_jit_grad 0.26% : 0.000198s : 1: opt_b 5.81% : 0.004416s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000034s : 1: pre_auto_parallel 0.04% : 0.000028s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.36% : 0.000273s : 1: renormalize.infer 0.31% : 0.000237s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000042s : 1: rewriter_after_opt_a 0.09% : 0.000071s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000077s : 1: symbol_engine_optimizer 65.08% : 0.049468s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 9.17% : 0.006973s : 1: type_inference 0.09% : 0.000066s : 1: validate TotalTime = 0.0641915, [24] [bootstrap]: 0.00039732 [type_inference]: 0.00586571 [event_method]: 1.283e-05 [auto_monad]: 6.126e-05 [graph_reusing]: 5.48002e-06 [inline]: 1.74e-06 [add_attr]: 0.00302637, [1] [add_attr_with_inline]: 0.00301839, [1] [Cycle 1]: 4.729e-05, [2] [tag_attr]: 1.359e-05 [meta_addattr_fg_expand]: 3.95e-06 [parallel-infer-symbol]: 3.11999e-06 [pre_auto_parallel]: 2.483e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.51998e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.00393563, [53] [py_interpret_to_execute]: 1.922e-05 [rewriter_before_opt_a]: 5.167e-05 [opt_a]: 0.00204155, [2] [Cycle 1]: 0.0014267, [45] [expand_dump_flag]: 3.09001e-06 [switch_simplify]: 2.814e-05 [loop_unroll]: 1.703e-05 [a_1]: 0.00035602 [with_stream_mark]: 1.478e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.65e-06 [updatestate_assign_eliminate]: 3.54002e-06 [updatestate_loads_eliminate]: 3.35e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 8.197e-05 [accelerated_algorithm]: 6.87002e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 6.09001e-06 [merge_send_recv]: 8.84998e-06 [auto_parallel]: 6.81999e-06 [parallel]: 1.872e-05 [flash_sp]: 7.75998e-06 [merge_comm]: 3.90998e-06 [allreduce_fusion]: 3.69002e-06 [matmul_add_comm_reduction]: 9.57999e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.92e-06 [virtual_dataset]: 6.21998e-06 [get_grad_eliminate_]: 5.77001e-06 [virtual_output]: 5.78002e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.98002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.156e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 9.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.83e-06 [flash_sp_send_recv_attached]: 2.89001e-06 [receive_attached]: 3.00998e-06 
[after_resolve]: 9.66e-06 [a_after_grad]: 8.90001e-06 [renormalize]: 0.00040531 [add_forward_monad_depend]: 4.56002e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.345e-05 [cse]: 2.765e-05 [a_3]: 4.197e-05 [Cycle 2]: 0.00060579, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00011414 [with_stream_mark]: 1.111e-05 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 2.92002e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.82002e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 7.171e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.78002e-06 [merge_send_recv]: 4.52998e-06 [auto_parallel]: 5.34998e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.81001e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 3.14999e-06 [matmul_add_comm_reduction]: 5.39e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.69999e-06 [virtual_dataset]: 5.72999e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.20999e-06 [merge_forward]: 2.55997e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 6.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.033e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.99e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.10019e-07 [after_resolve]: 8.53001e-06 [a_after_grad]: 8.06001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.37001e-06 [cse]: 1.295e-05 [a_3]: 3.341e-05 [py_interpret_to_execute_after_opt_a]: 7.01001e-06 [slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 3.271e-05 [convert_after_rewriter]: 6.58e-06 [order_py_execute_after_rewriter]: 5.49998e-06 [mutable_eliminate]: 0.00046949 [opt_b]: 0.00018915, [1] [Cycle 1]: 0.00018299, 
[7] [b_1]: 0.00011217 [b_2]: 7.16999e-06 [updatestate_depend_eliminate]: 5.47001e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 3.59985e-07 [cse]: 1.77e-05 [optimize_parallel_all_gather_comm]: 1.666e-05 [overlap_param_gather]: 1.95001e-06 [cconv]: 2.313e-05 [loop_unroll]: 0.00041802 [opt_after_cconv]: 9.557e-05, [1] [Cycle 1]: 8.996e-05, [7] [c_1]: 2.549e-05 [parameter_eliminate]: 2.42001e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.769e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.435e-05 [tuple_transform]: 6.909e-05, [1] [Cycle 1]: 6.438e-05, [4] [d_1]: 3.767e-05 [none_parameter_eliminate]: 1.43002e-06 [renormalize]: 5.00004e-07 [switch_simplify]: 6.49001e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.567e-05 [cse_after_recomputation]: 2.14e-05, [1] [Cycle 1]: 1.687e-05, [1] [cse]: 1.141e-05 [environ_conv]: 5.69999e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.42001e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.86999e-06 [merge_cast_opt]: 1.66002e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.86e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 1.24998e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.32e-06 [overlap_opt_shard_in_pipeline]: 1.28002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79998e-06 [control_data_broadcast_order]: 1.244e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 3.91999e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.49999e-06 [overlap_grad_ring_attention]: 4.48001e-06 [overlap_grad_flash_sp]: 1.88e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.38002e-06 [symbol_engine_optimizer]: 7.261e-05, [1] [Cycle 1]: 6.821e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.96002e-06 [elim_not_effective]: 1.237e-05 [opt_reshape]: 6.46e-06 [fold_const_symbol]: 9.88998e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.635e-05 [get_jit_bprop_graph]: 1.19998e-06 [rewriter_after_jit_bprop_graph]: 3.93001e-06 [opt_after_jit_grad]: 0.00045984 [validate]: 3.538e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.0501091 [execute]: 1.006e-05 Sums bootstrap : 0.000397s : 0.66% type_inference : 0.005866s : 9.75% event_method : 0.000013s : 0.02% auto_monad : 0.000061s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000052s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000035s : 0.06% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000470s : 0.78% optimize.opt_a.with_stream_mark : 0.000026s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.26% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000405s : 0.67% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.07% optimize.opt_a.a_3 : 0.000075s : 0.13% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000469s : 0.78% optimize.opt_b.b_1 : 0.000112s : 0.19% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000418s : 0.70% optimize.opt_after_cconv.c_1 : 0.000025s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000001s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00%
pipeline_parallel_scheduler : 0.000002s : 0.00%
auto_monad_reorder : 0.000016s : 0.03%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000004s : 0.01%
opt_after_jit_grad : 0.000460s : 0.76%
validate : 0.000035s : 0.06%
backend_pass : 0.000001s : 0.00%
task_emit : 0.050109s : 83.31%
execute : 0.000010s : 0.02%
Time group info:
------[substitution.] 0.000144 24
20.08% : 0.000029s : 4: substitution.arithmetic_simplify
1.39% : 0.000002s : 2: substitution.elim_not_effective
0.98% : 0.000001s : 2: substitution.fold_const_symbol
3.97% : 0.000006s : 3: substitution.graph_param_transform
66.21% : 0.000095s : 3: substitution.inline
2.18% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.08% : 0.000004s : 4: substitution.remove_not_recompute_node
2.12% : 0.000003s : 2: substitution.replace_old_param
------[type_inference.] 0.005820 2
91.85% : 0.005346s : 1: type_inference.infer
8.15% : 0.000474s : 1: type_inference.specialize
------[replace.] 0.000028 3
100.00% : 0.000028s : 3: replace.inline
------[match.] 0.000094 3
100.00% : 0.000094s : 3: match.inline
------[predicate.]
0.000147 815 0.91% : 0.000001s : 8: predicate.accumulaten_eliminater 0.93% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.43% : 0.000004s : 14: predicate.arithmetic_simplify 0.88% : 0.000001s : 8: predicate.cast_eliminate 0.75% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.72% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.98% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.15% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_depend_swap 1.80% : 0.000003s : 17: predicate.environ_get_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.26% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.92% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.64% : 0.000001s : 6: predicate.incorporate_call_switch 6.39% : 0.000009s : 37: predicate.inline 0.98% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 6: predicate.less_batch_normalization 1.51% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.14% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.86% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.19% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.45% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 8: predicate.reduce_eliminate 2.26% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 14: predicate.replace_applicator 0.68% : 0.000001s : 6: predicate.replace_old_param 0.31% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 8: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 3: predicate.row_tensor_eliminate 0.88% : 0.000001s : 6: predicate.same_eliminate 0.56% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 1.00% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.27% : 0.000002s : 11: predicate.switch_defer_inline 1.89% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.85% : 0.000007s : 38: predicate.switch_simplify 0.88% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.55% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.76% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000289 7 37.00% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.00% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072536 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.18% : 0.003031s : 1: add_attr 4.17% : 0.003022s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000066s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.58% : 0.000421s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.03% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.59% : 0.000427s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.66% : 0.000479s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.16% : 0.000838s : 78: opt.transform.opt_a 0.03% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 2.82% : 0.002045s : 1: opt_a 0.14% : 0.000099s : 1: opt_after_cconv 0.65% : 0.000469s : 1: opt_after_jit_grad 0.27% : 0.000192s : 1: opt_b 5.43% : 0.003940s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.28% : 0.000205s : 1: renormalize.infer 0.27% : 0.000193s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.08% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000075s : 1: symbol_engine_optimizer 69.12% : 0.050135s : 1: task_emit 0.10% : 0.000072s : 1: 
tuple_transform 8.11% : 0.005882s : 1: type_inference 0.08% : 0.000058s : 1: validate TotalTime = 0.0580058, [24] [bootstrap]: 0.00037067 [type_inference]: 0.00541143 [event_method]: 1.454e-05 [auto_monad]: 5.776e-05 [graph_reusing]: 5.95002e-06 [inline]: 2.17999e-06 [add_attr]: 0.00319062, [1] [add_attr_with_inline]: 0.00318255, [1] [Cycle 1]: 5.284e-05, [2] [tag_attr]: 1.569e-05 [meta_addattr_fg_expand]: 4.37e-06 [parallel-infer-symbol]: 3.16001e-06 [pre_auto_parallel]: 2.7e-05 [insert-virtual-dataset]: 2.86999e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.77999e-06 [optimize]: 0.00421763, [53] [py_interpret_to_execute]: 2.333e-05 [rewriter_before_opt_a]: 6.4e-05 [opt_a]: 0.00227716, [2] [Cycle 1]: 0.00159684, [45] [expand_dump_flag]: 3.03998e-06 [switch_simplify]: 3.312e-05 [loop_unroll]: 2.062e-05 [a_1]: 0.00044099 [with_stream_mark]: 1.446e-05 [recompute_prepare]: 8.2e-06 [updatestate_depend_eliminate]: 4.20999e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.84998e-06 [a_2]: 8.224e-05 [accelerated_algorithm]: 6.96001e-06 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 6.21e-06 [merge_send_recv]: 8.82999e-06 [auto_parallel]: 6.21e-06 [parallel]: 1.801e-05 [flash_sp]: 8.23999e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 9.12001e-06 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 8.58001e-06 [virtual_dataset]: 6.31998e-06 [get_grad_eliminate_]: 5.87001e-06 [virtual_output]: 6.06e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.277e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 1.09e-05 [set_forward_comm_id_for_comm_node_pass]: 3.71001e-06 [meta_fg_expand]: 2.91e-06 [flash_sp_send_recv_attached]: 2.64999e-06 [receive_attached]: 2.34999e-06 
[after_resolve]: 9.47001e-06 [a_after_grad]: 9.15001e-06 [renormalize]: 0.00047195 [add_forward_monad_depend]: 5.32001e-06 [auto_monad_grad]: 2.34001e-06 [auto_monad_eliminator]: 1.46e-05 [cse]: 2.793e-05 [a_3]: 4.378e-05 [Cycle 2]: 0.00066949, [45] [expand_dump_flag]: 1.92999e-06 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.62999e-06 [a_1]: 0.00016558 [with_stream_mark]: 1.138e-05 [recompute_prepare]: 6.00002e-06 [updatestate_depend_eliminate]: 3.45998e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 7.228e-05 [accelerated_algorithm]: 6.04001e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.32999e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 4.78001e-06 [auto_parallel]: 5.69e-06 [parallel]: 4.04002e-06 [flash_sp]: 3.33e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 2.99999e-06 [matmul_add_comm_reduction]: 5.71e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.54001e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.14e-06 [merge_forward]: 2.76e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 6.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.081e-05 [merge_recompute_call_nodes]: 8.80013e-07 [before_grad]: 8.75999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 8.45001e-06 [a_after_grad]: 7.82e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.54e-06 [auto_monad_grad]: 1.34e-06 [auto_monad_eliminator]: 8.15e-06 [cse]: 1.459e-05 [a_3]: 3.375e-05 [py_interpret_to_execute_after_opt_a]: 8.73001e-06 [slice_cell_reuse_recomputed_activation]: 2.29001e-06 [rewriter_after_opt_a]: 3.469e-05 [convert_after_rewriter]: 6.49001e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00047684 [opt_b]: 0.00019214, [1] [Cycle 1]: 0.00018535, [7] [b_1]: 
0.00011273 [b_2]: 7.53999e-06 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 3.00002e-07 [cse]: 1.9e-05 [optimize_parallel_all_gather_comm]: 1.705e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.493e-05 [loop_unroll]: 0.00043424 [opt_after_cconv]: 9.604e-05, [1] [Cycle 1]: 8.997e-05, [7] [c_1]: 2.534e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.73e-06 [cse]: 1.784e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.467e-05 [tuple_transform]: 6.98e-05, [1] [Cycle 1]: 6.495e-05, [4] [d_1]: 3.701e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.89999e-06 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 4.627e-05 [cse_after_recomputation]: 2.151e-05, [1] [Cycle 1]: 1.654e-05, [1] [cse]: 1.036e-05 [environ_conv]: 5.72001e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.75002e-06 [label_micro_interleaved_index]: 4.77998e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.24e-06 [full_micro_interleaved_order_control]: 2.50002e-06 [reorder_send_recv_between_fp_bp]: 3.16001e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 1.31002e-06 [overlap_opt_shard_in_pipeline]: 1.57999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.379e-05 [grouped_pairwise_exchange_alltoall]: 1.56002e-06 [offloading_packed_experts]: 4.50001e-06 [overlap_recompute_and_grad_model_parallel]: 5.02e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 4.09002e-06 [overlap_grad_flash_sp]: 1.982e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.44999e-06 [split_layernorm_comm]: 2.11998e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 7.37e-05, [1] [Cycle 1]: 6.909e-05, [6] [build]: 2.76e-06 [elim_shapecalc]: 9.57001e-06 [elim_not_effective]: 1.195e-05 [opt_reshape]: 6.31e-06 [fold_const_symbol]: 9.49e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.72001e-06 [auto_monad_reorder]: 1.581e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 4.38999e-06 [opt_after_jit_grad]: 0.00050162 [validate]: 3.628e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0439191 [execute]: 9.72001e-06 Sums bootstrap : 0.000371s : 0.69% type_inference : 0.005411s : 10.06% event_method : 0.000015s : 0.03% auto_monad : 0.000058s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000027s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.04% optimize.rewriter_before_opt_a : 0.000064s : 0.12% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.05% optimize.opt_a.a_1 : 0.000607s : 1.13% optimize.opt_a.with_stream_mark : 0.000026s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000155s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000472s : 0.88% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.04% optimize.opt_a.cse : 0.000043s : 0.08% optimize.opt_a.a_3 : 0.000078s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.06% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000477s : 0.89% optimize.opt_b.b_1 : 0.000113s : 0.21% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.05% optimize.loop_unroll : 0.000434s : 0.81% optimize.opt_after_cconv.c_1 : 0.000025s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.09% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000502s : 0.93% validate : 0.000036s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.043919s : 81.64% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000171 26 19.20% : 0.000033s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000006s : 3: substitution.graph_param_transform 63.79% : 0.000109s : 3: substitution.inline 1.95% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.82% : 0.000005s : 4: substitution.remove_not_recompute_node 1.95% : 0.000003s : 2: substitution.replace_old_param 5.07% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005367 2 88.94% : 0.004773s : 1: type_inference.infer 11.06% : 0.000594s : 1: type_inference.specialize ------[replace.] 0.000037 4 78.55% : 0.000029s : 3: replace.inline 21.45% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 4 93.15% : 0.000107s : 3: match.inline 6.85% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 883 1.10% : 0.000002s : 9: predicate.accumulaten_eliminater 0.93% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 15: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.60% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.59% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_depend_swap 1.78% : 0.000003s : 18: predicate.environ_get_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 13: predicate.float_depend_g_call 0.54% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.69% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.38% : 0.000010s : 40: predicate.inline 0.84% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.75% : 0.000001s : 6: predicate.less_batch_normalization 1.62% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 25: predicate.load_eliminater 1.49% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.05% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 9: predicate.minmaximum_grad 1.15% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.33% : 0.000001s : 3: predicate.parallel_virtual_node 1.65% : 0.000003s : 13: predicate.partial_defer_inline 1.42% : 0.000002s : 13: predicate.partial_eliminate 0.87% : 0.000001s : 9: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.10% : 0.000002s : 9: predicate.reduce_eliminate 2.41% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 6: predicate.remove_not_recompute_node 1.35% : 0.000002s : 16: predicate.replace_applicator 0.54% : 0.000001s : 6: predicate.replace_old_param 0.45% : 0.000001s : 3: predicate.reset_defer_inline 1.10% : 0.000002s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 3: predicate.row_tensor_eliminate 1.01% : 0.000002s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 0.70% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 0.87% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.71% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.93% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.88% : 0.000001s : 9: predicate.transpose_eliminate 1.44% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.48% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 21: predicate.tuple_list_set_item_eliminator 1.55% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.26% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.28% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000334 8 41.99% : 0.000140s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.01% : 0.000194s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.067028 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.77% : 0.003195s : 1: add_attr 4.75% : 0.003186s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000063s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.59% : 0.000394s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000021s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.66% : 0.000443s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.72% : 0.000485s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.47% : 0.000987s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000093s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.40% : 0.002280s : 1: opt_a 0.15% : 0.000099s : 1: opt_after_cconv 0.76% : 0.000511s : 1: opt_after_jit_grad 0.29% : 0.000196s : 1: opt_b 6.30% : 0.004221s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000031s : 1: pre_auto_parallel 0.04% : 0.000028s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000018s : 1: remove_dup_value 0.36% : 0.000242s : 1: renormalize.infer 0.33% : 0.000223s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000038s : 1: rewriter_after_opt_a 0.10% : 0.000069s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000077s : 1: symbol_engine_optimizer 65.56% : 0.043944s : 1: task_emit 0.11% : 0.000073s : 1: 
tuple_transform 8.10% : 0.005427s : 1: type_inference 0.09% : 0.000061s : 1: validate TotalTime = 0.0787537, [24] [bootstrap]: 0.00048313 [type_inference]: 0.0123321 [event_method]: 5.158e-05 [auto_monad]: 0.00013405 [graph_reusing]: 8.42e-06 [inline]: 2.46e-06 [add_attr]: 0.00327728, [1] [add_attr_with_inline]: 0.00326896, [1] [Cycle 1]: 7.335e-05, [2] [tag_attr]: 3.405e-05 [meta_addattr_fg_expand]: 9.91998e-06 [parallel-infer-symbol]: 3.33998e-06 [pre_auto_parallel]: 5.039e-05 [insert-virtual-dataset]: 2.54999e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.89999e-06 [optimize]: 0.016883, [53] [py_interpret_to_execute]: 3.908e-05 [rewriter_before_opt_a]: 0.00015857 [opt_a]: 0.01464, [3] [Cycle 1]: 0.0111903, [45] [expand_dump_flag]: 4.07e-06 [switch_simplify]: 7.714e-05 [loop_unroll]: 6.523e-05 [a_1]: 0.00144864 [with_stream_mark]: 2.471e-05 [recompute_prepare]: 2.211e-05 [updatestate_depend_eliminate]: 8.74003e-06 [updatestate_assign_eliminate]: 7.87e-06 [updatestate_loads_eliminate]: 6.83e-06 [parameter_eliminate]: 2.61e-06 [a_2]: 0.0002467 [accelerated_algorithm]: 3.214e-05 [shard]: 1.79998e-06 [meta_shard_fg_expand]: 3.44001e-06 [shard_inline]: 1.595e-05 [merge_send_recv]: 1.586e-05 [auto_parallel]: 1.107e-05 [parallel]: 1.857e-05 [flash_sp]: 1.186e-05 [merge_comm]: 9.20999e-06 [allreduce_fusion]: 9.20999e-06 [matmul_add_comm_reduction]: 2.609e-05 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 1.824e-05 [virtual_dataset]: 1.58e-05 [get_grad_eliminate_]: 1.537e-05 [virtual_output]: 1.49e-05 [merge_forward]: 9.29e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 1.719e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.989e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 2.866e-05 [set_forward_comm_id_for_comm_node_pass]: 9.44998e-06 [meta_fg_expand]: 0.00152225 [flash_sp_send_recv_attached]: 3.57997e-06 [receive_attached]: 2.31e-06 [after_resolve]: 6.385e-05 
[a_after_grad]: 8.8e-05 [renormalize]: 0.00636078 [add_forward_monad_depend]: 9.89001e-06 [auto_monad_grad]: 6.01998e-06 [auto_monad_eliminator]: 6.193e-05 [cse]: 0.00019189 [a_3]: 0.00034835 [Cycle 2]: 0.0027479, [45] [expand_dump_flag]: 2.23002e-06 [switch_simplify]: 4.542e-05 [loop_unroll]: 4.28e-05 [a_1]: 0.0013322 [with_stream_mark]: 1.298e-05 [recompute_prepare]: 9.02e-06 [updatestate_depend_eliminate]: 4.62998e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.15999e-06 [a_2]: 8.981e-05 [accelerated_algorithm]: 1.065e-05 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 6.91999e-06 [merge_send_recv]: 6.93e-06 [auto_parallel]: 7.53e-06 [parallel]: 6.22001e-06 [flash_sp]: 3.45e-06 [merge_comm]: 4.09002e-06 [allreduce_fusion]: 4.02e-06 [matmul_add_comm_reduction]: 7.65e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 7.8e-06 [virtual_dataset]: 6.52001e-06 [get_grad_eliminate_]: 6.56999e-06 [virtual_output]: 6.14001e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 8.50006e-07 [offload_activation]: 8.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.287e-05 [merge_recompute_call_nodes]: 1.14e-06 [before_grad]: 1.2e-05 [set_forward_comm_id_for_comm_node_pass]: 4.48001e-06 [meta_fg_expand]: 7.965e-05 [flash_sp_send_recv_attached]: 1.19998e-06 [receive_attached]: 1.82001e-06 [after_resolve]: 1.236e-05 [a_after_grad]: 1.006e-05 [renormalize]: 0.00062717 [add_forward_monad_depend]: 4.58001e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.233e-05 [cse]: 2.332e-05 [a_3]: 4.812e-05 [Cycle 3]: 0.00068686, [45] [expand_dump_flag]: 1.20999e-06 [switch_simplify]: 8.12e-06 [loop_unroll]: 6.69001e-06 [a_1]: 0.00014859 [with_stream_mark]: 8.63001e-06 [recompute_prepare]: 6.96001e-06 [updatestate_depend_eliminate]: 4.14002e-06 [updatestate_assign_eliminate]: 2.70997e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.99978e-07 
[a_2]: 8.776e-05 [accelerated_algorithm]: 1.009e-05 [shard]: 9.09989e-07 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 7e-06 [merge_send_recv]: 5.15999e-06 [auto_parallel]: 6.19999e-06 [parallel]: 4.89998e-06 [flash_sp]: 1.26002e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 5.66e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 7.84002e-06 [virtual_dataset]: 6.30002e-06 [get_grad_eliminate_]: 6.26e-06 [virtual_output]: 6.10002e-06 [merge_forward]: 3.45003e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 7.03e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.27e-05 [merge_recompute_call_nodes]: 8.10018e-07 [before_grad]: 1.089e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86999e-06 [meta_fg_expand]: 2.34999e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.38002e-06 [a_after_grad]: 9.79e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.34998e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 7.81001e-06 [cse]: 1.608e-05 [a_3]: 4.012e-05 [py_interpret_to_execute_after_opt_a]: 1.105e-05 [slice_cell_reuse_recomputed_activation]: 2.25002e-06 [rewriter_after_opt_a]: 4.157e-05 [convert_after_rewriter]: 8.69e-06 [order_py_execute_after_rewriter]: 5.46e-06 [mutable_eliminate]: 0.00056845 [opt_b]: 0.00021916, [1] [Cycle 1]: 0.00021222, [7] [b_1]: 0.00013464 [b_2]: 8.85999e-06 [updatestate_depend_eliminate]: 6.12999e-06 [updatestate_assign_eliminate]: 2.87002e-06 [updatestate_loads_eliminate]: 2.59001e-06 [renormalize]: 5.00004e-07 [cse]: 2.137e-05 [optimize_parallel_all_gather_comm]: 1.732e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.141e-05 [loop_unroll]: 0.00044376 [opt_after_cconv]: 0.00011144, [1] [Cycle 1]: 0.00010554, [7] [c_1]: 3.351e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.92999e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 2.81e-06 
[cse]: 2.205e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.524e-05 [tuple_transform]: 8.148e-05, [1] [Cycle 1]: 7.701e-05, [4] [d_1]: 4.808e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 7.96001e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 5.168e-05 [cse_after_recomputation]: 2.491e-05, [1] [Cycle 1]: 2.025e-05, [1] [cse]: 1.507e-05 [environ_conv]: 7.83001e-06 [swap_dp_allreduce_reducescatter]: 5.57999e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 2.88003e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.54999e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.17999e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.86999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.21002e-06 [overlap_opt_shard_in_pipeline]: 1.40999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.474e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 4.88001e-06 [overlap_recompute_and_grad_model_parallel]: 5.34998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.58002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 4.62e-06 [overlap_grad_flash_sp]: 2.186e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.35002e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 8.749e-05, [1] [Cycle 1]: 8.286e-05, [6] [build]: 9.64e-06 [elim_shapecalc]: 1.087e-05 [elim_not_effective]: 1.507e-05 [opt_reshape]: 7.44002e-06 [fold_const_symbol]: 1.165e-05 [renormalize]: 2.10013e-07 [detach_backward]: 
1.93997e-06 [pipeline_parallel_scheduler]: 1.51002e-06 [auto_monad_reorder]: 2.061e-05 [get_jit_bprop_graph]: 1.17e-06 [rewriter_after_jit_bprop_graph]: 3.78001e-06 [opt_after_jit_grad]: 0.00047784 [validate]: 4.069e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.0447448 [execute]: 8.10999e-06 Sums bootstrap : 0.000483s : 0.65% type_inference : 0.012332s : 16.63% event_method : 0.000052s : 0.07% auto_monad : 0.000134s : 0.18% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.07% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.05% optimize.rewriter_before_opt_a : 0.000159s : 0.21% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.18% optimize.opt_a.loop_unroll : 0.000115s : 0.15% optimize.opt_a.a_1 : 0.002929s : 3.95% optimize.opt_a.with_stream_mark : 0.000046s : 0.06% optimize.opt_a.recompute_prepare : 0.000038s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000424s : 0.57% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.04% optimize.opt_a.merge_send_recv : 0.000028s : 0.04% optimize.opt_a.auto_parallel : 0.000025s : 0.03% optimize.opt_a.parallel : 0.000030s : 0.04% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 0.000017s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000017s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.05% optimize.opt_a.virtual_dataset : 0.000029s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.04% optimize.opt_a.virtual_output : 0.000027s : 0.04% optimize.opt_a.merge_forward : 0.000017s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000033s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000055s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000052s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001604s : 2.16% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000086s : 0.12% optimize.opt_a.a_after_grad : 0.000108s : 0.15% optimize.opt_a.renormalize : 0.006988s : 9.42% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.11% optimize.opt_a.cse : 0.000231s : 0.31% optimize.opt_a.a_3 : 0.000437s : 0.59% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000042s : 0.06% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000568s : 0.77% optimize.opt_b.b_1 : 0.000135s : 0.18% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000444s : 0.60% optimize.opt_after_cconv.c_1 : 0.000034s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000048s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000022s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000478s : 0.64% validate : 0.000041s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.044745s : 60.33% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000715 161 7.13% : 0.000051s : 8: substitution.arithmetic_simplify 0.35% : 0.000002s : 3: substitution.elim_not_effective 0.62% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 3: substitution.fold_const_symbol 0.88% : 0.000006s : 4: substitution.graph_param_transform 0.45% : 0.000003s : 2: substitution.incorporate_call 0.31% : 0.000002s : 2: substitution.incorporate_call_switch 57.63% : 0.000412s : 17: substitution.inline 2.24% : 0.000016s : 2: substitution.inline_without_move 1.48% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.35% : 0.000017s : 3: substitution.less_batch_normalization 1.42% : 0.000010s : 7: substitution.minmaximum_grad 0.85% : 0.000006s : 5: substitution.partial_eliminate 1.69% : 0.000012s : 15: substitution.remove_not_recompute_node 3.75% : 0.000027s : 10: substitution.replace_applicator 1.36% : 0.000010s : 10: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.14% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 2.02% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.57% : 0.000054s : 19: substitution.tuple_list_get_item_eliminator 2.09% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012239 2 85.85% : 0.010507s : 1: type_inference.infer 14.15% : 0.001731s : 1: type_inference.specialize ------[replace.] 0.000196 27 64.16% : 0.000126s : 17: replace.inline 35.84% : 0.000070s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000430 27 93.54% : 0.000403s : 17: match.inline 6.46% : 0.000028s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000707 4248 1.12% : 0.000008s : 53: predicate.accumulaten_eliminater 0.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.11% : 0.000008s : 53: predicate.addn_zero_filter 1.09% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.94% : 0.000014s : 74: predicate.arithmetic_simplify 1.17% : 0.000008s : 53: predicate.cast_eliminate 1.10% : 0.000008s : 50: predicate.check_bprop_eliminate 0.44% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.45% : 0.000003s : 21: predicate.depend_value_elim 1.16% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.15% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.06% : 0.000000s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.17% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.19% : 0.000008s : 57: predicate.environ_get_depend_swap 1.67% : 0.000012s : 78: predicate.environ_get_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.80% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.48% : 0.000018s : 80: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.56% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.14% : 0.000001s : 4: predicate.graph_param_transform 0.50% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.79% : 0.000041s : 183: predicate.inline 1.43% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.61% : 0.000004s : 21: predicate.less_batch_normalization 1.58% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.64% : 0.000019s : 124: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.56% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 2.50% : 0.000018s : 50: predicate.micro_step_allgather_replace 1.08% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 53: predicate.minmaximum_grad 0.30% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000015s : 80: predicate.partial_defer_inline 1.69% : 0.000012s : 67: predicate.partial_eliminate 1.10% : 0.000008s : 53: predicate.print_const_string_wrapper 0.46% : 0.000003s : 21: predicate.reduce_all_const_elim 1.37% : 0.000010s : 53: predicate.reduce_eliminate 2.58% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 21: predicate.remove_not_recompute_node 1.83% : 0.000013s : 113: predicate.replace_applicator 0.67% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000008s : 53: predicate.reshape_eliminate 1.10% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 4: predicate.row_tensor_eliminate 1.27% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.55% : 0.000004s : 21: predicate.shard_identity_eliminate 0.21% : 0.000001s : 8: predicate.special_op_eliminate 0.60% : 0.000004s : 21: predicate.specialize_transform 1.18% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.93% : 0.000014s : 80: predicate.switch_defer_inline 2.94% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.19% : 0.000037s : 218: 
predicate.switch_simplify 1.10% : 0.000008s : 53: predicate.tile_eliminate 1.07% : 0.000008s : 53: predicate.transpose_eliminate 1.39% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.45% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.56% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.56% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.11% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.57% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.49% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.13% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001876 36 57.99% : 0.001088s : 15: func_graph_cloner_run.FuncGraphClonerGraph 42.01% : 0.000788s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.110541 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.97% : 0.003282s : 1: add_attr 2.96% : 0.003273s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.13% : 0.000141s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.46% : 0.000511s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.05% : 0.000059s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000453s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.52% : 0.000578s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 4.02% : 0.004440s : 117: opt.transform.opt_a 0.03% : 0.000032s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000116s : 28: opt.transform.opt_b 0.05% : 0.000054s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000041s : 4: opt.transform.symbol_engine_opt 13.25% : 0.014643s : 1: opt_a 0.10% : 0.000115s : 1: opt_after_cconv 0.44% : 0.000487s : 1: opt_after_jit_grad 0.20% : 0.000222s : 1: opt_b 15.28% : 0.016888s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000055s : 1: pre_auto_parallel 0.04% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 4.91% : 0.005424s : 2: renormalize.infer 1.40% : 0.001550s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000046s : 1: rewriter_after_opt_a 0.15% : 0.000163s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000090s : 1: symbol_engine_optimizer 40.50% : 0.044765s : 1: task_emit 0.08% : 0.000084s : 1: 
tuple_transform 11.18% : 0.012354s : 1: type_inference 0.06% : 0.000065s : 1: validate TotalTime = 0.0586897, [24] [bootstrap]: 0.00048779 [type_inference]: 0.00583025 [event_method]: 1.264e-05 [auto_monad]: 6.033e-05 [graph_reusing]: 5.00999e-06 [inline]: 1.97999e-06 [add_attr]: 0.00310296, [1] [add_attr_with_inline]: 0.0030939, [1] [Cycle 1]: 5.655e-05, [2] [tag_attr]: 1.461e-05 [meta_addattr_fg_expand]: 3.87998e-06 [parallel-infer-symbol]: 3.77002e-06 [pre_auto_parallel]: 2.663e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.83997e-06 [pipeline_split]: 2.07001e-06 [optimize]: 0.00409339, [53] [py_interpret_to_execute]: 2.066e-05 [rewriter_before_opt_a]: 5.113e-05 [opt_a]: 0.00215647, [2] [Cycle 1]: 0.00147766, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 3.045e-05 [loop_unroll]: 1.749e-05 [a_1]: 0.00036053 [with_stream_mark]: 1.557e-05 [recompute_prepare]: 8.27998e-06 [updatestate_depend_eliminate]: 3.83999e-06 [updatestate_assign_eliminate]: 3.52002e-06 [updatestate_loads_eliminate]: 3.38e-06 [parameter_eliminate]: 1.92999e-06 [a_2]: 8.368e-05 [accelerated_algorithm]: 6.84999e-06 [shard]: 2.60002e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 9.11002e-06 [auto_parallel]: 6.07999e-06 [parallel]: 1.813e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.64002e-06 [matmul_add_comm_reduction]: 8.87999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.53999e-06 [virtual_dataset]: 6.19999e-06 [get_grad_eliminate_]: 6.06998e-06 [virtual_output]: 5.77001e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.244e-05 [merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 9.83002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.99002e-06 [meta_fg_expand]: 2.60002e-06 [flash_sp_send_recv_attached]: 2.75002e-06 
[receive_attached]: 2.38002e-06 [after_resolve]: 9.31002e-06 [a_after_grad]: 8.55999e-06 [renormalize]: 0.00045008 [add_forward_monad_depend]: 4.99998e-06 [auto_monad_grad]: 2.08998e-06 [auto_monad_eliminator]: 1.386e-05 [cse]: 2.975e-05 [a_3]: 4.283e-05 [Cycle 2]: 0.00066847, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 7.16999e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00011419 [with_stream_mark]: 1.066e-05 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 2.95002e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 7.171e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.27999e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 4.55999e-06 [auto_parallel]: 5.26002e-06 [parallel]: 6.126e-05 [flash_sp]: 3.79002e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.04001e-06 [matmul_add_comm_reduction]: 5.52001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.78998e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.17e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 7.11999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.133e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 8.72e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 8.67e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.44e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.51999e-06 [cse]: 1.394e-05 [a_3]: 3.303e-05 [py_interpret_to_execute_after_opt_a]: 8.08001e-06 [slice_cell_reuse_recomputed_activation]: 2.32999e-06 [rewriter_after_opt_a]: 3.403e-05 [convert_after_rewriter]: 6.75998e-06 [order_py_execute_after_rewriter]: 5.16002e-06 [mutable_eliminate]: 0.00050194 [opt_b]: 
0.00018916, [1] [Cycle 1]: 0.0001825, [7] [b_1]: 0.0001112 [b_2]: 7.33e-06 [updatestate_depend_eliminate]: 5.66e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.14999e-06 [renormalize]: 4.50003e-07 [cse]: 1.805e-05 [optimize_parallel_all_gather_comm]: 1.604e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.424e-05 [loop_unroll]: 0.00042744 [opt_after_cconv]: 9.694e-05, [1] [Cycle 1]: 9.12e-05, [7] [c_1]: 2.576e-05 [parameter_eliminate]: 3.03998e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.49001e-06 [cse]: 1.728e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.58e-05 [tuple_transform]: 6.928e-05, [1] [Cycle 1]: 6.483e-05, [4] [d_1]: 3.763e-05 [none_parameter_eliminate]: 2.19999e-06 [renormalize]: 2.79979e-07 [switch_simplify]: 6.46e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 4.685e-05 [cse_after_recomputation]: 2.179e-05, [1] [Cycle 1]: 1.683e-05, [1] [cse]: 1.134e-05 [environ_conv]: 6.34001e-06 [swap_dp_allreduce_reducescatter]: 4.99003e-06 [bias_add_comm_swap]: 2.49001e-06 [label_micro_interleaved_index]: 4.99e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 3.09001e-06 [comm_op_add_attrs]: 1.42999e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.35999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.07999e-06 [control_data_broadcast_order]: 1.218e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.91001e-06 [overlap_recompute_and_grad_model_parallel]: 4.94e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.28999e-06 [overlap_grad_flash_sp]: 1.969e-05 [begin_end_overlap_inline]: 8.39995e-07 [split_matmul_comm_elemetwise]: 2.46998e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.12999e-06 [symbol_engine_optimizer]: 7.326e-05, [1] [Cycle 1]: 6.857e-05, [6] [build]: 2.69999e-06 [elim_shapecalc]: 9.29998e-06 [elim_not_effective]: 1.224e-05 [opt_reshape]: 6.26e-06 [fold_const_symbol]: 9.36e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.99e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.772e-05 [get_jit_bprop_graph]: 1.22e-06 [rewriter_after_jit_bprop_graph]: 4.04002e-06 [opt_after_jit_grad]: 0.00046828 [validate]: 3.712e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0442936 [execute]: 1.017e-05 Sums bootstrap : 0.000488s : 0.89% type_inference : 0.005830s : 10.69% event_method : 0.000013s : 0.02% auto_monad : 0.000060s : 0.11% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000027s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.04% optimize.rewriter_before_opt_a : 0.000051s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000475s : 0.87% optimize.opt_a.with_stream_mark : 0.000026s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000155s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.03% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000079s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000450s : 0.83% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000044s : 0.08% optimize.opt_a.a_3 : 0.000076s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000502s : 0.92% optimize.opt_b.b_1 : 0.000111s : 0.20% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000427s : 0.78% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000038s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000468s : 0.86% validate : 0.000037s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.044294s : 81.18% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000147 24 20.23% : 0.000030s : 4: substitution.arithmetic_simplify 1.30% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 3.64% : 0.000005s : 3: substitution.graph_param_transform 66.12% : 0.000097s : 3: substitution.inline 2.09% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.68% : 0.000005s : 4: substitution.remove_not_recompute_node 1.99% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005782 2 91.82% : 0.005309s : 1: type_inference.infer 8.18% : 0.000473s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000095 3 100.00% : 0.000095s : 3: match.inline ------[predicate.] 
0.000148 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 0.96% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 8: predicate.addn_zero_filter 0.80% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.44% : 0.000004s : 14: predicate.arithmetic_simplify 1.03% : 0.000002s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.24% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.76% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 1.05% : 0.000002s : 6: predicate.get_grad_eliminate 0.36% : 0.000001s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.22% : 0.000009s : 37: predicate.inline 1.02% : 0.000002s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.28% : 0.000003s : 22: predicate.load_eliminater 1.27% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.94% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 6: predicate.merge_addn 0.70% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.53% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.67% : 0.000002s : 11: predicate.partial_defer_inline 1.32% : 0.000002s : 11: predicate.partial_eliminate 0.94% : 0.000001s : 8: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.19% : 0.000002s : 8: predicate.reduce_eliminate 2.19% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 14: predicate.replace_applicator 0.69% : 0.000001s : 6: predicate.replace_old_param 0.36% : 0.000001s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 8: predicate.reshape_eliminate 0.68% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.55% : 0.000001s : 3: predicate.row_tensor_eliminate 0.87% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 0.88% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 1.02% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.21% : 0.000002s : 11: predicate.switch_defer_inline 1.87% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.92% : 0.000007s : 38: predicate.switch_simplify 0.86% : 
0.000001s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.48% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.93% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.93% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.76% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.69% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000295 7 38.30% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.70% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.067335 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.62% : 0.003108s : 1: add_attr 4.60% : 0.003098s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000066s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.78% : 0.000525s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.65% : 0.000437s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.76% : 0.000511s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.26% : 0.000847s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.21% : 0.002160s : 1: opt_a 0.15% : 0.000100s : 1: opt_after_cconv 0.71% : 0.000478s : 1: opt_after_jit_grad 0.29% : 0.000192s : 1: opt_b 6.09% : 0.004097s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000031s : 1: pre_auto_parallel 0.04% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.35% : 0.000238s : 1: renormalize.infer 0.31% : 0.000206s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000038s : 1: rewriter_after_opt_a 0.08% : 0.000055s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000076s : 1: symbol_engine_optimizer 65.81% : 0.044316s : 1: task_emit 0.11% : 0.000072s : 1: 
tuple_transform 8.68% : 0.005846s : 1: type_inference 0.09% : 0.000062s : 1: validate TotalTime = 0.0791925, [24] [bootstrap]: 0.00046633 [type_inference]: 0.0122805 [event_method]: 4.423e-05 [auto_monad]: 0.00013178 [graph_reusing]: 8.90001e-06 [inline]: 2.01e-06 [add_attr]: 0.00323474, [1] [add_attr_with_inline]: 0.003224, [1] [Cycle 1]: 7.857e-05, [2] [tag_attr]: 3.487e-05 [meta_addattr_fg_expand]: 9.64e-06 [parallel-infer-symbol]: 4.03001e-06 [pre_auto_parallel]: 5.166e-05 [insert-virtual-dataset]: 3.09999e-06 [parallel-infer-symbol-second]: 9.70002e-07 [dataset_repeat_opt]: 2.01003e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.0179565, [53] [py_interpret_to_execute]: 4.135e-05 [rewriter_before_opt_a]: 0.0001475 [opt_a]: 0.0156573, [3] [Cycle 1]: 0.0120642, [45] [expand_dump_flag]: 4.42e-06 [switch_simplify]: 7.572e-05 [loop_unroll]: 6.086e-05 [a_1]: 0.00143878 [with_stream_mark]: 2.674e-05 [recompute_prepare]: 2.66e-05 [updatestate_depend_eliminate]: 8.97e-06 [updatestate_assign_eliminate]: 7.53e-06 [updatestate_loads_eliminate]: 7.63999e-06 [parameter_eliminate]: 3.71999e-06 [a_2]: 0.00025159 [accelerated_algorithm]: 3.522e-05 [shard]: 2.75002e-06 [meta_shard_fg_expand]: 3.92002e-06 [shard_inline]: 1.699e-05 [merge_send_recv]: 1.865e-05 [auto_parallel]: 1.161e-05 [parallel]: 2.029e-05 [flash_sp]: 1.387e-05 [merge_comm]: 1.088e-05 [allreduce_fusion]: 8.87e-06 [matmul_add_comm_reduction]: 3.069e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 9.322e-05 [virtual_dataset]: 1.661e-05 [get_grad_eliminate_]: 1.522e-05 [virtual_output]: 1.565e-05 [merge_forward]: 1.005e-05 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 1.891e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.283e-05 [merge_recompute_call_nodes]: 2.07999e-06 [before_grad]: 2.939e-05 [set_forward_comm_id_for_comm_node_pass]: 1.081e-05 [meta_fg_expand]: 0.00172679 [flash_sp_send_recv_attached]: 4.08999e-06 [receive_attached]: 2.11e-06 [after_resolve]: 
6.958e-05 [a_after_grad]: 9.215e-05 [renormalize]: 0.00688307 [add_forward_monad_depend]: 1.141e-05 [auto_monad_grad]: 6.77002e-06 [auto_monad_eliminator]: 5.303e-05 [cse]: 0.00019537 [a_3]: 0.00033978 [Cycle 2]: 0.00288106, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 4.655e-05 [loop_unroll]: 4.222e-05 [a_1]: 0.00135345 [with_stream_mark]: 1.563e-05 [recompute_prepare]: 1.017e-05 [updatestate_depend_eliminate]: 4.81002e-06 [updatestate_assign_eliminate]: 3.93999e-06 [updatestate_loads_eliminate]: 3.26999e-06 [parameter_eliminate]: 2.24001e-06 [a_2]: 9.075e-05 [accelerated_algorithm]: 1.147e-05 [shard]: 1.81003e-06 [meta_shard_fg_expand]: 2.16e-06 [shard_inline]: 6.93e-06 [merge_send_recv]: 8.32998e-06 [auto_parallel]: 9.23002e-06 [parallel]: 8.57e-06 [flash_sp]: 3.9e-06 [merge_comm]: 4.4e-06 [allreduce_fusion]: 3.74002e-06 [matmul_add_comm_reduction]: 9.72001e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 7.98999e-06 [virtual_dataset]: 6.98998e-06 [get_grad_eliminate_]: 6.29999e-06 [virtual_output]: 6.17999e-06 [merge_forward]: 4.73001e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.091e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.416e-05 [merge_recompute_call_nodes]: 1.23002e-06 [before_grad]: 1.269e-05 [set_forward_comm_id_for_comm_node_pass]: 4.47e-06 [meta_fg_expand]: 6.423e-05 [flash_sp_send_recv_attached]: 1.92999e-06 [receive_attached]: 1.69e-06 [after_resolve]: 1.218e-05 [a_after_grad]: 1.032e-05 [renormalize]: 0.00072553 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.257e-05 [cse]: 2.453e-05 [a_3]: 4.876e-05 [Cycle 3]: 0.00069329, [45] [expand_dump_flag]: 1.15999e-06 [switch_simplify]: 8.03001e-06 [loop_unroll]: 6.74001e-06 [a_1]: 0.00014873 [with_stream_mark]: 8.85999e-06 [recompute_prepare]: 7.08e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 2.84999e-06 [updatestate_loads_eliminate]: 2.94999e-06 
[parameter_eliminate]: 1.07e-06 [a_2]: 8.607e-05 [accelerated_algorithm]: 1.035e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 6.88e-06 [merge_send_recv]: 6.16e-06 [auto_parallel]: 6.86999e-06 [parallel]: 6.02999e-06 [flash_sp]: 8.89995e-07 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.55998e-06 [matmul_add_comm_reduction]: 5.96e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 8.13001e-06 [virtual_dataset]: 6.23e-06 [get_grad_eliminate_]: 6.42001e-06 [virtual_output]: 6.14001e-06 [merge_forward]: 3.45e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 8.22998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.301e-05 [merge_recompute_call_nodes]: 9.5999e-07 [before_grad]: 1.079e-05 [set_forward_comm_id_for_comm_node_pass]: 4e-06 [meta_fg_expand]: 2.37999e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 9.12999e-06 [a_after_grad]: 9.54e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 7.39002e-06 [cse]: 1.721e-05 [a_3]: 3.951e-05 [py_interpret_to_execute_after_opt_a]: 1.304e-05 [slice_cell_reuse_recomputed_activation]: 2.03997e-06 [rewriter_after_opt_a]: 4.309e-05 [convert_after_rewriter]: 8.38999e-06 [order_py_execute_after_rewriter]: 5.58002e-06 [mutable_eliminate]: 0.00061243 [opt_b]: 0.00022433, [1] [Cycle 1]: 0.00021663, [7] [b_1]: 0.00013541 [b_2]: 8.64998e-06 [updatestate_depend_eliminate]: 6.95002e-06 [updatestate_assign_eliminate]: 2.92002e-06 [updatestate_loads_eliminate]: 2.86e-06 [renormalize]: 5.40022e-07 [cse]: 2.253e-05 [optimize_parallel_all_gather_comm]: 1.786e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.188e-05 [loop_unroll]: 0.00044259 [opt_after_cconv]: 0.00011316, [1] [Cycle 1]: 0.00010711, [7] [c_1]: 3.443e-05 [parameter_eliminate]: 2.96001e-06 [updatestate_depend_eliminate]: 6.07001e-06 [updatestate_assign_eliminate]: 3.16999e-06 
[updatestate_loads_eliminate]: 2.78003e-06 [cse]: 2.231e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.364e-05 [tuple_transform]: 8.159e-05, [1] [Cycle 1]: 7.654e-05, [4] [d_1]: 4.764e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 7.87e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 5.351e-05 [cse_after_recomputation]: 2.527e-05, [1] [Cycle 1]: 2.036e-05, [1] [cse]: 1.466e-05 [environ_conv]: 9.69e-06 [swap_dp_allreduce_reducescatter]: 5.72001e-06 [bias_add_comm_swap]: 2.83998e-06 [label_micro_interleaved_index]: 3.79002e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.54e-06 [slice_recompute_activation]: 2.30002e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.39e-06 [add_comm_op_reuse_tag]: 1.28002e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.83002e-06 [control_data_broadcast_order]: 1.415e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 4.52e-06 [overlap_recompute_and_grad_model_parallel]: 5.57999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69998e-06 [overlap_recompute_comm]: 2.33002e-06 [overlap_grad_ring_attention]: 4.88001e-06 [overlap_grad_flash_sp]: 2.353e-05 [begin_end_overlap_inline]: 7.30011e-07 [split_matmul_comm_elemetwise]: 2.18998e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.60001e-06 [symbol_engine_optimizer]: 8.676e-05, [1] [Cycle 1]: 8.186e-05, [6] [build]: 9.13002e-06 [elim_shapecalc]: 1.116e-05 [elim_not_effective]: 1.441e-05 [opt_reshape]: 7.53999e-06 [fold_const_symbol]: 1.144e-05 [renormalize]: 
2.3999e-07 [detach_backward]: 2.23998e-06 [pipeline_parallel_scheduler]: 1.89e-06 [auto_monad_reorder]: 2.212e-05 [get_jit_bprop_graph]: 1.90001e-06 [rewriter_after_jit_bprop_graph]: 3.97e-06 [opt_after_jit_grad]: 0.00054402 [validate]: 4.658e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0441644 [execute]: 7.81001e-06 Sums bootstrap : 0.000466s : 0.63% type_inference : 0.012281s : 16.46% event_method : 0.000044s : 0.06% auto_monad : 0.000132s : 0.18% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000052s : 0.07% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.06% optimize.rewriter_before_opt_a : 0.000147s : 0.20% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000130s : 0.17% optimize.opt_a.loop_unroll : 0.000110s : 0.15% optimize.opt_a.a_1 : 0.002941s : 3.94% optimize.opt_a.with_stream_mark : 0.000051s : 0.07% optimize.opt_a.recompute_prepare : 0.000044s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.02% optimize.opt_a.parameter_eliminate : 0.000007s : 0.01% optimize.opt_a.a_2 : 0.000428s : 0.57% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.08% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000031s : 0.04% optimize.opt_a.merge_send_recv : 0.000033s : 0.04% optimize.opt_a.auto_parallel : 0.000028s : 0.04% optimize.opt_a.parallel : 0.000035s : 0.05% optimize.opt_a.flash_sp : 0.000019s : 0.03% optimize.opt_a.merge_comm : 
0.000019s : 0.03% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000046s : 0.06% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000109s : 0.15% optimize.opt_a.virtual_dataset : 0.000030s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.04% optimize.opt_a.virtual_output : 0.000028s : 0.04% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000053s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.03% optimize.opt_a.meta_fg_expand : 0.001793s : 2.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.12% optimize.opt_a.a_after_grad : 0.000112s : 0.15% optimize.opt_a.renormalize : 0.007609s : 10.20% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.02% optimize.opt_a.auto_monad_grad : 0.000010s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000073s : 0.10% optimize.opt_a.cse : 0.000237s : 0.32% optimize.opt_a.a_3 : 0.000428s : 0.57% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.06% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000612s : 0.82% optimize.opt_b.b_1 : 0.000135s : 0.18% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000443s : 0.59% optimize.opt_after_cconv.c_1 : 0.000034s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000048s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000544s : 0.73% validate : 0.000047s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044164s : 59.21% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000787 159 6.91% : 0.000054s : 7: substitution.arithmetic_simplify 0.32% : 0.000002s : 3: substitution.elim_not_effective 0.63% : 0.000005s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 3: substitution.fold_const_symbol 0.82% : 0.000006s : 4: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 58.91% : 0.000464s : 17: substitution.inline 2.49% : 0.000020s : 2: substitution.inline_without_move 1.34% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.22% : 0.000017s : 3: substitution.less_batch_normalization 1.38% : 0.000011s : 7: substitution.minmaximum_grad 0.84% : 0.000007s : 5: substitution.partial_eliminate 1.56% : 0.000012s : 15: substitution.remove_not_recompute_node 3.89% : 0.000031s : 10: substitution.replace_applicator 1.34% : 0.000011s : 10: substitution.replace_old_param 0.49% : 0.000004s : 1: substitution.set_cell_output_no_recompute 2.93% : 0.000023s : 7: substitution.tuple_list_convert_item_index_to_positive 1.37% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.99% : 0.000016s : 7: substitution.tuple_list_get_item_depend_reorder 7.27% : 0.000057s : 18: substitution.tuple_list_get_item_eliminator 1.85% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012199 2 87.64% : 0.010691s : 1: type_inference.infer 12.36% : 0.001508s : 1: type_inference.specialize ------[replace.] 0.000202 26 67.22% : 0.000136s : 17: replace.inline 32.78% : 0.000066s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000480 26 94.50% : 0.000454s : 17: match.inline 5.50% : 0.000026s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 4180 1.04% : 0.000008s : 52: predicate.accumulaten_eliminater 7.13% : 0.000052s : 4: predicate.ad_related_special_op_eliminate 0.42% : 0.000003s : 21: predicate.addn_check_dump 1.02% : 0.000008s : 52: predicate.addn_zero_filter 1.02% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 1.84% : 0.000014s : 73: predicate.arithmetic_simplify 1.07% : 0.000008s : 52: predicate.cast_eliminate 1.05% : 0.000008s : 50: predicate.check_bprop_eliminate 0.43% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.44% : 0.000003s : 21: predicate.depend_value_elim 1.06% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.13% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.12% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.09% : 0.000008s : 56: predicate.environ_get_depend_swap 1.55% : 0.000011s : 77: predicate.environ_get_eliminate 1.07% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 78: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.53% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.50% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.48% : 0.000004s : 21: predicate.incorporate_call 0.43% : 0.000003s : 21: predicate.incorporate_call_switch 5.56% : 0.000041s : 180: predicate.inline 1.39% : 0.000010s : 45: predicate.inline_without_move 0.27% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.56% : 0.000004s : 21: predicate.less_batch_normalization 1.40% : 
0.000010s : 69: predicate.list_to_tuple_eliminator_ 2.42% : 0.000018s : 121: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.36% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.26% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.00% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.02% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.03% : 0.000008s : 52: predicate.minmaximum_grad 0.33% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 1.95% : 0.000014s : 78: predicate.partial_defer_inline 1.57% : 0.000012s : 65: predicate.partial_eliminate 1.02% : 0.000007s : 52: predicate.print_const_string_wrapper 0.45% : 0.000003s : 21: predicate.reduce_all_const_elim 1.23% : 0.000009s : 52: predicate.reduce_eliminate 2.42% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 21: predicate.remove_not_recompute_node 1.75% : 0.000013s : 111: predicate.replace_applicator 0.63% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.03% : 0.000008s : 52: predicate.reshape_eliminate 1.02% : 0.000007s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.17% : 0.000009s : 50: predicate.same_eliminate 0.32% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 21: predicate.shard_identity_eliminate 0.26% : 0.000002s : 8: predicate.special_op_eliminate 0.57% : 0.000004s : 21: predicate.specialize_transform 1.21% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.77% : 0.000013s : 78: predicate.switch_defer_inline 2.77% : 0.000020s : 128: predicate.switch_layer_defer_inline 5.00% : 0.000037s : 213: 
predicate.switch_simplify 1.07% : 0.000008s : 52: predicate.tile_eliminate 1.03% : 0.000008s : 52: predicate.transpose_eliminate 1.41% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.23% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.67% : 0.000020s : 90: predicate.tuple_list_get_item_eliminator 1.39% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.86% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.42% : 0.000010s : 69: predicate.tuple_to_list_eliminator_ 2.38% : 0.000017s : 121: predicate.updatestate_pure_node_eliminater 2.93% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.49% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.48% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.13% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001879 35 61.76% : 0.001161s : 14: func_graph_cloner_run.FuncGraphClonerGraph 38.24% : 0.000719s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.112775 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.87% : 0.003239s : 1: add_attr 2.86% : 0.003229s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000140s : 1: auto_monad 0.02% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.43% : 0.000488s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.05% : 0.000051s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.40% : 0.000452s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.55% : 0.000622s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 4.02% : 0.004536s : 117: opt.transform.opt_a 0.03% : 0.000033s : 1: opt.transform.opt_after_cconv 0.07% : 0.000077s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000116s : 28: opt.transform.opt_b 0.05% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000041s : 4: opt.transform.symbol_engine_opt 13.89% : 0.015661s : 1: opt_a 0.10% : 0.000117s : 1: opt_after_cconv 0.49% : 0.000555s : 1: opt_after_jit_grad 0.20% : 0.000228s : 1: opt_b 15.93% : 0.017961s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000056s : 1: pre_auto_parallel 0.04% : 0.000045s : 1: py_interpret_to_execute 0.01% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 5.27% : 0.005948s : 2: renormalize.infer 1.46% : 0.001644s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000047s : 1: rewriter_after_opt_a 0.14% : 0.000154s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000089s : 1: symbol_engine_optimizer 39.18% : 0.044182s : 1: task_emit 0.07% : 0.000084s : 1: 
tuple_transform 10.91% : 0.012302s : 1: type_inference 0.06% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x3-ge],max_mem:8.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x4-pynative],max_mem:8.0M TotalTime = 0.0222064, [24] [bootstrap]: 0.00063702 [type_inference]: 0.0063872 [event_method]: 1.451e-05 [auto_monad]: 6.004e-05 [graph_reusing]: 5.66998e-06 [inline]: 2.19001e-06 [add_attr]: 0.00354967, [1] [add_attr_with_inline]: 0.00353899, [1] [Cycle 1]: 4.703e-05, [2] [tag_attr]: 1.636e-05 [meta_addattr_fg_expand]: 4.76002e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 2.567e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.78002e-06 [optimize]: 0.00417381, [53] [py_interpret_to_execute]: 1.989e-05 [rewriter_before_opt_a]: 6.246e-05 [opt_a]: 0.00225097, [2] [Cycle 1]: 0.00163394, [45] [expand_dump_flag]: 2.84001e-06 [switch_simplify]: 3.314e-05 [loop_unroll]: 2.094e-05 [a_1]: 0.00044502 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 8.22e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.41001e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.87001e-06 [a_2]: 8.203e-05 [accelerated_algorithm]: 6.71e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 2.02999e-06 [shard_inline]: 6.31998e-06 [merge_send_recv]: 8.94e-06 [auto_parallel]: 5.83002e-06 [parallel]: 2.594e-05 [flash_sp]: 7.6e-06 [merge_comm]: 3.85e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 9.12001e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.53999e-06 [virtual_dataset]: 6.20997e-06 [get_grad_eliminate_]: 5.84999e-06 [virtual_output]: 
6.23e-06 [merge_forward]: 4.04002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.163e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 1.033e-05 [set_forward_comm_id_for_comm_node_pass]: 3.76999e-06 [meta_fg_expand]: 2.70002e-06 [flash_sp_send_recv_attached]: 2.90002e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 1.019e-05 [a_after_grad]: 8.81997e-06 [renormalize]: 0.00050508 [add_forward_monad_depend]: 7.93999e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.406e-05 [cse]: 2.938e-05 [a_3]: 4.254e-05 [Cycle 2]: 0.00060762, [45] [expand_dump_flag]: 1.15001e-06 [switch_simplify]: 7.16001e-06 [loop_unroll]: 5.87999e-06 [a_1]: 0.00011526 [with_stream_mark]: 1.071e-05 [recompute_prepare]: 5.85002e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 7.254e-05 [accelerated_algorithm]: 5.81998e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 4.46002e-06 [auto_parallel]: 5.71e-06 [parallel]: 4.80999e-06 [flash_sp]: 3.66999e-06 [merge_comm]: 3.31001e-06 [allreduce_fusion]: 2.94001e-06 [matmul_add_comm_reduction]: 5.64e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.26998e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 5.17999e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.28e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.053e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.42998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.67998e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 8.23999e-06 [a_after_grad]: 8.13001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.31998e-06 
[auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.16e-06 [cse]: 1.729e-05 [a_3]: 3.378e-05 [py_interpret_to_execute_after_opt_a]: 7.48999e-06 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.337e-05 [convert_after_rewriter]: 6.53e-06 [order_py_execute_after_rewriter]: 5.15999e-06 [mutable_eliminate]: 0.00047928 [opt_b]: 0.00019112, [1] [Cycle 1]: 0.00018414, [7] [b_1]: 0.00011258 [b_2]: 7.33999e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 4.69998e-07 [cse]: 1.844e-05 [optimize_parallel_all_gather_comm]: 1.691e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.309e-05 [loop_unroll]: 0.00042195 [opt_after_cconv]: 9.606e-05, [1] [Cycle 1]: 9.026e-05, [7] [c_1]: 2.608e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.18002e-06 [cse]: 1.74e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 1.54e-05 [tuple_transform]: 6.771e-05, [1] [Cycle 1]: 6.32e-05, [4] [d_1]: 3.652e-05 [none_parameter_eliminate]: 1.82999e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.58e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.02e-05 [cse_after_recomputation]: 2.212e-05, [1] [Cycle 1]: 1.707e-05, [1] [cse]: 1.154e-05 [environ_conv]: 7.91001e-06 [swap_dp_allreduce_reducescatter]: 5.36998e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.52e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.60999e-06 [ForceFp32Comm]: 1.10001e-06 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 
1.24e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.73002e-06 [control_data_broadcast_order]: 1.247e-05 [grouped_pairwise_exchange_alltoall]: 2.12999e-06 [offloading_packed_experts]: 3.81001e-06 [overlap_recompute_and_grad_model_parallel]: 4.85999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.74999e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.827e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.26998e-06 [split_layernorm_comm]: 2.19999e-06 [handle_group_info]: 1.12999e-06 [symbol_engine_optimizer]: 7.284e-05, [1] [Cycle 1]: 6.834e-05, [6] [build]: 2.84999e-06 [elim_shapecalc]: 8.85001e-06 [elim_not_effective]: 1.262e-05 [opt_reshape]: 6.54999e-06 [fold_const_symbol]: 9.87999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.87999e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.556e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.97e-06 [opt_after_jit_grad]: 0.00045964 [validate]: 3.502e-05 [backend_pass]: 1.12e-06 [task_emit]: 0.00660797 [execute]: 7.88999e-06 Sums bootstrap : 0.000637s : 3.61% type_inference : 0.006387s : 36.18% event_method : 0.000015s : 0.08% auto_monad : 0.000060s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 0.000062s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% 
optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000560s : 3.17% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.88% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000031s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.10% 
optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000505s : 2.86% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000047s : 0.26% optimize.opt_a.a_3 : 0.000076s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000479s : 2.71% optimize.opt_b.b_1 : 0.000113s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000422s : 2.39% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000008s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000460s : 2.60% validate : 0.000035s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006608s : 37.43% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000172 26 18.81% : 0.000032s : 5: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 0.87% : 0.000002s : 2: substitution.fold_const_symbol 3.05% : 0.000005s : 3: substitution.graph_param_transform 64.08% : 0.000110s : 3: substitution.inline 1.89% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000005s : 4: substitution.remove_not_recompute_node 2.16% : 0.000004s : 2: substitution.replace_old_param 5.14% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006330 2 89.95% : 0.005694s : 1: type_inference.infer 10.05% : 0.000636s : 1: type_inference.specialize ------[replace.] 0.000036 4 78.97% : 0.000029s : 3: replace.inline 21.03% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 4 92.99% : 0.000108s : 3: match.inline 7.01% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 0.84% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 15: predicate.arithmetic_simplify 0.92% : 0.000001s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.83% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.16% : 0.000002s : 12: predicate.environ_get_depend_swap 1.76% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.68% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.52% : 0.000010s : 40: predicate.inline 0.87% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 6: predicate.less_batch_normalization 1.78% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 25: predicate.load_eliminater 1.02% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.90% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.06% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.62% : 0.000001s : 3: predicate.parallel_virtual_node 1.58% : 0.000002s : 13: predicate.partial_defer_inline 1.47% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.38% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.33% : 0.000002s : 16: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 0.99% : 0.000002s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 3: predicate.row_tensor_eliminate 0.73% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 6: predicate.shard_identity_eliminate 0.70% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.70% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 13: predicate.switch_defer_inline 1.95% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 43: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.97% : 0.000002s : 9: predicate.transpose_eliminate 1.61% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.38% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.65% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.35% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000388 8 46.35% : 0.000180s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.65% : 0.000208s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031522 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.28% : 0.003554s : 1: add_attr 11.24% : 0.003542s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000055s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.21% : 0.000065s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.15% : 0.000677s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000011s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.55% : 0.000488s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.98% : 0.000938s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000092s : 28: opt.transform.opt_b 0.13% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.15% : 0.002254s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.49% : 0.000468s : 1: opt_after_jit_grad 0.62% : 0.000194s : 1: opt_b 13.25% : 0.004178s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.07% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.76% : 0.000241s : 1: renormalize.infer 0.82% : 0.000257s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000037s : 1: rewriter_after_opt_a 0.21% : 0.000067s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000076s : 1: symbol_engine_optimizer 21.00% : 0.006619s : 1: task_emit 0.22% : 0.000070s : 1: 
tuple_transform 20.31% : 0.006401s : 1: type_inference 0.20% : 0.000064s : 1: validate TotalTime = 0.0207206, [24] [bootstrap]: 0.00041696 [type_inference]: 0.00600296 [event_method]: 1.513e-05 [auto_monad]: 6.221e-05 [graph_reusing]: 5.90002e-06 [inline]: 1.94999e-06 [add_attr]: 0.00310054, [1] [add_attr_with_inline]: 0.00309219, [1] [Cycle 1]: 4.895e-05, [2] [tag_attr]: 1.455e-05 [meta_addattr_fg_expand]: 3.91001e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 2.612e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.21998e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00404381, [53] [py_interpret_to_execute]: 2.092e-05 [rewriter_before_opt_a]: 5.213e-05 [opt_a]: 0.00209911, [2] [Cycle 1]: 0.0014799, [45] [expand_dump_flag]: 2.88e-06 [switch_simplify]: 2.905e-05 [loop_unroll]: 1.719e-05 [a_1]: 0.00035339 [with_stream_mark]: 1.453e-05 [recompute_prepare]: 7.98001e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 3.5e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 9.777e-05 [accelerated_algorithm]: 7.65e-06 [shard]: 2.55002e-06 [meta_shard_fg_expand]: 1.98002e-06 [shard_inline]: 6.30002e-06 [merge_send_recv]: 8.79e-06 [auto_parallel]: 7e-06 [parallel]: 2.004e-05 [flash_sp]: 7.55998e-06 [merge_comm]: 3.70998e-06 [allreduce_fusion]: 3.83001e-06 [matmul_add_comm_reduction]: 9.50001e-06 [allreduce_slice_to_reducescatter]: 7.60017e-07 [virtual_shard_identity]: 8.06001e-06 [virtual_dataset]: 6.18998e-06 [get_grad_eliminate_]: 5.79999e-06 [virtual_output]: 5.70001e-06 [merge_forward]: 4.04002e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 1.056e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.205e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 1.022e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 2.73e-06 [flash_sp_send_recv_attached]: 2.60002e-06 [receive_attached]: 2.04e-06 
[after_resolve]: 9.97999e-06 [a_after_grad]: 8.52e-06 [renormalize]: 0.00043872 [add_forward_monad_depend]: 4.81002e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.468e-05 [cse]: 2.909e-05 [a_3]: 4.301e-05 [Cycle 2]: 0.00060879, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 7.33e-06 [loop_unroll]: 5.65001e-06 [a_1]: 0.00011553 [with_stream_mark]: 1.255e-05 [recompute_prepare]: 6.04001e-06 [updatestate_depend_eliminate]: 2.94999e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.342e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.19e-06 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 4.88001e-06 [auto_parallel]: 5.84e-06 [parallel]: 4.3e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 2.91e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 6.73e-06 [virtual_dataset]: 5.46998e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.17e-06 [merge_forward]: 2.87002e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 6.38003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.019e-05 [merge_recompute_call_nodes]: 1.00001e-06 [before_grad]: 8.72e-06 [set_forward_comm_id_for_comm_node_pass]: 3.69002e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 8.91997e-06 [a_after_grad]: 8.25999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.17001e-06 [cse]: 1.27e-05 [a_3]: 3.348e-05 [py_interpret_to_execute_after_opt_a]: 8.99998e-06 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 3.418e-05 [convert_after_rewriter]: 6.49999e-06 [order_py_execute_after_rewriter]: 5.79e-06 [mutable_eliminate]: 0.00050724 [opt_b]: 0.00019349, [1] [Cycle 1]: 0.00018645, [7] [b_1]: 
0.00011421 [b_2]: 7.70998e-06 [updatestate_depend_eliminate]: 6.21e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 4.00003e-07 [cse]: 1.78e-05 [optimize_parallel_all_gather_comm]: 1.722e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.455e-05 [loop_unroll]: 0.00043071 [opt_after_cconv]: 9.539e-05, [1] [Cycle 1]: 8.944e-05, [7] [c_1]: 2.513e-05 [parameter_eliminate]: 2.66999e-06 [updatestate_depend_eliminate]: 5.71998e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.56998e-06 [cse]: 1.658e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.465e-05 [tuple_transform]: 6.966e-05, [1] [Cycle 1]: 6.507e-05, [4] [d_1]: 3.802e-05 [none_parameter_eliminate]: 1.68002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.47001e-06 [partial_unused_args_eliminate]: 1.97999e-06 [add_recomputation]: 4.528e-05 [cse_after_recomputation]: 1.99e-05, [1] [Cycle 1]: 1.567e-05, [1] [cse]: 1.046e-05 [environ_conv]: 5.50001e-06 [swap_dp_allreduce_reducescatter]: 4.87e-06 [bias_add_comm_swap]: 2.91999e-06 [label_micro_interleaved_index]: 4.47998e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.40002e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.08001e-06 [interleave_split_concat_branches]: 1.45001e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.333e-05 [grouped_pairwise_exchange_alltoall]: 1.77001e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 5.12999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.49999e-06 [overlap_grad_ring_attention]: 4.17003e-06 [overlap_grad_flash_sp]: 1.904e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 7.212e-05, [1] [Cycle 1]: 6.756e-05, [6] [build]: 2.60002e-06 [elim_shapecalc]: 8.89e-06 [elim_not_effective]: 1.205e-05 [opt_reshape]: 6.31e-06 [fold_const_symbol]: 9.52001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.522e-05 [get_jit_bprop_graph]: 1.29e-06 [rewriter_after_jit_bprop_graph]: 3.63999e-06 [opt_after_jit_grad]: 0.00047381 [validate]: 3.618e-05 [backend_pass]: 1.07998e-06 [task_emit]: 0.00627072 [execute]: 8.63001e-06 Sums bootstrap : 0.000417s : 2.51% type_inference : 0.006003s : 36.16% event_method : 0.000015s : 0.09% auto_monad : 0.000062s : 0.37% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000052s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000469s : 2.82% optimize.opt_a.with_stream_mark : 0.000027s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000171s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000439s : 2.64% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000042s : 0.25% optimize.opt_a.a_3 : 0.000076s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.21% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000507s : 3.06% optimize.opt_b.b_1 : 0.000114s : 0.69% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.15% optimize.loop_unroll : 0.000431s : 2.59% optimize.opt_after_cconv.c_1 : 0.000025s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000474s : 2.85% validate : 0.000036s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006271s : 37.77% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000143 24 20.58% : 0.000029s : 4: substitution.arithmetic_simplify 1.40% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.09% : 0.000006s : 3: substitution.graph_param_transform 64.92% : 0.000093s : 3: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.16% : 0.000005s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.005900 2 91.97% : 0.005426s : 1: type_inference.infer 8.03% : 0.000474s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000091 3 100.00% : 0.000091s : 3: match.inline ------[predicate.] 
0.000147 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 0.81% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 1.11% : 0.000002s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.24% : 0.000003s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.71% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.68% : 0.000001s : 6: predicate.depend_value_elim 0.80% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 8: predicate.dict_get_item_eliminator 1.00% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.14% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.82% : 0.000003s : 17: predicate.environ_get_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.79% : 0.000001s : 6: predicate.get_grad_eliminate 0.30% : 0.000000s : 3: predicate.graph_param_transform 0.79% : 0.000001s : 6: predicate.incorporate_call 0.65% : 0.000001s : 6: predicate.incorporate_call_switch 6.18% : 0.000009s : 37: predicate.inline 1.02% : 0.000002s : 6: predicate.inline_without_move 0.45% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 6: predicate.less_batch_normalization 1.56% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 22: predicate.load_eliminater 1.08% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.06% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.11% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.42% : 0.000001s : 3: predicate.parallel_virtual_node 1.42% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 11: predicate.partial_eliminate 0.83% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 8: predicate.reduce_eliminate 2.26% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.21% : 0.000002s : 14: predicate.replace_applicator 0.76% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 8: predicate.reshape_eliminate 0.68% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.87% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.83% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.95% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 3: predicate.value_based_eliminate 0.79% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000290 7 36.72% : 0.000106s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.28% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029316 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.59% : 0.003106s : 1: add_attr 10.56% : 0.003096s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000068s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.54% : 0.000450s : 1: bootstrap 0.10% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000516s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.92% : 0.000856s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.17% : 0.002102s : 1: opt_a 0.34% : 0.000099s : 1: opt_after_cconv 1.65% : 0.000484s : 1: opt_after_jit_grad 0.67% : 0.000197s : 1: opt_b 13.81% : 0.004048s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.79% : 0.000231s : 1: renormalize.infer 0.68% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.19% : 0.000056s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 21.44% : 0.006285s : 1: task_emit 0.25% : 0.000073s : 1: 
tuple_transform 20.54% : 0.006020s : 1: type_inference 0.26% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x4-kbk],max_mem:8.0M . TotalTime = 1.05705, [24] [bootstrap]: 0.00060834 [type_inference]: 0.00674966 [event_method]: 1.465e-05 [auto_monad]: 6.038e-05 [graph_reusing]: 5.52999e-06 [inline]: 2.21998e-06 [add_attr]: 0.00374986, [1] [add_attr_with_inline]: 0.0037375, [1] [Cycle 1]: 5.552e-05, [2] [tag_attr]: 1.58e-05 [meta_addattr_fg_expand]: 4.94e-06 [parallel-infer-symbol]: 3.61999e-06 [pre_auto_parallel]: 2.85e-05 [insert-virtual-dataset]: 2.81999e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.20002e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.0043349, [53] [py_interpret_to_execute]: 2.416e-05 [rewriter_before_opt_a]: 6.561e-05 [opt_a]: 0.00235787, [2] [Cycle 1]: 0.00171381, [45] [expand_dump_flag]: 3.08998e-06 [switch_simplify]: 3.322e-05 [loop_unroll]: 2.041e-05 [a_1]: 0.00045242 [with_stream_mark]: 1.545e-05 [recompute_prepare]: 8.54e-06 [updatestate_depend_eliminate]: 4.22003e-06 [updatestate_assign_eliminate]: 3.83999e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 8.187e-05 [accelerated_algorithm]: 7.34002e-06 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 2.39999e-06 [shard_inline]: 6.28002e-06 [merge_send_recv]: 8.86002e-06 [auto_parallel]: 6.45002e-06 [parallel]: 2.751e-05 [flash_sp]: 8.12998e-06 [merge_comm]: 4.13001e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 9.79e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.59002e-06 [virtual_dataset]: 6.16998e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.77001e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 1.054e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.212e-05 
[merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 1.046e-05 [set_forward_comm_id_for_comm_node_pass]: 4.11001e-06 [meta_fg_expand]: 2.78998e-06 [flash_sp_send_recv_attached]: 2.73e-06 [receive_attached]: 2.36998e-06 [after_resolve]: 1.114e-05 [a_after_grad]: 9.23002e-06 [renormalize]: 0.00055353 [add_forward_monad_depend]: 9.00999e-06 [auto_monad_grad]: 2.22001e-06 [auto_monad_eliminator]: 1.569e-05 [cse]: 3.109e-05 [a_3]: 4.459e-05 [Cycle 2]: 0.00063288, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 7.23e-06 [loop_unroll]: 6.06e-06 [a_1]: 0.00011755 [with_stream_mark]: 1.056e-05 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 3.10002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 7.315e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 4.96002e-06 [auto_parallel]: 6.12001e-06 [parallel]: 4.49998e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.19001e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 6.39999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.22001e-06 [virtual_dataset]: 5.34998e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.06002e-06 [merge_forward]: 3.18e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 6.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.115e-05 [merge_recompute_call_nodes]: 9.29984e-07 [before_grad]: 8.95999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.86999e-06 [meta_fg_expand]: 1.92001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.01997e-06 [after_resolve]: 8.33001e-06 [a_after_grad]: 7.9e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.79984e-07 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 6.84999e-06 [cse]: 1.489e-05 [a_3]: 3.383e-05 [py_interpret_to_execute_after_opt_a]: 9.91e-06 
[slice_cell_reuse_recomputed_activation]: 2.34999e-06 [rewriter_after_opt_a]: 3.574e-05 [convert_after_rewriter]: 6.51999e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00052473 [opt_b]: 0.00018896, [1] [Cycle 1]: 0.0001823, [7] [b_1]: 0.00011187 [b_2]: 6.89001e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.59999e-06 [renormalize]: 7.7e-07 [cse]: 1.701e-05 [optimize_parallel_all_gather_comm]: 1.653e-05 [overlap_param_gather]: 2.32999e-06 [cconv]: 2.399e-05 [loop_unroll]: 0.00041971 [opt_after_cconv]: 9.527e-05, [1] [Cycle 1]: 8.925e-05, [7] [c_1]: 2.526e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.768e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.537e-05 [tuple_transform]: 6.798e-05, [1] [Cycle 1]: 6.373e-05, [4] [d_1]: 3.706e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.78998e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.888e-05 [cse_after_recomputation]: 2.136e-05, [1] [Cycle 1]: 1.674e-05, [1] [cse]: 1.113e-05 [environ_conv]: 7.71001e-06 [swap_dp_allreduce_reducescatter]: 5.61e-06 [bias_add_comm_swap]: 2.90002e-06 [label_micro_interleaved_index]: 4.63001e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.55002e-06 [assign_add_opt]: 1.84998e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.41998e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.35001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.94e-06 [control_data_broadcast_order]: 1.235e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.93001e-06 [overlap_recompute_and_grad_model_parallel]: 5.39e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 4.47998e-06 [overlap_grad_flash_sp]: 1.954e-05 [begin_end_overlap_inline]: 7.89994e-07 [split_matmul_comm_elemetwise]: 2.60002e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 7.337e-05, [1] [Cycle 1]: 6.888e-05, [6] [build]: 2.81999e-06 [elim_shapecalc]: 9.04e-06 [elim_not_effective]: 1.246e-05 [opt_reshape]: 6.52001e-06 [fold_const_symbol]: 9.78002e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.23998e-06 [pipeline_parallel_scheduler]: 1.56998e-06 [auto_monad_reorder]: 1.608e-05 [get_jit_bprop_graph]: 1.27e-06 [rewriter_after_jit_bprop_graph]: 3.61001e-06 [opt_after_jit_grad]: 0.00045435 [validate]: 3.466e-05 [backend_pass]: 9.00007e-07 [task_emit]: 1.04074 [execute]: 8.59998e-06 Sums bootstrap : 0.000608s : 0.06% type_inference : 0.006750s : 0.64% event_method : 0.000015s : 0.00% auto_monad : 0.000060s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.00% optimize.rewriter_before_opt_a : 0.000066s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000570s : 0.05% 
optimize.opt_a.with_stream_mark : 0.000026s : 0.00% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000554s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000010s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000046s : 0.00% optimize.opt_a.a_3 : 0.000078s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000525s : 0.05% optimize.opt_b.b_1 : 0.000112s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000420s : 0.04% optimize.opt_after_cconv.c_1 : 0.000025s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000037s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.00% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000454s : 0.04% validate : 0.000035s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.040736s : 98.91% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000180 26 18.62% : 0.000033s : 5: substitution.arithmetic_simplify 1.02% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 2.98% : 0.000005s : 3: substitution.graph_param_transform 64.70% : 0.000116s : 3: substitution.inline 2.15% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.67% : 0.000005s : 4: substitution.remove_not_recompute_node 2.03% : 0.000004s : 2: substitution.replace_old_param 5.04% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006697 2 90.81% : 0.006082s : 1: type_inference.infer 9.19% : 0.000615s : 1: type_inference.specialize ------[replace.] 0.000039 4 79.52% : 0.000031s : 3: replace.inline 20.48% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 4 93.19% : 0.000114s : 3: match.inline 6.81% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 0.78% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 15: predicate.arithmetic_simplify 1.10% : 0.000002s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.05% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.79% : 0.000003s : 18: predicate.environ_get_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.43% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.87% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.70% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.29% : 0.000010s : 40: predicate.inline 1.05% : 0.000002s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.97% : 0.000002s : 6: predicate.less_batch_normalization 1.64% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 25: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.80% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.08% : 0.000002s : 3: predicate.mutable_eliminate 0.42% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.71% : 0.000003s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.10% : 0.000002s : 9: predicate.reduce_eliminate 2.38% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 6: predicate.remove_not_recompute_node 1.31% : 0.000002s : 16: predicate.replace_applicator 0.60% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 0.97% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 43: predicate.switch_simplify 0.89% : 
0.000001s : 9: predicate.tile_eliminate 0.90% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000394 8 46.50% : 0.000183s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.50% : 0.000211s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.066778 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.35% : 0.003755s : 1: add_attr 0.35% : 0.003741s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000053s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000065s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.06% : 0.000647s : 1: bootstrap 0.00% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000011s : 1: environ_conv 0.00% : 0.000021s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000428s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000533s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.09% : 0.000951s : 78: opt.transform.opt_a 0.00% : 0.000024s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000091s : 28: opt.transform.opt_b 0.00% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.22% : 0.002361s : 1: opt_a 0.01% : 0.000099s : 1: opt_after_cconv 0.04% : 0.000464s : 1: opt_after_jit_grad 0.02% : 0.000192s : 1: opt_b 0.41% : 0.004339s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000028s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000019s : 1: remove_dup_value 0.03% : 0.000289s : 1: renormalize.infer 0.02% : 0.000256s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000040s : 1: rewriter_after_opt_a 0.01% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000076s : 1: symbol_engine_optimizer 97.56% : 1.040759s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.63% : 0.006767s : 1: type_inference 0.01% : 0.000059s : 1: validate TotalTime = 0.0630147, [24] [bootstrap]: 0.00041975 [type_inference]: 0.0061633 [event_method]: 1.274e-05 [auto_monad]: 5.786e-05 [graph_reusing]: 5.67001e-06 [inline]: 2.42001e-06 [add_attr]: 0.00300923, [1] [add_attr_with_inline]: 0.00300074, [1] [Cycle 1]: 4.814e-05, [2] [tag_attr]: 1.37e-05 [meta_addattr_fg_expand]: 3.97998e-06 [parallel-infer-symbol]: 2.97002e-06 [pre_auto_parallel]: 2.363e-05 [insert-virtual-dataset]: 2.58003e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.27999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00393983, [53] [py_interpret_to_execute]: 2.048e-05 [rewriter_before_opt_a]: 6.471e-05 [opt_a]: 0.00204476, [2] [Cycle 1]: 0.00142816, [45] [expand_dump_flag]: 3.01999e-06 [switch_simplify]: 2.973e-05 [loop_unroll]: 1.752e-05 [a_1]: 0.00035276 [with_stream_mark]: 1.468e-05 [recompute_prepare]: 7.98999e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.71001e-06 [updatestate_loads_eliminate]: 3.2e-06 [parameter_eliminate]: 1.87001e-06 [a_2]: 8.289e-05 [accelerated_algorithm]: 6.49999e-06 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 1.56998e-06 [shard_inline]: 6.23998e-06 [merge_send_recv]: 8.67998e-06 [auto_parallel]: 6.28e-06 [parallel]: 1.84e-05 [flash_sp]: 7.77998e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.62002e-06 [matmul_add_comm_reduction]: 9.58997e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 7.61999e-06 [virtual_dataset]: 6.16998e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.58997e-06 [merge_forward]: 4.18001e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.16002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.175e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 1.02e-05 [set_forward_comm_id_for_comm_node_pass]: 3.57002e-06 [meta_fg_expand]: 2.58e-06 [flash_sp_send_recv_attached]: 2.55997e-06 [receive_attached]: 
2.23998e-06 [after_resolve]: 9.76e-06 [a_after_grad]: 8.69998e-06 [renormalize]: 0.00041121 [add_forward_monad_depend]: 4.85999e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.411e-05 [cse]: 3.126e-05 [a_3]: 4.325e-05 [Cycle 2]: 0.00060697, [45] [expand_dump_flag]: 1.07998e-06 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.90002e-06 [a_1]: 0.00011447 [with_stream_mark]: 1.031e-05 [recompute_prepare]: 5.91e-06 [updatestate_depend_eliminate]: 2.99001e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.256e-05 [accelerated_algorithm]: 5.82001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.66998e-06 [merge_send_recv]: 4.65001e-06 [auto_parallel]: 5.52999e-06 [parallel]: 3.97e-06 [flash_sp]: 3.93001e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 2.96001e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.34001e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.26002e-06 [merge_forward]: 2.73998e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.14001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.041e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.41002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.40998e-06 [meta_fg_expand]: 1.86e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 8.43001e-06 [a_after_grad]: 8.05e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.31002e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 6.54001e-06 [cse]: 1.344e-05 [a_3]: 3.32e-05 [py_interpret_to_execute_after_opt_a]: 7.39002e-06 [slice_cell_reuse_recomputed_activation]: 1.88002e-06 [rewriter_after_opt_a]: 3.295e-05 [convert_after_rewriter]: 6.80002e-06 [order_py_execute_after_rewriter]: 4.97e-06 [mutable_eliminate]: 0.00046663 [opt_b]: 0.00018875, [1] [Cycle 1]: 
0.00018234, [7] [b_1]: 0.00011093 [b_2]: 7.14001e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 4.39992e-07 [cse]: 1.852e-05 [optimize_parallel_all_gather_comm]: 1.595e-05 [overlap_param_gather]: 2.20002e-06 [cconv]: 2.321e-05 [loop_unroll]: 0.0004221 [opt_after_cconv]: 9.613e-05, [1] [Cycle 1]: 9.053e-05, [7] [c_1]: 2.576e-05 [parameter_eliminate]: 2.45002e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.799e-05 [renormalize]: 6.59988e-07 [remove_dup_value]: 1.478e-05 [tuple_transform]: 6.818e-05, [1] [Cycle 1]: 6.378e-05, [4] [d_1]: 3.673e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.73e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.393e-05 [cse_after_recomputation]: 2.189e-05, [1] [Cycle 1]: 1.726e-05, [1] [cse]: 1.175e-05 [environ_conv]: 5.51998e-06 [swap_dp_allreduce_reducescatter]: 5.00001e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 3.08998e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 3.09999e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.39998e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.255e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 4.10998e-06 [overlap_recompute_and_grad_model_parallel]: 4.91002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.76e-06 [overlap_grad_ring_attention]: 4.15e-06 [overlap_grad_flash_sp]: 1.727e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 7.244e-05, [1] [Cycle 1]: 6.799e-05, [6] [build]: 2.51e-06 [elim_shapecalc]: 8.66997e-06 [elim_not_effective]: 1.237e-05 [opt_reshape]: 6.41e-06 [fold_const_symbol]: 9.64999e-06 [renormalize]: 2.60014e-07 [detach_backward]: 1.50999e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.553e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00048507 [validate]: 3.601e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0486226 [execute]: 8.48001e-06 Sums bootstrap : 0.000420s : 0.71% type_inference : 0.006163s : 10.44% event_method : 0.000013s : 0.02% auto_monad : 0.000058s : 0.10% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000065s : 0.11% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000037s : 0.06% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000467s : 0.79% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.26% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000411s : 0.70% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000045s : 0.08% optimize.opt_a.a_3 : 0.000076s : 0.13% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000467s : 0.79% optimize.opt_b.b_1 : 0.000111s : 0.19% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000422s : 0.72% optimize.opt_after_cconv.c_1 : 0.000026s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000485s : 0.82% validate : 0.000036s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.048623s : 82.39% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000141 24 20.52% : 0.000029s : 4: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 1.10% : 0.000002s : 2: substitution.fold_const_symbol 3.89% : 0.000005s : 3: substitution.graph_param_transform 65.37% : 0.000092s : 3: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.29% : 0.000005s : 4: substitution.remove_not_recompute_node 2.28% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006121 2 92.09% : 0.005636s : 1: type_inference.infer 7.91% : 0.000484s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000091 3 100.00% : 0.000091s : 3: match.inline ------[predicate.] 
0.000148 815 0.92% : 0.000001s : 8: predicate.accumulaten_eliminater 1.04% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 8: predicate.addn_zero_filter 0.76% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.42% : 0.000004s : 14: predicate.arithmetic_simplify 0.91% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.70% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.47% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.39% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_depend_swap 1.87% : 0.000003s : 17: predicate.environ_get_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.25% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.82% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.76% : 0.000001s : 6: predicate.incorporate_call 0.64% : 0.000001s : 6: predicate.incorporate_call_switch 6.26% : 0.000009s : 37: predicate.inline 0.97% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 6: predicate.less_batch_normalization 1.52% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 22: predicate.load_eliminater 1.16% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.17% : 0.000002s : 3: predicate.mutable_eliminate 0.43% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.48% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.30% : 0.000002s : 8: predicate.reduce_eliminate 2.17% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 14: predicate.replace_applicator 0.83% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 8: predicate.reshape_eliminate 0.68% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 3: predicate.row_tensor_eliminate 0.87% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.89% : 0.000001s : 6: predicate.specialize_transform 0.93% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.87% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.93% : 0.000007s : 38: predicate.switch_simplify 0.84% : 
0.000001s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.52% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.77% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000316 7 39.12% : 0.000124s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.88% : 0.000192s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.071365 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.22% : 0.003014s : 1: add_attr 4.21% : 0.003004s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000063s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000447s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.60% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.67% : 0.000476s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.17% : 0.000838s : 78: opt.transform.opt_a 0.03% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 2.87% : 0.002048s : 1: opt_a 0.14% : 0.000100s : 1: opt_after_cconv 0.69% : 0.000495s : 1: opt_after_jit_grad 0.27% : 0.000192s : 1: opt_b 5.53% : 0.003944s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000018s : 1: remove_dup_value 0.29% : 0.000207s : 1: renormalize.infer 0.28% : 0.000197s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.10% : 0.000069s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000075s : 1: symbol_engine_optimizer 68.15% : 0.048639s : 1: task_emit 0.10% : 0.000071s : 1: 
tuple_transform 8.66% : 0.006177s : 1: type_inference 0.08% : 0.000058s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x4-ge],max_mem:8.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x5-pynative],max_mem:8.0M TotalTime = 0.0225152, [24] [bootstrap]: 0.00060494 [type_inference]: 0.00666101 [event_method]: 1.459e-05 [auto_monad]: 6.143e-05 [graph_reusing]: 6.78998e-06 [inline]: 1.99999e-06 [add_attr]: 0.00363792, [1] [add_attr_with_inline]: 0.00362662, [1] [Cycle 1]: 5.063e-05, [2] [tag_attr]: 1.54e-05 [meta_addattr_fg_expand]: 4.50001e-06 [parallel-infer-symbol]: 3.3e-06 [pre_auto_parallel]: 2.724e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.31998e-06 [pipeline_split]: 2.00002e-06 [optimize]: 0.0043009, [53] [py_interpret_to_execute]: 8.662e-05 [rewriter_before_opt_a]: 6.655e-05 [opt_a]: 0.00224405, [2] [Cycle 1]: 0.00162626, [45] [expand_dump_flag]: 3.08e-06 [switch_simplify]: 3.501e-05 [loop_unroll]: 2.072e-05 [a_1]: 0.00044882 [with_stream_mark]: 1.404e-05 [recompute_prepare]: 8.03999e-06 [updatestate_depend_eliminate]: 4.13999e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 3.38999e-06 [parameter_eliminate]: 1.71998e-06 [a_2]: 8.191e-05 [accelerated_algorithm]: 6.81001e-06 [shard]: 2.39999e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.43e-06 [merge_send_recv]: 8.28999e-06 [auto_parallel]: 6.59999e-06 [parallel]: 2.865e-05 [flash_sp]: 7.89002e-06 [merge_comm]: 4.05e-06 [allreduce_fusion]: 3.62998e-06 [matmul_add_comm_reduction]: 9.69999e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 7.7e-06 [virtual_dataset]: 6.14001e-06 [get_grad_eliminate_]: 5.70001e-06 
[virtual_output]: 5.78002e-06 [merge_forward]: 4.12998e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.077e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.155e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 1.032e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88001e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 2.96999e-06 [receive_attached]: 2.59001e-06 [after_resolve]: 9.74999e-06 [a_after_grad]: 8.74e-06 [renormalize]: 0.00048246 [add_forward_monad_depend]: 8.92e-06 [auto_monad_grad]: 2.21998e-06 [auto_monad_eliminator]: 1.339e-05 [cse]: 3.087e-05 [a_3]: 4.251e-05 [Cycle 2]: 0.00060787, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 7.16001e-06 [loop_unroll]: 5.87001e-06 [a_1]: 0.00011618 [with_stream_mark]: 1.027e-05 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 3.18e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.58003e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 7.3e-05 [accelerated_algorithm]: 5.80002e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 4.45999e-06 [auto_parallel]: 5.40001e-06 [parallel]: 4.17e-06 [flash_sp]: 3.68e-06 [merge_comm]: 3.20002e-06 [allreduce_fusion]: 3.03e-06 [matmul_add_comm_reduction]: 5.22e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 6.74001e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.33002e-06 [merge_forward]: 3.03e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.04999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.037e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 8.15e-06 [a_after_grad]: 7.89002e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 9.50007e-07 
[auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 6.61999e-06 [cse]: 1.437e-05 [a_3]: 3.363e-05 [py_interpret_to_execute_after_opt_a]: 8.43999e-06 [slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 3.389e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00049933 [opt_b]: 0.00019587, [1] [Cycle 1]: 0.00018875, [7] [b_1]: 0.00011284 [b_2]: 7.83001e-06 [updatestate_depend_eliminate]: 5.96e-06 [updatestate_assign_eliminate]: 2.91e-06 [updatestate_loads_eliminate]: 2.81e-06 [renormalize]: 3.9002e-07 [cse]: 1.923e-05 [optimize_parallel_all_gather_comm]: 1.744e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.425e-05 [loop_unroll]: 0.00043491 [opt_after_cconv]: 9.895e-05, [1] [Cycle 1]: 9.27e-05, [7] [c_1]: 2.65e-05 [parameter_eliminate]: 3.13e-06 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 2.70997e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.773e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.482e-05 [tuple_transform]: 6.881e-05, [1] [Cycle 1]: 6.424e-05, [4] [d_1]: 3.757e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 2.19999e-06 [add_recomputation]: 5.222e-05 [cse_after_recomputation]: 2.216e-05, [1] [Cycle 1]: 1.742e-05, [1] [cse]: 1.193e-05 [environ_conv]: 8.86002e-06 [swap_dp_allreduce_reducescatter]: 5.27999e-06 [bias_add_comm_swap]: 2.49999e-06 [label_micro_interleaved_index]: 1.518e-05 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 1.19998e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.96999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.05999e-06 
[interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.48002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.231e-05 [grouped_pairwise_exchange_alltoall]: 1.67999e-06 [offloading_packed_experts]: 3.85e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.64999e-06 [overlap_grad_ring_attention]: 4.31002e-06 [overlap_grad_flash_sp]: 1.886e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 7.358e-05, [1] [Cycle 1]: 6.905e-05, [6] [build]: 2.98e-06 [elim_shapecalc]: 9.63002e-06 [elim_not_effective]: 1.237e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 9.70002e-06 [renormalize]: 2.3999e-07 [detach_backward]: 2.06998e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 1.702e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.78999e-06 [opt_after_jit_grad]: 0.00047082 [validate]: 3.664e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.0064355 [execute]: 8.15e-06 Sums bootstrap : 0.000605s : 3.39% type_inference : 0.006661s : 37.30% event_method : 0.000015s : 0.08% auto_monad : 0.000061s : 0.34% graph_reusing : 0.000007s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000087s : 0.49% optimize.rewriter_before_opt_a : 0.000067s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s 
: 0.02% optimize.opt_a.switch_simplify : 0.000042s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000565s : 3.16% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000155s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000033s : 0.18% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 
0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000483s : 2.70% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000045s : 0.25% optimize.opt_a.a_3 : 0.000076s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000499s : 2.80% optimize.opt_b.b_1 : 0.000113s : 0.63% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000435s : 2.44% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000009s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000015s : 0.09% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000471s : 2.64% validate : 0.000037s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006436s : 36.04% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000172 26 18.86% : 0.000032s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000002s : 2: substitution.fold_const_symbol 3.04% : 0.000005s : 3: substitution.graph_param_transform 64.36% : 0.000111s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000005s : 4: substitution.remove_not_recompute_node 1.91% : 0.000003s : 2: substitution.replace_old_param 5.26% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006607 2 89.99% : 0.005945s : 1: type_inference.infer 10.01% : 0.000661s : 1: type_inference.specialize ------[replace.] 0.000036 4 78.95% : 0.000028s : 3: replace.inline 21.05% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 4 92.98% : 0.000109s : 3: match.inline 7.02% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 883 0.97% : 0.000002s : 9: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 15: predicate.arithmetic_simplify 0.94% : 0.000001s : 9: predicate.cast_eliminate 0.60% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.75% : 0.000001s : 6: predicate.depend_value_elim 0.97% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_depend_swap 1.84% : 0.000003s : 18: predicate.environ_get_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.36% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.36% : 0.000001s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.45% : 0.000010s : 40: predicate.inline 0.87% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 6: predicate.less_batch_normalization 1.66% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.32% : 0.000004s : 25: predicate.load_eliminater 0.98% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.86% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.56% : 0.000002s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.91% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.37% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.30% : 0.000002s : 16: predicate.replace_applicator 0.61% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.72% : 0.000001s : 6: predicate.special_op_eliminate 0.82% : 0.000001s : 6: predicate.specialize_transform 0.92% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.68% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 43: predicate.switch_simplify 0.93% : 
0.000001s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.55% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.42% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000388 8 48.60% : 0.000189s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.40% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032034 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.37% : 0.003642s : 1: add_attr 11.33% : 0.003630s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000067s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.02% : 0.000646s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000012s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000011s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.06% : 0.000019s : 1: label_micro_interleaved_index 1.38% : 0.000444s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.59% : 0.000509s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.95% : 0.000945s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000093s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.02% : 0.002247s : 1: opt_a 0.32% : 0.000102s : 1: opt_after_cconv 1.50% : 0.000481s : 1: opt_after_jit_grad 0.62% : 0.000199s : 1: opt_b 13.44% : 0.004305s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.29% : 0.000092s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.79% : 0.000254s : 1: renormalize.infer 0.69% : 0.000221s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000038s : 1: rewriter_after_opt_a 0.22% : 0.000071s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000076s : 1: symbol_engine_optimizer 20.12% : 0.006446s : 1: task_emit 0.22% : 0.000072s : 1: 
tuple_transform 20.84% : 0.006676s : 1: type_inference 0.20% : 0.000064s : 1: validate TotalTime = 0.0213937, [24] [bootstrap]: 0.00054435 [type_inference]: 0.00640587 [event_method]: 1.336e-05 [auto_monad]: 6.18e-05 [graph_reusing]: 5.55001e-06 [inline]: 1.89e-06 [add_attr]: 0.00323656, [1] [add_attr_with_inline]: 0.00322752, [1] [Cycle 1]: 5.817e-05, [2] [tag_attr]: 1.632e-05 [meta_addattr_fg_expand]: 4.18999e-06 [parallel-infer-symbol]: 3.35e-06 [pre_auto_parallel]: 2.613e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 9.79984e-07 [dataset_repeat_opt]: 2.23998e-06 [pipeline_split]: 1.96003e-06 [optimize]: 0.00414392, [53] [py_interpret_to_execute]: 2.206e-05 [rewriter_before_opt_a]: 5.229e-05 [opt_a]: 0.00217697, [2] [Cycle 1]: 0.00150777, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 2.967e-05 [loop_unroll]: 1.701e-05 [a_1]: 0.00036531 [with_stream_mark]: 1.58e-05 [recompute_prepare]: 8.94e-06 [updatestate_depend_eliminate]: 4.09002e-06 [updatestate_assign_eliminate]: 3.44001e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 8.279e-05 [accelerated_algorithm]: 7.15e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 6.17999e-06 [merge_send_recv]: 8.3e-06 [auto_parallel]: 7.15998e-06 [parallel]: 1.818e-05 [flash_sp]: 7.81001e-06 [merge_comm]: 4.05998e-06 [allreduce_fusion]: 3.95998e-06 [matmul_add_comm_reduction]: 1.016e-05 [allreduce_slice_to_reducescatter]: 1.10999e-06 [virtual_shard_identity]: 8.25e-06 [virtual_dataset]: 6.16998e-06 [get_grad_eliminate_]: 6.16e-06 [virtual_output]: 5.75001e-06 [merge_forward]: 3.73999e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.02e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.248e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 1.067e-05 [set_forward_comm_id_for_comm_node_pass]: 3.95998e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 2.76e-06 [receive_attached]: 2.48002e-06 
[after_resolve]: 1.158e-05 [a_after_grad]: 8.94003e-06 [renormalize]: 0.00045568 [add_forward_monad_depend]: 5.29998e-06 [auto_monad_grad]: 2.22999e-06 [auto_monad_eliminator]: 1.524e-05 [cse]: 3.027e-05 [a_3]: 4.416e-05 [Cycle 2]: 0.00065819, [45] [expand_dump_flag]: 1.39e-06 [switch_simplify]: 6.58998e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.000117 [with_stream_mark]: 1.163e-05 [recompute_prepare]: 6.11998e-06 [updatestate_depend_eliminate]: 3.44001e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 1.09e-06 [a_2]: 7.273e-05 [accelerated_algorithm]: 6.04001e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.92001e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 6.06e-06 [parallel]: 4.53999e-06 [flash_sp]: 3.73001e-06 [merge_comm]: 3.27002e-06 [allreduce_fusion]: 2.98e-06 [matmul_add_comm_reduction]: 6.11e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.76e-06 [virtual_dataset]: 4.213e-05 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.27001e-06 [merge_forward]: 2.84001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 7.43999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 9.06998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 1.87999e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.17999e-06 [after_resolve]: 8.15e-06 [a_after_grad]: 7.85e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.54998e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 7.11999e-06 [cse]: 1.519e-05 [a_3]: 3.33e-05 [py_interpret_to_execute_after_opt_a]: 9.32999e-06 [slice_cell_reuse_recomputed_activation]: 2.08002e-06 [rewriter_after_opt_a]: 3.424e-05 [convert_after_rewriter]: 6.21998e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00050836 [opt_b]: 0.00019282, [1] [Cycle 1]: 0.00018606, [7] [b_1]: 
0.00011217 [b_2]: 7.34002e-06 [updatestate_depend_eliminate]: 5.89e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 4.80009e-07 [cse]: 1.955e-05 [optimize_parallel_all_gather_comm]: 1.633e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.36e-05 [loop_unroll]: 0.00044463 [opt_after_cconv]: 9.938e-05, [1] [Cycle 1]: 9.355e-05, [7] [c_1]: 2.593e-05 [parameter_eliminate]: 2.91999e-06 [updatestate_depend_eliminate]: 5.43002e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.894e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.468e-05 [tuple_transform]: 7.059e-05, [1] [Cycle 1]: 6.59e-05, [4] [d_1]: 3.905e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.89991e-07 [switch_simplify]: 6.54999e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 4.692e-05 [cse_after_recomputation]: 2.159e-05, [1] [Cycle 1]: 1.716e-05, [1] [cse]: 1.181e-05 [environ_conv]: 5.84e-06 [swap_dp_allreduce_reducescatter]: 5.27999e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.69001e-06 [reorder_send_recv_between_fp_bp]: 3.09999e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.49e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.308e-05 [grouped_pairwise_exchange_alltoall]: 1.96e-06 [offloading_packed_experts]: 3.4e-06 [overlap_recompute_and_grad_model_parallel]: 4.58001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.40001e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 4.70999e-06 [overlap_grad_flash_sp]: 1.91e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 2.07001e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.147e-05, [1] [Cycle 1]: 6.709e-05, [6] [build]: 2.69001e-06 [elim_shapecalc]: 9.47999e-06 [elim_not_effective]: 1.205e-05 [opt_reshape]: 6.12999e-06 [fold_const_symbol]: 9.39998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.89999e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.678e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 4.07e-06 [opt_after_jit_grad]: 0.00046897 [validate]: 3.785e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00619401 [execute]: 8.61002e-06 Sums bootstrap : 0.000544s : 3.18% type_inference : 0.006406s : 37.38% event_method : 0.000013s : 0.08% auto_monad : 0.000062s : 0.36% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000052s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000036s : 0.21% optimize.opt_a.loop_unroll : 0.000023s : 0.13% optimize.opt_a.a_1 : 0.000482s : 2.81% optimize.opt_a.with_stream_mark : 0.000027s : 0.16% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000156s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.13% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000048s : 0.28% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000018s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000456s : 2.66% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.13% optimize.opt_a.cse : 0.000045s : 0.27% optimize.opt_a.a_3 : 0.000077s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.20% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000508s : 2.97% optimize.opt_b.b_1 : 0.000112s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000445s : 2.59% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000469s : 2.74% validate : 0.000038s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006194s : 36.14% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000155 24 21.77% : 0.000034s : 4: substitution.arithmetic_simplify 1.23% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000001s : 2: substitution.fold_const_symbol 4.20% : 0.000007s : 3: substitution.graph_param_transform 64.48% : 0.000100s : 3: substitution.inline 2.15% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.22% : 0.000005s : 4: substitution.remove_not_recompute_node 2.06% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006355 2 92.36% : 0.005869s : 1: type_inference.infer 7.64% : 0.000486s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000098 3 100.00% : 0.000098s : 3: match.inline ------[predicate.] 
0.000149 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 0.89% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.95% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.24% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.68% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.74% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.26% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.36% : 0.000001s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.41% : 0.000010s : 37: predicate.inline 1.05% : 0.000002s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.08% : 0.000002s : 6: predicate.less_batch_normalization 1.67% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 22: predicate.load_eliminater 1.31% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.36% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.49% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 11: predicate.partial_eliminate 0.85% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 8: predicate.reduce_eliminate 2.27% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 6: predicate.remove_not_recompute_node 1.17% : 0.000002s : 14: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 8: predicate.reshape_eliminate 0.68% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 3: predicate.row_tensor_eliminate 0.99% : 0.000001s : 6: predicate.same_eliminate 0.53% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 6: predicate.shard_identity_eliminate 0.76% : 0.000001s : 6: predicate.special_op_eliminate 0.91% : 0.000001s : 6: predicate.specialize_transform 1.01% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.90% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.90% : 0.000007s : 38: predicate.switch_simplify 0.91% : 
0.000001s : 8: predicate.tile_eliminate 0.83% : 0.000001s : 8: predicate.transpose_eliminate 1.55% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.96% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 3: predicate.value_based_eliminate 0.78% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.36% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000302 7 38.64% : 0.000117s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.36% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030282 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.71% : 0.003242s : 1: add_attr 10.67% : 0.003231s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000067s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000580s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000454s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000518s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.96% : 0.000895s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.20% : 0.002180s : 1: opt_a 0.34% : 0.000103s : 1: opt_after_cconv 1.58% : 0.000478s : 1: opt_after_jit_grad 0.65% : 0.000196s : 1: opt_b 13.70% : 0.004148s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.82% : 0.000248s : 1: renormalize.infer 0.66% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.19% : 0.000057s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000074s : 1: symbol_engine_optimizer 20.49% : 0.006205s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 21.21% : 0.006424s : 1: type_inference 0.22% : 0.000066s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x5-kbk],max_mem:8.0M TotalTime = 0.109877, [24] [bootstrap]: 0.00052935 [type_inference]: 0.0060524 [event_method]: 1.406e-05 [auto_monad]: 5.981e-05 [graph_reusing]: 5.42999e-06 [inline]: 1.98997e-06 [add_attr]: 0.00361825, [1] [add_attr_with_inline]: 0.00360636, [1] [Cycle 1]: 4.983e-05, [2] [tag_attr]: 1.51e-05 [meta_addattr_fg_expand]: 4.55999e-06 [parallel-infer-symbol]: 3.10998e-06 [pre_auto_parallel]: 2.896e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.73997e-06 [optimize]: 0.00427466, [53] [py_interpret_to_execute]: 2.048e-05 [rewriter_before_opt_a]: 6.21e-05 [opt_a]: 0.00230973, [2] [Cycle 1]: 0.00162551, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 3.275e-05 [loop_unroll]: 2.033e-05 [a_1]: 0.00044982 [with_stream_mark]: 1.443e-05 [recompute_prepare]: 8.48001e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.42002e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 2.16e-06 [a_2]: 8.035e-05 [accelerated_algorithm]: 6.91001e-06 [shard]: 2.36998e-06 [meta_shard_fg_expand]: 1.96998e-06 [shard_inline]: 6.54001e-06 [merge_send_recv]: 8.60999e-06 [auto_parallel]: 6.76e-06 [parallel]: 2.496e-05 [flash_sp]: 7.64002e-06 [merge_comm]: 3.83999e-06 [allreduce_fusion]: 3.98001e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 7.31999e-06 [virtual_dataset]: 6.21998e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 6.12999e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.194e-05 
[merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 9.74e-06 [set_forward_comm_id_for_comm_node_pass]: 3.98001e-06 [meta_fg_expand]: 3.01001e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 2.04e-06 [after_resolve]: 9.72001e-06 [a_after_grad]: 8.67998e-06 [renormalize]: 0.00048421 [add_forward_monad_depend]: 9.27999e-06 [auto_monad_grad]: 2.27999e-06 [auto_monad_eliminator]: 1.423e-05 [cse]: 2.939e-05 [a_3]: 4.318e-05 [Cycle 2]: 0.00067319, [45] [expand_dump_flag]: 1.23002e-06 [switch_simplify]: 6.94001e-06 [loop_unroll]: 5.73002e-06 [a_1]: 0.00017339 [with_stream_mark]: 1.13e-05 [recompute_prepare]: 6.32001e-06 [updatestate_depend_eliminate]: 2.98e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 7.355e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.31002e-06 [shard_inline]: 6.16998e-06 [merge_send_recv]: 5.23002e-06 [auto_parallel]: 5.94e-06 [parallel]: 4.88001e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.15998e-06 [allreduce_fusion]: 3.08e-06 [matmul_add_comm_reduction]: 5.91998e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.29999e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.19998e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 7.11001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.089e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 8.57e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 1.96e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 7.97e-06 [a_after_grad]: 7.9e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 7.52998e-06 [cse]: 1.481e-05 [a_3]: 3.358e-05 [py_interpret_to_execute_after_opt_a]: 8.12998e-06 [slice_cell_reuse_recomputed_activation]: 
1.96e-06 [rewriter_after_opt_a]: 3.484e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.62001e-06 [mutable_eliminate]: 0.0005054 [opt_b]: 0.0001928, [1] [Cycle 1]: 0.00018527, [7] [b_1]: 0.00011224 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 5.70001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.68e-06 [renormalize]: 3.80009e-07 [cse]: 1.868e-05 [optimize_parallel_all_gather_comm]: 1.568e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.51e-05 [loop_unroll]: 0.0004256 [opt_after_cconv]: 9.633e-05, [1] [Cycle 1]: 9.038e-05, [7] [c_1]: 2.505e-05 [parameter_eliminate]: 3.2e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.82002e-06 [updatestate_loads_eliminate]: 2.48e-06 [cse]: 1.742e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.535e-05 [tuple_transform]: 6.965e-05, [1] [Cycle 1]: 6.498e-05, [4] [d_1]: 3.783e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 2.21e-06 [add_recomputation]: 5.074e-05 [cse_after_recomputation]: 2.095e-05, [1] [Cycle 1]: 1.651e-05, [1] [cse]: 1.12e-05 [environ_conv]: 7.88999e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.61999e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.50001e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.39999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.84999e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.44003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.286e-05 
[grouped_pairwise_exchange_alltoall]: 1.84998e-06 [offloading_packed_experts]: 3.99002e-06 [overlap_recompute_and_grad_model_parallel]: 5.20999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.89999e-06 [overlap_grad_ring_attention]: 4.80001e-06 [overlap_grad_flash_sp]: 1.801e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.69999e-06 [split_layernorm_comm]: 1.81998e-06 [handle_group_info]: 1.13001e-06 [symbol_engine_optimizer]: 7.287e-05, [1] [Cycle 1]: 6.826e-05, [6] [build]: 2.60002e-06 [elim_shapecalc]: 9.25999e-06 [elim_not_effective]: 1.194e-05 [opt_reshape]: 6.38e-06 [fold_const_symbol]: 9.51e-06 [renormalize]: 1.80007e-07 [detach_backward]: 2.01e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.58e-05 [get_jit_bprop_graph]: 1.19998e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00047697 [validate]: 3.7e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.0945191 [execute]: 9.42999e-06 Sums bootstrap : 0.000529s : 0.50% type_inference : 0.006052s : 5.75% event_method : 0.000014s : 0.01% auto_monad : 0.000060s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000062s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000623s : 0.59% optimize.opt_a.with_stream_mark : 0.000026s : 0.02% 
optimize.opt_a.recompute_prepare : 0.000015s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000013s : 0.01% optimize.opt_a.parallel : 0.000030s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000484s : 0.46% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000044s : 0.04% optimize.opt_a.a_3 : 0.000077s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000505s : 0.48% optimize.opt_b.b_1 : 0.000112s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.02% optimize.loop_unroll : 0.000426s : 0.40% optimize.opt_after_cconv.c_1 : 0.000025s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000008s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000477s : 0.45% validate : 0.000037s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.094519s : 89.82% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000174 26 19.69% : 0.000034s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.43% : 0.000006s : 3: substitution.graph_param_transform 63.55% : 0.000111s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 1.58% : 0.000003s : 2: substitution.replace_old_param 5.18% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006001 2 90.10% : 0.005407s : 1: type_inference.infer 9.90% : 0.000594s : 1: type_inference.specialize ------[replace.] 0.000038 4 78.34% : 0.000030s : 3: replace.inline 21.66% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 4 92.99% : 0.000109s : 3: match.inline 7.01% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000215 883 0.66% : 0.000001s : 9: predicate.accumulaten_eliminater 0.81% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.42% : 0.000001s : 6: predicate.addn_check_dump 0.67% : 0.000001s : 9: predicate.addn_zero_filter 0.62% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 1.64% : 0.000004s : 15: predicate.arithmetic_simplify 0.68% : 0.000001s : 9: predicate.cast_eliminate 0.47% : 0.000001s : 6: predicate.check_bprop_eliminate 0.45% : 0.000001s : 6: predicate.compare_switch_simplify 0.15% : 0.000000s : 3: predicate.const_output_eliminate 0.46% : 0.000001s : 6: predicate.depend_value_elim 0.65% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.73% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.65% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.78% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.17% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 0.86% : 0.000002s : 12: predicate.environ_add_const_eliminate 0.81% : 0.000002s : 12: predicate.environ_get_add_eliminate 0.79% : 0.000002s : 12: predicate.environ_get_depend_swap 1.30% : 0.000003s : 18: predicate.environ_get_eliminate 0.81% : 0.000002s : 12: predicate.environ_get_set_eliminate 0.95% : 0.000002s : 13: predicate.exchange_switch_depend_value 1.72% : 0.000004s : 13: predicate.float_depend_g_call 0.42% : 0.000001s : 6: predicate.float_environ_get_switch 0.62% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.15% : 0.000000s : 3: predicate.fold_const_symbol 0.49% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000001s : 3: predicate.graph_param_transform 0.50% : 0.000001s : 6: predicate.incorporate_call 0.43% : 0.000001s : 6: predicate.incorporate_call_switch 4.71% : 0.000010s : 40: predicate.inline 0.70% : 0.000002s : 6: predicate.inline_without_move 0.28% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.82% : 0.000002s : 6: predicate.less_batch_normalization 1.19% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 1.79% : 0.000004s : 25: predicate.load_eliminater 0.73% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.65% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.23% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.43% : 0.000001s : 6: predicate.merge_addn 0.45% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.46% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.61% : 0.000001s : 9: predicate.minmaximum_grad 0.83% : 0.000002s : 3: predicate.mutable_eliminate 0.28% : 0.000001s : 3: predicate.opt_reshape 0.28% : 0.000001s : 3: predicate.parallel_virtual_node 1.20% : 0.000003s : 13: predicate.partial_defer_inline 1.05% : 0.000002s : 13: predicate.partial_eliminate 0.64% : 0.000001s : 9: predicate.print_const_string_wrapper 0.46% : 0.000001s : 6: predicate.reduce_all_const_elim 0.83% : 0.000002s : 9: predicate.reduce_eliminate 27.80% : 0.000060s : 25: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000001s : 6: predicate.remove_not_recompute_node 0.94% : 0.000002s : 16: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.20% : 0.000000s : 3: predicate.reset_defer_inline 0.67% : 0.000001s : 9: predicate.reshape_eliminate 0.42% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.33% : 0.000001s : 3: predicate.row_tensor_eliminate 0.59% : 0.000001s : 6: predicate.same_eliminate 0.34% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.55% : 0.000001s : 6: predicate.shard_identity_eliminate 0.50% : 0.000001s : 6: predicate.special_op_eliminate 0.62% : 0.000001s : 6: predicate.specialize_transform 0.69% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.58% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.27% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.00% : 0.000002s : 13: predicate.switch_defer_inline 1.47% : 0.000003s : 19: predicate.switch_layer_defer_inline 3.71% : 0.000008s : 43: predicate.switch_simplify 0.66% : 
0.000001s : 9: predicate.tile_eliminate 0.68% : 0.000001s : 9: predicate.transpose_eliminate 1.16% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.21% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.03% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 2.47% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.09% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 1.66% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.22% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 1.72% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.28% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 3: predicate.value_based_eliminate 0.52% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.52% : 0.000001s : 6: predicate.virtual_output_eliminate 0.24% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.35% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000356 8 43.47% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.53% : 0.000201s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119397 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.04% : 0.003624s : 1: add_attr 3.02% : 0.003610s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000065s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.47% : 0.000560s : 1: bootstrap 0.02% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.01% : 0.000017s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.36% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.43% : 0.000515s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.84% : 0.001000s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.94% : 0.002313s : 1: opt_a 0.08% : 0.000100s : 1: opt_after_cconv 0.41% : 0.000487s : 1: opt_after_jit_grad 0.16% : 0.000196s : 1: opt_b 3.58% : 0.004279s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.20% : 0.000242s : 1: renormalize.infer 0.20% : 0.000234s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000039s : 1: rewriter_after_opt_a 0.06% : 0.000066s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000076s : 1: symbol_engine_optimizer 79.18% : 0.094543s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 5.08% : 0.006068s : 1: type_inference 0.05% : 0.000060s : 1: validate TotalTime = 0.10582, [24] [bootstrap]: 0.00055291 [type_inference]: 0.00629938 [event_method]: 1.407e-05 [auto_monad]: 6.118e-05 [graph_reusing]: 5.57999e-06 [inline]: 1.95001e-06 [add_attr]: 0.00322369, [1] [add_attr_with_inline]: 0.00321346, [1] [Cycle 1]: 5.553e-05, [2] [tag_attr]: 1.451e-05 [meta_addattr_fg_expand]: 3.96001e-06 [parallel-infer-symbol]: 3.67002e-06 [pre_auto_parallel]: 2.69e-05 [insert-virtual-dataset]: 2.94999e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.41998e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00414178, [53] [py_interpret_to_execute]: 2.337e-05 [rewriter_before_opt_a]: 5.693e-05 [opt_a]: 0.0021708, [2] [Cycle 1]: 0.00155404, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.024e-05 [loop_unroll]: 1.805e-05 [a_1]: 0.00040933 [with_stream_mark]: 1.74e-05 [recompute_prepare]: 8.92e-06 [updatestate_depend_eliminate]: 4.07e-06 [updatestate_assign_eliminate]: 3.44001e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 1.97999e-06 [a_2]: 8.836e-05 [accelerated_algorithm]: 7.06001e-06 [shard]: 2.37001e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 6.29999e-06 [merge_send_recv]: 8.90999e-06 [auto_parallel]: 6.53003e-06 [parallel]: 1.915e-05 [flash_sp]: 7.65e-06 [merge_comm]: 4.18001e-06 [allreduce_fusion]: 3.95e-06 [matmul_add_comm_reduction]: 1.017e-05 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 8.18999e-06 [virtual_dataset]: 6.05002e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.79002e-06 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 9.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.337e-05 [merge_recompute_call_nodes]: 1.64998e-06 [before_grad]: 1.148e-05 [set_forward_comm_id_for_comm_node_pass]: 3.91999e-06 [meta_fg_expand]: 2.96999e-06 [flash_sp_send_recv_attached]: 3.11001e-06 [receive_attached]: 
2.16998e-06 [after_resolve]: 1.014e-05 [a_after_grad]: 9.16002e-06 [renormalize]: 0.00045237 [add_forward_monad_depend]: 6.16998e-06 [auto_monad_grad]: 2.34001e-06 [auto_monad_eliminator]: 1.373e-05 [cse]: 2.992e-05 [a_3]: 4.216e-05 [Cycle 2]: 0.00060648, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.08e-06 [loop_unroll]: 6.04001e-06 [a_1]: 0.00011404 [with_stream_mark]: 1.28e-05 [recompute_prepare]: 6.02999e-06 [updatestate_depend_eliminate]: 3.02002e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.70002e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.222e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.59e-06 [parallel]: 4.27998e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 3.41999e-06 [allreduce_fusion]: 3.08998e-06 [matmul_add_comm_reduction]: 5.14e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 6.94999e-06 [virtual_dataset]: 5.63002e-06 [get_grad_eliminate_]: 5.22999e-06 [virtual_output]: 5.07999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 [merge_recompute_call_nodes]: 9.10019e-07 [before_grad]: 8.50999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 1.88002e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 9.30013e-07 [after_resolve]: 8.27e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.50999e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.33e-06 [cse]: 1.328e-05 [a_3]: 3.306e-05 [py_interpret_to_execute_after_opt_a]: 8.3e-06 [slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 3.679e-05 [convert_after_rewriter]: 6.28e-06 [order_py_execute_after_rewriter]: 4.78001e-06 [mutable_eliminate]: 0.00052017 [opt_b]: 0.00019006, [1] [Cycle 1]: 
0.00018346, [7] [b_1]: 0.00011171 [b_2]: 6.69001e-06 [updatestate_depend_eliminate]: 5.39998e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 2.69996e-07 [cse]: 1.894e-05 [optimize_parallel_all_gather_comm]: 1.966e-05 [overlap_param_gather]: 2.24001e-06 [cconv]: 2.431e-05 [loop_unroll]: 0.00043101 [opt_after_cconv]: 9.596e-05, [1] [Cycle 1]: 9.019e-05, [7] [c_1]: 2.5e-05 [parameter_eliminate]: 2.74999e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.51998e-06 [cse]: 1.766e-05 [renormalize]: 7.49977e-07 [remove_dup_value]: 1.624e-05 [tuple_transform]: 6.876e-05, [1] [Cycle 1]: 6.43e-05, [4] [d_1]: 3.759e-05 [none_parameter_eliminate]: 1.71998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.663e-05 [cse_after_recomputation]: 2.063e-05, [1] [Cycle 1]: 1.605e-05, [1] [cse]: 1.077e-05 [environ_conv]: 5.55001e-06 [swap_dp_allreduce_reducescatter]: 5.28002e-06 [bias_add_comm_swap]: 2.34999e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 7.99977e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.57001e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.213e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.91001e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.61999e-06 [overlap_grad_ring_attention]: 4.46002e-06 [overlap_grad_flash_sp]: 1.869e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.30002e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 7.189e-05, [1] [Cycle 1]: 6.703e-05, [6] [build]: 2.68e-06 [elim_shapecalc]: 9.09998e-06 [elim_not_effective]: 1.195e-05 [opt_reshape]: 6.36998e-06 [fold_const_symbol]: 9.19998e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.74e-06 [auto_monad_reorder]: 1.736e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 4.13999e-06 [opt_after_jit_grad]: 0.00047541 [validate]: 3.805e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.0907027 [execute]: 9.57999e-06 Sums bootstrap : 0.000553s : 0.54% type_inference : 0.006299s : 6.20% event_method : 0.000014s : 0.01% auto_monad : 0.000061s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000027s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.02% optimize.rewriter_before_opt_a : 0.000057s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000037s : 0.04% optimize.opt_a.loop_unroll : 0.000024s : 0.02% optimize.opt_a.a_1 : 0.000523s : 0.52% optimize.opt_a.with_stream_mark : 0.000030s : 0.03% optimize.opt_a.recompute_prepare : 0.000015s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000161s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000452s : 0.45% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000043s : 0.04% optimize.opt_a.a_3 : 0.000075s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000520s : 0.51% optimize.opt_b.b_1 : 0.000112s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000431s : 0.42% optimize.opt_after_cconv.c_1 : 0.000025s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000475s : 0.47% validate : 0.000038s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.090703s : 89.31% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000189 24 15.95% : 0.000030s : 4: substitution.arithmetic_simplify 0.98% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 2.79% : 0.000005s : 3: substitution.graph_param_transform 73.10% : 0.000138s : 3: substitution.inline 1.84% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 1.84% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006250 2 92.34% : 0.005771s : 1: type_inference.infer 7.66% : 0.000479s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000136 3 100.00% : 0.000136s : 3: match.inline ------[predicate.] 
0.000151 815 0.90% : 0.000001s : 8: predicate.accumulaten_eliminater 1.03% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 8: predicate.addn_zero_filter 0.83% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.36% : 0.000004s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.67% : 0.000001s : 6: predicate.compare_switch_simplify 0.27% : 0.000000s : 3: predicate.const_output_eliminate 0.70% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.37% : 0.000001s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.79% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.22% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.81% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.75% : 0.000001s : 6: predicate.incorporate_call 0.67% : 0.000001s : 6: predicate.incorporate_call_switch 6.36% : 0.000010s : 37: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 6: predicate.less_batch_normalization 1.60% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 22: predicate.load_eliminater 1.00% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 6: predicate.merge_addn 0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.28% : 0.000002s : 3: predicate.mutable_eliminate 0.48% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.46% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 11: predicate.partial_eliminate 0.96% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.28% : 0.000002s : 8: predicate.reduce_eliminate 2.28% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.37% : 0.000002s : 6: predicate.shard_identity_eliminate 0.76% : 0.000001s : 6: predicate.special_op_eliminate 0.91% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 1.87% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.76% : 0.000007s : 38: predicate.switch_simplify 0.98% : 
0.000001s : 8: predicate.tile_eliminate 0.79% : 0.000001s : 8: predicate.transpose_eliminate 1.50% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.05% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000320 7 41.17% : 0.000132s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.83% : 0.000188s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.114685 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.82% : 0.003229s : 1: add_attr 2.81% : 0.003217s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000066s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.52% : 0.000592s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.38% : 0.000440s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000530s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.79% : 0.000904s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.90% : 0.002174s : 1: opt_a 0.09% : 0.000099s : 1: opt_after_cconv 0.42% : 0.000486s : 1: opt_after_jit_grad 0.17% : 0.000193s : 1: opt_b 3.62% : 0.004146s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000031s : 1: pre_auto_parallel 0.02% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 0.21% : 0.000241s : 1: renormalize.infer 0.18% : 0.000204s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000041s : 1: rewriter_after_opt_a 0.05% : 0.000061s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000075s : 1: symbol_engine_optimizer 79.11% : 0.090726s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 5.51% : 0.006317s : 1: type_inference 0.05% : 0.000063s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x5-ge]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x5-ge],max_mem:10.0M
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x6-pynative]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x6-pynative],max_mem:10.0M
TotalTime = 0.0217327, [24] [bootstrap]: 0.00049962 [type_inference]: 0.00625094 [event_method]: 1.42e-05 [auto_monad]: 4.824e-05 [graph_reusing]: 4.77e-06 [inline]: 1.88002e-06 [add_attr]: 0.00350418, [1] [add_attr_with_inline]: 0.0034936, [1] [Cycle 1]: 4.189e-05, [2] [tag_attr]: 1.316e-05 [meta_addattr_fg_expand]: 3.43e-06 [parallel-infer-symbol]: 2.58e-06 [pre_auto_parallel]: 2.303e-05 [insert-virtual-dataset]: 1.84e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 1.48002e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00420404, [53] [py_interpret_to_execute]: 2.082e-05 [rewriter_before_opt_a]: 5.943e-05 [opt_a]: 0.00220628, [2] [Cycle 1]: 0.00156968, [45] [expand_dump_flag]: 2.15002e-06 [switch_simplify]: 3.044e-05 [loop_unroll]: 2.084e-05 [a_1]: 0.00042264 [with_stream_mark]: 1.215e-05 [recompute_prepare]: 8.15e-06 [updatestate_depend_eliminate]: 4.61002e-06 [updatestate_assign_eliminate]: 2.78e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 7.917e-05 [accelerated_algorithm]: 6.80998e-06 [shard]: 1.79e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 6.00002e-06 [merge_send_recv]: 6.02999e-06 [auto_parallel]: 5.86998e-06 [parallel]: 2.134e-05 [flash_sp]: 6.14999e-06 [merge_comm]: 4.08001e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 6.39999e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 6.07001e-06 [get_grad_eliminate_]: 5.75001e-06 [virtual_output]: 
5.86e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 7.82998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.147e-05 [merge_recompute_call_nodes]: 1.10999e-06 [before_grad]: 9.85002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.93999e-06 [meta_fg_expand]: 2.31998e-06 [flash_sp_send_recv_attached]: 1.51002e-06 [receive_attached]: 1.60999e-06 [after_resolve]: 9.37001e-06 [a_after_grad]: 8.55001e-06 [renormalize]: 0.00049006 [add_forward_monad_depend]: 8.79e-06 [auto_monad_grad]: 2.31e-06 [auto_monad_eliminator]: 1.407e-05 [cse]: 2.956e-05 [a_3]: 4.271e-05 [Cycle 2]: 0.00062601, [45] [expand_dump_flag]: 1.10001e-06 [switch_simplify]: 7.11001e-06 [loop_unroll]: 5.95002e-06 [a_1]: 0.00011553 [with_stream_mark]: 1.021e-05 [recompute_prepare]: 6.23e-06 [updatestate_depend_eliminate]: 3.03e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.301e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 4.81997e-06 [auto_parallel]: 6.26e-06 [parallel]: 4.17e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 3.04999e-06 [matmul_add_comm_reduction]: 5.54998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.69999e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.30999e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 2.83e-06 [cell_reuse_recompute_pass]: 1.74e-06 [offload_activation]: 7.03998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.28e-05 [merge_recompute_call_nodes]: 1.10999e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 9.36e-06 [a_after_grad]: 7.93001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.81003e-06 [auto_monad_grad]: 
1.01002e-06 [auto_monad_eliminator]: 7.08e-06 [cse]: 1.428e-05 [a_3]: 3.569e-05 [py_interpret_to_execute_after_opt_a]: 9.55001e-06 [slice_cell_reuse_recomputed_activation]: 2.56e-06 [rewriter_after_opt_a]: 3.452e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.71e-06 [mutable_eliminate]: 0.00052406 [opt_b]: 0.00020476, [1] [Cycle 1]: 0.00019771, [7] [b_1]: 0.00012093 [b_2]: 7.75e-06 [updatestate_depend_eliminate]: 5.74e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.78e-06 [renormalize]: 6.79982e-07 [cse]: 1.872e-05 [optimize_parallel_all_gather_comm]: 1.64e-05 [overlap_param_gather]: 1.83997e-06 [cconv]: 2.393e-05 [loop_unroll]: 0.00043203 [opt_after_cconv]: 9.882e-05, [1] [Cycle 1]: 9.29e-05, [7] [c_1]: 2.646e-05 [parameter_eliminate]: 2.57001e-06 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 1.815e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.438e-05 [tuple_transform]: 6.816e-05, [1] [Cycle 1]: 6.354e-05, [4] [d_1]: 3.698e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 6.54999e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 5.203e-05 [cse_after_recomputation]: 2.158e-05, [1] [Cycle 1]: 1.69e-05, [1] [cse]: 1.138e-05 [environ_conv]: 8.03001e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.91e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.97002e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.18998e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.35999e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.95998e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.17e-06 [interleave_split_concat_branches]: 1.13001e-06 
[interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.29003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.241e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.07e-06 [overlap_recompute_and_grad_model_parallel]: 5.05001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.806e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.44001e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.13001e-06 [symbol_engine_optimizer]: 7.122e-05, [1] [Cycle 1]: 6.651e-05, [6] [build]: 2.44999e-06 [elim_shapecalc]: 9.19e-06 [elim_not_effective]: 1.183e-05 [opt_reshape]: 6.26e-06 [fold_const_symbol]: 9.19998e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.521e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.90998e-06 [opt_after_jit_grad]: 0.00046804 [validate]: 3.532e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00645322 [execute]: 7.38999e-06 Sums bootstrap : 0.000500s : 2.90% type_inference : 0.006251s : 36.29% event_method : 0.000014s : 0.08% auto_monad : 0.000048s : 0.28% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000023s : 0.13% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000001s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.35% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 
0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000538s : 3.12% optimize.opt_a.with_stream_mark : 0.000022s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000152s : 0.88% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000011s : 0.06% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000026s : 0.15% optimize.opt_a.flash_sp : 0.000009s : 0.05% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000012s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000002s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 
0.000016s : 0.10% optimize.opt_a.renormalize : 0.000490s : 2.85% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000078s : 0.46% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000524s : 3.04% optimize.opt_b.b_1 : 0.000121s : 0.70% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000432s : 2.51% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000037s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% 
optimize.add_recomputation : 0.000052s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000008s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000468s : 2.72% validate : 0.000035s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006453s : 37.47% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000153 26 18.94% : 0.000029s : 5: substitution.arithmetic_simplify 1.29% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000001s : 2: substitution.fold_const_symbol 3.45% : 0.000005s : 3: substitution.graph_param_transform 62.52% : 0.000096s : 3: substitution.inline 1.98% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.47% : 0.000005s : 4: substitution.remove_not_recompute_node 2.10% : 0.000003s : 2: substitution.replace_old_param 5.37% : 0.000008s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006207 2 90.40% : 0.005611s : 1: type_inference.infer 9.60% : 0.000596s : 1: type_inference.specialize ------[replace.] 0.000035 4 76.75% : 0.000027s : 3: replace.inline 23.25% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000101 4 92.74% : 0.000094s : 3: match.inline 7.26% : 0.000007s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 0.93% : 0.000001s : 9: predicate.accumulaten_eliminater 0.93% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.93% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 15: predicate.arithmetic_simplify 1.07% : 0.000002s : 9: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.90% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.73% : 0.000003s : 18: predicate.environ_get_eliminate 1.16% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.36% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.09% : 0.000003s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.88% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 3: predicate.fold_const_symbol 0.69% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.64% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.19% : 0.000010s : 40: predicate.inline 0.93% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 6: predicate.less_batch_normalization 1.60% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.50% : 0.000004s : 25: predicate.load_eliminater 1.00% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.71% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.10% : 0.000002s : 3: predicate.mutable_eliminate 0.33% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.63% : 0.000003s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.90% : 0.000001s : 9: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.45% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 6: predicate.remove_not_recompute_node 1.39% : 0.000002s : 16: predicate.replace_applicator 0.49% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.89% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.80% : 0.000001s : 6: predicate.specialize_transform 0.92% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.97% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.86% : 0.000008s : 43: predicate.switch_simplify 1.07% : 
0.000002s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000430 8 40.61% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 59.39% : 0.000256s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031005 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.32% : 0.003508s : 1: add_attr 11.28% : 0.003497s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.17% : 0.000054s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.68% : 0.000521s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000011s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000441s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000534s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.95% : 0.000916s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000098s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.13% : 0.002209s : 1: opt_a 0.33% : 0.000102s : 1: opt_after_cconv 1.54% : 0.000478s : 1: opt_after_jit_grad 0.67% : 0.000208s : 1: opt_b 13.57% : 0.004208s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.70% : 0.000217s : 1: renormalize.infer 0.86% : 0.000267s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000039s : 1: rewriter_after_opt_a 0.20% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000074s : 1: symbol_engine_optimizer 20.85% : 0.006463s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.21% : 0.006266s : 1: type_inference 0.21% : 0.000064s : 1: validate TotalTime = 0.0215441, [24] [bootstrap]: 0.00050912 [type_inference]: 0.00632636 [event_method]: 1.372e-05 [auto_monad]: 6.159e-05 [graph_reusing]: 5.69999e-06 [inline]: 2.11003e-06 [add_attr]: 0.00320828, [1] [add_attr_with_inline]: 0.00319915, [1] [Cycle 1]: 6.003e-05, [2] [tag_attr]: 1.644e-05 [meta_addattr_fg_expand]: 3.61001e-06 [parallel-infer-symbol]: 3.36001e-06 [pre_auto_parallel]: 2.757e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.98997e-06 [optimize]: 0.00423778, [53] [py_interpret_to_execute]: 2.352e-05 [rewriter_before_opt_a]: 5.441e-05 [opt_a]: 0.00217729, [2] [Cycle 1]: 0.00154958, [45] [expand_dump_flag]: 2.94999e-06 [switch_simplify]: 3.073e-05 [loop_unroll]: 1.724e-05 [a_1]: 0.00036352 [with_stream_mark]: 1.723e-05 [recompute_prepare]: 8.77999e-06 [updatestate_depend_eliminate]: 4.53999e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 1.97999e-06 [a_2]: 8.185e-05 [accelerated_algorithm]: 6.76e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 6.28e-06 [merge_send_recv]: 8.42e-06 [auto_parallel]: 6.19001e-06 [parallel]: 1.827e-05 [flash_sp]: 8.62998e-06 [merge_comm]: 4.15e-06 [allreduce_fusion]: 3.80998e-06 [matmul_add_comm_reduction]: 9.59e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.24001e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.81998e-06 [virtual_output]: 5.77999e-06 [merge_forward]: 3.83999e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 1.021e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.186e-05 [merge_recompute_call_nodes]: 1.89e-06 [before_grad]: 9.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.60001e-06 [meta_fg_expand]: 3.16999e-06 [flash_sp_send_recv_attached]: 2.81999e-06 [receive_attached]: 
2.43e-06 [after_resolve]: 1.021e-05 [a_after_grad]: 9.02999e-06 [renormalize]: 0.0004956 [add_forward_monad_depend]: 5.61e-06 [auto_monad_grad]: 2.20002e-06 [auto_monad_eliminator]: 1.491e-05 [cse]: 3.1e-05 [a_3]: 4.8e-05 [Cycle 2]: 0.00061748, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 7.58001e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00011702 [with_stream_mark]: 1.082e-05 [recompute_prepare]: 6.31e-06 [updatestate_depend_eliminate]: 3.14001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.78998e-06 [parameter_eliminate]: 9.69972e-07 [a_2]: 7.318e-05 [accelerated_algorithm]: 5.80002e-06 [shard]: 1.37e-06 [meta_shard_fg_expand]: 1.29998e-06 [shard_inline]: 5.98002e-06 [merge_send_recv]: 4.66002e-06 [auto_parallel]: 5.84999e-06 [parallel]: 4.23999e-06 [flash_sp]: 3.58e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.52001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.24001e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 3.08e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 7.01001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.124e-05 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 8.80001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.83999e-06 [meta_fg_expand]: 1.95001e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 8.80013e-07 [after_resolve]: 8.80001e-06 [a_after_grad]: 7.82998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.64e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.83e-06 [cse]: 1.399e-05 [a_3]: 3.385e-05 [py_interpret_to_execute_after_opt_a]: 9.10999e-06 [slice_cell_reuse_recomputed_activation]: 2.34001e-06 [rewriter_after_opt_a]: 3.584e-05 [convert_after_rewriter]: 6.76999e-06 [order_py_execute_after_rewriter]: 5.64e-06 [mutable_eliminate]: 0.00054167 [opt_b]: 0.00019329, [1] [Cycle 1]: 0.0001856, 
[7] [b_1]: 0.00011224 [b_2]: 6.78e-06 [updatestate_depend_eliminate]: 5.74999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.66e-06 [renormalize]: 4.00003e-07 [cse]: 1.964e-05 [optimize_parallel_all_gather_comm]: 1.656e-05 [overlap_param_gather]: 2.49001e-06 [cconv]: 2.588e-05 [loop_unroll]: 0.0004924 [opt_after_cconv]: 0.00010082, [1] [Cycle 1]: 9.43e-05, [7] [c_1]: 2.673e-05 [parameter_eliminate]: 3.73001e-06 [updatestate_depend_eliminate]: 5.74e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.809e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.546e-05 [tuple_transform]: 6.975e-05, [1] [Cycle 1]: 6.509e-05, [4] [d_1]: 3.775e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.68e-06 [partial_unused_args_eliminate]: 1.76998e-06 [add_recomputation]: 4.602e-05 [cse_after_recomputation]: 2.206e-05, [1] [Cycle 1]: 1.706e-05, [1] [cse]: 1.134e-05 [environ_conv]: 5.51998e-06 [swap_dp_allreduce_reducescatter]: 5.19998e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.57998e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.36998e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.36998e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.39e-06 [overlap_opt_shard_in_pipeline]: 1.58002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86998e-06 [control_data_broadcast_order]: 1.236e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 4.03999e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.41998e-06 [overlap_recompute_comm]: 2.43998e-06 [overlap_grad_ring_attention]: 4.40999e-06 [overlap_grad_flash_sp]: 1.935e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 1.97999e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 7.316e-05, [1] [Cycle 1]: 6.858e-05, [6] [build]: 3.66999e-06 [elim_shapecalc]: 8.78001e-06 [elim_not_effective]: 1.194e-05 [opt_reshape]: 6.84999e-06 [fold_const_symbol]: 9.26998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.99e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.646e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.86001e-06 [opt_after_jit_grad]: 0.00048108 [validate]: 3.838e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00635452 [execute]: 8.83001e-06 Sums bootstrap : 0.000509s : 2.94% type_inference : 0.006326s : 36.59% event_method : 0.000014s : 0.08% auto_monad : 0.000062s : 0.36% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.14% optimize.rewriter_before_opt_a : 0.000054s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.13% optimize.opt_a.a_1 : 0.000481s : 2.78% optimize.opt_a.with_stream_mark : 0.000028s : 0.16% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.13% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000496s : 2.87% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.13% optimize.opt_a.cse : 0.000045s : 0.26% optimize.opt_a.a_3 : 0.000082s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000542s : 3.13% optimize.opt_b.b_1 : 0.000112s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.15% optimize.loop_unroll : 0.000492s : 2.85% optimize.opt_after_cconv.c_1 : 0.000027s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000481s : 2.78% validate : 0.000038s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006355s : 36.75% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000153 24 20.32% : 0.000031s : 4: substitution.arithmetic_simplify 1.29% : 0.000002s : 2: substitution.elim_not_effective 0.88% : 0.000001s : 2: substitution.fold_const_symbol 3.44% : 0.000005s : 3: substitution.graph_param_transform 66.52% : 0.000102s : 3: substitution.inline 1.96% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.96% : 0.000005s : 4: substitution.remove_not_recompute_node 2.64% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006277 2 92.48% : 0.005805s : 1: type_inference.infer 7.52% : 0.000472s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000100 3 100.00% : 0.000100s : 3: match.inline ------[predicate.] 
0.000149 815 0.85% : 0.000001s : 8: predicate.accumulaten_eliminater 1.06% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.38% : 0.000004s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.72% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.84% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.47% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 1.82% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.11% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.77% : 0.000001s : 6: predicate.get_grad_eliminate 0.37% : 0.000001s : 3: predicate.graph_param_transform 0.75% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.67% : 0.000010s : 37: predicate.inline 0.96% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 6: predicate.less_batch_normalization 1.53% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.31% : 0.000003s : 22: predicate.load_eliminater 1.08% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.96% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 6: predicate.merge_addn 0.77% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.29% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.59% : 0.000002s : 11: predicate.partial_defer_inline 1.33% : 0.000002s : 11: predicate.partial_eliminate 0.82% : 0.000001s : 8: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 8: predicate.reduce_eliminate 2.17% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.87% : 0.000001s : 6: predicate.remove_not_recompute_node 1.36% : 0.000002s : 14: predicate.replace_applicator 0.73% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.85% : 0.000001s : 8: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.99% : 0.000001s : 6: predicate.same_eliminate 0.55% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 6: predicate.shard_identity_eliminate 0.84% : 0.000001s : 6: predicate.special_op_eliminate 0.89% : 0.000001s : 6: predicate.specialize_transform 1.01% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.22% : 0.000002s : 11: predicate.switch_defer_inline 1.91% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.82% : 0.000007s : 38: predicate.switch_simplify 0.85% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.51% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.99% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.19% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.91% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000307 7 38.51% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.49% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030495 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.54% : 0.003213s : 1: add_attr 10.50% : 0.003203s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000067s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.80% : 0.000550s : 1: bootstrap 0.10% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.64% : 0.000502s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.81% : 0.000551s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.81% : 0.000856s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.15% : 0.002181s : 1: opt_a 0.34% : 0.000104s : 1: opt_after_cconv 1.61% : 0.000491s : 1: opt_after_jit_grad 0.64% : 0.000197s : 1: opt_b 13.91% : 0.004242s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.86% : 0.000263s : 1: renormalize.infer 0.74% : 0.000224s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000040s : 1: rewriter_after_opt_a 0.19% : 0.000059s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000076s : 1: symbol_engine_optimizer 20.91% : 0.006377s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.81% : 0.006347s : 1: type_inference 0.23% : 0.000069s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x6-kbk],max_mem:10.0M TotalTime = 0.751035, [24] [bootstrap]: 0.00058848 [type_inference]: 0.00649357 [event_method]: 1.364e-05 [auto_monad]: 5.89e-05 [graph_reusing]: 5.66e-06 [inline]: 1.99e-06 [add_attr]: 0.00361168, [1] [add_attr_with_inline]: 0.00360058, [1] [Cycle 1]: 5.136e-05, [2] [tag_attr]: 1.577e-05 [meta_addattr_fg_expand]: 4.17998e-06 [parallel-infer-symbol]: 3.81001e-06 [pre_auto_parallel]: 2.802e-05 [insert-virtual-dataset]: 2.48998e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.82001e-06 [optimize]: 0.00450525, [53] [py_interpret_to_execute]: 2.334e-05 [rewriter_before_opt_a]: 6.332e-05 [opt_a]: 0.00237756, [2] [Cycle 1]: 0.001729, [45] [expand_dump_flag]: 3.14001e-06 [switch_simplify]: 3.411e-05 [loop_unroll]: 2.064e-05 [a_1]: 0.0004554 [with_stream_mark]: 1.669e-05 [recompute_prepare]: 8.32e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.43999e-06 [updatestate_loads_eliminate]: 3.75e-06 [parameter_eliminate]: 2.26998e-06 [a_2]: 8.285e-05 [accelerated_algorithm]: 7.25e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 6.78998e-06 [merge_send_recv]: 8.47e-06 [auto_parallel]: 6.56999e-06 [parallel]: 2.804e-05 [flash_sp]: 8.80001e-06 [merge_comm]: 4.45e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 9.96998e-06 [allreduce_slice_to_reducescatter]: 1.39998e-06 [virtual_shard_identity]: 9.27999e-06 [virtual_dataset]: 6.59999e-06 [get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 6.34999e-06 [merge_forward]: 5.07e-06 [cell_reuse_recompute_pass]: 1.75001e-06 [offload_activation]: 9.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.447e-05 [merge_recompute_call_nodes]: 1.71e-06 
[before_grad]: 1.051e-05 [set_forward_comm_id_for_comm_node_pass]: 3.84002e-06 [meta_fg_expand]: 2.64001e-06 [flash_sp_send_recv_attached]: 2.75002e-06 [receive_attached]: 2.16e-06 [after_resolve]: 9.82001e-06 [a_after_grad]: 9.54e-06 [renormalize]: 0.00055063 [add_forward_monad_depend]: 1.031e-05 [auto_monad_grad]: 2.51e-06 [auto_monad_eliminator]: 1.504e-05 [cse]: 3.05e-05 [a_3]: 4.428e-05 [Cycle 2]: 0.00063851, [45] [expand_dump_flag]: 1.23002e-06 [switch_simplify]: 7.25e-06 [loop_unroll]: 5.73002e-06 [a_1]: 0.000118 [with_stream_mark]: 1.171e-05 [recompute_prepare]: 6.11e-06 [updatestate_depend_eliminate]: 3.44001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 7.346e-05 [accelerated_algorithm]: 5.83002e-06 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 5.55001e-06 [auto_parallel]: 6.36998e-06 [parallel]: 5.24e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 3.28e-06 [allreduce_fusion]: 3.11999e-06 [matmul_add_comm_reduction]: 6.06998e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 6.68e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.14e-06 [merge_forward]: 3.38e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 7.3e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.145e-05 [merge_recompute_call_nodes]: 9.10019e-07 [before_grad]: 9.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 2.40002e-06 [flash_sp_send_recv_attached]: 9.60019e-07 [receive_attached]: 1.27e-06 [after_resolve]: 8.93002e-06 [a_after_grad]: 7.85e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 2.25002e-06 [auto_monad_grad]: 1.58002e-06 [auto_monad_eliminator]: 9.32999e-06 [cse]: 1.655e-05 [a_3]: 3.481e-05 [py_interpret_to_execute_after_opt_a]: 9.91998e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.696e-05 
[convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.67001e-06 [mutable_eliminate]: 0.00054188 [opt_b]: 0.0002003, [1] [Cycle 1]: 0.00019338, [7] [b_1]: 0.00011409 [b_2]: 7.02997e-06 [updatestate_depend_eliminate]: 7.21001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.30002e-06 [renormalize]: 4.80009e-07 [cse]: 2.161e-05 [optimize_parallel_all_gather_comm]: 1.808e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.396e-05 [loop_unroll]: 0.00051714 [opt_after_cconv]: 0.00010145, [1] [Cycle 1]: 9.479e-05, [7] [c_1]: 2.623e-05 [parameter_eliminate]: 3.18998e-06 [updatestate_depend_eliminate]: 6.41998e-06 [updatestate_assign_eliminate]: 2.63003e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.97e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.461e-05 [tuple_transform]: 7.152e-05, [1] [Cycle 1]: 6.7e-05, [4] [d_1]: 3.862e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 7.11001e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 5.349e-05 [cse_after_recomputation]: 2.392e-05, [1] [Cycle 1]: 1.9e-05, [1] [cse]: 1.232e-05 [environ_conv]: 8.09002e-06 [swap_dp_allreduce_reducescatter]: 5.08002e-06 [bias_add_comm_swap]: 3.03e-06 [label_micro_interleaved_index]: 5.27001e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.78998e-06 [assign_add_opt]: 1.46998e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 1.11997e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 3.08998e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.06997e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.344e-05 [grouped_pairwise_exchange_alltoall]: 
1.60001e-06 [offloading_packed_experts]: 4.15999e-06 [overlap_recompute_and_grad_model_parallel]: 4.87e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.94999e-06 [overlap_grad_ring_attention]: 4.67e-06 [overlap_grad_flash_sp]: 1.942e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.35002e-06 [split_layernorm_comm]: 2.19001e-06 [handle_group_info]: 1.13001e-06 [symbol_engine_optimizer]: 7.342e-05, [1] [Cycle 1]: 6.875e-05, [6] [build]: 3.33998e-06 [elim_shapecalc]: 9.82999e-06 [elim_not_effective]: 1.223e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 9.43002e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.606e-05 [get_jit_bprop_graph]: 1.64e-06 [rewriter_after_jit_bprop_graph]: 4.00998e-06 [opt_after_jit_grad]: 0.00051237 [validate]: 3.839e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.7349 [execute]: 1.095e-05 Sums bootstrap : 0.000588s : 0.08% type_inference : 0.006494s : 0.87% event_method : 0.000014s : 0.00% auto_monad : 0.000059s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.00% optimize.rewriter_before_opt_a : 0.000063s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000573s : 0.08% optimize.opt_a.with_stream_mark : 0.000028s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% 
optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000156s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000033s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000551s : 0.07% optimize.opt_a.add_forward_monad_depend : 0.000013s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 
0.000024s : 0.00% optimize.opt_a.cse : 0.000047s : 0.01% optimize.opt_a.a_3 : 0.000079s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000542s : 0.07% optimize.opt_b.b_1 : 0.000114s : 0.02% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000517s : 0.07% optimize.opt_after_cconv.c_1 : 0.000026s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% 
optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% 
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000512s : 0.07% validate : 0.000038s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.734900s : 98.46% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000180 26 20.08% : 0.000036s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.32% : 0.000006s : 3: substitution.graph_param_transform 62.96% : 0.000113s : 3: substitution.inline 1.89% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.93% : 0.000005s : 4: substitution.remove_not_recompute_node 2.25% : 0.000004s : 2: substitution.replace_old_param 4.68% : 0.000008s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006444 2 90.72% : 0.005846s : 1: type_inference.infer 9.28% : 0.000598s : 1: type_inference.specialize ------[replace.] 0.000039 4 76.91% : 0.000030s : 3: replace.inline 23.09% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 4 93.62% : 0.000111s : 3: match.inline 6.38% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000168 883 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 0.96% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000002s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 15: predicate.arithmetic_simplify 0.91% : 0.000002s : 9: predicate.cast_eliminate 0.61% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.41% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.50% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_depend_swap 1.66% : 0.000003s : 18: predicate.environ_get_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.41% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.80% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.66% : 0.000001s : 6: predicate.get_grad_eliminate 0.40% : 0.000001s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.53% : 0.000001s : 6: predicate.incorporate_call_switch 6.64% : 0.000011s : 40: predicate.inline 1.41% : 0.000002s : 6: predicate.inline_without_move 0.37% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 6: predicate.less_batch_normalization 1.81% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 25: predicate.load_eliminater 1.71% : 0.000003s : 3: predicate.loop_unroll_after_grad 2.04% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.60% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 9: predicate.minmaximum_grad 1.61% : 0.000003s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.53% : 0.000003s : 13: predicate.partial_defer_inline 1.38% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.15% : 0.000002s : 9: predicate.reduce_eliminate 2.23% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.50% : 0.000003s : 16: predicate.replace_applicator 0.56% : 0.000001s : 6: predicate.replace_old_param 0.38% : 0.000001s : 3: predicate.reset_defer_inline 0.89% : 0.000002s : 9: predicate.reshape_eliminate 0.56% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.76% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.22% : 0.000002s : 6: predicate.shard_identity_eliminate 0.67% : 0.000001s : 6: predicate.special_op_eliminate 0.82% : 0.000001s : 6: predicate.specialize_transform 0.93% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.68% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 13: predicate.switch_defer_inline 1.93% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 43: predicate.switch_simplify 0.85% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.56% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.21% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.94% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.78% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.64% : 0.000001s : 6: predicate.virtual_output_eliminate 0.27% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000370 8 46.55% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.45% : 0.000198s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.760818 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.48% : 0.003616s : 1: add_attr 0.47% : 0.003604s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000064s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.08% : 0.000629s : 1: bootstrap 0.00% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000017s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000027s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000019s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.07% : 0.000528s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.07% : 0.000552s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.13% : 0.000961s : 78: opt.transform.opt_a 0.00% : 0.000025s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000092s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.31% : 0.002381s : 1: opt_a 0.01% : 0.000105s : 1: opt_after_cconv 0.07% : 0.000523s : 1: opt_after_jit_grad 0.03% : 0.000204s : 1: opt_b 0.59% : 0.004509s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000027s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000018s : 1: remove_dup_value 0.04% : 0.000302s : 1: renormalize.infer 0.03% : 0.000240s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000041s : 1: rewriter_after_opt_a 0.01% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000076s : 1: symbol_engine_optimizer 96.60% : 0.734926s : 1: task_emit 0.01% : 0.000075s : 1: 
tuple_transform 0.86% : 0.006509s : 1: type_inference 0.01% : 0.000067s : 1: validate TotalTime = 0.0597226, [24] [bootstrap]: 0.00054727 [type_inference]: 0.00639564 [event_method]: 1.451e-05 [auto_monad]: 6.28e-05 [graph_reusing]: 5.52001e-06 [inline]: 1.96e-06 [add_attr]: 0.00316453, [1] [add_attr_with_inline]: 0.0031551, [1] [Cycle 1]: 5.812e-05, [2] [tag_attr]: 1.674e-05 [meta_addattr_fg_expand]: 4.48001e-06 [parallel-infer-symbol]: 3.25e-06 [pre_auto_parallel]: 2.566e-05 [insert-virtual-dataset]: 2.76e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.00418637, [53] [py_interpret_to_execute]: 2.312e-05 [rewriter_before_opt_a]: 5.295e-05 [opt_a]: 0.00214395, [2] [Cycle 1]: 0.00151823, [45] [expand_dump_flag]: 2.77002e-06 [switch_simplify]: 3.076e-05 [loop_unroll]: 1.725e-05 [a_1]: 0.00036151 [with_stream_mark]: 1.634e-05 [recompute_prepare]: 8.02003e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.78999e-06 [updatestate_loads_eliminate]: 3.41001e-06 [parameter_eliminate]: 2.07999e-06 [a_2]: 8.337e-05 [accelerated_algorithm]: 6.86001e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 6.27001e-06 [merge_send_recv]: 9.72001e-06 [auto_parallel]: 6.20002e-06 [parallel]: 1.912e-05 [flash_sp]: 7.81001e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.96001e-06 [matmul_add_comm_reduction]: 9.71998e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.25003e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 6.14999e-06 [merge_forward]: 4.40999e-06 [cell_reuse_recompute_pass]: 1.16997e-06 [offload_activation]: 1.001e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.186e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 1.034e-05 [set_forward_comm_id_for_comm_node_pass]: 3.72002e-06 [meta_fg_expand]: 2.74001e-06 [flash_sp_send_recv_attached]: 2.81e-06 [receive_attached]: 2.46e-06 
[after_resolve]: 9.94001e-06 [a_after_grad]: 8.89e-06 [renormalize]: 0.00047647 [add_forward_monad_depend]: 5.14e-06 [auto_monad_grad]: 1.95001e-06 [auto_monad_eliminator]: 1.368e-05 [cse]: 3.18e-05 [a_3]: 4.449e-05 [Cycle 2]: 0.00061588, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.48e-06 [loop_unroll]: 5.85002e-06 [a_1]: 0.0001168 [with_stream_mark]: 1.208e-05 [recompute_prepare]: 6.39999e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 7.291e-05 [accelerated_algorithm]: 6.01998e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.97001e-06 [merge_send_recv]: 5.17e-06 [auto_parallel]: 6.11e-06 [parallel]: 4.37e-06 [flash_sp]: 3.58e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.54999e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.30999e-06 [virtual_output]: 5.17e-06 [merge_forward]: 2.96999e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 7.08e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.037e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.50999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.90001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 8.17998e-06 [a_after_grad]: 8.1e-06 [renormalize]: 1.19995e-07 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 7.05e-06 [cse]: 1.426e-05 [a_3]: 3.358e-05 [py_interpret_to_execute_after_opt_a]: 8.83001e-06 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 3.319e-05 [convert_after_rewriter]: 7.05002e-06 [order_py_execute_after_rewriter]: 4.81002e-06 [mutable_eliminate]: 0.00050793 [opt_b]: 0.00026312, [1] [Cycle 1]: 0.00025637, [7] [b_1]: 
0.00017918 [b_2]: 8.04002e-06 [updatestate_depend_eliminate]: 6.32001e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.51e-06 [renormalize]: 4.59986e-07 [cse]: 2.016e-05 [optimize_parallel_all_gather_comm]: 1.73e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 2.589e-05 [loop_unroll]: 0.00044033 [opt_after_cconv]: 0.00010146, [1] [Cycle 1]: 9.512e-05, [7] [c_1]: 2.597e-05 [parameter_eliminate]: 2.78003e-06 [updatestate_depend_eliminate]: 5.77999e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.58e-06 [cse]: 1.88e-05 [renormalize]: 6.19999e-07 [remove_dup_value]: 1.518e-05 [tuple_transform]: 6.985e-05, [1] [Cycle 1]: 6.547e-05, [4] [d_1]: 3.855e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 6.84999e-06 [partial_unused_args_eliminate]: 2.06e-06 [add_recomputation]: 4.577e-05 [cse_after_recomputation]: 2.189e-05, [1] [Cycle 1]: 1.729e-05, [1] [cse]: 1.131e-05 [environ_conv]: 6.51e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 5.08002e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.28002e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.34998e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.16997e-06 [overlap_opt_shard_in_pipeline]: 1.57999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.257e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 4.17998e-06 [overlap_recompute_and_grad_model_parallel]: 4.83001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.67001e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.927e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.13001e-06 [symbol_engine_optimizer]: 7.485e-05, [1] [Cycle 1]: 6.986e-05, [6] [build]: 2.68e-06 [elim_shapecalc]: 9.17001e-06 [elim_not_effective]: 1.27e-05 [opt_reshape]: 6.61999e-06 [fold_const_symbol]: 9.64999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.97001e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.647e-05 [get_jit_bprop_graph]: 1.34e-06 [rewriter_after_jit_bprop_graph]: 4.37e-06 [opt_after_jit_grad]: 0.00048728 [validate]: 3.773e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.0445109 [execute]: 9.81e-06 Sums bootstrap : 0.000547s : 0.99% type_inference : 0.006396s : 11.52% event_method : 0.000015s : 0.03% auto_monad : 0.000063s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000026s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.04% optimize.rewriter_before_opt_a : 0.000053s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000478s : 0.86% optimize.opt_a.with_stream_mark : 0.000028s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000156s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000015s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000477s : 0.86% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000046s : 0.08% optimize.opt_a.a_3 : 0.000078s : 0.14% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000508s : 0.91% optimize.opt_b.b_1 : 0.000179s : 0.32% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.05% optimize.loop_unroll : 0.000440s : 0.79% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000487s : 0.88% validate : 0.000038s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.044511s : 80.17% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000149 24 20.46% : 0.000031s : 4: substitution.arithmetic_simplify 1.55% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000001s : 2: substitution.fold_const_symbol 3.67% : 0.000005s : 3: substitution.graph_param_transform 66.24% : 0.000099s : 3: substitution.inline 2.04% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.17% : 0.000005s : 4: substitution.remove_not_recompute_node 1.95% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006347 2 92.28% : 0.005857s : 1: type_inference.infer 7.72% : 0.000490s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000097 3 100.00% : 0.000097s : 3: match.inline ------[predicate.] 
0.000149 815 0.85% : 0.000001s : 8: predicate.accumulaten_eliminater 1.09% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.24% : 0.000003s : 14: predicate.arithmetic_simplify 0.93% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.65% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_depend_swap 1.90% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.27% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.27% : 0.000009s : 37: predicate.inline 1.00% : 0.000001s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 6: predicate.less_batch_normalization 1.61% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.22% : 0.000003s : 22: predicate.load_eliminater 1.11% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.07% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 8: predicate.minmaximum_grad 1.15% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.47% : 0.000001s : 3: predicate.parallel_virtual_node 1.63% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 11: predicate.partial_eliminate 0.91% : 0.000001s : 8: predicate.print_const_string_wrapper 0.77% : 0.000001s : 6: predicate.reduce_all_const_elim 1.10% : 0.000002s : 8: predicate.reduce_eliminate 2.19% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 14: predicate.replace_applicator 0.82% : 0.000001s : 6: predicate.replace_old_param 0.35% : 0.000001s : 3: predicate.reset_defer_inline 0.85% : 0.000001s : 8: predicate.reshape_eliminate 0.73% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 3: predicate.row_tensor_eliminate 0.87% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 6: predicate.shard_identity_eliminate 0.81% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 1.20% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 1.89% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.95% : 0.000007s : 38: predicate.switch_simplify 0.95% : 
0.000001s : 8: predicate.tile_eliminate 0.83% : 0.000001s : 8: predicate.transpose_eliminate 1.59% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.64% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.96% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.70% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000323 7 38.64% : 0.000125s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.36% : 0.000198s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068623 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.62% : 0.003170s : 1: add_attr 4.60% : 0.003159s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000068s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.85% : 0.000586s : 1: bootstrap 0.04% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.03% : 0.000020s : 1: event_method 0.03% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.65% : 0.000449s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000518s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 1.24% : 0.000853s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.23% : 0.000158s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.13% : 0.002147s : 1: opt_a 0.15% : 0.000105s : 1: opt_after_cconv 0.73% : 0.000498s : 1: opt_after_jit_grad 0.39% : 0.000266s : 1: opt_b 6.11% : 0.004191s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.04% : 0.000027s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.38% : 0.000258s : 1: renormalize.infer 0.31% : 0.000212s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.08% : 0.000057s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000078s : 1: symbol_engine_optimizer 64.90% : 0.044537s : 1: task_emit 0.11% : 0.000073s : 1: 
tuple_transform 9.35% : 0.006414s : 1: type_inference 0.09% : 0.000063s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x6-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x7-pynative],max_mem:10.0M TotalTime = 0.0228063, [24] [bootstrap]: 0.00054103 [type_inference]: 0.0067465 [event_method]: 1.565e-05 [auto_monad]: 6.17e-05 [graph_reusing]: 5.31002e-06 [inline]: 2.44999e-06 [add_attr]: 0.00370156, [1] [add_attr_with_inline]: 0.00369006, [1] [Cycle 1]: 5.054e-05, [2] [tag_attr]: 1.592e-05 [meta_addattr_fg_expand]: 4.44998e-06 [parallel-infer-symbol]: 3.35e-06 [pre_auto_parallel]: 2.674e-05 [insert-virtual-dataset]: 2.49999e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.36e-06 [pipeline_split]: 1.92001e-06 [optimize]: 0.00429426, [53] [py_interpret_to_execute]: 2.574e-05 [rewriter_before_opt_a]: 6.692e-05 [opt_a]: 0.0022978, [2] [Cycle 1]: 0.00166183, [45] [expand_dump_flag]: 3.31001e-06 [switch_simplify]: 3.372e-05 [loop_unroll]: 2.082e-05 [a_1]: 0.00045083 [with_stream_mark]: 1.621e-05 [recompute_prepare]: 9.31e-06 [updatestate_depend_eliminate]: 3.81001e-06 [updatestate_assign_eliminate]: 3.65998e-06 [updatestate_loads_eliminate]: 3.51001e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 8.218e-05 [accelerated_algorithm]: 7.48e-06 [shard]: 2.42001e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.49001e-06 [merge_send_recv]: 9.72001e-06 [auto_parallel]: 6.78998e-06 [parallel]: 2.686e-05 [flash_sp]: 9.09e-06 [merge_comm]: 4.27e-06 [allreduce_fusion]: 3.82002e-06 [matmul_add_comm_reduction]: 9.36e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 8.06001e-06 [virtual_dataset]: 6.44001e-06 [get_grad_eliminate_]: 5.81e-06 
[virtual_output]: 6.31998e-06 [merge_forward]: 4.85001e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 1.168e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.238e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 1.095e-05 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.69001e-06 [flash_sp_send_recv_attached]: 2.84001e-06 [receive_attached]: 1.99e-06 [after_resolve]: 1.047e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00049874 [add_forward_monad_depend]: 9.05001e-06 [auto_monad_grad]: 2.29001e-06 [auto_monad_eliminator]: 1.409e-05 [cse]: 2.929e-05 [a_3]: 4.428e-05 [Cycle 2]: 0.00062512, [45] [expand_dump_flag]: 8.09989e-07 [switch_simplify]: 7.45e-06 [loop_unroll]: 5.99999e-06 [a_1]: 0.00011456 [with_stream_mark]: 1.05e-05 [recompute_prepare]: 6.49001e-06 [updatestate_depend_eliminate]: 3.04999e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.86999e-06 [parameter_eliminate]: 7.99977e-07 [a_2]: 7.255e-05 [accelerated_algorithm]: 5.91e-06 [shard]: 1.52999e-06 [meta_shard_fg_expand]: 1.35001e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 4.70999e-06 [auto_parallel]: 6.30002e-06 [parallel]: 4.42e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.82002e-06 [matmul_add_comm_reduction]: 5.42999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.58997e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.21998e-06 [merge_forward]: 2.83e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 7.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.05e-05 [merge_recompute_call_nodes]: 8.10018e-07 [before_grad]: 8.98002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.01e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.02998e-06 [after_resolve]: 8.78001e-06 [a_after_grad]: 8.25999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.99979e-07 
[auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.86001e-06 [cse]: 1.687e-05 [a_3]: 4.465e-05 [py_interpret_to_execute_after_opt_a]: 9.25999e-06 [slice_cell_reuse_recomputed_activation]: 2.41e-06 [rewriter_after_opt_a]: 3.642e-05 [convert_after_rewriter]: 8.03999e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00050873 [opt_b]: 0.00019705, [1] [Cycle 1]: 0.00019015, [7] [b_1]: 0.00011244 [b_2]: 8.37e-06 [updatestate_depend_eliminate]: 5.94999e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 4.89992e-07 [cse]: 2.002e-05 [optimize_parallel_all_gather_comm]: 1.683e-05 [overlap_param_gather]: 2.11998e-06 [cconv]: 2.453e-05 [loop_unroll]: 0.00042966 [opt_after_cconv]: 9.763e-05, [1] [Cycle 1]: 9.141e-05, [7] [c_1]: 2.566e-05 [parameter_eliminate]: 3.06999e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.68998e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.745e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.498e-05 [tuple_transform]: 6.846e-05, [1] [Cycle 1]: 6.408e-05, [4] [d_1]: 3.714e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 6.38998e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.207e-05 [cse_after_recomputation]: 2.198e-05, [1] [Cycle 1]: 1.688e-05, [1] [cse]: 1.149e-05 [environ_conv]: 7.83999e-06 [swap_dp_allreduce_reducescatter]: 5.56998e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 3.16999e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 1.10999e-06 [remove_cast_before_assign_add]: 1.45001e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.51998e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.12e-06 
[interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.63002e-06 [overlap_opt_shard_in_pipeline]: 1.52001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.244e-05 [grouped_pairwise_exchange_alltoall]: 1.96e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 5.13002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 4.14997e-06 [overlap_grad_flash_sp]: 2.015e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 7.297e-05, [1] [Cycle 1]: 6.839e-05, [6] [build]: 2.84999e-06 [elim_shapecalc]: 8.99e-06 [elim_not_effective]: 1.24e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 9.32001e-06 [renormalize]: 2.79979e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.584e-05 [get_jit_bprop_graph]: 1.24e-06 [rewriter_after_jit_bprop_graph]: 4.07998e-06 [opt_after_jit_grad]: 0.00048648 [validate]: 3.768e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00662514 [execute]: 8.60001e-06 Sums bootstrap : 0.000541s : 2.99% type_inference : 0.006747s : 37.34% event_method : 0.000016s : 0.09% auto_monad : 0.000062s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000026s : 0.14% optimize.rewriter_before_opt_a : 0.000067s : 0.37% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000565s : 3.13% optimize.opt_a.with_stream_mark : 0.000027s : 0.15% optimize.opt_a.recompute_prepare : 0.000016s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000155s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000031s : 0.17% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.06% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000019s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000499s : 2.76% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000046s : 0.26% optimize.opt_a.a_3 : 0.000089s : 0.49% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.20% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000509s : 2.82% optimize.opt_b.b_1 : 0.000112s : 0.62% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.14% optimize.loop_unroll : 0.000430s : 2.38% optimize.opt_after_cconv.c_1 : 0.000026s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.08% optimize.tuple_transform.d_1 : 0.000037s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000008s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000002s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000486s : 2.69% validate : 0.000038s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006625s : 36.67% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000174 26 19.16% : 0.000033s : 5: substitution.arithmetic_simplify 1.21% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 2.83% : 0.000005s : 3: substitution.graph_param_transform 64.12% : 0.000111s : 3: substitution.inline 2.10% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.23% : 0.000004s : 2: substitution.replace_old_param 5.03% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006692 2 90.76% : 0.006073s : 1: type_inference.infer 9.24% : 0.000618s : 1: type_inference.specialize ------[replace.] 0.000039 4 78.40% : 0.000030s : 3: replace.inline 21.60% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 4 93.24% : 0.000109s : 3: match.inline 6.76% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 883 0.94% : 0.000002s : 9: predicate.accumulaten_eliminater 1.09% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 1.04% : 0.000002s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 15: predicate.arithmetic_simplify 1.05% : 0.000002s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.14% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.18% : 0.000002s : 12: predicate.environ_get_depend_swap 1.71% : 0.000003s : 18: predicate.environ_get_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.38% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.36% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.15% : 0.000010s : 40: predicate.inline 0.94% : 0.000002s : 6: predicate.inline_without_move 0.45% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 6: predicate.less_batch_normalization 1.61% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 25: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.33% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.03% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.57% : 0.000003s : 13: predicate.partial_defer_inline 1.46% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.63% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 9: predicate.reduce_eliminate 2.41% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 6: predicate.remove_not_recompute_node 1.24% : 0.000002s : 16: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 0.71% : 0.000001s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 1.11% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 13: predicate.switch_defer_inline 2.01% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.91% : 0.000008s : 43: predicate.switch_simplify 0.93% : 
0.000001s : 9: predicate.tile_eliminate 0.88% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.56% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.57% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.06% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.65% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000397 8 48.68% : 0.000193s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.32% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032401 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.44% : 0.003707s : 1: add_attr 11.40% : 0.003694s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000067s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.79% : 0.000581s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000012s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000011s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000440s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.60% : 0.000518s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 2.94% : 0.000954s : 78: opt.transform.opt_a 0.07% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000093s : 28: opt.transform.opt_b 0.13% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.10% : 0.002301s : 1: opt_a 0.31% : 0.000101s : 1: opt_after_cconv 1.54% : 0.000498s : 1: opt_after_jit_grad 0.62% : 0.000200s : 1: opt_b 13.27% : 0.004298s : 1: optimize 0.06% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.09% : 0.000030s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.81% : 0.000261s : 1: renormalize.infer 0.71% : 0.000229s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000040s : 1: rewriter_after_opt_a 0.22% : 0.000071s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000076s : 1: symbol_engine_optimizer 20.48% : 0.006637s : 1: task_emit 0.22% : 0.000071s : 1: 
tuple_transform 20.88% : 0.006765s : 1: type_inference 0.21% : 0.000067s : 1: validate TotalTime = 0.0200693, [24] [bootstrap]: 0.00040801 [type_inference]: 0.00563999 [event_method]: 1.301e-05 [auto_monad]: 6.055e-05 [graph_reusing]: 5.62001e-06 [inline]: 1.97001e-06 [add_attr]: 0.00307457, [1] [add_attr_with_inline]: 0.00306588, [1] [Cycle 1]: 5.334e-05, [2] [tag_attr]: 1.359e-05 [meta_addattr_fg_expand]: 3.88999e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.455e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.00399093, [53] [py_interpret_to_execute]: 2.223e-05 [rewriter_before_opt_a]: 5.076e-05 [opt_a]: 0.00208385, [2] [Cycle 1]: 0.00146321, [45] [expand_dump_flag]: 2.84999e-06 [switch_simplify]: 2.934e-05 [loop_unroll]: 1.732e-05 [a_1]: 0.00034972 [with_stream_mark]: 1.452e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 4.35999e-06 [updatestate_assign_eliminate]: 3.58999e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 8.294e-05 [accelerated_algorithm]: 7.10998e-06 [shard]: 2.53e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 6.32001e-06 [merge_send_recv]: 8.23001e-06 [auto_parallel]: 6.41998e-06 [parallel]: 1.884e-05 [flash_sp]: 7.13998e-06 [merge_comm]: 3.70998e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 9.61998e-06 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 7.43e-06 [virtual_dataset]: 6.11998e-06 [get_grad_eliminate_]: 5.80002e-06 [virtual_output]: 6.05002e-06 [merge_forward]: 3.96001e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.234e-05 [merge_recompute_call_nodes]: 1.81003e-06 [before_grad]: 1.064e-05 [set_forward_comm_id_for_comm_node_pass]: 3.96001e-06 [meta_fg_expand]: 2.63e-06 [flash_sp_send_recv_attached]: 2.55002e-06 [receive_attached]: 
2.88e-06 [after_resolve]: 9.93002e-06 [a_after_grad]: 8.82e-06 [renormalize]: 0.00042265 [add_forward_monad_depend]: 4.70001e-06 [auto_monad_grad]: 1.87001e-06 [auto_monad_eliminator]: 1.349e-05 [cse]: 2.879e-05 [a_3]: 6.947e-05 [Cycle 2]: 0.00061046, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7.46001e-06 [loop_unroll]: 5.70001e-06 [a_1]: 0.000115 [with_stream_mark]: 1.187e-05 [recompute_prepare]: 6.14999e-06 [updatestate_depend_eliminate]: 3.01001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.328e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 5.00001e-06 [auto_parallel]: 5.91998e-06 [parallel]: 4.23999e-06 [flash_sp]: 3.14999e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 2.88003e-06 [matmul_add_comm_reduction]: 5.45001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.58e-06 [virtual_dataset]: 5.32999e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 5.15001e-06 [merge_forward]: 2.87002e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 6.91999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.071e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 9.06998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.00002e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.61997e-06 [a_after_grad]: 7.71999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.31002e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.453e-05 [a_3]: 3.41e-05 [py_interpret_to_execute_after_opt_a]: 7.85e-06 [slice_cell_reuse_recomputed_activation]: 2.39999e-06 [rewriter_after_opt_a]: 3.5e-05 [convert_after_rewriter]: 6.64001e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00048648 [opt_b]: 0.00019167, [1] [Cycle 1]: 
0.00018512, [7] [b_1]: 0.00011318 [b_2]: 7.18998e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.67001e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 3.00002e-07 [cse]: 1.82e-05 [optimize_parallel_all_gather_comm]: 1.602e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.392e-05 [loop_unroll]: 0.0004208 [opt_after_cconv]: 9.691e-05, [1] [Cycle 1]: 9.112e-05, [7] [c_1]: 2.649e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.82002e-06 [updatestate_loads_eliminate]: 2.36998e-06 [cse]: 1.672e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.47e-05 [tuple_transform]: 6.718e-05, [1] [Cycle 1]: 6.269e-05, [4] [d_1]: 3.623e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 6.33002e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.34e-05 [cse_after_recomputation]: 2.059e-05, [1] [Cycle 1]: 1.609e-05, [1] [cse]: 1.07e-05 [environ_conv]: 5.08002e-06 [swap_dp_allreduce_reducescatter]: 5.66e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.50001e-06 [label_fine_grained_interleaved_index]: 2.99001e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.56002e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.06997e-06 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 2.98e-06 [comm_op_add_attrs]: 1.38002e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.44e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.06e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.68999e-06 [overlap_recompute_and_grad_model_parallel]: 4.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.38998e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.816e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.43e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 7.321e-05, [1] [Cycle 1]: 6.869e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.87999e-06 [elim_not_effective]: 1.25e-05 [opt_reshape]: 6.57002e-06 [fold_const_symbol]: 9.75002e-06 [renormalize]: 2.69996e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.659e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00045271 [validate]: 3.54e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00612583 [execute]: 7.75998e-06 Sums bootstrap : 0.000408s : 2.55% type_inference : 0.005640s : 35.24% event_method : 0.000013s : 0.08% auto_monad : 0.000061s : 0.38% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.14% optimize.rewriter_before_opt_a : 0.000051s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.23% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000465s : 2.90% optimize.opt_a.with_stream_mark : 0.000026s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000156s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000423s : 2.64% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000104s : 0.65% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000486s : 3.04% optimize.opt_b.b_1 : 0.000113s : 0.71% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000421s : 2.63% optimize.opt_after_cconv.c_1 : 0.000026s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000036s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000453s : 2.83% validate : 0.000035s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006126s : 38.27% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000140 24 20.95% : 0.000029s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.11% : 0.000002s : 2: substitution.fold_const_symbol 3.60% : 0.000005s : 3: substitution.graph_param_transform 64.74% : 0.000091s : 3: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.44% : 0.000005s : 4: substitution.remove_not_recompute_node 2.38% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005597 2 91.94% : 0.005146s : 1: type_inference.infer 8.06% : 0.000451s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000089 3 100.00% : 0.000089s : 3: match.inline ------[predicate.] 
0.000147 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 0.89% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.38% : 0.000003s : 14: predicate.arithmetic_simplify 0.91% : 0.000001s : 8: predicate.cast_eliminate 0.67% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.68% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_depend_swap 1.83% : 0.000003s : 17: predicate.environ_get_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.73% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.77% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.33% : 0.000009s : 37: predicate.inline 1.05% : 0.000002s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 6: predicate.less_batch_normalization 1.53% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 22: predicate.load_eliminater 1.07% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.99% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.67% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.21% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.42% : 0.000001s : 3: predicate.parallel_virtual_node 1.43% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 11: predicate.partial_eliminate 0.85% : 0.000001s : 8: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.11% : 0.000002s : 8: predicate.reduce_eliminate 2.30% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 6: predicate.remove_not_recompute_node 1.15% : 0.000002s : 14: predicate.replace_applicator 0.74% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000001s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 3: predicate.row_tensor_eliminate 0.93% : 0.000001s : 6: predicate.same_eliminate 0.53% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 6: predicate.shard_identity_eliminate 0.82% : 0.000001s : 6: predicate.special_op_eliminate 0.89% : 0.000001s : 6: predicate.specialize_transform 1.09% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 1.92% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.85% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.60% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.56% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.23% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000266 7 34.82% : 0.000093s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.18% : 0.000173s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028579 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.77% : 0.003079s : 1: add_attr 10.74% : 0.003070s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.23% : 0.000066s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.52% : 0.000434s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000430s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000495s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.02% : 0.000864s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000040s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.30% : 0.002087s : 1: opt_a 0.35% : 0.000100s : 1: opt_after_cconv 1.62% : 0.000463s : 1: opt_after_jit_grad 0.68% : 0.000195s : 1: opt_b 13.98% : 0.003995s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.09% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.78% : 0.000224s : 1: renormalize.infer 0.67% : 0.000192s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000039s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000076s : 1: symbol_engine_optimizer 21.47% : 0.006137s : 1: task_emit 0.24% : 0.000070s : 1: 
tuple_transform 19.79% : 0.005655s : 1: type_inference 0.22% : 0.000064s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x7-kbk],max_mem:10.0M TotalTime = 0.826302, [24] [bootstrap]: 0.00058306 [type_inference]: 0.00684885 [event_method]: 1.486e-05 [auto_monad]: 6.391e-05 [graph_reusing]: 6.02999e-06 [inline]: 2.72001e-06 [add_attr]: 0.00382881, [1] [add_attr_with_inline]: 0.00381595, [1] [Cycle 1]: 5.569e-05, [2] [tag_attr]: 1.69e-05 [meta_addattr_fg_expand]: 4.32998e-06 [parallel-infer-symbol]: 3.53e-06 [pre_auto_parallel]: 2.867e-05 [insert-virtual-dataset]: 3.13998e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 2.27999e-06 [pipeline_split]: 1.69998e-06 [optimize]: 0.00458752, [53] [py_interpret_to_execute]: 2.572e-05 [rewriter_before_opt_a]: 6.765e-05 [opt_a]: 0.00243969, [2] [Cycle 1]: 0.00179846, [45] [expand_dump_flag]: 2.84999e-06 [switch_simplify]: 3.621e-05 [loop_unroll]: 2.312e-05 [a_1]: 0.00046277 [with_stream_mark]: 1.685e-05 [recompute_prepare]: 1.06e-05 [updatestate_depend_eliminate]: 4.33001e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 3.52997e-06 [parameter_eliminate]: 2.40002e-06 [a_2]: 8.457e-05 [accelerated_algorithm]: 7.65e-06 [shard]: 2.53e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 6.47001e-06 [merge_send_recv]: 9.05999e-06 [auto_parallel]: 8.03001e-06 [parallel]: 2.82e-05 [flash_sp]: 9.91e-06 [merge_comm]: 4.1e-06 [allreduce_fusion]: 3.78001e-06 [matmul_add_comm_reduction]: 9.74e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 8.27998e-06 [virtual_dataset]: 6.43e-06 [get_grad_eliminate_]: 5.79999e-06 [virtual_output]: 6.43e-06 [merge_forward]: 4.53999e-06 [cell_reuse_recompute_pass]: 9.60019e-07 [offload_activation]: 1.149e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.397e-05 [merge_recompute_call_nodes]: 
1.51998e-06 [before_grad]: 1.067e-05 [set_forward_comm_id_for_comm_node_pass]: 4.52e-06 [meta_fg_expand]: 2.81999e-06 [flash_sp_send_recv_attached]: 2.74001e-06 [receive_attached]: 2.06998e-06 [after_resolve]: 9.69999e-06 [a_after_grad]: 8.84003e-06 [renormalize]: 0.00058747 [add_forward_monad_depend]: 1.071e-05 [auto_monad_grad]: 2.48e-06 [auto_monad_eliminator]: 1.666e-05 [cse]: 2.976e-05 [a_3]: 4.655e-05 [Cycle 2]: 0.00062968, [45] [expand_dump_flag]: 1.37999e-06 [switch_simplify]: 7.38e-06 [loop_unroll]: 5.79e-06 [a_1]: 0.00011648 [with_stream_mark]: 1.079e-05 [recompute_prepare]: 6.11e-06 [updatestate_depend_eliminate]: 3.3e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 3.33998e-06 [parameter_eliminate]: 1.12999e-06 [a_2]: 7.366e-05 [accelerated_algorithm]: 5.81e-06 [shard]: 1.28002e-06 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 5.69999e-06 [auto_parallel]: 6.58998e-06 [parallel]: 5.05999e-06 [flash_sp]: 3.16001e-06 [merge_comm]: 3.40003e-06 [allreduce_fusion]: 3.04999e-06 [matmul_add_comm_reduction]: 6.24001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.63998e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.17999e-06 [virtual_output]: 5.27001e-06 [merge_forward]: 3.26999e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 7.21999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 [merge_recompute_call_nodes]: 9.99979e-07 [before_grad]: 8.67e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 1.92001e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.71e-06 [after_resolve]: 9.11998e-06 [a_after_grad]: 8.16002e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.74e-06 [auto_monad_grad]: 1.37999e-06 [auto_monad_eliminator]: 9.60001e-06 [cse]: 1.498e-05 [a_3]: 3.348e-05 [py_interpret_to_execute_after_opt_a]: 1.202e-05 [slice_cell_reuse_recomputed_activation]: 2.34001e-06 
[rewriter_after_opt_a]: 3.656e-05 [convert_after_rewriter]: 6.66e-06 [order_py_execute_after_rewriter]: 5.00999e-06 [mutable_eliminate]: 0.00052895 [opt_b]: 0.00019923, [1] [Cycle 1]: 0.00019204, [7] [b_1]: 0.00011375 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 6.96001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 3.08e-06 [renormalize]: 5.89993e-07 [cse]: 2.048e-05 [optimize_parallel_all_gather_comm]: 1.82e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.629e-05 [loop_unroll]: 0.00052871 [opt_after_cconv]: 0.00010431, [1] [Cycle 1]: 9.646e-05, [7] [c_1]: 2.551e-05 [parameter_eliminate]: 2.96001e-06 [updatestate_depend_eliminate]: 6.80002e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.49999e-06 [cse]: 2.055e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.593e-05 [tuple_transform]: 7.166e-05, [1] [Cycle 1]: 6.723e-05, [4] [d_1]: 3.987e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.82002e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 5.267e-05 [cse_after_recomputation]: 2.227e-05, [1] [Cycle 1]: 1.723e-05, [1] [cse]: 1.151e-05 [environ_conv]: 8.13999e-06 [swap_dp_allreduce_reducescatter]: 5.40001e-06 [bias_add_comm_swap]: 3.08998e-06 [label_micro_interleaved_index]: 5.12999e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.57999e-06 [slice_recompute_activation]: 2.84999e-06 [micro_interleaved_order_control]: 2.70002e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 1.06997e-06 [full_micro_interleaved_order_control]: 2.36998e-06 [reorder_send_recv_between_fp_bp]: 3.4e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.06997e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.45999e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.337e-05 
[grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 4.04002e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.41998e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.41002e-06 [overlap_grad_flash_sp]: 2.041e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.68003e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 7.765e-05, [1] [Cycle 1]: 7.211e-05, [6] [build]: 4.05e-06 [elim_shapecalc]: 1.061e-05 [elim_not_effective]: 1.264e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 9.41e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.09999e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.791e-05 [get_jit_bprop_graph]: 1.55999e-06 [rewriter_after_jit_bprop_graph]: 4.42e-06 [opt_after_jit_grad]: 0.00051427 [validate]: 3.952e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.809354 [execute]: 1.068e-05 Sums bootstrap : 0.000583s : 0.07% type_inference : 0.006849s : 0.83% event_method : 0.000015s : 0.00% auto_monad : 0.000064s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000026s : 0.00% optimize.rewriter_before_opt_a : 0.000068s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000044s : 0.01% optimize.opt_a.loop_unroll : 0.000029s : 0.00% optimize.opt_a.a_1 : 0.000579s : 0.07% optimize.opt_a.with_stream_mark : 0.000028s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000017s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000158s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000015s : 0.00% optimize.opt_a.auto_parallel : 0.000015s : 0.00% optimize.opt_a.parallel : 0.000033s : 0.00% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000588s : 0.07% optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.00% optimize.opt_a.cse : 0.000045s : 0.01% optimize.opt_a.a_3 : 0.000080s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000529s : 0.06% optimize.opt_b.b_1 : 0.000114s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000529s : 0.06% optimize.opt_after_cconv.c_1 : 0.000026s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000008s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000514s : 0.06% validate : 0.000040s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.809354s : 98.55% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000184 26 19.93% : 0.000037s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.04% : 0.000006s : 3: substitution.graph_param_transform 63.13% : 0.000116s : 3: substitution.inline 1.85% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.00% : 0.000006s : 4: substitution.remove_not_recompute_node 1.95% : 0.000004s : 2: substitution.replace_old_param 5.27% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006787 2 90.69% : 0.006155s : 1: type_inference.infer 9.31% : 0.000632s : 1: type_inference.specialize ------[replace.] 0.000041 4 76.70% : 0.000032s : 3: replace.inline 23.30% : 0.000010s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 4 92.95% : 0.000114s : 3: match.inline 7.05% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.04% : 0.000003s : 15: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.73% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.34% : 0.000001s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_depend_swap 1.69% : 0.000003s : 18: predicate.environ_get_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.31% : 0.000004s : 13: predicate.float_depend_g_call 0.55% : 0.000001s : 6: predicate.float_environ_get_switch 0.80% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.64% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.51% : 0.000011s : 40: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 6: predicate.less_batch_normalization 1.63% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 25: predicate.load_eliminater 1.34% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.31% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 9: predicate.minmaximum_grad 1.08% : 0.000002s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.37% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.20% : 0.000002s : 9: predicate.reduce_eliminate 2.35% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 16: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.26% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 3: predicate.row_tensor_eliminate 1.02% : 0.000002s : 6: predicate.same_eliminate 0.43% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.77% : 0.000001s : 6: predicate.specialize_transform 1.32% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 13: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.21% : 0.000008s : 43: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.58% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.99% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 3: predicate.value_based_eliminate 0.66% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000399 8 47.39% : 0.000189s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.61% : 0.000210s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.836425 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.46% : 0.003834s : 1: add_attr 0.46% : 0.003820s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000069s : 1: auto_monad 0.00% : 0.000022s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000625s : 1: bootstrap 0.00% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000017s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000011s : 1: environ_conv 0.01% : 0.000070s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.06% : 0.000540s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.06% : 0.000541s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.12% : 0.000973s : 78: opt.transform.opt_a 0.00% : 0.000024s : 1: opt.transform.opt_after_cconv 0.00% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000092s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.29% : 0.002443s : 1: opt_a 0.01% : 0.000108s : 1: opt_after_cconv 0.06% : 0.000526s : 1: opt_after_jit_grad 0.02% : 0.000202s : 1: opt_b 0.55% : 0.004592s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000090s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000034s : 1: pre_auto_parallel 0.00% : 0.000030s : 1: py_interpret_to_execute 0.00% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000019s : 1: remove_dup_value 0.04% : 0.000319s : 1: renormalize.infer 0.03% : 0.000260s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000041s : 1: rewriter_after_opt_a 0.01% : 0.000072s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000080s : 1: symbol_engine_optimizer 96.77% : 0.809379s : 1: task_emit 0.01% : 0.000075s : 1: 
tuple_transform 0.82% : 0.006872s : 1: type_inference 0.01% : 0.000069s : 1: validate TotalTime = 0.0566996, [24] [bootstrap]: 0.00045419 [type_inference]: 0.00603884 [event_method]: 1.244e-05 [auto_monad]: 6.185e-05 [graph_reusing]: 5.59e-06 [inline]: 1.99999e-06 [add_attr]: 0.00303908, [1] [add_attr_with_inline]: 0.00303103, [1] [Cycle 1]: 5.255e-05, [2] [tag_attr]: 1.443e-05 [meta_addattr_fg_expand]: 3.75e-06 [parallel-infer-symbol]: 3.15998e-06 [pre_auto_parallel]: 2.507e-05 [insert-virtual-dataset]: 2.88003e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.37999e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00399365, [53] [py_interpret_to_execute]: 2.055e-05 [rewriter_before_opt_a]: 5.188e-05 [opt_a]: 0.00210295, [2] [Cycle 1]: 0.00148404, [45] [expand_dump_flag]: 2.98998e-06 [switch_simplify]: 2.903e-05 [loop_unroll]: 1.762e-05 [a_1]: 0.00035395 [with_stream_mark]: 1.584e-05 [recompute_prepare]: 7.94002e-06 [updatestate_depend_eliminate]: 3.92002e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 2.21998e-06 [a_2]: 8.159e-05 [accelerated_algorithm]: 6.89999e-06 [shard]: 2.10002e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 6.69001e-06 [merge_send_recv]: 8.35001e-06 [auto_parallel]: 6.44001e-06 [parallel]: 1.877e-05 [flash_sp]: 7.66999e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.50998e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.72999e-06 [merge_forward]: 3.83999e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 1.033e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.152e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 1.034e-05 [set_forward_comm_id_for_comm_node_pass]: 3.84002e-06 [meta_fg_expand]: 2.62001e-06 [flash_sp_send_recv_attached]: 2.64001e-06 [receive_attached]: 
2.44999e-06 [after_resolve]: 9.56998e-06 [a_after_grad]: 8.62e-06 [renormalize]: 0.00043131 [add_forward_monad_depend]: 4.78001e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.34e-05 [cse]: 3.22e-05 [a_3]: 7.649e-05 [Cycle 2]: 0.0006091, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 7.18998e-06 [loop_unroll]: 5.67001e-06 [a_1]: 0.00011534 [with_stream_mark]: 1.012e-05 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.247e-05 [accelerated_algorithm]: 5.79e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 4.48001e-06 [auto_parallel]: 5.56e-06 [parallel]: 4.57998e-06 [flash_sp]: 3.35998e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.97002e-06 [matmul_add_comm_reduction]: 7.52998e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 6.54999e-06 [virtual_dataset]: 5.51e-06 [get_grad_eliminate_]: 5.37999e-06 [virtual_output]: 5.51e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.64999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.098e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 9.00001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21999e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.69998e-06 [a_after_grad]: 7.85e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09998e-06 [auto_monad_grad]: 1.29998e-06 [auto_monad_eliminator]: 6.70998e-06 [cse]: 1.385e-05 [a_3]: 3.338e-05 [py_interpret_to_execute_after_opt_a]: 7.71999e-06 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 3.226e-05 [convert_after_rewriter]: 6.81001e-06 [order_py_execute_after_rewriter]: 5.16002e-06 [mutable_eliminate]: 0.00047462 [opt_b]: 0.00018886, [1] [Cycle 
1]: 0.00018262, [7] [b_1]: 0.0001125 [b_2]: 7.39002e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.28998e-06 [renormalize]: 3.19997e-07 [cse]: 1.736e-05 [optimize_parallel_all_gather_comm]: 1.679e-05 [overlap_param_gather]: 2.10002e-06 [cconv]: 2.298e-05 [loop_unroll]: 0.00041938 [opt_after_cconv]: 9.587e-05, [1] [Cycle 1]: 9.016e-05, [7] [c_1]: 2.545e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.41998e-06 [cse]: 1.807e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.531e-05 [tuple_transform]: 6.792e-05, [1] [Cycle 1]: 6.363e-05, [4] [d_1]: 3.739e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.41e-06 [partial_unused_args_eliminate]: 2.32001e-06 [add_recomputation]: 4.649e-05 [cse_after_recomputation]: 2.154e-05, [1] [Cycle 1]: 1.707e-05, [1] [cse]: 1.146e-05 [environ_conv]: 5.33002e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.48998e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.87002e-06 [merge_cast_opt]: 1.64998e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.29983e-07 [remove_cast_before_assign_add]: 8.29983e-07 [full_micro_interleaved_order_control]: 2.68e-06 [reorder_send_recv_between_fp_bp]: 2.85002e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.32999e-06 [interleave_parallel_branches]: 1.09003e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 5.32999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.40999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.63e-06 [overlap_grad_ring_attention]: 4.33999e-06 [overlap_grad_flash_sp]: 1.843e-05 [begin_end_overlap_inline]: 8.50006e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 2.38002e-06 [handle_group_info]: 1.50999e-06 [symbol_engine_optimizer]: 7.176e-05, [1] [Cycle 1]: 6.752e-05, [6] [build]: 2.43998e-06 [elim_shapecalc]: 8.87999e-06 [elim_not_effective]: 1.221e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 9.44998e-06 [renormalize]: 4.30009e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.90001e-06 [auto_monad_reorder]: 1.608e-05 [get_jit_bprop_graph]: 1.15999e-06 [rewriter_after_jit_bprop_graph]: 3.31999e-06 [opt_after_jit_grad]: 0.00045382 [validate]: 3.477e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0423093 [execute]: 1.028e-05 Sums bootstrap : 0.000454s : 0.86% type_inference : 0.006039s : 11.47% event_method : 0.000012s : 0.02% auto_monad : 0.000062s : 0.12% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.04% optimize.rewriter_before_opt_a : 0.000052s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000036s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000469s : 0.89% optimize.opt_a.with_stream_mark : 0.000026s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000154s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000013s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000431s : 0.82% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000046s : 0.09% optimize.opt_a.a_3 : 0.000110s : 0.21% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000475s : 0.90% optimize.opt_b.b_1 : 0.000113s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000419s : 0.80% optimize.opt_after_cconv.c_1 : 0.000025s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000454s : 0.86% validate : 0.000035s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.042309s : 80.36% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000143 24 20.44% : 0.000029s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 1.16% : 0.000002s : 2: substitution.fold_const_symbol 3.95% : 0.000006s : 3: substitution.graph_param_transform 65.13% : 0.000093s : 3: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.29% : 0.000005s : 4: substitution.remove_not_recompute_node 2.22% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005993 2 92.27% : 0.005530s : 1: type_inference.infer 7.73% : 0.000463s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000091 3 100.00% : 0.000091s : 3: match.inline ------[predicate.] 
0.000179 815 0.71% : 0.000001s : 8: predicate.accumulaten_eliminater 0.74% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 6: predicate.addn_check_dump 0.79% : 0.000001s : 8: predicate.addn_zero_filter 0.65% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 1.87% : 0.000003s : 14: predicate.arithmetic_simplify 0.70% : 0.000001s : 8: predicate.cast_eliminate 0.54% : 0.000001s : 6: predicate.check_bprop_eliminate 0.52% : 0.000001s : 6: predicate.compare_switch_simplify 0.17% : 0.000000s : 3: predicate.const_output_eliminate 0.54% : 0.000001s : 6: predicate.depend_value_elim 0.70% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.81% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.72% : 0.000001s : 8: predicate.dict_set_item_eliminator 0.89% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.34% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 0.94% : 0.000002s : 11: predicate.environ_add_const_eliminate 0.89% : 0.000002s : 11: predicate.environ_get_add_eliminate 0.91% : 0.000002s : 11: predicate.environ_get_depend_swap 1.47% : 0.000003s : 17: predicate.environ_get_eliminate 0.88% : 0.000002s : 11: predicate.environ_get_set_eliminate 0.97% : 0.000002s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000003s : 11: predicate.float_depend_g_call 0.50% : 0.000001s : 6: predicate.float_environ_get_switch 0.74% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 3: predicate.fold_const_symbol 0.66% : 0.000001s : 6: predicate.get_grad_eliminate 0.21% : 0.000000s : 3: predicate.graph_param_transform 0.61% : 0.000001s : 6: predicate.incorporate_call 0.50% : 0.000001s : 6: predicate.incorporate_call_switch 5.17% : 0.000009s : 37: predicate.inline 0.77% : 0.000001s : 6: predicate.inline_without_move 0.37% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.75% : 0.000001s : 6: predicate.less_batch_normalization 1.27% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 1.88% : 0.000003s : 22: predicate.load_eliminater 0.83% : 0.000001s : 3: predicate.loop_unroll_after_grad 1.69% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.46% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.51% : 0.000001s : 6: predicate.merge_addn 19.16% : 0.000034s : 6: predicate.micro_step_allgather_replace 0.55% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 8: predicate.minmaximum_grad 0.96% : 0.000002s : 3: predicate.mutable_eliminate 0.33% : 0.000001s : 3: predicate.opt_reshape 0.33% : 0.000001s : 3: predicate.parallel_virtual_node 1.16% : 0.000002s : 11: predicate.partial_defer_inline 1.09% : 0.000002s : 11: predicate.partial_eliminate 0.68% : 0.000001s : 8: predicate.print_const_string_wrapper 0.54% : 0.000001s : 6: predicate.reduce_all_const_elim 0.88% : 0.000002s : 8: predicate.reduce_eliminate 1.90% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 6: predicate.remove_not_recompute_node 1.00% : 0.000002s : 14: predicate.replace_applicator 0.55% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000001s : 3: predicate.reset_defer_inline 0.74% : 0.000001s : 8: predicate.reshape_eliminate 0.54% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 3: predicate.row_tensor_eliminate 0.68% : 0.000001s : 6: predicate.same_eliminate 0.41% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.71% : 0.000001s : 6: predicate.shard_identity_eliminate 0.67% : 0.000001s : 6: predicate.special_op_eliminate 0.80% : 0.000001s : 6: predicate.specialize_transform 0.80% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.65% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.33% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.04% : 0.000002s : 11: predicate.switch_defer_inline 1.54% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.03% : 0.000007s : 38: predicate.switch_simplify 0.75% : 
0.000001s : 8: predicate.tile_eliminate 0.70% : 0.000001s : 8: predicate.transpose_eliminate 1.29% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.32% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.17% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.58% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.14% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 1.85% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.41% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 1.81% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.50% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.33% : 0.000001s : 3: predicate.value_based_eliminate 0.58% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.59% : 0.000001s : 6: predicate.virtual_output_eliminate 0.25% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000305 7 40.55% : 0.000124s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.45% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.065186 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.67% : 0.003044s : 1: add_attr 4.65% : 0.003034s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000067s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.75% : 0.000492s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.66% : 0.000428s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.74% : 0.000483s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.34% : 0.000873s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000020s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.23% : 0.002106s : 1: opt_a 0.15% : 0.000099s : 1: opt_after_cconv 0.71% : 0.000463s : 1: opt_after_jit_grad 0.30% : 0.000192s : 1: opt_b 6.13% : 0.003998s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000030s : 1: pre_auto_parallel 0.04% : 0.000024s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.35% : 0.000231s : 1: renormalize.infer 0.30% : 0.000194s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000036s : 1: rewriter_after_opt_a 0.09% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000074s : 1: symbol_engine_optimizer 64.94% : 0.042335s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 9.29% : 0.006054s : 1: type_inference 0.09% : 0.000059s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x7-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x8-pynative],max_mem:10.0M TotalTime = 0.0214437, [24] [bootstrap]: 0.00047766 [type_inference]: 0.00602937 [event_method]: 1.406e-05 [auto_monad]: 5.953e-05 [graph_reusing]: 5.83002e-06 [inline]: 2.03002e-06 [add_attr]: 0.00351061, [1] [add_attr_with_inline]: 0.00349989, [1] [Cycle 1]: 4.55e-05, [2] [tag_attr]: 1.491e-05 [meta_addattr_fg_expand]: 4.30999e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.601e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.17001e-06 [pipeline_split]: 1.91003e-06 [optimize]: 0.00412698, [53] [py_interpret_to_execute]: 2.158e-05 [rewriter_before_opt_a]: 6.386e-05 [opt_a]: 0.00221799, [2] [Cycle 1]: 0.00160362, [45] [expand_dump_flag]: 2.81999e-06 [switch_simplify]: 3.489e-05 [loop_unroll]: 2.076e-05 [a_1]: 0.00044025 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.78001e-06 [updatestate_loads_eliminate]: 3.33998e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 8.121e-05 [accelerated_algorithm]: 6.76e-06 [shard]: 2.51998e-06 [meta_shard_fg_expand]: 1.73002e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 8.79e-06 [auto_parallel]: 5.89e-06 [parallel]: 2.595e-05 [flash_sp]: 8.07998e-06 [merge_comm]: 1.891e-05 [allreduce_fusion]: 4.28001e-06 [matmul_add_comm_reduction]: 9.30001e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.77e-06 [virtual_dataset]: 6.26998e-06 [get_grad_eliminate_]: 5.93002e-06 
[virtual_output]: 6.02999e-06 [merge_forward]: 4.35999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.024e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.169e-05 [merge_recompute_call_nodes]: 1.64998e-06 [before_grad]: 1.04e-05 [set_forward_comm_id_for_comm_node_pass]: 4.12e-06 [meta_fg_expand]: 2.61e-06 [flash_sp_send_recv_attached]: 3.03e-06 [receive_attached]: 2.49001e-06 [after_resolve]: 9.28002e-06 [a_after_grad]: 8.85001e-06 [renormalize]: 0.00046254 [add_forward_monad_depend]: 8.13999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.408e-05 [cse]: 2.829e-05 [a_3]: 4.261e-05 [Cycle 2]: 0.00060488, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.86e-06 [a_1]: 0.00011592 [with_stream_mark]: 1.027e-05 [recompute_prepare]: 6.12001e-06 [updatestate_depend_eliminate]: 3.11999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 7.233e-05 [accelerated_algorithm]: 5.88002e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 4.50999e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.36002e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.38002e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.028e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 8.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.60019e-07 [after_resolve]: 8.33001e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.23002e-06 
[auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.454e-05 [a_3]: 3.293e-05 [py_interpret_to_execute_after_opt_a]: 7.38e-06 [slice_cell_reuse_recomputed_activation]: 2.31998e-06 [rewriter_after_opt_a]: 3.34e-05 [convert_after_rewriter]: 6.36e-06 [order_py_execute_after_rewriter]: 5.36002e-06 [mutable_eliminate]: 0.00046314 [opt_b]: 0.00018948, [1] [Cycle 1]: 0.00018318, [7] [b_1]: 0.00011154 [b_2]: 7.21999e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.10015e-07 [cse]: 1.783e-05 [optimize_parallel_all_gather_comm]: 1.656e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.284e-05 [loop_unroll]: 0.0004261 [opt_after_cconv]: 9.768e-05, [1] [Cycle 1]: 9.135e-05, [7] [c_1]: 2.672e-05 [parameter_eliminate]: 2.49999e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.713e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.454e-05 [tuple_transform]: 6.778e-05, [1] [Cycle 1]: 6.348e-05, [4] [d_1]: 3.68e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.51999e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 4.949e-05 [cse_after_recomputation]: 2.185e-05, [1] [Cycle 1]: 1.737e-05, [1] [cse]: 1.183e-05 [environ_conv]: 7.46999e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.33999e-06 [label_fine_grained_interleaved_index]: 3.04999e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.65001e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.52001e-06 [reorder_send_recv_between_fp_bp]: 3.11001e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.34e-06 
[interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.255e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 4.22998e-06 [overlap_recompute_and_grad_model_parallel]: 5.15999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4.08001e-06 [overlap_grad_flash_sp]: 1.788e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 7.063e-05, [1] [Cycle 1]: 6.623e-05, [6] [build]: 2.43e-06 [elim_shapecalc]: 9.02e-06 [elim_not_effective]: 1.173e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 9.35001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.639e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00045763 [validate]: 3.439e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00646858 [execute]: 7.68001e-06 Sums bootstrap : 0.000478s : 2.82% type_inference : 0.006029s : 35.58% event_method : 0.000014s : 0.08% auto_monad : 0.000060s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000064s : 0.38% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000042s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000556s : 3.28% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000154s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.18% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000022s : 0.13% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000018s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000463s : 2.73% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000076s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000463s : 2.73% optimize.opt_b.b_1 : 0.000112s : 0.66% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000426s : 2.51% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000007s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000458s : 2.70% validate : 0.000034s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006469s : 38.18% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000165 26 19.27% : 0.000032s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.85% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000005s : 3: substitution.graph_param_transform 63.87% : 0.000105s : 3: substitution.inline 1.85% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.88% : 0.000005s : 4: substitution.remove_not_recompute_node 1.82% : 0.000003s : 2: substitution.replace_old_param 5.23% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005982 2 89.99% : 0.005383s : 1: type_inference.infer 10.01% : 0.000599s : 1: type_inference.specialize ------[replace.] 0.000035 4 78.83% : 0.000028s : 3: replace.inline 21.17% : 0.000007s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000111 4 92.98% : 0.000103s : 3: match.inline 7.02% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 0.95% : 0.000002s : 9: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.09% : 0.000003s : 15: predicate.arithmetic_simplify 0.94% : 0.000001s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_depend_swap 1.73% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.34% : 0.000004s : 13: predicate.float_depend_g_call 0.55% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.18% : 0.000010s : 40: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 6: predicate.less_batch_normalization 1.67% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.47% : 0.000004s : 25: predicate.load_eliminater 1.04% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.56% : 0.000001s : 6: predicate.mini_step_allgather_replace 1.07% : 0.000002s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.57% : 0.000002s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.87% : 0.000001s : 9: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.19% : 0.000002s : 9: predicate.reduce_eliminate 2.36% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 16: predicate.replace_applicator 0.60% : 0.000001s : 6: predicate.replace_old_param 0.23% : 0.000000s : 3: predicate.reset_defer_inline 1.00% : 0.000002s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.71% : 0.000001s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 13: predicate.switch_defer_inline 2.02% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.07% : 0.000008s : 43: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.35% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000348 8 42.76% : 0.000149s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.24% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030628 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.48% : 0.003515s : 1: add_attr 11.44% : 0.003503s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000065s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.65% : 0.000505s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000011s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000472s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.05% : 0.000934s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000092s : 28: opt.transform.opt_b 0.13% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.25% : 0.002221s : 1: opt_a 0.33% : 0.000101s : 1: opt_after_cconv 1.53% : 0.000468s : 1: opt_after_jit_grad 0.63% : 0.000193s : 1: opt_b 13.49% : 0.004131s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.79% : 0.000243s : 1: renormalize.infer 0.70% : 0.000213s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000038s : 1: rewriter_after_opt_a 0.22% : 0.000068s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000073s : 1: symbol_engine_optimizer 21.15% : 0.006478s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 19.74% : 0.006044s : 1: type_inference 0.20% : 0.000060s : 1: validate TotalTime = 0.0203392, [24] [bootstrap]: 0.00049491 [type_inference]: 0.00602947 [event_method]: 1.231e-05 [auto_monad]: 6.003e-05 [graph_reusing]: 5.49e-06 [inline]: 1.66e-06 [add_attr]: 0.00301557, [1] [add_attr_with_inline]: 0.00300777, [1] [Cycle 1]: 5.219e-05, [2] [tag_attr]: 1.471e-05 [meta_addattr_fg_expand]: 4.45999e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.32e-05 [insert-virtual-dataset]: 2.88e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.36998e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00393484, [53] [py_interpret_to_execute]: 1.904e-05 [rewriter_before_opt_a]: 5.19e-05 [opt_a]: 0.00200414, [2] [Cycle 1]: 0.00139046, [45] [expand_dump_flag]: 3.2e-06 [switch_simplify]: 2.886e-05 [loop_unroll]: 1.681e-05 [a_1]: 0.00035238 [with_stream_mark]: 1.415e-05 [recompute_prepare]: 7.72998e-06 [updatestate_depend_eliminate]: 4.13999e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 8.274e-05 [accelerated_algorithm]: 7.16001e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 6.33e-06 [merge_send_recv]: 8.52e-06 [auto_parallel]: 6.19999e-06 [parallel]: 1.869e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.48999e-06 [matmul_add_comm_reduction]: 9.45001e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.92e-06 [virtual_dataset]: 6.39999e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 6.04001e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.34998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.198e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 1.01e-05 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.61999e-06 [flash_sp_send_recv_attached]: 2.48e-06 [receive_attached]: 2.17999e-06 
[after_resolve]: 9.49e-06 [a_after_grad]: 8.73001e-06 [renormalize]: 0.00038199 [add_forward_monad_depend]: 4.78001e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.332e-05 [cse]: 2.863e-05 [a_3]: 4.163e-05 [Cycle 2]: 0.00060461, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.77002e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00011386 [with_stream_mark]: 1.034e-05 [recompute_prepare]: 5.94e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 7.248e-05 [accelerated_algorithm]: 6.07001e-06 [shard]: 1.47001e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 4.67e-06 [auto_parallel]: 6.04001e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 3.36001e-06 [allreduce_fusion]: 3.06001e-06 [matmul_add_comm_reduction]: 5.64e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.67002e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 6.23998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.029e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.77999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 1.86e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.37e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.12001e-06 [cse]: 1.338e-05 [a_3]: 3.315e-05 [py_interpret_to_execute_after_opt_a]: 7.56999e-06 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.325e-05 [convert_after_rewriter]: 6.82002e-06 [order_py_execute_after_rewriter]: 4.95001e-06 [mutable_eliminate]: 0.00050912 [opt_b]: 0.00019174, [1] [Cycle 1]: 0.00018514, [7] [b_1]: 
0.00011345 [b_2]: 7.95e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 5.39992e-07 [cse]: 1.733e-05 [optimize_parallel_all_gather_comm]: 1.675e-05 [overlap_param_gather]: 2.12999e-06 [cconv]: 2.29e-05 [loop_unroll]: 0.00042739 [opt_after_cconv]: 9.525e-05, [1] [Cycle 1]: 8.92e-05, [7] [c_1]: 2.527e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.89001e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.746e-05 [renormalize]: 5.60016e-07 [remove_dup_value]: 1.504e-05 [tuple_transform]: 6.799e-05, [1] [Cycle 1]: 6.362e-05, [4] [d_1]: 3.649e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.53e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.416e-05 [cse_after_recomputation]: 2.032e-05, [1] [Cycle 1]: 1.602e-05, [1] [cse]: 1.074e-05 [environ_conv]: 5.14e-06 [swap_dp_allreduce_reducescatter]: 5.42001e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 4.29002e-06 [label_fine_grained_interleaved_index]: 3.08e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 1.15999e-06 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.57999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.224e-05 [grouped_pairwise_exchange_alltoall]: 2.02001e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.97999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.802e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 7.241e-05, [1] [Cycle 1]: 6.819e-05, [6] [build]: 2.35002e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.273e-05 [opt_reshape]: 6.43998e-06 [fold_const_symbol]: 9.60001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.618e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.28e-06 [opt_after_jit_grad]: 0.00045958 [validate]: 3.388e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00602597 [execute]: 7.5e-06 Sums bootstrap : 0.000495s : 3.03% type_inference : 0.006029s : 36.91% event_method : 0.000012s : 0.08% auto_monad : 0.000060s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.14% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000052s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000022s : 0.14% optimize.opt_a.a_1 : 0.000466s : 2.85% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.95% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000382s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.26% optimize.opt_a.a_3 : 0.000075s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000509s : 3.12% optimize.opt_b.b_1 : 0.000113s : 0.69% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000427s : 2.62% optimize.opt_after_cconv.c_1 : 0.000025s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000036s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000460s : 2.81% validate : 0.000034s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006026s : 36.89% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000140 24 20.53% : 0.000029s : 4: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 3.77% : 0.000005s : 3: substitution.graph_param_transform 65.63% : 0.000092s : 3: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.29% : 0.000005s : 4: substitution.remove_not_recompute_node 2.06% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005987 2 92.40% : 0.005532s : 1: type_inference.infer 7.60% : 0.000455s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000090 3 100.00% : 0.000090s : 3: match.inline ------[predicate.] 
0.000148 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 0.88% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.84% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.24% : 0.000003s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.76% : 0.000003s : 17: predicate.environ_get_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.23% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.74% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.22% : 0.000009s : 37: predicate.inline 1.04% : 0.000002s : 6: predicate.inline_without_move 0.46% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 6: predicate.less_batch_normalization 1.68% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.22% : 0.000003s : 22: predicate.load_eliminater 1.08% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.99% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.67% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.21% : 0.000002s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.60% : 0.000001s : 3: predicate.parallel_virtual_node 1.47% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.07% : 0.000002s : 8: predicate.reduce_eliminate 2.30% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 6: predicate.remove_not_recompute_node 1.25% : 0.000002s : 14: predicate.replace_applicator 0.65% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 8: predicate.reshape_eliminate 0.65% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.92% : 0.000001s : 6: predicate.same_eliminate 0.54% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.94% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 2.11% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.82% : 0.000007s : 38: predicate.switch_simplify 0.90% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.56% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.50% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.77% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.90% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000271 7 38.78% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.22% : 0.000166s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028664 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.54% : 0.003020s : 1: add_attr 10.51% : 0.003011s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000065s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.85% : 0.000532s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.52% : 0.000436s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.81% : 0.000518s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.91% : 0.000835s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.00% : 0.002007s : 1: opt_a 0.34% : 0.000099s : 1: opt_after_cconv 1.64% : 0.000469s : 1: opt_after_jit_grad 0.68% : 0.000195s : 1: opt_b 13.74% : 0.003939s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.70% : 0.000202s : 1: renormalize.infer 0.61% : 0.000174s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.19% : 0.000056s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 21.06% : 0.006036s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 21.08% : 0.006044s : 1: type_inference 0.21% : 0.000060s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x8-kbk],max_mem:10.0M . TotalTime = 0.95291, [24] [bootstrap]: 0.00058823 [type_inference]: 0.00656205 [event_method]: 1.428e-05 [auto_monad]: 6.149e-05 [graph_reusing]: 5.87999e-06 [inline]: 2.28002e-06 [add_attr]: 0.00370322, [1] [add_attr_with_inline]: 0.00369152, [1] [Cycle 1]: 5.275e-05, [2] [tag_attr]: 1.563e-05 [meta_addattr_fg_expand]: 4.51002e-06 [parallel-infer-symbol]: 3.70998e-06 [pre_auto_parallel]: 2.868e-05 [insert-virtual-dataset]: 2.89001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00431304, [53] [py_interpret_to_execute]: 2.339e-05 [rewriter_before_opt_a]: 6.428e-05 [opt_a]: 0.0022824, [2] [Cycle 1]: 0.00165024, [45] [expand_dump_flag]: 3.04001e-06 [switch_simplify]: 3.304e-05 [loop_unroll]: 2.114e-05 [a_1]: 0.00045772 [with_stream_mark]: 1.559e-05 [recompute_prepare]: 9.11002e-06 [updatestate_depend_eliminate]: 4.05e-06 [updatestate_assign_eliminate]: 3.24001e-06 [updatestate_loads_eliminate]: 3.18998e-06 [parameter_eliminate]: 2.32001e-06 [a_2]: 8.288e-05 [accelerated_algorithm]: 6.61999e-06 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 6.32001e-06 [merge_send_recv]: 8.38999e-06 [auto_parallel]: 6.72002e-06 [parallel]: 2.583e-05 [flash_sp]: 7.65e-06 [merge_comm]: 3.81999e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 1.031e-05 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 8.45999e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 5.90002e-06 [virtual_output]: 5.87001e-06 [merge_forward]: 4.22998e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.79999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.25e-05 
[merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 1.022e-05 [set_forward_comm_id_for_comm_node_pass]: 3.61001e-06 [meta_fg_expand]: 2.96999e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 2.24001e-06 [after_resolve]: 9.82999e-06 [a_after_grad]: 9.89999e-06 [renormalize]: 0.0004915 [add_forward_monad_depend]: 9.00999e-06 [auto_monad_grad]: 2.31e-06 [auto_monad_eliminator]: 1.563e-05 [cse]: 2.915e-05 [a_3]: 4.338e-05 [Cycle 2]: 0.00062167, [45] [expand_dump_flag]: 1.29998e-06 [switch_simplify]: 7.65e-06 [loop_unroll]: 5.67999e-06 [a_1]: 0.00011672 [with_stream_mark]: 1.165e-05 [recompute_prepare]: 6.33e-06 [updatestate_depend_eliminate]: 3.38999e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.84001e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 7.284e-05 [accelerated_algorithm]: 5.88998e-06 [shard]: 1.44998e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.83002e-06 [merge_send_recv]: 5.08002e-06 [auto_parallel]: 6.01e-06 [parallel]: 4.99e-06 [flash_sp]: 3.13e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 2.98e-06 [matmul_add_comm_reduction]: 6.07999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 6.10002e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.35001e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.89001e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 7e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.091e-05 [merge_recompute_call_nodes]: 9.39996e-07 [before_grad]: 8.77999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.41998e-06 [after_resolve]: 8.48999e-06 [a_after_grad]: 7.77e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.45001e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 6.98998e-06 [cse]: 1.493e-05 [a_3]: 3.292e-05 [py_interpret_to_execute_after_opt_a]: 9.31e-06 
[slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.441e-05 [convert_after_rewriter]: 6.86001e-06 [order_py_execute_after_rewriter]: 5.09003e-06 [mutable_eliminate]: 0.00050193 [opt_b]: 0.00019541, [1] [Cycle 1]: 0.00018839, [7] [b_1]: 0.00011268 [b_2]: 7.36001e-06 [updatestate_depend_eliminate]: 6.51e-06 [updatestate_assign_eliminate]: 2.87002e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 2.50002e-07 [cse]: 2.003e-05 [optimize_parallel_all_gather_comm]: 6.105e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.643e-05 [loop_unroll]: 0.00043474 [opt_after_cconv]: 0.00010018, [1] [Cycle 1]: 9.426e-05, [7] [c_1]: 2.627e-05 [parameter_eliminate]: 3.11999e-06 [updatestate_depend_eliminate]: 5.75001e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.836e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.543e-05 [tuple_transform]: 7.092e-05, [1] [Cycle 1]: 6.642e-05, [4] [d_1]: 3.818e-05 [none_parameter_eliminate]: 1.96e-06 [renormalize]: 2.9002e-07 [switch_simplify]: 6.64999e-06 [partial_unused_args_eliminate]: 2.02999e-06 [add_recomputation]: 4.99e-05 [cse_after_recomputation]: 2.253e-05, [1] [Cycle 1]: 1.725e-05, [1] [cse]: 1.167e-05 [environ_conv]: 7.42002e-06 [swap_dp_allreduce_reducescatter]: 5.39998e-06 [bias_add_comm_swap]: 2.49001e-06 [label_micro_interleaved_index]: 4.22998e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.93e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.37e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.75001e-06 [control_data_broadcast_order]: 1.288e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.83999e-06 [overlap_recompute_and_grad_model_parallel]: 4.83001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.72999e-06 [overlap_recompute_comm]: 2.52001e-06 [overlap_grad_ring_attention]: 4.39002e-06 [overlap_grad_flash_sp]: 1.876e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.44001e-06 [split_layernorm_comm]: 2.24001e-06 [handle_group_info]: 1.06002e-06 [symbol_engine_optimizer]: 7.391e-05, [1] [Cycle 1]: 6.951e-05, [6] [build]: 2.99999e-06 [elim_shapecalc]: 9.66998e-06 [elim_not_effective]: 1.245e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 9.34e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.94999e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.601e-05 [get_jit_bprop_graph]: 1.09003e-06 [rewriter_after_jit_bprop_graph]: 4.43001e-06 [opt_after_jit_grad]: 0.00048711 [validate]: 3.717e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.936839 [execute]: 9.51e-06 Sums bootstrap : 0.000588s : 0.06% type_inference : 0.006562s : 0.69% event_method : 0.000014s : 0.00% auto_monad : 0.000061s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.00% optimize.rewriter_before_opt_a : 0.000064s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000574s : 0.06% 
optimize.opt_a.with_stream_mark : 0.000027s : 0.00% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000156s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000031s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000492s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000010s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000044s : 0.00% optimize.opt_a.a_3 : 0.000076s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000502s : 0.05% optimize.opt_b.b_1 : 0.000113s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000061s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000435s : 0.05% optimize.opt_after_cconv.c_1 : 0.000026s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv 
: 0.000007s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000487s : 0.05% validate : 0.000037s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.936839s : 98.81% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000181 26 20.15% : 0.000036s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 2.76% : 0.000005s : 3: substitution.graph_param_transform 63.70% : 0.000115s : 3: substitution.inline 1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.71% : 0.000005s : 4: substitution.remove_not_recompute_node 1.75% : 0.000003s : 2: substitution.replace_old_param 5.37% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006507 2 90.64% : 0.005898s : 1: type_inference.infer 9.36% : 0.000609s : 1: type_inference.specialize ------[replace.] 0.000038 4 77.48% : 0.000029s : 3: replace.inline 22.52% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 4 92.72% : 0.000113s : 3: match.inline 7.28% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.95% : 0.000002s : 9: predicate.accumulaten_eliminater 1.03% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 15: predicate.arithmetic_simplify 0.97% : 0.000002s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.58% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.94% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.37% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_depend_swap 1.72% : 0.000003s : 18: predicate.environ_get_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.32% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.81% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.69% : 0.000001s : 6: predicate.get_grad_eliminate 0.37% : 0.000001s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.48% : 0.000010s : 40: predicate.inline 1.10% : 0.000002s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 25: predicate.load_eliminater 1.02% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.56% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 3: predicate.mutable_eliminate 0.33% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.61% : 0.000003s : 13: predicate.partial_defer_inline 1.48% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.40% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 16: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.49% : 0.000001s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.59% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 3: predicate.row_tensor_eliminate 0.98% : 0.000002s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 0.69% : 0.000001s : 6: predicate.special_op_eliminate 0.77% : 0.000001s : 6: predicate.specialize_transform 0.86% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 13: predicate.switch_defer_inline 1.89% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.12% : 0.000008s : 43: predicate.switch_simplify 0.85% : 
0.000001s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.60% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.99% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000363 8 46.30% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.70% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.962519 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.39% : 0.003708s : 1: add_attr 0.38% : 0.003695s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000067s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000626s : 1: bootstrap 0.00% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000011s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000443s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.05% : 0.000512s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.10% : 0.000957s : 78: opt.transform.opt_a 0.00% : 0.000025s : 1: opt.transform.opt_after_cconv 0.00% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000091s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.24% : 0.002285s : 1: opt_a 0.01% : 0.000104s : 1: opt_after_cconv 0.05% : 0.000497s : 1: opt_after_jit_grad 0.02% : 0.000199s : 1: opt_b 0.45% : 0.004317s : 1: optimize 0.01% : 0.000065s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000028s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000019s : 1: remove_dup_value 0.03% : 0.000257s : 1: renormalize.infer 0.02% : 0.000227s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000038s : 1: rewriter_after_opt_a 0.01% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000077s : 1: symbol_engine_optimizer 97.33% : 0.936863s : 1: task_emit 0.01% : 0.000074s : 1: 
tuple_transform 0.68% : 0.006579s : 1: type_inference 0.01% : 0.000061s : 1: validate TotalTime = 0.0627794, [24] [bootstrap]: 0.00043333 [type_inference]: 0.00915849 [event_method]: 1.638e-05 [auto_monad]: 6.512e-05 [graph_reusing]: 6.11998e-06 [inline]: 5.27001e-06 [add_attr]: 0.00348666, [1] [add_attr_with_inline]: 0.0034769, [1] [Cycle 1]: 5.642e-05, [2] [tag_attr]: 1.523e-05 [meta_addattr_fg_expand]: 3.65003e-06 [parallel-infer-symbol]: 4.10998e-06 [pre_auto_parallel]: 2.853e-05 [insert-virtual-dataset]: 2.68998e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.89999e-06 [optimize]: 0.00437684, [53] [py_interpret_to_execute]: 2.188e-05 [rewriter_before_opt_a]: 5.354e-05 [opt_a]: 0.0022843, [2] [Cycle 1]: 0.00164697, [45] [expand_dump_flag]: 3.10002e-06 [switch_simplify]: 3.099e-05 [loop_unroll]: 1.744e-05 [a_1]: 0.00042688 [with_stream_mark]: 1.761e-05 [recompute_prepare]: 8.80001e-06 [updatestate_depend_eliminate]: 4.48999e-06 [updatestate_assign_eliminate]: 3.46001e-06 [updatestate_loads_eliminate]: 3.78999e-06 [parameter_eliminate]: 2.19001e-06 [a_2]: 8.565e-05 [accelerated_algorithm]: 7.55003e-06 [shard]: 2.36998e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 6.25002e-06 [merge_send_recv]: 9.26002e-06 [auto_parallel]: 7.18e-06 [parallel]: 2.174e-05 [flash_sp]: 1.004e-05 [merge_comm]: 4.60001e-06 [allreduce_fusion]: 4.14002e-06 [matmul_add_comm_reduction]: 1.128e-05 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 8.93002e-06 [virtual_dataset]: 7.03e-06 [get_grad_eliminate_]: 5.51998e-06 [virtual_output]: 6.36e-06 [merge_forward]: 4.80999e-06 [cell_reuse_recompute_pass]: 1.69998e-06 [offload_activation]: 1.014e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.461e-05 [merge_recompute_call_nodes]: 2.03002e-06 [before_grad]: 1.101e-05 [set_forward_comm_id_for_comm_node_pass]: 4.15e-06 [meta_fg_expand]: 3.01001e-06 [flash_sp_send_recv_attached]: 2.95998e-06 [receive_attached]: 
2.58e-06 [after_resolve]: 9.89001e-06 [a_after_grad]: 8.87e-06 [renormalize]: 0.00050058 [add_forward_monad_depend]: 5.49e-06 [auto_monad_grad]: 2.09e-06 [auto_monad_eliminator]: 1.543e-05 [cse]: 2.945e-05 [a_3]: 4.391e-05 [Cycle 2]: 0.00062734, [45] [expand_dump_flag]: 1.34998e-06 [switch_simplify]: 7.93001e-06 [loop_unroll]: 5.96e-06 [a_1]: 0.00011791 [with_stream_mark]: 1.229e-05 [recompute_prepare]: 6.09001e-06 [updatestate_depend_eliminate]: 3.32002e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.75002e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.281e-05 [accelerated_algorithm]: 6.15002e-06 [shard]: 1.19998e-06 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 4.89e-06 [auto_parallel]: 5.86998e-06 [parallel]: 4.02002e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.70998e-06 [allreduce_fusion]: 2.92002e-06 [matmul_add_comm_reduction]: 5.90002e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 6.47001e-06 [virtual_dataset]: 5.56998e-06 [get_grad_eliminate_]: 5.27999e-06 [virtual_output]: 5.22e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.68002e-06 [offload_activation]: 6.84001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.171e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 8.80001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 1.99e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.40999e-06 [after_resolve]: 8.61002e-06 [a_after_grad]: 7.8e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 7.35e-06 [cse]: 1.45e-05 [a_3]: 3.399e-05 [py_interpret_to_execute_after_opt_a]: 9.22001e-06 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 3.727e-05 [convert_after_rewriter]: 6.91999e-06 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.00053135 [opt_b]: 0.00019603, [1] [Cycle 1]: 0.00018848, 
[7] [b_1]: 0.00011464 [b_2]: 7.97e-06 [updatestate_depend_eliminate]: 6.63e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 4.60015e-07 [cse]: 1.853e-05 [optimize_parallel_all_gather_comm]: 1.758e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.452e-05 [loop_unroll]: 0.00046371 [opt_after_cconv]: 0.00010111, [1] [Cycle 1]: 9.458e-05, [7] [c_1]: 2.64e-05 [parameter_eliminate]: 3.31999e-06 [updatestate_depend_eliminate]: 5.86e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.33002e-06 [cse]: 1.887e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.515e-05 [tuple_transform]: 7.016e-05, [1] [Cycle 1]: 6.539e-05, [4] [d_1]: 3.832e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.48e-06 [partial_unused_args_eliminate]: 2.04999e-06 [add_recomputation]: 5.215e-05 [cse_after_recomputation]: 2.287e-05, [1] [Cycle 1]: 1.835e-05, [1] [cse]: 1.255e-05 [environ_conv]: 6.76e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.48001e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.36998e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 8.10018e-07 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.30001e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.43002e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.314e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.79002e-06 [overlap_recompute_and_grad_model_parallel]: 5.09003e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 4.44998e-06 [overlap_grad_flash_sp]: 1.913e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 0.00012768, [1] [Cycle 1]: 0.0001233, [6] [build]: 2.78998e-06 [elim_shapecalc]: 6.135e-05 [elim_not_effective]: 1.299e-05 [opt_reshape]: 6.59999e-06 [fold_const_symbol]: 9.34e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.88002e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.655e-05 [get_jit_bprop_graph]: 1.19e-06 [rewriter_after_jit_bprop_graph]: 4.66002e-06 [opt_after_jit_grad]: 0.00051578 [validate]: 4.302e-05 [backend_pass]: 1.13001e-06 [task_emit]: 0.0443664 [execute]: 7.87e-06 Sums bootstrap : 0.000433s : 0.74% type_inference : 0.009158s : 15.73% event_method : 0.000016s : 0.03% auto_monad : 0.000065s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000005s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000029s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.04% optimize.rewriter_before_opt_a : 0.000054s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000545s : 0.94% optimize.opt_a.with_stream_mark : 0.000030s : 0.05% optimize.opt_a.recompute_prepare : 0.000015s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000158s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.02% optimize.opt_a.auto_parallel : 0.000013s : 0.02% optimize.opt_a.parallel : 0.000026s : 0.04% optimize.opt_a.flash_sp : 0.000014s : 0.02% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.03% optimize.opt_a.virtual_dataset : 0.000013s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000501s : 0.86% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.04% optimize.opt_a.cse : 0.000044s : 0.08% optimize.opt_a.a_3 : 0.000078s : 0.13% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000531s : 0.91% optimize.opt_b.b_1 : 0.000115s : 0.20% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.04% optimize.loop_unroll : 0.000464s : 0.80% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000038s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.09% optimize.cse_after_recomputation.cse : 0.000013s : 0.02% optimize.environ_conv : 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000061s : 0.11% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000516s : 0.89% validate : 0.000043s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.044366s : 76.20% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000218 24 15.18% : 0.000033s : 4: substitution.arithmetic_simplify 0.87% : 0.000002s : 2: substitution.elim_not_effective 0.62% : 0.000001s : 2: substitution.fold_const_symbol 2.40% : 0.000005s : 3: substitution.graph_param_transform 75.55% : 0.000164s : 3: substitution.inline 1.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.22% : 0.000005s : 4: substitution.remove_not_recompute_node 1.70% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.009088 2 93.79% : 0.008524s : 1: type_inference.infer 6.21% : 0.000564s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000162 3 100.00% : 0.000162s : 3: match.inline ------[predicate.] 
0.000154 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 1.11% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.82% : 0.000001s : 8: predicate.addn_zero_filter 0.76% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 14: predicate.arithmetic_simplify 0.82% : 0.000001s : 8: predicate.cast_eliminate 0.77% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.26% : 0.000000s : 3: predicate.const_output_eliminate 0.72% : 0.000001s : 6: predicate.depend_value_elim 0.80% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.86% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 11: predicate.environ_get_depend_swap 1.72% : 0.000003s : 17: predicate.environ_get_eliminate 1.05% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.12% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.86% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.80% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.37% : 0.000010s : 37: predicate.inline 0.88% : 0.000001s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 6: predicate.less_batch_normalization 1.57% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 22: predicate.load_eliminater 1.28% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.92% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 8: predicate.minmaximum_grad 1.60% : 0.000002s : 3: predicate.mutable_eliminate 0.44% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.65% : 0.000003s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 8: predicate.reduce_eliminate 2.22% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000001s : 6: predicate.remove_not_recompute_node 1.30% : 0.000002s : 14: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.43% : 0.000001s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 1.17% : 0.000002s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.15% : 0.000002s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 1.14% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.19% : 0.000002s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.80% : 0.000001s : 8: predicate.transpose_eliminate 1.48% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.91% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.76% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000357 7 35.44% : 0.000127s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.56% : 0.000231s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072230 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.83% : 0.003492s : 1: add_attr 4.82% : 0.003481s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000071s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000469s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.03% : 0.000023s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000009s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.66% : 0.000474s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000542s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000016s : 1: opt.transform.mutable_eliminate 1.29% : 0.000929s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000094s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000035s : 4: opt.transform.symbol_engine_opt 3.17% : 0.002287s : 1: opt_a 0.14% : 0.000105s : 1: opt_after_cconv 0.73% : 0.000527s : 1: opt_after_jit_grad 0.28% : 0.000199s : 1: opt_b 6.07% : 0.004381s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000033s : 1: pre_auto_parallel 0.04% : 0.000026s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.37% : 0.000270s : 1: renormalize.infer 0.31% : 0.000223s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000042s : 1: rewriter_after_opt_a 0.08% : 0.000058s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.18% : 0.000131s : 1: symbol_engine_optimizer 61.45% : 0.044388s : 1: task_emit 0.10% : 0.000073s : 1: 
tuple_transform 12.71% : 0.009181s : 1: type_inference 0.10% : 0.000070s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x8-ge],max_mem:10.0M
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x9-pynative],max_mem:10.0M
TotalTime = 0.0233246, [24] [bootstrap]: 0.00056233 [type_inference]: 0.00662757 [event_method]: 1.539e-05 [auto_monad]: 6.248e-05 [graph_reusing]: 5.75001e-06 [inline]: 2.08002e-06 [add_attr]: 0.00369239, [1] [add_attr_with_inline]: 0.00368083, [1] [Cycle 1]: 5.434e-05, [2] [tag_attr]: 1.593e-05 [meta_addattr_fg_expand]: 4.50001e-06 [parallel-infer-symbol]: 4.30999e-06 [pre_auto_parallel]: 2.849e-05 [insert-virtual-dataset]: 2.89999e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 2.41998e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00444977, [53] [py_interpret_to_execute]: 2.428e-05 [rewriter_before_opt_a]: 6.488e-05 [opt_a]: 0.00241268, [2] [Cycle 1]: 0.00171841, [45] [expand_dump_flag]: 3.14001e-06 [switch_simplify]: 3.395e-05 [loop_unroll]: 2.073e-05 [a_1]: 0.00046471 [with_stream_mark]: 1.501e-05 [recompute_prepare]: 8.49002e-06 [updatestate_depend_eliminate]: 4.13001e-06 [updatestate_assign_eliminate]: 3.7e-06 [updatestate_loads_eliminate]: 2.83998e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 8.26e-05 [accelerated_algorithm]: 6.89999e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.84998e-06 [shard_inline]: 6.59001e-06 [merge_send_recv]: 8.48999e-06 [auto_parallel]: 6.45002e-06 [parallel]: 2.518e-05 [flash_sp]: 8.07e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 4.36002e-06 [matmul_add_comm_reduction]: 9.97001e-06 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 7.80998e-06 [virtual_dataset]: 6.24001e-06 [get_grad_eliminate_]: 5.70001e-06 
[virtual_output]: 6.00002e-06 [merge_forward]: 4.13999e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.97001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.243e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 1.068e-05 [set_forward_comm_id_for_comm_node_pass]: 4.63001e-06 [meta_fg_expand]: 2.90002e-06 [flash_sp_send_recv_attached]: 2.94999e-06 [receive_attached]: 2.23998e-06 [after_resolve]: 1.254e-05 [a_after_grad]: 9.05001e-06 [renormalize]: 0.00053884 [add_forward_monad_depend]: 9.74e-06 [auto_monad_grad]: 2.15002e-06 [auto_monad_eliminator]: 1.551e-05 [cse]: 3.109e-05 [a_3]: 4.437e-05 [Cycle 2]: 0.00068277, [45] [expand_dump_flag]: 1.40999e-06 [switch_simplify]: 7.46001e-06 [loop_unroll]: 5.87999e-06 [a_1]: 0.00017209 [with_stream_mark]: 1.21e-05 [recompute_prepare]: 6.42001e-06 [updatestate_depend_eliminate]: 3.25e-06 [updatestate_assign_eliminate]: 2.88e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 7.375e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 6.06998e-06 [merge_send_recv]: 4.97e-06 [auto_parallel]: 6.00002e-06 [parallel]: 4.62998e-06 [flash_sp]: 3.23e-06 [merge_comm]: 3.23998e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.82001e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 6.52001e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.29998e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.58002e-06 [offload_activation]: 6.29001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 8.79983e-07 [before_grad]: 9.56998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28998e-06 [meta_fg_expand]: 2.19999e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.19e-06 [after_resolve]: 8.88002e-06 [a_after_grad]: 8.00999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 
1.67001e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 8.43999e-06 [cse]: 1.485e-05 [a_3]: 3.341e-05 [py_interpret_to_execute_after_opt_a]: 9.34e-06 [slice_cell_reuse_recomputed_activation]: 1.94999e-06 [rewriter_after_opt_a]: 3.644e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 4.80999e-06 [mutable_eliminate]: 0.00051787 [opt_b]: 0.00019982, [1] [Cycle 1]: 0.00019219, [7] [b_1]: 0.0001138 [b_2]: 7.97e-06 [updatestate_depend_eliminate]: 6.49001e-06 [updatestate_assign_eliminate]: 3.20002e-06 [updatestate_loads_eliminate]: 2.48002e-06 [renormalize]: 3.00002e-07 [cse]: 2.122e-05 [optimize_parallel_all_gather_comm]: 1.807e-05 [overlap_param_gather]: 2.26e-06 [cconv]: 2.584e-05 [loop_unroll]: 0.00045147 [opt_after_cconv]: 0.00010219, [1] [Cycle 1]: 9.59e-05, [7] [c_1]: 2.7e-05 [parameter_eliminate]: 3.26001e-06 [updatestate_depend_eliminate]: 5.99e-06 [updatestate_assign_eliminate]: 2.89999e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.864e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.578e-05 [tuple_transform]: 7.169e-05, [1] [Cycle 1]: 6.704e-05, [4] [d_1]: 3.981e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.78e-06 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 5.204e-05 [cse_after_recomputation]: 2.171e-05, [1] [Cycle 1]: 1.712e-05, [1] [cse]: 1.17e-05 [environ_conv]: 9.07999e-06 [swap_dp_allreduce_reducescatter]: 5.44e-06 [bias_add_comm_swap]: 3.25e-06 [label_micro_interleaved_index]: 4.65001e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.34003e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.51998e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.57001e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.21002e-06 
[interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.316e-05 [grouped_pairwise_exchange_alltoall]: 1.85001e-06 [offloading_packed_experts]: 4.08001e-06 [overlap_recompute_and_grad_model_parallel]: 4.71002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.07998e-06 [overlap_grad_flash_sp]: 1.927e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 7.503e-05, [1] [Cycle 1]: 7.057e-05, [6] [build]: 3.19001e-06 [elim_shapecalc]: 1.06e-05 [elim_not_effective]: 1.229e-05 [opt_reshape]: 6.39001e-06 [fold_const_symbol]: 9.68997e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.714e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 4.28001e-06 [opt_after_jit_grad]: 0.00050555 [validate]: 4e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.00706188 [execute]: 8e-06 Sums bootstrap : 0.000562s : 3.03% type_inference : 0.006628s : 35.67% event_method : 0.000015s : 0.08% auto_monad : 0.000062s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000028s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.13% optimize.rewriter_before_opt_a : 0.000065s : 0.35% optimize.opt_a.expand_dump_flag : 
0.000005s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000637s : 3.43% optimize.opt_a.with_stream_mark : 0.000027s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000156s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000021s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000539s : 2.90% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.13% optimize.opt_a.cse : 0.000046s : 0.25% optimize.opt_a.a_3 : 0.000078s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000518s : 2.79% optimize.opt_b.b_1 : 0.000114s : 0.61% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.14% optimize.loop_unroll : 0.000451s : 2.43% optimize.opt_after_cconv.c_1 : 0.000027s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000009s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000506s : 2.72% validate : 0.000040s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.007062s : 38.01% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000184 26 17.98% : 0.000033s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.24% : 0.000006s : 3: substitution.graph_param_transform 65.91% : 0.000121s : 3: substitution.inline 1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000005s : 4: substitution.remove_not_recompute_node 1.92% : 0.000004s : 2: substitution.replace_old_param 4.67% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006576 2 90.14% : 0.005928s : 1: type_inference.infer 9.86% : 0.000648s : 1: type_inference.specialize ------[replace.] 0.000037 4 78.59% : 0.000029s : 3: replace.inline 21.41% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000127 4 93.89% : 0.000119s : 3: match.inline 6.11% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000167 883 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 1.33% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000002s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.17% : 0.000004s : 15: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.61% : 0.000001s : 6: predicate.check_bprop_eliminate 0.55% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.59% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.44% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.47% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.33% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_depend_swap 1.83% : 0.000003s : 18: predicate.environ_get_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.41% : 0.000004s : 13: predicate.float_depend_g_call 0.53% : 0.000001s : 6: predicate.float_environ_get_switch 0.80% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.68% : 0.000001s : 6: predicate.get_grad_eliminate 0.33% : 0.000001s : 3: predicate.graph_param_transform 0.62% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.60% : 0.000011s : 40: predicate.inline 0.85% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 6: predicate.less_batch_normalization 1.74% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.31% : 0.000004s : 25: predicate.load_eliminater 1.34% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.11% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.49% : 0.000001s : 3: predicate.parallel_virtual_node 1.51% : 0.000003s : 13: predicate.partial_defer_inline 1.38% : 0.000002s : 13: predicate.partial_eliminate 0.86% : 0.000001s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.14% : 0.000002s : 9: predicate.reduce_eliminate 2.28% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 16: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.40% : 0.000001s : 3: predicate.reset_defer_inline 0.92% : 0.000002s : 9: predicate.reshape_eliminate 0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.95% : 0.000002s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000002s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.69% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 13: predicate.switch_defer_inline 1.91% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 43: predicate.switch_simplify 0.86% : 
0.000001s : 9: predicate.tile_eliminate 0.94% : 0.000002s : 9: predicate.transpose_eliminate 1.55% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.61% : 0.000003s : 15: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.20% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.00% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 3: predicate.value_based_eliminate 0.72% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000404 8 44.88% : 0.000181s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.12% : 0.000223s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.033187 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.14% : 0.003698s : 1: add_attr 11.10% : 0.003684s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000069s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.82% : 0.000604s : 1: bootstrap 0.09% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000012s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.39% : 0.000461s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.59% : 0.000528s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 3.09% : 0.001024s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000093s : 28: opt.transform.opt_b 0.13% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000035s : 4: opt.transform.symbol_engine_opt 7.28% : 0.002416s : 1: opt_a 0.32% : 0.000106s : 1: opt_after_cconv 1.55% : 0.000516s : 1: opt_after_jit_grad 0.61% : 0.000203s : 1: opt_b 13.42% : 0.004454s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000007s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.09% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.85% : 0.000283s : 1: renormalize.infer 0.75% : 0.000248s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000040s : 1: rewriter_after_opt_a 0.21% : 0.000070s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000078s : 1: symbol_engine_optimizer 21.32% : 0.007075s : 1: task_emit 0.22% : 0.000075s : 1: 
tuple_transform 20.02% : 0.006645s : 1: type_inference 0.21% : 0.000071s : 1: validate TotalTime = 0.0223384, [24] [bootstrap]: 0.00050863 [type_inference]: 0.00648767 [event_method]: 1.358e-05 [auto_monad]: 6.226e-05 [graph_reusing]: 5.69e-06 [inline]: 2.44001e-06 [add_attr]: 0.00328604, [1] [add_attr_with_inline]: 0.00327607, [1] [Cycle 1]: 6.251e-05, [2] [tag_attr]: 1.582e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 4.49998e-06 [pre_auto_parallel]: 2.713e-05 [insert-virtual-dataset]: 3.01999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.96e-06 [optimize]: 0.0044208, [53] [py_interpret_to_execute]: 2.449e-05 [rewriter_before_opt_a]: 5.467e-05 [opt_a]: 0.00233934, [2] [Cycle 1]: 0.0016915, [45] [expand_dump_flag]: 2.95998e-06 [switch_simplify]: 2.998e-05 [loop_unroll]: 1.809e-05 [a_1]: 0.00036616 [with_stream_mark]: 1.784e-05 [recompute_prepare]: 1.038e-05 [updatestate_depend_eliminate]: 4.28001e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 3.58e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 8.485e-05 [accelerated_algorithm]: 7.65e-06 [shard]: 2.51e-06 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 6.67002e-06 [merge_send_recv]: 9.11002e-06 [auto_parallel]: 7.23e-06 [parallel]: 1.894e-05 [flash_sp]: 8.94e-06 [merge_comm]: 4.02e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 1.014e-05 [allreduce_slice_to_reducescatter]: 8.2e-07 [virtual_shard_identity]: 1.016e-05 [virtual_dataset]: 6.78e-06 [get_grad_eliminate_]: 6.04999e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 4.32e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 1.032e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.264e-05 [merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 1.069e-05 [set_forward_comm_id_for_comm_node_pass]: 4.64998e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 3.51999e-06 [receive_attached]: 2.27999e-06 [after_resolve]: 1.05e-05 
[a_after_grad]: 8.82e-06 [renormalize]: 0.00061071 [add_forward_monad_depend]: 5.22e-06 [auto_monad_grad]: 2.22999e-06 [auto_monad_eliminator]: 1.569e-05 [cse]: 2.967e-05 [a_3]: 4.494e-05 [Cycle 2]: 0.00063684, [45] [expand_dump_flag]: 1.72999e-06 [switch_simplify]: 8.17003e-06 [loop_unroll]: 5.81e-06 [a_1]: 0.00011875 [with_stream_mark]: 1.182e-05 [recompute_prepare]: 6.34999e-06 [updatestate_depend_eliminate]: 3.35e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.87002e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 7.25e-05 [accelerated_algorithm]: 6.12999e-06 [shard]: 1.79e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 5.49e-06 [auto_parallel]: 6.80998e-06 [parallel]: 5.03002e-06 [flash_sp]: 3.38e-06 [merge_comm]: 4.2e-06 [allreduce_fusion]: 2.96001e-06 [matmul_add_comm_reduction]: 7.39002e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 6.91999e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 5.32999e-06 [virtual_output]: 5.09e-06 [merge_forward]: 3.23e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 7.33999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.085e-05 [merge_recompute_call_nodes]: 1.07e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 1.99e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.14998e-06 [after_resolve]: 1.011e-05 [a_after_grad]: 7.95e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.79998e-06 [auto_monad_grad]: 1.15999e-06 [auto_monad_eliminator]: 7.95e-06 [cse]: 1.445e-05 [a_3]: 3.417e-05 [py_interpret_to_execute_after_opt_a]: 1.093e-05 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.571e-05 [convert_after_rewriter]: 7.58999e-06 [order_py_execute_after_rewriter]: 5.23002e-06 [mutable_eliminate]: 0.00053929 [opt_b]: 0.00020133, [1] [Cycle 1]: 0.00019317, [7] [b_1]: 0.00011421 [b_2]: 7.76001e-06 
[updatestate_depend_eliminate]: 7.43e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.64999e-06 [renormalize]: 3.19997e-07 [cse]: 2.172e-05 [optimize_parallel_all_gather_comm]: 1.8e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.768e-05 [loop_unroll]: 0.00047315 [opt_after_cconv]: 0.00010572, [1] [Cycle 1]: 9.879e-05, [7] [c_1]: 2.619e-05 [parameter_eliminate]: 3.2e-06 [updatestate_depend_eliminate]: 6.33998e-06 [updatestate_assign_eliminate]: 3.67998e-06 [updatestate_loads_eliminate]: 2.59001e-06 [cse]: 2.048e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.715e-05 [tuple_transform]: 7.532e-05, [1] [Cycle 1]: 7.103e-05, [4] [d_1]: 4.252e-05 [none_parameter_eliminate]: 1.97999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.85002e-06 [partial_unused_args_eliminate]: 1.81998e-06 [add_recomputation]: 4.705e-05 [cse_after_recomputation]: 2.301e-05, [1] [Cycle 1]: 1.712e-05, [1] [cse]: 1.164e-05 [environ_conv]: 5.24998e-06 [swap_dp_allreduce_reducescatter]: 5.15001e-06 [bias_add_comm_swap]: 2.96999e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.60999e-06 [ForceFp32Comm]: 7.99977e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.71999e-06 [reorder_send_recv_between_fp_bp]: 2.94001e-06 [comm_op_add_attrs]: 1.44e-06 [add_comm_op_reuse_tag]: 1.19e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.322e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 4.65999e-06 [overlap_recompute_and_grad_model_parallel]: 4.92e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26002e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.42e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.60001e-06 [overlap_grad_flash_sp]: 1.931e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 7.828e-05, [1] [Cycle 1]: 7.371e-05, [6] [build]: 3.04001e-06 [elim_shapecalc]: 1.066e-05 [elim_not_effective]: 1.258e-05 [opt_reshape]: 6.63998e-06 [fold_const_symbol]: 9.69999e-06 [renormalize]: 2.50002e-07 [detach_backward]: 2.07999e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.701e-05 [get_jit_bprop_graph]: 1.65001e-06 [rewriter_after_jit_bprop_graph]: 4.87e-06 [opt_after_jit_grad]: 0.0005188 [validate]: 3.982e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00669253 [execute]: 7.92e-06 Sums bootstrap : 0.000509s : 2.83% type_inference : 0.006488s : 36.07% event_method : 0.000014s : 0.08% auto_monad : 0.000062s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.03% pre_auto_parallel : 0.000027s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.14% optimize.rewriter_before_opt_a : 0.000055s : 0.30% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000038s : 0.21% optimize.opt_a.loop_unroll : 0.000024s : 0.13% optimize.opt_a.a_1 : 0.000485s : 2.70% optimize.opt_a.with_stream_mark : 0.000030s : 0.16% optimize.opt_a.recompute_prepare : 0.000017s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000157s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.07% optimize.opt_a.merge_send_recv : 0.000015s : 0.08% optimize.opt_a.auto_parallel : 0.000014s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.13% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000018s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000020s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000611s : 3.40% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.13% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000079s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.06% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.20% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000539s : 3.00% optimize.opt_b.b_1 : 0.000114s : 0.64% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000028s : 0.15% optimize.loop_unroll : 0.000473s : 2.63% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.10% optimize.tuple_transform.d_1 : 0.000043s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.26% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000519s : 2.88% validate : 0.000040s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006693s : 37.21% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000155 24 20.45% : 0.000032s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 4.00% : 0.000006s : 3: substitution.graph_param_transform 65.49% : 0.000101s : 3: substitution.inline 2.05% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.94% : 0.000005s : 4: substitution.remove_not_recompute_node 2.74% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006439 2 92.41% : 0.005951s : 1: type_inference.infer 7.59% : 0.000489s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000099 3 100.00% : 0.000099s : 3: match.inline ------[predicate.] 
0.000153 815 0.82% : 0.000001s : 8: predicate.accumulaten_eliminater 0.94% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 14: predicate.arithmetic_simplify 0.86% : 0.000001s : 8: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.80% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 1.06% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 11: predicate.environ_get_depend_swap 1.92% : 0.000003s : 17: predicate.environ_get_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.34% : 0.000001s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.55% : 0.000010s : 37: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 6: predicate.less_batch_normalization 1.55% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 22: predicate.load_eliminater 1.83% : 0.000003s : 3: predicate.loop_unroll_after_grad 1.91% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 8: predicate.minmaximum_grad 2.22% : 0.000003s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.38% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 11: predicate.partial_eliminate 0.82% : 0.000001s : 8: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 8: predicate.reduce_eliminate 2.18% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.92% : 0.000001s : 6: predicate.remove_not_recompute_node 1.13% : 0.000002s : 14: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.26% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.53% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.14% : 0.000002s : 6: predicate.shard_identity_eliminate 0.73% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 1.19% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.22% : 0.000002s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.65% : 0.000007s : 38: predicate.switch_simplify 0.78% : 
0.000001s : 8: predicate.tile_eliminate 0.81% : 0.000001s : 8: predicate.transpose_eliminate 1.54% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.94% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000314 7 37.49% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.51% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031693 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.39% : 0.003291s : 1: add_attr 10.35% : 0.003280s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000068s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.72% : 0.000547s : 1: bootstrap 0.10% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.53% : 0.000484s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000550s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.05% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.06% : 0.000018s : 1: opt.transform.mutable_eliminate 2.75% : 0.000871s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.08% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000093s : 28: opt.transform.opt_b 0.15% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000035s : 4: opt.transform.symbol_engine_opt 7.39% : 0.002343s : 1: opt_a 0.34% : 0.000109s : 1: opt_after_cconv 1.67% : 0.000531s : 1: opt_after_jit_grad 0.65% : 0.000205s : 1: opt_b 13.96% : 0.004425s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000028s : 1: py_interpret_to_execute 0.05% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000021s : 1: remove_dup_value 1.19% : 0.000378s : 1: renormalize.infer 0.70% : 0.000223s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000040s : 1: rewriter_after_opt_a 0.19% : 0.000060s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000081s : 1: symbol_engine_optimizer 21.16% : 0.006706s : 1: task_emit 0.25% : 0.000079s : 1: 
tuple_transform 20.53% : 0.006508s : 1: type_inference 0.23% : 0.000073s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x9-kbk],max_mem:10.0M . TotalTime = 9.54384, [24] [bootstrap]: 0.00068582 [type_inference]: 0.00676389 [event_method]: 1.48e-05 [auto_monad]: 6.194e-05 [graph_reusing]: 6.44001e-06 [inline]: 1.94999e-06 [add_attr]: 0.00428168, [1] [add_attr_with_inline]: 0.00426937, [1] [Cycle 1]: 5.801e-05, [2] [tag_attr]: 1.795e-05 [meta_addattr_fg_expand]: 4.50001e-06 [parallel-infer-symbol]: 3.76999e-06 [pre_auto_parallel]: 3.051e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.00451991, [53] [py_interpret_to_execute]: 2.387e-05 [rewriter_before_opt_a]: 6.567e-05 [opt_a]: 0.00248614, [2] [Cycle 1]: 0.00185716, [45] [expand_dump_flag]: 2.81999e-06 [switch_simplify]: 3.637e-05 [loop_unroll]: 2.413e-05 [a_1]: 0.00055191 [with_stream_mark]: 1.613e-05 [recompute_prepare]: 9.41e-06 [updatestate_depend_eliminate]: 4.68999e-06 [updatestate_assign_eliminate]: 4.04002e-06 [updatestate_loads_eliminate]: 3.66001e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 9.668e-05 [accelerated_algorithm]: 7.95e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 2.02999e-06 [shard_inline]: 7.23e-06 [merge_send_recv]: 8.82e-06 [auto_parallel]: 8.27998e-06 [parallel]: 2.726e-05 [flash_sp]: 8.47998e-06 [merge_comm]: 4.25e-06 [allreduce_fusion]: 4.01001e-06 [matmul_add_comm_reduction]: 1.012e-05 [allreduce_slice_to_reducescatter]: 9.39996e-07 [virtual_shard_identity]: 8.81002e-06 [virtual_dataset]: 7.23999e-06 [get_grad_eliminate_]: 6.71999e-06 [virtual_output]: 6.49001e-06 [merge_forward]: 4.72e-06 [cell_reuse_recompute_pass]: 1.46998e-06 [offload_activation]: 1.088e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.427e-05 [merge_recompute_call_nodes]: 
1.85001e-06 [before_grad]: 1.107e-05 [set_forward_comm_id_for_comm_node_pass]: 4.4e-06 [meta_fg_expand]: 3.03998e-06 [flash_sp_send_recv_attached]: 3.21001e-06 [receive_attached]: 2.42001e-06 [after_resolve]: 1.116e-05 [a_after_grad]: 9.98002e-06 [renormalize]: 0.00054169 [add_forward_monad_depend]: 1.095e-05 [auto_monad_grad]: 2.36e-06 [auto_monad_eliminator]: 1.469e-05 [cse]: 2.915e-05 [a_3]: 4.389e-05 [Cycle 2]: 0.00061839, [45] [expand_dump_flag]: 1.32999e-06 [switch_simplify]: 7.13e-06 [loop_unroll]: 6.24999e-06 [a_1]: 0.00011731 [with_stream_mark]: 1.214e-05 [recompute_prepare]: 5.96e-06 [updatestate_depend_eliminate]: 3.11001e-06 [updatestate_assign_eliminate]: 2.99001e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 7.368e-05 [accelerated_algorithm]: 5.64998e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.33002e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 4.94003e-06 [auto_parallel]: 5.38002e-06 [parallel]: 5.04e-06 [flash_sp]: 3.37002e-06 [merge_comm]: 3.14001e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 5.86998e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.13002e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.72999e-06 [offload_activation]: 5.98002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.58001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.98002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 8.60999e-06 [a_after_grad]: 7.90998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.31002e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 7.16001e-06 [cse]: 1.438e-05 [a_3]: 3.37e-05 [py_interpret_to_execute_after_opt_a]: 9.10999e-06 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 
[rewriter_after_opt_a]: 3.356e-05 [convert_after_rewriter]: 7.20998e-06 [order_py_execute_after_rewriter]: 5.01002e-06 [mutable_eliminate]: 0.00052299 [opt_b]: 0.00019619, [1] [Cycle 1]: 0.00018897, [7] [b_1]: 0.00011441 [b_2]: 7.5e-06 [updatestate_depend_eliminate]: 6.40997e-06 [updatestate_assign_eliminate]: 2.68998e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 5.3001e-07 [cse]: 1.892e-05 [optimize_parallel_all_gather_comm]: 1.723e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.57e-05 [loop_unroll]: 0.00043789 [opt_after_cconv]: 9.704e-05, [1] [Cycle 1]: 9.116e-05, [7] [c_1]: 2.654e-05 [parameter_eliminate]: 2.86e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.698e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.534e-05 [tuple_transform]: 6.936e-05, [1] [Cycle 1]: 6.493e-05, [4] [d_1]: 3.753e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.76e-06 [partial_unused_args_eliminate]: 2.18002e-06 [add_recomputation]: 5.452e-05 [cse_after_recomputation]: 2.217e-05, [1] [Cycle 1]: 1.742e-05, [1] [cse]: 1.207e-05 [environ_conv]: 1.045e-05 [swap_dp_allreduce_reducescatter]: 5.53002e-06 [bias_add_comm_swap]: 2.48002e-06 [label_micro_interleaved_index]: 4.98001e-06 [label_fine_grained_interleaved_index]: 2.66999e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 3.06001e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.26002e-06 [interleave_parallel_branches]: 1.37999e-06 [overlap_opt_shard_in_pipeline]: 1.21997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.317e-05 
[grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 4.08001e-06 [overlap_recompute_and_grad_model_parallel]: 4.97e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 1.884e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.53e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 7.378e-05, [1] [Cycle 1]: 6.937e-05, [6] [build]: 2.91e-06 [elim_shapecalc]: 9.07999e-06 [elim_not_effective]: 1.227e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 9.97999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.95001e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.67e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.93999e-06 [opt_after_jit_grad]: 0.00048052 [validate]: 3.827e-05 [backend_pass]: 9.79984e-07 [task_emit]: 9.52669 [execute]: 8.54e-06 Sums bootstrap : 0.000686s : 0.01% type_inference : 0.006764s : 0.07% event_method : 0.000015s : 0.00% auto_monad : 0.000062s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000031s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.00% optimize.rewriter_before_opt_a : 0.000066s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000044s : 0.00% optimize.opt_a.loop_unroll : 0.000030s : 0.00% optimize.opt_a.a_1 : 0.000669s : 0.01% optimize.opt_a.with_stream_mark : 0.000028s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000170s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000014s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000542s : 0.01% optimize.opt_a.add_forward_monad_depend : 0.000012s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.00% optimize.opt_a.cse : 0.000044s : 0.00% optimize.opt_a.a_3 : 0.000078s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000523s : 0.01% optimize.opt_b.b_1 : 0.000114s : 0.00% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000438s : 0.00% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.00% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000010s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000481s : 0.01% validate : 0.000038s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 9.526688s : 99.88% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000187 26 19.74% : 0.000037s : 5: substitution.arithmetic_simplify 1.00% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000002s : 2: substitution.fold_const_symbol 2.87% : 0.000005s : 3: substitution.graph_param_transform 64.12% : 0.000120s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000005s : 4: substitution.remove_not_recompute_node 1.78% : 0.000003s : 2: substitution.replace_old_param 5.12% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006712 2 90.92% : 0.006103s : 1: type_inference.infer 9.08% : 0.000609s : 1: type_inference.specialize ------[replace.] 0.000040 4 76.49% : 0.000030s : 3: replace.inline 23.51% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000126 4 93.05% : 0.000117s : 3: match.inline 6.95% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000222 883 0.78% : 0.000002s : 9: predicate.accumulaten_eliminater 0.55% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.46% : 0.000001s : 6: predicate.addn_check_dump 0.76% : 0.000002s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 1.81% : 0.000004s : 15: predicate.arithmetic_simplify 0.67% : 0.000001s : 9: predicate.cast_eliminate 0.46% : 0.000001s : 6: predicate.check_bprop_eliminate 0.48% : 0.000001s : 6: predicate.compare_switch_simplify 0.15% : 0.000000s : 3: predicate.const_output_eliminate 0.49% : 0.000001s : 6: predicate.depend_value_elim 0.67% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.75% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.69% : 0.000002s : 9: predicate.dict_set_item_eliminator 0.85% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.18% : 0.000000s : 3: predicate.elim_not_effective 0.30% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 0.86% : 0.000002s : 12: predicate.environ_add_const_eliminate 0.83% : 0.000002s : 12: predicate.environ_get_add_eliminate 0.83% : 0.000002s : 12: predicate.environ_get_depend_swap 1.34% : 0.000003s : 18: predicate.environ_get_eliminate 0.83% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.03% : 0.000002s : 13: predicate.exchange_switch_depend_value 1.87% : 0.000004s : 13: predicate.float_depend_g_call 0.45% : 0.000001s : 6: predicate.float_environ_get_switch 0.63% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.15% : 0.000000s : 3: predicate.fold_const_symbol 0.55% : 0.000001s : 6: predicate.get_grad_eliminate 0.17% : 0.000000s : 3: predicate.graph_param_transform 0.52% : 0.000001s : 6: predicate.incorporate_call 0.44% : 0.000001s : 6: predicate.incorporate_call_switch 27.93% : 0.000062s : 40: predicate.inline 0.76% : 0.000002s : 6: predicate.inline_without_move 0.30% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.66% : 0.000001s : 6: predicate.less_batch_normalization 1.31% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 1.82% : 0.000004s : 25: predicate.load_eliminater 0.89% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.67% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.30% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.45% : 0.000001s : 6: predicate.merge_addn 0.44% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.45% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 9: predicate.minmaximum_grad 0.99% : 0.000002s : 3: predicate.mutable_eliminate 0.26% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.25% : 0.000003s : 13: predicate.partial_defer_inline 1.11% : 0.000002s : 13: predicate.partial_eliminate 0.70% : 0.000002s : 9: predicate.print_const_string_wrapper 0.50% : 0.000001s : 6: predicate.reduce_all_const_elim 0.90% : 0.000002s : 9: predicate.reduce_eliminate 1.79% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000001s : 6: predicate.remove_not_recompute_node 0.93% : 0.000002s : 16: predicate.replace_applicator 0.48% : 0.000001s : 6: predicate.replace_old_param 0.21% : 0.000000s : 3: predicate.reset_defer_inline 0.75% : 0.000002s : 9: predicate.reshape_eliminate 0.47% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.28% : 0.000001s : 3: predicate.row_tensor_eliminate 0.64% : 0.000001s : 6: predicate.same_eliminate 0.35% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.70% : 0.000002s : 6: predicate.shard_identity_eliminate 0.52% : 0.000001s : 6: predicate.special_op_eliminate 0.61% : 0.000001s : 6: predicate.specialize_transform 0.70% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.58% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.07% : 0.000002s : 13: predicate.switch_defer_inline 1.53% : 0.000003s : 19: predicate.switch_layer_defer_inline 3.89% : 0.000009s : 43: predicate.switch_simplify 0.75% : 
0.000002s : 9: predicate.tile_eliminate 0.69% : 0.000002s : 9: predicate.transpose_eliminate 1.19% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.23% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.06% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 2.45% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.24% : 0.000003s : 15: predicate.tuple_list_get_set_item_eliminator 1.80% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.23% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 1.84% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.41% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.27% : 0.000001s : 3: predicate.value_based_eliminate 0.56% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.50% : 0.000001s : 6: predicate.virtual_output_eliminate 0.21% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000380 8 44.37% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.63% : 0.000211s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
9.554404 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.04% : 0.004287s : 1: add_attr 0.04% : 0.004274s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000059s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000068s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.01% : 0.000731s : 1: bootstrap 0.00% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000014s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.00% : 0.000447s : 1: loop_unroll 0.00% : 0.000016s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.01% : 0.000533s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.01% : 0.001077s : 78: opt.transform.opt_a 0.00% : 0.000025s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000093s : 28: opt.transform.opt_b 0.00% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.03% : 0.002489s : 1: opt_a 0.00% : 0.000101s : 1: opt_after_cconv 0.01% : 0.000490s : 1: opt_after_jit_grad 0.00% : 0.000200s : 1: opt_b 0.05% : 0.004524s : 1: optimize 0.00% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000035s : 1: pre_auto_parallel 0.00% : 0.000028s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000019s : 1: remove_dup_value 0.00% : 0.000290s : 1: renormalize.infer 0.00% : 0.000244s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000038s : 1: rewriter_after_opt_a 0.00% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000077s : 1: symbol_engine_optimizer 99.71% : 9.526706s : 1: task_emit 0.00% : 0.000072s : 1: 
tuple_transform 0.07% : 0.006780s : 1: type_inference 0.00% : 0.000062s : 1: validate TotalTime = 0.0794893, [24] [bootstrap]: 0.00054033 [type_inference]: 0.00599869 [event_method]: 1.244e-05 [auto_monad]: 5.947e-05 [graph_reusing]: 5.97001e-06 [inline]: 2.01e-06 [add_attr]: 0.00308097, [1] [add_attr_with_inline]: 0.00307329, [1] [Cycle 1]: 4.835e-05, [2] [tag_attr]: 1.435e-05 [meta_addattr_fg_expand]: 3.91999e-06 [parallel-infer-symbol]: 3.01999e-06 [pre_auto_parallel]: 2.269e-05 [insert-virtual-dataset]: 2.79001e-06 [parallel-infer-symbol-second]: 6.99976e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.00397551, [53] [py_interpret_to_execute]: 1.9e-05 [rewriter_before_opt_a]: 5.256e-05 [opt_a]: 0.00203256, [2] [Cycle 1]: 0.00141645, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 2.907e-05 [loop_unroll]: 1.726e-05 [a_1]: 0.00035541 [with_stream_mark]: 1.369e-05 [recompute_prepare]: 8.37e-06 [updatestate_depend_eliminate]: 3.97002e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.39001e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 8.265e-05 [accelerated_algorithm]: 7.05e-06 [shard]: 2.17001e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 6.45002e-06 [merge_send_recv]: 8.79998e-06 [auto_parallel]: 6.12999e-06 [parallel]: 1.892e-05 [flash_sp]: 7.23999e-06 [merge_comm]: 4.02e-06 [allreduce_fusion]: 3.76001e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 7.65998e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.87999e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.38002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.176e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 1.069e-05 [set_forward_comm_id_for_comm_node_pass]: 3.71999e-06 [meta_fg_expand]: 2.73003e-06 [flash_sp_send_recv_attached]: 2.42001e-06 [receive_attached]: 1.97999e-06 
[after_resolve]: 9.98002e-06 [a_after_grad]: 9.11002e-06 [renormalize]: 0.00039934 [add_forward_monad_depend]: 4.63999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.336e-05 [cse]: 3.033e-05 [a_3]: 4.222e-05 [Cycle 2]: 0.00060686, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.88002e-06 [a_1]: 0.00011383 [with_stream_mark]: 1.228e-05 [recompute_prepare]: 6.17001e-06 [updatestate_depend_eliminate]: 2.99999e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.89999e-06 [parameter_eliminate]: 9.99979e-07 [a_2]: 7.323e-05 [accelerated_algorithm]: 5.79999e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.73002e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.64e-06 [parallel]: 4.37e-06 [flash_sp]: 3.56001e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.96001e-06 [matmul_add_comm_reduction]: 5.61e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.22999e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 6.48e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.014e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.68001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.03002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.00007e-07 [after_resolve]: 8.89e-06 [a_after_grad]: 7.87003e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.36e-06 [cse]: 1.425e-05 [a_3]: 3.307e-05 [py_interpret_to_execute_after_opt_a]: 7.79997e-06 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 3.523e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 5.22e-06 [mutable_eliminate]: 0.00052595 [opt_b]: 0.00019264, [1] [Cycle 1]: 0.00018637, [7] 
[b_1]: 0.00011308 [b_2]: 7.50998e-06 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.67001e-06 [renormalize]: 4.69998e-07 [cse]: 1.846e-05 [optimize_parallel_all_gather_comm]: 1.692e-05 [overlap_param_gather]: 2.26e-06 [cconv]: 2.262e-05 [loop_unroll]: 0.00041964 [opt_after_cconv]: 9.642e-05, [1] [Cycle 1]: 9.081e-05, [7] [c_1]: 2.56e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.81e-05 [renormalize]: 6.00005e-07 [remove_dup_value]: 1.37e-05 [tuple_transform]: 6.743e-05, [1] [Cycle 1]: 6.305e-05, [4] [d_1]: 3.657e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.79979e-07 [switch_simplify]: 6.50002e-06 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 4.378e-05 [cse_after_recomputation]: 2.169e-05, [1] [Cycle 1]: 1.702e-05, [1] [cse]: 1.181e-05 [environ_conv]: 5.00001e-06 [swap_dp_allreduce_reducescatter]: 5.56998e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.23002e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76998e-06 [control_data_broadcast_order]: 1.221e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 4.15999e-06 [overlap_recompute_and_grad_model_parallel]: 4.97e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 4.30999e-06 [overlap_grad_flash_sp]: 1.746e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 1.99e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 7.17e-05, [1] [Cycle 1]: 6.744e-05, [6] [build]: 2.10002e-06 [elim_shapecalc]: 9.16998e-06 [elim_not_effective]: 1.226e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 9.30001e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.92001e-06 [auto_monad_reorder]: 1.583e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00045596 [validate]: 3.566e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.0650361 [execute]: 1.008e-05 Sums bootstrap : 0.000540s : 0.72% type_inference : 0.005999s : 7.96% event_method : 0.000012s : 0.02% auto_monad : 0.000059s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000053s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000036s : 0.05% optimize.opt_a.loop_unroll : 0.000023s : 0.03% optimize.opt_a.a_1 : 0.000469s : 0.62% optimize.opt_a.with_stream_mark : 0.000026s : 0.03% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000156s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000399s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000045s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000526s : 0.70% optimize.opt_b.b_1 : 0.000113s : 0.15% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000420s : 0.56% optimize.opt_after_cconv.c_1 : 0.000026s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000037s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000456s : 0.60% validate : 0.000036s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.065036s : 86.26% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000140 24 20.32% : 0.000028s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 3.50% : 0.000005s : 3: substitution.graph_param_transform 65.93% : 0.000092s : 3: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.17% : 0.000004s : 4: substitution.remove_not_recompute_node 2.35% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005952 2 92.03% : 0.005478s : 1: type_inference.infer 7.97% : 0.000474s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000091 3 100.00% : 0.000091s : 3: match.inline ------[predicate.] 
0.000149 815 0.85% : 0.000001s : 8: predicate.accumulaten_eliminater 0.95% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 6: predicate.addn_check_dump 0.98% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.71% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.24% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.90% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 1.14% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.48% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.76% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.77% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.27% : 0.000009s : 37: predicate.inline 1.03% : 0.000002s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 6: predicate.less_batch_normalization 1.57% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 22: predicate.load_eliminater 1.10% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.67% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.17% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.43% : 0.000002s : 11: predicate.partial_defer_inline 1.36% : 0.000002s : 11: predicate.partial_eliminate 0.83% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 8: predicate.reduce_eliminate 2.22% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.22% : 0.000002s : 14: predicate.replace_applicator 0.67% : 0.000001s : 6: predicate.replace_old_param 0.31% : 0.000000s : 3: predicate.reset_defer_inline 0.94% : 0.000001s : 8: predicate.reshape_eliminate 0.69% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.89% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 6: predicate.shard_identity_eliminate 0.81% : 0.000001s : 6: predicate.special_op_eliminate 0.93% : 0.000001s : 6: predicate.specialize_transform 1.00% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 11: predicate.switch_defer_inline 1.93% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.90% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.87% : 0.000001s : 8: predicate.transpose_eliminate 1.56% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.36% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000305 7 38.98% : 0.000119s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.02% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087940 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.51% : 0.003086s : 1: add_attr 3.50% : 0.003077s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000065s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.66% : 0.000577s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000018s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000428s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.61% : 0.000535s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 0.96% : 0.000842s : 78: opt.transform.opt_a 0.03% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000093s : 28: opt.transform.opt_b 0.05% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.31% : 0.002035s : 1: opt_a 0.11% : 0.000100s : 1: opt_after_cconv 0.53% : 0.000466s : 1: opt_after_jit_grad 0.22% : 0.000196s : 1: opt_b 4.53% : 0.003979s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.23% : 0.000202s : 1: renormalize.infer 0.22% : 0.000191s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000039s : 1: rewriter_after_opt_a 0.06% : 0.000057s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000074s : 1: symbol_engine_optimizer 73.98% : 0.065060s : 1: task_emit 0.08% : 0.000070s : 1: 
tuple_transform 6.84% : 0.006014s : 1: type_inference 0.07% : 0.000060s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[2-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2-dtype_x9-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x0-pynative],max_mem:10.0M TotalTime = 0.0222902, [24] [bootstrap]: 0.00052024 [type_inference]: 0.00653497 [event_method]: 1.406e-05 [auto_monad]: 6.389e-05 [graph_reusing]: 5.96e-06 [inline]: 2.24001e-06 [add_attr]: 0.0036723, [1] [add_attr_with_inline]: 0.00366163, [1] [Cycle 1]: 4.798e-05, [2] [tag_attr]: 1.561e-05 [meta_addattr_fg_expand]: 4.94998e-06 [parallel-infer-symbol]: 3.05998e-06 [pre_auto_parallel]: 2.592e-05 [insert-virtual-dataset]: 3.04999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00413261, [53] [py_interpret_to_execute]: 2.169e-05 [rewriter_before_opt_a]: 6.403e-05 [opt_a]: 0.00218716, [2] [Cycle 1]: 0.00157018, [45] [expand_dump_flag]: 3.23e-06 [switch_simplify]: 3.397e-05 [loop_unroll]: 2.089e-05 [a_1]: 0.00044371 [with_stream_mark]: 1.431e-05 [recompute_prepare]: 8.36002e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.41001e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.84998e-06 [a_2]: 8.031e-05 [accelerated_algorithm]: 6.88998e-06 [shard]: 2.76999e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 6.33e-06 [merge_send_recv]: 8.54998e-06 [auto_parallel]: 6.48e-06 [parallel]: 2.429e-05 [flash_sp]: 7.59002e-06 [merge_comm]: 4.12e-06 [allreduce_fusion]: 3.95e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 8e-06 [virtual_dataset]: 6.52001e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 
5.68002e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 1.029e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.169e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 9.81998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.93003e-06 [flash_sp_send_recv_attached]: 2.46998e-06 [receive_attached]: 2.04e-06 [after_resolve]: 9.39998e-06 [a_after_grad]: 8.54002e-06 [renormalize]: 0.00044219 [add_forward_monad_depend]: 8.12e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.334e-05 [cse]: 3.085e-05 [a_3]: 4.31e-05 [Cycle 2]: 0.00060798, [45] [expand_dump_flag]: 8.90024e-07 [switch_simplify]: 7.06999e-06 [loop_unroll]: 5.82999e-06 [a_1]: 0.00011563 [with_stream_mark]: 1.001e-05 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 3.11001e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.299e-05 [accelerated_algorithm]: 5.91e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.37999e-06 [parallel]: 3.94002e-06 [flash_sp]: 3.48e-06 [merge_comm]: 3.20998e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.04e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.54999e-06 [virtual_dataset]: 5.38002e-06 [get_grad_eliminate_]: 5.30001e-06 [virtual_output]: 5.25999e-06 [merge_forward]: 3.05002e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.034e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.86002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23998e-06 [meta_fg_expand]: 1.76003e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.39996e-07 [after_resolve]: 8.40001e-06 [a_after_grad]: 7.82e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 
8.99978e-07 [auto_monad_eliminator]: 6.25002e-06 [cse]: 1.481e-05 [a_3]: 3.293e-05 [py_interpret_to_execute_after_opt_a]: 7.98999e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.08e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.34e-06 [mutable_eliminate]: 0.00046584 [opt_b]: 0.0001909, [1] [Cycle 1]: 0.00018469, [7] [b_1]: 0.00011266 [b_2]: 7.68999e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.56e-06 [renormalize]: 5.89993e-07 [cse]: 1.837e-05 [optimize_parallel_all_gather_comm]: 1.72e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 4.455e-05 [loop_unroll]: 0.00043255 [opt_after_cconv]: 9.679e-05, [1] [Cycle 1]: 9.108e-05, [7] [c_1]: 2.619e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.811e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.539e-05 [tuple_transform]: 6.869e-05, [1] [Cycle 1]: 6.424e-05, [4] [d_1]: 3.747e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.46e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.12e-05 [cse_after_recomputation]: 2.209e-05, [1] [Cycle 1]: 1.743e-05, [1] [cse]: 1.207e-05 [environ_conv]: 7.88999e-06 [swap_dp_allreduce_reducescatter]: 5.06997e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.60001e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.12e-06 
[interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.29998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 [control_data_broadcast_order]: 1.246e-05 [grouped_pairwise_exchange_alltoall]: 1.74998e-06 [offloading_packed_experts]: 3.75e-06 [overlap_recompute_and_grad_model_parallel]: 4.69998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.17998e-06 [overlap_grad_flash_sp]: 1.752e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 7.221e-05, [1] [Cycle 1]: 6.769e-05, [6] [build]: 2.38998e-06 [elim_shapecalc]: 9.22999e-06 [elim_not_effective]: 1.218e-05 [opt_reshape]: 6.41998e-06 [fold_const_symbol]: 9.82001e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.99999e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.66e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 0.00013298 [opt_after_jit_grad]: 0.00046681 [validate]: 3.448e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00643691 [execute]: 6.84999e-06 Sums bootstrap : 0.000520s : 2.95% type_inference : 0.006535s : 37.12% event_method : 0.000014s : 0.08% auto_monad : 0.000064s : 0.36% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.12% optimize.rewriter_before_opt_a : 0.000064s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% 
optimize.opt_a.switch_simplify : 0.000041s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000559s : 3.18% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000153s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.10% 
optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000442s : 2.51% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000046s : 0.26% optimize.opt_a.a_3 : 0.000076s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000466s : 2.65% optimize.opt_b.b_1 : 0.000113s : 0.64% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000045s : 0.25% optimize.loop_unroll : 0.000433s : 2.46% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000008s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000133s : 0.76% opt_after_jit_grad : 0.000467s : 2.65% validate : 0.000034s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006437s : 36.56% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 26 19.01% : 0.000031s : 5: substitution.arithmetic_simplify 1.16% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000002s : 2: substitution.fold_const_symbol 3.22% : 0.000005s : 3: substitution.graph_param_transform 63.72% : 0.000105s : 3: substitution.inline 1.91% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.69% : 0.000004s : 4: substitution.remove_not_recompute_node 1.84% : 0.000003s : 2: substitution.replace_old_param 5.43% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006485 2 89.98% : 0.005835s : 1: type_inference.infer 10.02% : 0.000650s : 1: type_inference.specialize ------[replace.] 0.000039 4 78.13% : 0.000030s : 3: replace.inline 21.87% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000111 4 92.64% : 0.000103s : 3: match.inline 7.36% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.93% : 0.000001s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 15: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.97% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.40% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 12: predicate.environ_get_depend_swap 1.78% : 0.000003s : 18: predicate.environ_get_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.73% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.49% : 0.000010s : 40: predicate.inline 0.89% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 6: predicate.less_batch_normalization 1.63% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.55% : 0.000004s : 25: predicate.load_eliminater 0.97% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.08% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.18% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.67% : 0.000003s : 13: predicate.partial_defer_inline 1.51% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.28% : 0.000002s : 9: predicate.reduce_eliminate 2.40% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000002s : 16: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.79% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 6: predicate.shard_identity_eliminate 0.76% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 43: predicate.switch_simplify 0.89% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.69% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.57% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000428 8 48.71% : 0.000209s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.29% : 0.000220s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031622 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.63% : 0.003677s : 1: add_attr 11.59% : 0.003665s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000069s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.77% : 0.000559s : 1: bootstrap 0.15% : 0.000049s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000011s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000441s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.50% : 0.000475s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.96% : 0.000935s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000092s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 6.93% : 0.002190s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.51% : 0.000477s : 1: opt_after_jit_grad 0.61% : 0.000194s : 1: opt_b 13.08% : 0.004137s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.70% : 0.000222s : 1: renormalize.infer 0.68% : 0.000214s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000139s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000068s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000075s : 1: symbol_engine_optimizer 20.39% : 0.006447s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.71% : 0.006549s : 1: type_inference 0.21% : 0.000065s : 1: validate TotalTime = 0.0199265, [24] [bootstrap]: 0.00042128 [type_inference]: 0.00564605 [event_method]: 1.213e-05 [auto_monad]: 5.948e-05 [graph_reusing]: 5.59998e-06 [inline]: 1.95001e-06 [add_attr]: 0.00299775, [1] [add_attr_with_inline]: 0.00299013, [1] [Cycle 1]: 4.895e-05, [2] [tag_attr]: 1.413e-05 [meta_addattr_fg_expand]: 4.13001e-06 [parallel-infer-symbol]: 2.95998e-06 [pre_auto_parallel]: 2.436e-05 [insert-virtual-dataset]: 2.81e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.0039622, [53] [py_interpret_to_execute]: 1.861e-05 [rewriter_before_opt_a]: 5.168e-05 [opt_a]: 0.00206666, [2] [Cycle 1]: 0.00144697, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 2.852e-05 [loop_unroll]: 1.746e-05 [a_1]: 0.00039059 [with_stream_mark]: 1.436e-05 [recompute_prepare]: 8.42998e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 3.59002e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 2.06998e-06 [a_2]: 8.114e-05 [accelerated_algorithm]: 6.84001e-06 [shard]: 2.61999e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 6.22001e-06 [merge_send_recv]: 8.22998e-06 [auto_parallel]: 6.31e-06 [parallel]: 1.896e-05 [flash_sp]: 7.18998e-06 [merge_comm]: 4.38001e-06 [allreduce_fusion]: 3.78001e-06 [matmul_add_comm_reduction]: 9.32001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.93999e-06 [virtual_dataset]: 6.19001e-06 [get_grad_eliminate_]: 6.01e-06 [virtual_output]: 5.89e-06 [merge_forward]: 4.02e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 9.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.232e-05 [merge_recompute_call_nodes]: 1.92001e-06 [before_grad]: 1.002e-05 [set_forward_comm_id_for_comm_node_pass]: 3.82998e-06 [meta_fg_expand]: 2.79999e-06 [flash_sp_send_recv_attached]: 2.48998e-06 
[receive_attached]: 2.47001e-06 [after_resolve]: 9.69e-06 [a_after_grad]: 8.84e-06 [renormalize]: 0.00039059 [add_forward_monad_depend]: 5.34e-06 [auto_monad_grad]: 2.02999e-06 [auto_monad_eliminator]: 1.325e-05 [cse]: 2.932e-05 [a_3]: 4.278e-05 [Cycle 2]: 0.0006104, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 7.3e-06 [loop_unroll]: 5.93002e-06 [a_1]: 0.0001155 [with_stream_mark]: 9.99999e-06 [recompute_prepare]: 6.29999e-06 [updatestate_depend_eliminate]: 3.11999e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.71999e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 7.381e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.98998e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.60001e-06 [parallel]: 4.58001e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.01001e-06 [matmul_add_comm_reduction]: 5.29e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.40999e-06 [get_grad_eliminate_]: 5.30999e-06 [virtual_output]: 5.25001e-06 [merge_forward]: 3.16001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.53001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.034e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 9.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 1.89999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.27e-06 [a_after_grad]: 7.73999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 1.02998e-06 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.344e-05 [a_3]: 3.307e-05 [py_interpret_to_execute_after_opt_a]: 8.10999e-06 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.278e-05 [convert_after_rewriter]: 7.03998e-06 [order_py_execute_after_rewriter]: 4.92999e-06 [mutable_eliminate]: 0.00046907 [opt_b]: 
0.00018889, [1] [Cycle 1]: 0.00018297, [7] [b_1]: 0.00011115 [b_2]: 7.78001e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.72001e-06 [renormalize]: 4.39992e-07 [cse]: 1.749e-05 [optimize_parallel_all_gather_comm]: 1.628e-05 [overlap_param_gather]: 2.17001e-06 [cconv]: 2.43e-05 [loop_unroll]: 0.00042649 [opt_after_cconv]: 9.776e-05, [1] [Cycle 1]: 9.222e-05, [7] [c_1]: 2.666e-05 [parameter_eliminate]: 2.18998e-06 [updatestate_depend_eliminate]: 5.64998e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.39999e-06 [cse]: 1.776e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.535e-05 [tuple_transform]: 6.874e-05, [1] [Cycle 1]: 6.425e-05, [4] [d_1]: 3.746e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.80002e-06 [partial_unused_args_eliminate]: 2.07001e-06 [add_recomputation]: 4.363e-05 [cse_after_recomputation]: 2.195e-05, [1] [Cycle 1]: 1.71e-05, [1] [cse]: 1.157e-05 [environ_conv]: 5.07e-06 [swap_dp_allreduce_reducescatter]: 5.15001e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.74e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.31998e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.54e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.306e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 4.18999e-06 [overlap_recompute_and_grad_model_parallel]: 4.97999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.55001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 4.48001e-06 [overlap_grad_flash_sp]: 1.735e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.18998e-06 [split_layernorm_comm]: 1.66998e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 7.207e-05, [1] [Cycle 1]: 6.776e-05, [6] [build]: 2.10002e-06 [elim_shapecalc]: 8.82e-06 [elim_not_effective]: 1.206e-05 [opt_reshape]: 6.53998e-06 [fold_const_symbol]: 9.34998e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.56998e-06 [auto_monad_reorder]: 1.582e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.58999e-06 [opt_after_jit_grad]: 0.00051933 [validate]: 3.333e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00601116 [execute]: 6.86999e-06 Sums bootstrap : 0.000421s : 2.64% type_inference : 0.005646s : 35.42% event_method : 0.000012s : 0.08% auto_monad : 0.000059s : 0.37% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000052s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.15% optimize.opt_a.a_1 : 0.000506s : 3.18% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% 
optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000155s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000391s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% 
optimize.opt_a.a_3 : 0.000076s : 0.48% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000469s : 2.94% optimize.opt_b.b_1 : 0.000111s : 0.70% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000426s : 2.68% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.10% optimize.tuple_transform.d_1 : 0.000037s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% 
optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000002s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000519s : 3.26% validate : 0.000033s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006011s : 37.71% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000144 24 21.20% : 0.000031s : 4: substitution.arithmetic_simplify 1.32% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000001s : 2: substitution.fold_const_symbol 3.63% : 0.000005s : 3: substitution.graph_param_transform 65.38% : 0.000094s : 3: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.17% : 0.000005s : 4: substitution.remove_not_recompute_node 2.09% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005604 2 91.37% : 0.005120s : 1: type_inference.infer 8.63% : 0.000484s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000092 3 100.00% : 0.000092s : 3: match.inline ------[predicate.] 
0.000152 815 0.88% : 0.000001s : 8: predicate.accumulaten_eliminater 1.13% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.26% : 0.000003s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.24% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.79% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 1.79% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.28% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.78% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.34% : 0.000010s : 37: predicate.inline 1.01% : 0.000002s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.19% : 0.000002s : 6: predicate.less_batch_normalization 1.56% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 22: predicate.load_eliminater 0.98% : 0.000001s : 3: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.18% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.40% : 0.000002s : 11: predicate.partial_defer_inline 1.32% : 0.000002s : 11: predicate.partial_eliminate 0.83% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.20% : 0.000002s : 8: predicate.reduce_eliminate 2.24% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.85% : 0.000001s : 6: predicate.remove_not_recompute_node 1.21% : 0.000002s : 14: predicate.replace_applicator 0.75% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 8: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.88% : 0.000001s : 6: predicate.same_eliminate 0.55% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.18% : 0.000002s : 6: predicate.shard_identity_eliminate 0.80% : 0.000001s : 6: predicate.special_op_eliminate 0.89% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.85% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.61% : 0.000007s : 38: predicate.switch_simplify 0.94% : 
0.000001s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.66% : 0.000003s : 14: predicate.tuple_list_convert_item_index_to_positive 1.75% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.61% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.93% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000276 7 30.36% : 0.000084s : 2: func_graph_cloner_run.FuncGraphClonerGraph 69.64% : 0.000193s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028308 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.60% : 0.003002s : 1: add_attr 10.58% : 0.002994s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.23% : 0.000065s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.57% : 0.000446s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.54% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000478s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000876s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.31% : 0.002070s : 1: opt_a 0.36% : 0.000101s : 1: opt_after_cconv 1.87% : 0.000529s : 1: opt_after_jit_grad 0.68% : 0.000192s : 1: opt_b 14.01% : 0.003966s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000019s : 1: remove_dup_value 0.72% : 0.000204s : 1: renormalize.infer 0.64% : 0.000181s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.20% : 0.000056s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 21.27% : 0.006022s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 20.00% : 0.005660s : 1: type_inference 0.21% : 0.000061s : 1: validate TotalTime = 0.0205258, [24] [bootstrap]: 0.00046586 [type_inference]: 0.00578119 [event_method]: 1.474e-05 [auto_monad]: 6.121e-05 [graph_reusing]: 5.57999e-06 [inline]: 2.05002e-06 [add_attr]: 0.00319075, [1] [add_attr_with_inline]: 0.00318211, [1] [Cycle 1]: 5.057e-05, [2] [tag_attr]: 1.538e-05 [meta_addattr_fg_expand]: 4.60001e-06 [parallel-infer-symbol]: 3.26001e-06 [pre_auto_parallel]: 2.694e-05 [insert-virtual-dataset]: 2.92002e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.00421563, [53] [py_interpret_to_execute]: 2.367e-05 [rewriter_before_opt_a]: 6.517e-05 [opt_a]: 0.00221796, [2] [Cycle 1]: 0.00159585, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 3.337e-05 [loop_unroll]: 2.029e-05 [a_1]: 0.00044109 [with_stream_mark]: 1.485e-05 [recompute_prepare]: 7.73999e-06 [updatestate_depend_eliminate]: 3.82002e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 3.08998e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 8.035e-05 [accelerated_algorithm]: 7.18998e-06 [shard]: 2.45997e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 6.34001e-06 [merge_send_recv]: 8.84e-06 [auto_parallel]: 6.56999e-06 [parallel]: 1.876e-05 [flash_sp]: 7.26999e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 9.58002e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.58999e-06 [virtual_dataset]: 6.03998e-06 [get_grad_eliminate_]: 5.79999e-06 [virtual_output]: 5.86998e-06 [merge_forward]: 4.05998e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.16e-05 [merge_recompute_call_nodes]: 1.56002e-06 [before_grad]: 1.125e-05 [set_forward_comm_id_for_comm_node_pass]: 3.71999e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 2.64999e-06 
[receive_attached]: 2.07001e-06 [after_resolve]: 9.59e-06 [a_after_grad]: 8.82e-06 [renormalize]: 0.0004788 [add_forward_monad_depend]: 5.57001e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.366e-05 [cse]: 2.91e-05 [a_3]: 4.212e-05 [Cycle 2]: 0.00061206, [45] [expand_dump_flag]: 8.10018e-07 [switch_simplify]: 7.77998e-06 [loop_unroll]: 5.86e-06 [a_1]: 0.00011636 [with_stream_mark]: 1.074e-05 [recompute_prepare]: 6.26998e-06 [updatestate_depend_eliminate]: 3.06999e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 7.159e-05 [accelerated_algorithm]: 5.74999e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.21002e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 4.97e-06 [auto_parallel]: 5.84e-06 [parallel]: 4.11001e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 5.00999e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 5.88002e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.07999e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.061e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.02001e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.49978e-07 [after_resolve]: 8.20999e-06 [a_after_grad]: 8.13001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.60001e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.42e-05 [a_3]: 3.274e-05 [py_interpret_to_execute_after_opt_a]: 9.31002e-06 [slice_cell_reuse_recomputed_activation]: 2.43e-06 [rewriter_after_opt_a]: 3.41e-05 [convert_after_rewriter]: 7.28e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00050253 [opt_b]: 
0.00019307, [1] [Cycle 1]: 0.00018658, [7] [b_1]: 0.00011442 [b_2]: 7.26999e-06 [updatestate_depend_eliminate]: 5.77999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.40002e-06 [renormalize]: 4.89992e-07 [cse]: 1.807e-05 [optimize_parallel_all_gather_comm]: 1.664e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.354e-05 [loop_unroll]: 0.00046738 [opt_after_cconv]: 9.957e-05, [1] [Cycle 1]: 9.305e-05, [7] [c_1]: 2.703e-05 [parameter_eliminate]: 2.81e-06 [updatestate_depend_eliminate]: 5.94999e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.754e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.507e-05 [tuple_transform]: 7.069e-05, [1] [Cycle 1]: 6.555e-05, [4] [d_1]: 3.861e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 6.48998e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.653e-05 [cse_after_recomputation]: 2.11e-05, [1] [Cycle 1]: 1.641e-05, [1] [cse]: 1.113e-05 [environ_conv]: 5.32001e-06 [swap_dp_allreduce_reducescatter]: 5.31002e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.53999e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.17001e-06 [micro_interleaved_order_control]: 2.47001e-06 [assign_add_opt]: 1.67001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 3.04999e-06 [comm_op_add_attrs]: 1.04998e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66998e-06 [control_data_broadcast_order]: 1.231e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.88001e-06 [overlap_recompute_and_grad_model_parallel]: 4.58999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.26002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 1.77e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 7.168e-05, [1] [Cycle 1]: 6.709e-05, [6] [build]: 2.55002e-06 [elim_shapecalc]: 9.07999e-06 [elim_not_effective]: 1.177e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 9.44e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.84e-06 [auto_monad_reorder]: 1.716e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.54002e-06 [opt_after_jit_grad]: 0.00046007 [validate]: 3.544e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00601877 [execute]: 6.69001e-06 Sums bootstrap : 0.000466s : 2.85% type_inference : 0.005781s : 35.42% event_method : 0.000015s : 0.09% auto_monad : 0.000061s : 0.38% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.15% optimize.rewriter_before_opt_a : 0.000065s : 0.40% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000557s : 3.42% optimize.opt_a.with_stream_mark : 0.000026s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate 
: 0.000007s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000152s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000479s : 2.93% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000503s : 3.08% optimize.opt_b.b_1 : 0.000114s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000467s : 2.86% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000460s : 2.82% validate : 0.000035s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006019s : 36.87% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000168 26 19.70% : 0.000033s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.36% : 0.000006s : 3: substitution.graph_param_transform 63.36% : 0.000107s : 3: substitution.inline 1.91% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.72% : 0.000005s : 4: substitution.remove_not_recompute_node 1.81% : 0.000003s : 2: substitution.replace_old_param 5.12% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005733 2 89.45% : 0.005128s : 1: type_inference.infer 10.55% : 0.000605s : 1: type_inference.specialize ------[replace.] 0.000036 4 78.19% : 0.000028s : 3: replace.inline 21.81% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000112 4 93.01% : 0.000104s : 3: match.inline 6.99% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 0.89% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.95% : 0.000002s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 15: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.91% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_depend_swap 1.77% : 0.000003s : 18: predicate.environ_get_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.36% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.22% : 0.000010s : 40: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 25: predicate.load_eliminater 0.96% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.02% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.54% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.91% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.35% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.33% : 0.000002s : 16: predicate.replace_applicator 0.61% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.56% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.76% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 6: predicate.shard_identity_eliminate 0.81% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 0.93% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.70% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.97% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.14% : 0.000008s : 43: predicate.switch_simplify 0.94% : 
0.000002s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.56% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.27% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000412 8 51.12% : 0.000211s : 3: func_graph_cloner_run.FuncGraphClonerGraph 48.88% : 0.000202s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029498 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.83% : 0.003196s : 1: add_attr 10.80% : 0.003186s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000066s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.70% : 0.000501s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000477s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000512s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.15% : 0.000930s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.53% : 0.002221s : 1: opt_a 0.35% : 0.000103s : 1: opt_after_cconv 1.59% : 0.000470s : 1: opt_after_jit_grad 0.67% : 0.000197s : 1: opt_b 14.31% : 0.004220s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000027s : 1: py_interpret_to_execute 0.05% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.84% : 0.000247s : 1: renormalize.infer 0.76% : 0.000224s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.23% : 0.000069s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000074s : 1: symbol_engine_optimizer 20.44% : 0.006029s : 1: task_emit 0.25% : 0.000074s : 1: 
tuple_transform 19.66% : 0.005800s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.0395769, [24] [bootstrap]: 0.00052605 [type_inference]: 0.0117917 [event_method]: 4.565e-05 [auto_monad]: 0.00013078 [graph_reusing]: 8.1e-06 [inline]: 1.86e-06 [add_attr]: 0.00306704, [1] [add_attr_with_inline]: 0.00305886, [1] [Cycle 1]: 7.165e-05, [2] [tag_attr]: 3.454e-05 [meta_addattr_fg_expand]: 9.94999e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 4.902e-05 [insert-virtual-dataset]: 2.82002e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.36e-06 [pipeline_split]: 1.83002e-06 [optimize]: 0.0165892, [53] [py_interpret_to_execute]: 3.8e-05 [rewriter_before_opt_a]: 0.00015629 [opt_a]: 0.0143732, [3] [Cycle 1]: 0.0108623, [45] [expand_dump_flag]: 3.85e-06 [switch_simplify]: 7.622e-05 [loop_unroll]: 6.305e-05 [a_1]: 0.00143011 [with_stream_mark]: 2.364e-05 [recompute_prepare]: 2.214e-05 [updatestate_depend_eliminate]: 8.27998e-06 [updatestate_assign_eliminate]: 7.18e-06 [updatestate_loads_eliminate]: 6.82002e-06 [parameter_eliminate]: 2.59001e-06 [a_2]: 0.00024155 [accelerated_algorithm]: 3.145e-05 [shard]: 2.32001e-06 [meta_shard_fg_expand]: 3.5e-06 [shard_inline]: 1.584e-05 [merge_send_recv]: 1.651e-05 [auto_parallel]: 1.013e-05 [parallel]: 1.864e-05 [flash_sp]: 1.093e-05 [merge_comm]: 9.25999e-06 [allreduce_fusion]: 9.11002e-06 [matmul_add_comm_reduction]: 2.572e-05 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 1.822e-05 [virtual_dataset]: 1.553e-05 [get_grad_eliminate_]: 1.523e-05 [virtual_output]: 1.478e-05 [merge_forward]: 8.73001e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 1.701e-05 [cell_reuse_handle_not_recompute_node_pass]: 4.232e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 2.875e-05 [set_forward_comm_id_for_comm_node_pass]: 9.79e-06 [meta_fg_expand]: 0.00144254 [flash_sp_send_recv_attached]: 3.98001e-06 [receive_attached]: 2.73e-06 [after_resolve]: 
6.368e-05 [a_after_grad]: 8.863e-05 [renormalize]: 0.00614797 [add_forward_monad_depend]: 1.022e-05 [auto_monad_grad]: 6.27001e-06 [auto_monad_eliminator]: 5.221e-05 [cse]: 0.00019507 [a_3]: 0.00033408 [Cycle 2]: 0.00278733, [45] [expand_dump_flag]: 1.61998e-06 [switch_simplify]: 4.639e-05 [loop_unroll]: 4.237e-05 [a_1]: 0.00135549 [with_stream_mark]: 1.411e-05 [recompute_prepare]: 1.051e-05 [updatestate_depend_eliminate]: 4.38001e-06 [updatestate_assign_eliminate]: 3.14999e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.14e-06 [a_2]: 9.127e-05 [accelerated_algorithm]: 1.13e-05 [shard]: 1.49e-06 [meta_shard_fg_expand]: 2.19001e-06 [shard_inline]: 7.03998e-06 [merge_send_recv]: 6.59999e-06 [auto_parallel]: 7.35e-06 [parallel]: 5.99e-06 [flash_sp]: 3.58999e-06 [merge_comm]: 4.22998e-06 [allreduce_fusion]: 3.94002e-06 [matmul_add_comm_reduction]: 6.79001e-06 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 7.58999e-06 [virtual_dataset]: 6.48e-06 [get_grad_eliminate_]: 6.01e-06 [virtual_output]: 5.94999e-06 [merge_forward]: 4.22003e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 8.28001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.328e-05 [merge_recompute_call_nodes]: 1.02998e-06 [before_grad]: 1.219e-05 [set_forward_comm_id_for_comm_node_pass]: 4.35999e-06 [meta_fg_expand]: 7.979e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.34e-06 [after_resolve]: 1.282e-05 [a_after_grad]: 1.009e-05 [renormalize]: 0.00062991 [add_forward_monad_depend]: 4.70999e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.243e-05 [cse]: 2.201e-05 [a_3]: 4.832e-05 [Cycle 3]: 0.00070791, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 8.08001e-06 [loop_unroll]: 6.63e-06 [a_1]: 0.00014827 [with_stream_mark]: 8.63001e-06 [recompute_prepare]: 6.89999e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.56998e-06 
[parameter_eliminate]: 1.02e-06 [a_2]: 8.431e-05 [accelerated_algorithm]: 1.009e-05 [shard]: 8.89995e-07 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 6.69001e-06 [merge_send_recv]: 5.89999e-06 [auto_parallel]: 6.53e-06 [parallel]: 5.05001e-06 [flash_sp]: 1.02e-06 [merge_comm]: 3.72002e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 6.07001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 7.44002e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 6.25997e-06 [virtual_output]: 5.98002e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 7.72998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.268e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 1.086e-05 [set_forward_comm_id_for_comm_node_pass]: 4.01001e-06 [meta_fg_expand]: 2.51e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.09989e-07 [after_resolve]: 1.007e-05 [a_after_grad]: 9.46e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 1.04998e-06 [auto_monad_eliminator]: 7.85e-06 [cse]: 1.596e-05 [a_3]: 6.021e-05 [py_interpret_to_execute_after_opt_a]: 1.239e-05 [slice_cell_reuse_recomputed_activation]: 2.17999e-06 [rewriter_after_opt_a]: 4.172e-05 [convert_after_rewriter]: 7.47998e-06 [order_py_execute_after_rewriter]: 5.64998e-06 [mutable_eliminate]: 0.00052635 [opt_b]: 0.00023223, [1] [Cycle 1]: 0.00022456, [7] [b_1]: 0.00014367 [b_2]: 8.89e-06 [updatestate_depend_eliminate]: 7.07002e-06 [updatestate_assign_eliminate]: 2.83003e-06 [updatestate_loads_eliminate]: 2.76e-06 [renormalize]: 4.60015e-07 [cse]: 2.233e-05 [optimize_parallel_all_gather_comm]: 1.891e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.296e-05 [loop_unroll]: 0.00043709 [opt_after_cconv]: 0.00011078, [1] [Cycle 1]: 0.00010457, [7] [c_1]: 3.278e-05 [parameter_eliminate]: 3.01999e-06 [updatestate_depend_eliminate]: 6.27001e-06 [updatestate_assign_eliminate]: 3.13e-06 
[updatestate_loads_eliminate]: 3.18e-06 [cse]: 2.04e-05 [renormalize]: 4.59986e-07 [remove_dup_value]: 1.571e-05 [tuple_transform]: 8.133e-05, [1] [Cycle 1]: 7.603e-05, [4] [d_1]: 4.816e-05 [none_parameter_eliminate]: 1.46998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.35998e-06 [partial_unused_args_eliminate]: 2.59001e-06 [add_recomputation]: 5.286e-05 [cse_after_recomputation]: 2.524e-05, [1] [Cycle 1]: 2.003e-05, [1] [cse]: 1.454e-05 [environ_conv]: 8.43001e-06 [swap_dp_allreduce_reducescatter]: 5.93998e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.61002e-06 [label_fine_grained_interleaved_index]: 3.33e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.34998e-06 [ForceFp32Comm]: 7.99977e-07 [remove_cast_before_assign_add]: 1.27e-06 [full_micro_interleaved_order_control]: 2.26998e-06 [reorder_send_recv_between_fp_bp]: 2.90998e-06 [comm_op_add_attrs]: 1.45001e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.49998e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.393e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 4.47998e-06 [overlap_recompute_and_grad_model_parallel]: 5.41998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4.62e-06 [overlap_grad_flash_sp]: 2.097e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 2.14e-06 [handle_group_info]: 1.11002e-06 [symbol_engine_optimizer]: 8.571e-05, [1] [Cycle 1]: 8.116e-05, [6] [build]: 8.67e-06 [elim_shapecalc]: 1.089e-05 [elim_not_effective]: 1.438e-05 [opt_reshape]: 7.31999e-06 [fold_const_symbol]: 1.126e-05 [renormalize]: 2.50002e-07 
[detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 2.032e-05 [get_jit_bprop_graph]: 1.24e-06 [rewriter_after_jit_bprop_graph]: 4.11001e-06 [opt_after_jit_grad]: 0.00048417 [validate]: 4.361e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.0065733 [execute]: 6.98e-06 Sums bootstrap : 0.000526s : 1.50% type_inference : 0.011792s : 33.52% event_method : 0.000046s : 0.13% auto_monad : 0.000131s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000156s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.37% optimize.opt_a.loop_unroll : 0.000112s : 0.32% optimize.opt_a.a_1 : 0.002934s : 8.34% optimize.opt_a.with_stream_mark : 0.000046s : 0.13% optimize.opt_a.recompute_prepare : 0.000040s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.03% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000417s : 1.19% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.15% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.08% optimize.opt_a.merge_send_recv : 0.000029s : 0.08% optimize.opt_a.auto_parallel : 0.000024s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.04% optimize.opt_a.merge_comm : 0.000017s : 
0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.09% optimize.opt_a.virtual_dataset : 0.000028s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000027s : 0.08% optimize.opt_a.virtual_output : 0.000027s : 0.08% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000033s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000068s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.05% optimize.opt_a.meta_fg_expand : 0.001525s : 4.34% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.25% optimize.opt_a.a_after_grad : 0.000108s : 0.31% optimize.opt_a.renormalize : 0.006778s : 19.27% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000072s : 0.21% optimize.opt_a.cse : 0.000233s : 0.66% optimize.opt_a.a_3 : 0.000443s : 1.26% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000042s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000526s : 1.50% optimize.opt_b.b_1 : 0.000144s : 0.41% optimize.opt_b.b_2 : 0.000009s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.07% optimize.loop_unroll : 0.000437s : 1.24% optimize.opt_after_cconv.c_1 : 0.000033s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.04% optimize.tuple_transform.d_1 : 0.000048s : 0.14% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000003s : 0.01% optimize.add_recomputation : 0.000053s : 0.15% optimize.cse_after_recomputation.cse : 0.000015s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000484s : 1.38% validate : 0.000044s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006573s : 18.69% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000710 161 7.33% : 0.000052s : 8: substitution.arithmetic_simplify 0.32% : 0.000002s : 3: substitution.elim_not_effective 0.58% : 0.000004s : 5: substitution.float_depend_g_call 0.62% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 3: substitution.fold_const_symbol 0.85% : 0.000006s : 4: substitution.graph_param_transform 0.43% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 57.52% : 0.000408s : 17: substitution.inline 2.33% : 0.000017s : 2: substitution.inline_without_move 1.43% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.25% : 0.000016s : 3: substitution.less_batch_normalization 1.55% : 0.000011s : 7: substitution.minmaximum_grad 0.83% : 0.000006s : 5: substitution.partial_eliminate 1.72% : 0.000012s : 15: substitution.remove_not_recompute_node 3.88% : 0.000028s : 10: substitution.replace_applicator 1.39% : 0.000010s : 10: substitution.replace_old_param 0.40% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.01% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.58% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.92% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.48% : 0.000053s : 19: substitution.tuple_list_get_item_eliminator 2.05% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011715 2 86.78% : 0.010166s : 1: type_inference.infer 13.22% : 0.001549s : 1: type_inference.specialize ------[replace.] 0.000203 27 64.20% : 0.000130s : 17: replace.inline 35.80% : 0.000073s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 27 93.71% : 0.000399s : 17: match.inline 6.29% : 0.000027s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000699 4248 1.17% : 0.000008s : 53: predicate.accumulaten_eliminater 0.21% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.15% : 0.000008s : 53: predicate.addn_zero_filter 1.11% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.98% : 0.000014s : 74: predicate.arithmetic_simplify 1.15% : 0.000008s : 53: predicate.cast_eliminate 1.10% : 0.000008s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.45% : 0.000003s : 21: predicate.depend_value_elim 1.18% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.21% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.19% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.15% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.22% : 0.000009s : 57: predicate.environ_get_add_eliminate 1.19% : 0.000008s : 57: predicate.environ_get_depend_swap 1.67% : 0.000012s : 78: predicate.environ_get_eliminate 1.19% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.83% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.56% : 0.000018s : 80: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.50% : 0.000003s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.92% : 0.000041s : 183: predicate.inline 1.41% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.60% : 0.000004s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.64% : 0.000018s : 124: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.50% : 0.000017s : 113: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.10% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.08% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 53: predicate.minmaximum_grad 0.30% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.07% : 0.000014s : 80: predicate.partial_defer_inline 1.74% : 0.000012s : 67: predicate.partial_eliminate 1.19% : 0.000008s : 53: predicate.print_const_string_wrapper 0.54% : 0.000004s : 21: predicate.reduce_all_const_elim 1.39% : 0.000010s : 53: predicate.reduce_eliminate 2.63% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000002s : 21: predicate.remove_not_recompute_node 1.89% : 0.000013s : 113: predicate.replace_applicator 0.67% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000008s : 53: predicate.reshape_eliminate 1.08% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 4: predicate.row_tensor_eliminate 1.21% : 0.000008s : 50: predicate.same_eliminate 0.35% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.57% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.59% : 0.000004s : 21: predicate.specialize_transform 1.26% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.96% : 0.000014s : 80: predicate.switch_defer_inline 3.06% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.24% : 0.000037s : 218: 
predicate.switch_simplify 1.12% : 0.000008s : 53: predicate.tile_eliminate 1.11% : 0.000008s : 53: predicate.transpose_eliminate 1.48% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 1.48% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.94% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.56% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.62% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.18% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.48% : 0.000003s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001741 36 59.97% : 0.001044s : 15: func_graph_cloner_run.FuncGraphClonerGraph 40.03% : 0.000697s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.070666 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.35% : 0.003072s : 1: add_attr 4.33% : 0.003063s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000138s : 1: auto_monad 0.03% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.80% : 0.000565s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.07% : 0.000053s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.63% : 0.000446s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000537s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.30% : 0.004453s : 117: opt.transform.opt_a 0.04% : 0.000031s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.17% : 0.000121s : 28: opt.transform.opt_b 0.08% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000040s : 4: opt.transform.symbol_engine_opt 20.34% : 0.014376s : 1: opt_a 0.16% : 0.000114s : 1: opt_after_cconv 0.70% : 0.000495s : 1: opt_after_jit_grad 0.33% : 0.000236s : 1: opt_b 23.48% : 0.016593s : 1: optimize 0.03% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000054s : 1: pre_auto_parallel 0.06% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 7.45% : 0.005265s : 2: renormalize.infer 2.12% : 0.001498s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000046s : 1: rewriter_after_opt_a 0.23% : 0.000161s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.13% : 0.000089s : 1: symbol_engine_optimizer 9.32% : 0.006585s : 1: task_emit 0.12% : 0.000084s : 1: 
tuple_transform 16.71% : 0.011808s : 1: type_inference 0.11% : 0.000075s : 1: validate TotalTime = 0.0198196, [24] [bootstrap]: 0.00040817 [type_inference]: 0.00559152 [event_method]: 1.276e-05 [auto_monad]: 5.819e-05 [graph_reusing]: 5.49e-06 [inline]: 1.81998e-06 [add_attr]: 0.00301954, [1] [add_attr_with_inline]: 0.00301201, [1] [Cycle 1]: 5.184e-05, [2] [tag_attr]: 1.378e-05 [meta_addattr_fg_expand]: 4.04997e-06 [parallel-infer-symbol]: 3.16999e-06 [pre_auto_parallel]: 2.349e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.34001e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00393832, [53] [py_interpret_to_execute]: 1.831e-05 [rewriter_before_opt_a]: 5.076e-05 [opt_a]: 0.00204234, [2] [Cycle 1]: 0.00142337, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 2.899e-05 [loop_unroll]: 1.687e-05 [a_1]: 0.00034102 [with_stream_mark]: 1.381e-05 [recompute_prepare]: 8.09002e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.53e-06 [updatestate_loads_eliminate]: 3.13998e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 8.174e-05 [accelerated_algorithm]: 6.66e-06 [shard]: 2.21e-06 [meta_shard_fg_expand]: 1.92001e-06 [shard_inline]: 6.31e-06 [merge_send_recv]: 9.17999e-06 [auto_parallel]: 7.3e-06 [parallel]: 1.855e-05 [flash_sp]: 7.61999e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.73999e-06 [matmul_add_comm_reduction]: 9.19e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.22997e-06 [virtual_dataset]: 6.21998e-06 [get_grad_eliminate_]: 6.09001e-06 [virtual_output]: 6.02999e-06 [merge_forward]: 4.48001e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 9.71998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.22e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 1.077e-05 [set_forward_comm_id_for_comm_node_pass]: 4.34002e-06 [meta_fg_expand]: 3.01999e-06 [flash_sp_send_recv_attached]: 2.82002e-06 [receive_attached]: 2.08002e-06 
[after_resolve]: 1.064e-05 [a_after_grad]: 9.78002e-06 [renormalize]: 0.00040216 [add_forward_monad_depend]: 4.39998e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.473e-05 [cse]: 3.045e-05 [a_3]: 4.71e-05 [Cycle 2]: 0.00060878, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.9e-06 [loop_unroll]: 6.14001e-06 [a_1]: 0.00011651 [with_stream_mark]: 1.004e-05 [recompute_prepare]: 5.91e-06 [updatestate_depend_eliminate]: 2.98e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 7.141e-05 [accelerated_algorithm]: 6.00002e-06 [shard]: 1.06997e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.51e-06 [parallel]: 4.4e-06 [flash_sp]: 3.97e-06 [merge_comm]: 3.28e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 6.33e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 5.70001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.045e-05 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 8.85001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 1.86e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.52998e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.37001e-06 [cse]: 1.387e-05 [a_3]: 3.275e-05 [py_interpret_to_execute_after_opt_a]: 7.83999e-06 [slice_cell_reuse_recomputed_activation]: 2.36998e-06 [rewriter_after_opt_a]: 3.469e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.27999e-06 [mutable_eliminate]: 0.0004623 [opt_b]: 0.00019091, [1] [Cycle 1]: 0.00018404, [7] [b_1]: 0.00011189 [b_2]: 
7.56999e-06 [updatestate_depend_eliminate]: 5.36002e-06 [updatestate_assign_eliminate]: 2.88e-06 [updatestate_loads_eliminate]: 2.48002e-06 [renormalize]: 3.80009e-07 [cse]: 1.72e-05 [optimize_parallel_all_gather_comm]: 1.641e-05 [overlap_param_gather]: 1.72001e-06 [cconv]: 2.263e-05 [loop_unroll]: 0.00042411 [opt_after_cconv]: 0.00010914, [1] [Cycle 1]: 0.00010319, [7] [c_1]: 3.773e-05 [parameter_eliminate]: 2.42001e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 1.702e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.477e-05 [tuple_transform]: 7.035e-05, [1] [Cycle 1]: 6.534e-05, [4] [d_1]: 3.755e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.68998e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.443e-05 [cse_after_recomputation]: 2.128e-05, [1] [Cycle 1]: 1.622e-05, [1] [cse]: 1.089e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.63e-06 [label_micro_interleaved_index]: 4.50001e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.67001e-06 [slice_recompute_activation]: 2.21e-06 [micro_interleaved_order_control]: 2.74001e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 3.06001e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.238e-05 [grouped_pairwise_exchange_alltoall]: 2.13002e-06 [offloading_packed_experts]: 3.85998e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.76e-06 [overlap_grad_ring_attention]: 4.62e-06 [overlap_grad_flash_sp]: 1.705e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.232e-05, [1] [Cycle 1]: 6.785e-05, [6] [build]: 2.71999e-06 [elim_shapecalc]: 8.94e-06 [elim_not_effective]: 1.166e-05 [opt_reshape]: 6.34999e-06 [fold_const_symbol]: 9.59999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.644e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.48999e-06 [opt_after_jit_grad]: 0.00045452 [validate]: 3.554e-05 [backend_pass]: 1.15001e-06 [task_emit]: 0.00603412 [execute]: 6.78e-06 Sums bootstrap : 0.000408s : 2.58% type_inference : 0.005592s : 35.40% event_method : 0.000013s : 0.08% auto_monad : 0.000058s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000018s : 0.12% optimize.rewriter_before_opt_a : 0.000051s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.23% optimize.opt_a.loop_unroll : 0.000023s : 0.15% optimize.opt_a.a_1 : 0.000458s : 2.90% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000153s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000014s : 0.09% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000402s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000044s : 0.28% optimize.opt_a.a_3 : 0.000080s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000035s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000462s : 2.93% optimize.opt_b.b_1 : 0.000112s : 0.71% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000424s : 2.68% optimize.opt_after_cconv.c_1 : 0.000038s : 0.24% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000455s : 2.88% validate : 0.000036s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006034s : 38.20% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000130 24 21.88% : 0.000028s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.28% : 0.000006s : 3: substitution.graph_param_transform 62.67% : 0.000081s : 3: substitution.inline 2.44% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000005s : 4: substitution.remove_not_recompute_node 2.67% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005550 2 91.90% : 0.005101s : 1: type_inference.infer 8.10% : 0.000450s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000079 3 100.00% : 0.000079s : 3: match.inline ------[predicate.] 
0.000149 815 1.00% : 0.000001s : 8: predicate.accumulaten_eliminater 0.89% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.95% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.48% : 0.000004s : 14: predicate.arithmetic_simplify 0.84% : 0.000001s : 8: predicate.cast_eliminate 0.73% : 0.000001s : 6: predicate.check_bprop_eliminate 0.67% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.03% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.28% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 2.00% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.14% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.22% : 0.000003s : 11: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.87% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.28% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.29% : 0.000009s : 37: predicate.inline 1.08% : 0.000002s : 6: predicate.inline_without_move 0.46% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 6: predicate.less_batch_normalization 1.61% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 22: predicate.load_eliminater 1.28% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.68% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.18% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.48% : 0.000002s : 11: predicate.partial_defer_inline 1.35% : 0.000002s : 11: predicate.partial_eliminate 0.85% : 0.000001s : 8: predicate.print_const_string_wrapper 0.64% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 8: predicate.reduce_eliminate 2.18% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.58% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000002s : 14: predicate.replace_applicator 0.79% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 8: predicate.reshape_eliminate 0.93% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.91% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.92% : 0.000001s : 6: predicate.specialize_transform 1.04% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.22% : 0.000002s : 11: predicate.switch_defer_inline 1.94% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.79% : 0.000007s : 38: predicate.switch_simplify 0.85% : 
0.000001s : 8: predicate.tile_eliminate 0.83% : 0.000001s : 8: predicate.transpose_eliminate 1.56% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.77% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.88% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 3: predicate.value_based_eliminate 0.76% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000261 7 36.61% : 0.000096s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.39% : 0.000166s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028173 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.73% : 0.003024s : 1: add_attr 10.70% : 0.003015s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.23% : 0.000064s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.56% : 0.000440s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.54% : 0.000433s : 1: loop_unroll 0.02% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.67% : 0.000471s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000831s : 78: opt.transform.opt_a 0.13% : 0.000036s : 1: opt.transform.opt_after_cconv 0.07% : 0.000020s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.26% : 0.002045s : 1: opt_a 0.40% : 0.000113s : 1: opt_after_cconv 1.65% : 0.000464s : 1: opt_after_jit_grad 0.69% : 0.000194s : 1: opt_b 13.99% : 0.003942s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.74% : 0.000209s : 1: renormalize.infer 0.66% : 0.000187s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000038s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000075s : 1: symbol_engine_optimizer 21.45% : 0.006044s : 1: task_emit 0.26% : 0.000073s : 1: 
tuple_transform 19.90% : 0.005606s : 1: type_inference 0.22% : 0.000062s : 1: validate TotalTime = 0.0380812, [24] [bootstrap]: 0.00042811 [type_inference]: 0.0114235 [event_method]: 4.242e-05 [auto_monad]: 0.0001285 [graph_reusing]: 8.59e-06 [inline]: 1.94999e-06 [add_attr]: 0.00304078, [1] [add_attr_with_inline]: 0.00303156, [1] [Cycle 1]: 7.101e-05, [2] [tag_attr]: 3.209e-05 [meta_addattr_fg_expand]: 9.90002e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 4.711e-05 [insert-virtual-dataset]: 2.72001e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.31e-06 [pipeline_split]: 1.89e-06 [optimize]: 0.0158548, [53] [py_interpret_to_execute]: 3.468e-05 [rewriter_before_opt_a]: 0.00014204 [opt_a]: 0.0137649, [3] [Cycle 1]: 0.0104438, [45] [expand_dump_flag]: 4.48999e-06 [switch_simplify]: 7.249e-05 [loop_unroll]: 6.005e-05 [a_1]: 0.00137031 [with_stream_mark]: 2.376e-05 [recompute_prepare]: 2.197e-05 [updatestate_depend_eliminate]: 8.77e-06 [updatestate_assign_eliminate]: 7.45e-06 [updatestate_loads_eliminate]: 6.93e-06 [parameter_eliminate]: 3.01999e-06 [a_2]: 0.00024172 [accelerated_algorithm]: 3.068e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 3.52997e-06 [shard_inline]: 1.603e-05 [merge_send_recv]: 1.586e-05 [auto_parallel]: 1.018e-05 [parallel]: 1.754e-05 [flash_sp]: 1.122e-05 [merge_comm]: 9.44e-06 [allreduce_fusion]: 8.82e-06 [matmul_add_comm_reduction]: 2.536e-05 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 1.807e-05 [virtual_dataset]: 1.574e-05 [get_grad_eliminate_]: 1.494e-05 [virtual_output]: 1.494e-05 [merge_forward]: 8.85999e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.812e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.957e-05 [merge_recompute_call_nodes]: 2.01e-06 [before_grad]: 2.83e-05 [set_forward_comm_id_for_comm_node_pass]: 9.72999e-06 [meta_fg_expand]: 0.00142682 [flash_sp_send_recv_attached]: 4.03001e-06 [receive_attached]: 2.17001e-06 [after_resolve]: 
6.303e-05 [a_after_grad]: 8.82e-05 [renormalize]: 0.00586907 [add_forward_monad_depend]: 9.14e-06 [auto_monad_grad]: 5.64e-06 [auto_monad_eliminator]: 5.135e-05 [cse]: 0.00017445 [a_3]: 0.00032807 [Cycle 2]: 0.00262927, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.45e-05 [loop_unroll]: 4.148e-05 [a_1]: 0.00130096 [with_stream_mark]: 1.054e-05 [recompute_prepare]: 8.89e-06 [updatestate_depend_eliminate]: 4.05e-06 [updatestate_assign_eliminate]: 2.95998e-06 [updatestate_loads_eliminate]: 2.53998e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 8.756e-05 [accelerated_algorithm]: 1.063e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.81001e-06 [merge_send_recv]: 5.66003e-06 [auto_parallel]: 6.64999e-06 [parallel]: 5.00999e-06 [flash_sp]: 3.51001e-06 [merge_comm]: 3.94002e-06 [allreduce_fusion]: 3.55e-06 [matmul_add_comm_reduction]: 6.07001e-06 [allreduce_slice_to_reducescatter]: 4.2998e-07 [virtual_shard_identity]: 7.67002e-06 [virtual_dataset]: 7.03e-06 [get_grad_eliminate_]: 6.74001e-06 [virtual_output]: 6.35002e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 7.44002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.287e-05 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 1.065e-05 [set_forward_comm_id_for_comm_node_pass]: 4.09002e-06 [meta_fg_expand]: 5.316e-05 [flash_sp_send_recv_attached]: 9.90025e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.077e-05 [a_after_grad]: 9.67999e-06 [renormalize]: 0.00055122 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.15001e-06 [auto_monad_eliminator]: 1.074e-05 [cse]: 2.014e-05 [a_3]: 9.438e-05 [Cycle 3]: 0.00067794, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 8.3e-06 [loop_unroll]: 6.48998e-06 [a_1]: 0.00014745 [with_stream_mark]: 8.44002e-06 [recompute_prepare]: 7.01001e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.54001e-06 
[parameter_eliminate]: 8.89995e-07 [a_2]: 8.416e-05 [accelerated_algorithm]: 9.56998e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.29003e-06 [shard_inline]: 6.82002e-06 [merge_send_recv]: 5.31002e-06 [auto_parallel]: 6.09999e-06 [parallel]: 4.57e-06 [flash_sp]: 1.00999e-06 [merge_comm]: 3.98001e-06 [allreduce_fusion]: 3.61001e-06 [matmul_add_comm_reduction]: 5.86e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 7.71999e-06 [virtual_dataset]: 6.44001e-06 [get_grad_eliminate_]: 6.28e-06 [virtual_output]: 6.12001e-06 [merge_forward]: 3.21999e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 6.79999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.359e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.088e-05 [set_forward_comm_id_for_comm_node_pass]: 3.75e-06 [meta_fg_expand]: 2.16998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 8.89995e-07 [after_resolve]: 8.61002e-06 [a_after_grad]: 9.42001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 7.45998e-06 [cse]: 1.699e-05 [a_3]: 3.885e-05 [py_interpret_to_execute_after_opt_a]: 9.25001e-06 [slice_cell_reuse_recomputed_activation]: 2.43002e-06 [rewriter_after_opt_a]: 3.981e-05 [convert_after_rewriter]: 7.38999e-06 [order_py_execute_after_rewriter]: 5.37999e-06 [mutable_eliminate]: 0.00047497 [opt_b]: 0.00022016, [1] [Cycle 1]: 0.00021357, [7] [b_1]: 0.00012976 [b_2]: 1.409e-05 [updatestate_depend_eliminate]: 6.02999e-06 [updatestate_assign_eliminate]: 2.93e-06 [updatestate_loads_eliminate]: 2.92002e-06 [renormalize]: 5.19998e-07 [cse]: 2.111e-05 [optimize_parallel_all_gather_comm]: 1.729e-05 [overlap_param_gather]: 2.21e-06 [cconv]: 1.97e-05 [loop_unroll]: 0.00043156 [opt_after_cconv]: 0.00010716, [1] [Cycle 1]: 0.00010144, [7] [c_1]: 3.173e-05 [parameter_eliminate]: 2.61e-06 [updatestate_depend_eliminate]: 5.76e-06 [updatestate_assign_eliminate]: 3.07002e-06 
[updatestate_loads_eliminate]: 2.98e-06 [cse]: 2.059e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.595e-05 [tuple_transform]: 7.684e-05, [1] [Cycle 1]: 7.195e-05, [4] [d_1]: 4.476e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.09001e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.881e-05 [cse_after_recomputation]: 2.367e-05, [1] [Cycle 1]: 1.918e-05, [1] [cse]: 1.385e-05 [environ_conv]: 7.56001e-06 [swap_dp_allreduce_reducescatter]: 5.84e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.35999e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.28998e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.64998e-06 [ForceFp32Comm]: 1.10001e-06 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.40999e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.04003e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91998e-06 [control_data_broadcast_order]: 1.366e-05 [grouped_pairwise_exchange_alltoall]: 1.99e-06 [offloading_packed_experts]: 4.28999e-06 [overlap_recompute_and_grad_model_parallel]: 4.93001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 4.35e-06 [overlap_grad_flash_sp]: 2.019e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.31998e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 8.433e-05, [1] [Cycle 1]: 7.981e-05, [6] [build]: 8.51002e-06 [elim_shapecalc]: 1.035e-05 [elim_not_effective]: 1.396e-05 [opt_reshape]: 7.11999e-06 [fold_const_symbol]: 1.104e-05 [renormalize]: 
1.90019e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.976e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.25e-06 [opt_after_jit_grad]: 0.00046601 [validate]: 3.969e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00632561 [execute]: 6.88e-06 Sums bootstrap : 0.000428s : 1.27% type_inference : 0.011424s : 33.86% event_method : 0.000042s : 0.13% auto_monad : 0.000129s : 0.38% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.10% optimize.rewriter_before_opt_a : 0.000142s : 0.42% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.37% optimize.opt_a.loop_unroll : 0.000108s : 0.32% optimize.opt_a.a_1 : 0.002819s : 8.35% optimize.opt_a.with_stream_mark : 0.000043s : 0.13% optimize.opt_a.recompute_prepare : 0.000038s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000413s : 1.23% optimize.opt_a.accelerated_algorithm : 0.000051s : 0.15% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000006s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.09% optimize.opt_a.merge_send_recv : 0.000027s : 0.08% optimize.opt_a.auto_parallel : 0.000023s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000017s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000037s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.10% optimize.opt_a.virtual_dataset : 0.000029s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000027s : 0.08% optimize.opt_a.merge_forward : 0.000016s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000032s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.17% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000050s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.05% optimize.opt_a.meta_fg_expand : 0.001482s : 4.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000082s : 0.24% optimize.opt_a.a_after_grad : 0.000107s : 0.32% optimize.opt_a.renormalize : 0.006420s : 19.03% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000070s : 0.21% optimize.opt_a.cse : 0.000212s : 0.63% optimize.opt_a.a_3 : 0.000461s : 1.37% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000040s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000475s : 1.41% optimize.opt_b.b_1 : 0.000130s : 0.38% optimize.opt_b.b_2 : 0.000014s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000432s : 1.28% optimize.opt_after_cconv.c_1 : 0.000032s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000021s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.05% optimize.tuple_transform.d_1 : 0.000045s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.14% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000466s : 1.38% validate : 0.000040s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006326s : 18.75% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000676 159 6.67% : 0.000045s : 7: substitution.arithmetic_simplify 0.35% : 0.000002s : 3: substitution.elim_not_effective 0.63% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.96% : 0.000006s : 4: substitution.graph_param_transform 0.46% : 0.000003s : 2: substitution.incorporate_call 0.36% : 0.000002s : 2: substitution.incorporate_call_switch 57.86% : 0.000391s : 17: substitution.inline 2.48% : 0.000017s : 2: substitution.inline_without_move 1.45% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.29% : 0.000015s : 3: substitution.less_batch_normalization 1.50% : 0.000010s : 7: substitution.minmaximum_grad 0.87% : 0.000006s : 5: substitution.partial_eliminate 1.89% : 0.000013s : 15: substitution.remove_not_recompute_node 3.76% : 0.000025s : 10: substitution.replace_applicator 1.28% : 0.000009s : 10: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.04% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.52% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 1.96% : 0.000013s : 7: substitution.tuple_list_get_item_depend_reorder 7.37% : 0.000050s : 18: substitution.tuple_list_get_item_eliminator 2.07% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011350 2 87.47% : 0.009928s : 1: type_inference.infer 12.53% : 0.001422s : 1: type_inference.specialize ------[replace.] 0.000188 26 66.12% : 0.000124s : 17: replace.inline 33.88% : 0.000064s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000406 26 93.90% : 0.000382s : 17: match.inline 6.10% : 0.000025s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000672 4180 1.13% : 0.000008s : 52: predicate.accumulaten_eliminater 0.26% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.47% : 0.000003s : 21: predicate.addn_check_dump 1.17% : 0.000008s : 52: predicate.addn_zero_filter 1.09% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 2.04% : 0.000014s : 73: predicate.arithmetic_simplify 1.15% : 0.000008s : 52: predicate.cast_eliminate 1.17% : 0.000008s : 50: predicate.check_bprop_eliminate 0.48% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.16% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.19% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.06% : 0.000000s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_depend_swap 1.70% : 0.000011s : 77: predicate.environ_get_eliminate 1.20% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.81% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.44% : 0.000016s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.60% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.52% : 0.000004s : 21: predicate.incorporate_call 0.47% : 0.000003s : 21: predicate.incorporate_call_switch 5.86% : 0.000039s : 180: predicate.inline 1.48% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.59% : 0.000004s : 21: predicate.less_batch_normalization 1.52% : 
0.000010s : 69: predicate.list_to_tuple_eliminator_ 2.63% : 0.000018s : 121: predicate.load_eliminater 0.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.63% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.35% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.11% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 52: predicate.minmaximum_grad 0.28% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000014s : 78: predicate.partial_defer_inline 1.70% : 0.000011s : 65: predicate.partial_eliminate 1.12% : 0.000008s : 52: predicate.print_const_string_wrapper 0.50% : 0.000003s : 21: predicate.reduce_all_const_elim 1.34% : 0.000009s : 52: predicate.reduce_eliminate 2.61% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 21: predicate.remove_not_recompute_node 1.89% : 0.000013s : 111: predicate.replace_applicator 0.67% : 0.000004s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.12% : 0.000008s : 52: predicate.reshape_eliminate 1.16% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.26% : 0.000008s : 50: predicate.same_eliminate 0.35% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 21: predicate.shard_identity_eliminate 0.24% : 0.000002s : 8: predicate.special_op_eliminate 0.63% : 0.000004s : 21: predicate.specialize_transform 1.24% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.95% : 0.000013s : 78: predicate.switch_defer_inline 3.08% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.22% : 0.000035s : 213: 
predicate.switch_simplify 1.14% : 0.000008s : 52: predicate.tile_eliminate 1.13% : 0.000008s : 52: predicate.transpose_eliminate 1.45% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.68% : 0.000018s : 90: predicate.tuple_list_get_item_eliminator 1.47% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.53% : 0.000010s : 69: predicate.tuple_to_list_eliminator_ 2.58% : 0.000017s : 121: predicate.updatestate_pure_node_eliminater 3.17% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001602 35 59.58% : 0.000955s : 14: func_graph_cloner_run.FuncGraphClonerGraph 40.42% : 0.000648s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.067874 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.49% : 0.003045s : 1: add_attr 4.47% : 0.003035s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000136s : 1: auto_monad 0.03% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.72% : 0.000486s : 1: bootstrap 0.03% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.07% : 0.000049s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.65% : 0.000440s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.71% : 0.000484s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 6.31% : 0.004280s : 117: opt.transform.opt_a 0.04% : 0.000030s : 1: opt.transform.opt_after_cconv 0.04% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.17% : 0.000116s : 28: opt.transform.opt_b 0.07% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000039s : 4: opt.transform.symbol_engine_opt 20.28% : 0.013768s : 1: opt_a 0.16% : 0.000111s : 1: opt_after_cconv 0.70% : 0.000476s : 1: opt_after_jit_grad 0.33% : 0.000223s : 1: opt_b 23.37% : 0.015859s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000052s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 7.30% : 0.004956s : 2: renormalize.infer 2.14% : 0.001450s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000044s : 1: rewriter_after_opt_a 0.22% : 0.000146s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.13% : 0.000087s : 1: symbol_engine_optimizer 9.33% : 0.006336s : 1: task_emit 0.12% : 0.000080s : 1: 
tuple_transform 16.85% : 0.011439s : 1: type_inference 0.10% : 0.000068s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x0-kbk],max_mem:10.0M TotalTime = 0.0629, [24] [bootstrap]: 0.00056062 [type_inference]: 0.0064003 [event_method]: 1.359e-05 [auto_monad]: 5.871e-05 [graph_reusing]: 5.34998e-06 [inline]: 1.73002e-06 [add_attr]: 0.00359505, [1] [add_attr_with_inline]: 0.00358451, [1] [Cycle 1]: 4.178e-05, [2] [tag_attr]: 1.374e-05 [meta_addattr_fg_expand]: 4.62e-06 [parallel-infer-symbol]: 3.2e-06 [pre_auto_parallel]: 2.573e-05 [insert-virtual-dataset]: 2.55002e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.34999e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.0041382, [53] [py_interpret_to_execute]: 2.028e-05 [rewriter_before_opt_a]: 6.438e-05 [opt_a]: 0.00221443, [2] [Cycle 1]: 0.00160364, [45] [expand_dump_flag]: 1.71998e-06 [switch_simplify]: 3.012e-05 [loop_unroll]: 2.33e-05 [a_1]: 0.00042225 [with_stream_mark]: 1.382e-05 [recompute_prepare]: 8.37e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 3.08998e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 8.398e-05 [accelerated_algorithm]: 7.71001e-06 [shard]: 2.41e-06 [meta_shard_fg_expand]: 2.19001e-06 [shard_inline]: 6.97997e-06 [merge_send_recv]: 8.62998e-06 [auto_parallel]: 6.68e-06 [parallel]: 2.723e-05 [flash_sp]: 7.10002e-06 [merge_comm]: 4.07e-06 [allreduce_fusion]: 3.54002e-06 [matmul_add_comm_reduction]: 8.95999e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 6.11998e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.91e-06 [merge_forward]: 4.06001e-06 [cell_reuse_recompute_pass]: 1.43002e-06 [offload_activation]: 9.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.359e-05 
[merge_recompute_call_nodes]: 1.63002e-06 [before_grad]: 1.078e-05 [set_forward_comm_id_for_comm_node_pass]: 3.81999e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.08998e-06 [after_resolve]: 9.75002e-06 [a_after_grad]: 1.033e-05 [renormalize]: 0.00048682 [add_forward_monad_depend]: 8.70001e-06 [auto_monad_grad]: 2.05002e-06 [auto_monad_eliminator]: 1.256e-05 [cse]: 2.218e-05 [a_3]: 4.16e-05 [Cycle 2]: 0.00060144, [45] [expand_dump_flag]: 1.22e-06 [switch_simplify]: 7.33e-06 [loop_unroll]: 5.86003e-06 [a_1]: 0.00011307 [with_stream_mark]: 1.009e-05 [recompute_prepare]: 6.13002e-06 [updatestate_depend_eliminate]: 2.93998e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 7.064e-05 [accelerated_algorithm]: 5.86e-06 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 4.82e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.34997e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.15998e-06 [allreduce_fusion]: 3.01001e-06 [matmul_add_comm_reduction]: 5.69999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 7e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 5.82999e-06 [virtual_output]: 5.44e-06 [merge_forward]: 2.81999e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.39999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.037e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.94e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 1.79998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.09989e-07 [after_resolve]: 8.60999e-06 [a_after_grad]: 8.43999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 9.80013e-07 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 5.98998e-06 [cse]: 1.286e-05 [a_3]: 3.204e-05 [py_interpret_to_execute_after_opt_a]: 7.30998e-06 
[slice_cell_reuse_recomputed_activation]: 2.08998e-06 [rewriter_after_opt_a]: 3.318e-05 [convert_after_rewriter]: 7.09001e-06 [order_py_execute_after_rewriter]: 5.19998e-06 [mutable_eliminate]: 0.00045796 [opt_b]: 0.00019318, [1] [Cycle 1]: 0.00018673, [7] [b_1]: 0.00011558 [b_2]: 7.43e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 4.89992e-07 [cse]: 1.73e-05 [optimize_parallel_all_gather_comm]: 1.572e-05 [overlap_param_gather]: 2.47001e-06 [cconv]: 2.201e-05 [loop_unroll]: 0.00042666 [opt_after_cconv]: 9.672e-05, [1] [Cycle 1]: 9.105e-05, [7] [c_1]: 2.629e-05 [parameter_eliminate]: 2.49999e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.33002e-06 [cse]: 1.746e-05 [renormalize]: 5.50004e-07 [remove_dup_value]: 1.549e-05 [tuple_transform]: 6.823e-05, [1] [Cycle 1]: 6.345e-05, [4] [d_1]: 3.693e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.21998e-06 [partial_unused_args_eliminate]: 2.24001e-06 [add_recomputation]: 4.649e-05 [cse_after_recomputation]: 2.129e-05, [1] [Cycle 1]: 1.649e-05, [1] [cse]: 1.127e-05 [environ_conv]: 7.11999e-06 [swap_dp_allreduce_reducescatter]: 5.61e-06 [bias_add_comm_swap]: 3.02002e-06 [label_micro_interleaved_index]: 4.67e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.34999e-06 [assign_add_opt]: 1.50001e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.46e-06 [reorder_send_recv_between_fp_bp]: 3.03e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.35001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.95001e-06 [control_data_broadcast_order]: 1.23e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.89e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.79001e-06 [overlap_grad_ring_attention]: 4.62e-06 [overlap_grad_flash_sp]: 1.733e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.06002e-06 [symbol_engine_optimizer]: 7.316e-05, [1] [Cycle 1]: 6.855e-05, [6] [build]: 2.44999e-06 [elim_shapecalc]: 9.20999e-06 [elim_not_effective]: 1.242e-05 [opt_reshape]: 6.11998e-06 [fold_const_symbol]: 9.90002e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.74998e-06 [auto_monad_reorder]: 1.679e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.0004572 [validate]: 3.334e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0473655 [execute]: 8.60999e-06 Sums bootstrap : 0.000561s : 0.96% type_inference : 0.006400s : 10.98% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000064s : 0.11% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000037s : 0.06% optimize.opt_a.loop_unroll : 0.000029s : 0.05% optimize.opt_a.a_1 : 0.000535s : 0.92% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000013s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000032s : 0.05% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000019s : 0.03% optimize.opt_a.renormalize : 0.000487s : 0.84% optimize.opt_a.add_forward_monad_depend : 
0.000010s : 0.02% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000035s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.79% optimize.opt_b.b_1 : 0.000116s : 0.20% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000427s : 0.73% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv 
: 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000457s : 0.78% validate : 0.000033s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.047365s : 81.28% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000141 26 22.49% : 0.000032s : 5: substitution.arithmetic_simplify 1.40% : 0.000002s : 2: substitution.elim_not_effective 1.17% : 0.000002s : 2: substitution.fold_const_symbol 3.57% : 0.000005s : 3: substitution.graph_param_transform 59.83% : 0.000085s : 3: substitution.inline 2.07% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.72% : 0.000005s : 4: substitution.remove_not_recompute_node 2.09% : 0.000003s : 2: substitution.replace_old_param 3.64% : 0.000005s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006350 2 90.14% : 0.005724s : 1: type_inference.infer 9.86% : 0.000626s : 1: type_inference.specialize ------[replace.] 0.000035 4 76.56% : 0.000027s : 3: replace.inline 23.44% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000087 4 94.99% : 0.000083s : 3: match.inline 5.01% : 0.000004s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 883 1.00% : 0.000002s : 9: predicate.accumulaten_eliminater 0.74% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000002s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 15: predicate.arithmetic_simplify 0.95% : 0.000002s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.91% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.03% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.83% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.39% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.07% : 0.000003s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.41% : 0.000010s : 40: predicate.inline 0.97% : 0.000002s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 6: predicate.less_batch_normalization 1.64% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 25: predicate.load_eliminater 1.10% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.38% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.03% : 0.000002s : 3: predicate.mutable_eliminate 0.41% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.63% : 0.000003s : 13: predicate.partial_defer_inline 1.48% : 0.000002s : 13: predicate.partial_eliminate 0.90% : 0.000001s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 6: predicate.reduce_all_const_elim 1.15% : 0.000002s : 9: predicate.reduce_eliminate 2.54% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.30% : 0.000002s : 16: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 9: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.79% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.79% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.42% : 0.000002s : 13: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 43: predicate.switch_simplify 0.95% : 
0.000002s : 9: predicate.tile_eliminate 0.97% : 0.000002s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.00% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.54% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.72% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.28% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000359 8 45.59% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.41% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072172 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.99% : 0.003599s : 1: add_attr 4.97% : 0.003588s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000064s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.82% : 0.000588s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.60% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.65% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.27% : 0.000918s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000020s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000094s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.07% : 0.002217s : 1: opt_a 0.14% : 0.000100s : 1: opt_after_cconv 0.65% : 0.000467s : 1: opt_after_jit_grad 0.27% : 0.000197s : 1: opt_b 5.74% : 0.004142s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000007s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000015s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.38% : 0.000273s : 1: renormalize.infer 0.29% : 0.000206s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.10% : 0.000069s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000076s : 1: symbol_engine_optimizer 65.65% : 0.047380s : 1: task_emit 0.10% : 0.000071s : 1: 
tuple_transform 8.89% : 0.006413s : 1: type_inference 0.08% : 0.000056s : 1: validate TotalTime = 0.0564673, [24] [bootstrap]: 0.00045087 [type_inference]: 0.00606181 [event_method]: 1.227e-05 [auto_monad]: 5.851e-05 [graph_reusing]: 5.99999e-06 [inline]: 2.37001e-06 [add_attr]: 0.00302207, [1] [add_attr_with_inline]: 0.00301447, [1] [Cycle 1]: 4.846e-05, [2] [tag_attr]: 1.438e-05 [meta_addattr_fg_expand]: 4.31002e-06 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 2.391e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.00387964, [53] [py_interpret_to_execute]: 1.933e-05 [rewriter_before_opt_a]: 5.083e-05 [opt_a]: 0.00199377, [2] [Cycle 1]: 0.00138988, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 2.853e-05 [loop_unroll]: 1.701e-05 [a_1]: 0.00035167 [with_stream_mark]: 1.44e-05 [recompute_prepare]: 7.53999e-06 [updatestate_depend_eliminate]: 3.83999e-06 [updatestate_assign_eliminate]: 3.87002e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 7.992e-05 [accelerated_algorithm]: 6.40002e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 6.15002e-06 [merge_send_recv]: 8.31002e-06 [auto_parallel]: 5.87999e-06 [parallel]: 1.725e-05 [flash_sp]: 7.78999e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 3.67998e-06 [matmul_add_comm_reduction]: 9.62999e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 5.93002e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 5.84e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.51998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.178e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.90002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.80998e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 2.36e-06 
[receive_attached]: 2.19001e-06 [after_resolve]: 9.99001e-06 [a_after_grad]: 8.82e-06 [renormalize]: 0.00038651 [add_forward_monad_depend]: 4.75001e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.331e-05 [cse]: 2.8e-05 [a_3]: 4.15e-05 [Cycle 2]: 0.00059493, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 7.11999e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.0001127 [with_stream_mark]: 1.141e-05 [recompute_prepare]: 5.77999e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 7.049e-05 [accelerated_algorithm]: 5.65001e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 4.34002e-06 [auto_parallel]: 5.49e-06 [parallel]: 4.58001e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.79999e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 5.14998e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.51002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.32998e-06 [a_after_grad]: 7.82e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.326e-05 [a_3]: 3.21e-05 [py_interpret_to_execute_after_opt_a]: 7.50998e-06 [slice_cell_reuse_recomputed_activation]: 2.21998e-06 [rewriter_after_opt_a]: 3.214e-05 [convert_after_rewriter]: 6.44999e-06 [order_py_execute_after_rewriter]: 5.44998e-06 [mutable_eliminate]: 0.00045258 [opt_b]: 0.0001862, [1] 
[Cycle 1]: 0.0001798, [7] [b_1]: 0.00010982 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.23998e-06 [renormalize]: 5.29981e-07 [cse]: 1.745e-05 [optimize_parallel_all_gather_comm]: 1.589e-05 [overlap_param_gather]: 2.08002e-06 [cconv]: 2.299e-05 [loop_unroll]: 0.00044556 [opt_after_cconv]: 9.504e-05, [1] [Cycle 1]: 8.926e-05, [7] [c_1]: 2.596e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.47001e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [cse]: 1.655e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.574e-05 [tuple_transform]: 6.784e-05, [1] [Cycle 1]: 6.309e-05, [4] [d_1]: 3.658e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.20002e-06 [partial_unused_args_eliminate]: 2.11998e-06 [add_recomputation]: 4.652e-05 [cse_after_recomputation]: 2.078e-05, [1] [Cycle 1]: 1.6e-05, [1] [cse]: 1.07e-05 [environ_conv]: 4.95001e-06 [swap_dp_allreduce_reducescatter]: 6.06998e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.49001e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.46002e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 [control_data_broadcast_order]: 1.226e-05 [grouped_pairwise_exchange_alltoall]: 1.39e-06 [offloading_packed_experts]: 4.07e-06 [overlap_recompute_and_grad_model_parallel]: 5.13002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.29003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.24998e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 4.26001e-06 [overlap_grad_flash_sp]: 1.781e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.111e-05, [1] [Cycle 1]: 6.677e-05, [6] [build]: 2.59001e-06 [elim_shapecalc]: 8.87999e-06 [elim_not_effective]: 1.211e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 9.14e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.602e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00045231 [validate]: 3.345e-05 [backend_pass]: 8.49977e-07 [task_emit]: 0.0422092 [execute]: 9.31998e-06 Sums bootstrap : 0.000451s : 0.86% type_inference : 0.006062s : 11.56% event_method : 0.000012s : 0.02% auto_monad : 0.000059s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.04% optimize.rewriter_before_opt_a : 0.000051s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000036s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000464s : 0.89% optimize.opt_a.with_stream_mark : 0.000026s : 0.05% optimize.opt_a.recompute_prepare : 0.000013s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000150s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000387s : 0.74% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000041s : 0.08% optimize.opt_a.a_3 : 0.000074s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.06% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.86% optimize.opt_b.b_1 : 0.000110s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000446s : 0.85% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000452s : 0.86% validate : 0.000033s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.042209s : 80.48% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000143 24 19.93% : 0.000028s : 4: substitution.arithmetic_simplify 1.54% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 3.81% : 0.000005s : 3: substitution.graph_param_transform 66.08% : 0.000094s : 3: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.21% : 0.000005s : 4: substitution.remove_not_recompute_node 2.12% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006018 2 92.03% : 0.005538s : 1: type_inference.infer 7.97% : 0.000480s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000092 3 100.00% : 0.000092s : 3: match.inline ------[predicate.] 
0.000146 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 0.83% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 6: predicate.addn_check_dump 0.93% : 0.000001s : 8: predicate.addn_zero_filter 0.80% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 14: predicate.arithmetic_simplify 0.91% : 0.000001s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.15% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_depend_swap 1.79% : 0.000003s : 17: predicate.environ_get_eliminate 1.17% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.77% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.32% : 0.000009s : 37: predicate.inline 1.03% : 0.000002s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 6: predicate.less_batch_normalization 1.78% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 22: predicate.load_eliminater 1.15% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.02% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 6: predicate.merge_addn 0.69% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.13% : 0.000002s : 3: predicate.mutable_eliminate 0.42% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.42% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 11: predicate.partial_eliminate 0.90% : 0.000001s : 8: predicate.print_const_string_wrapper 0.70% : 0.000001s : 6: predicate.reduce_all_const_elim 1.06% : 0.000002s : 8: predicate.reduce_eliminate 2.25% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000002s : 14: predicate.replace_applicator 0.84% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.98% : 0.000001s : 8: predicate.reshape_eliminate 0.68% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.61% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 6: predicate.shard_identity_eliminate 0.83% : 0.000001s : 6: predicate.special_op_eliminate 0.89% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.91% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.86% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.90% : 0.000001s : 8: predicate.transpose_eliminate 1.52% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.64% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000297 7 40.22% : 0.000119s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.78% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064724 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.68% : 0.003026s : 1: add_attr 4.66% : 0.003018s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000063s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.75% : 0.000484s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000454s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.71% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.27% : 0.000823s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000020s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.08% : 0.001997s : 1: opt_a 0.15% : 0.000098s : 1: opt_after_cconv 0.71% : 0.000461s : 1: opt_after_jit_grad 0.29% : 0.000190s : 1: opt_b 6.00% : 0.003883s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.04% : 0.000023s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.31% : 0.000204s : 1: renormalize.infer 0.27% : 0.000176s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000036s : 1: rewriter_after_opt_a 0.08% : 0.000055s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000074s : 1: symbol_engine_optimizer 65.25% : 0.042231s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 9.39% : 0.006077s : 1: type_inference 0.09% : 0.000055s : 1: validate TotalTime = 0.0593977, [24] [bootstrap]: 0.00047663 [type_inference]: 0.00570381 [event_method]: 1.341e-05 [auto_monad]: 6.027e-05 [graph_reusing]: 5.54e-06 [inline]: 1.96e-06 [add_attr]: 0.00305402, [1] [add_attr_with_inline]: 0.00304666, [1] [Cycle 1]: 4.84e-05, [2] [tag_attr]: 1.45e-05 [meta_addattr_fg_expand]: 4.62998e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 2.477e-05 [insert-virtual-dataset]: 2.76e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.42001e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.00411234, [53] [py_interpret_to_execute]: 2.12e-05 [rewriter_before_opt_a]: 6.379e-05 [opt_a]: 0.00222056, [2] [Cycle 1]: 0.00160513, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 3.355e-05 [loop_unroll]: 2.045e-05 [a_1]: 0.0004361 [with_stream_mark]: 1.323e-05 [recompute_prepare]: 7.71999e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 3.5e-06 [updatestate_loads_eliminate]: 3.48e-06 [parameter_eliminate]: 1.82999e-06 [a_2]: 7.973e-05 [accelerated_algorithm]: 6.59999e-06 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 8.47e-06 [auto_parallel]: 6.11998e-06 [parallel]: 1.723e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.70998e-06 [allreduce_fusion]: 3.40003e-06 [matmul_add_comm_reduction]: 8.94e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.35998e-06 [virtual_dataset]: 6.33e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.69999e-06 [merge_forward]: 4.05998e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.30001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.149e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.99001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.88e-06 [flash_sp_send_recv_attached]: 3.06999e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 1.038e-05 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00047713 [add_forward_monad_depend]: 4.68999e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.328e-05 [cse]: 2.914e-05 [a_3]: 4.08e-05 [Cycle 2]: 0.00060543, [45] [expand_dump_flag]: 1.15999e-06 [switch_simplify]: 7.53999e-06 [loop_unroll]: 5.71e-06 [a_1]: 0.00011535 [with_stream_mark]: 9.94001e-06 [recompute_prepare]: 6.08002e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.65002e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 7.101e-05 [accelerated_algorithm]: 5.82999e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 4.75001e-06 [auto_parallel]: 5.64998e-06 [parallel]: 4.22003e-06 [flash_sp]: 3.69002e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 3.02002e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 6.64999e-06 [virtual_dataset]: 5.60001e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.17999e-06 [merge_forward]: 2.62001e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.60001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.23999e-06 [a_after_grad]: 7.77e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.42001e-06 [cse]: 1.423e-05 [a_3]: 3.286e-05 [py_interpret_to_execute_after_opt_a]: 7.61999e-06 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 3.333e-05 [convert_after_rewriter]: 6.96999e-06 [order_py_execute_after_rewriter]: 4.80999e-06 [mutable_eliminate]: 0.00046013 [opt_b]: 0.00018877, [1] [Cycle 1]: 
0.00018238, [7] [b_1]: 0.00010997 [b_2]: 7.96001e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.77002e-06 [updatestate_loads_eliminate]: 2.51e-06 [renormalize]: 3.30008e-07 [cse]: 1.748e-05 [optimize_parallel_all_gather_comm]: 1.625e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.359e-05 [loop_unroll]: 0.00042494 [opt_after_cconv]: 9.685e-05, [1] [Cycle 1]: 9.082e-05, [7] [c_1]: 2.614e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.43e-06 [cse]: 1.759e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.535e-05 [tuple_transform]: 6.855e-05, [1] [Cycle 1]: 6.373e-05, [4] [d_1]: 3.672e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 4.19997e-07 [switch_simplify]: 6.63e-06 [partial_unused_args_eliminate]: 1.84998e-06 [add_recomputation]: 4.645e-05 [cse_after_recomputation]: 2.123e-05, [1] [Cycle 1]: 1.634e-05, [1] [cse]: 1.108e-05 [environ_conv]: 5.61e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.67001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.37999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.02001e-06 [control_data_broadcast_order]: 1.278e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.27999e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 4.74e-06 [overlap_grad_flash_sp]: 1.777e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.46e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 7.091e-05, [1] [Cycle 1]: 6.652e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.85999e-06 [elim_not_effective]: 1.188e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 9.38002e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.6e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.67998e-06 [opt_after_jit_grad]: 0.00046 [validate]: 3.545e-05 [backend_pass]: 8.00006e-07 [task_emit]: 0.0451903 [execute]: 9.54e-06 Sums bootstrap : 0.000477s : 0.86% type_inference : 0.005704s : 10.31% event_method : 0.000013s : 0.02% auto_monad : 0.000060s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.04% optimize.rewriter_before_opt_a : 0.000064s : 0.12% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.05% optimize.opt_a.a_1 : 0.000551s : 1.00% optimize.opt_a.with_stream_mark : 0.000023s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000151s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000477s : 0.86% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000043s : 0.08% optimize.opt_a.a_3 : 0.000074s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.83% optimize.opt_b.b_1 : 0.000110s : 0.20% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000425s : 0.77% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000460s : 0.83% validate : 0.000035s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.045190s : 81.71% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000169 26 18.88% : 0.000032s : 5: substitution.arithmetic_simplify 1.17% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.24% : 0.000005s : 3: substitution.graph_param_transform 63.19% : 0.000107s : 3: substitution.inline 1.94% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.07% : 0.000005s : 4: substitution.remove_not_recompute_node 2.10% : 0.000004s : 2: substitution.replace_old_param 5.59% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005661 2 89.51% : 0.005067s : 1: type_inference.infer 10.49% : 0.000594s : 1: type_inference.specialize ------[replace.] 0.000037 4 78.91% : 0.000029s : 3: replace.inline 21.09% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 4 92.35% : 0.000105s : 3: match.inline 7.65% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 0.80% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.97% : 0.000002s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.42% : 0.000004s : 15: predicate.arithmetic_simplify 0.98% : 0.000002s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.04% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.02% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_depend_swap 1.78% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.34% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.68% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.02% : 0.000010s : 40: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.48% : 0.000004s : 25: predicate.load_eliminater 1.03% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 21: predicate.loop_unroll_before_grad 2.02% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.07% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.53% : 0.000002s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.95% : 0.000001s : 9: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.19% : 0.000002s : 9: predicate.reduce_eliminate 2.36% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000002s : 16: predicate.replace_applicator 0.74% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.82% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 43: predicate.switch_simplify 0.91% : 
0.000001s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.54% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.02% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.88% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000358 8 45.92% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.08% : 0.000194s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068112 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.49% : 0.003058s : 1: add_attr 4.48% : 0.003050s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000065s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.75% : 0.000514s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.64% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.69% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.35% : 0.000923s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.26% : 0.002223s : 1: opt_a 0.15% : 0.000100s : 1: opt_after_cconv 0.69% : 0.000470s : 1: opt_after_jit_grad 0.28% : 0.000192s : 1: opt_b 6.04% : 0.004116s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.04% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.39% : 0.000267s : 1: renormalize.infer 0.30% : 0.000204s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.10% : 0.000068s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000074s : 1: symbol_engine_optimizer 66.38% : 0.045214s : 1: task_emit 0.10% : 0.000072s : 1: 
tuple_transform 8.39% : 0.005718s : 1: type_inference 0.08% : 0.000058s : 1: validate TotalTime = 0.0773214, [24] [bootstrap]: 0.0005302 [type_inference]: 0.0117079 [event_method]: 4.672e-05 [auto_monad]: 0.00012914 [graph_reusing]: 8.55001e-06 [inline]: 2.21e-06 [add_attr]: 0.00305969, [1] [add_attr_with_inline]: 0.00305131, [1] [Cycle 1]: 7.024e-05, [2] [tag_attr]: 3.212e-05 [meta_addattr_fg_expand]: 9.97999e-06 [parallel-infer-symbol]: 3.21999e-06 [pre_auto_parallel]: 4.945e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.48e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.0169251, [53] [py_interpret_to_execute]: 3.985e-05 [rewriter_before_opt_a]: 0.00015624 [opt_a]: 0.0147031, [3] [Cycle 1]: 0.011203, [45] [expand_dump_flag]: 3.91001e-06 [switch_simplify]: 7.626e-05 [loop_unroll]: 6.326e-05 [a_1]: 0.00147794 [with_stream_mark]: 2.33e-05 [recompute_prepare]: 2.199e-05 [updatestate_depend_eliminate]: 8.57e-06 [updatestate_assign_eliminate]: 7.35e-06 [updatestate_loads_eliminate]: 6.89999e-06 [parameter_eliminate]: 2.69999e-06 [a_2]: 0.00024096 [accelerated_algorithm]: 3.088e-05 [shard]: 1.73997e-06 [meta_shard_fg_expand]: 3.51999e-06 [shard_inline]: 1.597e-05 [merge_send_recv]: 1.609e-05 [auto_parallel]: 1.054e-05 [parallel]: 1.888e-05 [flash_sp]: 1.123e-05 [merge_comm]: 9.49e-06 [allreduce_fusion]: 8.69e-06 [matmul_add_comm_reduction]: 2.606e-05 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 1.742e-05 [virtual_dataset]: 1.523e-05 [get_grad_eliminate_]: 1.479e-05 [virtual_output]: 1.502e-05 [merge_forward]: 8.94003e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.784e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.952e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 2.831e-05 [set_forward_comm_id_for_comm_node_pass]: 9.66e-06 [meta_fg_expand]: 0.00150199 [flash_sp_send_recv_attached]: 3.84002e-06 [receive_attached]: 2.38998e-06 [after_resolve]: 
6.443e-05 [a_after_grad]: 8.908e-05 [renormalize]: 0.00640871 [add_forward_monad_depend]: 9.89999e-06 [auto_monad_grad]: 6.16998e-06 [auto_monad_eliminator]: 5.416e-05 [cse]: 0.00018542 [a_3]: 0.00033182 [Cycle 2]: 0.00279874, [45] [expand_dump_flag]: 2.07999e-06 [switch_simplify]: 4.549e-05 [loop_unroll]: 4.185e-05 [a_1]: 0.00132568 [with_stream_mark]: 1.367e-05 [recompute_prepare]: 1.059e-05 [updatestate_depend_eliminate]: 4.72998e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 1.09e-06 [a_2]: 8.892e-05 [accelerated_algorithm]: 1.088e-05 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 6.44999e-06 [merge_send_recv]: 6.76999e-06 [auto_parallel]: 8.28999e-06 [parallel]: 6.14001e-06 [flash_sp]: 3.90998e-06 [merge_comm]: 4.58001e-06 [allreduce_fusion]: 3.88001e-06 [matmul_add_comm_reduction]: 7.51999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 7.98001e-06 [virtual_dataset]: 6.39001e-06 [get_grad_eliminate_]: 6.23e-06 [virtual_output]: 5.91e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 9.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.316e-05 [merge_recompute_call_nodes]: 1.15999e-06 [before_grad]: 1.104e-05 [set_forward_comm_id_for_comm_node_pass]: 3.93999e-06 [meta_fg_expand]: 8.002e-05 [flash_sp_send_recv_attached]: 1.31002e-06 [receive_attached]: 1.31998e-06 [after_resolve]: 1.318e-05 [a_after_grad]: 1.037e-05 [renormalize]: 0.0006764 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.93002e-06 [auto_monad_eliminator]: 1.184e-05 [cse]: 2.361e-05 [a_3]: 4.817e-05 [Cycle 3]: 0.00068517, [45] [expand_dump_flag]: 1.17e-06 [switch_simplify]: 8.32003e-06 [loop_unroll]: 6.69001e-06 [a_1]: 0.00014635 [with_stream_mark]: 9.02e-06 [recompute_prepare]: 6.93e-06 [updatestate_depend_eliminate]: 3.92002e-06 [updatestate_assign_eliminate]: 2.76e-06 [updatestate_loads_eliminate]: 2.48e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 8.791e-05 [accelerated_algorithm]: 9.71998e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.56998e-06 [shard_inline]: 6.77002e-06 [merge_send_recv]: 5.71e-06 [auto_parallel]: 6.93e-06 [parallel]: 5.39e-06 [flash_sp]: 8.39995e-07 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.51001e-06 [matmul_add_comm_reduction]: 5.79e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 7.45998e-06 [virtual_dataset]: 6.26e-06 [get_grad_eliminate_]: 6.14999e-06 [virtual_output]: 6.10002e-06 [merge_forward]: 3.03e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 7.91001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.29e-05 [merge_recompute_call_nodes]: 9.10019e-07 [before_grad]: 1.055e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85998e-06 [meta_fg_expand]: 2.63e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 8.69998e-06 [a_after_grad]: 9.29e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 7.48e-06 [cse]: 1.744e-05 [a_3]: 3.978e-05 [py_interpret_to_execute_after_opt_a]: 1.117e-05 [slice_cell_reuse_recomputed_activation]: 2.32999e-06 [rewriter_after_opt_a]: 4.24e-05 [convert_after_rewriter]: 8.45999e-06 [order_py_execute_after_rewriter]: 5.61003e-06 [mutable_eliminate]: 0.00053922 [opt_b]: 0.00022835, [1] [Cycle 1]: 0.00022136, [7] [b_1]: 0.0001375 [b_2]: 8.89e-06 [updatestate_depend_eliminate]: 7.5e-06 [updatestate_assign_eliminate]: 2.98998e-06 [updatestate_loads_eliminate]: 2.81e-06 [renormalize]: 4.10015e-07 [cse]: 2.42e-05 [optimize_parallel_all_gather_comm]: 1.846e-05 [overlap_param_gather]: 2.26e-06 [cconv]: 2.424e-05 [loop_unroll]: 0.00043643 [opt_after_cconv]: 0.00010936, [1] [Cycle 1]: 0.000103, [7] [c_1]: 3.22e-05 [parameter_eliminate]: 2.79999e-06 [updatestate_depend_eliminate]: 5.92001e-06 [updatestate_assign_eliminate]: 2.98e-06 [updatestate_loads_eliminate]: 
2.74001e-06 [cse]: 2.095e-05 [renormalize]: 5.39992e-07 [remove_dup_value]: 1.797e-05 [tuple_transform]: 7.977e-05, [1] [Cycle 1]: 7.456e-05, [4] [d_1]: 4.622e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 7.46999e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 5.58e-05 [cse_after_recomputation]: 2.554e-05, [1] [Cycle 1]: 2.06e-05, [1] [cse]: 1.477e-05 [environ_conv]: 8.85999e-06 [swap_dp_allreduce_reducescatter]: 6.27001e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.26997e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.54001e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 1.31002e-06 [interleave_split_concat_branches]: 1.24998e-06 [interleave_parallel_branches]: 1.14998e-06 [overlap_opt_shard_in_pipeline]: 1.49998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.393e-05 [grouped_pairwise_exchange_alltoall]: 1.44998e-06 [offloading_packed_experts]: 4.58999e-06 [overlap_recompute_and_grad_model_parallel]: 5.20001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.25999e-06 [overlap_recompute_comm]: 2.37001e-06 [overlap_grad_ring_attention]: 4.68001e-06 [overlap_grad_flash_sp]: 2.132e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 1.92999e-06 [split_layernorm_comm]: 2.05002e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 8.654e-05, [1] [Cycle 1]: 8.194e-05, [6] [build]: 8.81997e-06 [elim_shapecalc]: 1.075e-05 [elim_not_effective]: 1.43e-05 [opt_reshape]: 7.45e-06 [fold_const_symbol]: 1.15e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 2.02999e-06 [pipeline_parallel_scheduler]: 1.74e-06 [auto_monad_reorder]: 2.079e-05 [get_jit_bprop_graph]: 1.57999e-06 [rewriter_after_jit_bprop_graph]: 4.13999e-06 [opt_after_jit_grad]: 0.00048573 [validate]: 4.306e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0440575 [execute]: 1.028e-05 Sums bootstrap : 0.000530s : 0.73% type_inference : 0.011708s : 16.05% event_method : 0.000047s : 0.06% auto_monad : 0.000129s : 0.18% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.07% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.05% optimize.rewriter_before_opt_a : 0.000156s : 0.21% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000130s : 0.18% optimize.opt_a.loop_unroll : 0.000112s : 0.15% optimize.opt_a.a_1 : 0.002950s : 4.04% optimize.opt_a.with_stream_mark : 0.000046s : 0.06% optimize.opt_a.recompute_prepare : 0.000040s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000418s : 0.57% optimize.opt_a.accelerated_algorithm : 0.000051s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000029s : 0.04% optimize.opt_a.merge_send_recv : 0.000029s : 0.04% optimize.opt_a.auto_parallel : 0.000026s : 0.04% optimize.opt_a.parallel : 0.000030s : 0.04% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000018s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.05% optimize.opt_a.virtual_dataset : 0.000028s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000027s : 0.04% optimize.opt_a.virtual_output : 0.000027s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000050s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000017s : 0.02% optimize.opt_a.meta_fg_expand : 0.001585s : 2.17% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000086s : 0.12% optimize.opt_a.a_after_grad : 0.000109s : 0.15% optimize.opt_a.renormalize : 0.007085s : 9.71% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000073s : 0.10% optimize.opt_a.cse : 0.000226s : 0.31% optimize.opt_a.a_3 : 0.000420s : 0.58% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000042s : 0.06% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000539s : 0.74% optimize.opt_b.b_1 : 0.000137s : 0.19% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000024s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000436s : 0.60% optimize.opt_after_cconv.c_1 : 0.000032s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000018s : 0.02% optimize.tuple_transform.d_1 : 0.000046s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.08% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000486s : 0.67% validate : 0.000043s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044057s : 60.41% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000705 161 7.34% : 0.000052s : 8: substitution.arithmetic_simplify 0.33% : 0.000002s : 3: substitution.elim_not_effective 0.64% : 0.000005s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.91% : 0.000006s : 4: substitution.graph_param_transform 0.41% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 57.29% : 0.000404s : 17: substitution.inline 2.41% : 0.000017s : 2: substitution.inline_without_move 1.38% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.24% : 0.000016s : 3: substitution.less_batch_normalization 1.50% : 0.000011s : 7: substitution.minmaximum_grad 0.85% : 0.000006s : 5: substitution.partial_eliminate 1.83% : 0.000013s : 15: substitution.remove_not_recompute_node 3.86% : 0.000027s : 10: substitution.replace_applicator 1.40% : 0.000010s : 10: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.06% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 2.05% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.46% : 0.000053s : 19: substitution.tuple_list_get_item_eliminator 2.05% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011631 2 86.24% : 0.010031s : 1: type_inference.infer 13.76% : 0.001600s : 1: type_inference.specialize ------[replace.] 0.000262 27 48.39% : 0.000127s : 17: replace.inline 51.61% : 0.000135s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000421 27 93.67% : 0.000395s : 17: match.inline 6.33% : 0.000027s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000687 4248 1.13% : 0.000008s : 53: predicate.accumulaten_eliminater 0.26% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.13% : 0.000008s : 53: predicate.addn_zero_filter 1.10% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.96% : 0.000013s : 74: predicate.arithmetic_simplify 1.14% : 0.000008s : 53: predicate.cast_eliminate 1.12% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.18% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.19% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_depend_swap 1.70% : 0.000012s : 78: predicate.environ_get_eliminate 1.17% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.83% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.49% : 0.000017s : 80: predicate.float_depend_g_call 0.48% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.51% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.50% : 0.000003s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.86% : 0.000040s : 183: predicate.inline 1.46% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.63% : 0.000004s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.68% : 0.000018s : 124: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.58% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.36% : 0.000009s : 61: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.12% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.08% : 0.000007s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 53: predicate.minmaximum_grad 0.30% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.12% : 0.000015s : 80: predicate.partial_defer_inline 1.75% : 0.000012s : 67: predicate.partial_eliminate 1.11% : 0.000008s : 53: predicate.print_const_string_wrapper 0.48% : 0.000003s : 21: predicate.reduce_all_const_elim 1.43% : 0.000010s : 53: predicate.reduce_eliminate 2.67% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 21: predicate.remove_not_recompute_node 1.91% : 0.000013s : 113: predicate.replace_applicator 0.66% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.14% : 0.000008s : 53: predicate.reshape_eliminate 1.10% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.10% : 0.000001s : 4: predicate.row_tensor_eliminate 1.21% : 0.000008s : 50: predicate.same_eliminate 0.36% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.61% : 0.000004s : 21: predicate.specialize_transform 1.26% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.96% : 0.000013s : 80: predicate.switch_defer_inline 3.03% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.26% : 0.000036s : 218: 
predicate.switch_simplify 1.14% : 0.000008s : 53: predicate.tile_eliminate 1.10% : 0.000008s : 53: predicate.transpose_eliminate 1.45% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.45% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.57% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.59% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.16% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000003s : 21: predicate.virtual_dataset_eliminate 0.51% : 0.000003s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001863 36 61.31% : 0.001142s : 15: func_graph_cloner_run.FuncGraphClonerGraph 38.69% : 0.000721s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.109013 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.81% : 0.003064s : 1: add_attr 2.80% : 0.003055s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000136s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.52% : 0.000567s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.05% : 0.000054s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000446s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.50% : 0.000550s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 4.06% : 0.004427s : 117: opt.transform.opt_a 0.03% : 0.000030s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000114s : 28: opt.transform.opt_b 0.05% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000040s : 4: opt.transform.symbol_engine_opt 13.49% : 0.014706s : 1: opt_a 0.10% : 0.000113s : 1: opt_after_cconv 0.45% : 0.000495s : 1: opt_after_jit_grad 0.21% : 0.000232s : 1: opt_b 15.53% : 0.016929s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000055s : 1: pre_auto_parallel 0.04% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000022s : 1: remove_dup_value 5.06% : 0.005517s : 2: renormalize.infer 1.43% : 0.001554s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000046s : 1: rewriter_after_opt_a 0.15% : 0.000160s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000089s : 1: symbol_engine_optimizer 40.43% : 0.044079s : 1: task_emit 0.08% : 0.000083s : 1: 
tuple_transform 10.75% : 0.011724s : 1: type_inference 0.06% : 0.000067s : 1: validate TotalTime = 0.0564422, [24] [bootstrap]: 0.00044786 [type_inference]: 0.00575704 [event_method]: 1.249e-05 [auto_monad]: 5.804e-05 [graph_reusing]: 5.96e-06 [inline]: 2.04e-06 [add_attr]: 0.0029911, [1] [add_attr_with_inline]: 0.00298378, [1] [Cycle 1]: 5.04e-05, [2] [tag_attr]: 1.373e-05 [meta_addattr_fg_expand]: 4.48999e-06 [parallel-infer-symbol]: 3.21001e-06 [pre_auto_parallel]: 2.413e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.82001e-06 [optimize]: 0.00394862, [53] [py_interpret_to_execute]: 1.843e-05 [rewriter_before_opt_a]: 5.136e-05 [opt_a]: 0.00208458, [2] [Cycle 1]: 0.00147662, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 2.918e-05 [loop_unroll]: 1.683e-05 [a_1]: 0.00035182 [with_stream_mark]: 6.028e-05 [recompute_prepare]: 8.14002e-06 [updatestate_depend_eliminate]: 4.32e-06 [updatestate_assign_eliminate]: 3.99002e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 2.17001e-06 [a_2]: 8.072e-05 [accelerated_algorithm]: 6.70998e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 8.58001e-06 [auto_parallel]: 6.36e-06 [parallel]: 1.85e-05 [flash_sp]: 7.19001e-06 [merge_comm]: 3.89002e-06 [allreduce_fusion]: 3.81001e-06 [matmul_add_comm_reduction]: 9.27999e-06 [allreduce_slice_to_reducescatter]: 6.49976e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.58002e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 4.02e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.33002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.164e-05 [merge_recompute_call_nodes]: 1.99999e-06 [before_grad]: 1.01e-05 [set_forward_comm_id_for_comm_node_pass]: 3.81999e-06 [meta_fg_expand]: 2.59999e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 
2.17001e-06 [after_resolve]: 1.019e-05 [a_after_grad]: 8.55999e-06 [renormalize]: 0.00041946 [add_forward_monad_depend]: 4.76002e-06 [auto_monad_grad]: 2.06e-06 [auto_monad_eliminator]: 1.363e-05 [cse]: 2.917e-05 [a_3]: 4.117e-05 [Cycle 2]: 0.00059794, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.54998e-06 [a_1]: 0.00011282 [with_stream_mark]: 1.22e-05 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 7.118e-05 [accelerated_algorithm]: 5.79999e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.27003e-06 [flash_sp]: 3.4e-06 [merge_comm]: 3.35998e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 5.25001e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 6.37001e-06 [virtual_dataset]: 5.31998e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.06e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.89e-06 [set_forward_comm_id_for_comm_node_pass]: 3.62998e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 8.31002e-06 [a_after_grad]: 7.68001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.12001e-06 [cse]: 1.368e-05 [a_3]: 3.201e-05 [py_interpret_to_execute_after_opt_a]: 7.58001e-06 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.303e-05 [convert_after_rewriter]: 6.57002e-06 [order_py_execute_after_rewriter]: 5.07e-06 [mutable_eliminate]: 0.00045671 [opt_b]: 0.00018696, [1] [Cycle 1]: 0.00018069, [7] 
[b_1]: 0.00010948 [b_2]: 7.23e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 6.39993e-07 [cse]: 1.823e-05 [optimize_parallel_all_gather_comm]: 1.672e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.253e-05 [loop_unroll]: 0.00041962 [opt_after_cconv]: 9.704e-05, [1] [Cycle 1]: 9.111e-05, [7] [c_1]: 2.648e-05 [parameter_eliminate]: 2.44999e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.693e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.527e-05 [tuple_transform]: 6.997e-05, [1] [Cycle 1]: 6.519e-05, [4] [d_1]: 3.803e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.415e-05 [cse_after_recomputation]: 2.171e-05, [1] [Cycle 1]: 1.671e-05, [1] [cse]: 1.122e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.13999e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.65001e-06 [ForceFp32Comm]: 9.49978e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.53e-06 [reorder_send_recv_between_fp_bp]: 3.08e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.90025e-07 [interleave_split_concat_branches]: 1.41998e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.43002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.07999e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 5.02e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.15999e-06 [overlap_grad_flash_sp]: 1.747e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.19003e-06 [symbol_engine_optimizer]: 7.249e-05, [1] [Cycle 1]: 6.773e-05, [6] [build]: 2.23998e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.255e-05 [opt_reshape]: 6.40002e-06 [fold_const_symbol]: 9.92999e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.681e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.79002e-06 [opt_after_jit_grad]: 0.00045526 [validate]: 3.378e-05 [backend_pass]: 1.14e-06 [task_emit]: 0.0424392 [execute]: 8.59e-06 Sums bootstrap : 0.000448s : 0.85% type_inference : 0.005757s : 10.98% event_method : 0.000012s : 0.02% auto_monad : 0.000058s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000018s : 0.04% optimize.rewriter_before_opt_a : 0.000051s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000036s : 0.07% optimize.opt_a.loop_unroll : 0.000022s : 0.04% optimize.opt_a.a_1 : 0.000465s : 0.89% optimize.opt_a.with_stream_mark : 0.000072s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000152s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.04% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000420s : 0.80% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000043s : 0.08% optimize.opt_a.a_3 : 0.000073s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.87% optimize.opt_b.b_1 : 0.000109s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000420s : 0.80% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000038s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000455s : 0.87% validate : 0.000034s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.042439s : 80.94% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000142 24 20.70% : 0.000029s : 4: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.18% : 0.000006s : 3: substitution.graph_param_transform 65.02% : 0.000092s : 3: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.24% : 0.000005s : 4: substitution.remove_not_recompute_node 2.13% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005715 2 91.45% : 0.005226s : 1: type_inference.infer 8.55% : 0.000489s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000091 3 100.00% : 0.000091s : 3: match.inline ------[predicate.] 
0.000146 815 0.91% : 0.000001s : 8: predicate.accumulaten_eliminater 0.91% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 8: predicate.addn_zero_filter 0.82% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.51% : 0.000004s : 14: predicate.arithmetic_simplify 0.84% : 0.000001s : 8: predicate.cast_eliminate 0.73% : 0.000001s : 6: predicate.check_bprop_eliminate 0.65% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.87% : 0.000003s : 17: predicate.environ_get_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.97% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.73% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.79% : 0.000001s : 6: predicate.incorporate_call 0.64% : 0.000001s : 6: predicate.incorporate_call_switch 6.36% : 0.000009s : 37: predicate.inline 0.97% : 0.000001s : 6: predicate.inline_without_move 0.45% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 6: predicate.less_batch_normalization 1.56% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.31% : 0.000003s : 22: predicate.load_eliminater 1.02% : 0.000001s : 3: predicate.loop_unroll_after_grad 1.97% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.65% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 6: predicate.merge_addn 0.69% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 8: predicate.minmaximum_grad 1.13% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.45% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.21% : 0.000002s : 8: predicate.reduce_eliminate 2.28% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 6: predicate.remove_not_recompute_node 1.22% : 0.000002s : 14: predicate.replace_applicator 0.77% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 8: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 6: predicate.shard_identity_eliminate 0.82% : 0.000001s : 6: predicate.special_op_eliminate 0.93% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.89% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.85% : 0.000007s : 38: predicate.switch_simplify 0.90% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.51% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.54% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.47% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.20% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.61% : 0.000001s : 3: predicate.value_based_eliminate 0.80% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.35% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000285 7 37.38% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.62% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064771 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.62% : 0.002995s : 1: add_attr 4.61% : 0.002987s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000048s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.10% : 0.000063s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.75% : 0.000488s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.66% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.72% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.28% : 0.000828s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.07% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.22% : 0.002087s : 1: opt_a 0.16% : 0.000101s : 1: opt_after_cconv 0.72% : 0.000465s : 1: opt_after_jit_grad 0.29% : 0.000190s : 1: opt_b 6.10% : 0.003952s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000022s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.36% : 0.000231s : 1: renormalize.infer 0.28% : 0.000182s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.09% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000075s : 1: symbol_engine_optimizer 65.55% : 0.042458s : 1: task_emit 0.11% : 0.000073s : 1: 
tuple_transform 8.91% : 0.005772s : 1: type_inference 0.09% : 0.000056s : 1: validate TotalTime = 0.0759836, [24] [bootstrap]: 0.00043854 [type_inference]: 0.0114501 [event_method]: 4.377e-05 [auto_monad]: 0.00012768 [graph_reusing]: 8.65999e-06 [inline]: 2.03002e-06 [add_attr]: 0.00303581, [1] [add_attr_with_inline]: 0.00302732, [1] [Cycle 1]: 6.964e-05, [2] [tag_attr]: 3.209e-05 [meta_addattr_fg_expand]: 9.48002e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 4.607e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 8.79983e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.0159422, [53] [py_interpret_to_execute]: 3.769e-05 [rewriter_before_opt_a]: 0.00014382 [opt_a]: 0.0138506, [3] [Cycle 1]: 0.0105419, [45] [expand_dump_flag]: 3.93001e-06 [switch_simplify]: 7.24e-05 [loop_unroll]: 5.91e-05 [a_1]: 0.00136011 [with_stream_mark]: 2.326e-05 [recompute_prepare]: 2.241e-05 [updatestate_depend_eliminate]: 8.72e-06 [updatestate_assign_eliminate]: 7.05998e-06 [updatestate_loads_eliminate]: 6.99001e-06 [parameter_eliminate]: 2.44001e-06 [a_2]: 0.00024054 [accelerated_algorithm]: 3.056e-05 [shard]: 1.82001e-06 [meta_shard_fg_expand]: 3.45e-06 [shard_inline]: 1.599e-05 [merge_send_recv]: 1.669e-05 [auto_parallel]: 1.058e-05 [parallel]: 1.85e-05 [flash_sp]: 1.112e-05 [merge_comm]: 9.39e-06 [allreduce_fusion]: 8.77999e-06 [matmul_add_comm_reduction]: 2.654e-05 [allreduce_slice_to_reducescatter]: 9.70002e-07 [virtual_shard_identity]: 1.762e-05 [virtual_dataset]: 1.538e-05 [get_grad_eliminate_]: 1.476e-05 [virtual_output]: 1.459e-05 [merge_forward]: 9.17999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.824e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.909e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 2.839e-05 [set_forward_comm_id_for_comm_node_pass]: 9.54e-06 [meta_fg_expand]: 0.00143369 [flash_sp_send_recv_attached]: 4e-06 [receive_attached]: 2.59001e-06 
[after_resolve]: 6.194e-05 [a_after_grad]: 8.574e-05 [renormalize]: 0.0059699 [add_forward_monad_depend]: 9.51998e-06 [auto_monad_grad]: 5.35999e-06 [auto_monad_eliminator]: 5.127e-05 [cse]: 0.00017993 [a_3]: 0.00032677 [Cycle 2]: 0.00259292, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 4.489e-05 [loop_unroll]: 4.152e-05 [a_1]: 0.00131474 [with_stream_mark]: 1.114e-05 [recompute_prepare]: 8.60999e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 2.98998e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 1.06997e-06 [a_2]: 8.651e-05 [accelerated_algorithm]: 1.009e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.91001e-06 [merge_send_recv]: 5.99e-06 [auto_parallel]: 5.99e-06 [parallel]: 4.90999e-06 [flash_sp]: 3.75998e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.49001e-06 [matmul_add_comm_reduction]: 6.25002e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 7.34002e-06 [virtual_dataset]: 6.19999e-06 [get_grad_eliminate_]: 6.12999e-06 [virtual_output]: 5.97999e-06 [merge_forward]: 3.19001e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 7.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.36e-05 [merge_recompute_call_nodes]: 9.89996e-07 [before_grad]: 1.095e-05 [set_forward_comm_id_for_comm_node_pass]: 4.18001e-06 [meta_fg_expand]: 5.08e-05 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.082e-05 [a_after_grad]: 9.92001e-06 [renormalize]: 0.00055018 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.137e-05 [cse]: 2.086e-05 [a_3]: 4.743e-05 [Cycle 3]: 0.00070206, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 8.1e-06 [loop_unroll]: 6.42001e-06 [a_1]: 0.00017079 [with_stream_mark]: 8.48999e-06 [recompute_prepare]: 7.10998e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 2.84999e-06 
[updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 8.547e-05 [accelerated_algorithm]: 1.001e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 6.70002e-06 [merge_send_recv]: 5.25999e-06 [auto_parallel]: 6.01e-06 [parallel]: 4.95001e-06 [flash_sp]: 9.79984e-07 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 6.00002e-06 [allreduce_slice_to_reducescatter]: 7.40023e-07 [virtual_shard_identity]: 7.79002e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 6.07001e-06 [virtual_output]: 5.94e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 6.64999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.265e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.059e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88999e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 9.12001e-06 [a_after_grad]: 9.27999e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 7.68001e-06 [cse]: 1.731e-05 [a_3]: 3.885e-05 [py_interpret_to_execute_after_opt_a]: 9.48002e-06 [slice_cell_reuse_recomputed_activation]: 2.27999e-06 [rewriter_after_opt_a]: 3.994e-05 [convert_after_rewriter]: 7.56999e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00046879 [opt_b]: 0.00021363, [1] [Cycle 1]: 0.00020741, [7] [b_1]: 0.00013039 [b_2]: 8.57e-06 [updatestate_depend_eliminate]: 5.97999e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.61e-06 [renormalize]: 3.9002e-07 [cse]: 2.116e-05 [optimize_parallel_all_gather_comm]: 1.735e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 1.893e-05 [loop_unroll]: 0.00042891 [opt_after_cconv]: 0.0001114, [1] [Cycle 1]: 0.00010519, [7] [c_1]: 3.318e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 6.19001e-06 
[updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 2.79999e-06 [cse]: 2.151e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.603e-05 [tuple_transform]: 7.698e-05, [1] [Cycle 1]: 7.234e-05, [4] [d_1]: 4.479e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.13e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.825e-05 [cse_after_recomputation]: 2.524e-05, [1] [Cycle 1]: 2.04e-05, [1] [cse]: 1.499e-05 [environ_conv]: 7.45e-06 [swap_dp_allreduce_reducescatter]: 5.70001e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 1.21997e-06 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.13002e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.17999e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.51998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.355e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.55999e-06 [overlap_recompute_and_grad_model_parallel]: 5.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.29999e-06 [overlap_grad_ring_attention]: 4.72998e-06 [overlap_grad_flash_sp]: 2.04e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 8.628e-05, [1] [Cycle 1]: 8.153e-05, [6] [build]: 9.61e-06 [elim_shapecalc]: 1.029e-05 [elim_not_effective]: 1.464e-05 [opt_reshape]: 7.36001e-06 
[fold_const_symbol]: 1.169e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.27999e-06 [pipeline_parallel_scheduler]: 1.46998e-06 [auto_monad_reorder]: 2.017e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.33998e-06 [opt_after_jit_grad]: 0.00046571 [validate]: 4.114e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0441303 [execute]: 8.67998e-06 Sums bootstrap : 0.000439s : 0.61% type_inference : 0.011450s : 15.98% event_method : 0.000044s : 0.06% auto_monad : 0.000128s : 0.18% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.05% optimize.rewriter_before_opt_a : 0.000144s : 0.20% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.17% optimize.opt_a.loop_unroll : 0.000107s : 0.15% optimize.opt_a.a_1 : 0.002846s : 3.97% optimize.opt_a.with_stream_mark : 0.000043s : 0.06% optimize.opt_a.recompute_prepare : 0.000038s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000413s : 0.58% optimize.opt_a.accelerated_algorithm : 0.000051s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000006s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.04% optimize.opt_a.merge_send_recv : 0.000028s : 0.04% optimize.opt_a.auto_parallel : 0.000023s : 0.03% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 
0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000017s : 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.05% optimize.opt_a.virtual_dataset : 0.000028s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000027s : 0.04% optimize.opt_a.virtual_output : 0.000027s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000033s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000055s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000050s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001487s : 2.07% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000082s : 0.11% optimize.opt_a.a_after_grad : 0.000105s : 0.15% optimize.opt_a.renormalize : 0.006520s : 9.10% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000070s : 0.10% optimize.opt_a.cse : 0.000218s : 0.30% optimize.opt_a.a_3 : 0.000413s : 0.58% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.06% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000469s : 0.65% optimize.opt_b.b_1 : 0.000130s : 0.18% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.03% optimize.loop_unroll : 0.000429s : 0.60% optimize.opt_after_cconv.c_1 : 0.000033s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000020s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000466s : 0.65% validate : 0.000041s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044130s : 61.58% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000674 159 6.81% : 0.000046s : 7: substitution.arithmetic_simplify 0.35% : 0.000002s : 3: substitution.elim_not_effective 0.60% : 0.000004s : 5: substitution.float_depend_g_call 0.63% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 3: substitution.fold_const_symbol 0.90% : 0.000006s : 4: substitution.graph_param_transform 0.47% : 0.000003s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 58.13% : 0.000392s : 17: substitution.inline 2.34% : 0.000016s : 2: substitution.inline_without_move 1.45% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.19% : 0.000015s : 3: substitution.less_batch_normalization 1.43% : 0.000010s : 7: substitution.minmaximum_grad 0.86% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000012s : 15: substitution.remove_not_recompute_node 3.92% : 0.000026s : 10: substitution.replace_applicator 1.27% : 0.000009s : 10: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.09% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.50% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 2.04% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.24% : 0.000049s : 18: substitution.tuple_list_get_item_eliminator 1.98% : 0.000013s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011379 2 87.20% : 0.009923s : 1: type_inference.infer 12.80% : 0.001456s : 1: type_inference.specialize ------[replace.] 0.000188 26 66.40% : 0.000125s : 17: replace.inline 33.60% : 0.000063s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000406 26 94.12% : 0.000382s : 17: match.inline 5.88% : 0.000024s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000673 4180 1.13% : 0.000008s : 52: predicate.accumulaten_eliminater 0.23% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.47% : 0.000003s : 21: predicate.addn_check_dump 1.13% : 0.000008s : 52: predicate.addn_zero_filter 1.08% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 2.04% : 0.000014s : 73: predicate.arithmetic_simplify 1.14% : 0.000008s : 52: predicate.cast_eliminate 1.12% : 0.000008s : 50: predicate.check_bprop_eliminate 0.47% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000003s : 21: predicate.depend_value_elim 1.17% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.22% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.21% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_depend_swap 1.72% : 0.000012s : 77: predicate.environ_get_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.82% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.43% : 0.000016s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.59% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000003s : 21: predicate.get_grad_eliminate 0.08% : 0.000001s : 4: predicate.graph_param_transform 0.54% : 0.000004s : 21: predicate.incorporate_call 0.47% : 0.000003s : 21: predicate.incorporate_call_switch 5.92% : 0.000040s : 180: predicate.inline 1.43% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.61% : 0.000004s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.64% : 0.000018s : 121: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.52% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.37% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.11% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 52: predicate.minmaximum_grad 0.29% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.13% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000014s : 78: predicate.partial_defer_inline 1.73% : 0.000012s : 65: predicate.partial_eliminate 1.13% : 0.000008s : 52: predicate.print_const_string_wrapper 0.48% : 0.000003s : 21: predicate.reduce_all_const_elim 1.43% : 0.000010s : 52: predicate.reduce_eliminate 2.63% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 21: predicate.remove_not_recompute_node 1.92% : 0.000013s : 111: predicate.replace_applicator 0.68% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.14% : 0.000008s : 52: predicate.reshape_eliminate 1.14% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.29% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.63% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000001s : 8: predicate.special_op_eliminate 0.63% : 0.000004s : 21: predicate.specialize_transform 1.22% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.93% : 0.000013s : 78: predicate.switch_defer_inline 3.04% : 0.000020s : 128: predicate.switch_layer_defer_inline 5.22% : 0.000035s : 213: 
predicate.switch_simplify 1.12% : 0.000008s : 52: predicate.tile_eliminate 1.10% : 0.000007s : 52: predicate.transpose_eliminate 1.46% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000010s : 60: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.71% : 0.000018s : 90: predicate.tuple_list_get_item_eliminator 1.45% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000013s : 81: predicate.tuple_list_set_item_eliminator 1.57% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.59% : 0.000017s : 121: predicate.updatestate_pure_node_eliminater 3.18% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000003s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001647 35 59.30% : 0.000977s : 14: func_graph_cloner_run.FuncGraphClonerGraph 40.70% : 0.000670s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.105969 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.87% : 0.003040s : 1: add_attr 2.86% : 0.003031s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.13% : 0.000135s : 1: auto_monad 0.02% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.44% : 0.000467s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.05% : 0.000051s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000478s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 4.06% : 0.004297s : 117: opt.transform.opt_a 0.03% : 0.000032s : 1: opt.transform.opt_after_cconv 0.02% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000112s : 28: opt.transform.opt_b 0.05% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000040s : 4: opt.transform.symbol_engine_opt 13.07% : 0.013853s : 1: opt_a 0.11% : 0.000115s : 1: opt_after_cconv 0.45% : 0.000476s : 1: opt_after_jit_grad 0.20% : 0.000217s : 1: opt_b 15.05% : 0.015946s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000051s : 1: pre_auto_parallel 0.04% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 4.72% : 0.004998s : 2: renormalize.infer 1.42% : 0.001509s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000044s : 1: rewriter_after_opt_a 0.14% : 0.000148s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000089s : 1: symbol_engine_optimizer 41.66% : 0.044149s : 1: task_emit 0.08% : 0.000080s : 1: 
tuple_transform 10.82% : 0.011466s : 1: type_inference 0.06% : 0.000063s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x0-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x1-pynative],max_mem:10.0M TotalTime = 0.0227254, [24] [bootstrap]: 0.00057173 [type_inference]: 0.00653582 [event_method]: 1.433e-05 [auto_monad]: 5.924e-05 [graph_reusing]: 5.62999e-06 [inline]: 2.21998e-06 [add_attr]: 0.00368652, [1] [add_attr_with_inline]: 0.00367466, [1] [Cycle 1]: 5.055e-05, [2] [tag_attr]: 1.577e-05 [meta_addattr_fg_expand]: 4.47998e-06 [parallel-infer-symbol]: 3.01999e-06 [pre_auto_parallel]: 2.722e-05 [insert-virtual-dataset]: 2.56998e-06 [parallel-infer-symbol-second]: 9.60019e-07 [dataset_repeat_opt]: 2.54001e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00428904, [53] [py_interpret_to_execute]: 2.344e-05 [rewriter_before_opt_a]: 6.366e-05 [opt_a]: 0.00225907, [2] [Cycle 1]: 0.00163788, [45] [expand_dump_flag]: 2.85998e-06 [switch_simplify]: 3.314e-05 [loop_unroll]: 2.036e-05 [a_1]: 0.00044992 [with_stream_mark]: 1.499e-05 [recompute_prepare]: 7.93999e-06 [updatestate_depend_eliminate]: 3.79002e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.33e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.93e-05 [accelerated_algorithm]: 6.79999e-06 [shard]: 2.53e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 8.43999e-06 [auto_parallel]: 6.38003e-06 [parallel]: 2.654e-05 [flash_sp]: 7.98999e-06 [merge_comm]: 3.85e-06 [allreduce_fusion]: 3.58e-06 [matmul_add_comm_reduction]: 9.20999e-06 [allreduce_slice_to_reducescatter]: 7.40023e-07 [virtual_shard_identity]: 7.8e-06 [virtual_dataset]: 6.00002e-06 [get_grad_eliminate_]: 5.91e-06 
[virtual_output]: 5.78002e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.178e-05 [merge_recompute_call_nodes]: 1.95001e-06 [before_grad]: 9.91998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66001e-06 [meta_fg_expand]: 2.55002e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.35002e-06 [after_resolve]: 9.62999e-06 [a_after_grad]: 8.75001e-06 [renormalize]: 0.00049937 [add_forward_monad_depend]: 9.65002e-06 [auto_monad_grad]: 2.17001e-06 [auto_monad_eliminator]: 1.371e-05 [cse]: 2.998e-05 [a_3]: 4.323e-05 [Cycle 2]: 0.00061109, [45] [expand_dump_flag]: 1.20999e-06 [switch_simplify]: 6.68e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00011491 [with_stream_mark]: 1.011e-05 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.21998e-06 [updatestate_loads_eliminate]: 2.80002e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 7.083e-05 [accelerated_algorithm]: 6.02999e-06 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 5.24e-06 [auto_parallel]: 7.04001e-06 [parallel]: 4.62e-06 [flash_sp]: 3.93001e-06 [merge_comm]: 3.45998e-06 [allreduce_fusion]: 2.97002e-06 [matmul_add_comm_reduction]: 5.54998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.51e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 5.09e-06 [merge_forward]: 3.3e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 6.28e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.024e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.57e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 1.87999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 9.01002e-06 [a_after_grad]: 8.25999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.30001e-06 
[auto_monad_grad]: 1.06002e-06 [auto_monad_eliminator]: 6.94999e-06 [cse]: 1.405e-05 [a_3]: 3.245e-05 [py_interpret_to_execute_after_opt_a]: 8.77999e-06 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 3.324e-05 [convert_after_rewriter]: 6.48e-06 [order_py_execute_after_rewriter]: 5.27999e-06 [mutable_eliminate]: 0.00057016 [opt_b]: 0.00018976, [1] [Cycle 1]: 0.00018293, [7] [b_1]: 0.00010794 [b_2]: 7.89002e-06 [updatestate_depend_eliminate]: 5.69999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.80009e-07 [cse]: 1.97e-05 [optimize_parallel_all_gather_comm]: 1.567e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.552e-05 [loop_unroll]: 0.00042903 [opt_after_cconv]: 9.861e-05, [1] [Cycle 1]: 9.248e-05, [7] [c_1]: 2.647e-05 [parameter_eliminate]: 2.92002e-06 [updatestate_depend_eliminate]: 5.24003e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.806e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.559e-05 [tuple_transform]: 6.923e-05, [1] [Cycle 1]: 6.471e-05, [4] [d_1]: 3.802e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.88002e-06 [add_recomputation]: 5.176e-05 [cse_after_recomputation]: 2.196e-05, [1] [Cycle 1]: 1.738e-05, [1] [cse]: 1.162e-05 [environ_conv]: 7.95e-06 [swap_dp_allreduce_reducescatter]: 5.15001e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 4.33999e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.15002e-06 [assign_add_opt]: 1.72999e-06 [ForceFp32Comm]: 8.60018e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.01997e-06 
[interleave_split_concat_branches]: 1.49e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.26002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.90001e-06 [control_data_broadcast_order]: 1.224e-05 [grouped_pairwise_exchange_alltoall]: 1.40001e-06 [offloading_packed_experts]: 3.79002e-06 [overlap_recompute_and_grad_model_parallel]: 4.51002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.815e-05 [begin_end_overlap_inline]: 8.29983e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.35999e-06 [symbol_engine_optimizer]: 7.173e-05, [1] [Cycle 1]: 6.693e-05, [6] [build]: 2.68998e-06 [elim_shapecalc]: 9.07001e-06 [elim_not_effective]: 1.192e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 9.53002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.589e-05 [get_jit_bprop_graph]: 1.09003e-06 [rewriter_after_jit_bprop_graph]: 0.00014323 [opt_after_jit_grad]: 0.00049092 [validate]: 3.728e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00660746 [execute]: 7.38e-06 Sums bootstrap : 0.000572s : 3.17% type_inference : 0.006536s : 36.28% event_method : 0.000014s : 0.08% auto_monad : 0.000059s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000003s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.13% optimize.rewriter_before_opt_a : 0.000064s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 
0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.14% optimize.opt_a.a_1 : 0.000565s : 3.14% optimize.opt_a.with_stream_mark : 0.000025s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000150s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000031s : 0.17% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 
0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000499s : 2.77% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.11% optimize.opt_a.cse : 0.000044s : 0.24% optimize.opt_a.a_3 : 0.000076s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.18% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000570s : 3.17% optimize.opt_b.b_1 : 0.000108s : 0.60% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.14% optimize.loop_unroll : 0.000429s : 2.38% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000008s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000143s : 0.80% opt_after_jit_grad : 0.000491s : 2.73% validate : 0.000037s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006607s : 36.68% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000174 26 18.93% : 0.000033s : 5: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000002s : 2: substitution.fold_const_symbol 3.19% : 0.000006s : 3: substitution.graph_param_transform 63.16% : 0.000110s : 3: substitution.inline 1.84% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 2.02% : 0.000004s : 2: substitution.replace_old_param 5.75% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006483 2 90.12% : 0.005843s : 1: type_inference.infer 9.88% : 0.000640s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.25% : 0.000029s : 3: replace.inline 20.75% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 4 92.50% : 0.000108s : 3: match.inline 7.50% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 0.85% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.97% : 0.000002s : 9: predicate.addn_zero_filter 0.85% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.09% : 0.000003s : 15: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.60% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.79% : 0.000003s : 18: predicate.environ_get_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 13: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.86% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.75% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.44% : 0.000010s : 40: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 6: predicate.less_batch_normalization 1.69% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 25: predicate.load_eliminater 1.05% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.63% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.11% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.34% : 0.000001s : 3: predicate.parallel_virtual_node 1.60% : 0.000003s : 13: predicate.partial_defer_inline 1.42% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 9: predicate.reduce_eliminate 2.37% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.34% : 0.000002s : 16: predicate.replace_applicator 0.57% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 3: predicate.row_tensor_eliminate 0.82% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.79% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 1.02% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.86% : 0.000008s : 43: predicate.switch_simplify 0.89% : 
0.000001s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.66% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.70% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000396 8 45.73% : 0.000181s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.27% : 0.000215s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032283 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.43% : 0.003691s : 1: add_attr 11.39% : 0.003679s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.20% : 0.000065s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.90% : 0.000612s : 1: bootstrap 0.09% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000011s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000438s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.80% : 0.000580s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.90% : 0.000935s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000088s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.01% : 0.002262s : 1: opt_a 0.32% : 0.000102s : 1: opt_after_cconv 1.55% : 0.000501s : 1: opt_after_jit_grad 0.60% : 0.000193s : 1: opt_b 13.30% : 0.004293s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.80% : 0.000259s : 1: renormalize.infer 0.72% : 0.000233s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.46% : 0.000149s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000037s : 1: rewriter_after_opt_a 0.21% : 0.000068s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000075s : 1: symbol_engine_optimizer 20.50% : 0.006619s : 1: task_emit 0.22% : 0.000072s : 1: 
tuple_transform 20.29% : 0.006551s : 1: type_inference 0.21% : 0.000067s : 1: validate TotalTime = 0.0197246, [24] [bootstrap]: 0.00038016 [type_inference]: 0.00555257 [event_method]: 1.259e-05 [auto_monad]: 5.812e-05 [graph_reusing]: 5.23002e-06 [inline]: 1.76e-06 [add_attr]: 0.00302167, [1] [add_attr_with_inline]: 0.00301419, [1] [Cycle 1]: 4.93e-05, [2] [tag_attr]: 1.387e-05 [meta_addattr_fg_expand]: 4.15999e-06 [parallel-infer-symbol]: 3.63e-06 [pre_auto_parallel]: 2.448e-05 [insert-virtual-dataset]: 2.98e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00384544, [53] [py_interpret_to_execute]: 2.007e-05 [rewriter_before_opt_a]: 5.051e-05 [opt_a]: 0.00198882, [2] [Cycle 1]: 0.00138524, [45] [expand_dump_flag]: 2.84001e-06 [switch_simplify]: 2.813e-05 [loop_unroll]: 1.714e-05 [a_1]: 0.00034601 [with_stream_mark]: 1.471e-05 [recompute_prepare]: 7.57998e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.65e-06 [updatestate_loads_eliminate]: 3.16999e-06 [parameter_eliminate]: 1.90001e-06 [a_2]: 7.974e-05 [accelerated_algorithm]: 6.98e-06 [shard]: 1.91e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 9.14e-06 [auto_parallel]: 6.09999e-06 [parallel]: 1.856e-05 [flash_sp]: 8.07e-06 [merge_comm]: 4.15999e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 9.30001e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.65e-06 [virtual_dataset]: 5.93002e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.76e-06 [merge_forward]: 4.03001e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 9.63002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.149e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 1.035e-05 [set_forward_comm_id_for_comm_node_pass]: 3.68999e-06 [meta_fg_expand]: 2.61e-06 [flash_sp_send_recv_attached]: 2.57001e-06 [receive_attached]: 2.46998e-06 [after_resolve]: 
9.57001e-06 [a_after_grad]: 8.55999e-06 [renormalize]: 0.000385 [add_forward_monad_depend]: 5.02999e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.331e-05 [cse]: 2.914e-05 [a_3]: 4.089e-05 [Cycle 2]: 0.0005942, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.64999e-06 [loop_unroll]: 5.43002e-06 [a_1]: 0.00011248 [with_stream_mark]: 1.199e-05 [recompute_prepare]: 5.84e-06 [updatestate_depend_eliminate]: 2.94001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 1.01002e-06 [a_2]: 7.031e-05 [accelerated_algorithm]: 5.78002e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.41002e-06 [auto_parallel]: 5.27001e-06 [parallel]: 4.05e-06 [flash_sp]: 3.13e-06 [merge_comm]: 3.29001e-06 [allreduce_fusion]: 2.96001e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.39001e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.61999e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 6.05002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.005e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 8.88002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.28999e-06 [a_after_grad]: 7.93999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.06e-06 [cse]: 1.355e-05 [a_3]: 3.186e-05 [py_interpret_to_execute_after_opt_a]: 7.18e-06 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 3.233e-05 [convert_after_rewriter]: 6.61e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00046527 [opt_b]: 0.00018332, [1] [Cycle 1]: 0.00017703, [7] [b_1]: 0.00010702 
[b_2]: 6.84001e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.46998e-06 [renormalize]: 3.89991e-07 [cse]: 1.753e-05 [optimize_parallel_all_gather_comm]: 1.611e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.277e-05 [loop_unroll]: 0.00041553 [opt_after_cconv]: 9.429e-05, [1] [Cycle 1]: 8.882e-05, [7] [c_1]: 2.567e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 4.85001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.56e-06 [cse]: 1.712e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.489e-05 [tuple_transform]: 6.739e-05, [1] [Cycle 1]: 6.268e-05, [4] [d_1]: 3.682e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.38998e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 4.455e-05 [cse_after_recomputation]: 2.138e-05, [1] [Cycle 1]: 1.699e-05, [1] [cse]: 1.167e-05 [environ_conv]: 5.14998e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.67001e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 2.78003e-06 [comm_op_add_attrs]: 1.09998e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.55001e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.19998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.241e-05 [grouped_pairwise_exchange_alltoall]: 1.69998e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.25001e-06 [overlap_recompute_comm]: 2.49999e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.722e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.00002e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 7.122e-05, [1] [Cycle 1]: 6.682e-05, [6] [build]: 2.53e-06 [elim_shapecalc]: 8.73001e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 6.38e-06 [fold_const_symbol]: 9.42999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.89999e-06 [auto_monad_reorder]: 1.56e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.33998e-06 [opt_after_jit_grad]: 0.00044875 [validate]: 3.312e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00610766 [execute]: 7.65e-06 Sums bootstrap : 0.000380s : 2.42% type_inference : 0.005553s : 35.31% event_method : 0.000013s : 0.08% auto_monad : 0.000058s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000051s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000035s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000458s : 2.92% optimize.opt_a.with_stream_mark : 0.000027s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000150s : 0.95% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000014s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000385s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000465s : 2.96% optimize.opt_b.b_1 : 0.000107s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000416s : 2.64% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000002s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.85% validate : 0.000033s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006108s : 38.84% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000136 24 21.99% : 0.000030s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 1.15% : 0.000002s : 2: substitution.fold_const_symbol 4.22% : 0.000006s : 3: substitution.graph_param_transform 63.08% : 0.000086s : 3: substitution.inline 2.48% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.17% : 0.000004s : 4: substitution.remove_not_recompute_node 2.53% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005511 2 91.90% : 0.005065s : 1: type_inference.infer 8.10% : 0.000446s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000084 3 100.00% : 0.000084s : 3: match.inline ------[predicate.] 
0.000145 815 0.86% : 0.000001s : 8: predicate.accumulaten_eliminater 0.82% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 14: predicate.arithmetic_simplify 0.88% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.66% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_depend_swap 1.79% : 0.000003s : 17: predicate.environ_get_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.39% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.29% : 0.000000s : 3: predicate.graph_param_transform 0.72% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.40% : 0.000009s : 37: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.20% : 0.000002s : 6: predicate.less_batch_normalization 1.53% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.06% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.63% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.19% : 0.000002s : 3: predicate.mutable_eliminate 0.43% : 0.000001s : 3: predicate.opt_reshape 0.46% : 0.000001s : 3: predicate.parallel_virtual_node 1.50% : 0.000002s : 11: predicate.partial_defer_inline 1.31% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.91% : 0.000001s : 6: predicate.reduce_all_const_elim 1.23% : 0.000002s : 8: predicate.reduce_eliminate 2.22% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 14: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 8: predicate.reshape_eliminate 0.85% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.91% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.96% : 0.000007s : 38: predicate.switch_simplify 0.83% : 
0.000001s : 8: predicate.tile_eliminate 0.97% : 0.000001s : 8: predicate.transpose_eliminate 1.52% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 3: predicate.value_based_eliminate 0.76% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000254 7 34.88% : 0.000088s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.12% : 0.000165s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027945 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.83% : 0.003026s : 1: add_attr 10.80% : 0.003018s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.23% : 0.000063s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.47% : 0.000410s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.52% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.70% : 0.000474s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000817s : 78: opt.transform.opt_a 0.09% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000087s : 28: opt.transform.opt_b 0.15% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.13% : 0.001992s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.64% : 0.000458s : 1: opt_after_jit_grad 0.67% : 0.000187s : 1: opt_b 13.78% : 0.003849s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000019s : 1: remove_dup_value 0.72% : 0.000201s : 1: renormalize.infer 0.64% : 0.000178s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.20% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000074s : 1: symbol_engine_optimizer 21.89% : 0.006118s : 1: task_emit 0.25% : 0.000070s : 1: 
tuple_transform 19.92% : 0.005566s : 1: type_inference 0.21% : 0.000059s : 1: validate TotalTime = 0.019662, [24] [bootstrap]: 0.00047385 [type_inference]: 0.00556052 [event_method]: 1.379e-05 [auto_monad]: 5.914e-05 [graph_reusing]: 5.52001e-06 [inline]: 2.11e-06 [add_attr]: 0.00306121, [1] [add_attr_with_inline]: 0.00305307, [1] [Cycle 1]: 4.884e-05, [2] [tag_attr]: 1.554e-05 [meta_addattr_fg_expand]: 3.92002e-06 [parallel-infer-symbol]: 3.46001e-06 [pre_auto_parallel]: 2.595e-05 [insert-virtual-dataset]: 2.50002e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00414736, [53] [py_interpret_to_execute]: 2.273e-05 [rewriter_before_opt_a]: 6.264e-05 [opt_a]: 0.00218786, [2] [Cycle 1]: 0.00157104, [45] [expand_dump_flag]: 3.21001e-06 [switch_simplify]: 3.244e-05 [loop_unroll]: 2.064e-05 [a_1]: 0.00043316 [with_stream_mark]: 1.454e-05 [recompute_prepare]: 8.70999e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 3.11001e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.95e-05 [accelerated_algorithm]: 6.74001e-06 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 6.17001e-06 [merge_send_recv]: 7.45e-06 [auto_parallel]: 6.74001e-06 [parallel]: 1.743e-05 [flash_sp]: 7.83001e-06 [merge_comm]: 4.10998e-06 [allreduce_fusion]: 3.6e-06 [matmul_add_comm_reduction]: 8.96002e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.36001e-06 [virtual_dataset]: 6.39999e-06 [get_grad_eliminate_]: 5.96e-06 [virtual_output]: 6.34001e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 9.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.172e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 1.047e-05 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.71e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.61999e-06 
[after_resolve]: 9.51e-06 [a_after_grad]: 8.79e-06 [renormalize]: 0.00047646 [add_forward_monad_depend]: 4.73001e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.375e-05 [cse]: 2.258e-05 [a_3]: 4.184e-05 [Cycle 2]: 0.00060644, [45] [expand_dump_flag]: 1.20001e-06 [switch_simplify]: 7.31001e-06 [loop_unroll]: 5.87001e-06 [a_1]: 0.00011402 [with_stream_mark]: 1.11e-05 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 3.03998e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 7.068e-05 [accelerated_algorithm]: 5.86e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 4.86002e-06 [auto_parallel]: 6.04999e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 3.17002e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 5.37999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.38998e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.78e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 6.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.062e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.72e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61999e-06 [meta_fg_expand]: 1.92999e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.29984e-07 [after_resolve]: 8.95999e-06 [a_after_grad]: 7.65e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 1.11002e-06 [auto_monad_eliminator]: 6.81999e-06 [cse]: 1.452e-05 [a_3]: 3.216e-05 [py_interpret_to_execute_after_opt_a]: 8.32998e-06 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 3.211e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 5.56998e-06 [mutable_eliminate]: 0.00048736 [opt_b]: 0.00023602, [1] [Cycle 1]: 0.00022964, 
[7] [b_1]: 0.00015819 [b_2]: 7.07002e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.42001e-06 [renormalize]: 3.4002e-07 [cse]: 1.719e-05 [optimize_parallel_all_gather_comm]: 1.629e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.452e-05 [loop_unroll]: 0.00041932 [opt_after_cconv]: 9.552e-05, [1] [Cycle 1]: 8.89e-05, [7] [c_1]: 2.485e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 4.83001e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.39999e-06 [cse]: 1.737e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.429e-05 [tuple_transform]: 6.928e-05, [1] [Cycle 1]: 6.456e-05, [4] [d_1]: 3.743e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.69001e-06 [partial_unused_args_eliminate]: 1.89999e-06 [add_recomputation]: 4.455e-05 [cse_after_recomputation]: 2.135e-05, [1] [Cycle 1]: 1.65e-05, [1] [cse]: 1.111e-05 [environ_conv]: 4.94e-06 [swap_dp_allreduce_reducescatter]: 4.73001e-06 [bias_add_comm_swap]: 2.89999e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.16998e-06 [micro_interleaved_order_control]: 2.45002e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.49999e-06 [reorder_send_recv_between_fp_bp]: 3.04999e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.14998e-06 [interleave_split_concat_branches]: 1.30999e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.40999e-06 [offloading_packed_experts]: 3.99002e-06 [overlap_recompute_and_grad_model_parallel]: 5.27001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.41998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.48998e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.767e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 2.11e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 7.142e-05, [1] [Cycle 1]: 6.665e-05, [6] [build]: 2.59999e-06 [elim_shapecalc]: 8.21002e-06 [elim_not_effective]: 1.189e-05 [opt_reshape]: 6.27001e-06 [fold_const_symbol]: 9.51003e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.615e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00045513 [validate]: 3.342e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00559594 [execute]: 5.54e-06 Sums bootstrap : 0.000474s : 3.04% type_inference : 0.005561s : 35.62% event_method : 0.000014s : 0.09% auto_monad : 0.000059s : 0.38% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.15% optimize.rewriter_before_opt_a : 0.000063s : 0.40% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000040s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000547s : 3.50% optimize.opt_a.with_stream_mark : 0.000026s : 0.16% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000150s : 0.96% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000477s : 3.05% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000037s : 0.24% optimize.opt_a.a_3 : 0.000074s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000487s : 3.12% optimize.opt_b.b_1 : 0.000158s : 1.01% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.16% optimize.loop_unroll : 0.000419s : 2.69% optimize.opt_after_cconv.c_1 : 0.000025s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000455s : 2.92% validate : 0.000033s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005596s : 35.85% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000170 26 19.63% : 0.000033s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000006s : 3: substitution.graph_param_transform 62.97% : 0.000107s : 3: substitution.inline 2.10% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.93% : 0.000005s : 4: substitution.remove_not_recompute_node 2.17% : 0.000004s : 2: substitution.replace_old_param 5.02% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005518 2 89.34% : 0.004930s : 1: type_inference.infer 10.66% : 0.000588s : 1: type_inference.specialize ------[replace.] 0.000034 4 77.01% : 0.000026s : 3: replace.inline 22.99% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 4 93.02% : 0.000105s : 3: match.inline 6.98% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 883 0.96% : 0.000001s : 9: predicate.accumulaten_eliminater 0.86% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 15: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.90% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_depend_swap 1.77% : 0.000003s : 18: predicate.environ_get_eliminate 1.16% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.23% : 0.000003s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.33% : 0.000010s : 40: predicate.inline 0.89% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 6: predicate.less_batch_normalization 1.68% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 25: predicate.load_eliminater 0.96% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 3: predicate.mutable_eliminate 0.42% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.61% : 0.000003s : 13: predicate.partial_defer_inline 1.43% : 0.000002s : 13: predicate.partial_eliminate 1.15% : 0.000002s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.42% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.30% : 0.000002s : 16: predicate.replace_applicator 0.69% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000001s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 6: predicate.shard_identity_eliminate 0.91% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.93% : 0.000008s : 43: predicate.switch_simplify 0.91% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.70% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 44.66% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.34% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028415 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.79% : 0.003066s : 1: add_attr 10.76% : 0.003057s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000064s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.75% : 0.000497s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000496s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.23% : 0.000918s : 78: opt.transform.opt_a 0.08% : 0.000023s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.71% : 0.002191s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.64% : 0.000465s : 1: opt_after_jit_grad 0.84% : 0.000239s : 1: opt_b 14.61% : 0.004152s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.09% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.86% : 0.000244s : 1: renormalize.infer 0.79% : 0.000224s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.24% : 0.000067s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000074s : 1: symbol_engine_optimizer 19.73% : 0.005606s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.62% : 0.005575s : 1: type_inference 0.22% : 0.000062s : 1: validate TotalTime = 0.0398014, [24] [bootstrap]: 0.00051743 [type_inference]: 0.0119586 [event_method]: 4.637e-05 [auto_monad]: 0.00013609 [graph_reusing]: 9.04998e-06 [inline]: 1.87999e-06 [add_attr]: 0.00315411, [1] [add_attr_with_inline]: 0.00314576, [1] [Cycle 1]: 7.301e-05, [2] [tag_attr]: 3.401e-05 [meta_addattr_fg_expand]: 1.016e-05 [parallel-infer-symbol]: 3.18998e-06 [pre_auto_parallel]: 5.116e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.63002e-06 [optimize]: 0.0167033, [53] [py_interpret_to_execute]: 4.174e-05 [rewriter_before_opt_a]: 0.00015678 [opt_a]: 0.0145291, [3] [Cycle 1]: 0.0111148, [45] [expand_dump_flag]: 3.4e-06 [switch_simplify]: 7.702e-05 [loop_unroll]: 6.297e-05 [a_1]: 0.00142526 [with_stream_mark]: 2.447e-05 [recompute_prepare]: 2.285e-05 [updatestate_depend_eliminate]: 9.22999e-06 [updatestate_assign_eliminate]: 7.98999e-06 [updatestate_loads_eliminate]: 6.91001e-06 [parameter_eliminate]: 3.10002e-06 [a_2]: 0.00024226 [accelerated_algorithm]: 3.36e-05 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 3.56001e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.684e-05 [auto_parallel]: 1.115e-05 [parallel]: 1.947e-05 [flash_sp]: 1.184e-05 [merge_comm]: 9.77001e-06 [allreduce_fusion]: 9.04e-06 [matmul_add_comm_reduction]: 2.737e-05 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 1.791e-05 [virtual_dataset]: 1.559e-05 [get_grad_eliminate_]: 1.47e-05 [virtual_output]: 1.495e-05 [merge_forward]: 9.52001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.939e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.968e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 2.863e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89001e-06 [meta_fg_expand]: 0.00154244 [flash_sp_send_recv_attached]: 3.51999e-06 [receive_attached]: 2.27999e-06 
[after_resolve]: 6.621e-05 [a_after_grad]: 8.799e-05 [renormalize]: 0.00632221 [add_forward_monad_depend]: 9.56998e-06 [auto_monad_grad]: 6.43003e-06 [auto_monad_eliminator]: 5.083e-05 [cse]: 0.00018037 [a_3]: 0.0003321 [Cycle 2]: 0.00272481, [45] [expand_dump_flag]: 2.22001e-06 [switch_simplify]: 5.87e-05 [loop_unroll]: 4.244e-05 [a_1]: 0.0013207 [with_stream_mark]: 1.255e-05 [recompute_prepare]: 9.46e-06 [updatestate_depend_eliminate]: 4.48001e-06 [updatestate_assign_eliminate]: 3.24001e-06 [updatestate_loads_eliminate]: 2.98998e-06 [parameter_eliminate]: 1.34e-06 [a_2]: 8.76e-05 [accelerated_algorithm]: 1.064e-05 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 7.12002e-06 [merge_send_recv]: 6.97002e-06 [auto_parallel]: 7.26999e-06 [parallel]: 6.37001e-06 [flash_sp]: 4e-06 [merge_comm]: 4.1e-06 [allreduce_fusion]: 3.9e-06 [matmul_add_comm_reduction]: 6.87002e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 8.26002e-06 [virtual_dataset]: 6.55997e-06 [get_grad_eliminate_]: 6.29001e-06 [virtual_output]: 6.13998e-06 [merge_forward]: 3.77002e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 8.17e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.426e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 1.155e-05 [set_forward_comm_id_for_comm_node_pass]: 4.53001e-06 [meta_fg_expand]: 8.057e-05 [flash_sp_send_recv_attached]: 1.31002e-06 [receive_attached]: 1.29e-06 [after_resolve]: 1.209e-05 [a_after_grad]: 1.007e-05 [renormalize]: 0.00060166 [add_forward_monad_depend]: 4.34997e-06 [auto_monad_grad]: 2.01e-06 [auto_monad_eliminator]: 1.214e-05 [cse]: 2.294e-05 [a_3]: 4.751e-05 [Cycle 3]: 0.00067414, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 7.98999e-06 [loop_unroll]: 6.58e-06 [a_1]: 0.00014615 [with_stream_mark]: 8.40001e-06 [recompute_prepare]: 6.93998e-06 [updatestate_depend_eliminate]: 3.75998e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.69001e-06 
[parameter_eliminate]: 9.10019e-07 [a_2]: 8.476e-05 [accelerated_algorithm]: 9.55001e-06 [shard]: 8.39995e-07 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 6.83e-06 [merge_send_recv]: 5.40001e-06 [auto_parallel]: 5.80002e-06 [parallel]: 4.61002e-06 [flash_sp]: 9.20001e-07 [merge_comm]: 3.74002e-06 [allreduce_fusion]: 3.56001e-06 [matmul_add_comm_reduction]: 5.61e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 7.34002e-06 [virtual_dataset]: 6.64999e-06 [get_grad_eliminate_]: 6.19999e-06 [virtual_output]: 5.95002e-06 [merge_forward]: 3.52997e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 6.86999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.228e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.046e-05 [set_forward_comm_id_for_comm_node_pass]: 4.26001e-06 [meta_fg_expand]: 2.26998e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 8.87e-06 [a_after_grad]: 9.47999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 8.02e-06 [cse]: 1.664e-05 [a_3]: 3.877e-05 [py_interpret_to_execute_after_opt_a]: 1.012e-05 [slice_cell_reuse_recomputed_activation]: 2.62001e-06 [rewriter_after_opt_a]: 4.183e-05 [convert_after_rewriter]: 7.58999e-06 [order_py_execute_after_rewriter]: 5.74e-06 [mutable_eliminate]: 0.00049332 [opt_b]: 0.00025354, [1] [Cycle 1]: 0.00024673, [7] [b_1]: 0.00016297 [b_2]: 1.348e-05 [updatestate_depend_eliminate]: 5.67001e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.93003e-06 [renormalize]: 4.80009e-07 [cse]: 2.105e-05 [optimize_parallel_all_gather_comm]: 1.795e-05 [overlap_param_gather]: 2.20002e-06 [cconv]: 2.027e-05 [loop_unroll]: 0.00043043 [opt_after_cconv]: 0.00010871, [1] [Cycle 1]: 0.00010273, [7] [c_1]: 3.292e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.69e-06 [updatestate_assign_eliminate]: 3.08998e-06 
[updatestate_loads_eliminate]: 2.68e-06 [cse]: 2.05e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.628e-05 [tuple_transform]: 7.687e-05, [1] [Cycle 1]: 7.224e-05, [4] [d_1]: 4.472e-05 [none_parameter_eliminate]: 1.86998e-06 [renormalize]: 3.09985e-07 [switch_simplify]: 7.13e-06 [partial_unused_args_eliminate]: 1.81003e-06 [add_recomputation]: 5.041e-05 [cse_after_recomputation]: 2.445e-05, [1] [Cycle 1]: 1.953e-05, [1] [cse]: 1.419e-05 [environ_conv]: 7.70998e-06 [swap_dp_allreduce_reducescatter]: 5.81998e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.30002e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.49001e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.34e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.05002e-06 [control_data_broadcast_order]: 1.397e-05 [grouped_pairwise_exchange_alltoall]: 1.95001e-06 [offloading_packed_experts]: 4.52998e-06 [overlap_recompute_and_grad_model_parallel]: 4.87e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.24998e-06 [overlap_recompute_comm]: 2.79999e-06 [overlap_grad_ring_attention]: 4.58001e-06 [overlap_grad_flash_sp]: 2.044e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 2.11998e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 8.521e-05, [1] [Cycle 1]: 8.088e-05, [6] [build]: 8.77e-06 [elim_shapecalc]: 1.016e-05 [elim_not_effective]: 1.46e-05 [opt_reshape]: 7.33999e-06 [fold_const_symbol]: 1.123e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 1.76003e-06 [pipeline_parallel_scheduler]: 1.76998e-06 [auto_monad_reorder]: 2.048e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.45998e-06 [opt_after_jit_grad]: 0.00046678 [validate]: 4.042e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.00645292 [execute]: 7.1e-06 Sums bootstrap : 0.000517s : 1.46% type_inference : 0.011959s : 33.84% event_method : 0.000046s : 0.13% auto_monad : 0.000136s : 0.39% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000042s : 0.12% optimize.rewriter_before_opt_a : 0.000157s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000144s : 0.41% optimize.opt_a.loop_unroll : 0.000112s : 0.32% optimize.opt_a.a_1 : 0.002892s : 8.18% optimize.opt_a.with_stream_mark : 0.000045s : 0.13% optimize.opt_a.recompute_prepare : 0.000039s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000415s : 1.17% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.15% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.09% optimize.opt_a.merge_send_recv : 0.000029s : 0.08% optimize.opt_a.auto_parallel : 0.000024s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 
0.000018s : 0.05% optimize.opt_a.allreduce_fusion : 0.000017s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.09% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000027s : 0.08% optimize.opt_a.virtual_output : 0.000027s : 0.08% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000051s : 0.14% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.05% optimize.opt_a.meta_fg_expand : 0.001625s : 4.60% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.25% optimize.opt_a.a_after_grad : 0.000108s : 0.30% optimize.opt_a.renormalize : 0.006924s : 19.59% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000071s : 0.20% optimize.opt_a.cse : 0.000220s : 0.62% optimize.opt_a.a_3 : 0.000418s : 1.18% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000042s : 0.12% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000493s : 1.40% optimize.opt_b.b_1 : 0.000163s : 0.46% optimize.opt_b.b_2 : 0.000013s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000430s : 1.22% optimize.opt_after_cconv.c_1 : 0.000033s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.05% optimize.tuple_transform.d_1 : 0.000045s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.14% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000020s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000467s : 1.32% validate : 0.000040s : 0.11% backend_pass : 0.000001s : 0.00% task_emit : 0.006453s : 18.26% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000707 161 7.29% : 0.000052s : 8: substitution.arithmetic_simplify 0.33% : 0.000002s : 3: substitution.elim_not_effective 0.62% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.85% : 0.000006s : 4: substitution.graph_param_transform 0.43% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 57.58% : 0.000407s : 17: substitution.inline 2.33% : 0.000016s : 2: substitution.inline_without_move 1.48% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.26% : 0.000016s : 3: substitution.less_batch_normalization 1.45% : 0.000010s : 7: substitution.minmaximum_grad 0.79% : 0.000006s : 5: substitution.partial_eliminate 1.75% : 0.000012s : 15: substitution.remove_not_recompute_node 3.82% : 0.000027s : 10: substitution.replace_applicator 1.47% : 0.000010s : 10: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.00% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.46% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 2.04% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.53% : 0.000053s : 19: substitution.tuple_list_get_item_eliminator 2.05% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011878 2 86.45% : 0.010269s : 1: type_inference.infer 13.55% : 0.001609s : 1: type_inference.specialize ------[replace.] 0.000197 27 64.46% : 0.000127s : 17: replace.inline 35.54% : 0.000070s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 27 93.59% : 0.000398s : 17: match.inline 6.41% : 0.000027s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000687 4248 1.13% : 0.000008s : 53: predicate.accumulaten_eliminater 0.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.12% : 0.000008s : 53: predicate.addn_zero_filter 1.09% : 0.000007s : 53: predicate.adjust_all_reduce_mul_add 1.94% : 0.000013s : 74: predicate.arithmetic_simplify 1.14% : 0.000008s : 53: predicate.cast_eliminate 1.11% : 0.000008s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.15% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.21% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_depend_swap 1.70% : 0.000012s : 78: predicate.environ_get_eliminate 1.19% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.83% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.51% : 0.000017s : 80: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.53% : 0.000004s : 21: predicate.incorporate_call 0.46% : 0.000003s : 21: predicate.incorporate_call_switch 5.94% : 0.000041s : 183: predicate.inline 1.46% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.61% : 0.000004s : 21: predicate.less_batch_normalization 1.55% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.64% : 0.000018s : 124: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.60% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.37% : 0.000009s : 61: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.10% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.09% : 0.000007s : 50: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 53: predicate.minmaximum_grad 0.30% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.15% : 0.000001s : 4: predicate.parallel_virtual_node 2.13% : 0.000015s : 80: predicate.partial_defer_inline 1.76% : 0.000012s : 67: predicate.partial_eliminate 1.13% : 0.000008s : 53: predicate.print_const_string_wrapper 0.47% : 0.000003s : 21: predicate.reduce_all_const_elim 1.40% : 0.000010s : 53: predicate.reduce_eliminate 2.67% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.92% : 0.000013s : 113: predicate.replace_applicator 0.70% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.15% : 0.000008s : 53: predicate.reshape_eliminate 1.11% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.26% : 0.000009s : 50: predicate.same_eliminate 0.35% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.57% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.63% : 0.000004s : 21: predicate.specialize_transform 1.23% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.95% : 0.000013s : 80: predicate.switch_defer_inline 3.01% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.23% : 0.000036s : 218: 
predicate.switch_simplify 1.13% : 0.000008s : 53: predicate.tile_eliminate 1.11% : 0.000008s : 53: predicate.transpose_eliminate 1.43% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.49% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.57% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.64% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.15% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.51% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001743 36 61.49% : 0.001072s : 15: func_graph_cloner_run.FuncGraphClonerGraph 38.51% : 0.000671s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.071198 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.44% : 0.003159s : 1: add_attr 4.42% : 0.003150s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000144s : 1: auto_monad 0.03% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.78% : 0.000557s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.62% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.71% : 0.000503s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 6.16% : 0.004388s : 117: opt.transform.opt_a 0.04% : 0.000031s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.21% : 0.000148s : 28: opt.transform.opt_b 0.07% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000040s : 4: opt.transform.symbol_engine_opt 20.41% : 0.014532s : 1: opt_a 0.16% : 0.000112s : 1: opt_after_cconv 0.67% : 0.000476s : 1: opt_after_jit_grad 0.36% : 0.000257s : 1: opt_b 23.47% : 0.016707s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000056s : 1: pre_auto_parallel 0.06% : 0.000046s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 7.63% : 0.005434s : 2: renormalize.infer 2.07% : 0.001475s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000046s : 1: rewriter_after_opt_a 0.23% : 0.000161s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000088s : 1: symbol_engine_optimizer 9.08% : 0.006463s : 1: task_emit 0.11% : 0.000080s : 1: 
tuple_transform 16.82% : 0.011978s : 1: type_inference 0.10% : 0.000068s : 1: validate TotalTime = 0.0195581, [24] [bootstrap]: 0.00037297 [type_inference]: 0.00547305 [event_method]: 1.089e-05 [auto_monad]: 4.563e-05 [graph_reusing]: 3.78999e-06 [inline]: 1.76e-06 [add_attr]: 0.00290754, [1] [add_attr_with_inline]: 0.00289962, [1] [Cycle 1]: 3.976e-05, [2] [tag_attr]: 1.035e-05 [meta_addattr_fg_expand]: 2.84001e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 1.986e-05 [insert-virtual-dataset]: 1.25001e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.85001e-06 [optimize]: 0.00384898, [53] [py_interpret_to_execute]: 1.879e-05 [rewriter_before_opt_a]: 4.736e-05 [opt_a]: 0.00198755, [2] [Cycle 1]: 0.00137718, [45] [expand_dump_flag]: 2.06e-06 [switch_simplify]: 2.526e-05 [loop_unroll]: 1.692e-05 [a_1]: 0.0003236 [with_stream_mark]: 1.196e-05 [recompute_prepare]: 7.95e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 1.25999e-06 [a_2]: 7.858e-05 [accelerated_algorithm]: 6.49999e-06 [shard]: 1.35999e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 6.11e-06 [auto_parallel]: 7.05e-06 [parallel]: 1.106e-05 [flash_sp]: 5.63997e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 3.45998e-06 [matmul_add_comm_reduction]: 6.51999e-06 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 7.74002e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 5.66e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.01001e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 7.21001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.169e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 9.86e-06 [set_forward_comm_id_for_comm_node_pass]: 4.05e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.76e-06 [receive_attached]: 2.32999e-06 
[after_resolve]: 9.50001e-06 [a_after_grad]: 8.94e-06 [renormalize]: 0.00041282 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.372e-05 [cse]: 2.989e-05 [a_3]: 4.182e-05 [Cycle 2]: 0.00060039, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 7.2e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00011245 [with_stream_mark]: 9.09998e-06 [recompute_prepare]: 5.79999e-06 [updatestate_depend_eliminate]: 3.06001e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.68003e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 7.071e-05 [accelerated_algorithm]: 5.79e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.87001e-06 [parallel]: 4.52e-06 [flash_sp]: 3.75e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 2.92002e-06 [matmul_add_comm_reduction]: 5.25001e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.30999e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 5.08002e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 5.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.034e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.65001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.52002e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.20999e-06 [a_after_grad]: 7.98999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.86001e-06 [cse]: 1.425e-05 [a_3]: 3.268e-05 [py_interpret_to_execute_after_opt_a]: 7.51001e-06 [slice_cell_reuse_recomputed_activation]: 2.43998e-06 [rewriter_after_opt_a]: 3.377e-05 [convert_after_rewriter]: 6.64999e-06 [order_py_execute_after_rewriter]: 5.49998e-06 [mutable_eliminate]: 0.00045951 [opt_b]: 0.00018458, [1] [Cycle 1]: 0.00017801, [7] [b_1]: 
0.00010807 [b_2]: 6.72002e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.56e-06 [renormalize]: 6.89994e-07 [cse]: 1.757e-05 [optimize_parallel_all_gather_comm]: 1.623e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.081e-05 [loop_unroll]: 0.00042263 [opt_after_cconv]: 9.607e-05, [1] [Cycle 1]: 9.006e-05, [7] [c_1]: 2.566e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.677e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.483e-05 [tuple_transform]: 6.871e-05, [1] [Cycle 1]: 6.388e-05, [4] [d_1]: 3.691e-05 [none_parameter_eliminate]: 1.91e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.40002e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.501e-05 [cse_after_recomputation]: 2.147e-05, [1] [Cycle 1]: 1.658e-05, [1] [cse]: 1.118e-05 [environ_conv]: 5.00999e-06 [swap_dp_allreduce_reducescatter]: 5.29998e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.247e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 4.15e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.27e-06 [overlap_recompute_comm]: 2.49999e-06 [overlap_grad_ring_attention]: 4.20999e-06 [overlap_grad_flash_sp]: 1.783e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.18998e-06 [split_layernorm_comm]: 2.01003e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 7.198e-05, [1] [Cycle 1]: 6.756e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.80001e-06 [elim_not_effective]: 1.21e-05 [opt_reshape]: 6.30002e-06 [fold_const_symbol]: 9.49999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.513e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00045729 [validate]: 3.448e-05 [backend_pass]: 1.04998e-06 [task_emit]: 0.00616282 [execute]: 6.98998e-06 Sums bootstrap : 0.000373s : 2.38% type_inference : 0.005473s : 34.96% event_method : 0.000011s : 0.07% auto_monad : 0.000046s : 0.29% graph_reusing : 0.000004s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000010s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000020s : 0.13% insert-virtual-dataset : 0.000001s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000047s : 0.30% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000436s : 2.79% optimize.opt_a.with_stream_mark : 0.000021s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000149s : 0.95% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000002s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000010s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000016s : 0.10% optimize.opt_a.flash_sp : 0.000009s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000012s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000005s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000013s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000413s : 2.64% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000044s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000034s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000460s : 2.93% optimize.opt_b.b_1 : 0.000108s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.13% optimize.loop_unroll : 0.000423s : 2.70% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000457s : 2.92% validate : 0.000034s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006163s : 39.36% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000116 24 20.13% : 0.000023s : 4: substitution.arithmetic_simplify 1.69% : 0.000002s : 2: substitution.elim_not_effective 1.13% : 0.000001s : 2: substitution.fold_const_symbol 4.84% : 0.000006s : 3: substitution.graph_param_transform 63.70% : 0.000074s : 3: substitution.inline 2.50% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.63% : 0.000004s : 4: substitution.remove_not_recompute_node 2.40% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005438 2 91.87% : 0.004996s : 1: type_inference.infer 8.13% : 0.000442s : 1: type_inference.specialize ------[replace.] 0.000025 3 100.00% : 0.000025s : 3: replace.inline ------[match.] 0.000072 3 100.00% : 0.000072s : 3: match.inline ------[predicate.] 
0.000146 815 1.06% : 0.000002s : 8: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 14: predicate.arithmetic_simplify 0.86% : 0.000001s : 8: predicate.cast_eliminate 0.73% : 0.000001s : 6: predicate.check_bprop_eliminate 0.66% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.70% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_depend_swap 1.82% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 1.98% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.70% : 0.000001s : 6: predicate.incorporate_call 0.64% : 0.000001s : 6: predicate.incorporate_call_switch 6.14% : 0.000009s : 37: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 6: predicate.less_batch_normalization 1.63% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.34% : 0.000003s : 22: predicate.load_eliminater 1.13% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 18: predicate.loop_unroll_before_grad 2.00% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 8: predicate.minmaximum_grad 1.15% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.53% : 0.000002s : 11: predicate.partial_defer_inline 1.32% : 0.000002s : 11: predicate.partial_eliminate 0.97% : 0.000001s : 8: predicate.print_const_string_wrapper 0.68% : 0.000001s : 6: predicate.reduce_all_const_elim 1.31% : 0.000002s : 8: predicate.reduce_eliminate 2.27% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 6: predicate.remove_not_recompute_node 1.22% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 3: predicate.row_tensor_eliminate 0.98% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 6: predicate.shard_identity_eliminate 0.80% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 1.01% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 1.85% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.84% : 0.000007s : 38: predicate.switch_simplify 0.85% : 
0.000001s : 8: predicate.tile_eliminate 0.94% : 0.000001s : 8: predicate.transpose_eliminate 1.62% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.62% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.19% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.79% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000261 7 35.84% : 0.000093s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.16% : 0.000167s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027676 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.52% : 0.002912s : 1: add_attr 10.49% : 0.002903s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.18% : 0.000051s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.42% : 0.000392s : 1: bootstrap 0.09% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.56% : 0.000432s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000468s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.87% : 0.000795s : 78: opt.transform.opt_a 0.09% : 0.000024s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000088s : 28: opt.transform.opt_b 0.15% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.19% : 0.001991s : 1: opt_a 0.36% : 0.000099s : 1: opt_after_cconv 1.69% : 0.000467s : 1: opt_after_jit_grad 0.68% : 0.000188s : 1: opt_b 13.92% : 0.003853s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000024s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000018s : 1: remove_dup_value 0.82% : 0.000226s : 1: renormalize.infer 0.65% : 0.000180s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000038s : 1: rewriter_after_opt_a 0.19% : 0.000051s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000075s : 1: symbol_engine_optimizer 22.31% : 0.006173s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 19.82% : 0.005487s : 1: type_inference 0.23% : 0.000062s : 1: validate TotalTime = 0.0384622, [24] [bootstrap]: 0.00050323 [type_inference]: 0.0117257 [event_method]: 4.303e-05 [auto_monad]: 0.00013014 [graph_reusing]: 8.92e-06 [inline]: 1.94999e-06 [add_attr]: 0.00301312, [1] [add_attr_with_inline]: 0.00300502, [1] [Cycle 1]: 6.839e-05, [2] [tag_attr]: 3.058e-05 [meta_addattr_fg_expand]: 9.81e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 4.684e-05 [insert-virtual-dataset]: 2.58998e-06 [parallel-infer-symbol-second]: 9.70002e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.87001e-06 [optimize]: 0.0158962, [53] [py_interpret_to_execute]: 3.79e-05 [rewriter_before_opt_a]: 0.00014482 [opt_a]: 0.0137781, [3] [Cycle 1]: 0.0104219, [45] [expand_dump_flag]: 3.87998e-06 [switch_simplify]: 7.227e-05 [loop_unroll]: 5.962e-05 [a_1]: 0.00137925 [with_stream_mark]: 2.311e-05 [recompute_prepare]: 2.2e-05 [updatestate_depend_eliminate]: 8.71997e-06 [updatestate_assign_eliminate]: 7.13e-06 [updatestate_loads_eliminate]: 7.28999e-06 [parameter_eliminate]: 2.59001e-06 [a_2]: 0.00024143 [accelerated_algorithm]: 3.116e-05 [shard]: 1.93002e-06 [meta_shard_fg_expand]: 3.67002e-06 [shard_inline]: 1.617e-05 [merge_send_recv]: 1.619e-05 [auto_parallel]: 1.079e-05 [parallel]: 1.857e-05 [flash_sp]: 1.097e-05 [merge_comm]: 9.51e-06 [allreduce_fusion]: 8.80999e-06 [matmul_add_comm_reduction]: 2.59e-05 [allreduce_slice_to_reducescatter]: 6.49976e-07 [virtual_shard_identity]: 1.743e-05 [virtual_dataset]: 1.556e-05 [get_grad_eliminate_]: 1.512e-05 [virtual_output]: 1.496e-05 [merge_forward]: 8.74998e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 1.743e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.987e-05 [merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 2.861e-05 [set_forward_comm_id_for_comm_node_pass]: 9.81998e-06 [meta_fg_expand]: 0.00141993 [flash_sp_send_recv_attached]: 3.62998e-06 [receive_attached]: 2.06e-06 
[after_resolve]: 6.18e-05 [a_after_grad]: 8.558e-05 [renormalize]: 0.00585284 [add_forward_monad_depend]: 9.19998e-06 [auto_monad_grad]: 5.37999e-06 [auto_monad_eliminator]: 5.122e-05 [cse]: 0.00017395 [a_3]: 0.00032615 [Cycle 2]: 0.00266631, [45] [expand_dump_flag]: 1.59998e-06 [switch_simplify]: 4.582e-05 [loop_unroll]: 4.15e-05 [a_1]: 0.00130559 [with_stream_mark]: 1.098e-05 [recompute_prepare]: 8.93002e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 8.752e-05 [accelerated_algorithm]: 9.92999e-06 [shard]: 1.27999e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 6.86001e-06 [merge_send_recv]: 6.09999e-06 [auto_parallel]: 6.50002e-06 [parallel]: 4.80999e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.98999e-06 [allreduce_fusion]: 3.51001e-06 [matmul_add_comm_reduction]: 5.69999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 7.75998e-06 [virtual_dataset]: 6.29999e-06 [get_grad_eliminate_]: 6.17001e-06 [virtual_output]: 5.86003e-06 [merge_forward]: 3.2e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 6.92002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.323e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 6.313e-05 [set_forward_comm_id_for_comm_node_pass]: 4.95999e-06 [meta_fg_expand]: 4.879e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.21e-05 [a_after_grad]: 1.034e-05 [renormalize]: 0.0005774 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.111e-05 [cse]: 2.123e-05 [a_3]: 4.662e-05 [Cycle 3]: 0.00067563, [45] [expand_dump_flag]: 1.06002e-06 [switch_simplify]: 8.36002e-06 [loop_unroll]: 6.41998e-06 [a_1]: 0.00014709 [with_stream_mark]: 8.30999e-06 [recompute_prepare]: 6.97002e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 2.76999e-06 
[updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 8.463e-05 [accelerated_algorithm]: 9.64e-06 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 6.71e-06 [merge_send_recv]: 5.42001e-06 [auto_parallel]: 6.33e-06 [parallel]: 5.16002e-06 [flash_sp]: 9.60019e-07 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 5.71998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 7.78999e-06 [virtual_dataset]: 6.16e-06 [get_grad_eliminate_]: 6.11e-06 [virtual_output]: 5.91e-06 [merge_forward]: 3.36999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 7.43e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.263e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.088e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 9.15999e-06 [a_after_grad]: 9.11002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.30999e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 7.36999e-06 [cse]: 1.67e-05 [a_3]: 3.873e-05 [py_interpret_to_execute_after_opt_a]: 1.06e-05 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 3.98e-05 [convert_after_rewriter]: 7.56999e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00049241 [opt_b]: 0.00021453, [1] [Cycle 1]: 0.00020812, [7] [b_1]: 0.00013138 [b_2]: 8.48999e-06 [updatestate_depend_eliminate]: 5.94999e-06 [updatestate_assign_eliminate]: 2.91999e-06 [updatestate_loads_eliminate]: 2.78998e-06 [renormalize]: 4.69998e-07 [cse]: 2.107e-05 [optimize_parallel_all_gather_comm]: 1.796e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.117e-05 [loop_unroll]: 0.00042927 [opt_after_cconv]: 0.00010885, [1] [Cycle 1]: 0.00010292, [7] [c_1]: 3.328e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.64998e-06 
[updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.63e-06 [cse]: 2.045e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.584e-05 [tuple_transform]: 7.711e-05, [1] [Cycle 1]: 7.239e-05, [4] [d_1]: 4.489e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.29001e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.903e-05 [cse_after_recomputation]: 2.464e-05, [1] [Cycle 1]: 1.997e-05, [1] [cse]: 1.425e-05 [environ_conv]: 7.53e-06 [swap_dp_allreduce_reducescatter]: 5.69e-06 [bias_add_comm_swap]: 3.09999e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.65997e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.36002e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.63e-06 [reorder_send_recv_between_fp_bp]: 3.06999e-06 [comm_op_add_attrs]: 1.26002e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.54998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.404e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 4.03001e-06 [overlap_recompute_and_grad_model_parallel]: 4.79e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.46998e-06 [overlap_grad_ring_attention]: 4.60001e-06 [overlap_grad_flash_sp]: 1.92e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.20999e-06 [symbol_engine_optimizer]: 8.442e-05, [1] [Cycle 1]: 8.003e-05, [6] [build]: 8.70999e-06 [elim_shapecalc]: 1.045e-05 [elim_not_effective]: 1.404e-05 [opt_reshape]: 7.3e-06 [fold_const_symbol]: 
1.114e-05 [renormalize]: 1.80007e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 2.03e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.21999e-06 [opt_after_jit_grad]: 0.00046343 [validate]: 4.129e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00628185 [execute]: 6.54001e-06 Sums bootstrap : 0.000503s : 1.48% type_inference : 0.011726s : 34.38% event_method : 0.000043s : 0.13% auto_monad : 0.000130s : 0.38% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000145s : 0.42% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000126s : 0.37% optimize.opt_a.loop_unroll : 0.000108s : 0.32% optimize.opt_a.a_1 : 0.002832s : 8.30% optimize.opt_a.with_stream_mark : 0.000042s : 0.12% optimize.opt_a.recompute_prepare : 0.000038s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000414s : 1.21% optimize.opt_a.accelerated_algorithm : 0.000051s : 0.15% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.09% optimize.opt_a.merge_send_recv : 0.000028s : 0.08% optimize.opt_a.auto_parallel : 0.000024s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% 
optimize.opt_a.merge_comm : 0.000017s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000037s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.10% optimize.opt_a.virtual_dataset : 0.000028s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000027s : 0.08% optimize.opt_a.virtual_output : 0.000027s : 0.08% optimize.opt_a.merge_forward : 0.000015s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000032s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000103s : 0.30% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.05% optimize.opt_a.meta_fg_expand : 0.001471s : 4.31% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000083s : 0.24% optimize.opt_a.a_after_grad : 0.000105s : 0.31% optimize.opt_a.renormalize : 0.006430s : 18.85% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000070s : 0.20% optimize.opt_a.cse : 0.000212s : 0.62% optimize.opt_a.a_3 : 0.000411s : 1.21% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000040s : 0.12% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000492s : 1.44% optimize.opt_b.b_1 : 0.000131s : 0.39% optimize.opt_b.b_2 : 0.000008s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000429s : 1.26% optimize.opt_after_cconv.c_1 : 0.000033s : 0.10% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.05% optimize.tuple_transform.d_1 : 0.000045s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.14% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000463s : 1.36% validate : 0.000041s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006282s : 18.42% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000703 159 6.37% : 0.000045s : 7: substitution.arithmetic_simplify 0.32% : 0.000002s : 3: substitution.elim_not_effective 0.63% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 3: substitution.fold_const_symbol 0.88% : 0.000006s : 4: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.35% : 0.000002s : 2: substitution.incorporate_call_switch 59.78% : 0.000420s : 17: substitution.inline 2.24% : 0.000016s : 2: substitution.inline_without_move 1.44% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.16% : 0.000015s : 3: substitution.less_batch_normalization 1.48% : 0.000010s : 7: substitution.minmaximum_grad 0.82% : 0.000006s : 5: substitution.partial_eliminate 1.75% : 0.000012s : 15: substitution.remove_not_recompute_node 3.68% : 0.000026s : 10: substitution.replace_applicator 1.29% : 0.000009s : 10: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.90% : 0.000020s : 7: substitution.tuple_list_convert_item_index_to_positive 1.48% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 1.86% : 0.000013s : 7: substitution.tuple_list_get_item_depend_reorder 7.03% : 0.000049s : 18: substitution.tuple_list_get_item_eliminator 1.97% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011651 2 87.66% : 0.010213s : 1: type_inference.infer 12.34% : 0.001438s : 1: type_inference.specialize ------[replace.] 0.000188 26 66.09% : 0.000124s : 17: replace.inline 33.91% : 0.000064s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000435 26 94.35% : 0.000410s : 17: match.inline 5.65% : 0.000025s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000675 4180 1.14% : 0.000008s : 52: predicate.accumulaten_eliminater 0.23% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.47% : 0.000003s : 21: predicate.addn_check_dump 1.13% : 0.000008s : 52: predicate.addn_zero_filter 1.13% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 1.95% : 0.000013s : 73: predicate.arithmetic_simplify 1.15% : 0.000008s : 52: predicate.cast_eliminate 1.12% : 0.000008s : 50: predicate.check_bprop_eliminate 0.49% : 0.000003s : 21: predicate.compare_switch_simplify 0.07% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.14% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.19% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.26% : 0.000009s : 56: predicate.environ_get_depend_swap 1.71% : 0.000012s : 77: predicate.environ_get_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.83% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.49% : 0.000017s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.59% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.56% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.53% : 0.000004s : 21: predicate.incorporate_call 0.48% : 0.000003s : 21: predicate.incorporate_call_switch 5.90% : 0.000040s : 180: predicate.inline 1.42% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.64% : 0.000004s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.64% : 0.000018s : 121: predicate.load_eliminater 0.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.54% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.35% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.12% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 52: predicate.minmaximum_grad 0.29% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000014s : 78: predicate.partial_defer_inline 1.70% : 0.000011s : 65: predicate.partial_eliminate 1.15% : 0.000008s : 52: predicate.print_const_string_wrapper 0.50% : 0.000003s : 21: predicate.reduce_all_const_elim 1.37% : 0.000009s : 52: predicate.reduce_eliminate 2.65% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 21: predicate.remove_not_recompute_node 1.87% : 0.000013s : 111: predicate.replace_applicator 0.68% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.15% : 0.000008s : 52: predicate.reshape_eliminate 1.13% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 4: predicate.row_tensor_eliminate 1.24% : 0.000008s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 21: predicate.shard_identity_eliminate 0.23% : 0.000002s : 8: predicate.special_op_eliminate 0.62% : 0.000004s : 21: predicate.specialize_transform 1.21% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.94% : 0.000013s : 78: predicate.switch_defer_inline 3.04% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.21% : 0.000035s : 213: 
predicate.switch_simplify 1.10% : 0.000007s : 52: predicate.tile_eliminate 1.13% : 0.000008s : 52: predicate.transpose_eliminate 1.46% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000010s : 60: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.45% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.57% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.60% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.18% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 4: predicate.value_based_eliminate 0.53% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.51% : 0.000003s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001629 35 60.34% : 0.000983s : 14: func_graph_cloner_run.FuncGraphClonerGraph 39.66% : 0.000646s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068334 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.42% : 0.003018s : 1: add_attr 4.40% : 0.003009s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000137s : 1: auto_monad 0.04% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.78% : 0.000533s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000017s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.07% : 0.000050s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.64% : 0.000438s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000502s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.35% : 0.004341s : 117: opt.transform.opt_a 0.05% : 0.000032s : 1: opt.transform.opt_after_cconv 0.04% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000112s : 28: opt.transform.opt_b 0.07% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000039s : 4: opt.transform.symbol_engine_opt 20.17% : 0.013781s : 1: opt_a 0.16% : 0.000112s : 1: opt_after_cconv 0.78% : 0.000531s : 1: opt_after_jit_grad 0.32% : 0.000218s : 1: opt_b 23.27% : 0.015900s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000005s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000052s : 1: pre_auto_parallel 0.06% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 7.31% : 0.004994s : 2: renormalize.infer 2.08% : 0.001423s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000044s : 1: rewriter_after_opt_a 0.22% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.13% : 0.000087s : 1: symbol_engine_optimizer 9.21% : 0.006293s : 1: task_emit 0.12% : 0.000080s : 1: 
tuple_transform 17.18% : 0.011740s : 1: type_inference 0.10% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x1-kbk],max_mem:10.0M TotalTime = 0.064694, [24] [bootstrap]: 0.00058085 [type_inference]: 0.00641704 [event_method]: 1.396e-05 [auto_monad]: 5.944e-05 [graph_reusing]: 6.67002e-06 [inline]: 2.39001e-06 [add_attr]: 0.00358235, [1] [add_attr_with_inline]: 0.0035712, [1] [Cycle 1]: 4.742e-05, [2] [tag_attr]: 1.485e-05 [meta_addattr_fg_expand]: 4.52e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.458e-05 [insert-virtual-dataset]: 2.91e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.24999e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.0040906, [53] [py_interpret_to_execute]: 2.099e-05 [rewriter_before_opt_a]: 6.46e-05 [opt_a]: 0.00217256, [2] [Cycle 1]: 0.00156325, [45] [expand_dump_flag]: 2.91e-06 [switch_simplify]: 3.337e-05 [loop_unroll]: 2.059e-05 [a_1]: 0.00044034 [with_stream_mark]: 1.385e-05 [recompute_prepare]: 7.83001e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 3.49001e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.81003e-06 [a_2]: 8.002e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 6.12999e-06 [merge_send_recv]: 8.07e-06 [auto_parallel]: 6.77002e-06 [parallel]: 2.76e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 4.02e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 9.10001e-06 [allreduce_slice_to_reducescatter]: 8.90024e-07 [virtual_shard_identity]: 7.73999e-06 [virtual_dataset]: 6.18998e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 9.10999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.205e-05 
[merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 9.76003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68999e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 3.00002e-06 [receive_attached]: 2.41e-06 [after_resolve]: 1.012e-05 [a_after_grad]: 8.57e-06 [renormalize]: 0.00043891 [add_forward_monad_depend]: 9.37999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.473e-05 [cse]: 3.08e-05 [a_3]: 4.177e-05 [Cycle 2]: 0.00059974, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 6.96999e-06 [loop_unroll]: 5.83002e-06 [a_1]: 0.00011428 [with_stream_mark]: 9.96998e-06 [recompute_prepare]: 5.94999e-06 [updatestate_depend_eliminate]: 3.00002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 7.038e-05 [accelerated_algorithm]: 5.82999e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.21002e-06 [shard_inline]: 5.73002e-06 [merge_send_recv]: 4.60001e-06 [auto_parallel]: 5.72001e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.28e-06 [merge_comm]: 3.25002e-06 [allreduce_fusion]: 2.99001e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 4.2998e-07 [virtual_shard_identity]: 6.34999e-06 [virtual_dataset]: 5.49998e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.25001e-06 [merge_forward]: 2.79999e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.027e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 9.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 8.90024e-07 [after_resolve]: 8.43001e-06 [a_after_grad]: 7.82e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.14999e-06 [cse]: 1.689e-05 [a_3]: 3.251e-05 [py_interpret_to_execute_after_opt_a]: 7.41001e-06 
[slice_cell_reuse_recomputed_activation]: 2.10002e-06 [rewriter_after_opt_a]: 3.301e-05 [convert_after_rewriter]: 6.93e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00046807 [opt_b]: 0.00018638, [1] [Cycle 1]: 0.00018045, [7] [b_1]: 0.00010859 [b_2]: 7.56999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.19997e-07 [cse]: 1.825e-05 [optimize_parallel_all_gather_comm]: 1.6e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.243e-05 [loop_unroll]: 0.00043183 [opt_after_cconv]: 9.72e-05, [1] [Cycle 1]: 9.122e-05, [7] [c_1]: 2.563e-05 [parameter_eliminate]: 2.43998e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.78e-06 [updatestate_loads_eliminate]: 2.53998e-06 [cse]: 1.791e-05 [renormalize]: 4.90021e-07 [remove_dup_value]: 1.559e-05 [tuple_transform]: 7.008e-05, [1] [Cycle 1]: 6.545e-05, [4] [d_1]: 3.792e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 6.51e-06 [partial_unused_args_eliminate]: 2.22001e-06 [add_recomputation]: 5.013e-05 [cse_after_recomputation]: 2.133e-05, [1] [Cycle 1]: 1.686e-05, [1] [cse]: 1.128e-05 [environ_conv]: 8.12998e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.49999e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.15002e-06 [reorder_send_recv_between_fp_bp]: 2.99999e-06 [comm_op_add_attrs]: 1.35999e-06 [add_comm_op_reuse_tag]: 1.42999e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 
[control_data_broadcast_order]: 1.251e-05 [grouped_pairwise_exchange_alltoall]: 1.37999e-06 [offloading_packed_experts]: 4.35999e-06 [overlap_recompute_and_grad_model_parallel]: 4.76002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39e-06 [overlap_recompute_allgather_and_fa_grad]: 1.66e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.30999e-06 [overlap_grad_flash_sp]: 1.845e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 1.95001e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 7.199e-05, [1] [Cycle 1]: 6.751e-05, [6] [build]: 2.64999e-06 [elim_shapecalc]: 9.04e-06 [elim_not_effective]: 1.208e-05 [opt_reshape]: 6.41998e-06 [fold_const_symbol]: 9.29e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.623e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00046274 [validate]: 3.579e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.049153 [execute]: 9.53002e-06 Sums bootstrap : 0.000581s : 0.97% type_inference : 0.006417s : 10.68% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.10% graph_reusing : 0.000007s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000065s : 0.11% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000555s : 0.92% optimize.opt_a.with_stream_mark : 
0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.25% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000032s : 0.05% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000439s : 0.73% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.02% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000048s : 0.08% optimize.opt_a.a_3 : 0.000074s : 0.12% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000468s : 0.78% optimize.opt_b.b_1 : 0.000109s : 0.18% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000432s : 0.72% optimize.opt_after_cconv.c_1 : 0.000026s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000008s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000463s : 0.77% validate : 0.000036s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.049153s : 81.79% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000168 26 18.86% : 0.000032s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.47% : 0.000006s : 3: substitution.graph_param_transform 63.44% : 0.000107s : 3: substitution.inline 1.93% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000004s : 4: substitution.remove_not_recompute_node 2.13% : 0.000004s : 2: substitution.replace_old_param 5.63% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006366 2 90.66% : 0.005772s : 1: type_inference.infer 9.34% : 0.000594s : 1: type_inference.specialize ------[replace.] 0.000036 4 78.16% : 0.000028s : 3: replace.inline 21.84% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 4 92.34% : 0.000105s : 3: match.inline 7.66% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 883 1.12% : 0.000002s : 9: predicate.accumulaten_eliminater 0.78% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 15: predicate.arithmetic_simplify 0.92% : 0.000001s : 9: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.91% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.95% : 0.000002s : 9: predicate.dict_set_item_eliminator 0.95% : 0.000001s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.39% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.32% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.16% : 0.000002s : 12: predicate.environ_get_depend_swap 1.96% : 0.000003s : 18: predicate.environ_get_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.65% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.27% : 0.000010s : 40: predicate.inline 0.88% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 6: predicate.less_batch_normalization 1.64% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 25: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.13% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.34% : 0.000001s : 3: predicate.parallel_virtual_node 1.61% : 0.000003s : 13: predicate.partial_defer_inline 1.46% : 0.000002s : 13: predicate.partial_eliminate 1.05% : 0.000002s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 9: predicate.reduce_eliminate 2.37% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000002s : 16: predicate.replace_applicator 0.58% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.98% : 0.000002s : 9: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.88% : 0.000001s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 6: predicate.shard_identity_eliminate 0.90% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.86% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.90% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.03% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.66% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 46.78% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.22% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.073874 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.85% : 0.003587s : 1: add_attr 4.84% : 0.003575s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000065s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.84% : 0.000621s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000011s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.60% : 0.000440s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.65% : 0.000477s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.25% : 0.000924s : 78: opt.transform.opt_a 0.03% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000088s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.94% : 0.002175s : 1: opt_a 0.14% : 0.000100s : 1: opt_after_cconv 0.64% : 0.000472s : 1: opt_after_jit_grad 0.26% : 0.000190s : 1: opt_b 5.54% : 0.004094s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.31% : 0.000231s : 1: renormalize.infer 0.27% : 0.000202s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.09% : 0.000069s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000075s : 1: symbol_engine_optimizer 66.57% : 0.049177s : 1: task_emit 0.10% : 0.000073s : 1: 
tuple_transform 8.71% : 0.006432s : 1: type_inference 0.08% : 0.000059s : 1: validate TotalTime = 0.0556199, [24] [bootstrap]: 0.00047685 [type_inference]: 0.00607155 [event_method]: 1.36e-05 [auto_monad]: 5.719e-05 [graph_reusing]: 5.32001e-06 [inline]: 2.06e-06 [add_attr]: 0.00306485, [1] [add_attr_with_inline]: 0.00305737, [1] [Cycle 1]: 5.237e-05, [2] [tag_attr]: 1.338e-05 [meta_addattr_fg_expand]: 3.93001e-06 [parallel-infer-symbol]: 3.21001e-06 [pre_auto_parallel]: 2.381e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.003959, [53] [py_interpret_to_execute]: 1.955e-05 [rewriter_before_opt_a]: 5.017e-05 [opt_a]: 0.00204548, [2] [Cycle 1]: 0.00143977, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 2.845e-05 [loop_unroll]: 1.69e-05 [a_1]: 0.00035147 [with_stream_mark]: 1.404e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 3.53999e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 2.10002e-06 [a_2]: 8.119e-05 [accelerated_algorithm]: 7.12002e-06 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.19001e-06 [merge_send_recv]: 8.95001e-06 [auto_parallel]: 6.26e-06 [parallel]: 1.789e-05 [flash_sp]: 7.03e-06 [merge_comm]: 4.05e-06 [allreduce_fusion]: 3.65998e-06 [matmul_add_comm_reduction]: 9.53002e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 6.89999e-06 [virtual_dataset]: 5.91998e-06 [get_grad_eliminate_]: 5.89999e-06 [virtual_output]: 6.26998e-06 [merge_forward]: 3.99002e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 9.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.19e-05 [merge_recompute_call_nodes]: 2.02999e-06 [before_grad]: 9.95002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 2.65997e-06 [flash_sp_send_recv_attached]: 2.39999e-06 [receive_attached]: 
2.39999e-06 [after_resolve]: 9.67001e-06 [a_after_grad]: 9.44e-06 [renormalize]: 0.0004316 [add_forward_monad_depend]: 4.41002e-06 [auto_monad_grad]: 2.03997e-06 [auto_monad_eliminator]: 1.315e-05 [cse]: 3.043e-05 [a_3]: 4.069e-05 [Cycle 2]: 0.00059606, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.92999e-06 [a_1]: 0.00011384 [with_stream_mark]: 9.72999e-06 [recompute_prepare]: 6.07999e-06 [updatestate_depend_eliminate]: 3.09999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 7.041e-05 [accelerated_algorithm]: 5.82001e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.11002e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 4.49998e-06 [auto_parallel]: 5.46e-06 [parallel]: 4.03999e-06 [flash_sp]: 3.49001e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 5.12999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.12999e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.68998e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 5.93002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 9.79984e-07 [after_resolve]: 8.25999e-06 [a_after_grad]: 7.53e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 6.02001e-06 [cse]: 1.409e-05 [a_3]: 3.287e-05 [py_interpret_to_execute_after_opt_a]: 7.82998e-06 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 3.314e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 4.95001e-06 [mutable_eliminate]: 0.00045662 [opt_b]: 0.00018698, [1] [Cycle 1]: 
0.00018061, [7] [b_1]: 0.00010916 [b_2]: 7.5e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 4.39992e-07 [cse]: 1.806e-05 [optimize_parallel_all_gather_comm]: 1.558e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.29e-05 [loop_unroll]: 0.00046856 [opt_after_cconv]: 9.604e-05, [1] [Cycle 1]: 9.026e-05, [7] [c_1]: 2.578e-05 [parameter_eliminate]: 2.38998e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.712e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.526e-05 [tuple_transform]: 6.805e-05, [1] [Cycle 1]: 6.356e-05, [4] [d_1]: 3.666e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 7.89994e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 2.21998e-06 [add_recomputation]: 4.5e-05 [cse_after_recomputation]: 2.149e-05, [1] [Cycle 1]: 1.689e-05, [1] [cse]: 1.127e-05 [environ_conv]: 5.97001e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.41002e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.71999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.42999e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.191e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 4.22e-06 [overlap_recompute_and_grad_model_parallel]: 4.67998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.32998e-06 [overlap_grad_flash_sp]: 1.745e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.48e-06 [split_layernorm_comm]: 2.10002e-06 [handle_group_info]: 1.13001e-06 [symbol_engine_optimizer]: 7.216e-05, [1] [Cycle 1]: 6.788e-05, [6] [build]: 2.54001e-06 [elim_shapecalc]: 9.34e-06 [elim_not_effective]: 1.165e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 9.49e-06 [renormalize]: 2.69996e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.578e-05 [get_jit_bprop_graph]: 1.23002e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00045355 [validate]: 3.365e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0412056 [execute]: 1.023e-05 Sums bootstrap : 0.000477s : 0.92% type_inference : 0.006072s : 11.78% event_method : 0.000014s : 0.03% auto_monad : 0.000057s : 0.11% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000050s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000035s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000465s : 0.90% optimize.opt_a.with_stream_mark : 0.000024s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000152s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.03% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000432s : 0.84% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.04% optimize.opt_a.cse : 0.000045s : 0.09% optimize.opt_a.a_3 : 0.000074s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.89% optimize.opt_b.b_1 : 0.000109s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000469s : 0.91% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000001s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000454s : 0.88% validate : 0.000034s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.041206s : 79.93% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000141 24 20.82% : 0.000029s : 4: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 3.70% : 0.000005s : 3: substitution.graph_param_transform 65.39% : 0.000092s : 3: substitution.inline 2.16% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.41% : 0.000005s : 4: substitution.remove_not_recompute_node 2.23% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006025 2 91.70% : 0.005525s : 1: type_inference.infer 8.30% : 0.000500s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000090 3 100.00% : 0.000090s : 3: match.inline ------[predicate.] 
0.000146 815 0.86% : 0.000001s : 8: predicate.accumulaten_eliminater 0.80% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.72% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.67% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 1.06% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.25% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.77% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.92% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.77% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.75% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.12% : 0.000009s : 37: predicate.inline 1.01% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.08% : 0.000002s : 6: predicate.less_batch_normalization 1.64% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.12% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.68% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.13% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.57% : 0.000001s : 3: predicate.parallel_virtual_node 1.46% : 0.000002s : 11: predicate.partial_defer_inline 1.33% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.08% : 0.000002s : 8: predicate.reduce_eliminate 2.27% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.30% : 0.000002s : 14: predicate.replace_applicator 0.73% : 0.000001s : 6: predicate.replace_old_param 0.33% : 0.000000s : 3: predicate.reset_defer_inline 0.87% : 0.000001s : 8: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.66% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 0.82% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.27% : 0.000002s : 11: predicate.switch_defer_inline 1.90% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.84% : 0.000007s : 38: predicate.switch_simplify 0.93% : 
0.000001s : 8: predicate.tile_eliminate 0.82% : 0.000001s : 8: predicate.transpose_eliminate 1.55% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.62% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000312 7 37.19% : 0.000116s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.81% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064049 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.79% : 0.003069s : 1: add_attr 4.78% : 0.003061s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000062s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.80% : 0.000515s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.74% : 0.000477s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.29% : 0.000827s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.20% : 0.002048s : 1: opt_a 0.16% : 0.000099s : 1: opt_after_cconv 0.72% : 0.000462s : 1: opt_after_jit_grad 0.30% : 0.000190s : 1: opt_b 6.19% : 0.003963s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.04% : 0.000023s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.37% : 0.000237s : 1: renormalize.infer 0.29% : 0.000188s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.08% : 0.000054s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000075s : 1: symbol_engine_optimizer 64.36% : 0.041224s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 9.50% : 0.006088s : 1: type_inference 0.09% : 0.000055s : 1: validate TotalTime = 0.0560528, [24] [bootstrap]: 0.00041015 [type_inference]: 0.00560267 [event_method]: 1.332e-05 [auto_monad]: 5.968e-05 [graph_reusing]: 5.67999e-06 [inline]: 2.02999e-06 [add_attr]: 0.00297438, [1] [add_attr_with_inline]: 0.00296663, [1] [Cycle 1]: 4.729e-05, [2] [tag_attr]: 1.47e-05 [meta_addattr_fg_expand]: 4.83001e-06 [parallel-infer-symbol]: 2.77002e-06 [pre_auto_parallel]: 2.502e-05 [insert-virtual-dataset]: 2.50002e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.43e-06 [pipeline_split]: 1.77999e-06 [optimize]: 0.00413304, [53] [py_interpret_to_execute]: 2.028e-05 [rewriter_before_opt_a]: 0.00011369 [opt_a]: 0.00216863, [2] [Cycle 1]: 0.00155458, [45] [expand_dump_flag]: 3.32002e-06 [switch_simplify]: 3.382e-05 [loop_unroll]: 2.034e-05 [a_1]: 0.00044903 [with_stream_mark]: 1.433e-05 [recompute_prepare]: 8.27e-06 [updatestate_depend_eliminate]: 4.45999e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 8.066e-05 [accelerated_algorithm]: 7.09001e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 6.23998e-06 [merge_send_recv]: 8.54e-06 [auto_parallel]: 6.23e-06 [parallel]: 1.878e-05 [flash_sp]: 7.67998e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.83999e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 7.51999e-06 [virtual_dataset]: 6.07001e-06 [get_grad_eliminate_]: 5.71998e-06 [virtual_output]: 5.81e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.49e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.196e-05 [merge_recompute_call_nodes]: 1.75001e-06 [before_grad]: 9.92001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.62001e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 
2.19001e-06 [after_resolve]: 9.43002e-06 [a_after_grad]: 8.54e-06 [renormalize]: 0.00043655 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.31e-05 [cse]: 3.052e-05 [a_3]: 4.138e-05 [Cycle 2]: 0.00060349, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 7.26001e-06 [loop_unroll]: 5.62999e-06 [a_1]: 0.0001151 [with_stream_mark]: 1.015e-05 [recompute_prepare]: 5.90002e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 7.065e-05 [accelerated_algorithm]: 5.75001e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 4.60999e-06 [auto_parallel]: 5.72999e-06 [parallel]: 4.13001e-06 [flash_sp]: 3.23998e-06 [merge_comm]: 3.35003e-06 [allreduce_fusion]: 2.98e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.53e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 6.24001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.029e-05 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 8.57e-06 [set_forward_comm_id_for_comm_node_pass]: 3.50998e-06 [meta_fg_expand]: 1.85001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.11002e-06 [a_after_grad]: 7.85e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24998e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 6.54001e-06 [cse]: 1.419e-05 [a_3]: 3.311e-05 [py_interpret_to_execute_after_opt_a]: 7.85998e-06 [slice_cell_reuse_recomputed_activation]: 2.56e-06 [rewriter_after_opt_a]: 3.267e-05 [convert_after_rewriter]: 7.82e-06 [order_py_execute_after_rewriter]: 4.68999e-06 [mutable_eliminate]: 0.00047688 [opt_b]: 0.00018688, [1] [Cycle 1]: 0.00018041, 
[7] [b_1]: 0.00010882 [b_2]: 7.45e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 4.70027e-07 [cse]: 1.838e-05 [optimize_parallel_all_gather_comm]: 1.599e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.285e-05 [loop_unroll]: 0.00042135 [opt_after_cconv]: 9.71e-05, [1] [Cycle 1]: 9.086e-05, [7] [c_1]: 2.601e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.48002e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.73e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.536e-05 [tuple_transform]: 6.882e-05, [1] [Cycle 1]: 6.428e-05, [4] [d_1]: 3.71e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.59999e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.471e-05 [cse_after_recomputation]: 2.14e-05, [1] [Cycle 1]: 1.653e-05, [1] [cse]: 1.092e-05 [environ_conv]: 5.44e-06 [swap_dp_allreduce_reducescatter]: 5.14998e-06 [bias_add_comm_swap]: 2.82002e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 3.06001e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.69001e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.25001e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 1.06997e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.282e-05 [grouped_pairwise_exchange_alltoall]: 1.51998e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.66002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 4.53999e-06 [overlap_grad_flash_sp]: 1.673e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 7.403e-05, [1] [Cycle 1]: 6.913e-05, [6] [build]: 2.45002e-06 [elim_shapecalc]: 9.45001e-06 [elim_not_effective]: 1.269e-05 [opt_reshape]: 6.34999e-06 [fold_const_symbol]: 9.63002e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.12001e-06 [pipeline_parallel_scheduler]: 1.51002e-06 [auto_monad_reorder]: 1.613e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.76001e-06 [opt_after_jit_grad]: 0.00045555 [validate]: 3.399e-05 [backend_pass]: 1.19998e-06 [task_emit]: 0.0420796 [execute]: 9.41e-06 Sums bootstrap : 0.000410s : 0.79% type_inference : 0.005603s : 10.76% event_method : 0.000013s : 0.03% auto_monad : 0.000060s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000114s : 0.22% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.08% optimize.opt_a.loop_unroll : 0.000026s : 0.05% optimize.opt_a.a_1 : 0.000564s : 1.08% optimize.opt_a.with_stream_mark : 0.000024s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000151s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000437s : 0.84% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000045s : 0.09% optimize.opt_a.a_3 : 0.000074s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000477s : 0.92% optimize.opt_b.b_1 : 0.000109s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000421s : 0.81% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000456s : 0.88% validate : 0.000034s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.042080s : 80.83% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000168 26 19.22% : 0.000032s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.86% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000005s : 3: substitution.graph_param_transform 63.46% : 0.000106s : 3: substitution.inline 1.89% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.90% : 0.000005s : 4: substitution.remove_not_recompute_node 1.92% : 0.000003s : 2: substitution.replace_old_param 5.33% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005561 2 89.34% : 0.004968s : 1: type_inference.infer 10.66% : 0.000592s : 1: type_inference.specialize ------[replace.] 0.000036 4 79.02% : 0.000028s : 3: replace.inline 20.98% : 0.000007s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 4 92.77% : 0.000104s : 3: match.inline 7.23% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 0.71% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000001s : 9: predicate.addn_zero_filter 0.87% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 15: predicate.arithmetic_simplify 0.94% : 0.000001s : 9: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_depend_swap 1.84% : 0.000003s : 18: predicate.environ_get_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.23% : 0.000003s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.41% : 0.000010s : 40: predicate.inline 0.92% : 0.000001s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 6: predicate.less_batch_normalization 1.61% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 25: predicate.load_eliminater 1.07% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.32% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.67% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.00% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.78% : 0.000003s : 13: predicate.partial_defer_inline 1.43% : 0.000002s : 13: predicate.partial_eliminate 0.92% : 0.000001s : 9: predicate.print_const_string_wrapper 0.63% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.38% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 6: predicate.remove_not_recompute_node 1.39% : 0.000002s : 16: predicate.replace_applicator 0.61% : 0.000001s : 6: predicate.replace_old_param 0.31% : 0.000000s : 3: predicate.reset_defer_inline 0.96% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.79% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.69% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 43: predicate.switch_simplify 1.08% : 
0.000002s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000343 8 45.79% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.21% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064682 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.60% : 0.002978s : 1: add_attr 4.59% : 0.002970s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000065s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.69% : 0.000445s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000014s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.66% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.75% : 0.000485s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.45% : 0.000936s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.36% : 0.002172s : 1: opt_a 0.16% : 0.000101s : 1: opt_after_cconv 0.72% : 0.000465s : 1: opt_after_jit_grad 0.29% : 0.000190s : 1: opt_b 6.40% : 0.004137s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000029s : 1: pre_auto_parallel 0.04% : 0.000024s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.34% : 0.000222s : 1: renormalize.infer 0.32% : 0.000209s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000036s : 1: rewriter_after_opt_a 0.18% : 0.000119s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000077s : 1: symbol_engine_optimizer 65.09% : 0.042104s : 1: task_emit 0.11% : 0.000072s : 1: 
tuple_transform 8.68% : 0.005617s : 1: type_inference 0.09% : 0.000057s : 1: validate TotalTime = 0.0768976, [24] [bootstrap]: 0.00054008 [type_inference]: 0.0116133 [event_method]: 4.671e-05 [auto_monad]: 0.00012831 [graph_reusing]: 8.18001e-06 [inline]: 2.11998e-06 [add_attr]: 0.00304601, [1] [add_attr_with_inline]: 0.003038, [1] [Cycle 1]: 7.091e-05, [2] [tag_attr]: 3.331e-05 [meta_addattr_fg_expand]: 9.96998e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 4.855e-05 [insert-virtual-dataset]: 2.54999e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.0165035, [53] [py_interpret_to_execute]: 3.848e-05 [rewriter_before_opt_a]: 0.00015441 [opt_a]: 0.0143525, [3] [Cycle 1]: 0.0109415, [45] [expand_dump_flag]: 4.18001e-06 [switch_simplify]: 7.681e-05 [loop_unroll]: 6.292e-05 [a_1]: 0.00144342 [with_stream_mark]: 2.365e-05 [recompute_prepare]: 2.245e-05 [updatestate_depend_eliminate]: 8.60999e-06 [updatestate_assign_eliminate]: 7.10002e-06 [updatestate_loads_eliminate]: 6.82002e-06 [parameter_eliminate]: 2.85002e-06 [a_2]: 0.00024066 [accelerated_algorithm]: 3.083e-05 [shard]: 2.11998e-06 [meta_shard_fg_expand]: 3.5e-06 [shard_inline]: 1.604e-05 [merge_send_recv]: 1.633e-05 [auto_parallel]: 1.15e-05 [parallel]: 1.867e-05 [flash_sp]: 1.152e-05 [merge_comm]: 9.36e-06 [allreduce_fusion]: 8.64998e-06 [matmul_add_comm_reduction]: 2.595e-05 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 1.809e-05 [virtual_dataset]: 1.56e-05 [get_grad_eliminate_]: 1.494e-05 [virtual_output]: 1.505e-05 [merge_forward]: 9.22999e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.751e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.988e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 2.955e-05 [set_forward_comm_id_for_comm_node_pass]: 9.94001e-06 [meta_fg_expand]: 0.0014435 [flash_sp_send_recv_attached]: 4e-06 [receive_attached]: 2.06e-06 [after_resolve]: 
6.362e-05 [a_after_grad]: 8.887e-05 [renormalize]: 0.00624527 [add_forward_monad_depend]: 9.72999e-06 [auto_monad_grad]: 5.49e-06 [auto_monad_eliminator]: 5.242e-05 [cse]: 0.00018471 [a_3]: 0.00033135 [Cycle 2]: 0.00271666, [45] [expand_dump_flag]: 1.81e-06 [switch_simplify]: 4.539e-05 [loop_unroll]: 4.202e-05 [a_1]: 0.00132295 [with_stream_mark]: 1.148e-05 [recompute_prepare]: 8.95999e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 3.18998e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 8.835e-05 [accelerated_algorithm]: 1.043e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 6.88e-06 [merge_send_recv]: 6.55997e-06 [auto_parallel]: 6.84999e-06 [parallel]: 5.58002e-06 [flash_sp]: 3.53e-06 [merge_comm]: 4.02e-06 [allreduce_fusion]: 3.8e-06 [matmul_add_comm_reduction]: 6.29001e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 8.33001e-06 [virtual_dataset]: 6.53998e-06 [get_grad_eliminate_]: 6.41e-06 [virtual_output]: 6.14001e-06 [merge_forward]: 3.56999e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 8.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.313e-05 [merge_recompute_call_nodes]: 1.03001e-06 [before_grad]: 1.135e-05 [set_forward_comm_id_for_comm_node_pass]: 4.53999e-06 [meta_fg_expand]: 7.481e-05 [flash_sp_send_recv_attached]: 1.05001e-06 [receive_attached]: 1.14e-06 [after_resolve]: 1.225e-05 [a_after_grad]: 1.01e-05 [renormalize]: 0.00057876 [add_forward_monad_depend]: 4.03999e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 5.784e-05 [cse]: 2.226e-05 [a_3]: 4.647e-05 [Cycle 3]: 0.00067974, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 7.84002e-06 [loop_unroll]: 6.59999e-06 [a_1]: 0.00014699 [with_stream_mark]: 8.77e-06 [recompute_prepare]: 6.96999e-06 [updatestate_depend_eliminate]: 3.78999e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.45002e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 8.562e-05 [accelerated_algorithm]: 9.74e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 6.98998e-06 [merge_send_recv]: 5.30999e-06 [auto_parallel]: 6.00002e-06 [parallel]: 4.96002e-06 [flash_sp]: 1.02e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 5.82001e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 6.53e-06 [get_grad_eliminate_]: 6.25002e-06 [virtual_output]: 6.41998e-06 [merge_forward]: 3.03e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 7.1e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.33e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 1.065e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 2.23002e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.85001e-06 [a_after_grad]: 9.47999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 7.26001e-06 [cse]: 1.672e-05 [a_3]: 3.935e-05 [py_interpret_to_execute_after_opt_a]: 1.022e-05 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 4.106e-05 [convert_after_rewriter]: 7.08e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00050486 [opt_b]: 0.000224, [1] [Cycle 1]: 0.00021702, [7] [b_1]: 0.00013194 [b_2]: 1.363e-05 [updatestate_depend_eliminate]: 5.83002e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.04001e-06 [renormalize]: 4.69998e-07 [cse]: 2.157e-05 [optimize_parallel_all_gather_comm]: 1.757e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 1.999e-05 [loop_unroll]: 0.00042875 [opt_after_cconv]: 0.00011085, [1] [Cycle 1]: 0.00010444, [7] [c_1]: 3.322e-05 [parameter_eliminate]: 2.53e-06 [updatestate_depend_eliminate]: 5.66e-06 [updatestate_assign_eliminate]: 3.17002e-06 
[updatestate_loads_eliminate]: 2.74001e-06 [cse]: 2.176e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.624e-05 [tuple_transform]: 7.822e-05, [1] [Cycle 1]: 7.289e-05, [4] [d_1]: 4.53e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 7.18998e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 5.059e-05 [cse_after_recomputation]: 2.494e-05, [1] [Cycle 1]: 2.011e-05, [1] [cse]: 1.46e-05 [environ_conv]: 7.92998e-06 [swap_dp_allreduce_reducescatter]: 5.81998e-06 [bias_add_comm_swap]: 2.85998e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.53998e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.01e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.68e-06 [reorder_send_recv_between_fp_bp]: 2.99999e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.09999e-06 [control_data_broadcast_order]: 1.392e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4.06001e-06 [overlap_recompute_and_grad_model_parallel]: 5.02999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.76e-06 [overlap_recompute_comm]: 2.57001e-06 [overlap_grad_ring_attention]: 4.70999e-06 [overlap_grad_flash_sp]: 2.045e-05 [begin_end_overlap_inline]: 8.10018e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.95001e-06 [handle_group_info]: 1.14998e-06 [symbol_engine_optimizer]: 8.49e-05, [1] [Cycle 1]: 8.06e-05, [6] [build]: 8.58001e-06 [elim_shapecalc]: 1.06e-05 [elim_not_effective]: 1.453e-05 [opt_reshape]: 7.19001e-06 [fold_const_symbol]: 1.154e-05 
[renormalize]: 1.50001e-07 [detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.56998e-06 [auto_monad_reorder]: 1.956e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.68999e-06 [opt_after_jit_grad]: 0.00046521 [validate]: 4.073e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.044186 [execute]: 9.04e-06 Sums bootstrap : 0.000540s : 0.74% type_inference : 0.011613s : 16.01% event_method : 0.000047s : 0.06% auto_monad : 0.000128s : 0.18% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.07% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.05% optimize.rewriter_before_opt_a : 0.000154s : 0.21% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000130s : 0.18% optimize.opt_a.loop_unroll : 0.000112s : 0.15% optimize.opt_a.a_1 : 0.002913s : 4.02% optimize.opt_a.with_stream_mark : 0.000044s : 0.06% optimize.opt_a.recompute_prepare : 0.000038s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000415s : 0.57% optimize.opt_a.accelerated_algorithm : 0.000051s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.04% optimize.opt_a.merge_send_recv : 0.000028s : 0.04% optimize.opt_a.auto_parallel : 0.000024s : 0.03% optimize.opt_a.parallel : 0.000029s : 0.04% optimize.opt_a.flash_sp : 0.000016s : 0.02% 
optimize.opt_a.merge_comm : 0.000017s : 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000038s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.05% optimize.opt_a.virtual_dataset : 0.000029s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.04% optimize.opt_a.virtual_output : 0.000028s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000033s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000052s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.03% optimize.opt_a.meta_fg_expand : 0.001521s : 2.10% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000085s : 0.12% optimize.opt_a.a_after_grad : 0.000108s : 0.15% optimize.opt_a.renormalize : 0.006824s : 9.41% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000118s : 0.16% optimize.opt_a.cse : 0.000224s : 0.31% optimize.opt_a.a_3 : 0.000417s : 0.58% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000505s : 0.70% optimize.opt_b.b_1 : 0.000132s : 0.18% optimize.opt_b.b_2 : 0.000014s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.03% optimize.loop_unroll : 0.000429s : 0.59% optimize.opt_after_cconv.c_1 : 0.000033s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000465s : 0.64% validate : 0.000041s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044186s : 60.91% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000693 161 7.37% : 0.000051s : 8: substitution.arithmetic_simplify 0.35% : 0.000002s : 3: substitution.elim_not_effective 0.57% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 3: substitution.fold_const_symbol 0.83% : 0.000006s : 4: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.34% : 0.000002s : 2: substitution.incorporate_call_switch 57.35% : 0.000398s : 17: substitution.inline 2.31% : 0.000016s : 2: substitution.inline_without_move 1.46% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.16% : 0.000015s : 3: substitution.less_batch_normalization 1.50% : 0.000010s : 7: substitution.minmaximum_grad 0.90% : 0.000006s : 5: substitution.partial_eliminate 1.85% : 0.000013s : 15: substitution.remove_not_recompute_node 3.88% : 0.000027s : 10: substitution.replace_applicator 1.32% : 0.000009s : 10: substitution.replace_old_param 0.43% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.04% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.46% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 2.01% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.61% : 0.000053s : 19: substitution.tuple_list_get_item_eliminator 2.01% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011536 2 86.29% : 0.009955s : 1: type_inference.infer 13.71% : 0.001581s : 1: type_inference.specialize ------[replace.] 0.000196 27 63.68% : 0.000125s : 17: replace.inline 36.32% : 0.000071s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000415 27 93.52% : 0.000388s : 17: match.inline 6.48% : 0.000027s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000691 4248 1.12% : 0.000008s : 53: predicate.accumulaten_eliminater 0.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.14% : 0.000008s : 53: predicate.addn_zero_filter 1.10% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.98% : 0.000014s : 74: predicate.arithmetic_simplify 1.16% : 0.000008s : 53: predicate.cast_eliminate 1.11% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.45% : 0.000003s : 21: predicate.depend_value_elim 1.16% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.20% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.20% : 0.000008s : 57: predicate.environ_get_depend_swap 1.67% : 0.000012s : 78: predicate.environ_get_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.81% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.53% : 0.000017s : 80: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.07% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.51% : 0.000003s : 21: predicate.incorporate_call 0.46% : 0.000003s : 21: predicate.incorporate_call_switch 5.92% : 0.000041s : 183: predicate.inline 1.43% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.63% : 0.000018s : 124: predicate.load_eliminater 0.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.58% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.09% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 53: predicate.minmaximum_grad 0.28% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.13% : 0.000001s : 4: predicate.parallel_virtual_node 2.13% : 0.000015s : 80: predicate.partial_defer_inline 1.71% : 0.000012s : 67: predicate.partial_eliminate 1.13% : 0.000008s : 53: predicate.print_const_string_wrapper 0.49% : 0.000003s : 21: predicate.reduce_all_const_elim 1.42% : 0.000010s : 53: predicate.reduce_eliminate 2.66% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.29% : 0.000002s : 21: predicate.remove_not_recompute_node 1.88% : 0.000013s : 113: predicate.replace_applicator 0.68% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.17% : 0.000008s : 53: predicate.reshape_eliminate 1.11% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.27% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.62% : 0.000004s : 21: predicate.specialize_transform 1.23% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.97% : 0.000014s : 80: predicate.switch_defer_inline 3.04% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.27% : 0.000036s : 218: 
predicate.switch_simplify 1.13% : 0.000008s : 53: predicate.tile_eliminate 1.10% : 0.000008s : 53: predicate.transpose_eliminate 1.45% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.46% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.56% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.61% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.18% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.49% : 0.000003s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001760 36 60.04% : 0.001056s : 15: func_graph_cloner_run.FuncGraphClonerGraph 39.96% : 0.000703s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.107862 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.83% : 0.003050s : 1: add_attr 2.82% : 0.003042s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.13% : 0.000136s : 1: auto_monad 0.02% : 0.000023s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.54% : 0.000578s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.05% : 0.000054s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.48% : 0.000514s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 4.07% : 0.004393s : 117: opt.transform.opt_a 0.03% : 0.000032s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000117s : 28: opt.transform.opt_b 0.05% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000040s : 4: opt.transform.symbol_engine_opt 13.31% : 0.014356s : 1: opt_a 0.11% : 0.000114s : 1: opt_after_cconv 0.44% : 0.000475s : 1: opt_after_jit_grad 0.21% : 0.000228s : 1: opt_b 15.30% : 0.016508s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000054s : 1: pre_auto_parallel 0.04% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 4.87% : 0.005254s : 2: renormalize.infer 1.44% : 0.001556s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000045s : 1: rewriter_after_opt_a 0.15% : 0.000159s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000088s : 1: symbol_engine_optimizer 40.99% : 0.044207s : 1: task_emit 0.08% : 0.000081s : 1: 
tuple_transform 10.78% : 0.011629s : 1: type_inference 0.06% : 0.000064s : 1: validate TotalTime = 0.0565669, [24] [bootstrap]: 0.00048818 [type_inference]: 0.0059522 [event_method]: 1.297e-05 [auto_monad]: 5.841e-05 [graph_reusing]: 5.71e-06 [inline]: 2.19999e-06 [add_attr]: 0.00305839, [1] [add_attr_with_inline]: 0.00305049, [1] [Cycle 1]: 5.077e-05, [2] [tag_attr]: 1.432e-05 [meta_addattr_fg_expand]: 4.1e-06 [parallel-infer-symbol]: 2.94001e-06 [pre_auto_parallel]: 2.548e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.26998e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00410527, [53] [py_interpret_to_execute]: 1.97e-05 [rewriter_before_opt_a]: 5.209e-05 [opt_a]: 0.0021577, [2] [Cycle 1]: 0.00150972, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 2.93e-05 [loop_unroll]: 1.703e-05 [a_1]: 0.000355 [with_stream_mark]: 1.459e-05 [recompute_prepare]: 7.68999e-06 [updatestate_depend_eliminate]: 3.60998e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 3.55998e-06 [parameter_eliminate]: 2.18998e-06 [a_2]: 7.993e-05 [accelerated_algorithm]: 6.88e-06 [shard]: 1.86e-06 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 6.35002e-06 [merge_send_recv]: 9.22001e-06 [auto_parallel]: 5.96e-06 [parallel]: 1.767e-05 [flash_sp]: 7.76001e-06 [merge_comm]: 3.85e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 9.96e-06 [allreduce_slice_to_reducescatter]: 1.40999e-06 [virtual_shard_identity]: 7.27997e-06 [virtual_dataset]: 6.23e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.71998e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.96998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.223e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 1.011e-05 [set_forward_comm_id_for_comm_node_pass]: 3.70998e-06 [meta_fg_expand]: 2.58e-06 [flash_sp_send_recv_attached]: 3.04001e-06 [receive_attached]: 2.29001e-06 
[after_resolve]: 1.099e-05 [a_after_grad]: 9.70002e-06 [renormalize]: 0.00046539 [add_forward_monad_depend]: 5.70001e-06 [auto_monad_grad]: 2.34001e-06 [auto_monad_eliminator]: 1.732e-05 [cse]: 3.497e-05 [a_3]: 4.633e-05 [Cycle 2]: 0.00063715, [45] [expand_dump_flag]: 1.24003e-06 [switch_simplify]: 7.28e-06 [loop_unroll]: 5.78002e-06 [a_1]: 0.00012337 [with_stream_mark]: 1.489e-05 [recompute_prepare]: 6.36998e-06 [updatestate_depend_eliminate]: 3.42002e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.50002e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 7.146e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.19e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 5.12e-06 [auto_parallel]: 5.56002e-06 [parallel]: 5.39998e-06 [flash_sp]: 3.43999e-06 [merge_comm]: 3.22002e-06 [allreduce_fusion]: 3.13e-06 [matmul_add_comm_reduction]: 6.04999e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 6.55997e-06 [virtual_dataset]: 5.39998e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.11999e-06 [cell_reuse_recompute_pass]: 1.60999e-06 [offload_activation]: 7.66999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.209e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 8.90001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.59002e-06 [meta_fg_expand]: 1.66002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.23002e-06 [after_resolve]: 8.77e-06 [a_after_grad]: 8.07e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 6.89001e-06 [cse]: 1.494e-05 [a_3]: 3.24e-05 [py_interpret_to_execute_after_opt_a]: 9.20001e-06 [slice_cell_reuse_recomputed_activation]: 2.16998e-06 [rewriter_after_opt_a]: 3.479e-05 [convert_after_rewriter]: 6.88998e-06 [order_py_execute_after_rewriter]: 5.09998e-06 [mutable_eliminate]: 0.00052298 [opt_b]: 0.00018743, [1] [Cycle 1]: 0.00018061, 
[7] [b_1]: 0.0001096 [b_2]: 7.33999e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.50997e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 5.19998e-07 [cse]: 1.778e-05 [optimize_parallel_all_gather_comm]: 1.582e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 2.382e-05 [loop_unroll]: 0.00042175 [opt_after_cconv]: 0.0001016, [1] [Cycle 1]: 9.569e-05, [7] [c_1]: 2.591e-05 [parameter_eliminate]: 2.47001e-06 [updatestate_depend_eliminate]: 5.52999e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.56e-06 [cse]: 1.887e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.613e-05 [tuple_transform]: 7.015e-05, [1] [Cycle 1]: 6.517e-05, [4] [d_1]: 3.779e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.43e-06 [partial_unused_args_eliminate]: 2.16998e-06 [add_recomputation]: 4.492e-05 [cse_after_recomputation]: 2.207e-05, [1] [Cycle 1]: 1.703e-05, [1] [cse]: 1.178e-05 [environ_conv]: 5.61e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.39998e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.60999e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.38002e-06 [interleave_parallel_branches]: 1.09003e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.247e-05 [grouped_pairwise_exchange_alltoall]: 1.42e-06 [offloading_packed_experts]: 3.86999e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4.40999e-06 [overlap_grad_flash_sp]: 1.816e-05 [begin_end_overlap_inline]: 6.89994e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.06002e-06 [symbol_engine_optimizer]: 7.327e-05, [1] [Cycle 1]: 6.872e-05, [6] [build]: 2.37999e-06 [elim_shapecalc]: 9.07999e-06 [elim_not_effective]: 1.24e-05 [opt_reshape]: 6.48e-06 [fold_const_symbol]: 9.94999e-06 [renormalize]: 2.50002e-07 [detach_backward]: 2.02999e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.633e-05 [get_jit_bprop_graph]: 1.09003e-06 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00046622 [validate]: 3.676e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0420941 [execute]: 1.045e-05 Sums bootstrap : 0.000488s : 0.93% type_inference : 0.005952s : 11.34% event_method : 0.000013s : 0.02% auto_monad : 0.000058s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000052s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000037s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000478s : 0.91% optimize.opt_a.with_stream_mark : 0.000029s : 0.06% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000151s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000018s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.04% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000465s : 0.89% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.05% optimize.opt_a.cse : 0.000050s : 0.10% optimize.opt_a.a_3 : 0.000079s : 0.15% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.07% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000523s : 1.00% optimize.opt_b.b_1 : 0.000110s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.05% optimize.loop_unroll : 0.000422s : 0.80% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000038s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000466s : 0.89% validate : 0.000037s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.042094s : 80.22% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000148 24 20.52% : 0.000030s : 4: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 1.12% : 0.000002s : 2: substitution.fold_const_symbol 3.75% : 0.000006s : 3: substitution.graph_param_transform 64.93% : 0.000096s : 3: substitution.inline 2.25% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.53% : 0.000005s : 4: substitution.remove_not_recompute_node 2.54% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.005907 2 91.91% : 0.005429s : 1: type_inference.infer 8.09% : 0.000478s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000094 3 100.00% : 0.000094s : 3: match.inline ------[predicate.] 
0.000150 815 0.80% : 0.000001s : 8: predicate.accumulaten_eliminater 0.98% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.63% : 0.000004s : 14: predicate.arithmetic_simplify 1.11% : 0.000002s : 8: predicate.cast_eliminate 0.82% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.24% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 0.98% : 0.000001s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 11: predicate.environ_get_depend_swap 1.76% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.14% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.88% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.05% : 0.000009s : 37: predicate.inline 1.05% : 0.000002s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 6: predicate.less_batch_normalization 1.63% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 22: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 18: predicate.loop_unroll_before_grad 2.12% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 6: predicate.merge_addn 0.75% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.22% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.52% : 0.000001s : 3: predicate.parallel_virtual_node 1.53% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.75% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 8: predicate.reduce_eliminate 2.26% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.89% : 0.000001s : 6: predicate.remove_not_recompute_node 1.22% : 0.000002s : 14: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.33% : 0.000000s : 3: predicate.reset_defer_inline 0.94% : 0.000001s : 8: predicate.reshape_eliminate 0.59% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 3: predicate.row_tensor_eliminate 0.94% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 1.16% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.92% : 0.000007s : 38: predicate.switch_simplify 0.93% : 
0.000001s : 8: predicate.tile_eliminate 0.83% : 0.000001s : 8: predicate.transpose_eliminate 1.52% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.96% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.00% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.62% : 0.000001s : 3: predicate.value_based_eliminate 0.77% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000304 7 35.90% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.10% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.065192 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.70% : 0.003063s : 1: add_attr 4.68% : 0.003054s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000064s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.80% : 0.000525s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.66% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.82% : 0.000532s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.30% : 0.000848s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.31% : 0.002161s : 1: opt_a 0.16% : 0.000105s : 1: opt_after_cconv 0.73% : 0.000475s : 1: opt_after_jit_grad 0.29% : 0.000191s : 1: opt_b 6.30% : 0.004109s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000006s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000030s : 1: pre_auto_parallel 0.04% : 0.000023s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 0.36% : 0.000232s : 1: renormalize.infer 0.35% : 0.000226s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000039s : 1: rewriter_after_opt_a 0.09% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000076s : 1: symbol_engine_optimizer 64.60% : 0.042114s : 1: task_emit 0.11% : 0.000073s : 1: 
tuple_transform 9.15% : 0.005967s : 1: type_inference 0.09% : 0.000061s : 1: validate TotalTime = 0.0787929, [24] [bootstrap]: 0.00049843 [type_inference]: 0.011787 [event_method]: 4.364e-05 [auto_monad]: 0.00012818 [graph_reusing]: 8.83001e-06 [inline]: 1.62001e-06 [add_attr]: 0.00300127, [1] [add_attr_with_inline]: 0.00299281, [1] [Cycle 1]: 7.114e-05, [2] [tag_attr]: 3.309e-05 [meta_addattr_fg_expand]: 9.47999e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 4.635e-05 [insert-virtual-dataset]: 2.42001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.95001e-06 [optimize]: 0.0159803, [53] [py_interpret_to_execute]: 3.636e-05 [rewriter_before_opt_a]: 0.00015508 [opt_a]: 0.0138791, [3] [Cycle 1]: 0.0105088, [45] [expand_dump_flag]: 4.12e-06 [switch_simplify]: 7.472e-05 [loop_unroll]: 5.993e-05 [a_1]: 0.00134582 [with_stream_mark]: 2.31e-05 [recompute_prepare]: 2.19e-05 [updatestate_depend_eliminate]: 9.00999e-06 [updatestate_assign_eliminate]: 7.45e-06 [updatestate_loads_eliminate]: 7.06001e-06 [parameter_eliminate]: 2.58e-06 [a_2]: 0.00024212 [accelerated_algorithm]: 3.085e-05 [shard]: 1.82001e-06 [meta_shard_fg_expand]: 3.86001e-06 [shard_inline]: 1.593e-05 [merge_send_recv]: 1.643e-05 [auto_parallel]: 1.043e-05 [parallel]: 1.863e-05 [flash_sp]: 1.098e-05 [merge_comm]: 9.38002e-06 [allreduce_fusion]: 8.67e-06 [matmul_add_comm_reduction]: 2.65e-05 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 1.736e-05 [virtual_dataset]: 1.565e-05 [get_grad_eliminate_]: 1.515e-05 [virtual_output]: 1.461e-05 [merge_forward]: 9.44e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 1.716e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.029e-05 [merge_recompute_call_nodes]: 1.79998e-06 [before_grad]: 2.862e-05 [set_forward_comm_id_for_comm_node_pass]: 9.20999e-06 [meta_fg_expand]: 0.00142513 [flash_sp_send_recv_attached]: 3.8e-06 [receive_attached]: 2.74001e-06 [after_resolve]: 
6.324e-05 [a_after_grad]: 0.00010657 [renormalize]: 0.00592377 [add_forward_monad_depend]: 9.34e-06 [auto_monad_grad]: 5.47999e-06 [auto_monad_eliminator]: 5.181e-05 [cse]: 0.00018239 [a_3]: 0.00032938 [Cycle 2]: 0.00267772, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.534e-05 [loop_unroll]: 4.208e-05 [a_1]: 0.00135278 [with_stream_mark]: 1.157e-05 [recompute_prepare]: 8.75001e-06 [updatestate_depend_eliminate]: 4.41002e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 8.822e-05 [accelerated_algorithm]: 1.014e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.77002e-06 [merge_send_recv]: 6.00002e-06 [auto_parallel]: 5.96e-06 [parallel]: 4.74e-06 [flash_sp]: 3.55e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.43e-06 [matmul_add_comm_reduction]: 6.29001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 7.95e-06 [virtual_dataset]: 6.39001e-06 [get_grad_eliminate_]: 5.91998e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.21001e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 7.48999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.357e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.115e-05 [set_forward_comm_id_for_comm_node_pass]: 4.09997e-06 [meta_fg_expand]: 5.103e-05 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 1.135e-05 [a_after_grad]: 1.013e-05 [renormalize]: 0.00059317 [add_forward_monad_depend]: 4.15999e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.086e-05 [cse]: 2.116e-05 [a_3]: 4.684e-05 [Cycle 3]: 0.00067836, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 8.32e-06 [loop_unroll]: 6.59999e-06 [a_1]: 0.00014501 [with_stream_mark]: 8.01001e-06 [recompute_prepare]: 6.78e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.48e-06 
[parameter_eliminate]: 9.20001e-07 [a_2]: 8.463e-05 [accelerated_algorithm]: 9.66e-06 [shard]: 8.50006e-07 [meta_shard_fg_expand]: 1.29998e-06 [shard_inline]: 6.73e-06 [merge_send_recv]: 5.16998e-06 [auto_parallel]: 5.61003e-06 [parallel]: 4.79e-06 [flash_sp]: 9.79984e-07 [merge_comm]: 3.68999e-06 [allreduce_fusion]: 3.69002e-06 [matmul_add_comm_reduction]: 5.92001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 7.44002e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 6.19999e-06 [virtual_output]: 5.99999e-06 [merge_forward]: 3.13e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 6.81001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.286e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.042e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.32001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.10019e-07 [after_resolve]: 9.14e-06 [a_after_grad]: 9.31e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 7.35e-06 [cse]: 1.732e-05 [a_3]: 4.356e-05 [py_interpret_to_execute_after_opt_a]: 9.03002e-06 [slice_cell_reuse_recomputed_activation]: 2.31998e-06 [rewriter_after_opt_a]: 4.318e-05 [convert_after_rewriter]: 7.75e-06 [order_py_execute_after_rewriter]: 6.00002e-06 [mutable_eliminate]: 0.00046379 [opt_b]: 0.00021282, [1] [Cycle 1]: 0.00020684, [7] [b_1]: 0.00013055 [b_2]: 8.45001e-06 [updatestate_depend_eliminate]: 5.98998e-06 [updatestate_assign_eliminate]: 2.90998e-06 [updatestate_loads_eliminate]: 2.52001e-06 [renormalize]: 7.50006e-07 [cse]: 2.107e-05 [optimize_parallel_all_gather_comm]: 1.737e-05 [overlap_param_gather]: 2.02999e-06 [cconv]: 1.906e-05 [loop_unroll]: 0.00042474 [opt_after_cconv]: 0.0001103, [1] [Cycle 1]: 0.00010446, [7] [c_1]: 3.318e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 6.30002e-06 [updatestate_assign_eliminate]: 3.23e-06 
[updatestate_loads_eliminate]: 2.84001e-06 [cse]: 2.131e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.626e-05 [tuple_transform]: 7.725e-05, [1] [Cycle 1]: 7.233e-05, [4] [d_1]: 4.511e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 7.16001e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.984e-05 [cse_after_recomputation]: 2.567e-05, [1] [Cycle 1]: 2.081e-05, [1] [cse]: 1.516e-05 [environ_conv]: 8.25e-06 [swap_dp_allreduce_reducescatter]: 5.85002e-06 [bias_add_comm_swap]: 2.36998e-06 [label_micro_interleaved_index]: 4.75999e-06 [label_fine_grained_interleaved_index]: 2.94001e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 2.71999e-06 [comm_op_add_attrs]: 1.39e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.17999e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.369e-05 [grouped_pairwise_exchange_alltoall]: 2.05002e-06 [offloading_packed_experts]: 4.35e-06 [overlap_recompute_and_grad_model_parallel]: 5.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.62999e-06 [overlap_recompute_comm]: 2.78e-06 [overlap_grad_ring_attention]: 4.52e-06 [overlap_grad_flash_sp]: 2.014e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.44001e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 8.489e-05, [1] [Cycle 1]: 8.042e-05, [6] [build]: 8.46002e-06 [elim_shapecalc]: 1.027e-05 [elim_not_effective]: 1.42e-05 [opt_reshape]: 7.32002e-06 [fold_const_symbol]: 1.13e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 2.052e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00050805 [validate]: 4.039e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.0464951 [execute]: 9.04998e-06 Sums bootstrap : 0.000498s : 0.67% type_inference : 0.011787s : 15.82% event_method : 0.000044s : 0.06% auto_monad : 0.000128s : 0.17% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.05% optimize.rewriter_before_opt_a : 0.000155s : 0.21% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000128s : 0.17% optimize.opt_a.loop_unroll : 0.000109s : 0.15% optimize.opt_a.a_1 : 0.002844s : 3.82% optimize.opt_a.with_stream_mark : 0.000043s : 0.06% optimize.opt_a.recompute_prepare : 0.000037s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.02% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000415s : 0.56% optimize.opt_a.accelerated_algorithm : 0.000051s : 0.07% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000029s : 0.04% optimize.opt_a.merge_send_recv : 0.000028s : 0.04% optimize.opt_a.auto_parallel : 0.000022s : 0.03% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000017s : 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.04% optimize.opt_a.virtual_dataset : 0.000028s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000027s : 0.04% optimize.opt_a.virtual_output : 0.000027s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000031s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000050s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000017s : 0.02% optimize.opt_a.meta_fg_expand : 0.001478s : 1.98% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000084s : 0.11% optimize.opt_a.a_after_grad : 0.000126s : 0.17% optimize.opt_a.renormalize : 0.006517s : 8.75% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000070s : 0.09% optimize.opt_a.cse : 0.000221s : 0.30% optimize.opt_a.a_3 : 0.000420s : 0.56% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.06% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000464s : 0.62% optimize.opt_b.b_1 : 0.000131s : 0.18% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.03% optimize.loop_unroll : 0.000425s : 0.57% optimize.opt_after_cconv.c_1 : 0.000033s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000508s : 0.68% validate : 0.000040s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.046495s : 62.41% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000672 159 6.67% : 0.000045s : 7: substitution.arithmetic_simplify 0.35% : 0.000002s : 3: substitution.elim_not_effective 0.61% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 3: substitution.fold_const_symbol 0.90% : 0.000006s : 4: substitution.graph_param_transform 0.42% : 0.000003s : 2: substitution.incorporate_call 0.36% : 0.000002s : 2: substitution.incorporate_call_switch 57.80% : 0.000389s : 17: substitution.inline 2.49% : 0.000017s : 2: substitution.inline_without_move 1.53% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.24% : 0.000015s : 3: substitution.less_batch_normalization 1.43% : 0.000010s : 7: substitution.minmaximum_grad 0.89% : 0.000006s : 5: substitution.partial_eliminate 2.02% : 0.000014s : 15: substitution.remove_not_recompute_node 3.87% : 0.000026s : 10: substitution.replace_applicator 1.31% : 0.000009s : 10: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.16% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.52% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 2.02% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.15% : 0.000048s : 18: substitution.tuple_list_get_item_eliminator 2.01% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011715 2 87.09% : 0.010202s : 1: type_inference.infer 12.91% : 0.001513s : 1: type_inference.specialize ------[replace.] 0.000186 26 66.10% : 0.000123s : 17: replace.inline 33.90% : 0.000063s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000404 26 94.03% : 0.000380s : 17: match.inline 5.97% : 0.000024s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000723 4180 1.05% : 0.000008s : 52: predicate.accumulaten_eliminater 0.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.43% : 0.000003s : 21: predicate.addn_check_dump 1.05% : 0.000008s : 52: predicate.addn_zero_filter 1.01% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 1.84% : 0.000013s : 73: predicate.arithmetic_simplify 1.09% : 0.000008s : 52: predicate.cast_eliminate 1.06% : 0.000008s : 50: predicate.check_bprop_eliminate 0.44% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.43% : 0.000003s : 21: predicate.depend_value_elim 1.09% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.14% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.05% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.11% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.10% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.10% : 0.000008s : 56: predicate.environ_get_depend_swap 1.58% : 0.000011s : 77: predicate.environ_get_eliminate 1.11% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 78: predicate.float_depend_g_call 0.44% : 0.000003s : 21: predicate.float_environ_get_switch 0.55% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.05% : 0.000000s : 4: predicate.fold_const_symbol 0.49% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.48% : 0.000003s : 21: predicate.incorporate_call 0.44% : 0.000003s : 21: predicate.incorporate_call_switch 5.50% : 0.000040s : 180: predicate.inline 1.40% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.56% : 0.000004s : 21: predicate.less_batch_normalization 1.46% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.44% : 0.000018s : 121: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.38% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.28% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.45% : 0.000003s : 21: predicate.merge_addn 1.05% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.06% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 52: predicate.minmaximum_grad 0.28% : 0.000002s : 4: predicate.mutable_eliminate 0.10% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 1.97% : 0.000014s : 78: predicate.partial_defer_inline 1.60% : 0.000012s : 65: predicate.partial_eliminate 1.05% : 0.000008s : 52: predicate.print_const_string_wrapper 0.45% : 0.000003s : 21: predicate.reduce_all_const_elim 1.28% : 0.000009s : 52: predicate.reduce_eliminate 2.47% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.29% : 0.000002s : 21: predicate.remove_not_recompute_node 1.78% : 0.000013s : 111: predicate.replace_applicator 0.66% : 0.000005s : 45: predicate.replace_old_param 0.07% : 0.000001s : 4: predicate.reset_defer_inline 1.11% : 0.000008s : 52: predicate.reshape_eliminate 1.04% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.19% : 0.000009s : 50: predicate.same_eliminate 0.32% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.53% : 0.000004s : 21: predicate.shard_identity_eliminate 0.21% : 0.000002s : 8: predicate.special_op_eliminate 0.58% : 0.000004s : 21: predicate.specialize_transform 1.15% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 78: predicate.switch_defer_inline 2.83% : 0.000020s : 128: predicate.switch_layer_defer_inline 4.94% : 0.000036s : 213: 
predicate.switch_simplify 1.06% : 0.000008s : 52: predicate.tile_eliminate 1.03% : 0.000007s : 52: predicate.transpose_eliminate 8.03% : 0.000058s : 60: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.47% : 0.000018s : 90: predicate.tuple_list_get_item_eliminator 1.35% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.87% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.43% : 0.000010s : 69: predicate.tuple_to_list_eliminator_ 2.43% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 2.96% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.49% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.48% : 0.000003s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001664 35 59.69% : 0.000994s : 14: func_graph_cloner_run.FuncGraphClonerGraph 40.31% : 0.000671s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.108812 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.76% : 0.003006s : 1: add_attr 2.75% : 0.002997s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000135s : 1: auto_monad 0.02% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.49% : 0.000532s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000029s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.05% : 0.000050s : 1: event_method 0.01% : 0.000015s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.40% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.43% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 3.98% : 0.004332s : 117: opt.transform.opt_a 0.03% : 0.000032s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000112s : 28: opt.transform.opt_b 0.05% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000040s : 4: opt.transform.symbol_engine_opt 12.76% : 0.013882s : 1: opt_a 0.10% : 0.000114s : 1: opt_after_cconv 0.48% : 0.000517s : 1: opt_after_jit_grad 0.20% : 0.000216s : 1: opt_b 14.69% : 0.015984s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000051s : 1: pre_auto_parallel 0.04% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 4.63% : 0.005038s : 2: renormalize.infer 1.35% : 0.001466s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000047s : 1: rewriter_after_opt_a 0.15% : 0.000160s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000088s : 1: symbol_engine_optimizer 42.75% : 0.046514s : 1: task_emit 0.07% : 0.000080s : 1: 
tuple_transform 10.85% : 0.011801s : 1: type_inference 0.06% : 0.000062s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x1-ge],max_mem:10.0M
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x2-pynative],max_mem:10.0M
TotalTime = 0.0220187, [24] [bootstrap]: 0.00052736 [type_inference]: 0.00639534 [event_method]: 1.392e-05 [auto_monad]: 5.863e-05 [graph_reusing]: 6.25002e-06 [inline]: 1.81998e-06 [add_attr]: 0.00357603, [1] [add_attr_with_inline]: 0.00356572, [1] [Cycle 1]: 4.581e-05, [2] [tag_attr]: 1.478e-05 [meta_addattr_fg_expand]: 4.39002e-06 [parallel-infer-symbol]: 2.89999e-06 [pre_auto_parallel]: 2.579e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00414253, [53] [py_interpret_to_execute]: 2.044e-05 [rewriter_before_opt_a]: 6.354e-05 [opt_a]: 0.00223864, [2] [Cycle 1]: 0.00158616, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.462e-05 [loop_unroll]: 2.02e-05 [a_1]: 0.0004421 [with_stream_mark]: 1.425e-05 [recompute_prepare]: 7.46001e-06 [updatestate_depend_eliminate]: 3.97998e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 2.96999e-06 [parameter_eliminate]: 2.16998e-06 [a_2]: 7.976e-05 [accelerated_algorithm]: 6.59001e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 6.09001e-06 [merge_send_recv]: 8.08001e-06 [auto_parallel]: 6.08002e-06 [parallel]: 2.394e-05 [flash_sp]: 7.7e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.68999e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 9.40025e-07 [virtual_shard_identity]: 7.67998e-06 [virtual_dataset]: 6.14001e-06 [get_grad_eliminate_]: 
5.79e-06 [virtual_output]: 5.81e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.44e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.193e-05 [merge_recompute_call_nodes]: 1.70001e-06 [before_grad]: 1.036e-05 [set_forward_comm_id_for_comm_node_pass]: 3.69002e-06 [meta_fg_expand]: 2.75997e-06 [flash_sp_send_recv_attached]: 2.50002e-06 [receive_attached]: 2.44001e-06 [after_resolve]: 9.65002e-06 [a_after_grad]: 8.59998e-06 [renormalize]: 0.00046526 [add_forward_monad_depend]: 7.71999e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.354e-05 [cse]: 2.977e-05 [a_3]: 4.134e-05 [Cycle 2]: 0.00064318, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 7.11999e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.0001501 [with_stream_mark]: 1.019e-05 [recompute_prepare]: 6.22001e-06 [updatestate_depend_eliminate]: 2.97002e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 7.086e-05 [accelerated_algorithm]: 5.89999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.23002e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 4.68999e-06 [auto_parallel]: 5.59e-06 [parallel]: 4.12998e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.67999e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.70002e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 6.13002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.017e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.95001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 1.78002e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 9.29984e-07 [after_resolve]: 8.57e-06 [a_after_grad]: 7.88001e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 
1.17e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 7.15e-06 [cse]: 1.532e-05 [a_3]: 3.212e-05 [py_interpret_to_execute_after_opt_a]: 7.82e-06 [slice_cell_reuse_recomputed_activation]: 2.64001e-06 [rewriter_after_opt_a]: 3.185e-05 [convert_after_rewriter]: 6.66e-06 [order_py_execute_after_rewriter]: 5.74e-06 [mutable_eliminate]: 0.0004661 [opt_b]: 0.00018605, [1] [Cycle 1]: 0.00017983, [7] [b_1]: 0.00010901 [b_2]: 6.98998e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.53e-06 [renormalize]: 3.99974e-07 [cse]: 1.824e-05 [optimize_parallel_all_gather_comm]: 1.624e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.309e-05 [loop_unroll]: 0.00042391 [opt_after_cconv]: 9.574e-05, [1] [Cycle 1]: 9.006e-05, [7] [c_1]: 2.522e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.805e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.595e-05 [tuple_transform]: 6.923e-05, [1] [Cycle 1]: 6.488e-05, [4] [d_1]: 3.74e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 4.00003e-07 [switch_simplify]: 6.74999e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.998e-05 [cse_after_recomputation]: 2.249e-05, [1] [Cycle 1]: 1.774e-05, [1] [cse]: 1.199e-05 [environ_conv]: 8.04997e-06 [swap_dp_allreduce_reducescatter]: 5.23002e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.80001e-06 [label_fine_grained_interleaved_index]: 2.63998e-06 [merge_cast_opt]: 1.66e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.46998e-06 [assign_add_opt]: 1.31002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 2.88998e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.28002e-06 [interleave_split_concat_branches]: 
1.17e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 2.03002e-06 [control_data_broadcast_order]: 1.191e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.67998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 4.22e-06 [overlap_grad_flash_sp]: 1.895e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.43998e-06 [split_layernorm_comm]: 1.66002e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.046e-05, [1] [Cycle 1]: 6.608e-05, [6] [build]: 2.67001e-06 [elim_shapecalc]: 8.50999e-06 [elim_not_effective]: 1.153e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 9.31e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.86003e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.625e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 0.00013749 [opt_after_jit_grad]: 0.00046162 [validate]: 3.467e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.00639258 [execute]: 7.5e-06 Sums bootstrap : 0.000527s : 3.02% type_inference : 0.006395s : 36.68% event_method : 0.000014s : 0.08% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000064s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% 
optimize.opt_a.switch_simplify : 0.000042s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000592s : 3.40% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.10% 
optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000465s : 2.67% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000045s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000032s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000466s : 2.67% optimize.opt_b.b_1 : 0.000109s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000424s : 2.43% optimize.opt_after_cconv.c_1 : 0.000025s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000008s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000137s : 0.79% opt_after_jit_grad : 0.000462s : 2.65% validate : 0.000035s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006393s : 36.67% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000169 26 19.16% : 0.000032s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.43% : 0.000006s : 3: substitution.graph_param_transform 63.34% : 0.000107s : 3: substitution.inline 2.09% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.69% : 0.000005s : 4: substitution.remove_not_recompute_node 1.94% : 0.000003s : 2: substitution.replace_old_param 5.44% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006339 2 89.98% : 0.005704s : 1: type_inference.infer 10.02% : 0.000635s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.21% : 0.000030s : 3: replace.inline 20.79% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 4 92.59% : 0.000105s : 3: match.inline 7.41% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 15: predicate.arithmetic_simplify 0.91% : 0.000001s : 9: predicate.cast_eliminate 0.67% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.03% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.73% : 0.000003s : 18: predicate.environ_get_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.38% : 0.000010s : 40: predicate.inline 0.89% : 0.000001s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 6: predicate.less_batch_normalization 1.81% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 25: predicate.load_eliminater 1.00% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.34% : 0.000001s : 3: predicate.parallel_virtual_node 1.63% : 0.000003s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.92% : 0.000001s : 9: predicate.print_const_string_wrapper 0.63% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 9: predicate.reduce_eliminate 2.35% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.29% : 0.000002s : 16: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.59% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 0.83% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 0.88% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 13: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 43: predicate.switch_simplify 0.93% : 
0.000001s : 9: predicate.tile_eliminate 1.16% : 0.000002s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.52% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.52% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 21: predicate.tuple_list_set_item_eliminator 1.94% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.01% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000380 8 46.59% : 0.000177s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.41% : 0.000203s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031313 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.43% : 0.003581s : 1: add_attr 11.40% : 0.003569s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000064s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.81% : 0.000567s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000012s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.38% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000475s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 3.08% : 0.000964s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000089s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.16% : 0.002241s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.51% : 0.000472s : 1: opt_after_jit_grad 0.60% : 0.000189s : 1: opt_b 13.24% : 0.004146s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.80% : 0.000251s : 1: renormalize.infer 0.66% : 0.000207s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.46% : 0.000143s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000036s : 1: rewriter_after_opt_a 0.22% : 0.000068s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000073s : 1: symbol_engine_optimizer 20.45% : 0.006403s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.47% : 0.006409s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0202147, [24] [bootstrap]: 0.00043252 [type_inference]: 0.00593929 [event_method]: 1.233e-05 [auto_monad]: 6.086e-05 [graph_reusing]: 5.78002e-06 [inline]: 1.74998e-06 [add_attr]: 0.00304467, [1] [add_attr_with_inline]: 0.00303699, [1] [Cycle 1]: 5.006e-05, [2] [tag_attr]: 1.362e-05 [meta_addattr_fg_expand]: 4.02e-06 [parallel-infer-symbol]: 3.23998e-06 [pre_auto_parallel]: 2.384e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.98002e-06 [pipeline_split]: 1.99e-06 [optimize]: 0.00394744, [53] [py_interpret_to_execute]: 1.929e-05 [rewriter_before_opt_a]: 5.129e-05 [opt_a]: 0.00209001, [2] [Cycle 1]: 0.00147902, [45] [expand_dump_flag]: 2.88e-06 [switch_simplify]: 2.842e-05 [loop_unroll]: 1.676e-05 [a_1]: 0.00035328 [with_stream_mark]: 1.428e-05 [recompute_prepare]: 7.77998e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.70998e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 2.17999e-06 [a_2]: 8.061e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 6.31e-06 [merge_send_recv]: 8.67e-06 [auto_parallel]: 6.18002e-06 [parallel]: 1.926e-05 [flash_sp]: 7.28999e-06 [merge_comm]: 3.83999e-06 [allreduce_fusion]: 3.65e-06 [matmul_add_comm_reduction]: 1.032e-05 [allreduce_slice_to_reducescatter]: 8.60018e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 5.94999e-06 [get_grad_eliminate_]: 5.92001e-06 [virtual_output]: 6.01e-06 [merge_forward]: 4e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.41003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.166e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 1.023e-05 [set_forward_comm_id_for_comm_node_pass]: 4.09997e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 2.81e-06 [receive_attached]: 2.85998e-06 
[after_resolve]: 9.87999e-06 [a_after_grad]: 8.65001e-06 [renormalize]: 0.00039891 [add_forward_monad_depend]: 4.80999e-06 [auto_monad_grad]: 2.20002e-06 [auto_monad_eliminator]: 7.939e-05 [cse]: 2.972e-05 [a_3]: 4.126e-05 [Cycle 2]: 0.00060203, [45] [expand_dump_flag]: 1.15001e-06 [switch_simplify]: 7.12002e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00011456 [with_stream_mark]: 1.265e-05 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 3.09001e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 7.087e-05 [accelerated_algorithm]: 5.81998e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 4.45999e-06 [auto_parallel]: 5.42001e-06 [parallel]: 4.12e-06 [flash_sp]: 3.40998e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 2.97002e-06 [matmul_add_comm_reduction]: 5.38002e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.33002e-06 [virtual_output]: 5.08002e-06 [merge_forward]: 2.74999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.071e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 8.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 8.90024e-07 [after_resolve]: 8.19002e-06 [a_after_grad]: 7.95e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 6.23e-06 [cse]: 1.356e-05 [a_3]: 3.277e-05 [py_interpret_to_execute_after_opt_a]: 7.66999e-06 [slice_cell_reuse_recomputed_activation]: 2.68e-06 [rewriter_after_opt_a]: 3.311e-05 [convert_after_rewriter]: 6.38e-06 [order_py_execute_after_rewriter]: 5.57999e-06 [mutable_eliminate]: 0.00045888 [opt_b]: 0.00018462, [1] [Cycle 1]: 0.00017837, [7] [b_1]: 
0.0001081 [b_2]: 6.91001e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 5.20027e-07 [cse]: 1.77e-05 [optimize_parallel_all_gather_comm]: 1.635e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 2.238e-05 [loop_unroll]: 0.00041716 [opt_after_cconv]: 9.492e-05, [1] [Cycle 1]: 8.922e-05, [7] [c_1]: 2.588e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.28002e-06 [cse]: 1.73e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.539e-05 [tuple_transform]: 6.795e-05, [1] [Cycle 1]: 6.334e-05, [4] [d_1]: 3.629e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.37001e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 4.452e-05 [cse_after_recomputation]: 2.166e-05, [1] [Cycle 1]: 1.721e-05, [1] [cse]: 1.182e-05 [environ_conv]: 4.93001e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 3.04999e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.49999e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.59001e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.24998e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.234e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 4.29002e-06 [overlap_grad_flash_sp]: 1.685e-05 [begin_end_overlap_inline]: 7.7e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 7.097e-05, [1] [Cycle 1]: 6.657e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.94e-06 [elim_not_effective]: 1.178e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 9.47001e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.661e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00045305 [validate]: 3.315e-05 [backend_pass]: 1.29e-06 [task_emit]: 0.00601331 [execute]: 7.88001e-06 Sums bootstrap : 0.000433s : 2.67% type_inference : 0.005939s : 36.71% event_method : 0.000012s : 0.08% auto_monad : 0.000061s : 0.38% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000051s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000022s : 0.14% optimize.opt_a.a_1 : 0.000468s : 2.89% optimize.opt_a.with_stream_mark : 0.000027s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000399s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.53% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000074s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000459s : 2.84% optimize.opt_b.b_1 : 0.000108s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000417s : 2.58% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.10% optimize.tuple_transform.d_1 : 0.000036s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000453s : 2.80% validate : 0.000033s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006013s : 37.17% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000141 24 20.42% : 0.000029s : 4: substitution.arithmetic_simplify 1.33% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 3.81% : 0.000005s : 3: substitution.graph_param_transform 65.81% : 0.000093s : 3: substitution.inline 2.17% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.35% : 0.000005s : 4: substitution.remove_not_recompute_node 2.14% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005895 2 92.03% : 0.005425s : 1: type_inference.infer 7.97% : 0.000470s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000091 3 100.00% : 0.000091s : 3: match.inline ------[predicate.] 
0.000146 815 0.86% : 0.000001s : 8: predicate.accumulaten_eliminater 0.92% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.36% : 0.000003s : 14: predicate.arithmetic_simplify 0.92% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.26% : 0.000000s : 3: predicate.const_output_eliminate 0.71% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_depend_swap 1.84% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.30% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.79% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.77% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.22% : 0.000009s : 37: predicate.inline 0.99% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 6: predicate.less_batch_normalization 1.61% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000003s : 22: predicate.load_eliminater 0.99% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.02% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.17% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.45% : 0.000002s : 11: predicate.partial_defer_inline 1.34% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 8: predicate.reduce_eliminate 2.19% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 14: predicate.replace_applicator 0.68% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.95% : 0.000001s : 8: predicate.reshape_eliminate 0.69% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 1.00% : 0.000001s : 6: predicate.special_op_eliminate 0.86% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.30% : 0.000002s : 11: predicate.switch_defer_inline 2.04% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.96% : 0.000007s : 38: predicate.switch_simplify 0.84% : 
0.000001s : 8: predicate.tile_eliminate 0.82% : 0.000001s : 8: predicate.transpose_eliminate 1.58% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.43% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.21% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 3: predicate.value_based_eliminate 0.77% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000280 7 37.30% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.70% : 0.000176s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028587 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.67% : 0.003049s : 1: add_attr 10.63% : 0.003040s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000066s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.64% : 0.000468s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000468s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.90% : 0.000830s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000088s : 28: opt.transform.opt_b 0.14% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.32% : 0.002093s : 1: opt_a 0.34% : 0.000098s : 1: opt_after_cconv 1.62% : 0.000462s : 1: opt_after_jit_grad 0.66% : 0.000188s : 1: opt_b 13.82% : 0.003951s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000019s : 1: remove_dup_value 0.74% : 0.000213s : 1: renormalize.infer 0.63% : 0.000179s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000074s : 1: symbol_engine_optimizer 21.07% : 0.006024s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 20.83% : 0.005955s : 1: type_inference 0.22% : 0.000062s : 1: validate TotalTime = 0.0205721, [24] [bootstrap]: 0.00046 [type_inference]: 0.00581305 [event_method]: 1.411e-05 [auto_monad]: 5.911e-05 [graph_reusing]: 5.42001e-06 [inline]: 2.27999e-06 [add_attr]: 0.00309472, [1] [add_attr_with_inline]: 0.00308667, [1] [Cycle 1]: 5.349e-05, [2] [tag_attr]: 1.644e-05 [meta_addattr_fg_expand]: 4.4e-06 [parallel-infer-symbol]: 3.61001e-06 [pre_auto_parallel]: 2.683e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.00429759, [53] [py_interpret_to_execute]: 2.088e-05 [rewriter_before_opt_a]: 6.27e-05 [opt_a]: 0.00219915, [2] [Cycle 1]: 0.00158842, [45] [expand_dump_flag]: 2.87002e-06 [switch_simplify]: 3.322e-05 [loop_unroll]: 2.048e-05 [a_1]: 0.00044264 [with_stream_mark]: 1.523e-05 [recompute_prepare]: 8.28999e-06 [updatestate_depend_eliminate]: 4.01001e-06 [updatestate_assign_eliminate]: 3.7e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 2.14999e-06 [a_2]: 8.001e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 6.14999e-06 [merge_send_recv]: 8.59e-06 [auto_parallel]: 6.12001e-06 [parallel]: 1.843e-05 [flash_sp]: 7.66999e-06 [merge_comm]: 3.62998e-06 [allreduce_fusion]: 3.73999e-06 [matmul_add_comm_reduction]: 9.25001e-06 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 7.41999e-06 [virtual_dataset]: 6.49001e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 5.89e-06 [merge_forward]: 4.12003e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.153e-05 [merge_recompute_call_nodes]: 1.88997e-06 [before_grad]: 1.095e-05 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.66999e-06 [flash_sp_send_recv_attached]: 2.32001e-06 [receive_attached]: 2.24999e-06 
[after_resolve]: 9.59999e-06 [a_after_grad]: 8.55001e-06 [renormalize]: 0.00047252 [add_forward_monad_depend]: 5.28002e-06 [auto_monad_grad]: 2.16998e-06 [auto_monad_eliminator]: 1.372e-05 [cse]: 2.948e-05 [a_3]: 4.18e-05 [Cycle 2]: 0.00060022, [45] [expand_dump_flag]: 1.19e-06 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.70001e-06 [a_1]: 0.00011274 [with_stream_mark]: 1.092e-05 [recompute_prepare]: 5.80002e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 7.03e-05 [accelerated_algorithm]: 5.69999e-06 [shard]: 9.69972e-07 [meta_shard_fg_expand]: 1.16997e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 4.57e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.13001e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 2.80002e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.51e-06 [get_grad_eliminate_]: 5.14998e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.50002e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 5.92001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.024e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 9.26998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37997e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 8.03999e-06 [a_after_grad]: 7.7e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 6.66999e-06 [cse]: 1.422e-05 [a_3]: 3.242e-05 [py_interpret_to_execute_after_opt_a]: 7.92e-06 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.32e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 6.718e-05 [mutable_eliminate]: 0.0005105 [opt_b]: 0.00020242, [1] [Cycle 1]: 0.00019515, [7] 
[b_1]: 0.00011876 [b_2]: 7.51001e-06 [updatestate_depend_eliminate]: 6.01e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 2.88998e-06 [renormalize]: 3.4002e-07 [cse]: 1.873e-05 [optimize_parallel_all_gather_comm]: 1.719e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.362e-05 [loop_unroll]: 0.00046184 [opt_after_cconv]: 0.00010453, [1] [Cycle 1]: 9.826e-05, [7] [c_1]: 2.835e-05 [parameter_eliminate]: 2.48e-06 [updatestate_depend_eliminate]: 5.52999e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.71999e-06 [cse]: 1.868e-05 [renormalize]: 5.10016e-07 [remove_dup_value]: 1.516e-05 [tuple_transform]: 7.528e-05, [1] [Cycle 1]: 7.037e-05, [4] [d_1]: 4.062e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 7.49002e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 4.637e-05 [cse_after_recomputation]: 2.402e-05, [1] [Cycle 1]: 1.933e-05, [1] [cse]: 1.317e-05 [environ_conv]: 5.31998e-06 [swap_dp_allreduce_reducescatter]: 6.02001e-06 [bias_add_comm_swap]: 3.02002e-06 [label_micro_interleaved_index]: 4.81002e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.46002e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.15002e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 8.80013e-07 [remove_cast_before_assign_add]: 1.27e-06 [full_micro_interleaved_order_control]: 2.71e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.18001e-06 [add_comm_op_reuse_tag]: 1.14e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.30999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.322e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 4.45e-06 [overlap_recompute_and_grad_model_parallel]: 5.43002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.43998e-06 [overlap_grad_ring_attention]: 4.55999e-06 [overlap_grad_flash_sp]: 1.804e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 7.989e-05, [1] [Cycle 1]: 7.477e-05, [6] [build]: 2.66e-06 [elim_shapecalc]: 1.032e-05 [elim_not_effective]: 1.404e-05 [opt_reshape]: 6.59001e-06 [fold_const_symbol]: 1.039e-05 [renormalize]: 2.50002e-07 [detach_backward]: 2.32001e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.606e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.0004749 [validate]: 3.483e-05 [backend_pass]: 1.14e-06 [task_emit]: 0.006038 [execute]: 8.03999e-06 Sums bootstrap : 0.000460s : 2.80% type_inference : 0.005813s : 35.35% event_method : 0.000014s : 0.09% auto_monad : 0.000059s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000063s : 0.38% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000555s : 3.38% optimize.opt_a.with_stream_mark : 0.000026s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000150s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000473s : 2.87% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000044s : 0.27% optimize.opt_a.a_3 : 0.000074s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000067s : 0.41% optimize.mutable_eliminate : 0.000511s : 3.10% optimize.opt_b.b_1 : 0.000119s : 0.72% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000462s : 2.81% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000019s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.05% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.28% optimize.cse_after_recomputation.cse : 0.000013s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.09% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000475s : 2.89% validate : 0.000035s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006038s : 36.71% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000173 26 18.71% : 0.000032s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.40% : 0.000006s : 3: substitution.graph_param_transform 64.25% : 0.000111s : 3: substitution.inline 2.21% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 1.60% : 0.000003s : 2: substitution.replace_old_param 5.31% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005768 2 89.82% : 0.005181s : 1: type_inference.infer 10.18% : 0.000587s : 1: type_inference.specialize ------[replace.] 0.000036 4 78.32% : 0.000028s : 3: replace.inline 21.68% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 4 92.86% : 0.000110s : 3: match.inline 7.14% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 0.92% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.07% : 0.000003s : 15: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.60% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.02% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.17% : 0.000002s : 12: predicate.environ_get_depend_swap 1.79% : 0.000003s : 18: predicate.environ_get_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.34% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.92% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.69% : 0.000001s : 6: predicate.get_grad_eliminate 0.19% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.53% : 0.000010s : 40: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 25: predicate.load_eliminater 1.04% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.18% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.59% : 0.000003s : 13: predicate.partial_defer_inline 1.43% : 0.000002s : 13: predicate.partial_eliminate 0.90% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.39% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 6: predicate.remove_not_recompute_node 1.35% : 0.000002s : 16: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.82% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 0.89% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 13: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 43: predicate.switch_simplify 0.89% : 
0.000001s : 9: predicate.tile_eliminate 0.90% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.72% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000351 8 46.12% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.88% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029531 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.49% : 0.003099s : 1: add_attr 10.46% : 0.003090s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000064s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.69% : 0.000499s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000027s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.59% : 0.000470s : 1: loop_unroll 0.02% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000520s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.13% : 0.000925s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000096s : 28: opt.transform.opt_b 0.16% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.13% : 0.000037s : 4: opt.transform.symbol_engine_opt 7.46% : 0.002202s : 1: opt_a 0.37% : 0.000108s : 1: opt_after_cconv 1.64% : 0.000485s : 1: opt_after_jit_grad 0.70% : 0.000206s : 1: opt_b 14.57% : 0.004302s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.24% : 0.000071s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.83% : 0.000246s : 1: renormalize.infer 0.75% : 0.000220s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.23% : 0.000067s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000083s : 1: symbol_engine_optimizer 20.48% : 0.006048s : 1: task_emit 0.27% : 0.000078s : 1: 
tuple_transform 19.73% : 0.005827s : 1: type_inference 0.22% : 0.000064s : 1: validate TotalTime = 0.0404405, [24] [bootstrap]: 0.00050935 [type_inference]: 0.0119821 [event_method]: 4.588e-05 [auto_monad]: 0.00013466 [graph_reusing]: 8.90001e-06 [inline]: 2.07001e-06 [add_attr]: 0.00318557, [1] [add_attr_with_inline]: 0.00317669, [1] [Cycle 1]: 7.849e-05, [2] [tag_attr]: 3.544e-05 [meta_addattr_fg_expand]: 9.97999e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 5.102e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 9.09989e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.0172805, [53] [py_interpret_to_execute]: 3.932e-05 [rewriter_before_opt_a]: 0.0001573 [opt_a]: 0.0150636, [3] [Cycle 1]: 0.0115559, [45] [expand_dump_flag]: 3.91001e-06 [switch_simplify]: 7.551e-05 [loop_unroll]: 6.267e-05 [a_1]: 0.00146782 [with_stream_mark]: 2.717e-05 [recompute_prepare]: 2.477e-05 [updatestate_depend_eliminate]: 1.005e-05 [updatestate_assign_eliminate]: 7.81001e-06 [updatestate_loads_eliminate]: 6.76e-06 [parameter_eliminate]: 2.79999e-06 [a_2]: 0.0002434 [accelerated_algorithm]: 3.291e-05 [shard]: 1.76e-06 [meta_shard_fg_expand]: 3.64002e-06 [shard_inline]: 1.682e-05 [merge_send_recv]: 1.685e-05 [auto_parallel]: 1.221e-05 [parallel]: 1.994e-05 [flash_sp]: 1.287e-05 [merge_comm]: 9.85002e-06 [allreduce_fusion]: 8.98002e-06 [matmul_add_comm_reduction]: 2.729e-05 [allreduce_slice_to_reducescatter]: 1.22e-06 [virtual_shard_identity]: 1.945e-05 [virtual_dataset]: 1.561e-05 [get_grad_eliminate_]: 1.508e-05 [virtual_output]: 1.473e-05 [merge_forward]: 9.79e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 1.79e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.073e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 2.887e-05 [set_forward_comm_id_for_comm_node_pass]: 1.16e-05 [meta_fg_expand]: 0.0015856 [flash_sp_send_recv_attached]: 4.42998e-06 [receive_attached]: 2.29999e-06 [after_resolve]: 
6.696e-05 [a_after_grad]: 8.903e-05 [renormalize]: 0.00662965 [add_forward_monad_depend]: 1.114e-05 [auto_monad_grad]: 6.83e-06 [auto_monad_eliminator]: 5.259e-05 [cse]: 0.0001834 [a_3]: 0.00033068 [Cycle 2]: 0.00277257, [45] [expand_dump_flag]: 2.44999e-06 [switch_simplify]: 4.53e-05 [loop_unroll]: 4.179e-05 [a_1]: 0.00133627 [with_stream_mark]: 1.596e-05 [recompute_prepare]: 9.91998e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 2.89999e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 8.999e-05 [accelerated_algorithm]: 1.135e-05 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 7.31001e-06 [merge_send_recv]: 7.11999e-06 [auto_parallel]: 7.73999e-06 [parallel]: 6.76e-06 [flash_sp]: 3.86001e-06 [merge_comm]: 4.11001e-06 [allreduce_fusion]: 3.72002e-06 [matmul_add_comm_reduction]: 8.77e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 8.55999e-06 [virtual_dataset]: 6.51e-06 [get_grad_eliminate_]: 6.23e-06 [virtual_output]: 5.90002e-06 [merge_forward]: 3.69002e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.15001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.307e-05 [merge_recompute_call_nodes]: 1.19998e-06 [before_grad]: 1.094e-05 [set_forward_comm_id_for_comm_node_pass]: 4.32e-06 [meta_fg_expand]: 8.317e-05 [flash_sp_send_recv_attached]: 1.52001e-06 [receive_attached]: 1.68002e-06 [after_resolve]: 1.245e-05 [a_after_grad]: 1.003e-05 [renormalize]: 0.00062823 [add_forward_monad_depend]: 5.35999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.217e-05 [cse]: 2.325e-05 [a_3]: 4.839e-05 [Cycle 3]: 0.00071933, [45] [expand_dump_flag]: 1.47999e-06 [switch_simplify]: 7.92e-06 [loop_unroll]: 6.38e-06 [a_1]: 0.00015001 [with_stream_mark]: 9.25999e-06 [recompute_prepare]: 6.77002e-06 [updatestate_depend_eliminate]: 4.24997e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.59001e-06 
[parameter_eliminate]: 9.20001e-07 [a_2]: 8.424e-05 [accelerated_algorithm]: 1.018e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 6.76999e-06 [merge_send_recv]: 5.52999e-06 [auto_parallel]: 6.04001e-06 [parallel]: 4.87e-06 [flash_sp]: 9.49978e-07 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.36001e-06 [matmul_add_comm_reduction]: 6.23002e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 7.46999e-06 [virtual_dataset]: 6.71999e-06 [get_grad_eliminate_]: 6.23e-06 [virtual_output]: 6.09999e-06 [merge_forward]: 3.44001e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 7.33e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.252e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 1.075e-05 [set_forward_comm_id_for_comm_node_pass]: 4.32e-06 [meta_fg_expand]: 2.76e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 8.89995e-07 [after_resolve]: 8.99e-06 [a_after_grad]: 9.19998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 9.65002e-06 [cse]: 1.78e-05 [a_3]: 3.968e-05 [py_interpret_to_execute_after_opt_a]: 1.216e-05 [slice_cell_reuse_recomputed_activation]: 2.26998e-06 [rewriter_after_opt_a]: 4.254e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.94999e-06 [mutable_eliminate]: 0.00055074 [opt_b]: 0.00022522, [1] [Cycle 1]: 0.00021758, [7] [b_1]: 0.00013284 [b_2]: 8.45999e-06 [updatestate_depend_eliminate]: 6.12001e-06 [updatestate_assign_eliminate]: 2.99001e-06 [updatestate_loads_eliminate]: 2.68e-06 [renormalize]: 5.50004e-07 [cse]: 2.791e-05 [optimize_parallel_all_gather_comm]: 1.813e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.285e-05 [loop_unroll]: 0.0004338 [opt_after_cconv]: 0.00010794, [1] [Cycle 1]: 0.00010214, [7] [c_1]: 3.294e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 6.16e-06 [updatestate_assign_eliminate]: 3.06001e-06 
[updatestate_loads_eliminate]: 2.72001e-06 [cse]: 2.032e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.651e-05 [tuple_transform]: 7.71e-05, [1] [Cycle 1]: 7.244e-05, [4] [d_1]: 4.506e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 7.15998e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 5.331e-05 [cse_after_recomputation]: 2.4e-05, [1] [Cycle 1]: 1.933e-05, [1] [cse]: 1.404e-05 [environ_conv]: 8.39002e-06 [swap_dp_allreduce_reducescatter]: 5.49998e-06 [bias_add_comm_swap]: 2.86999e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.51998e-06 [assign_add_opt]: 1.44998e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.57001e-06 [reorder_send_recv_between_fp_bp]: 2.99001e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.28002e-06 [interleave_parallel_branches]: 1.14998e-06 [overlap_opt_shard_in_pipeline]: 1.55999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.398e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 5.15999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 4.60999e-06 [overlap_grad_flash_sp]: 2.144e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.56e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 8.503e-05, [1] [Cycle 1]: 8.072e-05, [6] [build]: 8.82e-06 [elim_shapecalc]: 1.013e-05 [elim_not_effective]: 1.453e-05 [opt_reshape]: 7.11001e-06 [fold_const_symbol]: 1.151e-05 
[renormalize]: 1.8999e-07 [detach_backward]: 2.19001e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 2.004e-05 [get_jit_bprop_graph]: 1.92001e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00047013 [validate]: 4.235e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00645927 [execute]: 7.75e-06 Sums bootstrap : 0.000509s : 1.42% type_inference : 0.011982s : 33.40% event_method : 0.000046s : 0.13% auto_monad : 0.000135s : 0.38% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000157s : 0.44% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000129s : 0.36% optimize.opt_a.loop_unroll : 0.000111s : 0.31% optimize.opt_a.a_1 : 0.002954s : 8.23% optimize.opt_a.with_stream_mark : 0.000052s : 0.15% optimize.opt_a.recompute_prepare : 0.000041s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.03% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000418s : 1.16% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.15% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000031s : 0.09% optimize.opt_a.merge_send_recv : 0.000029s : 0.08% optimize.opt_a.auto_parallel : 0.000026s : 0.07% optimize.opt_a.parallel : 0.000032s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% 
optimize.opt_a.merge_comm : 0.000018s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.10% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000027s : 0.07% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000051s : 0.14% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001672s : 4.66% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.25% optimize.opt_a.a_after_grad : 0.000108s : 0.30% optimize.opt_a.renormalize : 0.007258s : 20.23% optimize.opt_a.add_forward_monad_depend : 0.000018s : 0.05% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000074s : 0.21% optimize.opt_a.cse : 0.000224s : 0.63% optimize.opt_a.a_3 : 0.000419s : 1.17% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000043s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000551s : 1.54% optimize.opt_b.b_1 : 0.000133s : 0.37% optimize.opt_b.b_2 : 0.000008s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000028s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.06% optimize.loop_unroll : 0.000434s : 1.21% optimize.opt_after_cconv.c_1 : 0.000033s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.05% optimize.tuple_transform.d_1 : 0.000045s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.15% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.06% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000470s : 1.31% validate : 0.000042s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006459s : 18.00% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000747 161 7.57% : 0.000057s : 8: substitution.arithmetic_simplify 0.30% : 0.000002s : 3: substitution.elim_not_effective 0.62% : 0.000005s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 3: substitution.fold_const_symbol 0.80% : 0.000006s : 4: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 57.97% : 0.000433s : 17: substitution.inline 2.35% : 0.000018s : 2: substitution.inline_without_move 1.35% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.28% : 0.000017s : 3: substitution.less_batch_normalization 1.37% : 0.000010s : 7: substitution.minmaximum_grad 0.85% : 0.000006s : 5: substitution.partial_eliminate 1.59% : 0.000012s : 15: substitution.remove_not_recompute_node 3.89% : 0.000029s : 10: substitution.replace_applicator 1.35% : 0.000010s : 10: substitution.replace_old_param 0.53% : 0.000004s : 1: substitution.set_cell_output_no_recompute 2.86% : 0.000021s : 7: substitution.tuple_list_convert_item_index_to_positive 1.38% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 1.90% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.57% : 0.000057s : 19: substitution.tuple_list_get_item_eliminator 1.96% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011897 2 86.70% : 0.010315s : 1: type_inference.infer 13.30% : 0.001582s : 1: type_inference.specialize ------[replace.] 0.000204 27 64.23% : 0.000131s : 17: replace.inline 35.77% : 0.000073s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000451 27 93.70% : 0.000422s : 17: match.inline 6.30% : 0.000028s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000693 4248 1.13% : 0.000008s : 53: predicate.accumulaten_eliminater 0.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.14% : 0.000008s : 53: predicate.addn_zero_filter 1.08% : 0.000007s : 53: predicate.adjust_all_reduce_mul_add 2.03% : 0.000014s : 74: predicate.arithmetic_simplify 1.12% : 0.000008s : 53: predicate.cast_eliminate 1.10% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.07% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.15% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.21% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.17% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_depend_swap 1.64% : 0.000011s : 78: predicate.environ_get_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.80% : 0.000012s : 80: predicate.exchange_switch_depend_value 2.59% : 0.000018s : 80: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.59% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.51% : 0.000004s : 21: predicate.incorporate_call 0.46% : 0.000003s : 21: predicate.incorporate_call_switch 6.06% : 0.000042s : 183: predicate.inline 1.46% : 0.000010s : 45: predicate.inline_without_move 0.27% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.62% : 0.000004s : 21: predicate.less_batch_normalization 1.54% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.63% : 0.000018s : 124: predicate.load_eliminater 0.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.55% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.33% : 0.000009s : 61: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.07% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.07% : 0.000007s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 53: predicate.minmaximum_grad 0.35% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.10% : 0.000001s : 4: predicate.parallel_virtual_node 2.16% : 0.000015s : 80: predicate.partial_defer_inline 1.73% : 0.000012s : 67: predicate.partial_eliminate 1.12% : 0.000008s : 53: predicate.print_const_string_wrapper 0.48% : 0.000003s : 21: predicate.reduce_all_const_elim 1.45% : 0.000010s : 53: predicate.reduce_eliminate 2.64% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 21: predicate.remove_not_recompute_node 1.93% : 0.000013s : 113: predicate.replace_applicator 0.67% : 0.000005s : 45: predicate.replace_old_param 0.11% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000008s : 53: predicate.reshape_eliminate 1.11% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.28% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.70% : 0.000005s : 21: predicate.shard_identity_eliminate 0.21% : 0.000001s : 8: predicate.special_op_eliminate 0.61% : 0.000004s : 21: predicate.specialize_transform 1.21% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.93% : 0.000013s : 80: predicate.switch_defer_inline 3.01% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.21% : 0.000036s : 218: 
predicate.switch_simplify 1.11% : 0.000008s : 53: predicate.tile_eliminate 1.12% : 0.000008s : 53: predicate.transpose_eliminate 1.43% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.46% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.55% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.57% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.14% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000003s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001860 36 62.31% : 0.001159s : 15: func_graph_cloner_run.FuncGraphClonerGraph 37.69% : 0.000701s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072797 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.38% : 0.003191s : 1: add_attr 4.37% : 0.003181s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000142s : 1: auto_monad 0.03% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.75% : 0.000548s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.07% : 0.000054s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.61% : 0.000443s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000560s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.10% : 0.004442s : 117: opt.transform.opt_a 0.04% : 0.000031s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.15% : 0.000113s : 28: opt.transform.opt_b 0.07% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000040s : 4: opt.transform.symbol_engine_opt 20.70% : 0.015067s : 1: opt_a 0.15% : 0.000111s : 1: opt_after_cconv 0.66% : 0.000480s : 1: opt_after_jit_grad 0.31% : 0.000228s : 1: opt_b 23.75% : 0.017286s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000056s : 1: pre_auto_parallel 0.06% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 7.83% : 0.005701s : 2: renormalize.infer 2.12% : 0.001541s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000047s : 1: rewriter_after_opt_a 0.22% : 0.000162s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000088s : 1: symbol_engine_optimizer 8.89% : 0.006470s : 1: task_emit 0.11% : 0.000080s : 1: 
tuple_transform 16.48% : 0.012000s : 1: type_inference 0.10% : 0.000072s : 1: validate TotalTime = 0.0204656, [24] [bootstrap]: 0.00049804 [type_inference]: 0.00585362 [event_method]: 1.3e-05 [auto_monad]: 6.283e-05 [graph_reusing]: 5.54e-06 [inline]: 1.55999e-06 [add_attr]: 0.00309526, [1] [add_attr_with_inline]: 0.00308666, [1] [Cycle 1]: 5.509e-05, [2] [tag_attr]: 1.384e-05 [meta_addattr_fg_expand]: 4.07e-06 [parallel-infer-symbol]: 3.23e-06 [pre_auto_parallel]: 2.517e-05 [insert-virtual-dataset]: 2.61999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.37001e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00399183, [53] [py_interpret_to_execute]: 2.018e-05 [rewriter_before_opt_a]: 5.053e-05 [opt_a]: 0.0021244, [2] [Cycle 1]: 0.0014394, [45] [expand_dump_flag]: 3.25e-06 [switch_simplify]: 2.888e-05 [loop_unroll]: 1.686e-05 [a_1]: 0.00035855 [with_stream_mark]: 1.492e-05 [recompute_prepare]: 8.08001e-06 [updatestate_depend_eliminate]: 3.90998e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.90001e-06 [a_2]: 8.116e-05 [accelerated_algorithm]: 6.06e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 6.12999e-06 [merge_send_recv]: 8.63001e-06 [auto_parallel]: 6.31e-06 [parallel]: 1.885e-05 [flash_sp]: 7.55e-06 [merge_comm]: 4.58999e-06 [allreduce_fusion]: 3.55998e-06 [matmul_add_comm_reduction]: 9.99001e-06 [allreduce_slice_to_reducescatter]: 8.79983e-07 [virtual_shard_identity]: 8.1e-06 [virtual_dataset]: 5.85002e-06 [get_grad_eliminate_]: 5.65001e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.79002e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 1.022e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.164e-05 [merge_recompute_call_nodes]: 1.89999e-06 [before_grad]: 1.05e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 2.88e-06 [flash_sp_send_recv_attached]: 2.35002e-06 [receive_attached]: 2.32999e-06 
[after_resolve]: 1.001e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00041653 [add_forward_monad_depend]: 4.50001e-06 [auto_monad_grad]: 1.82001e-06 [auto_monad_eliminator]: 1.435e-05 [cse]: 3.06e-05 [a_3]: 4.168e-05 [Cycle 2]: 0.00067506, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 7.28e-06 [loop_unroll]: 6.01e-06 [a_1]: 0.00011442 [with_stream_mark]: 9.82999e-06 [recompute_prepare]: 6.16e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 7.064e-05 [accelerated_algorithm]: 5.92001e-06 [shard]: 8.99978e-07 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 7.452e-05 [auto_parallel]: 5.49998e-06 [parallel]: 3.95e-06 [flash_sp]: 3.35998e-06 [merge_comm]: 3.26999e-06 [allreduce_fusion]: 2.83998e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.78e-06 [virtual_dataset]: 5.55001e-06 [get_grad_eliminate_]: 5.11997e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.60997e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.42001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 8.80999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 8.70001e-07 [after_resolve]: 8.45001e-06 [a_after_grad]: 7.88001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.19003e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 6.56e-06 [cse]: 1.414e-05 [a_3]: 3.344e-05 [py_interpret_to_execute_after_opt_a]: 7.17002e-06 [slice_cell_reuse_recomputed_activation]: 2.29001e-06 [rewriter_after_opt_a]: 3.297e-05 [convert_after_rewriter]: 6.30997e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.00046051 [opt_b]: 0.00018463, [1] [Cycle 1]: 0.0001784, 
[7] [b_1]: 0.00010915 [b_2]: 6.89001e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.11998e-06 [renormalize]: 3.89991e-07 [cse]: 1.71e-05 [optimize_parallel_all_gather_comm]: 1.672e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.278e-05 [loop_unroll]: 0.00041953 [opt_after_cconv]: 9.387e-05, [1] [Cycle 1]: 8.81e-05, [7] [c_1]: 2.477e-05 [parameter_eliminate]: 2.35002e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.703e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.453e-05 [tuple_transform]: 6.948e-05, [1] [Cycle 1]: 6.483e-05, [4] [d_1]: 3.737e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.68e-06 [partial_unused_args_eliminate]: 2.14e-06 [add_recomputation]: 4.373e-05 [cse_after_recomputation]: 2.095e-05, [1] [Cycle 1]: 1.628e-05, [1] [cse]: 1.091e-05 [environ_conv]: 5.59e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.52001e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.79999e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 1.19998e-06 [interleave_split_concat_branches]: 1.45999e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.338e-05 [grouped_pairwise_exchange_alltoall]: 1.87001e-06 [offloading_packed_experts]: 4.18999e-06 [overlap_recompute_and_grad_model_parallel]: 4.53001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.77999e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 4.41002e-06 [overlap_grad_flash_sp]: 1.847e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 7.19e-05, [1] [Cycle 1]: 6.732e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 8.62998e-06 [elim_not_effective]: 1.179e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 9.54e-06 [renormalize]: 3.00002e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.657e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 4.03001e-06 [opt_after_jit_grad]: 0.00045348 [validate]: 3.389e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.00618722 [execute]: 8.43999e-06 Sums bootstrap : 0.000498s : 3.04% type_inference : 0.005854s : 35.75% event_method : 0.000013s : 0.08% auto_monad : 0.000063s : 0.38% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000051s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000473s : 2.89% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000152s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000083s : 0.51% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000417s : 2.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000045s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000461s : 2.81% optimize.opt_b.b_1 : 0.000109s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000420s : 2.56% optimize.opt_after_cconv.c_1 : 0.000025s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000453s : 2.77% validate : 0.000034s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006187s : 37.79% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000146 24 20.92% : 0.000031s : 4: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 3.91% : 0.000006s : 3: substitution.graph_param_transform 65.41% : 0.000095s : 3: substitution.inline 2.14% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.08% : 0.000005s : 4: substitution.remove_not_recompute_node 2.23% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005807 2 91.95% : 0.005339s : 1: type_inference.infer 8.05% : 0.000468s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000094 3 100.00% : 0.000094s : 3: match.inline ------[predicate.] 
0.000148 815 1.11% : 0.000002s : 8: predicate.accumulaten_eliminater 0.89% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 6: predicate.addn_check_dump 0.82% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.39% : 0.000004s : 14: predicate.arithmetic_simplify 0.86% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.67% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.80% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.97% : 0.000003s : 17: predicate.environ_get_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.47% : 0.000004s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.92% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.27% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.40% : 0.000010s : 37: predicate.inline 1.01% : 0.000001s : 6: predicate.inline_without_move 0.53% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 6: predicate.less_batch_normalization 1.51% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 22: predicate.load_eliminater 1.34% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.90% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.63% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.75% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.19% : 0.000002s : 3: predicate.mutable_eliminate 0.54% : 0.000001s : 3: predicate.opt_reshape 0.52% : 0.000001s : 3: predicate.parallel_virtual_node 1.45% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 11: predicate.partial_eliminate 0.95% : 0.000001s : 8: predicate.print_const_string_wrapper 0.69% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 8: predicate.reduce_eliminate 2.18% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 6: predicate.remove_not_recompute_node 1.21% : 0.000002s : 14: predicate.replace_applicator 0.79% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 8: predicate.reshape_eliminate 0.68% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.88% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.80% : 0.000001s : 6: predicate.special_op_eliminate 0.92% : 0.000001s : 6: predicate.specialize_transform 1.06% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 2.06% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.60% : 0.000007s : 38: predicate.switch_simplify 0.86% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.52% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.66% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.95% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000281 7 38.88% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.12% : 0.000172s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028957 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.70% : 0.003099s : 1: add_attr 10.67% : 0.003091s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000068s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.84% : 0.000533s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000009s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000428s : 1: loop_unroll 0.02% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.62% : 0.000469s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.90% : 0.000838s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000088s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.35% : 0.002127s : 1: opt_a 0.34% : 0.000097s : 1: opt_after_cconv 1.60% : 0.000462s : 1: opt_after_jit_grad 0.65% : 0.000188s : 1: opt_b 13.80% : 0.003996s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.77% : 0.000222s : 1: renormalize.infer 0.65% : 0.000188s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 21.40% : 0.006198s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 20.27% : 0.005870s : 1: type_inference 0.21% : 0.000061s : 1: validate TotalTime = 0.0390347, [24] [bootstrap]: 0.00057333 [type_inference]: 0.0119249 [event_method]: 4.198e-05 [auto_monad]: 0.00012832 [graph_reusing]: 8.75001e-06 [inline]: 2.07999e-06 [add_attr]: 0.00303115, [1] [add_attr_with_inline]: 0.00302253, [1] [Cycle 1]: 6.945e-05, [2] [tag_attr]: 3.065e-05 [meta_addattr_fg_expand]: 9.34e-06 [parallel-infer-symbol]: 3.23e-06 [pre_auto_parallel]: 4.692e-05 [insert-virtual-dataset]: 3.08e-06 [parallel-infer-symbol-second]: 9.10019e-07 [dataset_repeat_opt]: 2.22001e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.0161362, [53] [py_interpret_to_execute]: 3.757e-05 [rewriter_before_opt_a]: 0.00014418 [opt_a]: 0.0140183, [3] [Cycle 1]: 0.0106491, [45] [expand_dump_flag]: 4.23001e-06 [switch_simplify]: 7.445e-05 [loop_unroll]: 6.058e-05 [a_1]: 0.00136252 [with_stream_mark]: 2.449e-05 [recompute_prepare]: 2.162e-05 [updatestate_depend_eliminate]: 8.81002e-06 [updatestate_assign_eliminate]: 7.2e-06 [updatestate_loads_eliminate]: 6.73e-06 [parameter_eliminate]: 3.21001e-06 [a_2]: 0.00024375 [accelerated_algorithm]: 3.107e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 3.77002e-06 [shard_inline]: 1.632e-05 [merge_send_recv]: 1.659e-05 [auto_parallel]: 1.061e-05 [parallel]: 1.945e-05 [flash_sp]: 1.13e-05 [merge_comm]: 9.84999e-06 [allreduce_fusion]: 8.80999e-06 [matmul_add_comm_reduction]: 2.599e-05 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 1.862e-05 [virtual_dataset]: 1.604e-05 [get_grad_eliminate_]: 1.506e-05 [virtual_output]: 1.57e-05 [merge_forward]: 9.22999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.821e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.058e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 2.905e-05 [set_forward_comm_id_for_comm_node_pass]: 9.46e-06 [meta_fg_expand]: 0.00143263 [flash_sp_send_recv_attached]: 4.36002e-06 [receive_attached]: 2.71e-06 [after_resolve]: 
6.367e-05 [a_after_grad]: 8.865e-05 [renormalize]: 0.00603101 [add_forward_monad_depend]: 9.71e-06 [auto_monad_grad]: 5.76e-06 [auto_monad_eliminator]: 5.161e-05 [cse]: 0.00017981 [a_3]: 0.00033046 [Cycle 2]: 0.00267814, [45] [expand_dump_flag]: 2.01998e-06 [switch_simplify]: 4.503e-05 [loop_unroll]: 4.206e-05 [a_1]: 0.00132664 [with_stream_mark]: 1.141e-05 [recompute_prepare]: 8.79e-06 [updatestate_depend_eliminate]: 4.32e-06 [updatestate_assign_eliminate]: 3.14999e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 1.53002e-06 [a_2]: 9.133e-05 [accelerated_algorithm]: 1.088e-05 [shard]: 1.86003e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 7.15998e-06 [merge_send_recv]: 6.49001e-06 [auto_parallel]: 7.18e-06 [parallel]: 5.67001e-06 [flash_sp]: 3.51999e-06 [merge_comm]: 3.86001e-06 [allreduce_fusion]: 3.51001e-06 [matmul_add_comm_reduction]: 7.46999e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 7.82e-06 [virtual_dataset]: 6.47001e-06 [get_grad_eliminate_]: 6.26e-06 [virtual_output]: 6.06998e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 8.60018e-07 [offload_activation]: 8.27998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.275e-05 [merge_recompute_call_nodes]: 1.06002e-06 [before_grad]: 1.22e-05 [set_forward_comm_id_for_comm_node_pass]: 4.52e-06 [meta_fg_expand]: 5.345e-05 [flash_sp_send_recv_attached]: 1.03001e-06 [receive_attached]: 1.38002e-06 [after_resolve]: 1.132e-05 [a_after_grad]: 1.018e-05 [renormalize]: 0.00056385 [add_forward_monad_depend]: 4.33001e-06 [auto_monad_grad]: 1.52999e-06 [auto_monad_eliminator]: 1.155e-05 [cse]: 2.064e-05 [a_3]: 4.714e-05 [Cycle 3]: 0.00067684, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 7.95e-06 [loop_unroll]: 6.59999e-06 [a_1]: 0.00014617 [with_stream_mark]: 8.42e-06 [recompute_prepare]: 6.66999e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 2.65002e-06 [updatestate_loads_eliminate]: 2.58998e-06 
[parameter_eliminate]: 9.70002e-07 [a_2]: 8.474e-05 [accelerated_algorithm]: 9.81998e-06 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.40001e-06 [shard_inline]: 6.78e-06 [merge_send_recv]: 5.34998e-06 [auto_parallel]: 6.09001e-06 [parallel]: 4.42e-06 [flash_sp]: 9.40025e-07 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 5.62999e-06 [allreduce_slice_to_reducescatter]: 4.89992e-07 [virtual_shard_identity]: 7.43e-06 [virtual_dataset]: 6.78998e-06 [get_grad_eliminate_]: 6.18998e-06 [virtual_output]: 5.98998e-06 [merge_forward]: 3.23e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 6.94001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.29e-05 [merge_recompute_call_nodes]: 8.60018e-07 [before_grad]: 1.065e-05 [set_forward_comm_id_for_comm_node_pass]: 4.1e-06 [meta_fg_expand]: 2.32001e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 9.09989e-07 [after_resolve]: 9.37001e-06 [a_after_grad]: 9.39998e-06 [renormalize]: 1.19995e-07 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 7.90998e-06 [cse]: 1.649e-05 [a_3]: 3.926e-05 [py_interpret_to_execute_after_opt_a]: 9.72999e-06 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 4.042e-05 [convert_after_rewriter]: 7.50998e-06 [order_py_execute_after_rewriter]: 6.08002e-06 [mutable_eliminate]: 0.00049285 [opt_b]: 0.00021976, [1] [Cycle 1]: 0.00021226, [7] [b_1]: 0.0001303 [b_2]: 8.15e-06 [updatestate_depend_eliminate]: 6.03998e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 7.80998e-06 [renormalize]: 5.29981e-07 [cse]: 2.119e-05 [optimize_parallel_all_gather_comm]: 1.713e-05 [overlap_param_gather]: 1.81003e-06 [cconv]: 2.018e-05 [loop_unroll]: 0.00042441 [opt_after_cconv]: 0.00010834, [1] [Cycle 1]: 0.0001025, [7] [c_1]: 3.346e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 5.65001e-06 [updatestate_assign_eliminate]: 3.16001e-06 
[updatestate_loads_eliminate]: 2.74999e-06 [cse]: 2.039e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.552e-05 [tuple_transform]: 7.787e-05, [1] [Cycle 1]: 7.276e-05, [4] [d_1]: 4.517e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.26001e-06 [partial_unused_args_eliminate]: 2.19001e-06 [add_recomputation]: 4.968e-05 [cse_after_recomputation]: 2.459e-05, [1] [Cycle 1]: 2.006e-05, [1] [cse]: 1.449e-05 [environ_conv]: 7.82e-06 [swap_dp_allreduce_reducescatter]: 5.80002e-06 [bias_add_comm_swap]: 2.92002e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.81e-06 [assign_add_opt]: 1.70001e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.433e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 4.51002e-06 [overlap_recompute_and_grad_model_parallel]: 5.20001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.59999e-06 [overlap_grad_ring_attention]: 4.97e-06 [overlap_grad_flash_sp]: 2.148e-05 [begin_end_overlap_inline]: 8.00006e-07 [split_matmul_comm_elemetwise]: 2.65002e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 1.40001e-06 [symbol_engine_optimizer]: 8.55e-05, [1] [Cycle 1]: 8.121e-05, [6] [build]: 8.58001e-06 [elim_shapecalc]: 1.055e-05 [elim_not_effective]: 1.442e-05 [opt_reshape]: 7.35998e-06 [fold_const_symbol]: 1.207e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 2.11e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 2.162e-05 [get_jit_bprop_graph]: 1.22999e-06 [rewriter_after_jit_bprop_graph]: 3.78001e-06 [opt_after_jit_grad]: 0.00047788 [validate]: 4.051e-05 [backend_pass]: 1.06002e-06 [task_emit]: 0.00635947 [execute]: 7.13e-06 Sums bootstrap : 0.000573s : 1.65% type_inference : 0.011925s : 34.41% event_method : 0.000042s : 0.12% auto_monad : 0.000128s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.42% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000127s : 0.37% optimize.opt_a.loop_unroll : 0.000109s : 0.32% optimize.opt_a.a_1 : 0.002835s : 8.18% optimize.opt_a.with_stream_mark : 0.000044s : 0.13% optimize.opt_a.recompute_prepare : 0.000037s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.03% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000420s : 1.21% optimize.opt_a.accelerated_algorithm : 0.000052s : 0.15% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.09% optimize.opt_a.merge_send_recv : 0.000028s : 0.08% optimize.opt_a.auto_parallel : 0.000024s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000017s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.10% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000028s : 0.08% optimize.opt_a.merge_forward : 0.000016s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000033s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.05% optimize.opt_a.meta_fg_expand : 0.001488s : 4.30% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000084s : 0.24% optimize.opt_a.a_after_grad : 0.000108s : 0.31% optimize.opt_a.renormalize : 0.006595s : 19.03% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000071s : 0.21% optimize.opt_a.cse : 0.000217s : 0.63% optimize.opt_a.a_3 : 0.000417s : 1.20% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000040s : 0.12% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000493s : 1.42% optimize.opt_b.b_1 : 0.000130s : 0.38% optimize.opt_b.b_2 : 0.000008s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000008s : 
0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000424s : 1.22% optimize.opt_after_cconv.c_1 : 0.000033s : 0.10% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.04% optimize.tuple_transform.d_1 : 0.000045s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.14% optimize.cse_after_recomputation.cse : 0.000014s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000478s : 1.38% validate : 0.000041s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006359s : 18.35% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000689 159 6.82% : 0.000047s : 7: substitution.arithmetic_simplify 0.37% : 0.000003s : 3: substitution.elim_not_effective 0.63% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 3: substitution.fold_const_symbol 0.95% : 0.000007s : 4: substitution.graph_param_transform 0.50% : 0.000003s : 2: substitution.incorporate_call 0.39% : 0.000003s : 2: substitution.incorporate_call_switch 57.96% : 0.000399s : 17: substitution.inline 2.38% : 0.000016s : 2: substitution.inline_without_move 1.49% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.28% : 0.000016s : 3: substitution.less_batch_normalization 1.45% : 0.000010s : 7: substitution.minmaximum_grad 0.89% : 0.000006s : 5: substitution.partial_eliminate 1.73% : 0.000012s : 15: substitution.remove_not_recompute_node 3.85% : 0.000027s : 10: substitution.replace_applicator 1.38% : 0.000009s : 10: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.95% : 0.000020s : 7: substitution.tuple_list_convert_item_index_to_positive 1.46% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 2.02% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.27% : 0.000050s : 18: substitution.tuple_list_get_item_eliminator 1.98% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011850 2 87.80% : 0.010405s : 1: type_inference.infer 12.20% : 0.001445s : 1: type_inference.specialize ------[replace.] 0.000191 26 66.02% : 0.000126s : 17: replace.inline 33.98% : 0.000065s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000414 26 94.12% : 0.000390s : 17: match.inline 5.88% : 0.000024s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000678 4180 1.13% : 0.000008s : 52: predicate.accumulaten_eliminater 0.22% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.12% : 0.000008s : 52: predicate.addn_zero_filter 1.09% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 1.95% : 0.000013s : 73: predicate.arithmetic_simplify 1.12% : 0.000008s : 52: predicate.cast_eliminate 1.12% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.15% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.21% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_depend_swap 1.69% : 0.000011s : 77: predicate.environ_get_eliminate 1.22% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.83% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.47% : 0.000017s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.60% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.53% : 0.000004s : 21: predicate.incorporate_call 0.47% : 0.000003s : 21: predicate.incorporate_call_switch 5.93% : 0.000040s : 180: predicate.inline 1.45% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.60% : 0.000004s : 21: predicate.less_batch_normalization 1.53% : 
0.000010s : 69: predicate.list_to_tuple_eliminator_ 2.65% : 0.000018s : 121: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.62% : 0.000018s : 110: predicate.loop_unroll_before_grad 1.36% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.12% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 52: predicate.minmaximum_grad 0.28% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.12% : 0.000014s : 78: predicate.partial_defer_inline 1.69% : 0.000011s : 65: predicate.partial_eliminate 1.12% : 0.000008s : 52: predicate.print_const_string_wrapper 0.50% : 0.000003s : 21: predicate.reduce_all_const_elim 1.43% : 0.000010s : 52: predicate.reduce_eliminate 2.63% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 21: predicate.remove_not_recompute_node 1.94% : 0.000013s : 111: predicate.replace_applicator 0.68% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.15% : 0.000008s : 52: predicate.reshape_eliminate 1.14% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.11% : 0.000001s : 4: predicate.row_tensor_eliminate 1.29% : 0.000009s : 50: predicate.same_eliminate 0.35% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.64% : 0.000004s : 21: predicate.specialize_transform 1.24% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.22% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.95% : 0.000013s : 78: predicate.switch_defer_inline 3.02% : 0.000020s : 128: predicate.switch_layer_defer_inline 5.33% : 0.000036s : 213: 
predicate.switch_simplify 1.13% : 0.000008s : 52: predicate.tile_eliminate 1.10% : 0.000007s : 52: predicate.transpose_eliminate 1.38% : 0.000009s : 60: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.68% : 0.000018s : 90: predicate.tuple_list_get_item_eliminator 1.44% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.54% : 0.000010s : 69: predicate.tuple_to_list_eliminator_ 2.59% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.15% : 0.000021s : 142: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000003s : 21: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.13% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001645 35 60.57% : 0.000996s : 14: func_graph_cloner_run.FuncGraphClonerGraph 39.43% : 0.000648s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.069301 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.38% : 0.003036s : 1: add_attr 4.37% : 0.003026s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000054s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.20% : 0.000136s : 1: auto_monad 0.04% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.88% : 0.000612s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000018s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.07% : 0.000049s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.62% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.72% : 0.000502s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 6.22% : 0.004312s : 117: opt.transform.opt_a 0.05% : 0.000032s : 1: opt.transform.opt_after_cconv 0.04% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000111s : 28: opt.transform.opt_b 0.07% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000041s : 4: opt.transform.symbol_engine_opt 20.23% : 0.014021s : 1: opt_a 0.16% : 0.000112s : 1: opt_after_cconv 0.70% : 0.000488s : 1: opt_after_jit_grad 0.32% : 0.000223s : 1: opt_b 23.29% : 0.016140s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000051s : 1: pre_auto_parallel 0.06% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 7.41% : 0.005132s : 2: renormalize.infer 2.09% : 0.001448s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000044s : 1: rewriter_after_opt_a 0.21% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.13% : 0.000088s : 1: symbol_engine_optimizer 9.19% : 0.006370s : 1: task_emit 0.12% : 0.000081s : 1: 
tuple_transform 17.23% : 0.011942s : 1: type_inference 0.10% : 0.000069s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x2-kbk],max_mem:10.0M TotalTime = 0.114128, [24] [bootstrap]: 0.00058898 [type_inference]: 0.00677944 [event_method]: 1.52e-05 [auto_monad]: 6.142e-05 [graph_reusing]: 6.04999e-06 [inline]: 2.41e-06 [add_attr]: 0.00382052, [1] [add_attr_with_inline]: 0.00380795, [1] [Cycle 1]: 5.3e-05, [2] [tag_attr]: 1.616e-05 [meta_addattr_fg_expand]: 4.32998e-06 [parallel-infer-symbol]: 3.8e-06 [pre_auto_parallel]: 2.916e-05 [insert-virtual-dataset]: 2.70002e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.94999e-06 [optimize]: 0.00438425, [53] [py_interpret_to_execute]: 2.386e-05 [rewriter_before_opt_a]: 6.546e-05 [opt_a]: 0.00237342, [2] [Cycle 1]: 0.00174802, [45] [expand_dump_flag]: 3.27002e-06 [switch_simplify]: 3.463e-05 [loop_unroll]: 2.013e-05 [a_1]: 0.00045003 [with_stream_mark]: 1.632e-05 [recompute_prepare]: 8.27e-06 [updatestate_depend_eliminate]: 4.26001e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 7.931e-05 [accelerated_algorithm]: 7.35e-06 [shard]: 2.53e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.48001e-06 [auto_parallel]: 7.51001e-06 [parallel]: 2.746e-05 [flash_sp]: 7.40998e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 1.009e-05 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 7.735e-05 [virtual_dataset]: 7.08998e-06 [get_grad_eliminate_]: 5.67999e-06 [virtual_output]: 6.15002e-06 [merge_forward]: 4.58001e-06 [cell_reuse_recompute_pass]: 1.76e-06 [offload_activation]: 1.021e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.372e-05 [merge_recompute_call_nodes]: 
1.55999e-06 [before_grad]: 1.038e-05 [set_forward_comm_id_for_comm_node_pass]: 4.22998e-06 [meta_fg_expand]: 3.09999e-06 [flash_sp_send_recv_attached]: 2.91e-06 [receive_attached]: 2.24999e-06 [after_resolve]: 9.52999e-06 [a_after_grad]: 8.59e-06 [renormalize]: 0.00051854 [add_forward_monad_depend]: 9.15001e-06 [auto_monad_grad]: 2.43e-06 [auto_monad_eliminator]: 1.383e-05 [cse]: 3.09e-05 [a_3]: 4.298e-05 [Cycle 2]: 0.00061505, [45] [expand_dump_flag]: 1.20001e-06 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00011584 [with_stream_mark]: 1.145e-05 [recompute_prepare]: 5.80002e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 7.032e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 1.32999e-06 [meta_shard_fg_expand]: 1.34998e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 4.77998e-06 [auto_parallel]: 6.06e-06 [parallel]: 5.39e-06 [flash_sp]: 4.13001e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.03998e-06 [matmul_add_comm_reduction]: 5.94e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 6.35002e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 5.07999e-06 [merge_forward]: 2.79999e-06 [cell_reuse_recompute_pass]: 1.55001e-06 [offload_activation]: 6.69001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.102e-05 [merge_recompute_call_nodes]: 8.79983e-07 [before_grad]: 8.37998e-06 [set_forward_comm_id_for_comm_node_pass]: 4.00998e-06 [meta_fg_expand]: 1.95001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.55999e-06 [after_resolve]: 8.3e-06 [a_after_grad]: 8.02e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 1.13001e-06 [auto_monad_eliminator]: 7.68001e-06 [cse]: 1.888e-05 [a_3]: 3.245e-05 [py_interpret_to_execute_after_opt_a]: 9.37001e-06 [slice_cell_reuse_recomputed_activation]: 1.94999e-06 
[rewriter_after_opt_a]: 3.477e-05 [convert_after_rewriter]: 7.23e-06 [order_py_execute_after_rewriter]: 4.94e-06 [mutable_eliminate]: 0.00051122 [opt_b]: 0.00019069, [1] [Cycle 1]: 0.00018394, [7] [b_1]: 0.00010904 [b_2]: 7.35998e-06 [updatestate_depend_eliminate]: 6.74999e-06 [updatestate_assign_eliminate]: 2.67001e-06 [updatestate_loads_eliminate]: 2.59001e-06 [renormalize]: 4.80009e-07 [cse]: 1.927e-05 [optimize_parallel_all_gather_comm]: 1.72e-05 [overlap_param_gather]: 1.98002e-06 [cconv]: 2.564e-05 [loop_unroll]: 0.00045121 [opt_after_cconv]: 9.84e-05, [1] [Cycle 1]: 9.137e-05, [7] [c_1]: 2.566e-05 [parameter_eliminate]: 2.90998e-06 [updatestate_depend_eliminate]: 5.48002e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.748e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.575e-05 [tuple_transform]: 6.898e-05, [1] [Cycle 1]: 6.439e-05, [4] [d_1]: 3.742e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.59999e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 5.369e-05 [cse_after_recomputation]: 2.231e-05, [1] [Cycle 1]: 1.759e-05, [1] [cse]: 1.197e-05 [environ_conv]: 8.81997e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 5.39998e-06 [label_fine_grained_interleaved_index]: 2.66999e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.30001e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.37999e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.3e-05 
[grouped_pairwise_exchange_alltoall]: 1.87999e-06 [offloading_packed_experts]: 4.08001e-06 [overlap_recompute_and_grad_model_parallel]: 5.48997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.79998e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 4.14002e-06 [overlap_grad_flash_sp]: 1.878e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 2.16e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 7.427e-05, [1] [Cycle 1]: 6.98e-05, [6] [build]: 2.76999e-06 [elim_shapecalc]: 1.018e-05 [elim_not_effective]: 1.233e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 9.57001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.39999e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.58e-05 [get_jit_bprop_graph]: 1.41998e-06 [rewriter_after_jit_bprop_graph]: 4.18001e-06 [opt_after_jit_grad]: 0.00049522 [validate]: 4.823e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0976098 [execute]: 1.005e-05 Sums bootstrap : 0.000589s : 0.54% type_inference : 0.006779s : 6.21% event_method : 0.000015s : 0.01% auto_monad : 0.000061s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.02% optimize.rewriter_before_opt_a : 0.000065s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000566s : 0.52% optimize.opt_a.with_stream_mark : 0.000028s : 0.03% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000014s : 0.01% optimize.opt_a.parallel : 0.000033s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000084s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000519s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000050s : 0.05% optimize.opt_a.a_3 : 0.000075s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000511s : 0.47% optimize.opt_b.b_1 : 0.000109s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.02% optimize.loop_unroll : 0.000451s : 0.41% optimize.opt_after_cconv.c_1 : 0.000026s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.01% optimize.tuple_transform.d_1 : 0.000037s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.05% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000009s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000495s : 0.45% validate : 0.000048s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.097610s : 89.34% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000175 26 20.22% : 0.000035s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.23% : 0.000006s : 3: substitution.graph_param_transform 63.47% : 0.000111s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000005s : 4: substitution.remove_not_recompute_node 1.92% : 0.000003s : 2: substitution.replace_old_param 4.83% : 0.000008s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006718 2 90.49% : 0.006079s : 1: type_inference.infer 9.51% : 0.000639s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.34% : 0.000029s : 3: replace.inline 20.66% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 4 93.36% : 0.000109s : 3: match.inline 6.64% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.04% : 0.000003s : 15: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.61% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_depend_swap 1.74% : 0.000003s : 18: predicate.environ_get_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.48% : 0.000004s : 13: predicate.float_depend_g_call 0.55% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.68% : 0.000001s : 6: predicate.get_grad_eliminate 0.19% : 0.000000s : 3: predicate.graph_param_transform 0.65% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.47% : 0.000010s : 40: predicate.inline 0.88% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 6: predicate.less_batch_normalization 1.64% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 25: predicate.load_eliminater 1.39% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.55% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.42% : 0.000002s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.34% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.41% : 0.000002s : 13: predicate.partial_eliminate 0.85% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.23% : 0.000002s : 9: predicate.reduce_eliminate 2.28% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 6: predicate.remove_not_recompute_node 1.33% : 0.000002s : 16: predicate.replace_applicator 0.72% : 0.000001s : 6: predicate.replace_old_param 0.44% : 0.000001s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 9: predicate.reshape_eliminate 0.59% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 3: predicate.row_tensor_eliminate 0.93% : 0.000002s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.78% : 0.000001s : 6: predicate.specialize_transform 0.88% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 13: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 43: predicate.switch_simplify 0.86% : 
0.000001s : 9: predicate.tile_eliminate 1.10% : 0.000002s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 15: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.04% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 3: predicate.value_based_eliminate 0.79% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000390 8 47.11% : 0.000184s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.89% : 0.000206s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.124002 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.08% : 0.003825s : 1: add_attr 3.07% : 0.003811s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000067s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.51% : 0.000629s : 1: bootstrap 0.02% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.37% : 0.000460s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.42% : 0.000521s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 0.81% : 0.001007s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000088s : 28: opt.transform.opt_b 0.03% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.92% : 0.002376s : 1: opt_a 0.08% : 0.000102s : 1: opt_after_cconv 0.41% : 0.000506s : 1: opt_after_jit_grad 0.16% : 0.000194s : 1: opt_b 3.54% : 0.004389s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000034s : 1: pre_auto_parallel 0.02% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.23% : 0.000279s : 1: renormalize.infer 0.19% : 0.000232s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000040s : 1: rewriter_after_opt_a 0.06% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000077s : 1: symbol_engine_optimizer 78.74% : 0.097634s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 5.48% : 0.006801s : 1: type_inference 0.06% : 0.000076s : 1: validate TotalTime = 0.0960711, [24] [bootstrap]: 0.00049936 [type_inference]: 0.00604282 [event_method]: 1.215e-05 [auto_monad]: 5.991e-05 [graph_reusing]: 5.50001e-06 [inline]: 1.86e-06 [add_attr]: 0.00301488, [1] [add_attr_with_inline]: 0.00300693, [1] [Cycle 1]: 5.13e-05, [2] [tag_attr]: 1.528e-05 [meta_addattr_fg_expand]: 4.09002e-06 [parallel-infer-symbol]: 3.08998e-06 [pre_auto_parallel]: 2.373e-05 [insert-virtual-dataset]: 2.60002e-06 [parallel-infer-symbol-second]: 8.19971e-07 [dataset_repeat_opt]: 2.29001e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.00390553, [53] [py_interpret_to_execute]: 1.87e-05 [rewriter_before_opt_a]: 5.106e-05 [opt_a]: 0.00200959, [2] [Cycle 1]: 0.00140057, [45] [expand_dump_flag]: 2.96001e-06 [switch_simplify]: 2.946e-05 [loop_unroll]: 1.679e-05 [a_1]: 0.00035484 [with_stream_mark]: 1.525e-05 [recompute_prepare]: 7.83001e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.17002e-06 [updatestate_loads_eliminate]: 3.33e-06 [parameter_eliminate]: 1.97001e-06 [a_2]: 8.186e-05 [accelerated_algorithm]: 7.11001e-06 [shard]: 2.21998e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 6.21998e-06 [merge_send_recv]: 8.40001e-06 [auto_parallel]: 5.92001e-06 [parallel]: 1.851e-05 [flash_sp]: 7.46001e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.52002e-06 [matmul_add_comm_reduction]: 9.68002e-06 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 7.33e-06 [virtual_dataset]: 6.41998e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.79e-06 [merge_forward]: 4.17998e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.25e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 1.036e-05 [set_forward_comm_id_for_comm_node_pass]: 3.78001e-06 [meta_fg_expand]: 2.88998e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 
2.48e-06 [after_resolve]: 9.86998e-06 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00038547 [add_forward_monad_depend]: 4.70999e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.353e-05 [cse]: 3.017e-05 [a_3]: 4.116e-05 [Cycle 2]: 0.0005997, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 7.05002e-06 [loop_unroll]: 5.65001e-06 [a_1]: 0.00011389 [with_stream_mark]: 1.017e-05 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 2.99999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 9.19972e-07 [a_2]: 7.003e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 1.31002e-06 [meta_shard_fg_expand]: 1.19998e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 4.70999e-06 [auto_parallel]: 5.83997e-06 [parallel]: 4.12e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 3.37997e-06 [allreduce_fusion]: 3.09001e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.44998e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.14998e-06 [merge_forward]: 2.76e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.24999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.004e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 8.04997e-06 [a_after_grad]: 7.68999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.29001e-06 [cse]: 1.373e-05 [a_3]: 3.232e-05 [py_interpret_to_execute_after_opt_a]: 7.45e-06 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.297e-05 [convert_after_rewriter]: 7.18998e-06 [order_py_execute_after_rewriter]: 5.24e-06 [mutable_eliminate]: 0.00046341 [opt_b]: 0.00018481, [1] [Cycle 1]: 
0.00017842, [7] [b_1]: 0.00010767 [b_2]: 7.44002e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 5.39992e-07 [cse]: 1.732e-05 [optimize_parallel_all_gather_comm]: 1.674e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.379e-05 [loop_unroll]: 0.00042827 [opt_after_cconv]: 9.647e-05, [1] [Cycle 1]: 9.085e-05, [7] [c_1]: 2.611e-05 [parameter_eliminate]: 2.35002e-06 [updatestate_depend_eliminate]: 4.91002e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.39001e-06 [cse]: 1.754e-05 [renormalize]: 6.29982e-07 [remove_dup_value]: 1.526e-05 [tuple_transform]: 6.831e-05, [1] [Cycle 1]: 6.377e-05, [4] [d_1]: 3.714e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 2.17999e-06 [add_recomputation]: 4.517e-05 [cse_after_recomputation]: 2.202e-05, [1] [Cycle 1]: 1.748e-05, [1] [cse]: 1.172e-05 [environ_conv]: 5.40001e-06 [swap_dp_allreduce_reducescatter]: 4.90001e-06 [bias_add_comm_swap]: 3.16999e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.31002e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.19999e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.58003e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.59998e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.42999e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81998e-06 [control_data_broadcast_order]: 1.255e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 4.08001e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 4.39002e-06 [overlap_grad_flash_sp]: 1.746e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.15002e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 8.617e-05, [1] [Cycle 1]: 8.183e-05, [6] [build]: 2.59001e-06 [elim_shapecalc]: 9.11998e-06 [elim_not_effective]: 1.2e-05 [opt_reshape]: 1.82e-05 [fold_const_symbol]: 1.008e-05 [renormalize]: 2.3999e-07 [detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.91e-06 [auto_monad_reorder]: 1.662e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.98001e-06 [opt_after_jit_grad]: 0.00045416 [validate]: 3.416e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0817585 [execute]: 9.43002e-06 Sums bootstrap : 0.000499s : 0.54% type_inference : 0.006043s : 6.56% event_method : 0.000012s : 0.01% auto_monad : 0.000060s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000051s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000037s : 0.04% optimize.opt_a.loop_unroll : 0.000022s : 0.02% optimize.opt_a.a_1 : 0.000469s : 0.51% optimize.opt_a.with_stream_mark : 0.000025s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s 
: 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000152s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000386s : 0.42% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000044s : 0.05% optimize.opt_a.a_3 : 0.000073s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000463s : 0.50% optimize.opt_b.b_1 : 0.000108s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000428s : 0.47% optimize.opt_after_cconv.c_1 : 0.000026s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000037s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.05% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000002s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000018s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000454s : 0.49% validate : 0.000034s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.081759s : 88.82% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000144 24 20.83% : 0.000030s : 4: substitution.arithmetic_simplify 1.36% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000002s : 2: substitution.fold_const_symbol 3.82% : 0.000005s : 3: substitution.graph_param_transform 65.47% : 0.000094s : 3: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.01% : 0.000004s : 4: substitution.remove_not_recompute_node 2.18% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005999 2 91.93% : 0.005515s : 1: type_inference.infer 8.07% : 0.000484s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000092 3 100.00% : 0.000092s : 3: match.inline ------[predicate.] 
0.000147 815 0.86% : 0.000001s : 8: predicate.accumulaten_eliminater 1.03% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.85% : 0.000001s : 8: predicate.addn_zero_filter 0.80% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 14: predicate.arithmetic_simplify 0.92% : 0.000001s : 8: predicate.cast_eliminate 0.73% : 0.000001s : 6: predicate.check_bprop_eliminate 0.66% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.80% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.79% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.92% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.78% : 0.000001s : 6: predicate.incorporate_call 0.68% : 0.000001s : 6: predicate.incorporate_call_switch 6.45% : 0.000009s : 37: predicate.inline 0.99% : 0.000001s : 6: predicate.inline_without_move 0.55% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.13% : 0.000002s : 6: predicate.less_batch_normalization 1.62% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 22: predicate.load_eliminater 1.05% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.99% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.62% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.31% : 0.000002s : 3: predicate.mutable_eliminate 0.56% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.38% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.11% : 0.000002s : 8: predicate.reduce_eliminate 2.26% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 14: predicate.replace_applicator 0.80% : 0.000001s : 6: predicate.replace_old_param 0.31% : 0.000000s : 3: predicate.reset_defer_inline 0.86% : 0.000001s : 8: predicate.reshape_eliminate 0.69% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.92% : 0.000001s : 6: predicate.specialize_transform 1.05% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.27% : 0.000002s : 11: predicate.switch_defer_inline 1.93% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.71% : 0.000007s : 38: predicate.switch_simplify 0.90% : 
0.000001s : 8: predicate.tile_eliminate 0.90% : 0.000001s : 8: predicate.transpose_eliminate 1.54% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000303 7 40.07% : 0.000121s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.93% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.104368 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.89% : 0.003020s : 1: add_attr 2.88% : 0.003010s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000065s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.51% : 0.000534s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.42% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.80% : 0.000832s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000088s : 28: opt.transform.opt_b 0.04% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000046s : 4: opt.transform.symbol_engine_opt 1.93% : 0.002013s : 1: opt_a 0.10% : 0.000100s : 1: opt_after_cconv 0.44% : 0.000463s : 1: opt_after_jit_grad 0.18% : 0.000188s : 1: opt_b 3.75% : 0.003910s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000028s : 1: pre_auto_parallel 0.02% : 0.000022s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.19% : 0.000198s : 1: renormalize.infer 0.17% : 0.000180s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000055s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000089s : 1: symbol_engine_optimizer 78.36% : 0.081782s : 1: task_emit 0.07% : 0.000071s : 1: 
tuple_transform 5.80% : 0.006057s : 1: type_inference 0.05% : 0.000057s : 1: validate TotalTime = 0.106476, [24] [bootstrap]: 0.00047267 [type_inference]: 0.00596535 [event_method]: 1.371e-05 [auto_monad]: 5.962e-05 [graph_reusing]: 5.40001e-06 [inline]: 1.94999e-06 [add_attr]: 0.00299115, [1] [add_attr_with_inline]: 0.00298317, [1] [Cycle 1]: 4.7e-05, [2] [tag_attr]: 1.455e-05 [meta_addattr_fg_expand]: 4.83001e-06 [parallel-infer-symbol]: 3.00002e-06 [pre_auto_parallel]: 2.495e-05 [insert-virtual-dataset]: 2.54999e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.0040205, [53] [py_interpret_to_execute]: 2.146e-05 [rewriter_before_opt_a]: 6.348e-05 [opt_a]: 0.00214941, [2] [Cycle 1]: 0.00153838, [45] [expand_dump_flag]: 2.83998e-06 [switch_simplify]: 3.289e-05 [loop_unroll]: 2.024e-05 [a_1]: 0.00045852 [with_stream_mark]: 1.317e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.94002e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 2.19999e-06 [a_2]: 7.997e-05 [accelerated_algorithm]: 6.58998e-06 [shard]: 2.32001e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 6.14999e-06 [merge_send_recv]: 8.27e-06 [auto_parallel]: 6.08002e-06 [parallel]: 1.793e-05 [flash_sp]: 7.36001e-06 [merge_comm]: 4.01001e-06 [allreduce_fusion]: 3.61999e-06 [matmul_add_comm_reduction]: 9.05999e-06 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 6.11e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 5.86998e-06 [merge_forward]: 3.79002e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.043e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 1.72999e-06 [before_grad]: 1.023e-05 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 2.58998e-06 [flash_sp_send_recv_attached]: 2.44999e-06 [receive_attached]: 
1.99e-06 [after_resolve]: 9.61e-06 [a_after_grad]: 8.94e-06 [renormalize]: 0.00041836 [add_forward_monad_depend]: 4.97e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.327e-05 [cse]: 2.83e-05 [a_3]: 4.215e-05 [Cycle 2]: 0.00060076, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 7.21999e-06 [loop_unroll]: 5.86998e-06 [a_1]: 0.00011349 [with_stream_mark]: 9.19e-06 [recompute_prepare]: 5.99e-06 [updatestate_depend_eliminate]: 2.95998e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 7.015e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 4.60999e-06 [auto_parallel]: 5.59998e-06 [parallel]: 4.65001e-06 [flash_sp]: 3.45e-06 [merge_comm]: 3.28998e-06 [allreduce_fusion]: 2.91e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.38e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.15001e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.90002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.042e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 8.71002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 1.76003e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 8.55999e-06 [a_after_grad]: 7.87e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.23998e-06 [cse]: 1.39e-05 [a_3]: 3.285e-05 [py_interpret_to_execute_after_opt_a]: 7.63999e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.332e-05 [convert_after_rewriter]: 6.69999e-06 [order_py_execute_after_rewriter]: 4.93001e-06 [mutable_eliminate]: 0.00045117 [opt_b]: 0.00018548, [1] [Cycle 1]: 0.00017942, [7] [b_1]: 
0.00010939 [b_2]: 7.27997e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.19999e-06 [renormalize]: 3.59985e-07 [cse]: 1.778e-05 [optimize_parallel_all_gather_comm]: 1.722e-05 [overlap_param_gather]: 2.05002e-06 [cconv]: 2.262e-05 [loop_unroll]: 0.00041798 [opt_after_cconv]: 9.644e-05, [1] [Cycle 1]: 9.045e-05, [7] [c_1]: 2.58e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.43e-06 [cse]: 1.749e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.561e-05 [tuple_transform]: 6.965e-05, [1] [Cycle 1]: 6.446e-05, [4] [d_1]: 3.68e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.72002e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 4.369e-05 [cse_after_recomputation]: 2.134e-05, [1] [Cycle 1]: 1.642e-05, [1] [cse]: 1.114e-05 [environ_conv]: 5.17e-06 [swap_dp_allreduce_reducescatter]: 5.17e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.75002e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.46998e-06 [micro_interleaved_order_control]: 2.34999e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 1.04998e-06 [remove_cast_before_assign_add]: 1.36002e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.27999e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.31002e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.24e-05 [grouped_pairwise_exchange_alltoall]: 1.42999e-06 [offloading_packed_experts]: 4.13999e-06 [overlap_recompute_and_grad_model_parallel]: 4.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.69999e-06 [overlap_grad_ring_attention]: 4.43999e-06 [overlap_grad_flash_sp]: 1.753e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 2.14999e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 7.008e-05, [1] [Cycle 1]: 6.575e-05, [6] [build]: 2.84001e-06 [elim_shapecalc]: 8.40001e-06 [elim_not_effective]: 1.182e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 9.30001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.59e-06 [pipeline_parallel_scheduler]: 1.79998e-06 [auto_monad_reorder]: 1.606e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00046224 [validate]: 3.466e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0921657 [execute]: 1.007e-05 Sums bootstrap : 0.000473s : 0.46% type_inference : 0.005965s : 5.82% event_method : 0.000014s : 0.01% auto_monad : 0.000060s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000063s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000572s : 0.56% optimize.opt_a.with_stream_mark : 0.000022s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000418s : 0.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000075s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.44% optimize.opt_b.b_1 : 0.000109s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000418s : 0.41% optimize.opt_after_cconv.c_1 : 0.000026s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000037s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000462s : 0.45% validate : 0.000035s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.092166s : 89.94% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000165 26 18.75% : 0.000031s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.84% : 0.000001s : 2: substitution.fold_const_symbol 3.16% : 0.000005s : 3: substitution.graph_param_transform 64.52% : 0.000106s : 3: substitution.inline 1.88% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000004s : 4: substitution.remove_not_recompute_node 1.71% : 0.000003s : 2: substitution.replace_old_param 5.32% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005924 2 88.70% : 0.005254s : 1: type_inference.infer 11.30% : 0.000669s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.05% : 0.000029s : 3: replace.inline 20.95% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000112 4 92.93% : 0.000104s : 3: match.inline 7.07% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.94% : 0.000001s : 9: predicate.accumulaten_eliminater 0.93% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 15: predicate.arithmetic_simplify 0.96% : 0.000002s : 9: predicate.cast_eliminate 0.75% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.59% : 0.000001s : 6: predicate.depend_value_elim 0.94% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.37% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_depend_swap 1.74% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.27% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.47% : 0.000010s : 40: predicate.inline 0.89% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 6: predicate.less_batch_normalization 1.81% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 25: predicate.load_eliminater 0.93% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.08% : 0.000002s : 3: predicate.mutable_eliminate 0.39% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.56% : 0.000002s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.20% : 0.000002s : 9: predicate.reduce_eliminate 2.49% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.29% : 0.000002s : 16: predicate.replace_applicator 0.86% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 0.76% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 43: predicate.switch_simplify 0.96% : 
0.000002s : 9: predicate.tile_eliminate 0.90% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 15: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.05% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.78% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 45.74% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.26% : 0.000192s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.114992 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.61% : 0.002996s : 1: add_attr 2.60% : 0.002987s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000065s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.44% : 0.000507s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.82% : 0.000940s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.87% : 0.002152s : 1: opt_a 0.09% : 0.000100s : 1: opt_after_cconv 0.41% : 0.000471s : 1: opt_after_jit_grad 0.16% : 0.000189s : 1: opt_b 3.50% : 0.004024s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.18% : 0.000210s : 1: renormalize.infer 0.18% : 0.000202s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.06% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000073s : 1: symbol_engine_optimizer 80.17% : 0.092191s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 5.20% : 0.005979s : 1: type_inference 0.05% : 0.000057s : 1: validate TotalTime = 0.119394, [24] [bootstrap]: 0.00046561 [type_inference]: 0.0116745 [event_method]: 4.684e-05 [auto_monad]: 0.00013072 [graph_reusing]: 8.70001e-06 [inline]: 2.09e-06 [add_attr]: 0.00312581, [1] [add_attr_with_inline]: 0.00311737, [1] [Cycle 1]: 7.545e-05, [2] [tag_attr]: 3.391e-05 [meta_addattr_fg_expand]: 9.97999e-06 [parallel-infer-symbol]: 3.19001e-06 [pre_auto_parallel]: 4.943e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.0171283, [53] [py_interpret_to_execute]: 4.142e-05 [rewriter_before_opt_a]: 0.00015479 [opt_a]: 0.0149158, [3] [Cycle 1]: 0.0114233, [45] [expand_dump_flag]: 4.24002e-06 [switch_simplify]: 7.658e-05 [loop_unroll]: 6.335e-05 [a_1]: 0.0014416 [with_stream_mark]: 2.425e-05 [recompute_prepare]: 2.217e-05 [updatestate_depend_eliminate]: 8.80001e-06 [updatestate_assign_eliminate]: 7.78001e-06 [updatestate_loads_eliminate]: 7.31999e-06 [parameter_eliminate]: 2.63e-06 [a_2]: 0.0002442 [accelerated_algorithm]: 3.116e-05 [shard]: 1.91998e-06 [meta_shard_fg_expand]: 3.42002e-06 [shard_inline]: 1.608e-05 [merge_send_recv]: 1.756e-05 [auto_parallel]: 1.163e-05 [parallel]: 1.944e-05 [flash_sp]: 1.196e-05 [merge_comm]: 9.36e-06 [allreduce_fusion]: 8.90001e-06 [matmul_add_comm_reduction]: 2.842e-05 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 1.953e-05 [virtual_dataset]: 1.575e-05 [get_grad_eliminate_]: 1.518e-05 [virtual_output]: 1.551e-05 [merge_forward]: 9.46e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.85e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.073e-05 [merge_recompute_call_nodes]: 1.61998e-06 [before_grad]: 2.902e-05 [set_forward_comm_id_for_comm_node_pass]: 9.77999e-06 [meta_fg_expand]: 0.00152786 [flash_sp_send_recv_attached]: 4.22e-06 [receive_attached]: 2.31e-06 [after_resolve]: 
6.525e-05 [a_after_grad]: 8.732e-05 [renormalize]: 0.00655358 [add_forward_monad_depend]: 9.81998e-06 [auto_monad_grad]: 6.17001e-06 [auto_monad_eliminator]: 5.291e-05 [cse]: 0.00019209 [a_3]: 0.0003435 [Cycle 2]: 0.00277213, [45] [expand_dump_flag]: 2.07001e-06 [switch_simplify]: 4.552e-05 [loop_unroll]: 4.179e-05 [a_1]: 0.0013469 [with_stream_mark]: 1.249e-05 [recompute_prepare]: 8.91997e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 1.45001e-06 [a_2]: 8.717e-05 [accelerated_algorithm]: 1.052e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 6.62002e-06 [merge_send_recv]: 7.45998e-06 [auto_parallel]: 7.63001e-06 [parallel]: 6.53998e-06 [flash_sp]: 3.88999e-06 [merge_comm]: 4.25e-06 [allreduce_fusion]: 3.56001e-06 [matmul_add_comm_reduction]: 7.18e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 7.36999e-06 [virtual_dataset]: 6.18998e-06 [get_grad_eliminate_]: 6.31e-06 [virtual_output]: 5.90002e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 8.82e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.362e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 1.103e-05 [set_forward_comm_id_for_comm_node_pass]: 4.1e-06 [meta_fg_expand]: 8.019e-05 [flash_sp_send_recv_attached]: 1.42999e-06 [receive_attached]: 1.44e-06 [after_resolve]: 1.262e-05 [a_after_grad]: 1.022e-05 [renormalize]: 0.00060773 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 2.16e-06 [auto_monad_eliminator]: 1.097e-05 [cse]: 2.162e-05 [a_3]: 4.706e-05 [Cycle 3]: 0.00070401, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 8.48999e-06 [loop_unroll]: 6.58e-06 [a_1]: 0.0001509 [with_stream_mark]: 8.78001e-06 [recompute_prepare]: 7.75e-06 [updatestate_depend_eliminate]: 3.93001e-06 [updatestate_assign_eliminate]: 2.98998e-06 [updatestate_loads_eliminate]: 2.79999e-06 
[parameter_eliminate]: 9.70002e-07 [a_2]: 9.248e-05 [accelerated_algorithm]: 1.11e-05 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 7.21001e-06 [merge_send_recv]: 6.13998e-06 [auto_parallel]: 6.34001e-06 [parallel]: 5.00001e-06 [flash_sp]: 9.40025e-07 [merge_comm]: 3.83999e-06 [allreduce_fusion]: 3.45998e-06 [matmul_add_comm_reduction]: 6.04001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 7.75e-06 [virtual_dataset]: 6.33e-06 [get_grad_eliminate_]: 6.16e-06 [virtual_output]: 6.54999e-06 [merge_forward]: 2.92002e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 6.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.382e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.073e-05 [set_forward_comm_id_for_comm_node_pass]: 3.81999e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 8.79e-06 [a_after_grad]: 9.70002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 7.43e-06 [cse]: 1.654e-05 [a_3]: 3.911e-05 [py_interpret_to_execute_after_opt_a]: 1.145e-05 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 4.116e-05 [convert_after_rewriter]: 7.76001e-06 [order_py_execute_after_rewriter]: 6.09001e-06 [mutable_eliminate]: 0.00054559 [opt_b]: 0.00022311, [1] [Cycle 1]: 0.0002156, [7] [b_1]: 0.00013894 [b_2]: 8.65999e-06 [updatestate_depend_eliminate]: 5.75001e-06 [updatestate_assign_eliminate]: 2.84999e-06 [updatestate_loads_eliminate]: 2.56e-06 [renormalize]: 3.10014e-07 [cse]: 2.115e-05 [optimize_parallel_all_gather_comm]: 1.73e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.671e-05 [loop_unroll]: 0.00043138 [opt_after_cconv]: 0.00011268, [1] [Cycle 1]: 0.00010622, [7] [c_1]: 3.41e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 6.19999e-06 [updatestate_assign_eliminate]: 3.25e-06 
[updatestate_loads_eliminate]: 2.94999e-06 [cse]: 2.132e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.577e-05 [tuple_transform]: 7.874e-05, [1] [Cycle 1]: 7.381e-05, [4] [d_1]: 4.581e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 7.38e-06 [partial_unused_args_eliminate]: 2.14999e-06 [add_recomputation]: 4.96e-05 [cse_after_recomputation]: 2.553e-05, [1] [Cycle 1]: 2.086e-05, [1] [cse]: 1.498e-05 [environ_conv]: 8.77e-06 [swap_dp_allreduce_reducescatter]: 5.65001e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.49999e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.78998e-06 [reorder_send_recv_between_fp_bp]: 3.06001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.63002e-06 [control_data_broadcast_order]: 1.39e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4.48001e-06 [overlap_recompute_and_grad_model_parallel]: 5.10001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.02001e-06 [overlap_grad_ring_attention]: 4.47998e-06 [overlap_grad_flash_sp]: 2.199e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 8.75e-05, [1] [Cycle 1]: 8.271e-05, [6] [build]: 9.66e-06 [elim_shapecalc]: 1.05e-05 [elim_not_effective]: 1.444e-05 [opt_reshape]: 7.59002e-06 [fold_const_symbol]: 1.145e-05 [renormalize]: 
3.10014e-07 [detach_backward]: 2.06998e-06 [pipeline_parallel_scheduler]: 1.81003e-06 [auto_monad_reorder]: 2.154e-05 [get_jit_bprop_graph]: 1.34998e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00046275 [validate]: 4.242e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0859794 [execute]: 9.44998e-06 Sums bootstrap : 0.000466s : 0.41% type_inference : 0.011675s : 10.16% event_method : 0.000047s : 0.04% auto_monad : 0.000131s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.04% optimize.rewriter_before_opt_a : 0.000155s : 0.13% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.11% optimize.opt_a.loop_unroll : 0.000112s : 0.10% optimize.opt_a.a_1 : 0.002939s : 2.56% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000039s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000424s : 0.37% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000031s : 0.03% optimize.opt_a.flash_sp : 0.000017s : 0.01% 
optimize.opt_a.merge_comm : 0.000017s : 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.03% optimize.opt_a.virtual_dataset : 0.000028s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.02% optimize.opt_a.virtual_output : 0.000028s : 0.02% optimize.opt_a.merge_forward : 0.000016s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000034s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000058s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000051s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001610s : 1.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.08% optimize.opt_a.a_after_grad : 0.000107s : 0.09% optimize.opt_a.renormalize : 0.007161s : 6.24% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000071s : 0.06% optimize.opt_a.cse : 0.000230s : 0.20% optimize.opt_a.a_3 : 0.000430s : 0.37% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000546s : 0.48% optimize.opt_b.b_1 : 0.000139s : 0.12% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.02% optimize.loop_unroll : 0.000431s : 0.38% optimize.opt_after_cconv.c_1 : 0.000034s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.01% optimize.tuple_transform.d_1 : 0.000046s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.04% optimize.cse_after_recomputation.cse : 0.000015s : 0.01% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000463s : 0.40% validate : 0.000042s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.085979s : 74.86% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000715 161 7.12% : 0.000051s : 8: substitution.arithmetic_simplify 0.33% : 0.000002s : 3: substitution.elim_not_effective 0.65% : 0.000005s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.87% : 0.000006s : 4: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.35% : 0.000003s : 2: substitution.incorporate_call_switch 57.81% : 0.000413s : 17: substitution.inline 2.35% : 0.000017s : 2: substitution.inline_without_move 1.42% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.23% : 0.000016s : 3: substitution.less_batch_normalization 1.41% : 0.000010s : 7: substitution.minmaximum_grad 0.92% : 0.000007s : 5: substitution.partial_eliminate 1.87% : 0.000013s : 15: substitution.remove_not_recompute_node 3.81% : 0.000027s : 10: substitution.replace_applicator 1.31% : 0.000009s : 10: substitution.replace_old_param 0.42% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.04% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.92% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.38% : 0.000053s : 19: substitution.tuple_list_get_item_eliminator 2.04% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011596 2 86.24% : 0.010000s : 1: type_inference.infer 13.76% : 0.001596s : 1: type_inference.specialize ------[replace.] 0.000199 27 63.81% : 0.000127s : 17: replace.inline 36.19% : 0.000072s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000430 27 93.80% : 0.000404s : 17: match.inline 6.20% : 0.000027s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000699 4248 1.14% : 0.000008s : 53: predicate.accumulaten_eliminater 0.23% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.10% : 0.000008s : 53: predicate.addn_zero_filter 1.09% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.95% : 0.000014s : 74: predicate.arithmetic_simplify 1.16% : 0.000008s : 53: predicate.cast_eliminate 1.13% : 0.000008s : 50: predicate.check_bprop_eliminate 0.47% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.45% : 0.000003s : 21: predicate.depend_value_elim 1.15% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.19% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_depend_swap 1.71% : 0.000012s : 78: predicate.environ_get_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.83% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.54% : 0.000018s : 80: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.60% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.53% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.53% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.80% : 0.000041s : 183: predicate.inline 1.41% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.61% : 0.000004s : 21: predicate.less_batch_normalization 1.56% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.67% : 0.000019s : 124: predicate.load_eliminater 0.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.54% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 61: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.08% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 53: predicate.minmaximum_grad 0.32% : 0.000002s : 4: predicate.mutable_eliminate 0.12% : 0.000001s : 4: predicate.opt_reshape 0.10% : 0.000001s : 4: predicate.parallel_virtual_node 2.10% : 0.000015s : 80: predicate.partial_defer_inline 1.75% : 0.000012s : 67: predicate.partial_eliminate 1.12% : 0.000008s : 53: predicate.print_const_string_wrapper 0.49% : 0.000003s : 21: predicate.reduce_all_const_elim 1.44% : 0.000010s : 53: predicate.reduce_eliminate 2.64% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.90% : 0.000013s : 113: predicate.replace_applicator 0.70% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000008s : 53: predicate.reshape_eliminate 1.15% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.26% : 0.000009s : 50: predicate.same_eliminate 0.33% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.56% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.63% : 0.000004s : 21: predicate.specialize_transform 1.29% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.98% : 0.000014s : 80: predicate.switch_defer_inline 3.08% : 0.000022s : 130: predicate.switch_layer_defer_inline 5.32% : 0.000037s : 218: 
predicate.switch_simplify 1.15% : 0.000008s : 53: predicate.tile_eliminate 1.12% : 0.000008s : 53: predicate.transpose_eliminate 1.40% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000009s : 61: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000019s : 92: predicate.tuple_list_get_item_eliminator 1.42% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.58% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.62% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.20% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 4: predicate.value_based_eliminate 0.50% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000003s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001806 36 60.27% : 0.001088s : 15: func_graph_cloner_run.FuncGraphClonerGraph 39.73% : 0.000718s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.151445 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.07% : 0.003131s : 1: add_attr 2.06% : 0.003121s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000138s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.33% : 0.000502s : 1: bootstrap 0.02% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000053s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.29% : 0.000439s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000555s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 2.93% : 0.004439s : 117: opt.transform.opt_a 0.02% : 0.000032s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000117s : 28: opt.transform.opt_b 0.03% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000040s : 4: opt.transform.symbol_engine_opt 9.85% : 0.014919s : 1: opt_a 0.08% : 0.000116s : 1: opt_after_cconv 0.31% : 0.000472s : 1: opt_after_jit_grad 0.15% : 0.000227s : 1: opt_b 11.31% : 0.017133s : 1: optimize 0.01% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000046s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000020s : 1: remove_dup_value 3.66% : 0.005550s : 2: renormalize.infer 1.05% : 0.001596s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000045s : 1: rewriter_after_opt_a 0.11% : 0.000159s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000090s : 1: symbol_engine_optimizer 56.79% : 0.086003s : 1: task_emit 0.05% : 0.000082s : 1: 
tuple_transform 7.72% : 0.011692s : 1: type_inference 0.04% : 0.000066s : 1: validate TotalTime = 0.102221, [24] [bootstrap]: 0.00047263 [type_inference]: 0.00580032 [event_method]: 1.246e-05 [auto_monad]: 5.959e-05 [graph_reusing]: 5.76e-06 [inline]: 2.17999e-06 [add_attr]: 0.00300618, [1] [add_attr_with_inline]: 0.00299717, [1] [Cycle 1]: 4.584e-05, [2] [tag_attr]: 1.425e-05 [meta_addattr_fg_expand]: 4.12e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.327e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 8.49977e-07 [dataset_repeat_opt]: 1.93997e-06 [pipeline_split]: 1.83002e-06 [optimize]: 0.00389732, [53] [py_interpret_to_execute]: 1.868e-05 [rewriter_before_opt_a]: 5.101e-05 [opt_a]: 0.00199822, [2] [Cycle 1]: 0.00139292, [45] [expand_dump_flag]: 2.78998e-06 [switch_simplify]: 2.875e-05 [loop_unroll]: 1.721e-05 [a_1]: 0.00035335 [with_stream_mark]: 1.451e-05 [recompute_prepare]: 7.56001e-06 [updatestate_depend_eliminate]: 3.58999e-06 [updatestate_assign_eliminate]: 3.76999e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.966e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 8.72e-06 [auto_parallel]: 5.99999e-06 [parallel]: 1.898e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.80998e-06 [allreduce_fusion]: 3.96001e-06 [matmul_add_comm_reduction]: 9.70002e-06 [allreduce_slice_to_reducescatter]: 8.2e-07 [virtual_shard_identity]: 8.17e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 5.44998e-06 [merge_forward]: 3.87002e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.20999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.138e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 1.014e-05 [set_forward_comm_id_for_comm_node_pass]: 3.62002e-06 [meta_fg_expand]: 2.50002e-06 [flash_sp_send_recv_attached]: 2.86999e-06 [receive_attached]: 2.39999e-06 
[after_resolve]: 9.76998e-06 [a_after_grad]: 8.52998e-06 [renormalize]: 0.00038673 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.332e-05 [cse]: 2.939e-05 [a_3]: 4.106e-05 [Cycle 2]: 0.00059519, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00011271 [with_stream_mark]: 9.59e-06 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 3.03e-06 [updatestate_assign_eliminate]: 2.53003e-06 [updatestate_loads_eliminate]: 2.65002e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.981e-05 [accelerated_algorithm]: 5.61003e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 4.57998e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.57e-06 [flash_sp]: 3.06999e-06 [merge_comm]: 3.20998e-06 [allreduce_fusion]: 3.04001e-06 [matmul_add_comm_reduction]: 5.20999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 5.93002e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 4.99998e-06 [merge_forward]: 2.72001e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 6.45997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.056e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.85999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 7.93001e-06 [a_after_grad]: 7.56999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.321e-05 [a_3]: 3.22e-05 [py_interpret_to_execute_after_opt_a]: 7.18998e-06 [slice_cell_reuse_recomputed_activation]: 2.08002e-06 [rewriter_after_opt_a]: 3.351e-05 [convert_after_rewriter]: 6.65002e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.00049731 [opt_b]: 0.00018515, [1] [Cycle 1]: 
0.00017864, [7] [b_1]: 0.00010791 [b_2]: 7.18e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 3.89991e-07 [cse]: 1.764e-05 [optimize_parallel_all_gather_comm]: 1.585e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.463e-05 [loop_unroll]: 0.00041692 [opt_after_cconv]: 9.59e-05, [1] [Cycle 1]: 8.968e-05, [7] [c_1]: 2.547e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 5.31998e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.748e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.464e-05 [tuple_transform]: 6.951e-05, [1] [Cycle 1]: 6.464e-05, [4] [d_1]: 3.697e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 6.66e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.473e-05 [cse_after_recomputation]: 2.15e-05, [1] [Cycle 1]: 1.662e-05, [1] [cse]: 1.109e-05 [environ_conv]: 5.56e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 4.85999e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.66999e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.91999e-06 [comm_op_add_attrs]: 1.18001e-06 [add_comm_op_reuse_tag]: 1.20001e-06 [interleave_split_concat_branches]: 1.28002e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.223e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 4.82998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 4.30999e-06 [overlap_grad_flash_sp]: 1.626e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.11002e-06 [symbol_engine_optimizer]: 7.191e-05, [1] [Cycle 1]: 6.729e-05, [6] [build]: 2.59999e-06 [elim_shapecalc]: 8.74e-06 [elim_not_effective]: 1.167e-05 [opt_reshape]: 6.42001e-06 [fold_const_symbol]: 9.49999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.624e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00045542 [validate]: 3.55e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0881921 [execute]: 9.85002e-06 Sums bootstrap : 0.000473s : 0.48% type_inference : 0.005800s : 5.91% event_method : 0.000012s : 0.01% auto_monad : 0.000060s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000051s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000036s : 0.04% optimize.opt_a.loop_unroll : 0.000023s : 0.02% optimize.opt_a.a_1 : 0.000466s : 0.47% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000149s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000387s : 0.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000043s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000497s : 0.51% optimize.opt_b.b_1 : 0.000108s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.03% optimize.loop_unroll : 0.000417s : 0.42% optimize.opt_after_cconv.c_1 : 0.000025s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.01% optimize.tuple_transform.d_1 : 0.000037s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000455s : 0.46% validate : 0.000036s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.088192s : 89.80% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000141 24 19.81% : 0.000028s : 4: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000001s : 2: substitution.fold_const_symbol 3.62% : 0.000005s : 3: substitution.graph_param_transform 66.77% : 0.000094s : 3: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.17% : 0.000004s : 4: substitution.remove_not_recompute_node 2.00% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005756 2 91.89% : 0.005289s : 1: type_inference.infer 8.11% : 0.000467s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000092 3 100.00% : 0.000092s : 3: match.inline ------[predicate.] 
0.000146 815 1.10% : 0.000002s : 8: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 8: predicate.addn_zero_filter 0.80% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 14: predicate.arithmetic_simplify 0.89% : 0.000001s : 8: predicate.cast_eliminate 0.73% : 0.000001s : 6: predicate.check_bprop_eliminate 0.65% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.66% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_depend_swap 1.85% : 0.000003s : 17: predicate.environ_get_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.24% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 6: predicate.float_environ_get_switch 0.93% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.76% : 0.000001s : 6: predicate.incorporate_call 0.66% : 0.000001s : 6: predicate.incorporate_call_switch 6.14% : 0.000009s : 37: predicate.inline 1.00% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 6: predicate.less_batch_normalization 1.59% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.11% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 8: predicate.minmaximum_grad 1.12% : 0.000002s : 3: predicate.mutable_eliminate 0.43% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.41% : 0.000002s : 11: predicate.partial_defer_inline 1.33% : 0.000002s : 11: predicate.partial_eliminate 1.04% : 0.000002s : 8: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.41% : 0.000002s : 8: predicate.reduce_eliminate 2.26% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 6: predicate.remove_not_recompute_node 1.24% : 0.000002s : 14: predicate.replace_applicator 0.82% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 8: predicate.reshape_eliminate 0.65% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 6: predicate.shard_identity_eliminate 0.79% : 0.000001s : 6: predicate.special_op_eliminate 0.94% : 0.000001s : 6: predicate.specialize_transform 1.06% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 1.83% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.92% : 0.000007s : 38: predicate.switch_simplify 0.83% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.58% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.00% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.40% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.25% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000320 7 46.35% : 0.000148s : 2: func_graph_cloner_run.FuncGraphClonerGraph 53.65% : 0.000172s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.110479 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.72% : 0.003010s : 1: add_attr 2.72% : 0.003001s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000065s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.46% : 0.000510s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.39% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000506s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.75% : 0.000823s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000020s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000088s : 28: opt.transform.opt_b 0.04% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.81% : 0.002001s : 1: opt_a 0.09% : 0.000099s : 1: opt_after_cconv 0.42% : 0.000464s : 1: opt_after_jit_grad 0.17% : 0.000188s : 1: opt_b 3.53% : 0.003901s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000022s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.18% : 0.000201s : 1: renormalize.infer 0.16% : 0.000179s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000055s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000075s : 1: symbol_engine_optimizer 79.85% : 0.088215s : 1: task_emit 0.07% : 0.000072s : 1: 
tuple_transform 5.26% : 0.005815s : 1: type_inference 0.05% : 0.000057s : 1: validate TotalTime = 0.118277, [24] [bootstrap]: 0.00048937 [type_inference]: 0.0120661 [event_method]: 4.669e-05 [auto_monad]: 0.00013049 [graph_reusing]: 8.87999e-06 [inline]: 2.15002e-06 [add_attr]: 0.00313665, [1] [add_attr_with_inline]: 0.00312838, [1] [Cycle 1]: 7.36e-05, [2] [tag_attr]: 3.35e-05 [meta_addattr_fg_expand]: 1.066e-05 [parallel-infer-symbol]: 2.96999e-06 [pre_auto_parallel]: 4.808e-05 [insert-virtual-dataset]: 2.73003e-06 [parallel-infer-symbol-second]: 8.60018e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.82999e-06 [optimize]: 0.0171809, [53] [py_interpret_to_execute]: 4.031e-05 [rewriter_before_opt_a]: 0.00015014 [opt_a]: 0.0149024, [3] [Cycle 1]: 0.011208, [45] [expand_dump_flag]: 3.73999e-06 [switch_simplify]: 7.669e-05 [loop_unroll]: 6.268e-05 [a_1]: 0.00140951 [with_stream_mark]: 2.373e-05 [recompute_prepare]: 2.307e-05 [updatestate_depend_eliminate]: 9.11998e-06 [updatestate_assign_eliminate]: 7.77e-06 [updatestate_loads_eliminate]: 7.14001e-06 [parameter_eliminate]: 2.93e-06 [a_2]: 0.00025006 [accelerated_algorithm]: 3.186e-05 [shard]: 1.77999e-06 [meta_shard_fg_expand]: 3.74002e-06 [shard_inline]: 1.633e-05 [merge_send_recv]: 1.737e-05 [auto_parallel]: 1.096e-05 [parallel]: 1.83e-05 [flash_sp]: 1.069e-05 [merge_comm]: 1.009e-05 [allreduce_fusion]: 9.00001e-06 [matmul_add_comm_reduction]: 2.677e-05 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 1.825e-05 [virtual_dataset]: 1.608e-05 [get_grad_eliminate_]: 1.551e-05 [virtual_output]: 1.529e-05 [merge_forward]: 1.023e-05 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 1.807e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.01e-05 [merge_recompute_call_nodes]: 1.62999e-06 [before_grad]: 2.975e-05 [set_forward_comm_id_for_comm_node_pass]: 1.022e-05 [meta_fg_expand]: 0.00161717 [flash_sp_send_recv_attached]: 3.88001e-06 [receive_attached]: 2.91999e-06 
[after_resolve]: 6.673e-05 [a_after_grad]: 9.084e-05 [renormalize]: 0.00629058 [add_forward_monad_depend]: 9.62001e-06 [auto_monad_grad]: 5.62001e-06 [auto_monad_eliminator]: 5.4e-05 [cse]: 0.00019518 [a_3]: 0.00037151 [Cycle 2]: 0.00296873, [45] [expand_dump_flag]: 1.81e-06 [switch_simplify]: 5.014e-05 [loop_unroll]: 4.741e-05 [a_1]: 0.00150634 [with_stream_mark]: 1.249e-05 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 4.21001e-06 [updatestate_assign_eliminate]: 3.07002e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 9.047e-05 [accelerated_algorithm]: 1.108e-05 [shard]: 1.71998e-06 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 7.36001e-06 [merge_send_recv]: 6.71999e-06 [auto_parallel]: 7.34002e-06 [parallel]: 5.92001e-06 [flash_sp]: 4.16001e-06 [merge_comm]: 4.80999e-06 [allreduce_fusion]: 3.91999e-06 [matmul_add_comm_reduction]: 6.75998e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 9.80002e-06 [virtual_dataset]: 7.77002e-06 [get_grad_eliminate_]: 7.18e-06 [virtual_output]: 6.63e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 8.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.415e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 1.199e-05 [set_forward_comm_id_for_comm_node_pass]: 4.23001e-06 [meta_fg_expand]: 5.791e-05 [flash_sp_send_recv_attached]: 9.99979e-07 [receive_attached]: 1.35999e-06 [after_resolve]: 1.285e-05 [a_after_grad]: 1.046e-05 [renormalize]: 0.0006601 [add_forward_monad_depend]: 4.59002e-06 [auto_monad_grad]: 1.96e-06 [auto_monad_eliminator]: 1.328e-05 [cse]: 2.482e-05 [a_3]: 5.035e-05 [Cycle 3]: 0.00070871, [45] [expand_dump_flag]: 1.30001e-06 [switch_simplify]: 8.42998e-06 [loop_unroll]: 6.86001e-06 [a_1]: 0.00015245 [with_stream_mark]: 8.79e-06 [recompute_prepare]: 7.21999e-06 [updatestate_depend_eliminate]: 4.12e-06 [updatestate_assign_eliminate]: 3.43999e-06 [updatestate_loads_eliminate]: 
2.94001e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 8.784e-05 [accelerated_algorithm]: 1.016e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 6.85998e-06 [merge_send_recv]: 5.86e-06 [auto_parallel]: 7.6e-06 [parallel]: 5.07e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 4.06001e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 6.44001e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 7.63001e-06 [virtual_dataset]: 6.54001e-06 [get_grad_eliminate_]: 6.18998e-06 [virtual_output]: 6.16998e-06 [merge_forward]: 3.83999e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.391e-05 [merge_recompute_call_nodes]: 9.89996e-07 [before_grad]: 1.108e-05 [set_forward_comm_id_for_comm_node_pass]: 3.90998e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.96998e-06 [a_after_grad]: 9.64999e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.34998e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 8.68001e-06 [cse]: 1.856e-05 [a_3]: 4.074e-05 [py_interpret_to_execute_after_opt_a]: 1.13e-05 [slice_cell_reuse_recomputed_activation]: 1.94999e-06 [rewriter_after_opt_a]: 4.31e-05 [convert_after_rewriter]: 8.15e-06 [order_py_execute_after_rewriter]: 5.74999e-06 [mutable_eliminate]: 0.00056175 [opt_b]: 0.00022626, [1] [Cycle 1]: 0.00021877, [7] [b_1]: 0.00013653 [b_2]: 8.84e-06 [updatestate_depend_eliminate]: 6.57002e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 2.73998e-06 [renormalize]: 4.69998e-07 [cse]: 2.353e-05 [optimize_parallel_all_gather_comm]: 2.631e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.381e-05 [loop_unroll]: 0.0004458 [opt_after_cconv]: 0.00011585, [1] [Cycle 1]: 0.0001093, [7] [c_1]: 3.424e-05 [parameter_eliminate]: 2.94999e-06 [updatestate_depend_eliminate]: 6.43e-06 [updatestate_assign_eliminate]: 3.3e-06 
[updatestate_loads_eliminate]: 2.96001e-06 [cse]: 2.29e-05 [renormalize]: 6.00005e-07 [remove_dup_value]: 1.633e-05 [tuple_transform]: 8.142e-05, [1] [Cycle 1]: 7.673e-05, [4] [d_1]: 4.854e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 7.31001e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.496e-05 [cse_after_recomputation]: 2.582e-05, [1] [Cycle 1]: 2.132e-05, [1] [cse]: 1.546e-05 [environ_conv]: 9.29e-06 [swap_dp_allreduce_reducescatter]: 6.19001e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.85001e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.14999e-06 [micro_interleaved_order_control]: 2.43002e-06 [assign_add_opt]: 1.43002e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.49001e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.35001e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.398e-05 [grouped_pairwise_exchange_alltoall]: 1.99e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.66e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 5.14998e-06 [overlap_grad_flash_sp]: 2.236e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.29999e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.05999e-06 [symbol_engine_optimizer]: 9.515e-05, [1] [Cycle 1]: 9.016e-05, [6] [build]: 1.028e-05 [elim_shapecalc]: 1.182e-05 [elim_not_effective]: 1.698e-05 [opt_reshape]: 7.3e-06 [fold_const_symbol]: 1.183e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 2.03002e-06 [pipeline_parallel_scheduler]: 1.92999e-06 [auto_monad_reorder]: 2.167e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.90998e-06 [opt_after_jit_grad]: 0.00049668 [validate]: 4.486e-05 [backend_pass]: 8.29983e-07 [task_emit]: 0.0843481 [execute]: 9.22001e-06 Sums bootstrap : 0.000489s : 0.43% type_inference : 0.012066s : 10.60% event_method : 0.000047s : 0.04% auto_monad : 0.000130s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.04% optimize.rewriter_before_opt_a : 0.000150s : 0.13% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000135s : 0.12% optimize.opt_a.loop_unroll : 0.000117s : 0.10% optimize.opt_a.a_1 : 0.003068s : 2.70% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000040s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000428s : 0.38% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000031s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000019s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.03% optimize.opt_a.virtual_dataset : 0.000030s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000029s : 0.03% optimize.opt_a.virtual_output : 0.000028s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000058s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000053s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001678s : 1.47% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.10% optimize.opt_a.renormalize : 0.006951s : 6.11% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000076s : 0.07% optimize.opt_a.cse : 0.000239s : 0.21% optimize.opt_a.a_3 : 0.000463s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000562s : 0.49% optimize.opt_b.b_1 : 0.000137s : 0.12% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000024s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000446s : 0.39% optimize.opt_after_cconv.c_1 : 0.000034s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000023s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.01% optimize.tuple_transform.d_1 : 0.000049s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.05% optimize.cse_after_recomputation.cse : 0.000015s : 0.01% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000497s : 0.44% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.084348s : 74.13% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000724 159 6.92% : 0.000050s : 7: substitution.arithmetic_simplify 0.34% : 0.000002s : 3: substitution.elim_not_effective 0.69% : 0.000005s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.90% : 0.000007s : 4: substitution.graph_param_transform 0.44% : 0.000003s : 2: substitution.incorporate_call 0.35% : 0.000003s : 2: substitution.incorporate_call_switch 57.93% : 0.000420s : 17: substitution.inline 2.38% : 0.000017s : 2: substitution.inline_without_move 1.36% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.24% : 0.000016s : 3: substitution.less_batch_normalization 1.58% : 0.000011s : 7: substitution.minmaximum_grad 0.86% : 0.000006s : 5: substitution.partial_eliminate 1.75% : 0.000013s : 15: substitution.remove_not_recompute_node 3.96% : 0.000029s : 10: substitution.replace_applicator 1.50% : 0.000011s : 10: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.01% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.49% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.95% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.14% : 0.000052s : 18: substitution.tuple_list_get_item_eliminator 2.05% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011991 2 86.10% : 0.010325s : 1: type_inference.infer 13.90% : 0.001667s : 1: type_inference.specialize ------[replace.] 0.000208 26 65.06% : 0.000135s : 17: replace.inline 34.94% : 0.000073s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000434 26 94.38% : 0.000410s : 17: match.inline 5.62% : 0.000024s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000776 4180 1.01% : 0.000008s : 52: predicate.accumulaten_eliminater 0.21% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.42% : 0.000003s : 21: predicate.addn_check_dump 1.03% : 0.000008s : 52: predicate.addn_zero_filter 0.98% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 1.90% : 0.000015s : 73: predicate.arithmetic_simplify 1.05% : 0.000008s : 52: predicate.cast_eliminate 1.09% : 0.000008s : 50: predicate.check_bprop_eliminate 0.42% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.42% : 0.000003s : 21: predicate.depend_value_elim 1.10% : 0.000009s : 52: predicate.dict_get_item_const_eliminator 1.08% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.04% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 4: predicate.elim_not_effective 0.13% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000009s : 56: predicate.environ_add_const_eliminate 1.08% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.08% : 0.000008s : 56: predicate.environ_get_depend_swap 1.50% : 0.000012s : 77: predicate.environ_get_eliminate 1.11% : 0.000009s : 56: predicate.environ_get_set_eliminate 1.64% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.32% : 0.000018s : 78: predicate.float_depend_g_call 0.41% : 0.000003s : 21: predicate.float_environ_get_switch 0.53% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.50% : 0.000004s : 21: predicate.get_grad_eliminate 0.06% : 0.000000s : 4: predicate.graph_param_transform 0.45% : 0.000004s : 21: predicate.incorporate_call 0.41% : 0.000003s : 21: predicate.incorporate_call_switch 5.30% : 0.000041s : 180: predicate.inline 1.27% : 0.000010s : 45: predicate.inline_without_move 0.26% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.56% : 0.000004s : 21: predicate.less_batch_normalization 9.72% : 
0.000075s : 69: predicate.list_to_tuple_eliminator_ 2.38% : 0.000019s : 121: predicate.load_eliminater 0.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.40% : 0.000019s : 110: predicate.loop_unroll_before_grad 1.22% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.43% : 0.000003s : 21: predicate.merge_addn 1.03% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.04% : 0.000008s : 52: predicate.minmaximum_grad 0.31% : 0.000002s : 4: predicate.mutable_eliminate 0.09% : 0.000001s : 4: predicate.opt_reshape 0.14% : 0.000001s : 4: predicate.parallel_virtual_node 1.92% : 0.000015s : 78: predicate.partial_defer_inline 1.56% : 0.000012s : 65: predicate.partial_eliminate 1.03% : 0.000008s : 52: predicate.print_const_string_wrapper 0.44% : 0.000003s : 21: predicate.reduce_all_const_elim 1.31% : 0.000010s : 52: predicate.reduce_eliminate 2.38% : 0.000019s : 121: predicate.redundant_stop_gradient_eliminater 0.28% : 0.000002s : 21: predicate.remove_not_recompute_node 1.72% : 0.000013s : 111: predicate.replace_applicator 0.60% : 0.000005s : 45: predicate.replace_old_param 0.06% : 0.000000s : 4: predicate.reset_defer_inline 1.06% : 0.000008s : 52: predicate.reshape_eliminate 1.12% : 0.000009s : 50: predicate.row_tensor_add_zeros_like 0.10% : 0.000001s : 4: predicate.row_tensor_eliminate 1.23% : 0.000010s : 50: predicate.same_eliminate 0.30% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.53% : 0.000004s : 21: predicate.shard_identity_eliminate 0.24% : 0.000002s : 8: predicate.special_op_eliminate 0.55% : 0.000004s : 21: predicate.specialize_transform 1.17% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.10% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 78: predicate.switch_defer_inline 2.88% : 0.000022s : 128: predicate.switch_layer_defer_inline 4.81% : 0.000037s : 213: 
predicate.switch_simplify 1.03% : 0.000008s : 52: predicate.tile_eliminate 1.01% : 0.000008s : 52: predicate.transpose_eliminate 1.30% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.42% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.26% : 0.000010s : 60: predicate.tuple_list_get_item_depend_reorder 2.51% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.35% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 1.91% : 0.000015s : 81: predicate.tuple_list_set_item_eliminator 1.45% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.34% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 2.81% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.10% : 0.000001s : 4: predicate.value_based_eliminate 0.46% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.45% : 0.000004s : 21: predicate.virtual_output_eliminate 0.08% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001960 35 57.20% : 0.001121s : 14: func_graph_cloner_run.FuncGraphClonerGraph 42.80% : 0.000839s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.150373 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.09% : 0.003141s : 1: add_attr 2.08% : 0.003132s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000138s : 1: auto_monad 0.02% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.35% : 0.000525s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000029s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000054s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.30% : 0.000454s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000572s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 3.08% : 0.004628s : 117: opt.transform.opt_a 0.02% : 0.000033s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000116s : 28: opt.transform.opt_b 0.04% : 0.000054s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000043s : 4: opt.transform.symbol_engine_opt 9.91% : 0.014905s : 1: opt_a 0.08% : 0.000119s : 1: opt_after_cconv 0.34% : 0.000507s : 1: opt_after_jit_grad 0.15% : 0.000230s : 1: opt_b 11.43% : 0.017186s : 1: optimize 0.02% : 0.000031s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000053s : 1: pre_auto_parallel 0.03% : 0.000045s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000020s : 1: remove_dup_value 3.50% : 0.005260s : 2: renormalize.infer 1.11% : 0.001675s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000047s : 1: rewriter_after_opt_a 0.10% : 0.000155s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000098s : 1: symbol_engine_optimizer 56.11% : 0.084372s : 1: task_emit 0.06% : 0.000084s : 1: 
tuple_transform 8.03% : 0.012081s : 1: type_inference 0.05% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x2-ge],max_mem:12.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x3-pynative],max_mem:12.0M TotalTime = 0.0226829, [24] [bootstrap]: 0.00060133 [type_inference]: 0.00664808 [event_method]: 1.43e-05 [auto_monad]: 6.183e-05 [graph_reusing]: 5.24e-06 [inline]: 1.84e-06 [add_attr]: 0.00365205, [1] [add_attr_with_inline]: 0.00364125, [1] [Cycle 1]: 4.803e-05, [2] [tag_attr]: 1.536e-05 [meta_addattr_fg_expand]: 5.09e-06 [parallel-infer-symbol]: 2.97002e-06 [pre_auto_parallel]: 2.695e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.38998e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00415682, [53] [py_interpret_to_execute]: 2.129e-05 [rewriter_before_opt_a]: 6.346e-05 [opt_a]: 0.00221513, [2] [Cycle 1]: 0.00159683, [45] [expand_dump_flag]: 2.58e-06 [switch_simplify]: 3.297e-05 [loop_unroll]: 2.096e-05 [a_1]: 0.00045309 [with_stream_mark]: 1.406e-05 [recompute_prepare]: 9.04e-06 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 3.71001e-06 [updatestate_loads_eliminate]: 3.4e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 8.14e-05 [accelerated_algorithm]: 6.64999e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.83002e-06 [shard_inline]: 6.54001e-06 [merge_send_recv]: 8.38999e-06 [auto_parallel]: 6.14001e-06 [parallel]: 2.701e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.97002e-06 [allreduce_fusion]: 3.78999e-06 [matmul_add_comm_reduction]: 9.42001e-06 [allreduce_slice_to_reducescatter]: 6.70028e-07 [virtual_shard_identity]: 7.56999e-06 [virtual_dataset]: 6.58e-06 [get_grad_eliminate_]: 5.77001e-06 
[virtual_output]: 6.06e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.182e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 1.025e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85998e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 2.91e-06 [receive_attached]: 2.56e-06 [after_resolve]: 9.39e-06 [a_after_grad]: 9.07001e-06 [renormalize]: 0.00045579 [add_forward_monad_depend]: 8.40999e-06 [auto_monad_grad]: 1.94e-06 [auto_monad_eliminator]: 1.422e-05 [cse]: 2.876e-05 [a_3]: 4.214e-05 [Cycle 2]: 0.00060902, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.10002e-06 [loop_unroll]: 5.81e-06 [a_1]: 0.00011626 [with_stream_mark]: 1.036e-05 [recompute_prepare]: 5.90002e-06 [updatestate_depend_eliminate]: 3.05998e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 7.149e-05 [accelerated_algorithm]: 5.87999e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.67999e-06 [merge_send_recv]: 4.57998e-06 [auto_parallel]: 5.50001e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.52999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.78998e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.39998e-06 [virtual_output]: 5.13002e-06 [merge_forward]: 2.93e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.041e-05 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 8.83001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 1.77999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.48999e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.09e-06 
[auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 6.17001e-06 [cse]: 1.754e-05 [a_3]: 3.33e-05 [py_interpret_to_execute_after_opt_a]: 8.23001e-06 [slice_cell_reuse_recomputed_activation]: 2.39001e-06 [rewriter_after_opt_a]: 3.098e-05 [convert_after_rewriter]: 6.94001e-06 [order_py_execute_after_rewriter]: 5.48002e-06 [mutable_eliminate]: 0.00046897 [opt_b]: 0.00019131, [1] [Cycle 1]: 0.00018524, [7] [b_1]: 0.00011097 [b_2]: 7.15998e-06 [updatestate_depend_eliminate]: 5.71003e-06 [updatestate_assign_eliminate]: 2.91999e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 4.50003e-07 [cse]: 1.861e-05 [optimize_parallel_all_gather_comm]: 1.61e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.311e-05 [loop_unroll]: 0.00043378 [opt_after_cconv]: 9.835e-05, [1] [Cycle 1]: 9.27e-05, [7] [c_1]: 2.642e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.37999e-06 [updatestate_assign_eliminate]: 2.76e-06 [updatestate_loads_eliminate]: 2.67001e-06 [cse]: 1.79e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.51e-05 [tuple_transform]: 6.815e-05, [1] [Cycle 1]: 6.385e-05, [4] [d_1]: 3.685e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 7.11001e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 4.977e-05 [cse_after_recomputation]: 2.472e-05, [1] [Cycle 1]: 2.025e-05, [1] [cse]: 1.37e-05 [environ_conv]: 8.29998e-06 [swap_dp_allreduce_reducescatter]: 5.59e-06 [bias_add_comm_swap]: 2.87002e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.65001e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 1.18001e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 1.05001e-06 
[interleave_split_concat_branches]: 1.24998e-06 [interleave_parallel_branches]: 1.39e-06 [overlap_opt_shard_in_pipeline]: 1.33002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.268e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.85e-06 [overlap_recompute_and_grad_model_parallel]: 4.80001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 4.56002e-06 [overlap_grad_flash_sp]: 1.789e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 7.336e-05, [1] [Cycle 1]: 6.856e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 9.19e-06 [elim_not_effective]: 1.265e-05 [opt_reshape]: 6.54999e-06 [fold_const_symbol]: 9.32001e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.56002e-06 [auto_monad_reorder]: 1.647e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 0.00014614 [opt_after_jit_grad]: 0.00047088 [validate]: 3.497e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00660959 [execute]: 7.11001e-06 Sums bootstrap : 0.000601s : 3.34% type_inference : 0.006648s : 36.93% event_method : 0.000014s : 0.08% auto_monad : 0.000062s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000063s : 0.35% optimize.opt_a.expand_dump_flag : 0.000003s : 
0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000569s : 3.16% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000153s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000031s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 
0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000456s : 2.53% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000046s : 0.26% optimize.opt_a.a_3 : 0.000075s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000469s : 2.61% optimize.opt_b.b_1 : 0.000111s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000434s : 2.41% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.08% optimize.tuple_transform.d_1 : 0.000037s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.28% optimize.cse_after_recomputation.cse : 0.000014s : 0.08% optimize.environ_conv : 0.000008s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000146s : 0.81% opt_after_jit_grad : 0.000471s : 2.62% validate : 0.000035s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006610s : 36.71% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000171 26 19.26% : 0.000033s : 5: substitution.arithmetic_simplify 1.18% : 0.000002s : 2: substitution.elim_not_effective 0.84% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 3: substitution.graph_param_transform 64.42% : 0.000110s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 1.77% : 0.000003s : 2: substitution.replace_old_param 5.04% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006595 2 89.35% : 0.005892s : 1: type_inference.infer 10.65% : 0.000703s : 1: type_inference.specialize ------[replace.] 0.000038 4 78.59% : 0.000030s : 3: replace.inline 21.41% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 4 93.20% : 0.000108s : 3: match.inline 6.80% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 1.09% : 0.000002s : 9: predicate.accumulaten_eliminater 0.84% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 15: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.63% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.60% : 0.000001s : 6: predicate.depend_value_elim 0.90% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_depend_swap 1.77% : 0.000003s : 18: predicate.environ_get_eliminate 1.15% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.36% : 0.000004s : 13: predicate.float_depend_g_call 0.54% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.67% : 0.000001s : 6: predicate.get_grad_eliminate 0.20% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.25% : 0.000010s : 40: predicate.inline 0.93% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 6: predicate.less_batch_normalization 1.73% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.46% : 0.000004s : 25: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 6: predicate.merge_addn 0.62% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.55% : 0.000002s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.91% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.60% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000002s : 16: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000001s : 9: predicate.reshape_eliminate 0.60% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 6: predicate.shard_identity_eliminate 0.72% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.94% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 13: predicate.switch_defer_inline 2.06% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.07% : 0.000008s : 43: predicate.switch_simplify 0.93% : 
0.000001s : 9: predicate.tile_eliminate 0.93% : 0.000001s : 9: predicate.transpose_eliminate 1.68% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.51% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 21: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000456 8 41.69% : 0.000190s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.31% : 0.000266s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032027 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.42% : 0.003657s : 1: add_attr 11.38% : 0.003645s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000067s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.00% : 0.000642s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000028s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000012s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000443s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000478s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 2.95% : 0.000944s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000089s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 6.93% : 0.002218s : 1: opt_a 0.32% : 0.000102s : 1: opt_after_cconv 1.50% : 0.000481s : 1: opt_after_jit_grad 0.61% : 0.000195s : 1: opt_b 12.99% : 0.004161s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.70% : 0.000225s : 1: renormalize.infer 0.70% : 0.000224s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.47% : 0.000152s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000068s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000076s : 1: symbol_engine_optimizer 20.67% : 0.006621s : 1: task_emit 0.22% : 0.000071s : 1: 
tuple_transform 20.80% : 0.006663s : 1: type_inference 0.20% : 0.000063s : 1: validate TotalTime = 0.0207827, [24] [bootstrap]: 0.00048693 [type_inference]: 0.00602917 [event_method]: 1.282e-05 [auto_monad]: 6.157e-05 [graph_reusing]: 5.19998e-06 [inline]: 1.98997e-06 [add_attr]: 0.00312031, [1] [add_attr_with_inline]: 0.00311212, [1] [Cycle 1]: 5.674e-05, [2] [tag_attr]: 1.54e-05 [meta_addattr_fg_expand]: 4.55001e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 2.676e-05 [insert-virtual-dataset]: 2.85998e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00405704, [53] [py_interpret_to_execute]: 2.16e-05 [rewriter_before_opt_a]: 5.26e-05 [opt_a]: 0.00214224, [2] [Cycle 1]: 0.0014854, [45] [expand_dump_flag]: 2.82002e-06 [switch_simplify]: 2.939e-05 [loop_unroll]: 1.757e-05 [a_1]: 0.00036611 [with_stream_mark]: 1.593e-05 [recompute_prepare]: 9.08002e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.78999e-06 [updatestate_loads_eliminate]: 3.37002e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 8.455e-05 [accelerated_algorithm]: 6.80998e-06 [shard]: 1.91998e-06 [meta_shard_fg_expand]: 1.73002e-06 [shard_inline]: 6.54001e-06 [merge_send_recv]: 8.50001e-06 [auto_parallel]: 6.28e-06 [parallel]: 1.923e-05 [flash_sp]: 7.76001e-06 [merge_comm]: 3.92002e-06 [allreduce_fusion]: 3.96001e-06 [matmul_add_comm_reduction]: 9.04998e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 8.03001e-06 [virtual_dataset]: 6.29999e-06 [get_grad_eliminate_]: 5.67999e-06 [virtual_output]: 6.07001e-06 [merge_forward]: 4.67e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.233e-05 [merge_recompute_call_nodes]: 1.59998e-06 [before_grad]: 1.099e-05 [set_forward_comm_id_for_comm_node_pass]: 4.12e-06 [meta_fg_expand]: 2.86e-06 [flash_sp_send_recv_attached]: 2.34999e-06 [receive_attached]: 
1.94e-06 [after_resolve]: 9.99001e-06 [a_after_grad]: 9.05999e-06 [renormalize]: 0.0004384 [add_forward_monad_depend]: 5.46998e-06 [auto_monad_grad]: 1.81998e-06 [auto_monad_eliminator]: 1.421e-05 [cse]: 2.891e-05 [a_3]: 4.244e-05 [Cycle 2]: 0.00064719, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 7.37997e-06 [loop_unroll]: 5.82001e-06 [a_1]: 0.00011722 [with_stream_mark]: 1.259e-05 [recompute_prepare]: 6.31e-06 [updatestate_depend_eliminate]: 3.09001e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.92002e-06 [parameter_eliminate]: 1.14e-06 [a_2]: 7.414e-05 [accelerated_algorithm]: 6.05002e-06 [shard]: 1.54e-06 [meta_shard_fg_expand]: 1.33002e-06 [shard_inline]: 5.86998e-06 [merge_send_recv]: 4.56002e-06 [auto_parallel]: 6.09001e-06 [parallel]: 4.27e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.17e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 6.18998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.094e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66001e-06 [meta_fg_expand]: 2.01e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 8.79983e-07 [after_resolve]: 8.47e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.44999e-06 [cse]: 1.415e-05 [a_3]: 3.34e-05 [py_interpret_to_execute_after_opt_a]: 8.06001e-06 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 3.466e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.64e-06 [mutable_eliminate]: 0.00048603 [opt_b]: 0.00018944, [1] [Cycle 1]: 0.0001831, 
[7] [b_1]: 0.00011137 [b_2]: 7.45e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 4.19997e-07 [cse]: 1.796e-05 [optimize_parallel_all_gather_comm]: 1.629e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.293e-05 [loop_unroll]: 0.00042305 [opt_after_cconv]: 9.55e-05, [1] [Cycle 1]: 8.987e-05, [7] [c_1]: 2.657e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.49999e-06 [cse]: 1.668e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.565e-05 [tuple_transform]: 6.869e-05, [1] [Cycle 1]: 6.446e-05, [4] [d_1]: 3.747e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 2.9002e-07 [switch_simplify]: 6.51999e-06 [partial_unused_args_eliminate]: 1.88997e-06 [add_recomputation]: 4.333e-05 [cse_after_recomputation]: 2.145e-05, [1] [Cycle 1]: 1.679e-05, [1] [cse]: 1.117e-05 [environ_conv]: 5.24998e-06 [swap_dp_allreduce_reducescatter]: 4.97999e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.80001e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.47001e-06 [assign_add_opt]: 1.59e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.45997e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.19e-06 [interleave_split_concat_branches]: 1.24998e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.21002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76003e-06 [control_data_broadcast_order]: 1.305e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.12003e-06 [overlap_recompute_and_grad_model_parallel]: 5.24e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.52001e-06 [overlap_grad_ring_attention]: 4.54998e-06 [overlap_grad_flash_sp]: 1.725e-05 [begin_end_overlap_inline]: 7.39994e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 2.07999e-06 [handle_group_info]: 1.16997e-06 [symbol_engine_optimizer]: 7.297e-05, [1] [Cycle 1]: 6.868e-05, [6] [build]: 2.39999e-06 [elim_shapecalc]: 8.94e-06 [elim_not_effective]: 1.267e-05 [opt_reshape]: 6.59999e-06 [fold_const_symbol]: 9.30001e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.55e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.59002e-06 [opt_after_jit_grad]: 0.00045588 [validate]: 3.478e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.00623969 [execute]: 7.77e-06 Sums bootstrap : 0.000487s : 2.93% type_inference : 0.006029s : 36.28% event_method : 0.000013s : 0.08% auto_monad : 0.000062s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000053s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000483s : 2.91% optimize.opt_a.with_stream_mark : 0.000029s : 0.17% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000159s : 0.96% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000438s : 2.64% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000043s : 0.26% optimize.opt_a.a_3 : 0.000076s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000486s : 2.92% optimize.opt_b.b_1 : 0.000111s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000423s : 2.55% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.26% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000456s : 2.74% validate : 0.000035s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006240s : 37.55% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000145 24 21.08% : 0.000031s : 4: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 0.93% : 0.000001s : 2: substitution.fold_const_symbol 3.66% : 0.000005s : 3: substitution.graph_param_transform 65.36% : 0.000095s : 3: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.21% : 0.000005s : 4: substitution.remove_not_recompute_node 2.14% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005984 2 92.03% : 0.005507s : 1: type_inference.infer 7.97% : 0.000477s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000093 3 100.00% : 0.000093s : 3: match.inline ------[predicate.] 
0.000151 815 0.88% : 0.000001s : 8: predicate.accumulaten_eliminater 1.00% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 14: predicate.arithmetic_simplify 0.93% : 0.000001s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.66% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.70% : 0.000001s : 6: predicate.depend_value_elim 0.91% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.01% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.79% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.76% : 0.000001s : 6: predicate.incorporate_call 0.70% : 0.000001s : 6: predicate.incorporate_call_switch 6.33% : 0.000010s : 37: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 6: predicate.less_batch_normalization 1.56% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.27% : 0.000003s : 22: predicate.load_eliminater 1.07% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.96% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.64% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 8: predicate.minmaximum_grad 1.05% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.49% : 0.000002s : 11: predicate.partial_defer_inline 1.36% : 0.000002s : 11: predicate.partial_eliminate 0.80% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.21% : 0.000002s : 8: predicate.reduce_eliminate 2.29% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 6: predicate.remove_not_recompute_node 1.27% : 0.000002s : 14: predicate.replace_applicator 0.76% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.85% : 0.000001s : 8: predicate.reshape_eliminate 0.68% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.69% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 6: predicate.shard_identity_eliminate 0.87% : 0.000001s : 6: predicate.special_op_eliminate 0.89% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.28% : 0.000002s : 11: predicate.switch_defer_inline 2.01% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.75% : 0.000007s : 38: predicate.switch_simplify 0.91% : 
0.000001s : 8: predicate.tile_eliminate 0.87% : 0.000001s : 8: predicate.transpose_eliminate 1.56% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.69% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.62% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.94% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.84% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000296 7 36.94% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.06% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029411 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.63% : 0.003125s : 1: add_attr 10.59% : 0.003116s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.23% : 0.000067s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.78% : 0.000524s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.47% : 0.000432s : 1: loop_unroll 0.02% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000496s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.92% : 0.000858s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.29% : 0.002145s : 1: opt_a 0.34% : 0.000099s : 1: opt_after_cconv 1.58% : 0.000465s : 1: opt_after_jit_grad 0.65% : 0.000193s : 1: opt_b 13.81% : 0.004061s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000031s : 1: pre_auto_parallel 0.09% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.78% : 0.000230s : 1: renormalize.infer 0.68% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000039s : 1: rewriter_after_opt_a 0.19% : 0.000057s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000076s : 1: symbol_engine_optimizer 21.25% : 0.006250s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.56% : 0.006046s : 1: type_inference 0.22% : 0.000064s : 1: validate TotalTime = 0.0210474, [24] [bootstrap]: 0.00047986 [type_inference]: 0.00582971 [event_method]: 1.449e-05 [auto_monad]: 6.397e-05 [graph_reusing]: 5.74e-06 [inline]: 2.22999e-06 [add_attr]: 0.00317985, [1] [add_attr_with_inline]: 0.00317146, [1] [Cycle 1]: 5.37e-05, [2] [tag_attr]: 1.639e-05 [meta_addattr_fg_expand]: 4.55001e-06 [parallel-infer-symbol]: 4e-06 [pre_auto_parallel]: 2.853e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 8.79983e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.81998e-06 [optimize]: 0.00430574, [53] [py_interpret_to_execute]: 2.296e-05 [rewriter_before_opt_a]: 6.565e-05 [opt_a]: 0.00231732, [2] [Cycle 1]: 0.00164256, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 3.502e-05 [loop_unroll]: 2.059e-05 [a_1]: 0.00044859 [with_stream_mark]: 1.448e-05 [recompute_prepare]: 8.95999e-06 [updatestate_depend_eliminate]: 4.47e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 3.33e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 8.182e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 6.37001e-06 [merge_send_recv]: 8.21002e-06 [auto_parallel]: 6.37001e-06 [parallel]: 1.95e-05 [flash_sp]: 8.05e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.58e-06 [matmul_add_comm_reduction]: 8.94e-06 [allreduce_slice_to_reducescatter]: 9.10019e-07 [virtual_shard_identity]: 7.66001e-06 [virtual_dataset]: 6.05002e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 6.02999e-06 [merge_forward]: 4.79998e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.006e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.257e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 1.045e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86999e-06 [meta_fg_expand]: 2.96001e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 2.53e-06 
[after_resolve]: 1.031e-05 [a_after_grad]: 9.11002e-06 [renormalize]: 0.000502 [add_forward_monad_depend]: 5.16998e-06 [auto_monad_grad]: 2.54999e-06 [auto_monad_eliminator]: 1.416e-05 [cse]: 2.945e-05 [a_3]: 4.423e-05 [Cycle 2]: 0.00066433, [45] [expand_dump_flag]: 1.40999e-06 [switch_simplify]: 7.3e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.00011434 [with_stream_mark]: 1.062e-05 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 3.16001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.63998e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 0.00011276 [accelerated_algorithm]: 6.51e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.28002e-06 [shard_inline]: 6.14999e-06 [merge_send_recv]: 5.30999e-06 [auto_parallel]: 6.59999e-06 [parallel]: 4.61002e-06 [flash_sp]: 3.40998e-06 [merge_comm]: 3.41001e-06 [allreduce_fusion]: 3.16999e-06 [matmul_add_comm_reduction]: 5.99e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.41e-06 [virtual_dataset]: 5.60001e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.30999e-06 [merge_forward]: 2.81e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 6.93e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 [merge_recompute_call_nodes]: 9.5999e-07 [before_grad]: 8.50999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 9.29984e-07 [after_resolve]: 8.72998e-06 [a_after_grad]: 7.92e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.62001e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 8.23999e-06 [cse]: 1.985e-05 [a_3]: 3.397e-05 [py_interpret_to_execute_after_opt_a]: 9.34998e-06 [slice_cell_reuse_recomputed_activation]: 2.48e-06 [rewriter_after_opt_a]: 3.606e-05 [convert_after_rewriter]: 6.64001e-06 [order_py_execute_after_rewriter]: 5.43002e-06 [mutable_eliminate]: 0.00050593 [opt_b]: 0.00019102, [1] [Cycle 1]: 0.00018356, [7] 
[b_1]: 0.00011106 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 6.48e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.56e-06 [renormalize]: 5.79981e-07 [cse]: 1.762e-05 [optimize_parallel_all_gather_comm]: 1.682e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.454e-05 [loop_unroll]: 0.00043901 [opt_after_cconv]: 9.979e-05, [1] [Cycle 1]: 9.352e-05, [7] [c_1]: 2.616e-05 [parameter_eliminate]: 3.21001e-06 [updatestate_depend_eliminate]: 5.79999e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.49001e-06 [cse]: 1.808e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.605e-05 [tuple_transform]: 7.068e-05, [1] [Cycle 1]: 6.594e-05, [4] [d_1]: 3.868e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 6.30002e-06 [partial_unused_args_eliminate]: 1.88002e-06 [add_recomputation]: 4.768e-05 [cse_after_recomputation]: 2.267e-05, [1] [Cycle 1]: 1.705e-05, [1] [cse]: 1.158e-05 [environ_conv]: 6.14999e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.43002e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 3.28998e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.13998e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.39998e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.57001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.314e-05 [grouped_pairwise_exchange_alltoall]: 1.56998e-06 [offloading_packed_experts]: 4.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.89998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.81e-06 [overlap_grad_ring_attention]: 4.25999e-06 [overlap_grad_flash_sp]: 1.864e-05 [begin_end_overlap_inline]: 7.49977e-07 [split_matmul_comm_elemetwise]: 2.48e-06 [split_layernorm_comm]: 2.18002e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 7.395e-05, [1] [Cycle 1]: 6.92e-05, [6] [build]: 2.96001e-06 [elim_shapecalc]: 9.35001e-06 [elim_not_effective]: 1.22e-05 [opt_reshape]: 6.40002e-06 [fold_const_symbol]: 9.62999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.52001e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.614e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.72002e-06 [opt_after_jit_grad]: 0.00050728 [validate]: 3.71e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00634014 [execute]: 8e-06 Sums bootstrap : 0.000480s : 2.85% type_inference : 0.005830s : 34.62% event_method : 0.000014s : 0.09% auto_monad : 0.000064s : 0.38% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.14% optimize.rewriter_before_opt_a : 0.000066s : 0.39% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000042s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000563s : 3.34% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000195s : 1.16% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000008s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000502s : 2.98% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.13% optimize.opt_a.cse : 0.000049s : 0.29% optimize.opt_a.a_3 : 0.000078s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000506s : 3.00% optimize.opt_b.b_1 : 0.000111s : 0.66% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.15% optimize.loop_unroll : 0.000439s : 2.61% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.10% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000507s : 3.01% validate : 0.000037s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006340s : 37.65% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000176 26 19.87% : 0.000035s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 3.35% : 0.000006s : 3: substitution.graph_param_transform 63.04% : 0.000111s : 3: substitution.inline 1.82% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.82% : 0.000005s : 4: substitution.remove_not_recompute_node 2.05% : 0.000004s : 2: substitution.replace_old_param 5.20% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005782 2 89.04% : 0.005149s : 1: type_inference.infer 10.96% : 0.000634s : 1: type_inference.specialize ------[replace.] 0.000039 4 77.66% : 0.000030s : 3: replace.inline 22.34% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 4 92.93% : 0.000109s : 3: match.inline 7.07% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 883 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 15: predicate.arithmetic_simplify 0.97% : 0.000002s : 9: predicate.cast_eliminate 0.86% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.72% : 0.000001s : 6: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.96% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.27% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_depend_swap 1.97% : 0.000003s : 18: predicate.environ_get_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.39% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.66% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.55% : 0.000001s : 6: predicate.incorporate_call_switch 6.22% : 0.000010s : 40: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 6: predicate.less_batch_normalization 1.72% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 25: predicate.load_eliminater 1.21% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.06% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.59% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.43% : 0.000001s : 3: predicate.parallel_virtual_node 1.54% : 0.000003s : 13: predicate.partial_defer_inline 1.46% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.22% : 0.000002s : 9: predicate.reduce_eliminate 2.39% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 6: predicate.remove_not_recompute_node 1.21% : 0.000002s : 16: predicate.replace_applicator 0.61% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 9: predicate.reshape_eliminate 0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 3: predicate.row_tensor_eliminate 0.82% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 6: predicate.shard_identity_eliminate 0.92% : 0.000002s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 1.02% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 13: predicate.switch_defer_inline 1.89% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 43: predicate.switch_simplify 0.89% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.48% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.38% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.03% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.29% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 45.35% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.65% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030172 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.55% : 0.003185s : 1: add_attr 10.52% : 0.003175s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000070s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.71% : 0.000515s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000448s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.71% : 0.000517s : 1: mutable_eliminate 0.03% : 0.000008s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.26% : 0.000984s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.69% : 0.002320s : 1: opt_a 0.34% : 0.000103s : 1: opt_after_cconv 1.71% : 0.000517s : 1: opt_after_jit_grad 0.64% : 0.000194s : 1: opt_b 14.29% : 0.004310s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.09% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 0.87% : 0.000262s : 1: renormalize.infer 0.77% : 0.000232s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000040s : 1: rewriter_after_opt_a 0.23% : 0.000070s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000077s : 1: symbol_engine_optimizer 21.05% : 0.006351s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 19.37% : 0.005844s : 1: type_inference 0.22% : 0.000067s : 1: validate TotalTime = 0.0401254, [24] [bootstrap]: 0.00050203 [type_inference]: 0.0118687 [event_method]: 4.911e-05 [auto_monad]: 0.00014216 [graph_reusing]: 8.65001e-06 [inline]: 1.96998e-06 [add_attr]: 0.00315481, [1] [add_attr_with_inline]: 0.0031459, [1] [Cycle 1]: 7.555e-05, [2] [tag_attr]: 3.489e-05 [meta_addattr_fg_expand]: 1.114e-05 [parallel-infer-symbol]: 3.24001e-06 [pre_auto_parallel]: 5.112e-05 [insert-virtual-dataset]: 2.61999e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.34999e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.0169565, [53] [py_interpret_to_execute]: 3.931e-05 [rewriter_before_opt_a]: 0.00015852 [opt_a]: 0.0147955, [3] [Cycle 1]: 0.011265, [45] [expand_dump_flag]: 4.66002e-06 [switch_simplify]: 7.747e-05 [loop_unroll]: 6.429e-05 [a_1]: 0.00149275 [with_stream_mark]: 2.354e-05 [recompute_prepare]: 2.241e-05 [updatestate_depend_eliminate]: 8.63001e-06 [updatestate_assign_eliminate]: 8.15999e-06 [updatestate_loads_eliminate]: 8.00999e-06 [parameter_eliminate]: 2.58998e-06 [a_2]: 0.00024638 [accelerated_algorithm]: 3.152e-05 [shard]: 2.23998e-06 [meta_shard_fg_expand]: 3.71001e-06 [shard_inline]: 1.627e-05 [merge_send_recv]: 1.693e-05 [auto_parallel]: 1.107e-05 [parallel]: 1.862e-05 [flash_sp]: 1.148e-05 [merge_comm]: 9.81e-06 [allreduce_fusion]: 8.84e-06 [matmul_add_comm_reduction]: 2.662e-05 [allreduce_slice_to_reducescatter]: 6.99976e-07 [virtual_shard_identity]: 1.785e-05 [virtual_dataset]: 1.564e-05 [get_grad_eliminate_]: 1.508e-05 [virtual_output]: 1.508e-05 [merge_forward]: 9.28002e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 1.804e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.983e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 2.845e-05 [set_forward_comm_id_for_comm_node_pass]: 1.023e-05 [meta_fg_expand]: 0.00153367 [flash_sp_send_recv_attached]: 3.95e-06 [receive_attached]: 2.51998e-06 
[after_resolve]: 6.423e-05 [a_after_grad]: 8.912e-05 [renormalize]: 0.0064051 [add_forward_monad_depend]: 9.75002e-06 [auto_monad_grad]: 6.01e-06 [auto_monad_eliminator]: 5.322e-05 [cse]: 0.00018548 [a_3]: 0.00033682 [Cycle 2]: 0.00282624, [45] [expand_dump_flag]: 1.69998e-06 [switch_simplify]: 4.605e-05 [loop_unroll]: 4.184e-05 [a_1]: 0.00136287 [with_stream_mark]: 1.236e-05 [recompute_prepare]: 9.56998e-06 [updatestate_depend_eliminate]: 4.30999e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 3.03998e-06 [parameter_eliminate]: 1.32e-06 [a_2]: 9.012e-05 [accelerated_algorithm]: 1.123e-05 [shard]: 1.27999e-06 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 7.35e-06 [merge_send_recv]: 7.06001e-06 [auto_parallel]: 7.07002e-06 [parallel]: 7.03e-06 [flash_sp]: 4.05998e-06 [merge_comm]: 4.28001e-06 [allreduce_fusion]: 3.65003e-06 [matmul_add_comm_reduction]: 7.03e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 7.87e-06 [virtual_dataset]: 6.60002e-06 [get_grad_eliminate_]: 6.46999e-06 [virtual_output]: 6.17999e-06 [merge_forward]: 3.57002e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.404e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 1.226e-05 [set_forward_comm_id_for_comm_node_pass]: 4.23001e-06 [meta_fg_expand]: 8.151e-05 [flash_sp_send_recv_attached]: 1.32e-06 [receive_attached]: 1.12999e-06 [after_resolve]: 1.418e-05 [a_after_grad]: 1.079e-05 [renormalize]: 0.00064026 [add_forward_monad_depend]: 4.49002e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.15e-05 [cse]: 2.167e-05 [a_3]: 4.873e-05 [Cycle 3]: 0.00069013, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 8.38001e-06 [loop_unroll]: 6.84999e-06 [a_1]: 0.00014992 [with_stream_mark]: 8.90999e-06 [recompute_prepare]: 6.94999e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 2.98e-06 
[updatestate_loads_eliminate]: 2.68998e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 8.626e-05 [accelerated_algorithm]: 1.013e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 7.18e-06 [merge_send_recv]: 5.29e-06 [auto_parallel]: 6.34001e-06 [parallel]: 5.05999e-06 [flash_sp]: 1.02e-06 [merge_comm]: 3.60998e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 5.81998e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 7.8e-06 [virtual_dataset]: 6.27001e-06 [get_grad_eliminate_]: 6.70002e-06 [virtual_output]: 6.11e-06 [merge_forward]: 3.26999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 7.51001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.255e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 1.071e-05 [set_forward_comm_id_for_comm_node_pass]: 4.23999e-06 [meta_fg_expand]: 2.53e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 8.50006e-07 [after_resolve]: 9.04998e-06 [a_after_grad]: 9.67999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 7.83999e-06 [cse]: 1.766e-05 [a_3]: 3.968e-05 [py_interpret_to_execute_after_opt_a]: 9.82001e-06 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 4.172e-05 [convert_after_rewriter]: 7.25e-06 [order_py_execute_after_rewriter]: 5.99999e-06 [mutable_eliminate]: 0.00048731 [opt_b]: 0.00021984, [1] [Cycle 1]: 0.0002133, [7] [b_1]: 0.00013464 [b_2]: 8.90999e-06 [updatestate_depend_eliminate]: 5.84e-06 [updatestate_assign_eliminate]: 3.14999e-06 [updatestate_loads_eliminate]: 2.82002e-06 [renormalize]: 4.50003e-07 [cse]: 2.159e-05 [optimize_parallel_all_gather_comm]: 1.736e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.064e-05 [loop_unroll]: 0.00042831 [opt_after_cconv]: 0.00011746, [1] [Cycle 1]: 0.0001116, [7] [c_1]: 3.194e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.69999e-06 
[updatestate_assign_eliminate]: 3.04001e-06 [updatestate_loads_eliminate]: 1.002e-05 [cse]: 2.229e-05 [renormalize]: 5.3001e-07 [remove_dup_value]: 1.703e-05 [tuple_transform]: 8.265e-05, [1] [Cycle 1]: 7.762e-05, [4] [d_1]: 4.788e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 8.08999e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 5.255e-05 [cse_after_recomputation]: 2.603e-05, [1] [Cycle 1]: 2.127e-05, [1] [cse]: 1.538e-05 [environ_conv]: 7.9e-06 [swap_dp_allreduce_reducescatter]: 5.84e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.33001e-06 [label_fine_grained_interleaved_index]: 3.00998e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.21e-06 [micro_interleaved_order_control]: 2.25002e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.35997e-06 [reorder_send_recv_between_fp_bp]: 2.67001e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.57001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99999e-06 [control_data_broadcast_order]: 1.444e-05 [grouped_pairwise_exchange_alltoall]: 1.84e-06 [offloading_packed_experts]: 4.31002e-06 [overlap_recompute_and_grad_model_parallel]: 5.14e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.59e-06 [overlap_recompute_comm]: 2.43998e-06 [overlap_grad_ring_attention]: 4.55001e-06 [overlap_grad_flash_sp]: 2.055e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 2.14e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 8.867e-05, [1] [Cycle 1]: 8.413e-05, [6] [build]: 8.92e-06 [elim_shapecalc]: 1.063e-05 [elim_not_effective]: 1.514e-05 [opt_reshape]: 8.05999e-06 [fold_const_symbol]: 
1.159e-05 [renormalize]: 2.40019e-07 [detach_backward]: 2.12999e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 2.119e-05 [get_jit_bprop_graph]: 1.06997e-06 [rewriter_after_jit_bprop_graph]: 4.05998e-06 [opt_after_jit_grad]: 0.00048134 [validate]: 4.211e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.0066041 [execute]: 7.14001e-06 Sums bootstrap : 0.000502s : 1.41% type_inference : 0.011869s : 33.31% event_method : 0.000049s : 0.14% auto_monad : 0.000142s : 0.40% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000159s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.37% optimize.opt_a.loop_unroll : 0.000113s : 0.32% optimize.opt_a.a_1 : 0.003006s : 8.44% optimize.opt_a.with_stream_mark : 0.000045s : 0.13% optimize.opt_a.recompute_prepare : 0.000039s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000423s : 1.19% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.15% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000031s : 0.09% optimize.opt_a.merge_send_recv : 0.000029s : 0.08% optimize.opt_a.auto_parallel : 0.000024s : 0.07% optimize.opt_a.parallel : 0.000031s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% 
optimize.opt_a.merge_comm : 0.000018s : 0.05% optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.09% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000027s : 0.08% optimize.opt_a.merge_forward : 0.000016s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000051s : 0.14% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.05% optimize.opt_a.meta_fg_expand : 0.001618s : 4.54% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.25% optimize.opt_a.a_after_grad : 0.000110s : 0.31% optimize.opt_a.renormalize : 0.007045s : 19.77% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000073s : 0.20% optimize.opt_a.cse : 0.000225s : 0.63% optimize.opt_a.a_3 : 0.000425s : 1.19% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000042s : 0.12% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000487s : 1.37% optimize.opt_b.b_1 : 0.000135s : 0.38% optimize.opt_b.b_2 : 0.000009s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000428s : 1.20% optimize.opt_after_cconv.c_1 : 0.000032s : 0.09% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000010s : 0.03% optimize.opt_after_cconv.cse : 0.000022s : 0.06% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.05% optimize.tuple_transform.d_1 : 0.000048s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.15% optimize.cse_after_recomputation.cse : 0.000015s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000481s : 1.35% validate : 0.000042s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006604s : 18.53% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000728 161 6.87% : 0.000050s : 8: substitution.arithmetic_simplify 0.32% : 0.000002s : 3: substitution.elim_not_effective 0.57% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 3: substitution.fold_const_symbol 0.91% : 0.000007s : 4: substitution.graph_param_transform 0.44% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 58.73% : 0.000428s : 17: substitution.inline 2.27% : 0.000017s : 2: substitution.inline_without_move 1.44% : 0.000011s : 15: substitution.j_node_and_user_rematch 2.15% : 0.000016s : 3: substitution.less_batch_normalization 1.51% : 0.000011s : 7: substitution.minmaximum_grad 0.83% : 0.000006s : 5: substitution.partial_eliminate 1.70% : 0.000012s : 15: substitution.remove_not_recompute_node 3.65% : 0.000027s : 10: substitution.replace_applicator 1.22% : 0.000009s : 10: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.09% : 0.000023s : 7: substitution.tuple_list_convert_item_index_to_positive 1.43% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 2.01% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 7.43% : 0.000054s : 19: substitution.tuple_list_get_item_eliminator 1.97% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011791 2 85.69% : 0.010104s : 1: type_inference.infer 14.31% : 0.001687s : 1: type_inference.specialize ------[replace.] 0.000208 27 63.65% : 0.000132s : 17: replace.inline 36.35% : 0.000076s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 27 93.95% : 0.000418s : 17: match.inline 6.05% : 0.000027s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000707 4248 1.14% : 0.000008s : 53: predicate.accumulaten_eliminater 0.29% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000003s : 21: predicate.addn_check_dump 1.11% : 0.000008s : 53: predicate.addn_zero_filter 1.10% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 1.99% : 0.000014s : 74: predicate.arithmetic_simplify 1.14% : 0.000008s : 53: predicate.cast_eliminate 1.12% : 0.000008s : 50: predicate.check_bprop_eliminate 0.45% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.47% : 0.000003s : 21: predicate.depend_value_elim 1.15% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.27% : 0.000009s : 53: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.19% : 0.000008s : 57: predicate.environ_get_depend_swap 1.67% : 0.000012s : 78: predicate.environ_get_eliminate 1.17% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.83% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.54% : 0.000018s : 80: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.58% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.50% : 0.000004s : 21: predicate.incorporate_call 0.46% : 0.000003s : 21: predicate.incorporate_call_switch 5.90% : 0.000042s : 183: predicate.inline 1.42% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.61% : 0.000004s : 21: predicate.less_batch_normalization 1.55% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.66% : 0.000019s : 124: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.56% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.33% : 0.000009s : 61: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.09% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 53: predicate.minmaximum_grad 0.28% : 0.000002s : 4: predicate.mutable_eliminate 0.12% : 0.000001s : 4: predicate.opt_reshape 0.11% : 0.000001s : 4: predicate.parallel_virtual_node 2.11% : 0.000015s : 80: predicate.partial_defer_inline 1.71% : 0.000012s : 67: predicate.partial_eliminate 1.15% : 0.000008s : 53: predicate.print_const_string_wrapper 0.48% : 0.000003s : 21: predicate.reduce_all_const_elim 1.44% : 0.000010s : 53: predicate.reduce_eliminate 2.64% : 0.000019s : 124: predicate.redundant_stop_gradient_eliminater 0.28% : 0.000002s : 21: predicate.remove_not_recompute_node 1.86% : 0.000013s : 113: predicate.replace_applicator 0.65% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000008s : 53: predicate.reshape_eliminate 1.11% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.22% : 0.000009s : 50: predicate.same_eliminate 0.35% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.56% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.60% : 0.000004s : 21: predicate.specialize_transform 1.23% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.99% : 0.000014s : 80: predicate.switch_defer_inline 3.05% : 0.000022s : 130: predicate.switch_layer_defer_inline 5.25% : 0.000037s : 218: 
predicate.switch_simplify 1.11% : 0.000008s : 53: predicate.tile_eliminate 1.13% : 0.000008s : 53: predicate.transpose_eliminate 1.45% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.89% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 1.51% : 0.000011s : 61: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.59% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 124: predicate.updatestate_pure_node_eliminater 3.13% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 4: predicate.value_based_eliminate 0.49% : 0.000003s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000003s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001898 36 58.04% : 0.001102s : 15: func_graph_cloner_run.FuncGraphClonerGraph 41.96% : 0.000796s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.071988 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.39% : 0.003160s : 1: add_attr 4.38% : 0.003150s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000149s : 1: auto_monad 0.04% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.75% : 0.000538s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000056s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000014s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.61% : 0.000437s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.69% : 0.000496s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.26% : 0.004506s : 117: opt.transform.opt_a 0.04% : 0.000030s : 1: opt.transform.opt_after_cconv 0.04% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000114s : 28: opt.transform.opt_b 0.07% : 0.000054s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000042s : 4: opt.transform.symbol_engine_opt 20.56% : 0.014799s : 1: opt_a 0.17% : 0.000121s : 1: opt_after_cconv 0.68% : 0.000491s : 1: opt_after_jit_grad 0.31% : 0.000223s : 1: opt_b 23.56% : 0.016961s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000056s : 1: pre_auto_parallel 0.06% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000021s : 1: remove_dup_value 7.55% : 0.005436s : 2: renormalize.infer 2.21% : 0.001594s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000046s : 1: rewriter_after_opt_a 0.23% : 0.000163s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.13% : 0.000092s : 1: symbol_engine_optimizer 9.19% : 0.006616s : 1: task_emit 0.12% : 0.000086s : 1: 
tuple_transform 16.51% : 0.011885s : 1: type_inference 0.10% : 0.000072s : 1: validate TotalTime = 0.0207524, [24] [bootstrap]: 0.00042082 [type_inference]: 0.00582442 [event_method]: 1.326e-05 [auto_monad]: 5.993e-05 [graph_reusing]: 6.11998e-06 [inline]: 2.02001e-06 [add_attr]: 0.00313635, [1] [add_attr_with_inline]: 0.00312723, [1] [Cycle 1]: 5.12e-05, [2] [tag_attr]: 1.556e-05 [meta_addattr_fg_expand]: 4.80999e-06 [parallel-infer-symbol]: 3.85998e-06 [pre_auto_parallel]: 2.881e-05 [insert-virtual-dataset]: 2.68003e-06 [parallel-infer-symbol-second]: 1.05001e-06 [dataset_repeat_opt]: 2.58e-06 [pipeline_split]: 1.94e-06 [optimize]: 0.00425366, [53] [py_interpret_to_execute]: 2.442e-05 [rewriter_before_opt_a]: 5.884e-05 [opt_a]: 0.00225946, [2] [Cycle 1]: 0.00158866, [45] [expand_dump_flag]: 3.01999e-06 [switch_simplify]: 3.444e-05 [loop_unroll]: 2.052e-05 [a_1]: 0.0004098 [with_stream_mark]: 1.649e-05 [recompute_prepare]: 8.69e-06 [updatestate_depend_eliminate]: 4.17e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 1.94e-06 [a_2]: 9.144e-05 [accelerated_algorithm]: 7.97e-06 [shard]: 2.43002e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 7.15e-06 [merge_send_recv]: 8.97e-06 [auto_parallel]: 7.2e-06 [parallel]: 1.914e-05 [flash_sp]: 8.47e-06 [merge_comm]: 4.91002e-06 [allreduce_fusion]: 4.11001e-06 [matmul_add_comm_reduction]: 9.39998e-06 [allreduce_slice_to_reducescatter]: 1.03001e-06 [virtual_shard_identity]: 9.24e-06 [virtual_dataset]: 7.61999e-06 [get_grad_eliminate_]: 6.40002e-06 [virtual_output]: 7.12997e-06 [merge_forward]: 4.68999e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 1.069e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.432e-05 [merge_recompute_call_nodes]: 1.84998e-06 [before_grad]: 1.196e-05 [set_forward_comm_id_for_comm_node_pass]: 4.71002e-06 [meta_fg_expand]: 2.91e-06 [flash_sp_send_recv_attached]: 3.06001e-06 [receive_attached]: 2.44001e-06 [after_resolve]: 
1.146e-05 [a_after_grad]: 1.018e-05 [renormalize]: 0.00044321 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 2.27999e-06 [auto_monad_eliminator]: 1.455e-05 [cse]: 3.02e-05 [a_3]: 4.467e-05 [Cycle 2]: 0.00065998, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 7.98999e-06 [loop_unroll]: 6.01e-06 [a_1]: 0.00012771 [with_stream_mark]: 1.028e-05 [recompute_prepare]: 6.21e-06 [updatestate_depend_eliminate]: 3.09999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.27e-05 [accelerated_algorithm]: 6.06e-06 [shard]: 1.19e-06 [meta_shard_fg_expand]: 1.26002e-06 [shard_inline]: 6.07001e-06 [merge_send_recv]: 4.84e-06 [auto_parallel]: 5.52001e-06 [parallel]: 4.35e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 3.37002e-06 [allreduce_fusion]: 2.90002e-06 [matmul_add_comm_reduction]: 5.49e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.94001e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.89e-06 [merge_forward]: 2.81999e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 6.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 8.95999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.77002e-06 [meta_fg_expand]: 1.97999e-06 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 9.60019e-07 [after_resolve]: 8.74e-06 [a_after_grad]: 8.23001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 9.99979e-07 [auto_monad_eliminator]: 6.54001e-06 [cse]: 1.782e-05 [a_3]: 3.462e-05 [py_interpret_to_execute_after_opt_a]: 8.80001e-06 [slice_cell_reuse_recomputed_activation]: 2.43e-06 [rewriter_after_opt_a]: 3.527e-05 [convert_after_rewriter]: 7.81001e-06 [order_py_execute_after_rewriter]: 6.33e-06 [mutable_eliminate]: 0.00049746 [opt_b]: 0.00020577, [1] [Cycle 1]: 0.00019927, [7] [b_1]: 0.00012185 [b_2]: 7.92e-06 
[updatestate_depend_eliminate]: 5.48997e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.82002e-06 [renormalize]: 4.80009e-07 [cse]: 2.036e-05 [optimize_parallel_all_gather_comm]: 1.681e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.349e-05 [loop_unroll]: 0.00044669 [opt_after_cconv]: 9.813e-05, [1] [Cycle 1]: 9.208e-05, [7] [c_1]: 2.666e-05 [parameter_eliminate]: 2.36998e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.736e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.641e-05 [tuple_transform]: 7.118e-05, [1] [Cycle 1]: 6.653e-05, [4] [d_1]: 3.846e-05 [none_parameter_eliminate]: 1.89e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.63e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 4.438e-05 [cse_after_recomputation]: 2.317e-05, [1] [Cycle 1]: 1.832e-05, [1] [cse]: 1.246e-05 [environ_conv]: 5.47999e-06 [swap_dp_allreduce_reducescatter]: 5.60001e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.52e-06 [label_fine_grained_interleaved_index]: 2.99999e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.50002e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 8.30012e-07 [full_micro_interleaved_order_control]: 2.54999e-06 [reorder_send_recv_between_fp_bp]: 2.99999e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 [control_data_broadcast_order]: 1.347e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 4.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.73001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.78002e-06 [overlap_recompute_comm]: 2.16998e-06 [overlap_grad_ring_attention]: 4.79998e-06 [overlap_grad_flash_sp]: 1.745e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.93998e-06 [split_layernorm_comm]: 1.68002e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.309e-05, [1] [Cycle 1]: 6.853e-05, [6] [build]: 2.22001e-06 [elim_shapecalc]: 9.56e-06 [elim_not_effective]: 1.211e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 9.89001e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.632e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00046543 [validate]: 3.698e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.00625823 [execute]: 7.3e-06 Sums bootstrap : 0.000421s : 2.54% type_inference : 0.005824s : 35.19% event_method : 0.000013s : 0.08% auto_monad : 0.000060s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000003s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.15% optimize.rewriter_before_opt_a : 0.000059s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000042s : 0.26% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000538s : 3.25% optimize.opt_a.with_stream_mark : 0.000027s : 0.16% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.04% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000164s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.08% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.10% optimize.opt_a.virtual_dataset : 0.000013s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.07% optimize.opt_a.virtual_output : 0.000013s : 0.08% optimize.opt_a.merge_forward : 0.000008s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000021s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000443s : 2.68% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000048s : 0.29% optimize.opt_a.a_3 : 0.000079s : 0.48% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.21% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000497s : 3.01% optimize.opt_b.b_1 : 0.000122s : 0.74% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000447s : 2.70% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.10% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000465s : 2.81% validate : 0.000037s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006258s : 37.81% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000158 24 19.74% : 0.000031s : 4: substitution.arithmetic_simplify 1.25% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000002s : 2: substitution.fold_const_symbol 3.42% : 0.000005s : 3: substitution.graph_param_transform 66.81% : 0.000105s : 3: substitution.inline 2.30% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000005s : 4: substitution.remove_not_recompute_node 2.02% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005781 2 91.77% : 0.005306s : 1: type_inference.infer 8.23% : 0.000476s : 1: type_inference.specialize ------[replace.] 0.000032 3 100.00% : 0.000032s : 3: replace.inline ------[match.] 0.000103 3 100.00% : 0.000103s : 3: match.inline ------[predicate.] 
0.000163 815 0.88% : 0.000001s : 8: predicate.accumulaten_eliminater 0.84% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 14: predicate.arithmetic_simplify 0.85% : 0.000001s : 8: predicate.cast_eliminate 0.64% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.58% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.92% : 0.000002s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 11: predicate.environ_get_depend_swap 1.70% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 11: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.95% : 0.000002s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.73% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.72% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.39% : 0.000010s : 37: predicate.inline 0.99% : 0.000002s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.97% : 0.000002s : 6: predicate.less_batch_normalization 1.61% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000004s : 22: predicate.load_eliminater 1.14% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.58% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 8: predicate.minmaximum_grad 1.18% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.47% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 11: predicate.partial_eliminate 0.91% : 0.000001s : 8: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.43% : 0.000002s : 8: predicate.reduce_eliminate 2.27% : 0.000004s : 22: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 6: predicate.remove_not_recompute_node 1.44% : 0.000002s : 14: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000002s : 8: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.99% : 0.000002s : 6: predicate.shard_identity_eliminate 0.72% : 0.000001s : 6: predicate.special_op_eliminate 0.92% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.91% : 0.000008s : 38: predicate.switch_simplify 0.92% : 
0.000002s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.48% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.61% : 0.000003s : 14: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 22: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.77% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.73% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000286 7 37.75% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.25% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029684 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.58% : 0.003141s : 1: add_attr 10.55% : 0.003131s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000065s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.51% : 0.000448s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000017s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.53% : 0.000455s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000506s : 1: mutable_eliminate 0.03% : 0.000008s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 3.16% : 0.000938s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000098s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.62% : 0.002263s : 1: opt_a 0.34% : 0.000102s : 1: opt_after_cconv 1.60% : 0.000475s : 1: opt_after_jit_grad 0.70% : 0.000209s : 1: opt_b 14.34% : 0.004258s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000010s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.02% : 0.000005s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000006s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.10% : 0.000029s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000020s : 1: remove_dup_value 0.80% : 0.000239s : 1: renormalize.infer 0.66% : 0.000197s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000040s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000006s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000076s : 1: symbol_engine_optimizer 21.12% : 0.006270s : 1: task_emit 0.25% : 0.000074s : 1: 
tuple_transform 19.68% : 0.005841s : 1: type_inference 0.22% : 0.000065s : 1: validate TotalTime = 0.0405433, [24] [bootstrap]: 0.00050851 [type_inference]: 0.0120717 [event_method]: 4.622e-05 [auto_monad]: 0.00013753 [graph_reusing]: 9.15999e-06 [inline]: 2.04e-06 [add_attr]: 0.00329715, [1] [add_attr_with_inline]: 0.00328723, [1] [Cycle 1]: 7.38e-05, [2] [tag_attr]: 3.354e-05 [meta_addattr_fg_expand]: 1.049e-05 [parallel-infer-symbol]: 3.23998e-06 [pre_auto_parallel]: 4.835e-05 [insert-virtual-dataset]: 2.78e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.03997e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.016977, [53] [py_interpret_to_execute]: 4.013e-05 [rewriter_before_opt_a]: 0.00014742 [opt_a]: 0.0147634, [3] [Cycle 1]: 0.0112247, [45] [expand_dump_flag]: 4.52e-06 [switch_simplify]: 7.546e-05 [loop_unroll]: 6.214e-05 [a_1]: 0.00144145 [with_stream_mark]: 2.441e-05 [recompute_prepare]: 2.248e-05 [updatestate_depend_eliminate]: 8.95999e-06 [updatestate_assign_eliminate]: 8.35001e-06 [updatestate_loads_eliminate]: 7.17002e-06 [parameter_eliminate]: 3.24001e-06 [a_2]: 0.00024802 [accelerated_algorithm]: 3.192e-05 [shard]: 2.08998e-06 [meta_shard_fg_expand]: 4e-06 [shard_inline]: 1.631e-05 [merge_send_recv]: 1.662e-05 [auto_parallel]: 1.108e-05 [parallel]: 1.927e-05 [flash_sp]: 1.236e-05 [merge_comm]: 1.004e-05 [allreduce_fusion]: 8.85001e-06 [matmul_add_comm_reduction]: 2.706e-05 [allreduce_slice_to_reducescatter]: 9.29984e-07 [virtual_shard_identity]: 1.752e-05 [virtual_dataset]: 1.6e-05 [get_grad_eliminate_]: 1.549e-05 [virtual_output]: 1.522e-05 [merge_forward]: 9.22999e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 1.777e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.986e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 2.893e-05 [set_forward_comm_id_for_comm_node_pass]: 1.032e-05 [meta_fg_expand]: 0.00148419 [flash_sp_send_recv_attached]: 4.51002e-06 [receive_attached]: 2.36998e-06 [after_resolve]: 
6.684e-05 [a_after_grad]: 9.05e-05 [renormalize]: 0.00644258 [add_forward_monad_depend]: 9.52001e-06 [auto_monad_grad]: 5.81998e-06 [auto_monad_eliminator]: 5.359e-05 [cse]: 0.00019056 [a_3]: 0.00033916 [Cycle 2]: 0.00282425, [45] [expand_dump_flag]: 2.13998e-06 [switch_simplify]: 4.63e-05 [loop_unroll]: 4.373e-05 [a_1]: 0.001362 [with_stream_mark]: 1.13e-05 [recompute_prepare]: 9.04e-06 [updatestate_depend_eliminate]: 4.37e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 1.38002e-06 [a_2]: 8.982e-05 [accelerated_algorithm]: 1.098e-05 [shard]: 1.49998e-06 [meta_shard_fg_expand]: 1.99999e-06 [shard_inline]: 7.01999e-06 [merge_send_recv]: 6.81001e-06 [auto_parallel]: 7.82e-06 [parallel]: 6.41e-06 [flash_sp]: 3.90998e-06 [merge_comm]: 4.17998e-06 [allreduce_fusion]: 3.65e-06 [matmul_add_comm_reduction]: 6.76999e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 9.10999e-06 [virtual_dataset]: 7.03998e-06 [get_grad_eliminate_]: 6.49001e-06 [virtual_output]: 6.41998e-06 [merge_forward]: 3.66999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.371e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 1.192e-05 [set_forward_comm_id_for_comm_node_pass]: 4.23001e-06 [meta_fg_expand]: 5.568e-05 [flash_sp_send_recv_attached]: 1.19e-06 [receive_attached]: 1.16002e-06 [after_resolve]: 1.193e-05 [a_after_grad]: 1.029e-05 [renormalize]: 0.00062701 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.47001e-06 [auto_monad_eliminator]: 1.188e-05 [cse]: 2.339e-05 [a_3]: 4.829e-05 [Cycle 3]: 0.00069887, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 8.48999e-06 [loop_unroll]: 6.76e-06 [a_1]: 0.00015046 [with_stream_mark]: 8.49002e-06 [recompute_prepare]: 7.11001e-06 [updatestate_depend_eliminate]: 3.98001e-06 [updatestate_assign_eliminate]: 3.06999e-06 [updatestate_loads_eliminate]: 3.03e-06 
[parameter_eliminate]: 1.13001e-06 [a_2]: 8.683e-05 [accelerated_algorithm]: 9.69e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 7.08998e-06 [merge_send_recv]: 5.92001e-06 [auto_parallel]: 6.64001e-06 [parallel]: 5.14e-06 [flash_sp]: 9.5999e-07 [merge_comm]: 3.93001e-06 [allreduce_fusion]: 3.50998e-06 [matmul_add_comm_reduction]: 5.87999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 8.20999e-06 [virtual_dataset]: 6.44001e-06 [get_grad_eliminate_]: 6.20997e-06 [virtual_output]: 6.07999e-06 [merge_forward]: 3.36001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 7.36001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.338e-05 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 1.092e-05 [set_forward_comm_id_for_comm_node_pass]: 4.24002e-06 [meta_fg_expand]: 2.69001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 9.44e-06 [a_after_grad]: 9.75002e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 7.84002e-06 [cse]: 1.738e-05 [a_3]: 4.061e-05 [py_interpret_to_execute_after_opt_a]: 1.068e-05 [slice_cell_reuse_recomputed_activation]: 2.17999e-06 [rewriter_after_opt_a]: 4.281e-05 [convert_after_rewriter]: 8.28999e-06 [order_py_execute_after_rewriter]: 5.54998e-06 [mutable_eliminate]: 0.00052891 [opt_b]: 0.00022286, [1] [Cycle 1]: 0.00021529, [7] [b_1]: 0.00013504 [b_2]: 9.02e-06 [updatestate_depend_eliminate]: 6.31e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.80997e-06 [renormalize]: 4.90021e-07 [cse]: 2.146e-05 [optimize_parallel_all_gather_comm]: 1.758e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 2.173e-05 [loop_unroll]: 0.00044647 [opt_after_cconv]: 0.0001144, [1] [Cycle 1]: 0.00010849, [7] [c_1]: 3.422e-05 [parameter_eliminate]: 3.13e-06 [updatestate_depend_eliminate]: 6.17999e-06 [updatestate_assign_eliminate]: 3.24001e-06 
[updatestate_loads_eliminate]: 2.93e-06 [cse]: 2.182e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.665e-05 [tuple_transform]: 8.105e-05, [1] [Cycle 1]: 7.628e-05, [4] [d_1]: 4.777e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 7.83001e-06 [partial_unused_args_eliminate]: 2.22001e-06 [add_recomputation]: 5.112e-05 [cse_after_recomputation]: 2.651e-05, [1] [Cycle 1]: 2.171e-05, [1] [cse]: 1.56e-05 [environ_conv]: 7.77e-06 [swap_dp_allreduce_reducescatter]: 6.35002e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.70001e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.31002e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.22e-06 [overlap_opt_shard_in_pipeline]: 1.57001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.429e-05 [grouped_pairwise_exchange_alltoall]: 2.14e-06 [offloading_packed_experts]: 4.68999e-06 [overlap_recompute_and_grad_model_parallel]: 5.47001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 4.75001e-06 [overlap_grad_flash_sp]: 2.046e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.24e-06 [symbol_engine_optimizer]: 8.835e-05, [1] [Cycle 1]: 8.383e-05, [6] [build]: 8.50001e-06 [elim_shapecalc]: 1.102e-05 [elim_not_effective]: 1.485e-05 [opt_reshape]: 7.74002e-06 [fold_const_symbol]: 1.172e-05 [renormalize]: 2.10013e-07 
[detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.66e-06 [auto_monad_reorder]: 2.096e-05 [get_jit_bprop_graph]: 1.25001e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00051629 [validate]: 4.226e-05 [backend_pass]: 1.04003e-06 [task_emit]: 0.00662257 [execute]: 7.6e-06 Sums bootstrap : 0.000509s : 1.42% type_inference : 0.012072s : 33.68% event_method : 0.000046s : 0.13% auto_monad : 0.000138s : 0.38% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.13% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.11% optimize.rewriter_before_opt_a : 0.000147s : 0.41% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.36% optimize.opt_a.loop_unroll : 0.000113s : 0.31% optimize.opt_a.a_1 : 0.002954s : 8.24% optimize.opt_a.with_stream_mark : 0.000044s : 0.12% optimize.opt_a.recompute_prepare : 0.000039s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.04% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000425s : 1.18% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.15% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000030s : 0.08% optimize.opt_a.merge_send_recv : 0.000029s : 0.08% optimize.opt_a.auto_parallel : 0.000026s : 0.07% optimize.opt_a.parallel : 0.000031s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000018s : 0.05% 
optimize.opt_a.allreduce_fusion : 0.000016s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.10% optimize.opt_a.virtual_dataset : 0.000029s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.08% optimize.opt_a.virtual_output : 0.000028s : 0.08% optimize.opt_a.merge_forward : 0.000016s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.14% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.05% optimize.opt_a.meta_fg_expand : 0.001543s : 4.30% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.25% optimize.opt_a.a_after_grad : 0.000111s : 0.31% optimize.opt_a.renormalize : 0.007070s : 19.72% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000073s : 0.20% optimize.opt_a.cse : 0.000231s : 0.65% optimize.opt_a.a_3 : 0.000428s : 1.19% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000043s : 0.12% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000529s : 1.48% optimize.opt_b.b_1 : 0.000135s : 0.38% optimize.opt_b.b_2 : 0.000009s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.06% optimize.loop_unroll : 0.000446s : 1.25% optimize.opt_after_cconv.c_1 : 0.000034s : 0.10% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000022s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.05% optimize.tuple_transform.d_1 : 0.000048s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.14% optimize.cse_after_recomputation.cse : 0.000016s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000516s : 1.44% validate : 0.000042s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.006623s : 18.47% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000721 159 6.52% : 0.000047s : 7: substitution.arithmetic_simplify 0.35% : 0.000003s : 3: substitution.elim_not_effective 0.61% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.90% : 0.000007s : 4: substitution.graph_param_transform 0.47% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 58.96% : 0.000425s : 17: substitution.inline 2.42% : 0.000017s : 2: substitution.inline_without_move 1.43% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.20% : 0.000016s : 3: substitution.less_batch_normalization 1.46% : 0.000011s : 7: substitution.minmaximum_grad 0.88% : 0.000006s : 5: substitution.partial_eliminate 1.70% : 0.000012s : 15: substitution.remove_not_recompute_node 3.68% : 0.000027s : 10: substitution.replace_applicator 1.40% : 0.000010s : 10: substitution.replace_old_param 0.40% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.05% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.49% : 0.000011s : 7: substitution.tuple_list_get_item_const_eliminator 1.94% : 0.000014s : 7: substitution.tuple_list_get_item_depend_reorder 7.05% : 0.000051s : 18: substitution.tuple_list_get_item_eliminator 1.97% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011992 2 86.96% : 0.010428s : 1: type_inference.infer 13.04% : 0.001564s : 1: type_inference.specialize ------[replace.] 0.000201 26 65.84% : 0.000132s : 17: replace.inline 34.16% : 0.000069s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000439 26 94.39% : 0.000414s : 17: match.inline 5.61% : 0.000025s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000698 4180 1.12% : 0.000008s : 52: predicate.accumulaten_eliminater 0.26% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.18% : 0.000008s : 52: predicate.addn_zero_filter 1.11% : 0.000008s : 52: predicate.adjust_all_reduce_mul_add 2.08% : 0.000014s : 73: predicate.arithmetic_simplify 1.14% : 0.000008s : 52: predicate.cast_eliminate 1.13% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000003s : 21: predicate.depend_value_elim 1.15% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.19% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000000s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.17% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.17% : 0.000008s : 56: predicate.environ_get_depend_swap 1.69% : 0.000012s : 77: predicate.environ_get_eliminate 1.16% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.81% : 0.000013s : 78: predicate.exchange_switch_depend_value 2.53% : 0.000018s : 78: predicate.float_depend_g_call 0.46% : 0.000003s : 21: predicate.float_environ_get_switch 0.59% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.53% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.92% : 0.000041s : 180: predicate.inline 1.40% : 0.000010s : 45: predicate.inline_without_move 0.28% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 21: predicate.less_batch_normalization 1.55% : 
0.000011s : 69: predicate.list_to_tuple_eliminator_ 2.58% : 0.000018s : 121: predicate.load_eliminater 0.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.47% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.35% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.49% : 0.000003s : 21: predicate.merge_addn 1.13% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.16% : 0.000008s : 52: predicate.minmaximum_grad 0.30% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.13% : 0.000001s : 4: predicate.parallel_virtual_node 2.09% : 0.000015s : 78: predicate.partial_defer_inline 1.73% : 0.000012s : 65: predicate.partial_eliminate 1.11% : 0.000008s : 52: predicate.print_const_string_wrapper 0.51% : 0.000004s : 21: predicate.reduce_all_const_elim 1.43% : 0.000010s : 52: predicate.reduce_eliminate 2.58% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.29% : 0.000002s : 21: predicate.remove_not_recompute_node 1.93% : 0.000014s : 111: predicate.replace_applicator 0.69% : 0.000005s : 45: predicate.replace_old_param 0.07% : 0.000000s : 4: predicate.reset_defer_inline 1.16% : 0.000008s : 52: predicate.reshape_eliminate 1.12% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.26% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 21: predicate.shard_identity_eliminate 0.22% : 0.000002s : 8: predicate.special_op_eliminate 0.63% : 0.000004s : 21: predicate.specialize_transform 1.24% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.30% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.92% : 0.000013s : 78: predicate.switch_defer_inline 3.01% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.23% : 0.000037s : 213: 
predicate.switch_simplify 1.14% : 0.000008s : 52: predicate.tile_eliminate 1.10% : 0.000008s : 52: predicate.transpose_eliminate 1.45% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000010s : 60: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000020s : 90: predicate.tuple_list_get_item_eliminator 1.48% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.53% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.56% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.17% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.50% : 0.000003s : 21: predicate.virtual_output_eliminate 0.11% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001789 35 57.75% : 0.001033s : 14: func_graph_cloner_run.FuncGraphClonerGraph 42.25% : 0.000756s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072546 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.55% : 0.003301s : 1: add_attr 4.54% : 0.003292s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000145s : 1: auto_monad 0.03% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.75% : 0.000541s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.04% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.07% : 0.000053s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.63% : 0.000455s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.74% : 0.000539s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 6.15% : 0.004459s : 117: opt.transform.opt_a 0.05% : 0.000033s : 1: opt.transform.opt_after_cconv 0.03% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000115s : 28: opt.transform.opt_b 0.07% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000041s : 4: opt.transform.symbol_engine_opt 20.36% : 0.014767s : 1: opt_a 0.16% : 0.000118s : 1: opt_after_cconv 0.73% : 0.000527s : 1: opt_after_jit_grad 0.31% : 0.000226s : 1: opt_b 23.41% : 0.016982s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000053s : 1: pre_auto_parallel 0.06% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 7.48% : 0.005424s : 2: renormalize.infer 2.25% : 0.001631s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000047s : 1: rewriter_after_opt_a 0.21% : 0.000152s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.13% : 0.000091s : 1: symbol_engine_optimizer 9.15% : 0.006635s : 1: task_emit 0.12% : 0.000084s : 1: 
tuple_transform 16.67% : 0.012090s : 1: type_inference 0.10% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x3-kbk],max_mem:12.0M . TotalTime = 0.0639584, [24] [bootstrap]: 0.00064185 [type_inference]: 0.00674428 [event_method]: 1.396e-05 [auto_monad]: 6.08e-05 [graph_reusing]: 5.67999e-06 [inline]: 2.30002e-06 [add_attr]: 0.00365415, [1] [add_attr_with_inline]: 0.00364362, [1] [Cycle 1]: 4.706e-05, [2] [tag_attr]: 1.565e-05 [meta_addattr_fg_expand]: 4.92e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 2.605e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00419131, [53] [py_interpret_to_execute]: 2.133e-05 [rewriter_before_opt_a]: 6.357e-05 [opt_a]: 0.00224049, [2] [Cycle 1]: 0.00161533, [45] [expand_dump_flag]: 3.18e-06 [switch_simplify]: 3.301e-05 [loop_unroll]: 2.08e-05 [a_1]: 0.0004548 [with_stream_mark]: 1.44e-05 [recompute_prepare]: 8.95999e-06 [updatestate_depend_eliminate]: 4.10998e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 2.03997e-06 [a_2]: 8.021e-05 [accelerated_algorithm]: 6.63e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 6.22001e-06 [merge_send_recv]: 8.37e-06 [auto_parallel]: 6.44001e-06 [parallel]: 2.702e-05 [flash_sp]: 7.55e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.66999e-06 [matmul_add_comm_reduction]: 9.85002e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.73001e-06 [virtual_dataset]: 6.53003e-06 [get_grad_eliminate_]: 5.80002e-06 [virtual_output]: 6.09001e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.243e-05 [merge_recompute_call_nodes]: 
1.51998e-06 [before_grad]: 1.029e-05 [set_forward_comm_id_for_comm_node_pass]: 4.19002e-06 [meta_fg_expand]: 2.78998e-06 [flash_sp_send_recv_attached]: 2.91e-06 [receive_attached]: 2.69999e-06 [after_resolve]: 9.69e-06 [a_after_grad]: 9.32001e-06 [renormalize]: 0.0004652 [add_forward_monad_depend]: 8.59e-06 [auto_monad_grad]: 2.52001e-06 [auto_monad_eliminator]: 1.46e-05 [cse]: 3.115e-05 [a_3]: 4.274e-05 [Cycle 2]: 0.00061563, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.05998e-06 [loop_unroll]: 5.92999e-06 [a_1]: 0.00011682 [with_stream_mark]: 1.03e-05 [recompute_prepare]: 6.10002e-06 [updatestate_depend_eliminate]: 2.99999e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.85998e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 7.288e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.21002e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 4.63999e-06 [auto_parallel]: 5.75001e-06 [parallel]: 4.37e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 2.99001e-06 [matmul_add_comm_reduction]: 5.42999e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 6.82002e-06 [virtual_dataset]: 5.68997e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.22e-06 [merge_forward]: 2.94999e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.092e-05 [merge_recompute_call_nodes]: 7.79983e-07 [before_grad]: 8.77e-06 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 1.98997e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 8.90024e-07 [after_resolve]: 8.61002e-06 [a_after_grad]: 8.11002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.385e-05 [a_3]: 3.349e-05 [py_interpret_to_execute_after_opt_a]: 7.64002e-06 [slice_cell_reuse_recomputed_activation]: 1.99999e-06 
[rewriter_after_opt_a]: 3.264e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 5.54e-06 [mutable_eliminate]: 0.00048358 [opt_b]: 0.00019185, [1] [Cycle 1]: 0.00018558, [7] [b_1]: 0.00011213 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.73e-06 [renormalize]: 3.89991e-07 [cse]: 1.837e-05 [optimize_parallel_all_gather_comm]: 1.63e-05 [overlap_param_gather]: 2.13002e-06 [cconv]: 2.353e-05 [loop_unroll]: 0.00043338 [opt_after_cconv]: 0.00010016, [1] [Cycle 1]: 9.404e-05, [7] [c_1]: 2.668e-05 [parameter_eliminate]: 2.83e-06 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 2.68998e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.851e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 1.579e-05 [tuple_transform]: 6.906e-05, [1] [Cycle 1]: 6.469e-05, [4] [d_1]: 3.754e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.59001e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.851e-05 [cse_after_recomputation]: 2.323e-05, [1] [Cycle 1]: 1.833e-05, [1] [cse]: 1.26e-05 [environ_conv]: 7.38999e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.39001e-06 [assign_add_opt]: 1.46002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.20999e-06 [interleave_split_concat_branches]: 1.24998e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.262e-05 
[grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 4.12e-06 [overlap_recompute_and_grad_model_parallel]: 5.04e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4.67998e-06 [overlap_grad_flash_sp]: 1.75e-05 [begin_end_overlap_inline]: 7.00005e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 7.345e-05, [1] [Cycle 1]: 6.892e-05, [6] [build]: 2.09999e-06 [elim_shapecalc]: 9.62001e-06 [elim_not_effective]: 1.209e-05 [opt_reshape]: 6.54999e-06 [fold_const_symbol]: 9.46e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.79e-06 [auto_monad_reorder]: 1.587e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.00046746 [validate]: 3.419e-05 [backend_pass]: 1.06002e-06 [task_emit]: 0.0478588 [execute]: 9.29e-06 Sums bootstrap : 0.000642s : 1.08% type_inference : 0.006744s : 11.38% event_method : 0.000014s : 0.02% auto_monad : 0.000061s : 0.10% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.04% optimize.rewriter_before_opt_a : 0.000064s : 0.11% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.07% optimize.opt_a.loop_unroll : 0.000027s : 0.05% optimize.opt_a.a_1 : 0.000572s : 0.96% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% 
optimize.opt_a.recompute_prepare : 0.000015s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000153s : 0.26% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000031s : 0.05% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000465s : 0.78% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.02% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000045s : 0.08% optimize.opt_a.a_3 : 0.000076s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000484s : 0.82% optimize.opt_b.b_1 : 0.000112s : 0.19% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000433s : 0.73% optimize.opt_after_cconv.c_1 : 0.000027s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.08% optimize.cse_after_recomputation.cse : 0.000013s : 0.02% optimize.environ_conv : 0.000007s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000467s : 0.79% validate : 0.000034s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.047859s : 80.74% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000171 26 18.91% : 0.000032s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 2.84% : 0.000005s : 3: substitution.graph_param_transform 64.40% : 0.000110s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 1.99% : 0.000003s : 2: substitution.replace_old_param 5.39% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006692 2 90.57% : 0.006061s : 1: type_inference.infer 9.43% : 0.000631s : 1: type_inference.specialize ------[replace.] 0.000039 4 78.40% : 0.000030s : 3: replace.inline 21.60% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 4 92.70% : 0.000108s : 3: match.inline 7.30% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.16% : 0.000004s : 15: predicate.arithmetic_simplify 0.93% : 0.000002s : 9: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_depend_swap 1.76% : 0.000003s : 18: predicate.environ_get_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.70% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.36% : 0.000010s : 40: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 25: predicate.load_eliminater 0.99% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.16% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 1.20% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.58% : 0.000003s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.67% : 0.000001s : 6: predicate.reduce_all_const_elim 1.29% : 0.000002s : 9: predicate.reduce_eliminate 2.32% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.44% : 0.000001s : 6: predicate.remove_not_recompute_node 1.29% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 1.08% : 0.000002s : 9: predicate.reshape_eliminate 0.60% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.80% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 0.75% : 0.000001s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 1.16% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000002s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 13: predicate.switch_defer_inline 1.97% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.83% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.52% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000375 8 46.73% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.27% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.073364 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.99% : 0.003659s : 1: add_attr 4.97% : 0.003647s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000066s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.93% : 0.000679s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.60% : 0.000442s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.67% : 0.000493s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.29% : 0.000949s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.06% : 0.002243s : 1: opt_a 0.14% : 0.000104s : 1: opt_after_cconv 0.65% : 0.000477s : 1: opt_after_jit_grad 0.27% : 0.000195s : 1: opt_b 5.72% : 0.004195s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 0.34% : 0.000246s : 1: renormalize.infer 0.29% : 0.000212s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.09% : 0.000068s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000076s : 1: symbol_engine_optimizer 65.26% : 0.047880s : 1: task_emit 0.10% : 0.000072s : 1: 
tuple_transform 9.21% : 0.006758s : 1: type_inference 0.08% : 0.000057s : 1: validate TotalTime = 0.0574928, [24] [bootstrap]: 0.00046768 [type_inference]: 0.00616416 [event_method]: 1.321e-05 [auto_monad]: 6.24e-05 [graph_reusing]: 5.51002e-06 [inline]: 1.89e-06 [add_attr]: 0.00311005, [1] [add_attr_with_inline]: 0.00310226, [1] [Cycle 1]: 5.303e-05, [2] [tag_attr]: 1.477e-05 [meta_addattr_fg_expand]: 4.53999e-06 [parallel-infer-symbol]: 3.91999e-06 [pre_auto_parallel]: 2.486e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.87001e-06 [optimize]: 0.00413532, [53] [py_interpret_to_execute]: 2.581e-05 [rewriter_before_opt_a]: 5.267e-05 [opt_a]: 0.0021053, [2] [Cycle 1]: 0.00147522, [45] [expand_dump_flag]: 3.29001e-06 [switch_simplify]: 3.076e-05 [loop_unroll]: 1.795e-05 [a_1]: 0.00036338 [with_stream_mark]: 1.524e-05 [recompute_prepare]: 8.60999e-06 [updatestate_depend_eliminate]: 4.04002e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.43999e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 8.395e-05 [accelerated_algorithm]: 6.90002e-06 [shard]: 2.23998e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 8.40999e-06 [auto_parallel]: 5.92001e-06 [parallel]: 1.923e-05 [flash_sp]: 8e-06 [merge_comm]: 3.98001e-06 [allreduce_fusion]: 3.57002e-06 [matmul_add_comm_reduction]: 9.72999e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 6.42001e-06 [get_grad_eliminate_]: 5.77999e-06 [virtual_output]: 6.12999e-06 [merge_forward]: 4.01001e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 9.57001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.156e-05 [merge_recompute_call_nodes]: 1.76e-06 [before_grad]: 1.011e-05 [set_forward_comm_id_for_comm_node_pass]: 3.78999e-06 [meta_fg_expand]: 2.73003e-06 [flash_sp_send_recv_attached]: 2.79001e-06 
[receive_attached]: 2.63e-06 [after_resolve]: 9.87001e-06 [a_after_grad]: 9.15001e-06 [renormalize]: 0.00043429 [add_forward_monad_depend]: 4.64002e-06 [auto_monad_grad]: 1.87001e-06 [auto_monad_eliminator]: 1.44e-05 [cse]: 3.002e-05 [a_3]: 4.219e-05 [Cycle 2]: 0.00062052, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.08e-06 [loop_unroll]: 5.78002e-06 [a_1]: 0.00011746 [with_stream_mark]: 1.28e-05 [recompute_prepare]: 6.39001e-06 [updatestate_depend_eliminate]: 3.14999e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.75002e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.292e-05 [accelerated_algorithm]: 5.89e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 4.75999e-06 [auto_parallel]: 5.42999e-06 [parallel]: 4.53999e-06 [flash_sp]: 3.66001e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.98e-06 [matmul_add_comm_reduction]: 5.55001e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.51e-06 [virtual_dataset]: 5.53002e-06 [get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.39998e-06 [merge_forward]: 2.80997e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 6.51999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.057e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 9.04e-06 [set_forward_comm_id_for_comm_node_pass]: 3.73999e-06 [meta_fg_expand]: 2.06998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.84e-06 [a_after_grad]: 8.03001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.63998e-06 [cse]: 1.386e-05 [a_3]: 3.349e-05 [py_interpret_to_execute_after_opt_a]: 8.05e-06 [slice_cell_reuse_recomputed_activation]: 2.88e-06 [rewriter_after_opt_a]: 3.302e-05 [convert_after_rewriter]: 6.59001e-06 [order_py_execute_after_rewriter]: 5.45001e-06 [mutable_eliminate]: 0.00053101 [opt_b]: 0.00019285, 
[1] [Cycle 1]: 0.00018632, [7] [b_1]: 0.00011365 [b_2]: 7.43999e-06 [updatestate_depend_eliminate]: 5.59998e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.51998e-06 [renormalize]: 2.49973e-07 [cse]: 1.808e-05 [optimize_parallel_all_gather_comm]: 1.662e-05 [overlap_param_gather]: 2.24001e-06 [cconv]: 2.443e-05 [loop_unroll]: 0.0004455 [opt_after_cconv]: 0.00010298, [1] [Cycle 1]: 9.667e-05, [7] [c_1]: 2.622e-05 [parameter_eliminate]: 2.54001e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.82002e-06 [updatestate_loads_eliminate]: 2.69001e-06 [cse]: 1.935e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 1.717e-05 [tuple_transform]: 7.018e-05, [1] [Cycle 1]: 6.558e-05, [4] [d_1]: 3.743e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 7.01001e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.673e-05 [cse_after_recomputation]: 2.55e-05, [1] [Cycle 1]: 2.026e-05, [1] [cse]: 1.364e-05 [environ_conv]: 5.85002e-06 [swap_dp_allreduce_reducescatter]: 6.07999e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 5.10001e-06 [label_fine_grained_interleaved_index]: 2.90998e-06 [merge_cast_opt]: 1.68002e-06 [slice_recompute_activation]: 2.33998e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.24998e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 2.04e-06 [control_data_broadcast_order]: 1.247e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.67e-06 [overlap_recompute_and_grad_model_parallel]: 5.17e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.50999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.73002e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 4.97e-06 [overlap_grad_flash_sp]: 1.861e-05 [begin_end_overlap_inline]: 6.50005e-07 [split_matmul_comm_elemetwise]: 2.50002e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.11997e-06 [symbol_engine_optimizer]: 7.832e-05, [1] [Cycle 1]: 7.339e-05, [6] [build]: 2.49999e-06 [elim_shapecalc]: 9.60001e-06 [elim_not_effective]: 1.317e-05 [opt_reshape]: 7.45003e-06 [fold_const_symbol]: 9.68002e-06 [renormalize]: 2.70025e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 2.07999e-06 [auto_monad_reorder]: 1.694e-05 [get_jit_bprop_graph]: 1.17e-06 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00049082 [validate]: 3.512e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0427155 [execute]: 9.51998e-06 Sums bootstrap : 0.000468s : 0.88% type_inference : 0.006164s : 11.56% event_method : 0.000013s : 0.02% auto_monad : 0.000062s : 0.12% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000025s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000026s : 0.05% optimize.rewriter_before_opt_a : 0.000053s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.07% optimize.opt_a.loop_unroll : 0.000024s : 0.04% optimize.opt_a.a_1 : 0.000481s : 0.90% optimize.opt_a.with_stream_mark : 0.000028s : 0.05% optimize.opt_a.recompute_prepare : 0.000015s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate 
: 0.000007s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000157s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000434s : 0.81% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000044s : 0.08% optimize.opt_a.a_3 : 0.000076s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000531s : 1.00% optimize.opt_b.b_1 : 0.000114s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.05% optimize.loop_unroll : 0.000445s : 0.84% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.04% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.09% optimize.cse_after_recomputation.cse : 0.000014s : 0.03% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000491s : 0.92% validate : 0.000035s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.042715s : 80.09% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000147 24 20.65% : 0.000030s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000002s : 2: substitution.fold_const_symbol 3.68% : 0.000005s : 3: substitution.graph_param_transform 65.76% : 0.000097s : 3: substitution.inline 2.14% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.14% : 0.000005s : 4: substitution.remove_not_recompute_node 2.12% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006117 2 92.01% : 0.005628s : 1: type_inference.infer 7.99% : 0.000489s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000095 3 100.00% : 0.000095s : 3: match.inline ------[predicate.] 
0.000154 815 0.88% : 0.000001s : 8: predicate.accumulaten_eliminater 1.01% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.44% : 0.000004s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.81% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 11: predicate.environ_get_depend_swap 1.75% : 0.000003s : 17: predicate.environ_get_eliminate 1.05% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.86% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.34% : 0.000010s : 37: predicate.inline 0.98% : 0.000002s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 6: predicate.less_batch_normalization 1.57% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 22: predicate.load_eliminater 1.06% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.96% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.66% : 0.000003s : 3: predicate.mutable_eliminate 0.56% : 0.000001s : 3: predicate.opt_reshape 0.50% : 0.000001s : 3: predicate.parallel_virtual_node 1.42% : 0.000002s : 11: predicate.partial_defer_inline 1.31% : 0.000002s : 11: predicate.partial_eliminate 0.88% : 0.000001s : 8: predicate.print_const_string_wrapper 0.74% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 8: predicate.reduce_eliminate 2.23% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 6: predicate.remove_not_recompute_node 1.26% : 0.000002s : 14: predicate.replace_applicator 0.61% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 8: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.79% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.84% : 0.000001s : 6: predicate.special_op_eliminate 0.93% : 0.000001s : 6: predicate.specialize_transform 0.94% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.80% : 0.000007s : 38: predicate.switch_simplify 0.87% : 
0.000001s : 8: predicate.tile_eliminate 0.94% : 0.000001s : 8: predicate.transpose_eliminate 1.55% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.80% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.95% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.76% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000322 7 40.64% : 0.000131s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.36% : 0.000191s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.066183 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.71% : 0.003115s : 1: add_attr 4.69% : 0.003106s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000068s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.76% : 0.000504s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.69% : 0.000454s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.82% : 0.000541s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000016s : 1: opt.transform.mutable_eliminate 1.29% : 0.000856s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000036s : 4: opt.transform.symbol_engine_opt 3.19% : 0.002108s : 1: opt_a 0.16% : 0.000107s : 1: opt_after_cconv 0.76% : 0.000501s : 1: opt_after_jit_grad 0.30% : 0.000196s : 1: opt_b 6.25% : 0.004140s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.05% : 0.000030s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000021s : 1: remove_dup_value 0.34% : 0.000224s : 1: renormalize.infer 0.31% : 0.000203s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.09% : 0.000057s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000081s : 1: symbol_engine_optimizer 64.57% : 0.042736s : 1: task_emit 0.11% : 0.000073s : 1: 
tuple_transform 9.34% : 0.006179s : 1: type_inference 0.09% : 0.000060s : 1: validate TotalTime = 0.0561664, [24] [bootstrap]: 0.00046409 [type_inference]: 0.00574324 [event_method]: 1.42e-05 [auto_monad]: 6.341e-05 [graph_reusing]: 5.31998e-06 [inline]: 2.20002e-06 [add_attr]: 0.00309666, [1] [add_attr_with_inline]: 0.00308873, [1] [Cycle 1]: 4.693e-05, [2] [tag_attr]: 1.527e-05 [meta_addattr_fg_expand]: 4.56002e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 2.65e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 8.99978e-07 [dataset_repeat_opt]: 2.44001e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.00416205, [53] [py_interpret_to_execute]: 2.251e-05 [rewriter_before_opt_a]: 6.432e-05 [opt_a]: 0.00225426, [2] [Cycle 1]: 0.00162945, [45] [expand_dump_flag]: 3.11001e-06 [switch_simplify]: 3.39e-05 [loop_unroll]: 2.113e-05 [a_1]: 0.00044761 [with_stream_mark]: 1.454e-05 [recompute_prepare]: 7.98999e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 2.00002e-06 [a_2]: 8.046e-05 [accelerated_algorithm]: 6.96999e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 6.09999e-06 [merge_send_recv]: 8.48001e-06 [auto_parallel]: 5.99e-06 [parallel]: 1.76e-05 [flash_sp]: 7.34002e-06 [merge_comm]: 3.92998e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 9.54e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 8.07998e-06 [virtual_dataset]: 6.13998e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.87999e-06 [merge_forward]: 3.84002e-06 [cell_reuse_recompute_pass]: 1.36002e-06 [offload_activation]: 9.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.227e-05 [merge_recompute_call_nodes]: 1.77999e-06 [before_grad]: 1.078e-05 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 3.23e-06 [flash_sp_send_recv_attached]: 2.52001e-06 [receive_attached]: 1.97001e-06 
[after_resolve]: 9.61e-06 [a_after_grad]: 9.02e-06 [renormalize]: 0.0004577 [add_forward_monad_depend]: 4.59998e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.43e-05 [cse]: 3.031e-05 [a_3]: 4.285e-05 [Cycle 2]: 0.00061451, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 7.33e-06 [loop_unroll]: 5.79e-06 [a_1]: 0.00011579 [with_stream_mark]: 9.57001e-06 [recompute_prepare]: 5.82001e-06 [updatestate_depend_eliminate]: 3.28e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.201e-05 [accelerated_algorithm]: 5.87999e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 4.41002e-06 [auto_parallel]: 5.23002e-06 [parallel]: 4.68999e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 2.96001e-06 [matmul_add_comm_reduction]: 5.61003e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.67999e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.32999e-06 [merge_forward]: 2.78e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 6.44001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 9.04998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 1.97999e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.70999e-06 [a_after_grad]: 8.2e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.68e-06 [cse]: 1.502e-05 [a_3]: 3.331e-05 [py_interpret_to_execute_after_opt_a]: 7.53e-06 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.334e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.42999e-06 [mutable_eliminate]: 0.00046263 [opt_b]: 0.00018972, [1] [Cycle 1]: 0.00018339, [7] [b_1]: 0.0001113 
[b_2]: 7.48999e-06 [updatestate_depend_eliminate]: 5.36002e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 3.30008e-07 [cse]: 1.794e-05 [optimize_parallel_all_gather_comm]: 1.557e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.216e-05 [loop_unroll]: 0.00042621 [opt_after_cconv]: 9.862e-05, [1] [Cycle 1]: 9.259e-05, [7] [c_1]: 2.621e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.86e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.561e-05 [tuple_transform]: 7.145e-05, [1] [Cycle 1]: 6.641e-05, [4] [d_1]: 3.84e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 7.01999e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 4.452e-05 [cse_after_recomputation]: 2.274e-05, [1] [Cycle 1]: 1.804e-05, [1] [cse]: 1.219e-05 [environ_conv]: 5.05001e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.96001e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 1.14e-06 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.73002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76003e-06 [control_data_broadcast_order]: 1.284e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 4.02998e-06 [overlap_recompute_and_grad_model_parallel]: 4.99998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.64001e-06 [overlap_grad_ring_attention]: 4.25999e-06 [overlap_grad_flash_sp]: 1.717e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.93002e-06 [handle_group_info]: 1.28002e-06 [symbol_engine_optimizer]: 7.286e-05, [1] [Cycle 1]: 6.832e-05, [6] [build]: 2.44999e-06 [elim_shapecalc]: 9.15999e-06 [elim_not_effective]: 1.25e-05 [opt_reshape]: 6.69999e-06 [fold_const_symbol]: 9.63002e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.667e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00046355 [validate]: 3.526e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.0418347 [execute]: 1.021e-05 Sums bootstrap : 0.000464s : 0.89% type_inference : 0.005743s : 11.04% event_method : 0.000014s : 0.03% auto_monad : 0.000063s : 0.12% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000026s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.04% optimize.rewriter_before_opt_a : 0.000064s : 0.12% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.08% optimize.opt_a.loop_unroll : 0.000027s : 0.05% optimize.opt_a.a_1 : 0.000563s : 1.08% optimize.opt_a.with_stream_mark : 0.000024s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000152s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000458s : 0.88% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000045s : 0.09% optimize.opt_a.a_3 : 0.000076s : 0.15% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000463s : 0.89% optimize.opt_b.b_1 : 0.000111s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000426s : 0.82% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000038s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000464s : 0.89% validate : 0.000035s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.041835s : 80.45% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000170 26 19.18% : 0.000033s : 5: substitution.arithmetic_simplify 1.30% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000005s : 3: substitution.graph_param_transform 63.40% : 0.000108s : 3: substitution.inline 2.12% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.89% : 0.000005s : 4: substitution.remove_not_recompute_node 1.94% : 0.000003s : 2: substitution.replace_old_param 5.12% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005701 2 89.21% : 0.005086s : 1: type_inference.infer 10.79% : 0.000615s : 1: type_inference.specialize ------[replace.] 0.000037 4 77.96% : 0.000029s : 3: replace.inline 22.04% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 4 92.98% : 0.000106s : 3: match.inline 7.02% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 0.84% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000002s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 15: predicate.arithmetic_simplify 0.97% : 0.000002s : 9: predicate.cast_eliminate 0.71% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.93% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.19% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_depend_swap 1.78% : 0.000003s : 18: predicate.environ_get_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 3: predicate.fold_const_symbol 0.69% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.46% : 0.000010s : 40: predicate.inline 0.96% : 0.000002s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 6: predicate.less_batch_normalization 1.67% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 25: predicate.load_eliminater 1.08% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.07% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.57% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.16% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.64% : 0.000003s : 13: predicate.partial_defer_inline 1.44% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.14% : 0.000002s : 9: predicate.reduce_eliminate 2.37% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.26% : 0.000002s : 16: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000002s : 9: predicate.reshape_eliminate 0.65% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.04% : 0.000002s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 13: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.06% : 0.000008s : 43: predicate.switch_simplify 0.93% : 
0.000002s : 9: predicate.tile_eliminate 0.90% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000376 8 47.14% : 0.000177s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.86% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064974 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.77% : 0.003101s : 1: add_attr 4.76% : 0.003092s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.11% : 0.000068s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.77% : 0.000503s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.67% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.72% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.45% : 0.000941s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000090s : 28: opt.transform.opt_b 0.07% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.47% : 0.002257s : 1: opt_a 0.16% : 0.000102s : 1: opt_after_cconv 0.73% : 0.000473s : 1: opt_after_jit_grad 0.30% : 0.000193s : 1: opt_b 6.41% : 0.004166s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000031s : 1: pre_auto_parallel 0.04% : 0.000026s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.37% : 0.000238s : 1: renormalize.infer 0.33% : 0.000212s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.11% : 0.000068s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000076s : 1: symbol_engine_optimizer 64.42% : 0.041854s : 1: task_emit 0.11% : 0.000074s : 1: 
tuple_transform 8.86% : 0.005756s : 1: type_inference 0.09% : 0.000057s : 1: validate TotalTime = 0.0769406, [24] [bootstrap]: 0.00048308 [type_inference]: 0.0116763 [event_method]: 4.898e-05 [auto_monad]: 0.00013264 [graph_reusing]: 9.29e-06 [inline]: 1.82999e-06 [add_attr]: 0.00313909, [1] [add_attr_with_inline]: 0.00313101, [1] [Cycle 1]: 7.319e-05, [2] [tag_attr]: 3.299e-05 [meta_addattr_fg_expand]: 1.034e-05 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 4.827e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 8.49977e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 2.05002e-06 [optimize]: 0.016587, [53] [py_interpret_to_execute]: 3.935e-05 [rewriter_before_opt_a]: 0.00015604 [opt_a]: 0.0143975, [3] [Cycle 1]: 0.0110057, [45] [expand_dump_flag]: 4.12e-06 [switch_simplify]: 7.749e-05 [loop_unroll]: 6.323e-05 [a_1]: 0.00143711 [with_stream_mark]: 2.431e-05 [recompute_prepare]: 2.155e-05 [updatestate_depend_eliminate]: 8.51002e-06 [updatestate_assign_eliminate]: 7.22002e-06 [updatestate_loads_eliminate]: 7.51999e-06 [parameter_eliminate]: 2.83998e-06 [a_2]: 0.00024089 [accelerated_algorithm]: 3.085e-05 [shard]: 1.82999e-06 [meta_shard_fg_expand]: 3.6e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.622e-05 [auto_parallel]: 1.033e-05 [parallel]: 1.904e-05 [flash_sp]: 1.151e-05 [merge_comm]: 9.72999e-06 [allreduce_fusion]: 8.63001e-06 [matmul_add_comm_reduction]: 2.632e-05 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 1.75e-05 [virtual_dataset]: 1.506e-05 [get_grad_eliminate_]: 1.512e-05 [virtual_output]: 1.486e-05 [merge_forward]: 9.05999e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.748e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.939e-05 [merge_recompute_call_nodes]: 1.77999e-06 [before_grad]: 2.783e-05 [set_forward_comm_id_for_comm_node_pass]: 9.77001e-06 [meta_fg_expand]: 0.00148974 [flash_sp_send_recv_attached]: 4.15999e-06 [receive_attached]: 2.16e-06 [after_resolve]: 
6.325e-05 [a_after_grad]: 8.759e-05 [renormalize]: 0.00623286 [add_forward_monad_depend]: 9.25001e-06 [auto_monad_grad]: 5.63002e-06 [auto_monad_eliminator]: 5.12e-05 [cse]: 0.00022625 [a_3]: 0.00033104 [Cycle 2]: 0.00269561, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.586e-05 [loop_unroll]: 4.248e-05 [a_1]: 0.00133239 [with_stream_mark]: 1.078e-05 [recompute_prepare]: 9.37001e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 2.91999e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 8.855e-05 [accelerated_algorithm]: 1.002e-05 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 7.06999e-06 [merge_send_recv]: 6.12001e-06 [auto_parallel]: 6.46999e-06 [parallel]: 4.82e-06 [flash_sp]: 3.7e-06 [merge_comm]: 4.05e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 6.38e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 7.8e-06 [virtual_dataset]: 6.52001e-06 [get_grad_eliminate_]: 6.48e-06 [virtual_output]: 6.21e-06 [merge_forward]: 3.45e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 7.88001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.393e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 1.137e-05 [set_forward_comm_id_for_comm_node_pass]: 4.22e-06 [meta_fg_expand]: 7.524e-05 [flash_sp_send_recv_attached]: 1.03001e-06 [receive_attached]: 1.09e-06 [after_resolve]: 1.198e-05 [a_after_grad]: 1.017e-05 [renormalize]: 0.00059081 [add_forward_monad_depend]: 4.26001e-06 [auto_monad_grad]: 1.39e-06 [auto_monad_eliminator]: 1.168e-05 [cse]: 2.178e-05 [a_3]: 4.794e-05 [Cycle 3]: 0.00068201, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 7.99997e-06 [loop_unroll]: 6.69001e-06 [a_1]: 0.00014697 [with_stream_mark]: 8.97e-06 [recompute_prepare]: 6.79999e-06 [updatestate_depend_eliminate]: 3.82002e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 
1.00001e-06 [a_2]: 8.474e-05 [accelerated_algorithm]: 9.67999e-06 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 6.89001e-06 [merge_send_recv]: 5.25001e-06 [auto_parallel]: 6.49999e-06 [parallel]: 4.57998e-06 [flash_sp]: 1.00001e-06 [merge_comm]: 3.84002e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 5.79e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 7.78999e-06 [virtual_dataset]: 6.31e-06 [get_grad_eliminate_]: 6.17999e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.53999e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 6.88e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.305e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.112e-05 [set_forward_comm_id_for_comm_node_pass]: 3.97e-06 [meta_fg_expand]: 2.36998e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.09989e-07 [after_resolve]: 9.15999e-06 [a_after_grad]: 9.46e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.42e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 7.63001e-06 [cse]: 1.714e-05 [a_3]: 3.867e-05 [py_interpret_to_execute_after_opt_a]: 9.90002e-06 [slice_cell_reuse_recomputed_activation]: 1.99999e-06 [rewriter_after_opt_a]: 4.081e-05 [convert_after_rewriter]: 7.5e-06 [order_py_execute_after_rewriter]: 5.79e-06 [mutable_eliminate]: 0.0005227 [opt_b]: 0.00021715, [1] [Cycle 1]: 0.00021086, [7] [b_1]: 0.00013274 [b_2]: 8.37e-06 [updatestate_depend_eliminate]: 6.43e-06 [updatestate_assign_eliminate]: 2.87002e-06 [updatestate_loads_eliminate]: 3.06001e-06 [renormalize]: 4.40021e-07 [cse]: 2.228e-05 [optimize_parallel_all_gather_comm]: 1.787e-05 [overlap_param_gather]: 1.95001e-06 [cconv]: 2.19e-05 [loop_unroll]: 0.00043329 [opt_after_cconv]: 0.00011383, [1] [Cycle 1]: 0.00010771, [7] [c_1]: 3.185e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.74e-06 [updatestate_assign_eliminate]: 3.09001e-06 [updatestate_loads_eliminate]: 2.89001e-06 
[cse]: 2.672e-05 [renormalize]: 2.79979e-07 [remove_dup_value]: 1.671e-05 [tuple_transform]: 8.113e-05, [1] [Cycle 1]: 7.608e-05, [4] [d_1]: 4.679e-05 [none_parameter_eliminate]: 1.97001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 7.66001e-06 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 5.228e-05 [cse_after_recomputation]: 2.629e-05, [1] [Cycle 1]: 2.176e-05, [1] [cse]: 1.59e-05 [environ_conv]: 8.33999e-06 [swap_dp_allreduce_reducescatter]: 6.12999e-06 [bias_add_comm_swap]: 2.88e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.70997e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 3.04999e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.32e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.50999e-06 [overlap_opt_shard_in_pipeline]: 1.42999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.432e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.22998e-06 [overlap_recompute_and_grad_model_parallel]: 5.19998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 4.85999e-06 [overlap_grad_flash_sp]: 2.11e-05 [begin_end_overlap_inline]: 6.69999e-07 [split_matmul_comm_elemetwise]: 2.58e-06 [split_layernorm_comm]: 1.93002e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 8.649e-05, [1] [Cycle 1]: 8.203e-05, [6] [build]: 8.42998e-06 [elim_shapecalc]: 1.077e-05 [elim_not_effective]: 1.466e-05 [opt_reshape]: 7.4e-06 [fold_const_symbol]: 1.156e-05 [renormalize]: 2.3999e-07 [detach_backward]: 2.07999e-06 
[pipeline_parallel_scheduler]: 1.66e-06 [auto_monad_reorder]: 2.007e-05 [get_jit_bprop_graph]: 1.24998e-06 [rewriter_after_jit_bprop_graph]: 3.74002e-06 [opt_after_jit_grad]: 0.00047581 [validate]: 4.225e-05 [backend_pass]: 8.30012e-07 [task_emit]: 0.0440227 [execute]: 9.00999e-06 Sums bootstrap : 0.000483s : 0.67% type_inference : 0.011676s : 16.11% event_method : 0.000049s : 0.07% auto_monad : 0.000133s : 0.18% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.07% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.05% optimize.rewriter_before_opt_a : 0.000156s : 0.22% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.18% optimize.opt_a.loop_unroll : 0.000112s : 0.16% optimize.opt_a.a_1 : 0.002916s : 4.02% optimize.opt_a.with_stream_mark : 0.000044s : 0.06% optimize.opt_a.recompute_prepare : 0.000038s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000013s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000414s : 0.57% optimize.opt_a.accelerated_algorithm : 0.000051s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000030s : 0.04% optimize.opt_a.merge_send_recv : 0.000028s : 0.04% optimize.opt_a.auto_parallel : 0.000023s : 0.03% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000018s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000038s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.05% optimize.opt_a.virtual_dataset : 0.000028s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.04% optimize.opt_a.virtual_output : 0.000027s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000032s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000056s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000050s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001567s : 2.16% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000084s : 0.12% optimize.opt_a.a_after_grad : 0.000107s : 0.15% optimize.opt_a.renormalize : 0.006824s : 9.41% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000071s : 0.10% optimize.opt_a.cse : 0.000265s : 0.37% optimize.opt_a.a_3 : 0.000418s : 0.58% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000523s : 0.72% optimize.opt_b.b_1 : 0.000133s : 0.18% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000433s : 0.60% optimize.opt_after_cconv.c_1 : 0.000032s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000027s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.02% optimize.tuple_transform.d_1 : 0.000047s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.07% optimize.cse_after_recomputation.cse : 0.000016s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000002s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000476s : 0.66% validate : 0.000042s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044023s : 60.74% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000697 161 6.87% : 0.000048s : 8: substitution.arithmetic_simplify 0.34% : 0.000002s : 3: substitution.elim_not_effective 0.61% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.94% : 0.000007s : 4: substitution.graph_param_transform 0.45% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 58.41% : 0.000407s : 17: substitution.inline 2.33% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.17% : 0.000015s : 3: substitution.less_batch_normalization 1.50% : 0.000010s : 7: substitution.minmaximum_grad 0.84% : 0.000006s : 5: substitution.partial_eliminate 1.81% : 0.000013s : 15: substitution.remove_not_recompute_node 3.72% : 0.000026s : 10: substitution.replace_applicator 1.32% : 0.000009s : 10: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.90% : 0.000020s : 7: substitution.tuple_list_convert_item_index_to_positive 1.47% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 1.93% : 0.000013s : 7: substitution.tuple_list_get_item_depend_reorder 7.51% : 0.000052s : 19: substitution.tuple_list_get_item_eliminator 2.01% : 0.000014s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011598 2 86.06% : 0.009981s : 1: type_inference.infer 13.94% : 0.001617s : 1: type_inference.specialize ------[replace.] 0.000200 27 63.99% : 0.000128s : 17: replace.inline 36.01% : 0.000072s : 10: replace.tuple_list_get_item_eliminator ------[match.] 0.000424 27 93.64% : 0.000397s : 17: match.inline 6.36% : 0.000027s : 10: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000695 4248 1.12% : 0.000008s : 53: predicate.accumulaten_eliminater 0.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.44% : 0.000003s : 21: predicate.addn_check_dump 1.15% : 0.000008s : 53: predicate.addn_zero_filter 1.10% : 0.000008s : 53: predicate.adjust_all_reduce_mul_add 2.03% : 0.000014s : 74: predicate.arithmetic_simplify 1.14% : 0.000008s : 53: predicate.cast_eliminate 1.10% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.06% : 0.000000s : 4: predicate.const_output_eliminate 0.45% : 0.000003s : 21: predicate.depend_value_elim 1.19% : 0.000008s : 53: predicate.dict_get_item_const_eliminator 1.18% : 0.000008s : 53: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 53: predicate.dict_set_item_eliminator 0.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.16% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000008s : 57: predicate.environ_add_const_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_depend_swap 1.68% : 0.000012s : 78: predicate.environ_get_eliminate 1.18% : 0.000008s : 57: predicate.environ_get_set_eliminate 1.84% : 0.000013s : 80: predicate.exchange_switch_depend_value 2.49% : 0.000017s : 80: predicate.float_depend_g_call 0.45% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.52% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.51% : 0.000004s : 21: predicate.incorporate_call 0.45% : 0.000003s : 21: predicate.incorporate_call_switch 5.84% : 0.000041s : 183: predicate.inline 1.43% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.58% : 0.000004s : 21: predicate.less_batch_normalization 1.54% : 
0.000011s : 71: predicate.list_to_tuple_eliminator_ 2.67% : 0.000019s : 124: predicate.load_eliminater 0.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.57% : 0.000018s : 113: predicate.loop_unroll_before_grad 1.36% : 0.000009s : 61: predicate.make_slice_get_slice_eliminator 0.47% : 0.000003s : 21: predicate.merge_addn 1.07% : 0.000007s : 50: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 50: predicate.mini_step_allgather_replace 1.16% : 0.000008s : 53: predicate.minmaximum_grad 0.29% : 0.000002s : 4: predicate.mutable_eliminate 0.11% : 0.000001s : 4: predicate.opt_reshape 0.13% : 0.000001s : 4: predicate.parallel_virtual_node 2.15% : 0.000015s : 80: predicate.partial_defer_inline 1.72% : 0.000012s : 67: predicate.partial_eliminate 1.13% : 0.000008s : 53: predicate.print_const_string_wrapper 0.47% : 0.000003s : 21: predicate.reduce_all_const_elim 1.43% : 0.000010s : 53: predicate.reduce_eliminate 2.66% : 0.000018s : 124: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 21: predicate.remove_not_recompute_node 1.89% : 0.000013s : 113: predicate.replace_applicator 0.69% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.15% : 0.000008s : 53: predicate.reshape_eliminate 1.09% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 4: predicate.row_tensor_eliminate 1.23% : 0.000009s : 50: predicate.same_eliminate 0.34% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.57% : 0.000004s : 21: predicate.shard_identity_eliminate 0.21% : 0.000001s : 8: predicate.special_op_eliminate 0.60% : 0.000004s : 21: predicate.specialize_transform 1.21% : 0.000008s : 50: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000008s : 45: predicate.stack_unstack_eliminate 0.11% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.96% : 0.000014s : 80: predicate.switch_defer_inline 3.03% : 0.000021s : 130: predicate.switch_layer_defer_inline 5.32% : 0.000037s : 218: 
predicate.switch_simplify 1.13% : 0.000008s : 53: predicate.tile_eliminate 1.11% : 0.000008s : 53: predicate.transpose_eliminate 1.40% : 0.000010s : 61: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000011s : 61: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 61: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000020s : 92: predicate.tuple_list_get_item_eliminator 1.49% : 0.000010s : 61: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000014s : 82: predicate.tuple_list_set_item_eliminator 1.56% : 0.000011s : 71: predicate.tuple_to_list_eliminator_ 2.61% : 0.000018s : 124: predicate.updatestate_pure_node_eliminater 3.19% : 0.000022s : 145: predicate.updatestate_useless_node_eliminater 0.11% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.51% : 0.000004s : 21: predicate.virtual_output_eliminate 0.09% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001779 36 60.00% : 0.001067s : 15: func_graph_cloner_run.FuncGraphClonerGraph 40.00% : 0.000712s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.108072 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.91% : 0.003144s : 1: add_attr 2.90% : 0.003135s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.13% : 0.000140s : 1: auto_monad 0.02% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.48% : 0.000518s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.05% : 0.000056s : 1: event_method 0.02% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000442s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.49% : 0.000532s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 4.06% : 0.004391s : 117: opt.transform.opt_a 0.03% : 0.000030s : 1: opt.transform.opt_after_cconv 0.02% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000112s : 28: opt.transform.opt_b 0.05% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000041s : 4: opt.transform.symbol_engine_opt 13.33% : 0.014401s : 1: opt_a 0.11% : 0.000117s : 1: opt_after_cconv 0.45% : 0.000485s : 1: opt_after_jit_grad 0.20% : 0.000221s : 1: opt_b 15.35% : 0.016592s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000053s : 1: pre_auto_parallel 0.04% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 4.92% : 0.005316s : 2: renormalize.infer 1.38% : 0.001493s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000045s : 1: rewriter_after_opt_a 0.15% : 0.000160s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000089s : 1: symbol_engine_optimizer 40.76% : 0.044045s : 1: task_emit 0.08% : 0.000084s : 1: 
tuple_transform 10.82% : 0.011694s : 1: type_inference 0.06% : 0.000067s : 1: validate TotalTime = 0.0571532, [24] [bootstrap]: 0.00049676 [type_inference]: 0.00588769 [event_method]: 1.266e-05 [auto_monad]: 6.002e-05 [graph_reusing]: 5.74e-06 [inline]: 1.68002e-06 [add_attr]: 0.00304068, [1] [add_attr_with_inline]: 0.003033, [1] [Cycle 1]: 5.282e-05, [2] [tag_attr]: 1.519e-05 [meta_addattr_fg_expand]: 4.06001e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 2.374e-05 [insert-virtual-dataset]: 2.38998e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.00395331, [53] [py_interpret_to_execute]: 1.923e-05 [rewriter_before_opt_a]: 5.146e-05 [opt_a]: 0.00202578, [2] [Cycle 1]: 0.00140908, [45] [expand_dump_flag]: 3.14999e-06 [switch_simplify]: 2.979e-05 [loop_unroll]: 1.715e-05 [a_1]: 0.00035648 [with_stream_mark]: 1.51e-05 [recompute_prepare]: 7.73999e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 3.58e-06 [updatestate_loads_eliminate]: 3.4e-06 [parameter_eliminate]: 2.42001e-06 [a_2]: 8.023e-05 [accelerated_algorithm]: 6.63998e-06 [shard]: 2.10002e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 6.23e-06 [merge_send_recv]: 8.03001e-06 [auto_parallel]: 6.36e-06 [parallel]: 1.79e-05 [flash_sp]: 7.63001e-06 [merge_comm]: 3.9e-06 [allreduce_fusion]: 3.81999e-06 [matmul_add_comm_reduction]: 9.80002e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.27997e-06 [virtual_dataset]: 6.00002e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.68002e-06 [merge_forward]: 4.23001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.27999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.161e-05 [merge_recompute_call_nodes]: 1.46002e-06 [before_grad]: 1.026e-05 [set_forward_comm_id_for_comm_node_pass]: 3.97e-06 [meta_fg_expand]: 2.54001e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 2.24999e-06 
[after_resolve]: 9.49e-06 [a_after_grad]: 8.65001e-06 [renormalize]: 0.00039284 [add_forward_monad_depend]: 5.00999e-06 [auto_monad_grad]: 1.89999e-06 [auto_monad_eliminator]: 1.338e-05 [cse]: 3.09e-05 [a_3]: 4.117e-05 [Cycle 2]: 0.000606, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 7.61001e-06 [loop_unroll]: 6.16e-06 [a_1]: 0.00011451 [with_stream_mark]: 1.273e-05 [recompute_prepare]: 6.19999e-06 [updatestate_depend_eliminate]: 2.93998e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.68003e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 7.164e-05 [accelerated_algorithm]: 5.74999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 4.67998e-06 [auto_parallel]: 5.51e-06 [parallel]: 3.81001e-06 [flash_sp]: 3.56999e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.24999e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 [merge_recompute_call_nodes]: 9.49978e-07 [before_grad]: 8.59998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.25999e-06 [a_after_grad]: 7.88001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.39e-05 [a_3]: 3.344e-05 [py_interpret_to_execute_after_opt_a]: 7.66001e-06 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.285e-05 [convert_after_rewriter]: 6.58e-06 [order_py_execute_after_rewriter]: 5.44e-06 [mutable_eliminate]: 0.0004622 [opt_b]: 0.00018762, [1] [Cycle 1]: 0.00018151, [7] [b_1]: 
0.00010996 [b_2]: 7.29001e-06 [updatestate_depend_eliminate]: 5.41998e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.19997e-07 [cse]: 1.823e-05 [optimize_parallel_all_gather_comm]: 1.654e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.267e-05 [loop_unroll]: 0.00047017 [opt_after_cconv]: 9.753e-05, [1] [Cycle 1]: 9.148e-05, [7] [c_1]: 2.639e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.784e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 1.572e-05 [tuple_transform]: 6.83e-05, [1] [Cycle 1]: 6.352e-05, [4] [d_1]: 3.616e-05 [none_parameter_eliminate]: 1.92001e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.33998e-06 [partial_unused_args_eliminate]: 2.19001e-06 [add_recomputation]: 4.555e-05 [cse_after_recomputation]: 2.205e-05, [1] [Cycle 1]: 1.711e-05, [1] [cse]: 1.169e-05 [environ_conv]: 5.54e-06 [swap_dp_allreduce_reducescatter]: 5.40001e-06 [bias_add_comm_swap]: 2.48002e-06 [label_micro_interleaved_index]: 4.70001e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.78e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 1.14003e-06 [add_comm_op_reuse_tag]: 1.12999e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.21002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.259e-05 [grouped_pairwise_exchange_alltoall]: 1.39e-06 [offloading_packed_experts]: 3.68999e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.24e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 4.28999e-06 [overlap_grad_flash_sp]: 1.75e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.53003e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 7.254e-05, [1] [Cycle 1]: 6.755e-05, [6] [build]: 2.29999e-06 [elim_shapecalc]: 8.73001e-06 [elim_not_effective]: 1.215e-05 [opt_reshape]: 6.32001e-06 [fold_const_symbol]: 9.74999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.62e-05 [get_jit_bprop_graph]: 1.11002e-06 [rewriter_after_jit_bprop_graph]: 4.17998e-06 [opt_after_jit_grad]: 0.00046055 [validate]: 3.525e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0429082 [execute]: 1.011e-05 Sums bootstrap : 0.000497s : 0.94% type_inference : 0.005888s : 11.09% event_method : 0.000013s : 0.02% auto_monad : 0.000060s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.04% optimize.rewriter_before_opt_a : 0.000051s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000037s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000471s : 0.89% optimize.opt_a.with_stream_mark : 0.000028s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000152s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000393s : 0.74% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000045s : 0.08% optimize.opt_a.a_3 : 0.000075s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000462s : 0.87% optimize.opt_b.b_1 : 0.000110s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000470s : 0.89% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000036s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000461s : 0.87% validate : 0.000035s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.042908s : 80.81% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000142 24 20.11% : 0.000028s : 4: substitution.arithmetic_simplify 1.27% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 3.61% : 0.000005s : 3: substitution.graph_param_transform 66.57% : 0.000094s : 3: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.19% : 0.000005s : 4: substitution.remove_not_recompute_node 2.00% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005842 2 91.79% : 0.005363s : 1: type_inference.infer 8.21% : 0.000480s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000092 3 100.00% : 0.000092s : 3: match.inline ------[predicate.] 
0.000148 815 0.86% : 0.000001s : 8: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.86% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 14: predicate.arithmetic_simplify 0.88% : 0.000001s : 8: predicate.cast_eliminate 0.71% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.79% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.29% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.88% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.35% : 0.000009s : 37: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.55% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 6: predicate.less_batch_normalization 1.51% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 22: predicate.load_eliminater 1.30% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.94% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 8: predicate.minmaximum_grad 1.32% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.55% : 0.000001s : 3: predicate.parallel_virtual_node 1.46% : 0.000002s : 11: predicate.partial_defer_inline 1.34% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.13% : 0.000002s : 8: predicate.reduce_eliminate 2.23% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 6: predicate.remove_not_recompute_node 1.22% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 8: predicate.reshape_eliminate 0.71% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 3: predicate.row_tensor_eliminate 0.96% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.80% : 0.000001s : 6: predicate.special_op_eliminate 0.92% : 0.000001s : 6: predicate.specialize_transform 1.00% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.84% : 0.000007s : 38: predicate.switch_simplify 0.83% : 
0.000001s : 8: predicate.tile_eliminate 0.93% : 0.000001s : 8: predicate.transpose_eliminate 1.61% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.61% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.63% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.21% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.34% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000295 7 38.78% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.22% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.065520 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.65% : 0.003045s : 1: add_attr 4.63% : 0.003036s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000065s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.82% : 0.000535s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.73% : 0.000479s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.72% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.27% : 0.000834s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000040s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.10% : 0.002029s : 1: opt_a 0.15% : 0.000101s : 1: opt_after_cconv 0.72% : 0.000470s : 1: opt_after_jit_grad 0.29% : 0.000191s : 1: opt_b 6.04% : 0.003957s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.04% : 0.000023s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.31% : 0.000203s : 1: renormalize.infer 0.28% : 0.000183s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.08% : 0.000055s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000075s : 1: symbol_engine_optimizer 65.53% : 0.042934s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 9.01% : 0.005902s : 1: type_inference 0.09% : 0.000058s : 1: validate TotalTime = 0.0761892, [24] [bootstrap]: 0.00049261 [type_inference]: 0.0118398 [event_method]: 4.469e-05 [auto_monad]: 0.00013012 [graph_reusing]: 8.85999e-06 [inline]: 1.69998e-06 [add_attr]: 0.00317451, [1] [add_attr_with_inline]: 0.0031656, [1] [Cycle 1]: 6.893e-05, [2] [tag_attr]: 3.19e-05 [meta_addattr_fg_expand]: 9.67001e-06 [parallel-infer-symbol]: 2.82002e-06 [pre_auto_parallel]: 4.671e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 2.06e-06 [optimize]: 0.0163946, [53] [py_interpret_to_execute]: 3.931e-05 [rewriter_before_opt_a]: 0.00014585 [opt_a]: 0.014259, [3] [Cycle 1]: 0.0108573, [45] [expand_dump_flag]: 4.58001e-06 [switch_simplify]: 7.271e-05 [loop_unroll]: 5.959e-05 [a_1]: 0.00144606 [with_stream_mark]: 2.364e-05 [recompute_prepare]: 2.297e-05 [updatestate_depend_eliminate]: 8.72e-06 [updatestate_assign_eliminate]: 6.98e-06 [updatestate_loads_eliminate]: 6.64001e-06 [parameter_eliminate]: 2.69001e-06 [a_2]: 0.0002466 [accelerated_algorithm]: 3.205e-05 [shard]: 2.01e-06 [meta_shard_fg_expand]: 3.6e-06 [shard_inline]: 1.593e-05 [merge_send_recv]: 1.706e-05 [auto_parallel]: 1.142e-05 [parallel]: 1.936e-05 [flash_sp]: 1.187e-05 [merge_comm]: 9.96e-06 [allreduce_fusion]: 8.79998e-06 [matmul_add_comm_reduction]: 2.663e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.786e-05 [virtual_dataset]: 1.55e-05 [get_grad_eliminate_]: 1.514e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.00999e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 1.744e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.972e-05 [merge_recompute_call_nodes]: 1.51998e-06 [before_grad]: 2.889e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71e-06 [meta_fg_expand]: 0.0014758 [flash_sp_send_recv_attached]: 3.79002e-06 [receive_attached]: 2.07999e-06 [after_resolve]: 
6.463e-05 [a_after_grad]: 8.853e-05 [renormalize]: 0.00612067 [add_forward_monad_depend]: 9.86e-06 [auto_monad_grad]: 6.63e-06 [auto_monad_eliminator]: 5.137e-05 [cse]: 0.00019017 [a_3]: 0.00033038 [Cycle 2]: 0.00270759, [45] [expand_dump_flag]: 1.94e-06 [switch_simplify]: 4.509e-05 [loop_unroll]: 4.187e-05 [a_1]: 0.00132747 [with_stream_mark]: 1.145e-05 [recompute_prepare]: 8.74998e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 8.698e-05 [accelerated_algorithm]: 1.038e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 6.47001e-06 [merge_send_recv]: 6.59999e-06 [auto_parallel]: 7.35e-06 [parallel]: 6.32001e-06 [flash_sp]: 3.34001e-06 [merge_comm]: 3.91001e-06 [allreduce_fusion]: 3.8e-06 [matmul_add_comm_reduction]: 6.83e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 7.98999e-06 [virtual_dataset]: 6.51999e-06 [get_grad_eliminate_]: 6.63998e-06 [virtual_output]: 6.16e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 7.56001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.265e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.128e-05 [set_forward_comm_id_for_comm_node_pass]: 4.13001e-06 [meta_fg_expand]: 5.402e-05 [flash_sp_send_recv_attached]: 1.29998e-06 [receive_attached]: 1.69e-06 [after_resolve]: 1.184e-05 [a_after_grad]: 1.014e-05 [renormalize]: 0.00063748 [add_forward_monad_depend]: 4.38999e-06 [auto_monad_grad]: 1.45001e-06 [auto_monad_eliminator]: 1.162e-05 [cse]: 2.041e-05 [a_3]: 4.799e-05 [Cycle 3]: 0.00067929, [45] [expand_dump_flag]: 1.15001e-06 [switch_simplify]: 8.07998e-06 [loop_unroll]: 6.73998e-06 [a_1]: 0.00014744 [with_stream_mark]: 7.82e-06 [recompute_prepare]: 6.64001e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 2.59001e-06 
[parameter_eliminate]: 8.89995e-07 [a_2]: 8.554e-05 [accelerated_algorithm]: 9.57999e-06 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 6.81001e-06 [merge_send_recv]: 5.35001e-06 [auto_parallel]: 6.13998e-06 [parallel]: 5.11002e-06 [flash_sp]: 1.05001e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.35003e-06 [matmul_add_comm_reduction]: 5.64e-06 [allreduce_slice_to_reducescatter]: 3.99974e-07 [virtual_shard_identity]: 7.41001e-06 [virtual_dataset]: 6.31e-06 [get_grad_eliminate_]: 6.26e-06 [virtual_output]: 6.26998e-06 [merge_forward]: 3.09001e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 6.93e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.294e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.055e-05 [set_forward_comm_id_for_comm_node_pass]: 4.01001e-06 [meta_fg_expand]: 2.38998e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 9.66e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 7.69002e-06 [cse]: 1.645e-05 [a_3]: 3.938e-05 [py_interpret_to_execute_after_opt_a]: 1.06e-05 [slice_cell_reuse_recomputed_activation]: 2.48998e-06 [rewriter_after_opt_a]: 4.143e-05 [convert_after_rewriter]: 7.73001e-06 [order_py_execute_after_rewriter]: 5.37999e-06 [mutable_eliminate]: 0.00048649 [opt_b]: 0.00021639, [1] [Cycle 1]: 0.0002095, [7] [b_1]: 0.0001331 [b_2]: 8.46997e-06 [updatestate_depend_eliminate]: 5.86e-06 [updatestate_assign_eliminate]: 2.95002e-06 [updatestate_loads_eliminate]: 2.73e-06 [renormalize]: 3.69997e-07 [cse]: 2.141e-05 [optimize_parallel_all_gather_comm]: 1.697e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.141e-05 [loop_unroll]: 0.00043723 [opt_after_cconv]: 0.00011233, [1] [Cycle 1]: 0.00010636, [7] [c_1]: 3.38e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 6.02999e-06 [updatestate_assign_eliminate]: 3.24001e-06 
[updatestate_loads_eliminate]: 2.91999e-06 [cse]: 2.205e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.692e-05 [tuple_transform]: 7.987e-05, [1] [Cycle 1]: 7.466e-05, [4] [d_1]: 4.55e-05 [none_parameter_eliminate]: 1.99e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 7.80998e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 5.045e-05 [cse_after_recomputation]: 2.499e-05, [1] [Cycle 1]: 2.056e-05, [1] [cse]: 1.52e-05 [environ_conv]: 8.11002e-06 [swap_dp_allreduce_reducescatter]: 5.98002e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.89e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.36002e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.19999e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 1.12999e-06 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.48002e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.37999e-06 [add_comm_op_reuse_tag]: 1.54998e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.32e-06 [overlap_opt_shard_grad_in_pipeline]: 2.02999e-06 [control_data_broadcast_order]: 1.42e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.29e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.44998e-06 [overlap_grad_flash_sp]: 2.011e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.50002e-06 [split_layernorm_comm]: 2.09999e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 8.6e-05, [1] [Cycle 1]: 8.147e-05, [6] [build]: 8.94998e-06 [elim_shapecalc]: 1.04e-05 [elim_not_effective]: 1.484e-05 [opt_reshape]: 7.41999e-06 [fold_const_symbol]: 1.139e-05 [renormalize]: 
1.90019e-07 [detach_backward]: 1.81e-06 [pipeline_parallel_scheduler]: 1.89e-06 [auto_monad_reorder]: 2.082e-05 [get_jit_bprop_graph]: 1.52001e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.0004694 [validate]: 4.333e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0432702 [execute]: 1.036e-05 Sums bootstrap : 0.000493s : 0.69% type_inference : 0.011840s : 16.51% event_method : 0.000045s : 0.06% auto_monad : 0.000130s : 0.18% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.07% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.05% optimize.rewriter_before_opt_a : 0.000146s : 0.20% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.18% optimize.opt_a.loop_unroll : 0.000108s : 0.15% optimize.opt_a.a_1 : 0.002921s : 4.07% optimize.opt_a.with_stream_mark : 0.000043s : 0.06% optimize.opt_a.recompute_prepare : 0.000038s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000013s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000012s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000419s : 0.58% optimize.opt_a.accelerated_algorithm : 0.000052s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000029s : 0.04% optimize.opt_a.merge_send_recv : 0.000029s : 0.04% optimize.opt_a.auto_parallel : 0.000025s : 0.03% optimize.opt_a.parallel : 0.000031s : 0.04% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000018s 
: 0.02% optimize.opt_a.allreduce_fusion : 0.000016s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000033s : 0.05% optimize.opt_a.virtual_dataset : 0.000028s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000028s : 0.04% optimize.opt_a.virtual_output : 0.000028s : 0.04% optimize.opt_a.merge_forward : 0.000016s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000032s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000055s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000051s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001532s : 2.14% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000085s : 0.12% optimize.opt_a.a_after_grad : 0.000108s : 0.15% optimize.opt_a.renormalize : 0.006758s : 9.42% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000071s : 0.10% optimize.opt_a.cse : 0.000227s : 0.32% optimize.opt_a.a_3 : 0.000418s : 0.58% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.06% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000486s : 0.68% optimize.opt_b.b_1 : 0.000133s : 0.19% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000437s : 0.61% optimize.opt_after_cconv.c_1 : 0.000034s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.07% optimize.cse_after_recomputation.cse : 0.000015s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000002s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000469s : 0.65% validate : 0.000043s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.043270s : 60.34% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000692 159 6.69% : 0.000046s : 7: substitution.arithmetic_simplify 0.36% : 0.000002s : 3: substitution.elim_not_effective 0.63% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 2: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 3: substitution.fold_const_symbol 0.87% : 0.000006s : 4: substitution.graph_param_transform 0.46% : 0.000003s : 2: substitution.incorporate_call 0.36% : 0.000002s : 2: substitution.incorporate_call_switch 57.60% : 0.000399s : 17: substitution.inline 2.41% : 0.000017s : 2: substitution.inline_without_move 1.45% : 0.000010s : 15: substitution.j_node_and_user_rematch 2.32% : 0.000016s : 3: substitution.less_batch_normalization 1.44% : 0.000010s : 7: substitution.minmaximum_grad 0.82% : 0.000006s : 5: substitution.partial_eliminate 1.79% : 0.000012s : 15: substitution.remove_not_recompute_node 3.88% : 0.000027s : 10: substitution.replace_applicator 1.34% : 0.000009s : 10: substitution.replace_old_param 0.42% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.25% : 0.000022s : 7: substitution.tuple_list_convert_item_index_to_positive 1.51% : 0.000010s : 7: substitution.tuple_list_get_item_const_eliminator 2.15% : 0.000015s : 7: substitution.tuple_list_get_item_depend_reorder 7.30% : 0.000051s : 18: substitution.tuple_list_get_item_eliminator 2.14% : 0.000015s : 7: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011766 2 87.52% : 0.010298s : 1: type_inference.infer 12.48% : 0.001468s : 1: type_inference.specialize ------[replace.] 0.000270 26 76.35% : 0.000206s : 17: replace.inline 23.65% : 0.000064s : 9: replace.tuple_list_get_item_eliminator ------[match.] 0.000414 26 94.13% : 0.000389s : 17: match.inline 5.87% : 0.000024s : 9: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000684 4180 1.12% : 0.000008s : 52: predicate.accumulaten_eliminater 0.25% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000003s : 21: predicate.addn_check_dump 1.15% : 0.000008s : 52: predicate.addn_zero_filter 1.08% : 0.000007s : 52: predicate.adjust_all_reduce_mul_add 2.01% : 0.000014s : 73: predicate.arithmetic_simplify 1.13% : 0.000008s : 52: predicate.cast_eliminate 1.14% : 0.000008s : 50: predicate.check_bprop_eliminate 0.46% : 0.000003s : 21: predicate.compare_switch_simplify 0.07% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000003s : 21: predicate.depend_value_elim 1.15% : 0.000008s : 52: predicate.dict_get_item_const_eliminator 1.20% : 0.000008s : 52: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 52: predicate.dict_set_item_eliminator 0.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.07% : 0.000001s : 4: predicate.elim_not_effective 0.12% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000008s : 56: predicate.environ_add_const_eliminate 1.20% : 0.000008s : 56: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 56: predicate.environ_get_depend_swap 1.69% : 0.000012s : 77: predicate.environ_get_eliminate 1.19% : 0.000008s : 56: predicate.environ_get_set_eliminate 1.82% : 0.000012s : 78: predicate.exchange_switch_depend_value 2.44% : 0.000017s : 78: predicate.float_depend_g_call 0.47% : 0.000003s : 21: predicate.float_environ_get_switch 0.57% : 0.000004s : 25: predicate.float_tuple_getitem_switch 0.06% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000004s : 21: predicate.get_grad_eliminate 0.07% : 0.000000s : 4: predicate.graph_param_transform 0.51% : 0.000003s : 21: predicate.incorporate_call 0.47% : 0.000003s : 21: predicate.incorporate_call_switch 5.93% : 0.000041s : 180: predicate.inline 1.47% : 0.000010s : 45: predicate.inline_without_move 0.29% : 0.000002s : 21: predicate.j_node_and_user_rematch 0.63% : 0.000004s : 21: predicate.less_batch_normalization 1.53% : 
0.000010s : 69: predicate.list_to_tuple_eliminator_ 2.60% : 0.000018s : 121: predicate.load_eliminater 0.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.54% : 0.000017s : 110: predicate.loop_unroll_before_grad 1.35% : 0.000009s : 60: predicate.make_slice_get_slice_eliminator 0.48% : 0.000003s : 21: predicate.merge_addn 1.11% : 0.000008s : 50: predicate.micro_step_allgather_replace 1.09% : 0.000007s : 50: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 52: predicate.minmaximum_grad 0.28% : 0.000002s : 4: predicate.mutable_eliminate 0.13% : 0.000001s : 4: predicate.opt_reshape 0.12% : 0.000001s : 4: predicate.parallel_virtual_node 2.09% : 0.000014s : 78: predicate.partial_defer_inline 1.69% : 0.000012s : 65: predicate.partial_eliminate 1.13% : 0.000008s : 52: predicate.print_const_string_wrapper 0.49% : 0.000003s : 21: predicate.reduce_all_const_elim 1.38% : 0.000009s : 52: predicate.reduce_eliminate 2.61% : 0.000018s : 121: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 21: predicate.remove_not_recompute_node 1.87% : 0.000013s : 111: predicate.replace_applicator 0.66% : 0.000005s : 45: predicate.replace_old_param 0.08% : 0.000001s : 4: predicate.reset_defer_inline 1.13% : 0.000008s : 52: predicate.reshape_eliminate 1.14% : 0.000008s : 50: predicate.row_tensor_add_zeros_like 0.12% : 0.000001s : 4: predicate.row_tensor_eliminate 1.26% : 0.000009s : 50: predicate.same_eliminate 0.35% : 0.000002s : 21: predicate.set_cell_output_no_recompute 0.56% : 0.000004s : 21: predicate.shard_identity_eliminate 0.27% : 0.000002s : 8: predicate.special_op_eliminate 0.62% : 0.000004s : 21: predicate.specialize_transform 1.25% : 0.000009s : 50: predicate.split_environ_get_set_with_tuple_value 1.25% : 0.000009s : 45: predicate.stack_unstack_eliminate 0.12% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.96% : 0.000013s : 78: predicate.switch_defer_inline 3.02% : 0.000021s : 128: predicate.switch_layer_defer_inline 5.28% : 0.000036s : 213: 
predicate.switch_simplify 1.13% : 0.000008s : 52: predicate.tile_eliminate 1.09% : 0.000007s : 52: predicate.transpose_eliminate 1.45% : 0.000010s : 60: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 60: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000009s : 60: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000019s : 90: predicate.tuple_list_get_item_eliminator 1.46% : 0.000010s : 60: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000014s : 81: predicate.tuple_list_set_item_eliminator 1.56% : 0.000011s : 69: predicate.tuple_to_list_eliminator_ 2.59% : 0.000018s : 121: predicate.updatestate_pure_node_eliminater 3.19% : 0.000022s : 142: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 4: predicate.value_based_eliminate 0.52% : 0.000004s : 21: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 21: predicate.virtual_output_eliminate 0.10% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001712 35 60.33% : 0.001033s : 14: func_graph_cloner_run.FuncGraphClonerGraph 39.67% : 0.000679s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.107104 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.97% : 0.003179s : 1: add_attr 2.96% : 0.003169s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.13% : 0.000137s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.49% : 0.000528s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.05% : 0.000052s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.42% : 0.000446s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000495s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 4.10% : 0.004396s : 117: opt.transform.opt_a 0.03% : 0.000032s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000113s : 28: opt.transform.opt_b 0.05% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000040s : 4: opt.transform.symbol_engine_opt 13.32% : 0.014262s : 1: opt_a 0.11% : 0.000116s : 1: opt_after_cconv 0.45% : 0.000479s : 1: opt_after_jit_grad 0.21% : 0.000220s : 1: opt_b 15.31% : 0.016399s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000051s : 1: pre_auto_parallel 0.04% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 4.81% : 0.005155s : 2: renormalize.infer 1.48% : 0.001590s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000045s : 1: rewriter_after_opt_a 0.14% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000089s : 1: symbol_engine_optimizer 40.42% : 0.043295s : 1: task_emit 0.08% : 0.000083s : 1: 
tuple_transform
11.07% : 0.011855s : 1: type_inference
0.06% : 0.000065s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x3-ge]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x3-ge],max_mem:12.0M
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x4-pynative]
tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x4-pynative],max_mem:12.0M
TotalTime = 0.0245087, [24] [bootstrap]: 0.00055856 [type_inference]: 0.00697513 [event_method]: 1.485e-05 [auto_monad]: 6.552e-05 [graph_reusing]: 5.82999e-06 [inline]: 2.24999e-06 [add_attr]: 0.00397437, [1] [add_attr_with_inline]: 0.00396256, [1] [Cycle 1]: 5.699e-05, [2] [tag_attr]: 1.684e-05 [meta_addattr_fg_expand]: 4.63001e-06 [parallel-infer-symbol]: 3.16999e-06 [pre_auto_parallel]: 2.914e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 1.02998e-06 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.95001e-06 [optimize]: 0.00452545, [53] [py_interpret_to_execute]: 2.345e-05 [rewriter_before_opt_a]: 6.909e-05 [opt_a]: 0.00243406, [2] [Cycle 1]: 0.00181003, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.333e-05 [loop_unroll]: 2.025e-05 [a_1]: 0.00046874 [with_stream_mark]: 1.594e-05 [recompute_prepare]: 8.77e-06 [updatestate_depend_eliminate]: 4.23001e-06 [updatestate_assign_eliminate]: 3.52997e-06 [updatestate_loads_eliminate]: 3.91999e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 8.086e-05 [accelerated_algorithm]: 6.59001e-06 [shard]: 2.23002e-06 [meta_shard_fg_expand]: 1.73002e-06 [shard_inline]: 6.14999e-06 [merge_send_recv]: 8.38999e-06 [auto_parallel]: 6.41e-06 [parallel]: 2.66e-05 [flash_sp]: 9.47999e-06 [merge_comm]: 4.04002e-06 [allreduce_fusion]: 3.75998e-06 [matmul_add_comm_reduction]: 9.82001e-06 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 7.93001e-06 [virtual_dataset]: 6.34001e-06 [get_grad_eliminate_]:
5.97999e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 9.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.248e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 1.011e-05 [set_forward_comm_id_for_comm_node_pass]: 3.78001e-06 [meta_fg_expand]: 2.79001e-06 [flash_sp_send_recv_attached]: 2.64001e-06 [receive_attached]: 2.71999e-06 [after_resolve]: 9.34e-06 [a_after_grad]: 9.14e-06 [renormalize]: 0.00063675 [add_forward_monad_depend]: 8.35001e-06 [auto_monad_grad]: 2.74001e-06 [auto_monad_eliminator]: 1.502e-05 [cse]: 3.2e-05 [a_3]: 4.396e-05 [Cycle 2]: 0.00061344, [45] [expand_dump_flag]: 1.32e-06 [switch_simplify]: 7.06999e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00011664 [with_stream_mark]: 1.043e-05 [recompute_prepare]: 6.10002e-06 [updatestate_depend_eliminate]: 2.95002e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 7.218e-05 [accelerated_algorithm]: 5.85002e-06 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.40001e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.74002e-06 [auto_parallel]: 5.62001e-06 [parallel]: 4.72e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.18998e-06 [allreduce_fusion]: 3.51999e-06 [matmul_add_comm_reduction]: 5.97999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.39999e-06 [virtual_dataset]: 5.59998e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.89001e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 6.14999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.034e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 9.04998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 1.12e-06 [receive_attached]: 1.19998e-06 [after_resolve]: 8.69e-06 [a_after_grad]: 7.8e-06 [renormalize]: 5.9983e-08 [add_forward_monad_depend]: 
1.02998e-06 [auto_monad_grad]: 1.31998e-06 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.468e-05 [a_3]: 3.298e-05 [py_interpret_to_execute_after_opt_a]: 8.48001e-06 [slice_cell_reuse_recomputed_activation]: 2.24999e-06 [rewriter_after_opt_a]: 3.685e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 5.63002e-06 [mutable_eliminate]: 0.00057288 [opt_b]: 0.00019253, [1] [Cycle 1]: 0.00018601, [7] [b_1]: 0.00011131 [b_2]: 7.10998e-06 [updatestate_depend_eliminate]: 5.64e-06 [updatestate_assign_eliminate]: 2.83998e-06 [updatestate_loads_eliminate]: 2.69999e-06 [renormalize]: 5.09986e-07 [cse]: 1.864e-05 [optimize_parallel_all_gather_comm]: 1.738e-05 [overlap_param_gather]: 2.31e-06 [cconv]: 2.481e-05 [loop_unroll]: 0.00045376 [opt_after_cconv]: 9.848e-05, [1] [Cycle 1]: 9.227e-05, [7] [c_1]: 2.655e-05 [parameter_eliminate]: 2.31998e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.56e-06 [cse]: 1.837e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.636e-05 [tuple_transform]: 7.061e-05, [1] [Cycle 1]: 6.608e-05, [4] [d_1]: 3.845e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.99972e-07 [switch_simplify]: 6.76e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 5.102e-05 [cse_after_recomputation]: 2.255e-05, [1] [Cycle 1]: 1.802e-05, [1] [cse]: 1.23e-05 [environ_conv]: 9.01998e-06 [swap_dp_allreduce_reducescatter]: 5.62999e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.97e-06 [label_fine_grained_interleaved_index]: 2.87002e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.51998e-06 [assign_add_opt]: 1.43002e-06 [ForceFp32Comm]: 1.06002e-06 [remove_cast_before_assign_add]: 1.19998e-06 [full_micro_interleaved_order_control]: 2.40002e-06 [reorder_send_recv_between_fp_bp]: 3.21001e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.02e-06 
[interleave_split_concat_branches]: 1.39e-06 [interleave_parallel_branches]: 1.19e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.329e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.98999e-06 [overlap_recompute_and_grad_model_parallel]: 4.92999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.47998e-06 [overlap_grad_flash_sp]: 2e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.46e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 7.476e-05, [1] [Cycle 1]: 6.984e-05, [6] [build]: 3.6e-06 [elim_shapecalc]: 9.32999e-06 [elim_not_effective]: 1.258e-05 [opt_reshape]: 6.43e-06 [fold_const_symbol]: 9.92001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.18998e-06 [pipeline_parallel_scheduler]: 2.02001e-06 [auto_monad_reorder]: 1.637e-05 [get_jit_bprop_graph]: 1.56998e-06 [rewriter_after_jit_bprop_graph]: 3.95e-06 [opt_after_jit_grad]: 0.00048699 [validate]: 3.97e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.00754272 [execute]: 1.011e-05 Sums bootstrap : 0.000559s : 2.87% type_inference : 0.006975s : 35.82% event_method : 0.000015s : 0.08% auto_monad : 0.000066s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.12% optimize.rewriter_before_opt_a : 0.000069s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s 
: 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.21% optimize.opt_a.loop_unroll : 0.000026s : 0.13% optimize.opt_a.a_1 : 0.000585s : 3.01% optimize.opt_a.with_stream_mark : 0.000026s : 0.14% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000153s : 0.79% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000031s : 0.16% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 
0.09% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000637s : 3.27% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.11% optimize.opt_a.cse : 0.000047s : 0.24% optimize.opt_a.a_3 : 0.000077s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000573s : 2.94% optimize.opt_b.b_1 : 0.000111s : 0.57% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.13% optimize.loop_unroll : 0.000454s : 2.33% optimize.opt_after_cconv.c_1 : 0.000027s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.03% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.26% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000009s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000487s : 2.50% validate : 0.000040s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.007543s : 38.74% execute : 0.000010s : 0.05% Time group info: ------[substitution.] 0.000188 26 18.59% : 0.000035s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000002s : 2: substitution.fold_const_symbol 2.80% : 0.000005s : 3: substitution.graph_param_transform 65.88% : 0.000124s : 3: substitution.inline 1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.57% : 0.000005s : 4: substitution.remove_not_recompute_node 1.71% : 0.000003s : 2: substitution.replace_old_param 4.79% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006918 2 89.61% : 0.006199s : 1: type_inference.infer 10.39% : 0.000719s : 1: type_inference.specialize ------[replace.] 0.000038 4 78.99% : 0.000030s : 3: replace.inline 21.01% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000130 4 93.80% : 0.000122s : 3: match.inline 6.20% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 15: predicate.arithmetic_simplify 1.08% : 0.000002s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.54% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.58% : 0.000001s : 6: predicate.depend_value_elim 0.92% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.93% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.16% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.32% : 0.000002s : 12: predicate.environ_get_depend_swap 1.70% : 0.000003s : 18: predicate.environ_get_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.39% : 0.000004s : 13: predicate.float_depend_g_call 0.54% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.64% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.22% : 0.000010s : 40: predicate.inline 0.88% : 0.000001s : 6: predicate.inline_without_move 0.37% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 6: predicate.less_batch_normalization 1.62% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 25: predicate.load_eliminater 0.97% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.18% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 6: predicate.merge_addn 0.57% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.20% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.46% : 0.000001s : 3: predicate.parallel_virtual_node 1.58% : 0.000003s : 13: predicate.partial_defer_inline 1.42% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 6: predicate.reduce_all_const_elim 1.34% : 0.000002s : 9: predicate.reduce_eliminate 2.39% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.35% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.97% : 0.000002s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 0.74% : 0.000001s : 6: predicate.special_op_eliminate 0.80% : 0.000001s : 6: predicate.specialize_transform 0.89% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 13: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.84% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.82% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000469 8 48.48% : 0.000227s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.52% : 0.000242s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.034746 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.45% : 0.003980s : 1: add_attr 11.42% : 0.003967s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000071s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.73% : 0.000600s : 1: bootstrap 0.08% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.04% : 0.000012s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.05% : 0.000017s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.33% : 0.000463s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.68% : 0.000583s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 2.76% : 0.000960s : 78: opt.transform.opt_a 0.07% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000089s : 28: opt.transform.opt_b 0.12% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000035s : 4: opt.transform.symbol_engine_opt 7.01% : 0.002437s : 1: opt_a 0.29% : 0.000102s : 1: opt_after_cconv 1.43% : 0.000497s : 1: opt_after_jit_grad 0.56% : 0.000196s : 1: opt_b 13.04% : 0.004530s : 1: optimize 0.06% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 0.97% : 0.000338s : 1: renormalize.infer 0.84% : 0.000291s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000041s : 1: rewriter_after_opt_a 0.21% : 0.000073s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000078s : 1: symbol_engine_optimizer 21.77% : 0.007564s : 1: task_emit 0.21% : 0.000074s : 1: 
tuple_transform 20.13% : 0.006993s : 1: type_inference 0.22% : 0.000075s : 1: validate TotalTime = 0.0209333, [24] [bootstrap]: 0.00044669 [type_inference]: 0.00616069 [event_method]: 1.289e-05 [auto_monad]: 6.09e-05 [graph_reusing]: 6.14001e-06 [inline]: 2.32999e-06 [add_attr]: 0.00307868, [1] [add_attr_with_inline]: 0.00307039, [1] [Cycle 1]: 5.101e-05, [2] [tag_attr]: 1.536e-05 [meta_addattr_fg_expand]: 4.35999e-06 [parallel-infer-symbol]: 3.36999e-06 [pre_auto_parallel]: 2.601e-05 [insert-virtual-dataset]: 2.49999e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.00406939, [53] [py_interpret_to_execute]: 2.082e-05 [rewriter_before_opt_a]: 5.013e-05 [opt_a]: 0.00211633, [2] [Cycle 1]: 0.00149487, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 2.943e-05 [loop_unroll]: 1.704e-05 [a_1]: 0.00035892 [with_stream_mark]: 1.536e-05 [recompute_prepare]: 8.27e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.79002e-06 [updatestate_loads_eliminate]: 3.18998e-06 [parameter_eliminate]: 2.04e-06 [a_2]: 8.055e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.03002e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 8.75999e-06 [auto_parallel]: 6.27001e-06 [parallel]: 2.065e-05 [flash_sp]: 7.77e-06 [merge_comm]: 4.38001e-06 [allreduce_fusion]: 3.62002e-06 [matmul_add_comm_reduction]: 9.85002e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.36999e-06 [virtual_dataset]: 6.15002e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.77001e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 9.51e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.169e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 1.094e-05 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 2.66e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 2.17001e-06 
[after_resolve]: 9.56e-06 [a_after_grad]: 8.59e-06 [renormalize]: 0.00046645 [add_forward_monad_depend]: 4.87e-06 [auto_monad_grad]: 2.61999e-06 [auto_monad_eliminator]: 1.387e-05 [cse]: 3.226e-05 [a_3]: 4.213e-05 [Cycle 2]: 0.00061164, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 7.14001e-06 [loop_unroll]: 6.22001e-06 [a_1]: 0.00011605 [with_stream_mark]: 9.78998e-06 [recompute_prepare]: 6.14999e-06 [updatestate_depend_eliminate]: 2.99001e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.70002e-06 [parameter_eliminate]: 8.49977e-07 [a_2]: 7.131e-05 [accelerated_algorithm]: 6.06e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.16002e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 4.72998e-06 [auto_parallel]: 5.61003e-06 [parallel]: 4.68001e-06 [flash_sp]: 3.34001e-06 [merge_comm]: 3.12002e-06 [allreduce_fusion]: 2.86e-06 [matmul_add_comm_reduction]: 5.34998e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.76999e-06 [virtual_dataset]: 5.72001e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.78998e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.48e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 9.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 8.70001e-06 [a_after_grad]: 7.55998e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.84999e-06 [cse]: 1.395e-05 [a_3]: 3.256e-05 [py_interpret_to_execute_after_opt_a]: 7.53999e-06 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 3.561e-05 [convert_after_rewriter]: 6.89999e-06 [order_py_execute_after_rewriter]: 5.44e-06 [mutable_eliminate]: 0.0004915 [opt_b]: 0.00020077, [1] [Cycle 1]: 0.0001942, [7] 
[b_1]: 0.00010916 [b_2]: 7.11999e-06 [updatestate_depend_eliminate]: 5.57999e-06 [updatestate_assign_eliminate]: 2.53003e-06 [updatestate_loads_eliminate]: 2.58e-06 [renormalize]: 4.00003e-07 [cse]: 1.952e-05 [optimize_parallel_all_gather_comm]: 1.76e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.402e-05 [loop_unroll]: 0.0004334 [opt_after_cconv]: 9.749e-05, [1] [Cycle 1]: 9.169e-05, [7] [c_1]: 2.583e-05 [parameter_eliminate]: 2.44999e-06 [updatestate_depend_eliminate]: 5.82999e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.69999e-06 [cse]: 1.704e-05 [renormalize]: 4.49974e-07 [remove_dup_value]: 1.523e-05 [tuple_transform]: 6.898e-05, [1] [Cycle 1]: 6.457e-05, [4] [d_1]: 3.787e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.59001e-06 [partial_unused_args_eliminate]: 1.98002e-06 [add_recomputation]: 4.698e-05 [cse_after_recomputation]: 2.19e-05, [1] [Cycle 1]: 1.754e-05, [1] [cse]: 1.183e-05 [environ_conv]: 5.54998e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.77e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.61e-06 [micro_interleaved_order_control]: 2.61999e-06 [assign_add_opt]: 1.76e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 3.56999e-06 [comm_op_add_attrs]: 1.40999e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.317e-05 [grouped_pairwise_exchange_alltoall]: 1.94e-06 [offloading_packed_experts]: 4.09002e-06 [overlap_recompute_and_grad_model_parallel]: 4.92e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.57001e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 4.38001e-06 [overlap_grad_flash_sp]: 1.938e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.58e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.50999e-06 [symbol_engine_optimizer]: 7.345e-05, [1] [Cycle 1]: 6.917e-05, [6] [build]: 2.58e-06 [elim_shapecalc]: 9.28002e-06 [elim_not_effective]: 1.235e-05 [opt_reshape]: 6.36e-06 [fold_const_symbol]: 9.62999e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.97999e-06 [auto_monad_reorder]: 1.586e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.80998e-06 [opt_after_jit_grad]: 0.00047138 [validate]: 3.635e-05 [backend_pass]: 1.12999e-06 [task_emit]: 0.00629185 [execute]: 7.99002e-06 Sums bootstrap : 0.000447s : 2.66% type_inference : 0.006161s : 36.65% event_method : 0.000013s : 0.08% auto_monad : 0.000061s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000050s : 0.30% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000475s : 2.83% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000152s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000025s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000467s : 2.78% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000046s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000491s : 2.92% optimize.opt_b.b_1 : 0.000109s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000433s : 2.58% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000004s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000002s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000471s : 2.80% validate : 0.000036s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006292s : 37.43% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000147 24 20.40% : 0.000030s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000002s : 2: substitution.fold_const_symbol 3.80% : 0.000006s : 3: substitution.graph_param_transform 65.49% : 0.000096s : 3: substitution.inline 2.45% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.05% : 0.000004s : 4: substitution.remove_not_recompute_node 2.26% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006107 2 92.25% : 0.005634s : 1: type_inference.infer 7.75% : 0.000473s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000094 3 100.00% : 0.000094s : 3: match.inline ------[predicate.] 
0.000148 815 0.91% : 0.000001s : 8: predicate.accumulaten_eliminater 0.89% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 8: predicate.addn_zero_filter 0.82% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.04% : 0.000003s : 14: predicate.arithmetic_simplify 0.91% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 3: predicate.elim_not_effective 0.46% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_depend_swap 1.82% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.89% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.78% : 0.000001s : 6: predicate.incorporate_call 0.64% : 0.000001s : 6: predicate.incorporate_call_switch 6.19% : 0.000009s : 37: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.02% : 0.000002s : 6: predicate.less_batch_normalization 1.53% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.34% : 0.000003s : 22: predicate.load_eliminater 0.95% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.08% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.28% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.67% : 0.000002s : 11: predicate.partial_defer_inline 1.31% : 0.000002s : 11: predicate.partial_eliminate 0.81% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 8: predicate.reduce_eliminate 2.23% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 6: predicate.remove_not_recompute_node 1.24% : 0.000002s : 14: predicate.replace_applicator 0.81% : 0.000001s : 6: predicate.replace_old_param 0.32% : 0.000000s : 3: predicate.reset_defer_inline 0.91% : 0.000001s : 8: predicate.reshape_eliminate 0.65% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.81% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 6: predicate.shard_identity_eliminate 0.78% : 0.000001s : 6: predicate.special_op_eliminate 0.90% : 0.000001s : 6: predicate.specialize_transform 1.04% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.93% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.96% : 0.000007s : 38: predicate.switch_simplify 0.88% : 
0.000001s : 8: predicate.tile_eliminate 0.90% : 0.000001s : 8: predicate.transpose_eliminate 1.63% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.66% : 0.000004s : 20: predicate.tuple_list_set_item_eliminator 1.73% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 3: predicate.value_based_eliminate 0.73% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000295 7 36.62% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.38% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029534 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.44% : 0.003083s : 1: add_attr 10.41% : 0.003074s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.02% : 0.000005s : 1: assign_add_opt 0.22% : 0.000066s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.64% : 0.000484s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.50% : 0.000442s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000500s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.85% : 0.000841s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.18% : 0.002119s : 1: opt_a 0.34% : 0.000101s : 1: opt_after_cconv 1.63% : 0.000481s : 1: opt_after_jit_grad 0.69% : 0.000204s : 1: opt_b 13.79% : 0.004074s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000006s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.84% : 0.000248s : 1: renormalize.infer 0.71% : 0.000211s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000040s : 1: rewriter_after_opt_a 0.18% : 0.000054s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000006s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000076s : 1: symbol_engine_optimizer 21.35% : 0.006307s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.93% : 0.006182s : 1: type_inference 0.23% : 0.000068s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x4-kbk],max_mem:12.0M TotalTime = 0.0659255, [24] [bootstrap]: 0.00059595 [type_inference]: 0.0067347 [event_method]: 1.396e-05 [auto_monad]: 5.984e-05 [graph_reusing]: 5.25001e-06 [inline]: 2.21e-06 [add_attr]: 0.00369568, [1] [add_attr_with_inline]: 0.00368442, [1] [Cycle 1]: 4.878e-05, [2] [tag_attr]: 1.523e-05 [meta_addattr_fg_expand]: 4.48999e-06 [parallel-infer-symbol]: 3.33e-06 [pre_auto_parallel]: 2.642e-05 [insert-virtual-dataset]: 2.75002e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.19999e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.00418773, [53] [py_interpret_to_execute]: 2.308e-05 [rewriter_before_opt_a]: 6.346e-05 [opt_a]: 0.00228545, [2] [Cycle 1]: 0.00167128, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 3.431e-05 [loop_unroll]: 2.057e-05 [a_1]: 0.00045683 [with_stream_mark]: 1.456e-05 [recompute_prepare]: 8.70999e-06 [updatestate_depend_eliminate]: 4.67e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 3.2e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 8.031e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.68002e-06 [shard_inline]: 6.14001e-06 [merge_send_recv]: 8.88002e-06 [auto_parallel]: 6.24001e-06 [parallel]: 2.554e-05 [flash_sp]: 8.22e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.76001e-06 [matmul_add_comm_reduction]: 9.30001e-06 [allreduce_slice_to_reducescatter]: 8.79983e-07 [virtual_shard_identity]: 7.48999e-06 [virtual_dataset]: 6.57002e-06 [get_grad_eliminate_]: 5.86e-06 [virtual_output]: 5.96e-06 [merge_forward]: 3.93999e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 1.004e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.159e-05 
[merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 1.099e-05 [set_forward_comm_id_for_comm_node_pass]: 3.84002e-06 [meta_fg_expand]: 2.59999e-06 [flash_sp_send_recv_attached]: 2.49001e-06 [receive_attached]: 2.06998e-06 [after_resolve]: 9.29998e-06 [a_after_grad]: 9.04998e-06 [renormalize]: 0.00048197 [add_forward_monad_depend]: 9.61e-06 [auto_monad_grad]: 1.91e-06 [auto_monad_eliminator]: 1.399e-05 [cse]: 7.214e-05 [a_3]: 4.261e-05 [Cycle 2]: 0.00060425, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 7.15e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00011545 [with_stream_mark]: 1.038e-05 [recompute_prepare]: 6.11e-06 [updatestate_depend_eliminate]: 3.13e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.78998e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 7.137e-05 [accelerated_algorithm]: 5.68002e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.86998e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.05e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 3.33998e-06 [allreduce_fusion]: 3.01999e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.33998e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.95002e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.71002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.60998e-06 [meta_fg_expand]: 1.82001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.3e-06 [a_after_grad]: 7.7e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.26998e-06 [cse]: 1.42e-05 [a_3]: 3.3e-05 [py_interpret_to_execute_after_opt_a]: 7.74002e-06 
[slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.356e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.12999e-06 [mutable_eliminate]: 0.00046443 [opt_b]: 0.00018698, [1] [Cycle 1]: 0.00018063, [7] [b_1]: 0.00010971 [b_2]: 6.89999e-06 [updatestate_depend_eliminate]: 5.49998e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.53998e-06 [renormalize]: 3.59985e-07 [cse]: 1.809e-05 [optimize_parallel_all_gather_comm]: 1.58e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.234e-05 [loop_unroll]: 0.00041815 [opt_after_cconv]: 9.684e-05, [1] [Cycle 1]: 9.117e-05, [7] [c_1]: 2.648e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.40002e-06 [cse]: 1.757e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.546e-05 [tuple_transform]: 6.815e-05, [1] [Cycle 1]: 6.362e-05, [4] [d_1]: 3.636e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.41e-06 [partial_unused_args_eliminate]: 1.86998e-06 [add_recomputation]: 4.862e-05 [cse_after_recomputation]: 2.189e-05, [1] [Cycle 1]: 1.728e-05, [1] [cse]: 1.171e-05 [environ_conv]: 7.24001e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.45002e-06 [label_micro_interleaved_index]: 4.27998e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.16998e-06 [micro_interleaved_order_control]: 2.86999e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.19e-06 [interleave_split_concat_branches]: 1.27999e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.67999e-06 [control_data_broadcast_order]: 1.283e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 4.08999e-06 [overlap_recompute_and_grad_model_parallel]: 4.79e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.93e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.764e-05 [begin_end_overlap_inline]: 7.60017e-07 [split_matmul_comm_elemetwise]: 2.41998e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.40999e-06 [symbol_engine_optimizer]: 7.26e-05, [1] [Cycle 1]: 6.824e-05, [6] [build]: 2.51e-06 [elim_shapecalc]: 9.39998e-06 [elim_not_effective]: 1.231e-05 [opt_reshape]: 6.59001e-06 [fold_const_symbol]: 9.35001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.614e-05 [get_jit_bprop_graph]: 1.12999e-06 [rewriter_after_jit_bprop_graph]: 3.61001e-06 [opt_after_jit_grad]: 0.00044921 [validate]: 3.443e-05 [backend_pass]: 1.32e-06 [task_emit]: 0.049854 [execute]: 9.74999e-06 Sums bootstrap : 0.000596s : 0.97% type_inference : 0.006735s : 11.00% event_method : 0.000014s : 0.02% auto_monad : 0.000060s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.04% optimize.rewriter_before_opt_a : 0.000063s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000572s : 0.94% 
optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000152s : 0.25% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.05% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000482s : 0.79% optimize.opt_a.add_forward_monad_depend : 
0.000011s : 0.02% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000086s : 0.14% optimize.opt_a.a_3 : 0.000076s : 0.12% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000464s : 0.76% optimize.opt_b.b_1 : 0.000110s : 0.18% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000418s : 0.68% optimize.opt_after_cconv.c_1 : 0.000026s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000036s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.08% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv 
: 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000449s : 0.73% validate : 0.000034s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.049854s : 81.46% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000175 26 18.45% : 0.000032s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 2.92% : 0.000005s : 3: substitution.graph_param_transform 65.06% : 0.000114s : 3: substitution.inline 2.09% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 1.73% : 0.000003s : 2: substitution.replace_old_param 5.37% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006685 2 90.36% : 0.006040s : 1: type_inference.infer 9.64% : 0.000644s : 1: type_inference.specialize ------[replace.] 0.000038 4 78.05% : 0.000029s : 3: replace.inline 21.95% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 4 92.87% : 0.000112s : 3: match.inline 7.13% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 0.98% : 0.000002s : 9: predicate.accumulaten_eliminater 0.80% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000001s : 9: predicate.addn_zero_filter 0.85% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 15: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.56% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.95% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.99% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_depend_swap 1.87% : 0.000003s : 18: predicate.environ_get_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.92% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.63% : 0.000001s : 6: predicate.incorporate_call 0.56% : 0.000001s : 6: predicate.incorporate_call_switch 6.42% : 0.000010s : 40: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 6: predicate.less_batch_normalization 1.61% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 25: predicate.load_eliminater 0.95% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 1.02% : 0.000002s : 3: predicate.mutable_eliminate 0.59% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.68% : 0.000003s : 13: predicate.partial_defer_inline 1.47% : 0.000002s : 13: predicate.partial_eliminate 0.95% : 0.000002s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 9: predicate.reduce_eliminate 2.44% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.26% : 0.000002s : 16: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.96% : 0.000002s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 3: predicate.row_tensor_eliminate 0.90% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.71% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.87% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.43% : 0.000002s : 13: predicate.switch_defer_inline 2.06% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 43: predicate.switch_simplify 1.00% : 
0.000002s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000417 8 46.34% : 0.000193s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.66% : 0.000224s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.075377 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.91% : 0.003700s : 1: add_attr 4.89% : 0.003688s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000065s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.84% : 0.000636s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.03% : 0.000020s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.57% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.63% : 0.000474s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.25% : 0.000945s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000088s : 28: opt.transform.opt_b 0.05% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.04% : 0.002288s : 1: opt_a 0.13% : 0.000100s : 1: opt_after_cconv 0.61% : 0.000459s : 1: opt_after_jit_grad 0.25% : 0.000190s : 1: opt_b 5.56% : 0.004192s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000031s : 1: pre_auto_parallel 0.04% : 0.000027s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.33% : 0.000250s : 1: renormalize.infer 0.30% : 0.000225s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000038s : 1: rewriter_after_opt_a 0.09% : 0.000068s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000075s : 1: symbol_engine_optimizer 66.17% : 0.049878s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 8.96% : 0.006750s : 1: type_inference 0.08% : 0.000057s : 1: validate TotalTime = 0.0561483, [24] [bootstrap]: 0.00048103 [type_inference]: 0.00602867 [event_method]: 1.261e-05 [auto_monad]: 5.888e-05 [graph_reusing]: 5.44e-06 [inline]: 1.94999e-06 [add_attr]: 0.00303811, [1] [add_attr_with_inline]: 0.00303035, [1] [Cycle 1]: 5.257e-05, [2] [tag_attr]: 1.362e-05 [meta_addattr_fg_expand]: 3.96001e-06 [parallel-infer-symbol]: 2.94001e-06 [pre_auto_parallel]: 2.373e-05 [insert-virtual-dataset]: 2.76e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00390336, [53] [py_interpret_to_execute]: 2.025e-05 [rewriter_before_opt_a]: 5.333e-05 [opt_a]: 0.00202931, [2] [Cycle 1]: 0.001417, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 2.801e-05 [loop_unroll]: 1.711e-05 [a_1]: 0.00035607 [with_stream_mark]: 1.436e-05 [recompute_prepare]: 7.90998e-06 [updatestate_depend_eliminate]: 3.96001e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.81998e-06 [a_2]: 9.185e-05 [accelerated_algorithm]: 6.79001e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.66998e-06 [shard_inline]: 6.43e-06 [merge_send_recv]: 8.84998e-06 [auto_parallel]: 6.69001e-06 [parallel]: 1.802e-05 [flash_sp]: 8.07e-06 [merge_comm]: 3.96001e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 9.72001e-06 [allreduce_slice_to_reducescatter]: 1.02e-06 [virtual_shard_identity]: 7.50998e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.75001e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 9.11998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.194e-05 [merge_recompute_call_nodes]: 1.61998e-06 [before_grad]: 1.042e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86999e-06 [meta_fg_expand]: 2.64001e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 
2.32999e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.59e-06 [renormalize]: 0.00038775 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 2.00002e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 3.101e-05 [a_3]: 4.224e-05 [Cycle 2]: 0.00060298, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.84e-06 [a_1]: 0.00011442 [with_stream_mark]: 9.62999e-06 [recompute_prepare]: 6.31998e-06 [updatestate_depend_eliminate]: 2.94999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.63998e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 7.246e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.19003e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 4.44998e-06 [auto_parallel]: 5.62999e-06 [parallel]: 4.13001e-06 [flash_sp]: 3.44001e-06 [merge_comm]: 3.37002e-06 [allreduce_fusion]: 3.01999e-06 [matmul_add_comm_reduction]: 5.37001e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 5.93002e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.74999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.013e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 8.77999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61999e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.39996e-07 [after_resolve]: 8.50999e-06 [a_after_grad]: 7.7e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.40001e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.34e-05 [a_3]: 3.247e-05 [py_interpret_to_execute_after_opt_a]: 7.50998e-06 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 3.182e-05 [convert_after_rewriter]: 6.74999e-06 [order_py_execute_after_rewriter]: 5.16998e-06 [mutable_eliminate]: 0.00045991 [opt_b]: 0.00018536, [1] 
[Cycle 1]: 0.00017916, [7] [b_1]: 0.00010929 [b_2]: 6.76e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 4.10015e-07 [cse]: 1.784e-05 [optimize_parallel_all_gather_comm]: 1.607e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.135e-05 [loop_unroll]: 0.00042596 [opt_after_cconv]: 9.689e-05, [1] [Cycle 1]: 9.115e-05, [7] [c_1]: 2.579e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.827e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.535e-05 [tuple_transform]: 6.914e-05, [1] [Cycle 1]: 6.434e-05, [4] [d_1]: 3.695e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 6.73e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.34e-05 [cse_after_recomputation]: 2.077e-05, [1] [Cycle 1]: 1.619e-05, [1] [cse]: 1.098e-05 [environ_conv]: 5.63002e-06 [swap_dp_allreduce_reducescatter]: 4.87e-06 [bias_add_comm_swap]: 2.66e-06 [label_micro_interleaved_index]: 4.39002e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.65001e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.06997e-06 [add_comm_op_reuse_tag]: 1.37e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 4.38999e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.59e-06 [overlap_recompute_comm]: 2.28998e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 1.782e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 2.07001e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 7.366e-05, [1] [Cycle 1]: 6.936e-05, [6] [build]: 2.17999e-06 [elim_shapecalc]: 8.85999e-06 [elim_not_effective]: 1.364e-05 [opt_reshape]: 6.66999e-06 [fold_const_symbol]: 9.44998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.82001e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.581e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.63999e-06 [opt_after_jit_grad]: 0.00045478 [validate]: 3.45e-05 [backend_pass]: 1.20999e-06 [task_emit]: 0.0418517 [execute]: 8.99003e-06 Sums bootstrap : 0.000481s : 0.92% type_inference : 0.006029s : 11.57% event_method : 0.000013s : 0.02% auto_monad : 0.000059s : 0.11% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000053s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000035s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000470s : 0.90% optimize.opt_a.with_stream_mark : 0.000024s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000164s : 0.32% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000388s : 0.74% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000044s : 0.09% optimize.opt_a.a_3 : 0.000075s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.88% optimize.opt_b.b_1 : 0.000109s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.04% optimize.loop_unroll : 0.000426s : 0.82% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000455s : 0.87% validate : 0.000035s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.041852s : 80.32% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000144 24 20.13% : 0.000029s : 4: substitution.arithmetic_simplify 1.68% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 3.65% : 0.000005s : 3: substitution.graph_param_transform 66.28% : 0.000095s : 3: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.03% : 0.000004s : 4: substitution.remove_not_recompute_node 2.00% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005983 2 91.62% : 0.005481s : 1: type_inference.infer 8.38% : 0.000501s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000093 3 100.00% : 0.000093s : 3: match.inline ------[predicate.] 
0.000155 815 0.83% : 0.000001s : 8: predicate.accumulaten_eliminater 0.84% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.82% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.24% : 0.000003s : 14: predicate.arithmetic_simplify 0.83% : 0.000001s : 8: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.86% : 0.000001s : 6: predicate.depend_value_elim 0.77% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 8: predicate.dict_set_item_eliminator 0.97% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.44% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.01% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.00% : 0.000002s : 11: predicate.environ_get_depend_swap 1.88% : 0.000003s : 17: predicate.environ_get_eliminate 1.03% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.09% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.06% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 6.71% : 0.000010s : 6: predicate.incorporate_call_switch 6.00% : 0.000009s : 37: predicate.inline 0.96% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 6: predicate.less_batch_normalization 1.47% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 22: predicate.load_eliminater 0.98% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.87% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.57% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 8: predicate.minmaximum_grad 1.08% : 0.000002s : 3: predicate.mutable_eliminate 0.44% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.38% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 11: predicate.partial_eliminate 0.79% : 0.000001s : 8: predicate.print_const_string_wrapper 0.75% : 0.000001s : 6: predicate.reduce_all_const_elim 1.08% : 0.000002s : 8: predicate.reduce_eliminate 2.07% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 6: predicate.remove_not_recompute_node 1.20% : 0.000002s : 14: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.85% : 0.000001s : 8: predicate.reshape_eliminate 0.59% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 0.71% : 0.000001s : 6: predicate.special_op_eliminate 0.81% : 0.000001s : 6: predicate.specialize_transform 0.92% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.21% : 0.000002s : 11: predicate.switch_defer_inline 1.78% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.69% : 0.000007s : 38: predicate.switch_simplify 0.83% : 
0.000001s : 8: predicate.tile_eliminate 0.79% : 0.000001s : 8: predicate.transpose_eliminate 1.54% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000004s : 20: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.90% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000313 7 42.07% : 0.000132s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.93% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064472 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.72% : 0.003043s : 1: add_attr 4.71% : 0.003034s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000064s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.80% : 0.000516s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.67% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.31% : 0.000844s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000035s : 4: opt.transform.symbol_engine_opt 3.15% : 0.002032s : 1: opt_a 0.16% : 0.000100s : 1: opt_after_cconv 0.72% : 0.000465s : 1: opt_after_jit_grad 0.29% : 0.000189s : 1: opt_b 6.06% : 0.003907s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.04% : 0.000024s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.31% : 0.000200s : 1: renormalize.infer 0.28% : 0.000181s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000035s : 1: rewriter_after_opt_a 0.09% : 0.000057s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000077s : 1: symbol_engine_optimizer 64.95% : 0.041871s : 1: task_emit 0.11% : 0.000072s : 1: 
tuple_transform 9.37% : 0.006044s : 1: type_inference 0.09% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x4-ge],max_mem:12.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x5-pynative],max_mem:12.0M TotalTime = 0.0220059, [24] [bootstrap]: 0.00053735 [type_inference]: 0.00655666 [event_method]: 1.361e-05 [auto_monad]: 6.689e-05 [graph_reusing]: 5.30001e-06 [inline]: 1.77001e-06 [add_attr]: 0.00362766, [1] [add_attr_with_inline]: 0.00361773, [1] [Cycle 1]: 4.591e-05, [2] [tag_attr]: 1.501e-05 [meta_addattr_fg_expand]: 4.48999e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.607e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.32001e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.00414583, [53] [py_interpret_to_execute]: 2.115e-05 [rewriter_before_opt_a]: 6.227e-05 [opt_a]: 0.00223255, [2] [Cycle 1]: 0.00161706, [45] [expand_dump_flag]: 2.43e-06 [switch_simplify]: 3.311e-05 [loop_unroll]: 2.05e-05 [a_1]: 0.00047899 [with_stream_mark]: 1.407e-05 [recompute_prepare]: 8.05e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 7.977e-05 [accelerated_algorithm]: 6.61e-06 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 1.88002e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 8.32e-06 [auto_parallel]: 6.17001e-06 [parallel]: 2.579e-05 [flash_sp]: 7.53999e-06 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.71999e-06 [matmul_add_comm_reduction]: 9.79e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.71999e-06 [virtual_dataset]: 6.16e-06 [get_grad_eliminate_]: 5.96998e-06 
[virtual_output]: 5.92001e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 9.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.126e-05 [merge_recompute_call_nodes]: 1.56998e-06 [before_grad]: 1.009e-05 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 2.71999e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 1.96e-06 [after_resolve]: 9.70002e-06 [a_after_grad]: 8.69e-06 [renormalize]: 0.00045813 [add_forward_monad_depend]: 8.65001e-06 [auto_monad_grad]: 1.90001e-06 [auto_monad_eliminator]: 1.359e-05 [cse]: 2.959e-05 [a_3]: 4.272e-05 [Cycle 2]: 0.00060636, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 7.01999e-06 [loop_unroll]: 5.71e-06 [a_1]: 0.00011543 [with_stream_mark]: 1.001e-05 [recompute_prepare]: 5.88002e-06 [updatestate_depend_eliminate]: 3.02002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 7.149e-05 [accelerated_algorithm]: 5.74e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 4.91002e-06 [auto_parallel]: 5.48002e-06 [parallel]: 4.27e-06 [flash_sp]: 3.28e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 2.93e-06 [matmul_add_comm_reduction]: 5.35999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.53003e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 2.65997e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.023e-05 [merge_recompute_call_nodes]: 7.79983e-07 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 1.82001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 8.10999e-06 [a_after_grad]: 7.68001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 
1.14e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.667e-05 [a_3]: 3.401e-05 [py_interpret_to_execute_after_opt_a]: 7.71001e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.407e-05 [convert_after_rewriter]: 6.73e-06 [order_py_execute_after_rewriter]: 5.26998e-06 [mutable_eliminate]: 0.00046248 [opt_b]: 0.00019072, [1] [Cycle 1]: 0.0001847, [7] [b_1]: 0.00011147 [b_2]: 7.7e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 3.89991e-07 [cse]: 1.869e-05 [optimize_parallel_all_gather_comm]: 1.663e-05 [overlap_param_gather]: 1.86003e-06 [cconv]: 2.291e-05 [loop_unroll]: 0.00042459 [opt_after_cconv]: 9.723e-05, [1] [Cycle 1]: 9.155e-05, [7] [c_1]: 2.622e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [cse]: 1.821e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.624e-05 [tuple_transform]: 7.093e-05, [1] [Cycle 1]: 6.598e-05, [4] [d_1]: 3.762e-05 [none_parameter_eliminate]: 1.99e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.05998e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.9e-05 [cse_after_recomputation]: 2.216e-05, [1] [Cycle 1]: 1.741e-05, [1] [cse]: 1.184e-05 [environ_conv]: 7.73001e-06 [swap_dp_allreduce_reducescatter]: 5.73002e-06 [bias_add_comm_swap]: 2.53998e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.30002e-06 [assign_add_opt]: 1.54998e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.26002e-06 [add_comm_op_reuse_tag]: 9.99979e-07 
[interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.229e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 4.1e-06 [overlap_recompute_and_grad_model_parallel]: 4.94998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.754e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.99e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 7.133e-05, [1] [Cycle 1]: 6.699e-05, [6] [build]: 2.52001e-06 [elim_shapecalc]: 8.80001e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 6.43e-06 [fold_const_symbol]: 9.37001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 1.648e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.50998e-06 [opt_after_jit_grad]: 0.00048275 [validate]: 3.369e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00625768 [execute]: 8.16002e-06 Sums bootstrap : 0.000537s : 3.09% type_inference : 0.006557s : 37.75% event_method : 0.000014s : 0.08% auto_monad : 0.000067s : 0.39% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000062s : 0.36% optimize.opt_a.expand_dump_flag 
: 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000594s : 3.42% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000018s : 0.10% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000458s : 2.64% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000046s : 0.27% optimize.opt_a.a_3 : 0.000077s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000462s : 2.66% optimize.opt_b.b_1 : 0.000111s : 0.64% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000425s : 2.44% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000008s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000483s : 2.78% validate : 0.000034s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006258s : 36.03% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000169 26 19.17% : 0.000032s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.15% : 0.000005s : 3: substitution.graph_param_transform 64.08% : 0.000108s : 3: substitution.inline 1.96% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.57% : 0.000004s : 4: substitution.remove_not_recompute_node 1.87% : 0.000003s : 2: substitution.replace_old_param 5.17% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006507 2 90.55% : 0.005892s : 1: type_inference.infer 9.45% : 0.000615s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.31% : 0.000030s : 3: replace.inline 20.69% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 4 93.02% : 0.000106s : 3: match.inline 6.98% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.93% : 0.000001s : 9: predicate.accumulaten_eliminater 0.85% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.85% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 15: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 6: predicate.check_bprop_eliminate 0.59% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.78% : 0.000003s : 18: predicate.environ_get_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.83% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.26% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.49% : 0.000010s : 40: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.45% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 6: predicate.less_batch_normalization 1.63% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 25: predicate.load_eliminater 0.95% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 6: predicate.merge_addn 0.56% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 1.12% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.61% : 0.000003s : 13: predicate.partial_defer_inline 1.43% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 6: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.44% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 6: predicate.remove_not_recompute_node 1.26% : 0.000002s : 16: predicate.replace_applicator 0.62% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 3: predicate.row_tensor_eliminate 0.87% : 0.000001s : 6: predicate.same_eliminate 0.44% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 6: predicate.shard_identity_eliminate 0.90% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 43: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.58% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.74% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000398 8 50.97% : 0.000203s : 3: func_graph_cloner_run.FuncGraphClonerGraph 49.03% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031351 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.59% : 0.003632s : 1: add_attr 11.55% : 0.003621s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000072s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.85% : 0.000580s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000011s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000472s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.08% : 0.000966s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.13% : 0.002236s : 1: opt_a 0.32% : 0.000101s : 1: opt_after_cconv 1.57% : 0.000492s : 1: opt_after_jit_grad 0.62% : 0.000194s : 1: opt_b 13.24% : 0.004150s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 0.79% : 0.000248s : 1: renormalize.infer 0.65% : 0.000203s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000038s : 1: rewriter_after_opt_a 0.21% : 0.000066s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000074s : 1: symbol_engine_optimizer 19.99% : 0.006268s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 20.96% : 0.006571s : 1: type_inference 0.19% : 0.000060s : 1: validate TotalTime = 0.0201579, [24] [bootstrap]: 0.00045473 [type_inference]: 0.00593689 [event_method]: 1.307e-05 [auto_monad]: 6.026e-05 [graph_reusing]: 4.97999e-06 [inline]: 1.89e-06 [add_attr]: 0.00304069, [1] [add_attr_with_inline]: 0.00303271, [1] [Cycle 1]: 5.167e-05, [2] [tag_attr]: 1.392e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.426e-05 [insert-virtual-dataset]: 2.81999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.96998e-06 [optimize]: 0.00389321, [53] [py_interpret_to_execute]: 1.904e-05 [rewriter_before_opt_a]: 5.105e-05 [opt_a]: 0.00203097, [2] [Cycle 1]: 0.00140171, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 2.923e-05 [loop_unroll]: 1.654e-05 [a_1]: 0.00035301 [with_stream_mark]: 1.417e-05 [recompute_prepare]: 7.62002e-06 [updatestate_depend_eliminate]: 4.19002e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 3.66001e-06 [parameter_eliminate]: 2.33002e-06 [a_2]: 8.028e-05 [accelerated_algorithm]: 6.76e-06 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 1.93002e-06 [shard_inline]: 6.46e-06 [merge_send_recv]: 8.68001e-06 [auto_parallel]: 5.86e-06 [parallel]: 1.844e-05 [flash_sp]: 7.89002e-06 [merge_comm]: 3.73999e-06 [allreduce_fusion]: 3.87998e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 7.12002e-06 [virtual_dataset]: 5.71998e-06 [get_grad_eliminate_]: 5.64998e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.015e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.151e-05 [merge_recompute_call_nodes]: 1.81998e-06 [before_grad]: 1.019e-05 [set_forward_comm_id_for_comm_node_pass]: 3.69002e-06 [meta_fg_expand]: 2.78e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.13002e-06 
[after_resolve]: 9.59e-06 [a_after_grad]: 8.72e-06 [renormalize]: 0.00039505 [add_forward_monad_depend]: 4.38001e-06 [auto_monad_grad]: 2.05002e-06 [auto_monad_eliminator]: 1.384e-05 [cse]: 2.906e-05 [a_3]: 4.197e-05 [Cycle 2]: 0.00061989, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 7.02002e-06 [loop_unroll]: 5.81e-06 [a_1]: 0.000114 [with_stream_mark]: 9.64999e-06 [recompute_prepare]: 6.23e-06 [updatestate_depend_eliminate]: 3.11999e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 7.123e-05 [accelerated_algorithm]: 5.89999e-06 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 4.40999e-06 [auto_parallel]: 5.81003e-06 [parallel]: 1.429e-05 [flash_sp]: 3.78001e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 2.93e-06 [matmul_add_comm_reduction]: 5.51e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.88e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.22999e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 6.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.053e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.67002e-06 [meta_fg_expand]: 1.90001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 8.70001e-07 [after_resolve]: 8.70999e-06 [a_after_grad]: 8.3e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.40025e-07 [auto_monad_eliminator]: 6.53e-06 [cse]: 1.442e-05 [a_3]: 3.302e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.222e-05 [convert_after_rewriter]: 6.68e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00046105 [opt_b]: 0.00018505, [1] [Cycle 1]: 0.00017877, [7] [b_1]: 
0.00010883 [b_2]: 7.66999e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.10015e-07 [cse]: 1.724e-05 [optimize_parallel_all_gather_comm]: 1.615e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.288e-05 [loop_unroll]: 0.0004216 [opt_after_cconv]: 9.594e-05, [1] [Cycle 1]: 9.013e-05, [7] [c_1]: 2.582e-05 [parameter_eliminate]: 2.28998e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.35002e-06 [cse]: 1.75e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.496e-05 [tuple_transform]: 6.883e-05, [1] [Cycle 1]: 6.375e-05, [4] [d_1]: 3.704e-05 [none_parameter_eliminate]: 1.48002e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.22001e-06 [partial_unused_args_eliminate]: 2.09999e-06 [add_recomputation]: 4.323e-05 [cse_after_recomputation]: 2.082e-05, [1] [Cycle 1]: 1.625e-05, [1] [cse]: 1.095e-05 [environ_conv]: 5.35999e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.90002e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.14999e-06 [micro_interleaved_order_control]: 2.30002e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.15e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.99002e-06 [overlap_recompute_and_grad_model_parallel]: 4.52998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.57001e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 1.778e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.26998e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 6.977e-05, [1] [Cycle 1]: 6.551e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.45999e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 9.22001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.617e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.46999e-06 [opt_after_jit_grad]: 0.00045619 [validate]: 3.289e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00599872 [execute]: 7.01999e-06 Sums bootstrap : 0.000455s : 2.82% type_inference : 0.005937s : 36.82% event_method : 0.000013s : 0.08% auto_monad : 0.000060s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000051s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000022s : 0.14% optimize.opt_a.a_1 : 0.000467s : 2.90% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000152s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000033s : 0.20% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000395s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000461s : 2.86% optimize.opt_b.b_1 : 0.000109s : 0.67% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000422s : 2.61% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000456s : 2.83% validate : 0.000033s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005999s : 37.20% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000142 24 19.88% : 0.000028s : 4: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.13% : 0.000006s : 3: substitution.graph_param_transform 65.93% : 0.000093s : 3: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.20% : 0.000005s : 4: substitution.remove_not_recompute_node 2.29% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005893 2 92.12% : 0.005429s : 1: type_inference.infer 7.88% : 0.000464s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000091 3 100.00% : 0.000091s : 3: match.inline ------[predicate.] 
0.000146 815 0.84% : 0.000001s : 8: predicate.accumulaten_eliminater 0.87% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.36% : 0.000003s : 14: predicate.arithmetic_simplify 0.86% : 0.000001s : 8: predicate.cast_eliminate 0.68% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 1.01% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_depend_swap 1.78% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.86% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.69% : 0.000001s : 6: predicate.incorporate_call 0.62% : 0.000001s : 6: predicate.incorporate_call_switch 6.47% : 0.000009s : 37: predicate.inline 1.07% : 0.000002s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 6: predicate.less_batch_normalization 1.54% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 22: predicate.load_eliminater 1.18% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.66% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.11% : 0.000002s : 3: predicate.mutable_eliminate 0.43% : 0.000001s : 3: predicate.opt_reshape 0.38% : 0.000001s : 3: predicate.parallel_virtual_node 1.48% : 0.000002s : 11: predicate.partial_defer_inline 1.33% : 0.000002s : 11: predicate.partial_eliminate 0.81% : 0.000001s : 8: predicate.print_const_string_wrapper 0.75% : 0.000001s : 6: predicate.reduce_all_const_elim 1.20% : 0.000002s : 8: predicate.reduce_eliminate 2.17% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.58% : 0.000001s : 6: predicate.remove_not_recompute_node 1.19% : 0.000002s : 14: predicate.replace_applicator 0.77% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.94% : 0.000001s : 8: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 3: predicate.row_tensor_eliminate 0.91% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 6: predicate.shard_identity_eliminate 0.73% : 0.000001s : 6: predicate.special_op_eliminate 0.91% : 0.000001s : 6: predicate.specialize_transform 0.96% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 11: predicate.switch_defer_inline 1.91% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.94% : 0.000007s : 38: predicate.switch_simplify 0.86% : 
0.000001s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.57% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.82% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.36% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000283 7 38.64% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.36% : 0.000173s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028471 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.70% : 0.003045s : 1: add_attr 10.66% : 0.003036s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000066s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.72% : 0.000490s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.51% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.65% : 0.000470s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000832s : 78: opt.transform.opt_a 0.09% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000020s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.14% : 0.002034s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.64% : 0.000466s : 1: opt_after_jit_grad 0.66% : 0.000188s : 1: opt_b 13.69% : 0.003897s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000019s : 1: remove_dup_value 0.73% : 0.000209s : 1: renormalize.infer 0.63% : 0.000180s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000073s : 1: symbol_engine_optimizer 21.10% : 0.006009s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 20.90% : 0.005951s : 1: type_inference 0.21% : 0.000059s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x5-kbk],max_mem:12.0M TotalTime = 0.106834, [24] [bootstrap]: 0.00060768 [type_inference]: 0.00644755 [event_method]: 1.369e-05 [auto_monad]: 5.796e-05 [graph_reusing]: 5.32999e-06 [inline]: 1.86e-06 [add_attr]: 0.00358336, [1] [add_attr_with_inline]: 0.00357286, [1] [Cycle 1]: 4.814e-05, [2] [tag_attr]: 1.498e-05 [meta_addattr_fg_expand]: 4.85999e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 2.627e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.49001e-06 [pipeline_split]: 1.63002e-06 [optimize]: 0.00411325, [53] [py_interpret_to_execute]: 2.129e-05 [rewriter_before_opt_a]: 6.298e-05 [opt_a]: 0.00219512, [2] [Cycle 1]: 0.00158273, [45] [expand_dump_flag]: 3.09001e-06 [switch_simplify]: 3.279e-05 [loop_unroll]: 2.061e-05 [a_1]: 0.0004408 [with_stream_mark]: 1.41e-05 [recompute_prepare]: 7.53999e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.38999e-06 [updatestate_loads_eliminate]: 3.09001e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 7.954e-05 [accelerated_algorithm]: 6.55997e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 6.22001e-06 [merge_send_recv]: 8.70999e-06 [auto_parallel]: 5.73002e-06 [parallel]: 2.626e-05 [flash_sp]: 7.6e-06 [merge_comm]: 4.32e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 8.92e-06 [allreduce_slice_to_reducescatter]: 7.99977e-07 [virtual_shard_identity]: 7.97e-06 [virtual_dataset]: 6.11e-06 [get_grad_eliminate_]: 5.84e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 4.22998e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 9.70002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.152e-05 [merge_recompute_call_nodes]: 
1.92999e-06 [before_grad]: 1.053e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 2.49999e-06 [flash_sp_send_recv_attached]: 2.46998e-06 [receive_attached]: 2.37001e-06 [after_resolve]: 9.32001e-06 [a_after_grad]: 8.67e-06 [renormalize]: 0.00045048 [add_forward_monad_depend]: 9.09998e-06 [auto_monad_grad]: 1.84998e-06 [auto_monad_eliminator]: 1.393e-05 [cse]: 3.993e-05 [a_3]: 4.246e-05 [Cycle 2]: 0.00060337, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.03e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00011451 [with_stream_mark]: 1.004e-05 [recompute_prepare]: 6.14001e-06 [updatestate_depend_eliminate]: 3.10002e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 7.001e-05 [accelerated_algorithm]: 5.71998e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 4.59998e-06 [auto_parallel]: 5.54998e-06 [parallel]: 4.81002e-06 [flash_sp]: 3.38e-06 [merge_comm]: 3.2e-06 [allreduce_fusion]: 2.90002e-06 [matmul_add_comm_reduction]: 5.34998e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.24001e-06 [virtual_dataset]: 5.40999e-06 [get_grad_eliminate_]: 5.29998e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 6.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.015e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 8.72998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.39002e-06 [a_after_grad]: 7.63999e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.09003e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.94001e-06 [cse]: 1.471e-05 [a_3]: 3.257e-05 [py_interpret_to_execute_after_opt_a]: 8.25e-06 [slice_cell_reuse_recomputed_activation]: 2.05002e-06 
[rewriter_after_opt_a]: 3.273e-05 [convert_after_rewriter]: 6.99001e-06 [order_py_execute_after_rewriter]: 5.14998e-06 [mutable_eliminate]: 0.00046909 [opt_b]: 0.00018788, [1] [Cycle 1]: 0.00018175, [7] [b_1]: 0.00011005 [b_2]: 6.95998e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 4.80009e-07 [cse]: 1.856e-05 [optimize_parallel_all_gather_comm]: 1.632e-05 [overlap_param_gather]: 2.01998e-06 [cconv]: 2.276e-05 [loop_unroll]: 0.00043085 [opt_after_cconv]: 9.639e-05, [1] [Cycle 1]: 9.054e-05, [7] [c_1]: 2.56e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.783e-05 [renormalize]: 7.00005e-07 [remove_dup_value]: 1.592e-05 [tuple_transform]: 6.793e-05, [1] [Cycle 1]: 6.332e-05, [4] [d_1]: 3.676e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.992e-05 [cse_after_recomputation]: 2.167e-05, [1] [Cycle 1]: 1.705e-05, [1] [cse]: 1.163e-05 [environ_conv]: 7.63001e-06 [swap_dp_allreduce_reducescatter]: 5.32001e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.82e-06 [label_fine_grained_interleaved_index]: 2.80997e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.58003e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 1.40001e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.73002e-06 [control_data_broadcast_order]: 1.344e-05 
[grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 4.06001e-06 [overlap_recompute_and_grad_model_parallel]: 4.89998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 4.23001e-06 [overlap_grad_flash_sp]: 1.792e-05 [begin_end_overlap_inline]: 6.50005e-07 [split_matmul_comm_elemetwise]: 2.53998e-06 [split_layernorm_comm]: 2.01998e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 7.274e-05, [1] [Cycle 1]: 6.84e-05, [6] [build]: 2.26998e-06 [elim_shapecalc]: 9.87001e-06 [elim_not_effective]: 1.259e-05 [opt_reshape]: 6.29999e-06 [fold_const_symbol]: 9.42999e-06 [renormalize]: 2.70025e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.656e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00046273 [validate]: 3.417e-05 [backend_pass]: 1.27e-06 [task_emit]: 0.0912152 [execute]: 1.073e-05 Sums bootstrap : 0.000608s : 0.59% type_inference : 0.006448s : 6.31% event_method : 0.000014s : 0.01% auto_monad : 0.000058s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000063s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000555s : 0.54% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000031s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000451s : 0.44% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000055s : 0.05% optimize.opt_a.a_3 : 0.000075s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000469s : 0.46% optimize.opt_b.b_1 : 0.000110s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000431s : 0.42% optimize.opt_after_cconv.c_1 : 0.000026s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000037s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.05% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000008s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000463s : 0.45% validate : 0.000034s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.091215s : 89.22% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 0.000164 26 19.01% : 0.000031s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.05% : 0.000005s : 3: substitution.graph_param_transform 63.91% : 0.000105s : 3: substitution.inline 1.88% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.78% : 0.000005s : 4: substitution.remove_not_recompute_node 1.74% : 0.000003s : 2: substitution.replace_old_param 5.65% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006400 2 90.65% : 0.005801s : 1: type_inference.infer 9.35% : 0.000599s : 1: type_inference.specialize ------[replace.] 0.000038 4 78.62% : 0.000030s : 3: replace.inline 21.38% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000111 4 92.41% : 0.000102s : 3: match.inline 7.59% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 0.94% : 0.000001s : 9: predicate.accumulaten_eliminater 0.82% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.87% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 15: predicate.arithmetic_simplify 0.94% : 0.000002s : 9: predicate.cast_eliminate 0.61% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.19% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.97% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_depend_swap 1.78% : 0.000003s : 18: predicate.environ_get_eliminate 1.16% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 13: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.85% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.28% : 0.000010s : 40: predicate.inline 0.92% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 6: predicate.less_batch_normalization 1.67% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 25: predicate.load_eliminater 0.97% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.31% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 1.00% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.66% : 0.000003s : 13: predicate.partial_defer_inline 1.42% : 0.000002s : 13: predicate.partial_eliminate 0.94% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.43% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.29% : 0.000002s : 16: predicate.replace_applicator 0.63% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000001s : 9: predicate.reshape_eliminate 0.57% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.70% : 0.000001s : 6: predicate.special_op_eliminate 0.84% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.67% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 13: predicate.switch_defer_inline 1.99% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 43: predicate.switch_simplify 0.87% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.66% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000374 8 49.42% : 0.000185s : 3: func_graph_cloner_run.FuncGraphClonerGraph 50.58% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.116045 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.09% : 0.003588s : 1: add_attr 3.08% : 0.003576s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000063s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.56% : 0.000649s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.38% : 0.000440s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000478s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.80% : 0.000924s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000088s : 28: opt.transform.opt_b 0.04% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.89% : 0.002198s : 1: opt_a 0.09% : 0.000100s : 1: opt_after_cconv 0.41% : 0.000472s : 1: opt_after_jit_grad 0.16% : 0.000191s : 1: opt_b 3.55% : 0.004117s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000031s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.21% : 0.000241s : 1: renormalize.infer 0.17% : 0.000202s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.06% : 0.000067s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000075s : 1: symbol_engine_optimizer 78.62% : 0.091239s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 5.57% : 0.006461s : 1: type_inference 0.05% : 0.000056s : 1: validate TotalTime = 0.0961784, [24] [bootstrap]: 0.00044348 [type_inference]: 0.00595467 [event_method]: 1.328e-05 [auto_monad]: 6.145e-05 [graph_reusing]: 5.94e-06 [inline]: 2.01e-06 [add_attr]: 0.00302168, [1] [add_attr_with_inline]: 0.00301251, [1] [Cycle 1]: 4.884e-05, [2] [tag_attr]: 1.365e-05 [meta_addattr_fg_expand]: 4.16001e-06 [parallel-infer-symbol]: 3.19001e-06 [pre_auto_parallel]: 2.277e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 2.06e-06 [optimize]: 0.00395302, [53] [py_interpret_to_execute]: 2.084e-05 [rewriter_before_opt_a]: 5.545e-05 [opt_a]: 0.00202059, [2] [Cycle 1]: 0.00140755, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 2.871e-05 [loop_unroll]: 1.684e-05 [a_1]: 0.00035304 [with_stream_mark]: 1.417e-05 [recompute_prepare]: 7.73001e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 3.46001e-06 [parameter_eliminate]: 2.41e-06 [a_2]: 8.011e-05 [accelerated_algorithm]: 6.47001e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.68997e-06 [shard_inline]: 6.13998e-06 [merge_send_recv]: 8.35001e-06 [auto_parallel]: 5.91e-06 [parallel]: 1.752e-05 [flash_sp]: 7e-06 [merge_comm]: 3.88999e-06 [allreduce_fusion]: 3.53999e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 6.93998e-06 [virtual_dataset]: 5.81998e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.86998e-06 [merge_forward]: 4.38999e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 9.87001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.174e-05 [merge_recompute_call_nodes]: 1.59998e-06 [before_grad]: 9.72999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 2.64001e-06 [flash_sp_send_recv_attached]: 2.76e-06 [receive_attached]: 
2.32999e-06 [after_resolve]: 9.29e-06 [a_after_grad]: 8.64998e-06 [renormalize]: 0.00039915 [add_forward_monad_depend]: 4.62998e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.391e-05 [cse]: 3.078e-05 [a_3]: 4.145e-05 [Cycle 2]: 0.00060343, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.67999e-06 [a_1]: 0.00011465 [with_stream_mark]: 1.2e-05 [recompute_prepare]: 5.97999e-06 [updatestate_depend_eliminate]: 3.00998e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 9.19972e-07 [a_2]: 7.181e-05 [accelerated_algorithm]: 5.82001e-06 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.59998e-06 [merge_send_recv]: 4.83001e-06 [auto_parallel]: 5.34998e-06 [parallel]: 3.85e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.65003e-06 [allreduce_fusion]: 3.04001e-06 [matmul_add_comm_reduction]: 5.15001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.46e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.17e-06 [merge_forward]: 2.62001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 6.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.038e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 8.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 1.91e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 8.2e-06 [a_after_grad]: 7.83999e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.35999e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.88e-06 [cse]: 1.383e-05 [a_3]: 3.258e-05 [py_interpret_to_execute_after_opt_a]: 7.26999e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.32e-05 [convert_after_rewriter]: 6.93e-06 [order_py_execute_after_rewriter]: 5.42999e-06 [mutable_eliminate]: 0.00045958 [opt_b]: 0.00023549, [1] [Cycle 1]: 0.00022926, [7] 
[b_1]: 0.00010964 [b_2]: 8.62e-06 [updatestate_depend_eliminate]: 5.28002e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.48e-06 [renormalize]: 4.50003e-07 [cse]: 1.818e-05 [optimize_parallel_all_gather_comm]: 1.67e-05 [overlap_param_gather]: 2.43e-06 [cconv]: 2.274e-05 [loop_unroll]: 0.00042647 [opt_after_cconv]: 9.744e-05, [1] [Cycle 1]: 9.181e-05, [7] [c_1]: 2.582e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.39001e-06 [cse]: 1.745e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.621e-05 [tuple_transform]: 6.789e-05, [1] [Cycle 1]: 6.316e-05, [4] [d_1]: 3.641e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 4.448e-05 [cse_after_recomputation]: 2.111e-05, [1] [Cycle 1]: 1.644e-05, [1] [cse]: 1.113e-05 [environ_conv]: 6.00002e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.67e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.59998e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.57999e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.49e-06 [overlap_opt_shard_grad_in_pipeline]: 2.02001e-06 [control_data_broadcast_order]: 1.224e-05 [grouped_pairwise_exchange_alltoall]: 1.40001e-06 [offloading_packed_experts]: 4.23001e-06 [overlap_recompute_and_grad_model_parallel]: 4.51002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.66999e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.782e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 2.22999e-06 [handle_group_info]: 1.06997e-06 [symbol_engine_optimizer]: 7.134e-05, [1] [Cycle 1]: 6.719e-05, [6] [build]: 2.39999e-06 [elim_shapecalc]: 8.75001e-06 [elim_not_effective]: 1.232e-05 [opt_reshape]: 6.25002e-06 [fold_const_symbol]: 9.56e-06 [renormalize]: 2.60014e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.574e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.80998e-06 [opt_after_jit_grad]: 0.00045455 [validate]: 3.411e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0819148 [execute]: 1.047e-05 Sums bootstrap : 0.000443s : 0.48% type_inference : 0.005955s : 6.47% event_method : 0.000013s : 0.01% auto_monad : 0.000061s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000055s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000036s : 0.04% optimize.opt_a.loop_unroll : 0.000023s : 0.02% optimize.opt_a.a_1 : 0.000468s : 0.51% optimize.opt_a.with_stream_mark : 0.000026s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000152s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000017s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000399s : 0.43% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000045s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.50% optimize.opt_b.b_1 : 0.000110s : 0.12% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000426s : 0.46% optimize.opt_after_cconv.c_1 : 0.000026s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.02% optimize.tuple_transform.d_1 : 0.000036s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000455s : 0.49% validate : 0.000034s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.081915s : 88.98% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000141 24 20.47% : 0.000029s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 3.61% : 0.000005s : 3: substitution.graph_param_transform 65.77% : 0.000093s : 3: substitution.inline 2.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.14% : 0.000004s : 4: substitution.remove_not_recompute_node 2.20% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005910 2 92.00% : 0.005437s : 1: type_inference.infer 8.00% : 0.000473s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000091 3 100.00% : 0.000091s : 3: match.inline ------[predicate.] 
0.000145 815 0.90% : 0.000001s : 8: predicate.accumulaten_eliminater 0.94% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.87% : 0.000001s : 8: predicate.addn_zero_filter 0.80% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 14: predicate.arithmetic_simplify 0.96% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.66% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.61% : 0.000001s : 6: predicate.depend_value_elim 0.84% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_depend_swap 1.85% : 0.000003s : 17: predicate.environ_get_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.06% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.97% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.75% : 0.000001s : 6: predicate.get_grad_eliminate 0.27% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.40% : 0.000009s : 37: predicate.inline 1.08% : 0.000002s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 6: predicate.less_batch_normalization 1.58% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.01% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.02% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.64% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 8: predicate.minmaximum_grad 1.22% : 0.000002s : 3: predicate.mutable_eliminate 0.43% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.40% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 11: predicate.partial_eliminate 0.86% : 0.000001s : 8: predicate.print_const_string_wrapper 0.70% : 0.000001s : 6: predicate.reduce_all_const_elim 1.14% : 0.000002s : 8: predicate.reduce_eliminate 2.37% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 6: predicate.remove_not_recompute_node 1.16% : 0.000002s : 14: predicate.replace_applicator 0.74% : 0.000001s : 6: predicate.replace_old_param 0.33% : 0.000000s : 3: predicate.reset_defer_inline 0.89% : 0.000001s : 8: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 3: predicate.row_tensor_eliminate 0.94% : 0.000001s : 6: predicate.same_eliminate 0.53% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 6: predicate.shard_identity_eliminate 0.84% : 0.000001s : 6: predicate.special_op_eliminate 0.91% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.90% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.91% : 0.000007s : 38: predicate.switch_simplify 0.93% : 
0.000001s : 8: predicate.tile_eliminate 0.89% : 0.000001s : 8: predicate.transpose_eliminate 1.54% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.69% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 3: predicate.value_based_eliminate 0.80% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.35% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000302 7 42.70% : 0.000129s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.30% : 0.000173s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.104529 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.89% : 0.003026s : 1: add_attr 2.89% : 0.003017s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000067s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.49% : 0.000516s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.42% : 0.000435s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.79% : 0.000828s : 78: opt.transform.opt_a 0.02% : 0.000024s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.94% : 0.002024s : 1: opt_a 0.10% : 0.000101s : 1: opt_after_cconv 0.44% : 0.000463s : 1: opt_after_jit_grad 0.23% : 0.000239s : 1: opt_b 3.79% : 0.003957s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000020s : 1: remove_dup_value 0.20% : 0.000206s : 1: renormalize.infer 0.18% : 0.000186s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.06% : 0.000059s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000074s : 1: symbol_engine_optimizer 78.39% : 0.081939s : 1: task_emit 0.07% : 0.000071s : 1: 
tuple_transform 5.71% : 0.005969s : 1: type_inference 0.05% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x5-ge],max_mem:14.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x6-pynative],max_mem:14.0M TotalTime = 0.0221705, [24] [bootstrap]: 0.00052663 [type_inference]: 0.0065446 [event_method]: 1.409e-05 [auto_monad]: 6.516e-05 [graph_reusing]: 5.92001e-06 [inline]: 1.84e-06 [add_attr]: 0.00360192, [1] [add_attr_with_inline]: 0.00359175, [1] [Cycle 1]: 4.829e-05, [2] [tag_attr]: 1.526e-05 [meta_addattr_fg_expand]: 4.17e-06 [parallel-infer-symbol]: 3.25e-06 [pre_auto_parallel]: 2.643e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.86998e-06 [optimize]: 0.00421345, [53] [py_interpret_to_execute]: 2.224e-05 [rewriter_before_opt_a]: 6.522e-05 [opt_a]: 0.00226637, [2] [Cycle 1]: 0.00165349, [45] [expand_dump_flag]: 2.49999e-06 [switch_simplify]: 3.255e-05 [loop_unroll]: 2.047e-05 [a_1]: 0.00044414 [with_stream_mark]: 1.48e-05 [recompute_prepare]: 8.35001e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 3.61999e-06 [updatestate_loads_eliminate]: 3.47002e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 8.079e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 1.83002e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 8.42e-06 [auto_parallel]: 7.18e-06 [parallel]: 2.724e-05 [flash_sp]: 7.35998e-06 [merge_comm]: 4.09002e-06 [allreduce_fusion]: 3.94002e-06 [matmul_add_comm_reduction]: 9.91e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.55e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.57999e-06 
[virtual_output]: 5.69e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 1.011e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.179e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 9.79999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.61e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.31e-06 [after_resolve]: 9.24998e-06 [a_after_grad]: 8.46002e-06 [renormalize]: 0.00052402 [add_forward_monad_depend]: 9.19e-06 [auto_monad_grad]: 2.14e-06 [auto_monad_eliminator]: 1.408e-05 [cse]: 2.963e-05 [a_3]: 4.217e-05 [Cycle 2]: 0.00060293, [45] [expand_dump_flag]: 9.19972e-07 [switch_simplify]: 6.99001e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00011418 [with_stream_mark]: 1.024e-05 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 3.09001e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.046e-05 [accelerated_algorithm]: 5.79999e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.71998e-06 [merge_send_recv]: 4.58001e-06 [auto_parallel]: 5.66e-06 [parallel]: 4.53001e-06 [flash_sp]: 3.55998e-06 [merge_comm]: 3.41001e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.84999e-06 [virtual_dataset]: 5.52001e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 5.09e-06 [merge_forward]: 3.14001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.85002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.054e-05 [merge_recompute_call_nodes]: 9.00007e-07 [before_grad]: 8.48001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 8.33999e-06 [a_after_grad]: 7.74002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 
1.23002e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.17001e-06 [cse]: 1.714e-05 [a_3]: 3.242e-05 [py_interpret_to_execute_after_opt_a]: 7.71999e-06 [slice_cell_reuse_recomputed_activation]: 2.43002e-06 [rewriter_after_opt_a]: 3.33e-05 [convert_after_rewriter]: 7.68999e-06 [order_py_execute_after_rewriter]: 5.06002e-06 [mutable_eliminate]: 0.00049085 [opt_b]: 0.00018697, [1] [Cycle 1]: 0.00018056, [7] [b_1]: 0.00010901 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.42001e-06 [renormalize]: 4.60015e-07 [cse]: 1.889e-05 [optimize_parallel_all_gather_comm]: 1.679e-05 [overlap_param_gather]: 2.13002e-06 [cconv]: 2.427e-05 [loop_unroll]: 0.00042887 [opt_after_cconv]: 9.669e-05, [1] [Cycle 1]: 9.112e-05, [7] [c_1]: 2.618e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.53e-06 [cse]: 1.766e-05 [renormalize]: 5.8001e-07 [remove_dup_value]: 1.675e-05 [tuple_transform]: 6.952e-05, [1] [Cycle 1]: 6.473e-05, [4] [d_1]: 3.797e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.46999e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.823e-05 [cse_after_recomputation]: 2.201e-05, [1] [Cycle 1]: 1.711e-05, [1] [cse]: 1.177e-05 [environ_conv]: 7.42002e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.98e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 1.24e-06 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.92002e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.02998e-06 
[interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.22e-06 [overlap_opt_shard_in_pipeline]: 1.55999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.249e-05 [grouped_pairwise_exchange_alltoall]: 1.99999e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.99e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.37001e-06 [overlap_grad_ring_attention]: 4.27e-06 [overlap_grad_flash_sp]: 1.84e-05 [begin_end_overlap_inline]: 6.50005e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 7.349e-05, [1] [Cycle 1]: 6.894e-05, [6] [build]: 2.55002e-06 [elim_shapecalc]: 9.70002e-06 [elim_not_effective]: 1.209e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 9.89999e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.517e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.47002e-06 [opt_after_jit_grad]: 0.00045858 [validate]: 3.632e-05 [backend_pass]: 1.17e-06 [task_emit]: 0.00642167 [execute]: 7.89997e-06 Sums bootstrap : 0.000527s : 3.00% type_inference : 0.006545s : 37.28% event_method : 0.000014s : 0.08% auto_monad : 0.000065s : 0.37% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000065s : 0.37% optimize.opt_a.expand_dump_flag : 
0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000558s : 3.18% optimize.opt_a.with_stream_mark : 0.000025s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000032s : 0.18% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000018s : 0.10% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000524s : 2.99% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000047s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.19% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000491s : 2.80% optimize.opt_b.b_1 : 0.000109s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000429s : 2.44% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000017s : 0.10% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000007s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000459s : 2.61% validate : 0.000036s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006422s : 36.58% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000170 26 18.63% : 0.000032s : 5: substitution.arithmetic_simplify 1.26% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000002s : 2: substitution.fold_const_symbol 3.34% : 0.000006s : 3: substitution.graph_param_transform 64.36% : 0.000109s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.83% : 0.000005s : 4: substitution.remove_not_recompute_node 1.70% : 0.000003s : 2: substitution.replace_old_param 5.13% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006495 2 90.53% : 0.005880s : 1: type_inference.infer 9.47% : 0.000615s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.06% : 0.000029s : 3: replace.inline 20.94% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 4 93.12% : 0.000107s : 3: match.inline 6.88% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 15: predicate.arithmetic_simplify 0.95% : 0.000001s : 9: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.63% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.96% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.40% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_depend_swap 1.75% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 13: predicate.float_depend_g_call 0.57% : 0.000001s : 6: predicate.float_environ_get_switch 0.87% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.71% : 0.000001s : 6: predicate.get_grad_eliminate 0.24% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.59% : 0.000001s : 6: predicate.incorporate_call_switch 6.46% : 0.000010s : 40: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 25: predicate.load_eliminater 1.04% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.28% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 6: predicate.merge_addn 0.60% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.14% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.65% : 0.000003s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.95% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.47% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.32% : 0.000002s : 16: predicate.replace_applicator 0.64% : 0.000001s : 6: predicate.replace_old_param 0.23% : 0.000000s : 3: predicate.reset_defer_inline 0.95% : 0.000001s : 9: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 3: predicate.row_tensor_eliminate 0.76% : 0.000001s : 6: predicate.same_eliminate 0.46% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.80% : 0.000001s : 6: predicate.specialize_transform 0.94% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.98% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 43: predicate.switch_simplify 0.88% : 
0.000001s : 9: predicate.tile_eliminate 0.98% : 0.000002s : 9: predicate.transpose_eliminate 1.63% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 15: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.69% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000402 8 50.04% : 0.000201s : 3: func_graph_cloner_run.FuncGraphClonerGraph 49.96% : 0.000201s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031582 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.42% : 0.003606s : 1: add_attr 11.38% : 0.003595s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000071s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.79% : 0.000566s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000011s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000438s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.58% : 0.000500s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 2.93% : 0.000926s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000088s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.19% : 0.002270s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.48% : 0.000469s : 1: opt_after_jit_grad 0.60% : 0.000190s : 1: opt_b 13.35% : 0.004218s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000027s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 0.93% : 0.000294s : 1: renormalize.infer 0.71% : 0.000223s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000037s : 1: rewriter_after_opt_a 0.22% : 0.000070s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000076s : 1: symbol_engine_optimizer 20.37% : 0.006434s : 1: task_emit 0.23% : 0.000073s : 1: 
tuple_transform 20.77% : 0.006561s : 1: type_inference 0.21% : 0.000065s : 1: validate TotalTime = 0.0201706, [24] [bootstrap]: 0.00043565 [type_inference]: 0.00593105 [event_method]: 1.203e-05 [auto_monad]: 5.852e-05 [graph_reusing]: 5.45001e-06 [inline]: 1.99e-06 [add_attr]: 0.00298773, [1] [add_attr_with_inline]: 0.00297918, [1] [Cycle 1]: 4.458e-05, [2] [tag_attr]: 1.372e-05 [meta_addattr_fg_expand]: 3.93999e-06 [parallel-infer-symbol]: 2.83998e-06 [pre_auto_parallel]: 2.356e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00392594, [53] [py_interpret_to_execute]: 1.904e-05 [rewriter_before_opt_a]: 4.982e-05 [opt_a]: 0.00205775, [2] [Cycle 1]: 0.00145124, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 2.858e-05 [loop_unroll]: 1.662e-05 [a_1]: 0.00040673 [with_stream_mark]: 1.402e-05 [recompute_prepare]: 7.79002e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 8.153e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 6.18002e-06 [merge_send_recv]: 8.53001e-06 [auto_parallel]: 5.71e-06 [parallel]: 1.821e-05 [flash_sp]: 8e-06 [merge_comm]: 4.14002e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 8.90001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.63999e-06 [virtual_dataset]: 6.16e-06 [get_grad_eliminate_]: 6.09999e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.91001e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.76998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.255e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 1.01e-05 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.79001e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.17001e-06 
[after_resolve]: 9.72999e-06 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00038681 [add_forward_monad_depend]: 4.79998e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.358e-05 [cse]: 2.883e-05 [a_3]: 4.148e-05 [Cycle 2]: 0.00059715, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.86e-06 [a_1]: 0.00011319 [with_stream_mark]: 9.76e-06 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 3.05002e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.102e-05 [accelerated_algorithm]: 5.96998e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 4.51002e-06 [auto_parallel]: 5.40999e-06 [parallel]: 4.1e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 2.84001e-06 [matmul_add_comm_reduction]: 5.33002e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.74001e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 6.11e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.026e-05 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 8.72998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 1.89e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.29984e-07 [after_resolve]: 8.17998e-06 [a_after_grad]: 7.82998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.30001e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 6.28e-06 [cse]: 1.298e-05 [a_3]: 3.266e-05 [py_interpret_to_execute_after_opt_a]: 7.51999e-06 [slice_cell_reuse_recomputed_activation]: 2.17001e-06 [rewriter_after_opt_a]: 3.257e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.00046332 [opt_b]: 0.00018625, [1] [Cycle 1]: 0.0001802, [7] [b_1]: 
0.00010866 [b_2]: 7.46001e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 4.2998e-07 [cse]: 1.805e-05 [optimize_parallel_all_gather_comm]: 1.601e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.161e-05 [loop_unroll]: 0.00042148 [opt_after_cconv]: 9.606e-05, [1] [Cycle 1]: 9.049e-05, [7] [c_1]: 2.552e-05 [parameter_eliminate]: 2.79001e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.652e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.523e-05 [tuple_transform]: 6.922e-05, [1] [Cycle 1]: 6.453e-05, [4] [d_1]: 3.699e-05 [none_parameter_eliminate]: 1.86998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.29999e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 4.307e-05 [cse_after_recomputation]: 2.206e-05, [1] [Cycle 1]: 1.747e-05, [1] [cse]: 1.156e-05 [environ_conv]: 5.48002e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.42001e-06 [label_micro_interleaved_index]: 5.32001e-06 [label_fine_grained_interleaved_index]: 2.57001e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.13002e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.20999e-06 [full_micro_interleaved_order_control]: 2.54999e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.39998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.244e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.85e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.11998e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.778e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 7.163e-05, [1] [Cycle 1]: 6.735e-05, [6] [build]: 2.34001e-06 [elim_shapecalc]: 8.45999e-06 [elim_not_effective]: 1.236e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 9.86998e-06 [renormalize]: 2.69996e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.593e-05 [get_jit_bprop_graph]: 1.31002e-06 [rewriter_after_jit_bprop_graph]: 3.7e-06 [opt_after_jit_grad]: 0.0004921 [validate]: 3.332e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0060189 [execute]: 9.09e-06 Sums bootstrap : 0.000436s : 2.69% type_inference : 0.005931s : 36.64% event_method : 0.000012s : 0.07% auto_monad : 0.000059s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000050s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000022s : 0.14% optimize.opt_a.a_1 : 0.000520s : 3.21% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000153s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000387s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000042s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000463s : 2.86% optimize.opt_b.b_1 : 0.000109s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000421s : 2.60% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000492s : 3.04% validate : 0.000033s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006019s : 37.18% execute : 0.000009s : 0.06% Time group info: ------[substitution.] 0.000140 24 20.57% : 0.000029s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 3.98% : 0.000006s : 3: substitution.graph_param_transform 64.99% : 0.000091s : 3: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.62% : 0.000005s : 4: substitution.remove_not_recompute_node 2.15% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005888 2 91.82% : 0.005406s : 1: type_inference.infer 8.18% : 0.000481s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000089 3 100.00% : 0.000089s : 3: match.inline ------[predicate.] 
0.000200 815 0.65% : 0.000001s : 8: predicate.accumulaten_eliminater 0.71% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.47% : 0.000001s : 6: predicate.addn_check_dump 0.70% : 0.000001s : 8: predicate.addn_zero_filter 0.58% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 27.96% : 0.000056s : 14: predicate.arithmetic_simplify 0.63% : 0.000001s : 8: predicate.cast_eliminate 0.55% : 0.000001s : 6: predicate.check_bprop_eliminate 0.46% : 0.000001s : 6: predicate.compare_switch_simplify 0.15% : 0.000000s : 3: predicate.const_output_eliminate 0.50% : 0.000001s : 6: predicate.depend_value_elim 0.66% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.67% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.61% : 0.000001s : 8: predicate.dict_set_item_eliminator 0.78% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.20% : 0.000000s : 3: predicate.elim_not_effective 0.30% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 0.87% : 0.000002s : 11: predicate.environ_add_const_eliminate 0.80% : 0.000002s : 11: predicate.environ_get_add_eliminate 0.80% : 0.000002s : 11: predicate.environ_get_depend_swap 1.37% : 0.000003s : 17: predicate.environ_get_eliminate 0.79% : 0.000002s : 11: predicate.environ_get_set_eliminate 0.86% : 0.000002s : 11: predicate.exchange_switch_depend_value 1.60% : 0.000003s : 11: predicate.float_depend_g_call 0.46% : 0.000001s : 6: predicate.float_environ_get_switch 0.66% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 3: predicate.fold_const_symbol 0.56% : 0.000001s : 6: predicate.get_grad_eliminate 0.17% : 0.000000s : 3: predicate.graph_param_transform 0.52% : 0.000001s : 6: predicate.incorporate_call 0.48% : 0.000001s : 6: predicate.incorporate_call_switch 4.85% : 0.000010s : 37: predicate.inline 0.75% : 0.000001s : 6: predicate.inline_without_move 0.33% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.70% : 0.000001s : 6: predicate.less_batch_normalization 1.13% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 1.63% : 0.000003s : 22: predicate.load_eliminater 0.78% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.49% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.23% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.48% : 0.000001s : 6: predicate.merge_addn 0.47% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.47% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.58% : 0.000001s : 8: predicate.minmaximum_grad 0.89% : 0.000002s : 3: predicate.mutable_eliminate 0.29% : 0.000001s : 3: predicate.opt_reshape 0.28% : 0.000001s : 3: predicate.parallel_virtual_node 1.07% : 0.000002s : 11: predicate.partial_defer_inline 0.97% : 0.000002s : 11: predicate.partial_eliminate 0.76% : 0.000002s : 8: predicate.print_const_string_wrapper 0.49% : 0.000001s : 6: predicate.reduce_all_const_elim 0.85% : 0.000002s : 8: predicate.reduce_eliminate 1.65% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000001s : 6: predicate.remove_not_recompute_node 0.92% : 0.000002s : 14: predicate.replace_applicator 0.54% : 0.000001s : 6: predicate.replace_old_param 0.24% : 0.000000s : 3: predicate.reset_defer_inline 0.69% : 0.000001s : 8: predicate.reshape_eliminate 0.59% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.60% : 0.000001s : 6: predicate.same_eliminate 0.38% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.62% : 0.000001s : 6: predicate.shard_identity_eliminate 0.66% : 0.000001s : 6: predicate.special_op_eliminate 0.69% : 0.000001s : 6: predicate.specialize_transform 0.75% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.57% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.30% : 0.000001s : 3: predicate.switch_call_monad_eliminater 0.92% : 0.000002s : 11: predicate.switch_defer_inline 1.36% : 0.000003s : 17: predicate.switch_layer_defer_inline 3.59% : 0.000007s : 38: predicate.switch_simplify 0.67% : 
0.000001s : 8: predicate.tile_eliminate 0.63% : 0.000001s : 8: predicate.transpose_eliminate 1.16% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.16% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.03% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 2.41% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.06% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 1.71% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.11% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 1.60% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.25% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 3: predicate.value_based_eliminate 0.58% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.54% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000282 7 38.58% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.42% : 0.000173s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028508 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.50% : 0.002992s : 1: add_attr 10.46% : 0.002983s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000064s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.65% : 0.000471s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.51% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.66% : 0.000472s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000885s : 78: opt.transform.opt_a 0.09% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.23% : 0.002061s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.76% : 0.000502s : 1: opt_after_jit_grad 0.66% : 0.000189s : 1: opt_b 13.78% : 0.003930s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000019s : 1: remove_dup_value 0.70% : 0.000201s : 1: renormalize.infer 0.63% : 0.000179s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.19% : 0.000054s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000074s : 1: symbol_engine_optimizer 21.15% : 0.006030s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 20.85% : 0.005945s : 1: type_inference 0.21% : 0.000061s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x6-kbk],max_mem:14.0M TotalTime = 0.0678399, [24] [bootstrap]: 0.00069286 [type_inference]: 0.00708463 [event_method]: 1.566e-05 [auto_monad]: 6.353e-05 [graph_reusing]: 5.35999e-06 [inline]: 2.54999e-06 [add_attr]: 0.0039704, [1] [add_attr_with_inline]: 0.00395722, [1] [Cycle 1]: 6.156e-05, [2] [tag_attr]: 1.718e-05 [meta_addattr_fg_expand]: 4.58999e-06 [parallel-infer-symbol]: 3.91999e-06 [pre_auto_parallel]: 3.099e-05 [insert-virtual-dataset]: 2.71e-06 [parallel-infer-symbol-second]: 9.50007e-07 [dataset_repeat_opt]: 2.08998e-06 [pipeline_split]: 1.78002e-06 [optimize]: 0.00483542, [53] [py_interpret_to_execute]: 2.613e-05 [rewriter_before_opt_a]: 7.164e-05 [opt_a]: 0.00253864, [2] [Cycle 1]: 0.00187924, [45] [expand_dump_flag]: 3.40003e-06 [switch_simplify]: 3.49e-05 [loop_unroll]: 2.057e-05 [a_1]: 0.00046779 [with_stream_mark]: 1.781e-05 [recompute_prepare]: 1.02e-05 [updatestate_depend_eliminate]: 4.50001e-06 [updatestate_assign_eliminate]: 3.74002e-06 [updatestate_loads_eliminate]: 3.18e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 8.167e-05 [accelerated_algorithm]: 7.38999e-06 [shard]: 2.60002e-06 [meta_shard_fg_expand]: 1.92999e-06 [shard_inline]: 6.84001e-06 [merge_send_recv]: 9.70002e-06 [auto_parallel]: 7.86001e-06 [parallel]: 2.774e-05 [flash_sp]: 9.44e-06 [merge_comm]: 4.83001e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 1.014e-05 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 8.12e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 6.11e-06 [virtual_output]: 5.84e-06 [merge_forward]: 5.05001e-06 [cell_reuse_recompute_pass]: 1.82999e-06 [offload_activation]: 1.068e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.368e-05 
[merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 1.03e-05 [set_forward_comm_id_for_comm_node_pass]: 4.15e-06 [meta_fg_expand]: 2.89999e-06 [flash_sp_send_recv_attached]: 2.63003e-06 [receive_attached]: 2.22999e-06 [after_resolve]: 9.80002e-06 [a_after_grad]: 9.00001e-06 [renormalize]: 0.00066605 [add_forward_monad_depend]: 1.158e-05 [auto_monad_grad]: 2.43e-06 [auto_monad_eliminator]: 1.611e-05 [cse]: 3.223e-05 [a_3]: 4.727e-05 [Cycle 2]: 0.00064775, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 7.50998e-06 [loop_unroll]: 5.75001e-06 [a_1]: 0.00012111 [with_stream_mark]: 1.363e-05 [recompute_prepare]: 6.26e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 7.263e-05 [accelerated_algorithm]: 5.97001e-06 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 6.40002e-06 [merge_send_recv]: 5.73002e-06 [auto_parallel]: 6.59001e-06 [parallel]: 5.99e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 6.71999e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 6.53998e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.37999e-06 [virtual_output]: 5.39e-06 [merge_forward]: 3.03e-06 [cell_reuse_recompute_pass]: 1.67999e-06 [offload_activation]: 8.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.119e-05 [merge_recompute_call_nodes]: 1.17e-06 [before_grad]: 1.022e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 1.94999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.22999e-06 [after_resolve]: 9.37001e-06 [a_after_grad]: 8.05e-06 [renormalize]: 5.9983e-08 [add_forward_monad_depend]: 1.68002e-06 [auto_monad_grad]: 1.32e-06 [auto_monad_eliminator]: 8.55999e-06 [cse]: 1.668e-05 [a_3]: 3.28e-05 [py_interpret_to_execute_after_opt_a]: 1.156e-05 [slice_cell_reuse_recomputed_activation]: 
2.21998e-06 [rewriter_after_opt_a]: 4.049e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 5.03002e-06 [mutable_eliminate]: 0.00057837 [opt_b]: 0.00019991, [1] [Cycle 1]: 0.00019211, [7] [b_1]: 0.00011174 [b_2]: 8.25e-06 [updatestate_depend_eliminate]: 6.99001e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.71e-06 [renormalize]: 8.2e-07 [cse]: 2.091e-05 [optimize_parallel_all_gather_comm]: 1.868e-05 [overlap_param_gather]: 2.18998e-06 [cconv]: 2.7e-05 [loop_unroll]: 0.00052799 [opt_after_cconv]: 0.00010187, [1] [Cycle 1]: 9.533e-05, [7] [c_1]: 2.712e-05 [parameter_eliminate]: 3.43e-06 [updatestate_depend_eliminate]: 6.36998e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.46e-06 [cse]: 1.885e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.676e-05 [tuple_transform]: 7.277e-05, [1] [Cycle 1]: 6.789e-05, [4] [d_1]: 3.99e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 2.22999e-06 [add_recomputation]: 5.524e-05 [cse_after_recomputation]: 2.3e-05, [1] [Cycle 1]: 1.802e-05, [1] [cse]: 1.211e-05 [environ_conv]: 8.73001e-06 [swap_dp_allreduce_reducescatter]: 5.75001e-06 [bias_add_comm_swap]: 2.91999e-06 [label_micro_interleaved_index]: 5.23002e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.18002e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.59e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.06997e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.99999e-06 [comm_op_add_attrs]: 1.43002e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.47999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.36998e-06 [overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.394e-05 
[grouped_pairwise_exchange_alltoall]: 2.32001e-06 [offloading_packed_experts]: 4.53999e-06 [overlap_recompute_and_grad_model_parallel]: 5.19e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.65002e-06 [overlap_grad_ring_attention]: 4.40999e-06 [overlap_grad_flash_sp]: 1.987e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 2.13002e-06 [handle_group_info]: 1.39998e-06 [symbol_engine_optimizer]: 7.886e-05, [1] [Cycle 1]: 7.329e-05, [6] [build]: 3.21999e-06 [elim_shapecalc]: 1.071e-05 [elim_not_effective]: 1.317e-05 [opt_reshape]: 6.49999e-06 [fold_const_symbol]: 9.72001e-06 [renormalize]: 2.9002e-07 [detach_backward]: 2.36998e-06 [pipeline_parallel_scheduler]: 1.72999e-06 [auto_monad_reorder]: 1.767e-05 [get_jit_bprop_graph]: 1.59998e-06 [rewriter_after_jit_bprop_graph]: 4.45999e-06 [opt_after_jit_grad]: 0.00051714 [validate]: 4.325e-05 [backend_pass]: 1.17999e-06 [task_emit]: 0.0502745 [execute]: 9.82001e-06 Sums bootstrap : 0.000693s : 1.11% type_inference : 0.007085s : 11.30% event_method : 0.000016s : 0.02% auto_monad : 0.000064s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000031s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000026s : 0.04% optimize.rewriter_before_opt_a : 0.000072s : 0.11% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000042s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000589s : 0.94% optimize.opt_a.with_stream_mark : 0.000031s : 0.05% 
optimize.opt_a.recompute_prepare : 0.000016s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.25% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000013s : 0.02% optimize.opt_a.merge_send_recv : 0.000015s : 0.02% optimize.opt_a.auto_parallel : 0.000014s : 0.02% optimize.opt_a.parallel : 0.000034s : 0.05% optimize.opt_a.flash_sp : 0.000013s : 0.02% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000019s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000666s : 1.06% optimize.opt_a.add_forward_monad_depend : 0.000013s : 0.02% optimize.opt_a.auto_monad_grad : 
0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.04% optimize.opt_a.cse : 0.000049s : 0.08% optimize.opt_a.a_3 : 0.000080s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000578s : 0.92% optimize.opt_b.b_1 : 0.000112s : 0.18% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.04% optimize.loop_unroll : 0.000528s : 0.84% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000017s : 0.03% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000009s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000517s : 0.83% validate : 0.000043s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.050275s : 80.21% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000194 26 19.44% : 0.000038s : 5: substitution.arithmetic_simplify 0.98% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000006s : 3: substitution.graph_param_transform 64.36% : 0.000125s : 3: substitution.inline 2.03% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.78% : 0.000005s : 4: substitution.remove_not_recompute_node 1.90% : 0.000004s : 2: substitution.replace_old_param 4.65% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007025 2 90.59% : 0.006364s : 1: type_inference.infer 9.41% : 0.000661s : 1: type_inference.specialize ------[replace.] 0.000041 4 78.74% : 0.000033s : 3: replace.inline 21.26% : 0.000009s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000131 4 93.75% : 0.000123s : 3: match.inline 6.25% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000167 883 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.31% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 6: predicate.addn_check_dump 0.91% : 0.000002s : 9: predicate.addn_zero_filter 0.78% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 15: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.60% : 0.000001s : 6: predicate.check_bprop_eliminate 0.53% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.77% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000002s : 9: predicate.dict_get_item_eliminator 1.11% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.50% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.35% : 0.000001s : 3: predicate.elim_not_effective 0.60% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 12: predicate.environ_get_depend_swap 1.73% : 0.000003s : 18: predicate.environ_get_eliminate 1.06% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 13: predicate.float_depend_g_call 0.53% : 0.000001s : 6: predicate.float_environ_get_switch 0.79% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.91% : 0.000002s : 6: predicate.get_grad_eliminate 0.29% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.16% : 0.000010s : 40: predicate.inline 0.89% : 0.000001s : 6: predicate.inline_without_move 0.47% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.77% : 0.000001s : 6: predicate.less_batch_normalization 1.52% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.28% : 0.000004s : 25: predicate.load_eliminater 1.32% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.12% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.53% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 2.09% : 0.000003s : 3: predicate.mutable_eliminate 0.35% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 2.06% : 0.000003s : 13: predicate.partial_defer_inline 1.40% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.58% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 9: predicate.reduce_eliminate 2.23% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 6: predicate.remove_not_recompute_node 1.15% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.37% : 0.000001s : 3: predicate.reset_defer_inline 0.85% : 0.000001s : 9: predicate.reshape_eliminate 0.57% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.56% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 0.72% : 0.000001s : 6: predicate.special_op_eliminate 0.77% : 0.000001s : 6: predicate.specialize_transform 0.99% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.30% : 0.000002s : 13: predicate.switch_defer_inline 1.82% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.88% : 0.000008s : 43: predicate.switch_simplify 0.91% : 
0.000002s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.43% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.55% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.16% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.95% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.62% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 6: predicate.virtual_output_eliminate 0.28% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000425 8 43.70% : 0.000186s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.30% : 0.000239s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078355 196 0.00% : 0.000004s : 1: ForceFp32Comm 5.07% : 0.003976s : 1: add_attr 5.06% : 0.003961s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000060s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.09% : 0.000069s : 1: auto_monad 0.03% : 0.000022s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.94% : 0.000738s : 1: bootstrap 0.04% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.03% : 0.000023s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.69% : 0.000537s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000590s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000017s : 1: opt.transform.mutable_eliminate 1.24% : 0.000975s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000036s : 4: opt.transform.symbol_engine_opt 3.24% : 0.002542s : 1: opt_a 0.13% : 0.000105s : 1: opt_after_cconv 0.68% : 0.000531s : 1: opt_after_jit_grad 0.26% : 0.000203s : 1: opt_b 6.18% : 0.004841s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000036s : 1: pre_auto_parallel 0.04% : 0.000031s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 0.47% : 0.000365s : 1: renormalize.infer 0.37% : 0.000291s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000045s : 1: rewriter_after_opt_a 0.10% : 0.000076s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000082s : 1: symbol_engine_optimizer 64.20% : 0.050301s : 1: task_emit 0.10% : 0.000076s : 1: 
tuple_transform 9.07% : 0.007108s : 1: type_inference 0.09% : 0.000072s : 1: validate TotalTime = 0.0588196, [24] [bootstrap]: 0.0005057 [type_inference]: 0.00622385 [event_method]: 1.374e-05 [auto_monad]: 6.107e-05 [graph_reusing]: 5.72999e-06 [inline]: 1.94e-06 [add_attr]: 0.00315484, [1] [add_attr_with_inline]: 0.00314525, [1] [Cycle 1]: 5.495e-05, [2] [tag_attr]: 1.519e-05 [meta_addattr_fg_expand]: 4.46002e-06 [parallel-infer-symbol]: 3.55e-06 [pre_auto_parallel]: 2.681e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 1.03001e-06 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00425092, [53] [py_interpret_to_execute]: 2.37e-05 [rewriter_before_opt_a]: 5.474e-05 [opt_a]: 0.00220497, [2] [Cycle 1]: 0.00157593, [45] [expand_dump_flag]: 3.46001e-06 [switch_simplify]: 2.888e-05 [loop_unroll]: 1.73e-05 [a_1]: 0.00036442 [with_stream_mark]: 1.592e-05 [recompute_prepare]: 8.90999e-06 [updatestate_depend_eliminate]: 4.11001e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.38e-06 [parameter_eliminate]: 1.83997e-06 [a_2]: 8.229e-05 [accelerated_algorithm]: 7.27002e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 6.56999e-06 [merge_send_recv]: 9.02e-06 [auto_parallel]: 6.51e-06 [parallel]: 1.971e-05 [flash_sp]: 7.8e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 4.14002e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 1.05999e-06 [virtual_shard_identity]: 7.41999e-06 [virtual_dataset]: 6.39001e-06 [get_grad_eliminate_]: 5.77001e-06 [virtual_output]: 5.86e-06 [merge_forward]: 4.08999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.028e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.213e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 1.02e-05 [set_forward_comm_id_for_comm_node_pass]: 4.10998e-06 [meta_fg_expand]: 2.84001e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 2.02001e-06 
[after_resolve]: 9.84001e-06 [a_after_grad]: 8.92999e-06 [renormalize]: 0.00052644 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 2.09999e-06 [auto_monad_eliminator]: 1.52e-05 [cse]: 3.075e-05 [a_3]: 4.51e-05 [Cycle 2]: 0.00061753, [45] [expand_dump_flag]: 1.13001e-06 [switch_simplify]: 7.31001e-06 [loop_unroll]: 6.09001e-06 [a_1]: 0.00011699 [with_stream_mark]: 1.071e-05 [recompute_prepare]: 6.26e-06 [updatestate_depend_eliminate]: 3.3e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 7.138e-05 [accelerated_algorithm]: 5.86003e-06 [shard]: 8.99978e-07 [meta_shard_fg_expand]: 1.39998e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 5.87001e-06 [auto_parallel]: 6.73e-06 [parallel]: 4.43999e-06 [flash_sp]: 3.33998e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.09999e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 4.59986e-07 [virtual_shard_identity]: 6.56e-06 [virtual_dataset]: 5.74999e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 3.11999e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 6.76999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 9.00007e-07 [before_grad]: 8.69e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.02001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.70001e-06 [a_after_grad]: 7.81001e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 1.06002e-06 [auto_monad_eliminator]: 6.69999e-06 [cse]: 1.384e-05 [a_3]: 3.222e-05 [py_interpret_to_execute_after_opt_a]: 9.05999e-06 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 3.476e-05 [convert_after_rewriter]: 7.10002e-06 [order_py_execute_after_rewriter]: 5.46e-06 [mutable_eliminate]: 0.00053698 [opt_b]: 0.00024095, [1] [Cycle 1]: 0.00023389, 
[7] [b_1]: 0.00015898 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 6.19999e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.50002e-06 [renormalize]: 3.89991e-07 [cse]: 1.886e-05 [optimize_parallel_all_gather_comm]: 1.634e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.601e-05 [loop_unroll]: 0.00043434 [opt_after_cconv]: 9.923e-05, [1] [Cycle 1]: 9.294e-05, [7] [c_1]: 2.585e-05 [parameter_eliminate]: 2.92002e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.44999e-06 [cse]: 1.885e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.571e-05 [tuple_transform]: 6.994e-05, [1] [Cycle 1]: 6.482e-05, [4] [d_1]: 3.855e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.55002e-06 [partial_unused_args_eliminate]: 2.02999e-06 [add_recomputation]: 4.526e-05 [cse_after_recomputation]: 2.219e-05, [1] [Cycle 1]: 1.727e-05, [1] [cse]: 1.168e-05 [environ_conv]: 6.09999e-06 [swap_dp_allreduce_reducescatter]: 5.18002e-06 [bias_add_comm_swap]: 2.45002e-06 [label_micro_interleaved_index]: 4.43001e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.97999e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.50999e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.265e-05 [grouped_pairwise_exchange_alltoall]: 1.39e-06 [offloading_packed_experts]: 4.58999e-06 [overlap_recompute_and_grad_model_parallel]: 4.97e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.27e-06 [overlap_recompute_comm]: 2.46e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 1.918e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.29999e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 7.483e-05, [1] [Cycle 1]: 7.002e-05, [6] [build]: 2.95998e-06 [elim_shapecalc]: 8.97999e-06 [elim_not_effective]: 1.262e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 9.78998e-06 [renormalize]: 3.49974e-07 [detach_backward]: 2.31e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.715e-05 [get_jit_bprop_graph]: 1.78002e-06 [rewriter_after_jit_bprop_graph]: 3.82002e-06 [opt_after_jit_grad]: 0.000472 [validate]: 3.882e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0437812 [execute]: 1.056e-05 Sums bootstrap : 0.000506s : 0.93% type_inference : 0.006224s : 11.40% event_method : 0.000014s : 0.03% auto_monad : 0.000061s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000027s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.04% optimize.rewriter_before_opt_a : 0.000055s : 0.10% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000036s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000481s : 0.88% optimize.opt_a.with_stream_mark : 0.000027s : 0.05% optimize.opt_a.recompute_prepare : 0.000015s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000154s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000015s : 0.03% optimize.opt_a.auto_parallel : 0.000013s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000527s : 0.96% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.04% optimize.opt_a.cse : 0.000045s : 0.08% optimize.opt_a.a_3 : 0.000077s : 0.14% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000537s : 0.98% optimize.opt_b.b_1 : 0.000159s : 0.29% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.05% optimize.loop_unroll : 0.000434s : 0.80% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.08% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000472s : 0.86% validate : 0.000039s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.043781s : 80.17% execute : 0.000011s : 0.02% Time group info: ------[substitution.] 0.000154 24 20.94% : 0.000032s : 4: substitution.arithmetic_simplify 1.30% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000002s : 2: substitution.fold_const_symbol 3.60% : 0.000006s : 3: substitution.graph_param_transform 65.68% : 0.000101s : 3: substitution.inline 2.13% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.96% : 0.000005s : 4: substitution.remove_not_recompute_node 2.34% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.006173 2 92.22% : 0.005693s : 1: type_inference.infer 7.78% : 0.000480s : 1: type_inference.specialize ------[replace.] 0.000028 3 100.00% : 0.000028s : 3: replace.inline ------[match.] 0.000100 3 100.00% : 0.000100s : 3: match.inline ------[predicate.] 
0.000150 815 0.89% : 0.000001s : 8: predicate.accumulaten_eliminater 1.08% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.85% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 14: predicate.arithmetic_simplify 0.85% : 0.000001s : 8: predicate.cast_eliminate 0.71% : 0.000001s : 6: predicate.check_bprop_eliminate 0.61% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.83% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_depend_swap 1.92% : 0.000003s : 17: predicate.environ_get_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.88% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.96% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.77% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.51% : 0.000010s : 37: predicate.inline 0.91% : 0.000001s : 6: predicate.inline_without_move 0.41% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 6: predicate.less_batch_normalization 1.55% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 22: predicate.load_eliminater 1.04% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 6: predicate.merge_addn 0.67% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.18% : 0.000002s : 3: predicate.mutable_eliminate 0.37% : 0.000001s : 3: predicate.opt_reshape 0.49% : 0.000001s : 3: predicate.parallel_virtual_node 1.44% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 11: predicate.partial_eliminate 0.84% : 0.000001s : 8: predicate.print_const_string_wrapper 0.65% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 8: predicate.reduce_eliminate 2.24% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 6: predicate.remove_not_recompute_node 1.22% : 0.000002s : 14: predicate.replace_applicator 0.67% : 0.000001s : 6: predicate.replace_old_param 0.43% : 0.000001s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 8: predicate.reshape_eliminate 0.69% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.84% : 0.000001s : 6: predicate.same_eliminate 0.51% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.89% : 0.000001s : 6: predicate.specialize_transform 1.14% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.88% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.72% : 0.000007s : 38: predicate.switch_simplify 0.83% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.60% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.01% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 3: predicate.value_based_eliminate 0.96% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.35% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000330 7 38.93% : 0.000129s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.07% : 0.000202s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.067797 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.66% : 0.003160s : 1: add_attr 4.65% : 0.003149s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000049s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.10% : 0.000067s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.80% : 0.000544s : 1: bootstrap 0.04% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.03% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.65% : 0.000444s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.81% : 0.000547s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.25% : 0.000851s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.20% : 0.000137s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.26% : 0.002208s : 1: opt_a 0.15% : 0.000103s : 1: opt_after_cconv 0.71% : 0.000483s : 1: opt_after_jit_grad 0.36% : 0.000244s : 1: opt_b 6.28% : 0.004255s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000031s : 1: pre_auto_parallel 0.04% : 0.000028s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.42% : 0.000282s : 1: renormalize.infer 0.35% : 0.000236s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000039s : 1: rewriter_after_opt_a 0.09% : 0.000059s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000078s : 1: symbol_engine_optimizer 64.62% : 0.043807s : 1: task_emit 0.11% : 0.000073s : 1: 
tuple_transform 9.21% : 0.006243s : 1: type_inference 0.10% : 0.000067s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x6-ge],max_mem:14.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x7-pynative],max_mem:14.0M TotalTime = 0.0240559, [24] [bootstrap]: 0.00061466 [type_inference]: 0.00691771 [event_method]: 1.598e-05 [auto_monad]: 6.541e-05 [graph_reusing]: 5.91e-06 [inline]: 2.15002e-06 [add_attr]: 0.00389948, [1] [add_attr_with_inline]: 0.00388603, [1] [Cycle 1]: 5.83e-05, [2] [tag_attr]: 1.641e-05 [meta_addattr_fg_expand]: 4.36002e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 2.942e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.92001e-06 [optimize]: 0.00462756, [53] [py_interpret_to_execute]: 2.512e-05 [rewriter_before_opt_a]: 6.715e-05 [opt_a]: 0.00251927, [2] [Cycle 1]: 0.00181873, [45] [expand_dump_flag]: 3.26001e-06 [switch_simplify]: 3.443e-05 [loop_unroll]: 2.066e-05 [a_1]: 0.00046498 [with_stream_mark]: 1.816e-05 [recompute_prepare]: 8.77999e-06 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 3.61001e-06 [updatestate_loads_eliminate]: 3.38e-06 [parameter_eliminate]: 1.98002e-06 [a_2]: 8.318e-05 [accelerated_algorithm]: 7.30003e-06 [shard]: 2.99999e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 6.12001e-06 [merge_send_recv]: 9.00001e-06 [auto_parallel]: 7.23e-06 [parallel]: 2.659e-05 [flash_sp]: 9.07999e-06 [merge_comm]: 4.77998e-06 [allreduce_fusion]: 3.68999e-06 [matmul_add_comm_reduction]: 1.023e-05 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 9.35001e-06 [virtual_dataset]: 6.78e-06 [get_grad_eliminate_]: 6.04999e-06 
[virtual_output]: 6.27001e-06 [merge_forward]: 3.99002e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 1.066e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.417e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 1.051e-05 [set_forward_comm_id_for_comm_node_pass]: 3.95e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 2.66e-06 [receive_attached]: 2.04999e-06 [after_resolve]: 1.027e-05 [a_after_grad]: 9.32999e-06 [renormalize]: 0.00061365 [add_forward_monad_depend]: 1.058e-05 [auto_monad_grad]: 2.57001e-06 [auto_monad_eliminator]: 1.699e-05 [cse]: 3.18e-05 [a_3]: 4.584e-05 [Cycle 2]: 0.00068944, [45] [expand_dump_flag]: 1.20001e-06 [switch_simplify]: 7.88999e-06 [loop_unroll]: 6.12999e-06 [a_1]: 0.00011955 [with_stream_mark]: 1.267e-05 [recompute_prepare]: 6.26998e-06 [updatestate_depend_eliminate]: 3.49001e-06 [updatestate_assign_eliminate]: 2.79999e-06 [updatestate_loads_eliminate]: 2.68998e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.286e-05 [accelerated_algorithm]: 5.96998e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.40999e-06 [shard_inline]: 6.12001e-06 [merge_send_recv]: 5.05999e-06 [auto_parallel]: 6.17001e-06 [parallel]: 6.09001e-06 [flash_sp]: 3.53e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.04001e-06 [matmul_add_comm_reduction]: 6.39999e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 6.88998e-06 [virtual_dataset]: 5.56002e-06 [get_grad_eliminate_]: 5.18002e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.75001e-06 [offload_activation]: 7.2e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.201e-05 [merge_recompute_call_nodes]: 1.08001e-06 [before_grad]: 8.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55003e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.15001e-06 [after_resolve]: 8.45001e-06 [a_after_grad]: 7.92e-06 [renormalize]: 6.00121e-08 
[add_forward_monad_depend]: 2.39001e-06 [auto_monad_grad]: 1.32e-06 [auto_monad_eliminator]: 9.97999e-06 [cse]: 1.809e-05 [a_3]: 3.416e-05 [py_interpret_to_execute_after_opt_a]: 1.021e-05 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.863e-05 [convert_after_rewriter]: 7.36999e-06 [order_py_execute_after_rewriter]: 4.84998e-06 [mutable_eliminate]: 0.00053841 [opt_b]: 0.00019974, [1] [Cycle 1]: 0.00019217, [7] [b_1]: 0.00011188 [b_2]: 7.45e-06 [updatestate_depend_eliminate]: 6.79999e-06 [updatestate_assign_eliminate]: 2.92002e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 2.80008e-07 [cse]: 2.301e-05 [optimize_parallel_all_gather_comm]: 1.937e-05 [overlap_param_gather]: 2.74999e-06 [cconv]: 2.733e-05 [loop_unroll]: 0.00048146 [opt_after_cconv]: 0.00010426, [1] [Cycle 1]: 9.669e-05, [7] [c_1]: 2.625e-05 [parameter_eliminate]: 3.06001e-06 [updatestate_depend_eliminate]: 5.96e-06 [updatestate_assign_eliminate]: 3.04001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 2.064e-05 [renormalize]: 6.09987e-07 [remove_dup_value]: 1.572e-05 [tuple_transform]: 7.014e-05, [1] [Cycle 1]: 6.552e-05, [4] [d_1]: 3.847e-05 [none_parameter_eliminate]: 1.66998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.56e-06 [partial_unused_args_eliminate]: 2.25002e-06 [add_recomputation]: 5.321e-05 [cse_after_recomputation]: 2.261e-05, [1] [Cycle 1]: 1.807e-05, [1] [cse]: 1.228e-05 [environ_conv]: 8.32e-06 [swap_dp_allreduce_reducescatter]: 4.74998e-06 [bias_add_comm_swap]: 2.88e-06 [label_micro_interleaved_index]: 5.47999e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.86e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 3.18e-06 [comm_op_add_attrs]: 1.42999e-06 [add_comm_op_reuse_tag]: 1.03001e-06 
[interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.376e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 4.16001e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.77002e-06 [overlap_grad_ring_attention]: 4.32e-06 [overlap_grad_flash_sp]: 1.988e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 7.923e-05, [1] [Cycle 1]: 7.482e-05, [6] [build]: 4.12e-06 [elim_shapecalc]: 9.80002e-06 [elim_not_effective]: 1.226e-05 [opt_reshape]: 6.40002e-06 [fold_const_symbol]: 9.71998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.58002e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.702e-05 [get_jit_bprop_graph]: 1.74998e-06 [rewriter_after_jit_bprop_graph]: 4.43999e-06 [opt_after_jit_grad]: 0.00051115 [validate]: 4.197e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00704328 [execute]: 8.89e-06 Sums bootstrap : 0.000615s : 3.23% type_inference : 0.006918s : 36.36% event_method : 0.000016s : 0.08% auto_monad : 0.000065s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000025s : 0.13% optimize.rewriter_before_opt_a : 0.000067s : 0.35% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000042s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000585s : 3.07% optimize.opt_a.with_stream_mark : 0.000031s : 0.16% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000156s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000033s : 0.17% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000019s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000614s : 3.23% optimize.opt_a.add_forward_monad_depend : 0.000013s : 0.07% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000027s : 0.14% optimize.opt_a.cse : 0.000050s : 0.26% optimize.opt_a.a_3 : 0.000080s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000039s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000538s : 2.83% optimize.opt_b.b_1 : 0.000112s : 0.59% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000023s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.10% optimize.overlap_param_gather : 0.000003s : 0.01% optimize.cconv : 0.000027s : 0.14% optimize.loop_unroll : 0.000481s : 2.53% optimize.opt_after_cconv.c_1 : 0.000026s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000021s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.03% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000008s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000511s : 2.69% validate : 0.000042s : 0.22% backend_pass : 0.000001s : 0.00% task_emit : 0.007043s : 37.02% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000189 26 20.47% : 0.000039s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000002s : 2: substitution.fold_const_symbol 3.07% : 0.000006s : 3: substitution.graph_param_transform 62.56% : 0.000119s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.13% : 0.000006s : 4: substitution.remove_not_recompute_node 1.58% : 0.000003s : 2: substitution.replace_old_param 5.54% : 0.000010s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006861 2 90.64% : 0.006219s : 1: type_inference.infer 9.36% : 0.000642s : 1: type_inference.specialize ------[replace.] 0.000038 4 78.71% : 0.000030s : 3: replace.inline 21.29% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000125 4 92.76% : 0.000116s : 3: match.inline 7.24% : 0.000009s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000169 883 0.90% : 0.000002s : 9: predicate.accumulaten_eliminater 1.04% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 6: predicate.addn_check_dump 0.83% : 0.000001s : 9: predicate.addn_zero_filter 0.77% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.15% : 0.000004s : 15: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.55% : 0.000001s : 6: predicate.compare_switch_simplify 0.18% : 0.000000s : 3: predicate.const_output_eliminate 0.60% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.67% : 0.000003s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.55% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 12: predicate.environ_get_depend_swap 1.62% : 0.000003s : 18: predicate.environ_get_eliminate 1.05% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 13: predicate.float_depend_g_call 0.51% : 0.000001s : 6: predicate.float_environ_get_switch 0.79% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 3: predicate.fold_const_symbol 0.67% : 0.000001s : 6: predicate.get_grad_eliminate 0.32% : 0.000001s : 3: predicate.graph_param_transform 0.61% : 0.000001s : 6: predicate.incorporate_call 0.54% : 0.000001s : 6: predicate.incorporate_call_switch 6.84% : 0.000012s : 40: predicate.inline 1.34% : 0.000002s : 6: predicate.inline_without_move 0.37% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.77% : 0.000001s : 6: predicate.less_batch_normalization 1.69% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000004s : 25: predicate.load_eliminater 1.85% : 0.000003s : 3: predicate.loop_unroll_after_grad 2.06% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 6: predicate.merge_addn 0.55% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 9: predicate.minmaximum_grad 1.83% : 0.000003s : 3: predicate.mutable_eliminate 0.32% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.64% : 0.000003s : 13: predicate.partial_defer_inline 1.35% : 0.000002s : 13: predicate.partial_eliminate 0.87% : 0.000001s : 9: predicate.print_const_string_wrapper 0.55% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 9: predicate.reduce_eliminate 2.22% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 6: predicate.remove_not_recompute_node 1.30% : 0.000002s : 16: predicate.replace_applicator 0.58% : 0.000001s : 6: predicate.replace_old_param 0.39% : 0.000001s : 3: predicate.reset_defer_inline 1.05% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.93% : 0.000002s : 6: predicate.same_eliminate 0.57% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.11% : 0.000002s : 6: predicate.shard_identity_eliminate 0.73% : 0.000001s : 6: predicate.special_op_eliminate 0.79% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.64% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.26% : 0.000002s : 13: predicate.switch_defer_inline 1.89% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.02% : 0.000009s : 43: predicate.switch_simplify 0.84% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.38% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000003s : 21: predicate.tuple_list_set_item_eliminator 1.50% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.15% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 2.89% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.94% : 0.000002s : 6: predicate.virtual_dataset_eliminate 0.63% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000475 8 50.18% : 0.000239s : 3: func_graph_cloner_run.FuncGraphClonerGraph 49.82% : 0.000237s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.034322 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.38% : 0.003906s : 1: add_attr 11.33% : 0.003890s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000058s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.21% : 0.000072s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.91% : 0.000656s : 1: bootstrap 0.09% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.03% : 0.000012s : 1: environ_conv 0.06% : 0.000022s : 1: event_method 0.04% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.43% : 0.000492s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.60% : 0.000549s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000017s : 1: opt.transform.mutable_eliminate 2.84% : 0.000974s : 78: opt.transform.opt_a 0.07% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000090s : 28: opt.transform.opt_b 0.12% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.35% : 0.002522s : 1: opt_a 0.31% : 0.000108s : 1: opt_after_cconv 1.52% : 0.000522s : 1: opt_after_jit_grad 0.59% : 0.000203s : 1: opt_b 13.50% : 0.004632s : 1: optimize 0.07% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000034s : 1: pre_auto_parallel 0.09% : 0.000030s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.97% : 0.000331s : 1: renormalize.infer 0.80% : 0.000274s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000044s : 1: rewriter_after_opt_a 0.21% : 0.000072s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000082s : 1: symbol_engine_optimizer 20.57% : 0.007059s : 1: task_emit 0.21% : 0.000073s : 1: 
tuple_transform 20.21% : 0.006936s : 1: type_inference 0.23% : 0.000078s : 1: validate
TotalTime = 0.0202417, [24] [bootstrap]: 0.00046993 [type_inference]: 0.00597901 [event_method]: 1.269e-05 [auto_monad]: 6.038e-05 [graph_reusing]: 5.96e-06 [inline]: 2.53998e-06 [add_attr]: 0.0030102, [1] [add_attr_with_inline]: 0.00300223, [1] [Cycle 1]: 5.454e-05, [2] [tag_attr]: 1.457e-05 [meta_addattr_fg_expand]: 4.1e-06 [parallel-infer-symbol]: 3.39001e-06 [pre_auto_parallel]: 2.383e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 1.07e-06 [dataset_repeat_opt]: 2.13002e-06 [pipeline_split]: 1.99e-06 [optimize]: 0.00393563, [53] [py_interpret_to_execute]: 1.859e-05 [rewriter_before_opt_a]: 5.137e-05 [opt_a]: 0.00206259, [2] [Cycle 1]: 0.0014517, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 2.908e-05 [loop_unroll]: 1.734e-05 [a_1]: 0.00035123 [with_stream_mark]: 1.394e-05 [recompute_prepare]: 7.15e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.95001e-06 [a_2]: 8.055e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 1.98002e-06 [meta_shard_fg_expand]: 1.76998e-06 [shard_inline]: 6.04001e-06 [merge_send_recv]: 8.73001e-06 [auto_parallel]: 6.17001e-06 [parallel]: 5.895e-05 [flash_sp]: 7.4e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 9.79e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 8.39002e-06 [virtual_dataset]: 6.14999e-06 [get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.81e-06 [merge_forward]: 4.50001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.49999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.166e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 1.012e-05 [set_forward_comm_id_for_comm_node_pass]: 3.71001e-06 [meta_fg_expand]: 2.53998e-06 [flash_sp_send_recv_attached]: 2.58e-06 [receive_attached]: 2.11e-06 [after_resolve]: 
9.12999e-06 [a_after_grad]: 8.53001e-06 [renormalize]: 0.00040339 [add_forward_monad_depend]: 5.05001e-06 [auto_monad_grad]: 1.82001e-06 [auto_monad_eliminator]: 1.365e-05 [cse]: 2.908e-05 [a_3]: 4.084e-05 [Cycle 2]: 0.00060143, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.61999e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00011516 [with_stream_mark]: 1.001e-05 [recompute_prepare]: 6.31e-06 [updatestate_depend_eliminate]: 2.95998e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 7.134e-05 [accelerated_algorithm]: 5.75001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.40999e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 4.54998e-06 [auto_parallel]: 5.38002e-06 [parallel]: 4.35999e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.82002e-06 [matmul_add_comm_reduction]: 5.29998e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.46e-06 [virtual_dataset]: 5.51e-06 [get_grad_eliminate_]: 5.19998e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.91e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.80001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 1.77999e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.18001e-06 [a_after_grad]: 7.84002e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.19001e-06 [cse]: 1.296e-05 [a_3]: 3.275e-05 [py_interpret_to_execute_after_opt_a]: 7.38999e-06 [slice_cell_reuse_recomputed_activation]: 2.46998e-06 [rewriter_after_opt_a]: 3.321e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.07999e-06 [mutable_eliminate]: 0.00046554 [opt_b]: 0.00018697, [1] [Cycle 1]: 0.00018092, [7] [b_1]: 0.00010983 
[b_2]: 7.03998e-06 [updatestate_depend_eliminate]: 5.50001e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.59999e-06 [renormalize]: 4.19997e-07 [cse]: 1.74e-05 [optimize_parallel_all_gather_comm]: 1.676e-05 [overlap_param_gather]: 1.91998e-06 [cconv]: 2.407e-05 [loop_unroll]: 0.00041795 [opt_after_cconv]: 9.48e-05, [1] [Cycle 1]: 8.871e-05, [7] [c_1]: 2.599e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.61e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 1.432e-05 [tuple_transform]: 6.875e-05, [1] [Cycle 1]: 6.442e-05, [4] [d_1]: 3.726e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.33002e-06 [partial_unused_args_eliminate]: 2.30002e-06 [add_recomputation]: 4.493e-05 [cse_after_recomputation]: 2.148e-05, [1] [Cycle 1]: 1.692e-05, [1] [cse]: 1.137e-05 [environ_conv]: 5.57001e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.77e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.29999e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.76e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.40002e-06 [reorder_send_recv_between_fp_bp]: 3.08998e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 1.42e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97001e-06 [control_data_broadcast_order]: 1.221e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 5.05001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 4.48001e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.98997e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 7.089e-05, [1] [Cycle 1]: 6.658e-05, [6] [build]: 2.38998e-06 [elim_shapecalc]: 8.77999e-06 [elim_not_effective]: 1.191e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.515e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.45998e-06 [opt_after_jit_grad]: 0.0004561 [validate]: 3.388e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00600988 [execute]: 7.31001e-06
Sums
bootstrap : 0.000470s : 2.89% type_inference : 0.005979s : 36.82% event_method : 0.000013s : 0.08% auto_monad : 0.000060s : 0.37% graph_reusing : 0.000006s : 0.04% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 0.000051s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000466s : 2.87% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000152s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000063s : 0.39% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000017s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000403s : 2.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000042s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000466s : 2.87% optimize.opt_b.b_1 : 0.000110s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000418s : 2.57% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000456s : 2.81% validate : 0.000034s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006010s : 37.01% execute : 0.000007s : 0.05%
Time group info:
------[substitution.] 0.000140 24 20.37% : 0.000029s : 4: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 3.94% : 0.000006s : 3: substitution.graph_param_transform 65.70% : 0.000092s : 3: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.27% : 0.000005s : 4: substitution.remove_not_recompute_node 2.12% : 0.000003s : 2: substitution.replace_old_param
------[type_inference.] 0.005865 2 92.22% : 0.005409s : 1: type_inference.infer 7.78% : 0.000456s : 1: type_inference.specialize
------[replace.] 0.000026 3 100.00% : 0.000026s : 3: replace.inline
------[match.] 0.000090 3 100.00% : 0.000090s : 3: match.inline
------[predicate.]
0.000148 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 0.92% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 8: predicate.addn_zero_filter 0.79% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.40% : 0.000004s : 14: predicate.arithmetic_simplify 0.88% : 0.000001s : 8: predicate.cast_eliminate 0.67% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.25% : 0.000000s : 3: predicate.const_output_eliminate 0.68% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 11: predicate.environ_get_depend_swap 1.81% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.90% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.80% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.76% : 0.000001s : 6: predicate.incorporate_call 0.65% : 0.000001s : 6: predicate.incorporate_call_switch 6.31% : 0.000009s : 37: predicate.inline 1.00% : 0.000001s : 6: predicate.inline_without_move 0.56% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 6: predicate.less_batch_normalization 1.67% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 22: predicate.load_eliminater 1.03% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.31% : 0.000002s : 3: predicate.mutable_eliminate 0.41% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.56% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 11: predicate.partial_eliminate 0.85% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.15% : 0.000002s : 8: predicate.reduce_eliminate 2.20% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 6: predicate.remove_not_recompute_node 1.21% : 0.000002s : 14: predicate.replace_applicator 0.66% : 0.000001s : 6: predicate.replace_old_param 0.30% : 0.000000s : 3: predicate.reset_defer_inline 0.88% : 0.000001s : 8: predicate.reshape_eliminate 0.67% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.93% : 0.000001s : 6: predicate.same_eliminate 0.47% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 6: predicate.shard_identity_eliminate 0.89% : 0.000001s : 6: predicate.special_op_eliminate 0.94% : 0.000001s : 6: predicate.specialize_transform 0.97% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.23% : 0.000002s : 11: predicate.switch_defer_inline 1.87% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.75% : 0.000007s : 38: predicate.switch_simplify 0.83% : 
0.000001s : 8: predicate.tile_eliminate 0.84% : 0.000001s : 8: predicate.transpose_eliminate 1.52% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.69% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.81% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 3: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000275 7 39.15% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.85% : 0.000167s : 5: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.028573 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.55% : 0.003015s : 1: add_attr 10.52% : 0.003006s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.02% : 0.000005s : 1: assign_add_opt 0.23% : 0.000065s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.78% : 0.000507s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.49% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.66% : 0.000475s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.90% : 0.000829s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000088s : 28: opt.transform.opt_b 0.15% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.23% : 0.002066s : 1: opt_a 0.34% : 0.000098s : 1: opt_after_cconv 1.63% : 0.000465s : 1: opt_after_jit_grad 0.67% : 0.000190s : 1: opt_b 13.79% : 0.003939s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.02% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.75% : 0.000214s : 1: renormalize.infer 0.64% : 0.000184s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000074s : 1: symbol_engine_optimizer 21.07% : 0.006020s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 20.97% : 0.005993s : 1: type_inference 0.21% : 0.000061s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x7-kbk],max_mem:14.0M
TotalTime = 0.0630808, [24] [bootstrap]: 0.00052729 [type_inference]: 0.00636128 [event_method]: 1.367e-05 [auto_monad]: 6.576e-05 [graph_reusing]: 5.26998e-06 [inline]: 1.92001e-06 [add_attr]: 0.00360425, [1] [add_attr_with_inline]: 0.00359453, [1] [Cycle 1]: 4.821e-05, [2] [tag_attr]: 1.516e-05 [meta_addattr_fg_expand]: 4.42998e-06 [parallel-infer-symbol]: 3.04001e-06 [pre_auto_parallel]: 2.509e-05 [insert-virtual-dataset]: 2.76999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.56e-06 [pipeline_split]: 1.76998e-06 [optimize]: 0.00412784, [53] [py_interpret_to_execute]: 2.226e-05 [rewriter_before_opt_a]: 6.365e-05 [opt_a]: 0.00220192, [2] [Cycle 1]: 0.00158337, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 3.392e-05 [loop_unroll]: 2.022e-05 [a_1]: 0.00044004 [with_stream_mark]: 1.381e-05 [recompute_prepare]: 7.73999e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.48999e-06 [updatestate_loads_eliminate]: 3.25002e-06 [parameter_eliminate]: 1.94999e-06 [a_2]: 7.928e-05 [accelerated_algorithm]: 6.44001e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.19001e-06 [merge_send_recv]: 8.79e-06 [auto_parallel]: 6.69001e-06 [parallel]: 2.533e-05 [flash_sp]: 7.38e-06 [merge_comm]: 4.01001e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 1.004e-05 [allreduce_slice_to_reducescatter]: 8.69972e-07 [virtual_shard_identity]: 7.58001e-06 [virtual_dataset]: 6.24001e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 1.004e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.153e-05 
[merge_recompute_call_nodes]: 1.92999e-06 [before_grad]: 1e-05 [set_forward_comm_id_for_comm_node_pass]: 4.13999e-06 [meta_fg_expand]: 2.64999e-06 [flash_sp_send_recv_attached]: 2.40002e-06 [receive_attached]: 1.99e-06 [after_resolve]: 9.45001e-06 [a_after_grad]: 8.44998e-06 [renormalize]: 0.00046299 [add_forward_monad_depend]: 8.89998e-06 [auto_monad_grad]: 2.09e-06 [auto_monad_eliminator]: 1.399e-05 [cse]: 3.057e-05 [a_3]: 4.121e-05 [Cycle 2]: 0.00060907, [45] [expand_dump_flag]: 1.37e-06 [switch_simplify]: 7.15e-06 [loop_unroll]: 5.82999e-06 [a_1]: 0.00011462 [with_stream_mark]: 9.47001e-06 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 3.00002e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 7.08e-05 [accelerated_algorithm]: 5.85002e-06 [shard]: 8.99978e-07 [meta_shard_fg_expand]: 1.14003e-06 [shard_inline]: 5.51998e-06 [merge_send_recv]: 4.67998e-06 [auto_parallel]: 5.76003e-06 [parallel]: 4.07e-06 [flash_sp]: 4.15e-06 [merge_comm]: 3.27002e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 5.02999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.93e-06 [virtual_dataset]: 5.98998e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.28002e-06 [merge_forward]: 2.77002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 6.54999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.111e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.89998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 1.85001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 7.92e-06 [a_after_grad]: 7.65e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.45e-05 [a_3]: 3.208e-05 [py_interpret_to_execute_after_opt_a]: 7.41999e-06 
[slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.515e-05 [convert_after_rewriter]: 6.98e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00045407 [opt_b]: 0.00018521, [1] [Cycle 1]: 0.00017877, [7] [b_1]: 0.00010825 [b_2]: 7.51999e-06 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 4.09986e-07 [cse]: 1.778e-05 [optimize_parallel_all_gather_comm]: 1.684e-05 [overlap_param_gather]: 2.22999e-06 [cconv]: 2.441e-05 [loop_unroll]: 0.00041654 [opt_after_cconv]: 9.454e-05, [1] [Cycle 1]: 8.882e-05, [7] [c_1]: 2.542e-05 [parameter_eliminate]: 2.18998e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.40002e-06 [cse]: 1.73e-05 [renormalize]: 5.59987e-07 [remove_dup_value]: 1.479e-05 [tuple_transform]: 6.899e-05, [1] [Cycle 1]: 6.458e-05, [4] [d_1]: 3.71e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.56e-06 [partial_unused_args_eliminate]: 2.06998e-06 [add_recomputation]: 4.929e-05 [cse_after_recomputation]: 2.228e-05, [1] [Cycle 1]: 1.766e-05, [1] [cse]: 1.189e-05 [environ_conv]: 8.07998e-06 [swap_dp_allreduce_reducescatter]: 5.03002e-06 [bias_add_comm_swap]: 3.04999e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.56e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.27e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 
[control_data_broadcast_order]: 1.277e-05 [grouped_pairwise_exchange_alltoall]: 1.56998e-06 [offloading_packed_experts]: 4.47e-06 [overlap_recompute_and_grad_model_parallel]: 4.89998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.28002e-06 [overlap_grad_ring_attention]: 4.39002e-06 [overlap_grad_flash_sp]: 1.8e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.28002e-06 [symbol_engine_optimizer]: 7.142e-05, [1] [Cycle 1]: 6.688e-05, [6] [build]: 2.79999e-06 [elim_shapecalc]: 9.02e-06 [elim_not_effective]: 1.197e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 1.637e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.0004572 [validate]: 3.269e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.047594 [execute]: 9.69e-06 Sums bootstrap : 0.000527s : 0.90% type_inference : 0.006361s : 10.89% event_method : 0.000014s : 0.02% auto_monad : 0.000066s : 0.11% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.04% optimize.rewriter_before_opt_a : 0.000064s : 0.11% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000555s : 0.95% optimize.opt_a.with_stream_mark : 
0.000023s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.26% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.05% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000017s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000463s : 0.79% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.02% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000045s : 0.08% optimize.opt_a.a_3 : 0.000073s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.78% optimize.opt_b.b_1 : 0.000108s : 0.19% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000417s : 0.71% optimize.opt_after_cconv.c_1 : 0.000025s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.08% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000008s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000457s : 0.78% validate : 0.000033s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.047594s : 81.47% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000166 26 18.85% : 0.000031s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.84% : 0.000001s : 2: substitution.fold_const_symbol 3.55% : 0.000006s : 3: substitution.graph_param_transform 63.38% : 0.000105s : 3: substitution.inline 1.92% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.96% : 0.000005s : 4: substitution.remove_not_recompute_node 1.95% : 0.000003s : 2: substitution.replace_old_param 5.32% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006311 2 90.41% : 0.005706s : 1: type_inference.infer 9.59% : 0.000605s : 1: type_inference.specialize ------[replace.] 0.000035 4 78.73% : 0.000028s : 3: replace.inline 21.27% : 0.000007s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000111 4 92.68% : 0.000103s : 3: match.inline 7.32% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 883 0.95% : 0.000001s : 9: predicate.accumulaten_eliminater 0.83% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.93% : 0.000001s : 9: predicate.addn_zero_filter 0.85% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 15: predicate.arithmetic_simplify 0.91% : 0.000001s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.65% : 0.000001s : 6: predicate.depend_value_elim 0.90% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.00% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.27% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_depend_swap 1.79% : 0.000003s : 18: predicate.environ_get_eliminate 1.19% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.32% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.92% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.33% : 0.000010s : 40: predicate.inline 0.94% : 0.000001s : 6: predicate.inline_without_move 0.39% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 6: predicate.less_batch_normalization 1.64% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 25: predicate.load_eliminater 0.94% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.22% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.68% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 1.02% : 0.000002s : 3: predicate.mutable_eliminate 0.34% : 0.000001s : 3: predicate.opt_reshape 0.36% : 0.000001s : 3: predicate.parallel_virtual_node 1.58% : 0.000002s : 13: predicate.partial_defer_inline 1.47% : 0.000002s : 13: predicate.partial_eliminate 0.91% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.17% : 0.000002s : 9: predicate.reduce_eliminate 2.45% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 6: predicate.remove_not_recompute_node 1.32% : 0.000002s : 16: predicate.replace_applicator 0.58% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000001s : 9: predicate.reshape_eliminate 0.57% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.61% : 0.000001s : 3: predicate.row_tensor_eliminate 0.80% : 0.000001s : 6: predicate.same_eliminate 0.50% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 6: predicate.shard_identity_eliminate 0.71% : 0.000001s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.70% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 13: predicate.switch_defer_inline 2.04% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 43: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.82% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.38% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000360 8 47.10% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.90% : 0.000191s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072340 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.99% : 0.003609s : 1: add_attr 4.97% : 0.003598s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000071s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.78% : 0.000567s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.59% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.64% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.28% : 0.000924s : 78: opt.transform.opt_a 0.03% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000088s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.05% : 0.002205s : 1: opt_a 0.14% : 0.000098s : 1: opt_after_cconv 0.64% : 0.000466s : 1: opt_after_jit_grad 0.26% : 0.000188s : 1: opt_b 5.71% : 0.004131s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.04% : 0.000026s : 1: py_interpret_to_execute 0.06% : 0.000046s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000018s : 1: remove_dup_value 0.33% : 0.000238s : 1: renormalize.infer 0.30% : 0.000218s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000039s : 1: rewriter_after_opt_a 0.09% : 0.000068s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000074s : 1: symbol_engine_optimizer 65.82% : 0.047617s : 1: task_emit 0.10% : 0.000072s : 1: 
tuple_transform 8.81% : 0.006376s : 1: type_inference 0.08% : 0.000057s : 1: validate TotalTime = 0.0610332, [24] [bootstrap]: 0.00052242 [type_inference]: 0.00620347 [event_method]: 1.309e-05 [auto_monad]: 6.277e-05 [graph_reusing]: 5.89e-06 [inline]: 1.89e-06 [add_attr]: 0.00309685, [1] [add_attr_with_inline]: 0.00308871, [1] [Cycle 1]: 5.935e-05, [2] [tag_attr]: 1.5e-05 [meta_addattr_fg_expand]: 4.14002e-06 [parallel-infer-symbol]: 3.51001e-06 [pre_auto_parallel]: 2.668e-05 [insert-virtual-dataset]: 3.23e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00405664, [53] [py_interpret_to_execute]: 2.159e-05 [rewriter_before_opt_a]: 5.398e-05 [opt_a]: 0.00213318, [2] [Cycle 1]: 0.00152018, [45] [expand_dump_flag]: 2.96999e-06 [switch_simplify]: 2.813e-05 [loop_unroll]: 1.691e-05 [a_1]: 0.00037621 [with_stream_mark]: 1.701e-05 [recompute_prepare]: 7.73001e-06 [updatestate_depend_eliminate]: 4.05998e-06 [updatestate_assign_eliminate]: 3.68e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 8.183e-05 [accelerated_algorithm]: 6.72002e-06 [shard]: 2.20002e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 8.57e-06 [auto_parallel]: 6.59999e-06 [parallel]: 1.82e-05 [flash_sp]: 7.97e-06 [merge_comm]: 4.03999e-06 [allreduce_fusion]: 3.86001e-06 [matmul_add_comm_reduction]: 9.89001e-06 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 7.73001e-06 [virtual_dataset]: 6.68998e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.84999e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 1.023e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.173e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 1.034e-05 [set_forward_comm_id_for_comm_node_pass]: 3.63999e-06 [meta_fg_expand]: 2.61999e-06 [flash_sp_send_recv_attached]: 2.45002e-06 [receive_attached]: 
2.34001e-06 [after_resolve]: 9.74999e-06 [a_after_grad]: 8.63001e-06 [renormalize]: 0.00047447 [add_forward_monad_depend]: 4.95999e-06 [auto_monad_grad]: 2.26e-06 [auto_monad_eliminator]: 1.392e-05 [cse]: 3.046e-05 [a_3]: 4.211e-05 [Cycle 2]: 0.00060288, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.72999e-06 [a_1]: 0.000115 [with_stream_mark]: 1.212e-05 [recompute_prepare]: 5.93998e-06 [updatestate_depend_eliminate]: 3.06999e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 8.40024e-07 [a_2]: 7.099e-05 [accelerated_algorithm]: 5.63997e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 4.58001e-06 [auto_parallel]: 5.44998e-06 [parallel]: 4.40999e-06 [flash_sp]: 3.61001e-06 [merge_comm]: 3.12002e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 5.79999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 5.87999e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.14e-06 [merge_forward]: 2.86999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 6.25002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.051e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.55999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.04003e-06 [after_resolve]: 8.16002e-06 [a_after_grad]: 8.11002e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.30001e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.452e-05 [a_3]: 3.188e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 3.344e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.0004852 [opt_b]: 0.00018898, [1] 
[Cycle 1]: 0.00018214, [7] [b_1]: 0.00011053 [b_2]: 7.97e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 2.60014e-07 [cse]: 1.802e-05 [optimize_parallel_all_gather_comm]: 1.551e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.286e-05 [loop_unroll]: 0.00043226 [opt_after_cconv]: 9.784e-05, [1] [Cycle 1]: 9.214e-05, [7] [c_1]: 2.591e-05 [parameter_eliminate]: 2.58003e-06 [updatestate_depend_eliminate]: 4.99003e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.805e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 1.598e-05 [tuple_transform]: 6.82e-05, [1] [Cycle 1]: 6.363e-05, [4] [d_1]: 3.705e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.58e-06 [partial_unused_args_eliminate]: 1.61998e-06 [add_recomputation]: 4.578e-05 [cse_after_recomputation]: 2.103e-05, [1] [Cycle 1]: 1.652e-05, [1] [cse]: 1.123e-05 [environ_conv]: 5.28002e-06 [swap_dp_allreduce_reducescatter]: 5.36998e-06 [bias_add_comm_swap]: 2.92002e-06 [label_micro_interleaved_index]: 4.36002e-06 [label_fine_grained_interleaved_index]: 2.80002e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 1.10001e-06 [remove_cast_before_assign_add]: 1.20999e-06 [full_micro_interleaved_order_control]: 2.51e-06 [reorder_send_recv_between_fp_bp]: 3.03e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.39e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.34998e-06 [overlap_opt_shard_grad_in_pipeline]: 2.19999e-06 [control_data_broadcast_order]: 1.241e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.95e-06 [overlap_recompute_and_grad_model_parallel]: 5.51998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.55999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.52001e-06 [overlap_grad_ring_attention]: 4.42e-06 [overlap_grad_flash_sp]: 1.909e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 7.219e-05, [1] [Cycle 1]: 6.783e-05, [6] [build]: 2.78998e-06 [elim_shapecalc]: 9.24e-06 [elim_not_effective]: 1.194e-05 [opt_reshape]: 6.61e-06 [fold_const_symbol]: 9.31e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 1.647e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.82002e-06 [opt_after_jit_grad]: 0.00048851 [validate]: 3.729e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0462398 [execute]: 9.76e-06 Sums bootstrap : 0.000522s : 0.92% type_inference : 0.006203s : 10.90% event_method : 0.000013s : 0.02% auto_monad : 0.000063s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000027s : 0.05% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.04% optimize.rewriter_before_opt_a : 0.000054s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000035s : 0.06% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000491s : 0.86% optimize.opt_a.with_stream_mark : 0.000029s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000153s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000475s : 0.83% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000045s : 0.08% optimize.opt_a.a_3 : 0.000074s : 0.13% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000485s : 0.85% optimize.opt_b.b_1 : 0.000111s : 0.19% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000432s : 0.76% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000489s : 0.86% validate : 0.000037s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.046240s : 81.26% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000149 24 20.67% : 0.000031s : 4: substitution.arithmetic_simplify 1.26% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000001s : 2: substitution.fold_const_symbol 3.65% : 0.000005s : 3: substitution.graph_param_transform 66.26% : 0.000099s : 3: substitution.inline 2.13% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.04% : 0.000005s : 4: substitution.remove_not_recompute_node 2.07% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.006157 2 91.88% : 0.005657s : 1: type_inference.infer 8.12% : 0.000500s : 1: type_inference.specialize ------[replace.] 0.000042 3 100.00% : 0.000042s : 3: replace.inline ------[match.] 0.000097 3 100.00% : 0.000097s : 3: match.inline ------[predicate.] 
0.000148 815 0.91% : 0.000001s : 8: predicate.accumulaten_eliminater 1.04% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.89% : 0.000001s : 8: predicate.addn_zero_filter 0.80% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.65% : 0.000004s : 14: predicate.arithmetic_simplify 0.84% : 0.000001s : 8: predicate.cast_eliminate 0.70% : 0.000001s : 6: predicate.check_bprop_eliminate 0.64% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.64% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_depend_swap 1.77% : 0.000003s : 17: predicate.environ_get_eliminate 1.16% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.07% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 6: predicate.float_environ_get_switch 0.88% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.30% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.64% : 0.000001s : 6: predicate.incorporate_call_switch 6.25% : 0.000009s : 37: predicate.inline 0.95% : 0.000001s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 6: predicate.less_batch_normalization 1.48% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.19% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 8: predicate.minmaximum_grad 1.23% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.47% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 11: predicate.partial_eliminate 0.85% : 0.000001s : 8: predicate.print_const_string_wrapper 0.73% : 0.000001s : 6: predicate.reduce_all_const_elim 1.16% : 0.000002s : 8: predicate.reduce_eliminate 2.17% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 6: predicate.remove_not_recompute_node 1.22% : 0.000002s : 14: predicate.replace_applicator 0.87% : 0.000001s : 6: predicate.replace_old_param 0.24% : 0.000000s : 3: predicate.reset_defer_inline 0.95% : 0.000001s : 8: predicate.reshape_eliminate 0.64% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 3: predicate.row_tensor_eliminate 0.96% : 0.000001s : 6: predicate.same_eliminate 0.54% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 6: predicate.shard_identity_eliminate 0.81% : 0.000001s : 6: predicate.special_op_eliminate 0.87% : 0.000001s : 6: predicate.specialize_transform 1.04% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.27% : 0.000002s : 11: predicate.switch_defer_inline 1.89% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.85% : 0.000007s : 38: predicate.switch_simplify 0.92% : 
0.000001s : 8: predicate.tile_eliminate 0.85% : 0.000001s : 8: predicate.transpose_eliminate 1.63% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.66% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000328 7 37.98% : 0.000125s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.02% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.069661 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.45% : 0.003101s : 1: add_attr 4.44% : 0.003093s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000068s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.80% : 0.000561s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.63% : 0.000441s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.71% : 0.000494s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.23% : 0.000854s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.07% : 0.002136s : 1: opt_a 0.15% : 0.000101s : 1: opt_after_cconv 0.72% : 0.000498s : 1: opt_after_jit_grad 0.28% : 0.000192s : 1: opt_b 5.83% : 0.004061s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000031s : 1: pre_auto_parallel 0.04% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000020s : 1: remove_dup_value 0.37% : 0.000258s : 1: renormalize.infer 0.30% : 0.000209s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.08% : 0.000058s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000075s : 1: symbol_engine_optimizer 66.41% : 0.046264s : 1: task_emit 0.10% : 0.000071s : 1: 
tuple_transform 8.93% : 0.006222s : 1: type_inference 0.09% : 0.000064s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x7-ge],max_mem:14.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x8-pynative],max_mem:14.0M TotalTime = 0.0218495, [24] [bootstrap]: 0.00056019 [type_inference]: 0.00644681 [event_method]: 1.399e-05 [auto_monad]: 6.055e-05 [graph_reusing]: 5.34998e-06 [inline]: 1.76e-06 [add_attr]: 0.00356057, [1] [add_attr_with_inline]: 0.0035503, [1] [Cycle 1]: 4.862e-05, [2] [tag_attr]: 1.449e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.639e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.67001e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00413249, [53] [py_interpret_to_execute]: 2.278e-05 [rewriter_before_opt_a]: 6.359e-05 [opt_a]: 0.00219731, [2] [Cycle 1]: 0.00158274, [45] [expand_dump_flag]: 3.23e-06 [switch_simplify]: 3.279e-05 [loop_unroll]: 2.042e-05 [a_1]: 0.00044503 [with_stream_mark]: 1.489e-05 [recompute_prepare]: 8.16002e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.38e-06 [parameter_eliminate]: 2.12999e-06 [a_2]: 7.878e-05 [accelerated_algorithm]: 7.15e-06 [shard]: 2.20002e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 8.95999e-06 [auto_parallel]: 6.51e-06 [parallel]: 2.522e-05 [flash_sp]: 8.08001e-06 [merge_comm]: 4.02e-06 [allreduce_fusion]: 3.8e-06 [matmul_add_comm_reduction]: 1.039e-05 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.65e-06 [virtual_dataset]: 6.09999e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 
5.66e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 1.66e-06 [offload_activation]: 9.52999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.151e-05 [merge_recompute_call_nodes]: 1.71002e-06 [before_grad]: 1.015e-05 [set_forward_comm_id_for_comm_node_pass]: 3.68999e-06 [meta_fg_expand]: 2.80002e-06 [flash_sp_send_recv_attached]: 2.57001e-06 [receive_attached]: 2.33998e-06 [after_resolve]: 9.75002e-06 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00045408 [add_forward_monad_depend]: 9.89999e-06 [auto_monad_grad]: 2.24999e-06 [auto_monad_eliminator]: 1.313e-05 [cse]: 2.998e-05 [a_3]: 4.165e-05 [Cycle 2]: 0.00060532, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 7.26001e-06 [loop_unroll]: 5.82999e-06 [a_1]: 0.00011506 [with_stream_mark]: 1.063e-05 [recompute_prepare]: 5.91998e-06 [updatestate_depend_eliminate]: 3.12002e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.90002e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 7.099e-05 [accelerated_algorithm]: 5.82001e-06 [shard]: 1.19003e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 4.57e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.34997e-06 [flash_sp]: 3.76001e-06 [merge_comm]: 3.31001e-06 [allreduce_fusion]: 3.51999e-06 [matmul_add_comm_reduction]: 5.03002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.28e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.87999e-06 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 8.65001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.09002e-06 [a_after_grad]: 8.27e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09003e-06 
[auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 6.56e-06 [cse]: 1.461e-05 [a_3]: 3.266e-05 [py_interpret_to_execute_after_opt_a]: 8.32998e-06 [slice_cell_reuse_recomputed_activation]: 2.35002e-06 [rewriter_after_opt_a]: 3.359e-05 [convert_after_rewriter]: 7.06999e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00046275 [opt_b]: 0.00018915, [1] [Cycle 1]: 0.00018318, [7] [b_1]: 0.0001118 [b_2]: 7.59002e-06 [updatestate_depend_eliminate]: 5.53002e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 4.2998e-07 [cse]: 1.796e-05 [optimize_parallel_all_gather_comm]: 1.649e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.285e-05 [loop_unroll]: 0.0004204 [opt_after_cconv]: 0.00011937, [1] [Cycle 1]: 0.00011385, [7] [c_1]: 2.596e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.75002e-06 [updatestate_loads_eliminate]: 2.38998e-06 [cse]: 3.902e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.626e-05 [tuple_transform]: 7.02e-05, [1] [Cycle 1]: 6.533e-05, [4] [d_1]: 3.789e-05 [none_parameter_eliminate]: 1.89999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.97999e-06 [add_recomputation]: 5.099e-05 [cse_after_recomputation]: 2.204e-05, [1] [Cycle 1]: 1.695e-05, [1] [cse]: 1.153e-05 [environ_conv]: 7.82e-06 [swap_dp_allreduce_reducescatter]: 5.19998e-06 [bias_add_comm_swap]: 2.68998e-06 [label_micro_interleaved_index]: 4.82e-06 [label_fine_grained_interleaved_index]: 2.87002e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.29003e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.87002e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.04e-06 
[interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.45001e-06 [overlap_opt_shard_in_pipeline]: 1.40001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97999e-06 [control_data_broadcast_order]: 1.3e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.8e-06 [overlap_recompute_and_grad_model_parallel]: 4.73001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.91999e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.787e-05 [begin_end_overlap_inline]: 7.2e-07 [split_matmul_comm_elemetwise]: 2.56998e-06 [split_layernorm_comm]: 1.83002e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 7.358e-05, [1] [Cycle 1]: 6.91e-05, [6] [build]: 3.03e-06 [elim_shapecalc]: 8.85999e-06 [elim_not_effective]: 1.261e-05 [opt_reshape]: 6.42001e-06 [fold_const_symbol]: 9.51e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.83002e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.677e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.79002e-06 [opt_after_jit_grad]: 0.00046087 [validate]: 3.583e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00629306 [execute]: 7.62002e-06 Sums bootstrap : 0.000560s : 3.24% type_inference : 0.006447s : 37.31% event_method : 0.000014s : 0.08% auto_monad : 0.000061s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.13% optimize.rewriter_before_opt_a : 0.000064s : 0.37% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000560s : 3.24% optimize.opt_a.with_stream_mark : 0.000026s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000150s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000018s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000454s : 2.63% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.06% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000045s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000463s : 2.68% optimize.opt_b.b_1 : 0.000112s : 0.65% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000420s : 2.43% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000039s : 0.23% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000008s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000461s : 2.67% validate : 0.000036s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006293s : 36.42% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000169 26 19.05% : 0.000032s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.25% : 0.000006s : 3: substitution.graph_param_transform 63.96% : 0.000108s : 3: substitution.inline 2.04% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.82% : 0.000005s : 4: substitution.remove_not_recompute_node 1.88% : 0.000003s : 2: substitution.replace_old_param 5.00% : 0.000008s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006397 2 89.33% : 0.005714s : 1: type_inference.infer 10.67% : 0.000683s : 1: type_inference.specialize ------[replace.] 0.000037 4 79.11% : 0.000030s : 3: replace.inline 20.89% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 4 93.25% : 0.000106s : 3: match.inline 6.75% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 1.09% : 0.000002s : 9: predicate.accumulaten_eliminater 0.83% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.83% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.12% : 0.000003s : 15: predicate.arithmetic_simplify 1.02% : 0.000002s : 9: predicate.cast_eliminate 0.65% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.91% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.96% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.18% : 0.000002s : 12: predicate.environ_get_depend_swap 1.86% : 0.000003s : 18: predicate.environ_get_eliminate 1.13% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.34% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.54% : 0.000004s : 13: predicate.float_depend_g_call 0.55% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 3: predicate.fold_const_symbol 0.72% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.65% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.33% : 0.000010s : 40: predicate.inline 0.94% : 0.000002s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 6: predicate.less_batch_normalization 1.65% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 25: predicate.load_eliminater 0.91% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 9: predicate.minmaximum_grad 1.03% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.39% : 0.000001s : 3: predicate.parallel_virtual_node 1.58% : 0.000003s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.97% : 0.000002s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 6: predicate.reduce_all_const_elim 1.33% : 0.000002s : 9: predicate.reduce_eliminate 2.41% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.33% : 0.000002s : 16: predicate.replace_applicator 0.59% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 9: predicate.reshape_eliminate 0.58% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.78% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 6: predicate.shard_identity_eliminate 0.71% : 0.000001s : 6: predicate.special_op_eliminate 0.83% : 0.000001s : 6: predicate.specialize_transform 0.94% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 13: predicate.switch_defer_inline 2.04% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000387 8 45.95% : 0.000178s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.05% : 0.000209s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031075 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.47% : 0.003565s : 1: add_attr 11.44% : 0.003554s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000066s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.93% : 0.000600s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000011s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.38% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000471s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.99% : 0.000930s : 78: opt.transform.opt_a 0.08% : 0.000024s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.08% : 0.002200s : 1: opt_a 0.40% : 0.000123s : 1: opt_after_cconv 1.52% : 0.000471s : 1: opt_after_jit_grad 0.62% : 0.000193s : 1: opt_b 13.31% : 0.004136s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.09% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000020s : 1: remove_dup_value 0.76% : 0.000236s : 1: renormalize.infer 0.68% : 0.000212s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000037s : 1: rewriter_after_opt_a 0.22% : 0.000068s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000076s : 1: symbol_engine_optimizer 20.28% : 0.006303s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.80% : 0.006462s : 1: type_inference 0.21% : 0.000064s : 1: validate TotalTime = 0.0203384, [24] [bootstrap]: 0.00049328 [type_inference]: 0.00593688 [event_method]: 1.202e-05 [auto_monad]: 6.08e-05 [graph_reusing]: 5.89e-06 [inline]: 2.01e-06 [add_attr]: 0.00304424, [1] [add_attr_with_inline]: 0.00303656, [1] [Cycle 1]: 4.552e-05, [2] [tag_attr]: 1.346e-05 [meta_addattr_fg_expand]: 4.13001e-06 [parallel-infer-symbol]: 3.46999e-06 [pre_auto_parallel]: 2.367e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 9.10019e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00393705, [53] [py_interpret_to_execute]: 1.999e-05 [rewriter_before_opt_a]: 5.112e-05 [opt_a]: 0.00204619, [2] [Cycle 1]: 0.00143241, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 2.814e-05 [loop_unroll]: 1.73e-05 [a_1]: 0.0003592 [with_stream_mark]: 1.479e-05 [recompute_prepare]: 7.83001e-06 [updatestate_depend_eliminate]: 3.73999e-06 [updatestate_assign_eliminate]: 3.27002e-06 [updatestate_loads_eliminate]: 3.35e-06 [parameter_eliminate]: 2.39999e-06 [a_2]: 8.077e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 2.02001e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 8.22e-06 [auto_parallel]: 6.66e-06 [parallel]: 1.949e-05 [flash_sp]: 7.46999e-06 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.43e-06 [matmul_add_comm_reduction]: 9.72999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 8e-06 [virtual_dataset]: 5.88998e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.69999e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.173e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 9.86e-06 [set_forward_comm_id_for_comm_node_pass]: 3.59002e-06 [meta_fg_expand]: 2.72001e-06 [flash_sp_send_recv_attached]: 2.61999e-06 [receive_attached]: 2.11e-06 
[after_resolve]: 9.96e-06 [a_after_grad]: 8.89998e-06 [renormalize]: 0.00039271 [add_forward_monad_depend]: 4.68999e-06 [auto_monad_grad]: 2.01e-06 [auto_monad_eliminator]: 1.253e-05 [cse]: 5.157e-05 [a_3]: 4.18e-05 [Cycle 2]: 0.00060421, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.98998e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00011442 [with_stream_mark]: 1.016e-05 [recompute_prepare]: 5.91e-06 [updatestate_depend_eliminate]: 3.11001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 7.109e-05 [accelerated_algorithm]: 5.87999e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 4.77e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.63999e-06 [flash_sp]: 3.61001e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 2.84999e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.33e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.97001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 8.89e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61999e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 9.09989e-07 [after_resolve]: 8.67e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.44001e-06 [cse]: 1.368e-05 [a_3]: 3.316e-05 [py_interpret_to_execute_after_opt_a]: 8.15999e-06 [slice_cell_reuse_recomputed_activation]: 2.94001e-06 [rewriter_after_opt_a]: 3.492e-05 [convert_after_rewriter]: 6.63e-06 [order_py_execute_after_rewriter]: 5.19998e-06 [mutable_eliminate]: 0.00046669 [opt_b]: 0.00018739, [1] [Cycle 1]: 0.00018065, [7] 
[b_1]: 0.0001098 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 3.19997e-07 [cse]: 1.757e-05 [optimize_parallel_all_gather_comm]: 1.632e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.215e-05 [loop_unroll]: 0.00042479 [opt_after_cconv]: 9.744e-05, [1] [Cycle 1]: 9.166e-05, [7] [c_1]: 2.633e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.744e-05 [renormalize]: 6.30011e-07 [remove_dup_value]: 1.522e-05 [tuple_transform]: 7.026e-05, [1] [Cycle 1]: 6.536e-05, [4] [d_1]: 3.801e-05 [none_parameter_eliminate]: 1.61998e-06 [renormalize]: 4.30009e-07 [switch_simplify]: 6.37001e-06 [partial_unused_args_eliminate]: 2.07999e-06 [add_recomputation]: 4.536e-05 [cse_after_recomputation]: 2.209e-05, [1] [Cycle 1]: 1.769e-05, [1] [cse]: 1.222e-05 [environ_conv]: 5.39998e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.84001e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.73998e-06 [merge_cast_opt]: 1.39998e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.08002e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.26002e-06 [full_micro_interleaved_order_control]: 2.59001e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 1.27999e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.279e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.92002e-06 [overlap_recompute_and_grad_model_parallel]: 5.00001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.751e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 1.99e-06 [handle_group_info]: 1.04998e-06 [symbol_engine_optimizer]: 7.251e-05, [1] [Cycle 1]: 6.808e-05, [6] [build]: 2.59999e-06 [elim_shapecalc]: 8.70999e-06 [elim_not_effective]: 1.216e-05 [opt_reshape]: 6.51999e-06 [fold_const_symbol]: 9.36e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.98002e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.646e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00046284 [validate]: 3.428e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.00607329 [execute]: 8.38001e-06 Sums bootstrap : 0.000493s : 3.03% type_inference : 0.005937s : 36.45% event_method : 0.000012s : 0.07% auto_monad : 0.000061s : 0.37% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000051s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000035s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000474s : 2.91% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000152s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000393s : 2.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000065s : 0.40% optimize.opt_a.a_3 : 0.000075s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000035s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000467s : 2.87% optimize.opt_b.b_1 : 0.000110s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000425s : 2.61% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000463s : 2.84% validate : 0.000034s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006073s : 37.29% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000144 24 20.56% : 0.000030s : 4: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000001s : 2: substitution.fold_const_symbol 3.95% : 0.000006s : 3: substitution.graph_param_transform 65.89% : 0.000095s : 3: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.15% : 0.000005s : 4: substitution.remove_not_recompute_node 1.95% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005894 2 92.17% : 0.005432s : 1: type_inference.infer 7.83% : 0.000462s : 1: type_inference.specialize ------[replace.] 0.000029 3 100.00% : 0.000029s : 3: replace.inline ------[match.] 0.000093 3 100.00% : 0.000093s : 3: match.inline ------[predicate.] 
0.000146 815 0.92% : 0.000001s : 8: predicate.accumulaten_eliminater 0.90% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.85% : 0.000001s : 8: predicate.addn_zero_filter 0.78% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.82% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 3: predicate.elim_not_effective 0.45% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_depend_swap 1.80% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 6: predicate.float_environ_get_switch 0.92% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 3: predicate.fold_const_symbol 0.77% : 0.000001s : 6: predicate.get_grad_eliminate 0.31% : 0.000000s : 3: predicate.graph_param_transform 0.72% : 0.000001s : 6: predicate.incorporate_call 0.63% : 0.000001s : 6: predicate.incorporate_call_switch 6.32% : 0.000009s : 37: predicate.inline 1.04% : 0.000002s : 6: predicate.inline_without_move 0.45% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 6: predicate.less_batch_normalization 1.72% : 0.000003s : 14: 
predicate.list_to_tuple_eliminator_ 2.34% : 0.000003s : 22: predicate.load_eliminater 0.98% : 0.000001s : 3: predicate.loop_unroll_after_grad 2.08% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.67% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.87% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.05% : 0.000002s : 3: predicate.mutable_eliminate 0.40% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.43% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 11: predicate.partial_eliminate 0.87% : 0.000001s : 8: predicate.print_const_string_wrapper 0.78% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 8: predicate.reduce_eliminate 2.20% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 6: predicate.remove_not_recompute_node 1.23% : 0.000002s : 14: predicate.replace_applicator 0.68% : 0.000001s : 6: predicate.replace_old_param 0.25% : 0.000000s : 3: predicate.reset_defer_inline 1.05% : 0.000002s : 8: predicate.reshape_eliminate 0.75% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 3: predicate.row_tensor_eliminate 0.87% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 6: predicate.shard_identity_eliminate 0.77% : 0.000001s : 6: predicate.special_op_eliminate 0.88% : 0.000001s : 6: predicate.specialize_transform 1.07% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 11: predicate.switch_defer_inline 1.91% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.85% : 0.000007s : 38: predicate.switch_simplify 0.85% : 
0.000001s : 8: predicate.tile_eliminate 0.81% : 0.000001s : 8: predicate.transpose_eliminate 1.53% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.75% : 0.000003s : 14: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 6: predicate.virtual_output_eliminate 0.33% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000277 7 38.45% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.55% : 0.000171s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028702 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.62% : 0.003049s : 1: add_attr 10.59% : 0.003040s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000066s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.85% : 0.000532s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.51% : 0.000434s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.66% : 0.000476s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.91% : 0.000836s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000088s : 28: opt.transform.opt_b 0.15% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.14% : 0.002049s : 1: opt_a 0.35% : 0.000101s : 1: opt_after_cconv 1.65% : 0.000473s : 1: opt_after_jit_grad 0.66% : 0.000191s : 1: opt_b 13.73% : 0.003941s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000019s : 1: remove_dup_value 0.72% : 0.000207s : 1: renormalize.infer 0.62% : 0.000178s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000039s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 21.20% : 0.006085s : 1: task_emit 0.26% : 0.000073s : 1: 
tuple_transform 20.74% : 0.005952s : 1: type_inference 0.22% : 0.000062s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x8-kbk],max_mem:14.0M TotalTime = 0.07559, [24] [bootstrap]: 0.00058841 [type_inference]: 0.00657865 [event_method]: 1.384e-05 [auto_monad]: 5.954e-05 [graph_reusing]: 5.20001e-06 [inline]: 1.83002e-06 [add_attr]: 0.00362289, [1] [add_attr_with_inline]: 0.00361222, [1] [Cycle 1]: 4.816e-05, [2] [tag_attr]: 1.443e-05 [meta_addattr_fg_expand]: 4.30999e-06 [parallel-infer-symbol]: 3.05998e-06 [pre_auto_parallel]: 2.582e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 8.60018e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00419103, [53] [py_interpret_to_execute]: 2.079e-05 [rewriter_before_opt_a]: 6.431e-05 [opt_a]: 0.00222759, [2] [Cycle 1]: 0.00161081, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 3.379e-05 [loop_unroll]: 2.103e-05 [a_1]: 0.00044364 [with_stream_mark]: 1.403e-05 [recompute_prepare]: 8.03999e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 2.06998e-06 [a_2]: 8.274e-05 [accelerated_algorithm]: 8.28999e-06 [shard]: 2.52001e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 6.36998e-06 [merge_send_recv]: 8.91002e-06 [auto_parallel]: 6.50002e-06 [parallel]: 2.581e-05 [flash_sp]: 7.61001e-06 [merge_comm]: 3.77998e-06 [allreduce_fusion]: 3.95e-06 [matmul_add_comm_reduction]: 9.02e-06 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 8.26002e-06 [virtual_dataset]: 6.39001e-06 [get_grad_eliminate_]: 5.71998e-06 [virtual_output]: 5.62999e-06 [merge_forward]: 4.37998e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 9.94001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.165e-05 
[merge_recompute_call_nodes]: 1.82999e-06 [before_grad]: 1.011e-05 [set_forward_comm_id_for_comm_node_pass]: 3.92998e-06 [meta_fg_expand]: 2.82002e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 2.13002e-06 [after_resolve]: 1.004e-05 [a_after_grad]: 8.50001e-06 [renormalize]: 0.00047569 [add_forward_monad_depend]: 9.24e-06 [auto_monad_grad]: 2.06e-06 [auto_monad_eliminator]: 1.339e-05 [cse]: 2.929e-05 [a_3]: 4.265e-05 [Cycle 2]: 0.00060652, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 7.03e-06 [loop_unroll]: 5.71e-06 [a_1]: 0.00011212 [with_stream_mark]: 9.93002e-06 [recompute_prepare]: 6.03998e-06 [updatestate_depend_eliminate]: 3.04001e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 7.169e-05 [accelerated_algorithm]: 5.90002e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.54998e-06 [parallel]: 4.13999e-06 [flash_sp]: 4.02e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.06001e-06 [matmul_add_comm_reduction]: 5.61003e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.48998e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.29998e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.71999e-06 [cell_reuse_recompute_pass]: 1.21997e-06 [offload_activation]: 6.14001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.047e-05 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 9.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.39e-06 [after_resolve]: 8.85999e-06 [a_after_grad]: 7.89002e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.90025e-07 [auto_monad_eliminator]: 6.48998e-06 [cse]: 1.412e-05 [a_3]: 3.308e-05 [py_interpret_to_execute_after_opt_a]: 7.53999e-06 
[slice_cell_reuse_recomputed_activation]: 2.21e-06 [rewriter_after_opt_a]: 3.403e-05 [convert_after_rewriter]: 7.39002e-06 [order_py_execute_after_rewriter]: 4.99003e-06 [mutable_eliminate]: 0.00046735 [opt_b]: 0.00018677, [1] [Cycle 1]: 0.00018089, [7] [b_1]: 0.00010936 [b_2]: 7.14001e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 4.69998e-07 [cse]: 1.748e-05 [optimize_parallel_all_gather_comm]: 1.665e-05 [overlap_param_gather]: 2.02999e-06 [cconv]: 2.224e-05 [loop_unroll]: 0.00046904 [opt_after_cconv]: 9.853e-05, [1] [Cycle 1]: 9.292e-05, [7] [c_1]: 2.665e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.61999e-06 [cse]: 1.773e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.543e-05 [tuple_transform]: 7.044e-05, [1] [Cycle 1]: 6.55e-05, [4] [d_1]: 3.818e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 2.70025e-07 [switch_simplify]: 6.41e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.03e-05 [cse_after_recomputation]: 2.221e-05, [1] [Cycle 1]: 1.749e-05, [1] [cse]: 1.153e-05 [environ_conv]: 8.40999e-06 [swap_dp_allreduce_reducescatter]: 5.23002e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 4.33999e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 3.04001e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.35001e-06 [overlap_opt_shard_in_pipeline]: 1.79e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 
[control_data_broadcast_order]: 1.262e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 4.4e-06 [overlap_recompute_and_grad_model_parallel]: 5.17e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.752e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.37999e-06 [symbol_engine_optimizer]: 7.201e-05, [1] [Cycle 1]: 6.74e-05, [6] [build]: 2.60002e-06 [elim_shapecalc]: 8.97999e-06 [elim_not_effective]: 1.213e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 9.48002e-06 [renormalize]: 2.59985e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.632e-05 [get_jit_bprop_graph]: 1.17e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00045737 [validate]: 3.493e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.0597397 [execute]: 1.019e-05 Sums bootstrap : 0.000588s : 0.83% type_inference : 0.006579s : 9.27% event_method : 0.000014s : 0.02% auto_monad : 0.000060s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000064s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.06% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000556s : 0.78% optimize.opt_a.with_stream_mark : 
0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000476s : 0.67% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000043s : 0.06% optimize.opt_a.a_3 : 0.000076s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000467s : 0.66% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000469s : 0.66% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000008s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000457s : 0.64% validate : 0.000035s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059740s : 84.22% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000170 26 18.83% : 0.000032s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.84% : 0.000001s : 2: substitution.fold_const_symbol 3.33% : 0.000006s : 3: substitution.graph_param_transform 63.92% : 0.000109s : 3: substitution.inline 1.85% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.76% : 0.000005s : 4: substitution.remove_not_recompute_node 2.01% : 0.000003s : 2: substitution.replace_old_param 5.32% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006528 2 89.61% : 0.005849s : 1: type_inference.infer 10.39% : 0.000678s : 1: type_inference.specialize ------[replace.] 0.000038 4 78.65% : 0.000030s : 3: replace.inline 21.35% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 4 92.76% : 0.000106s : 3: match.inline 7.24% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 883 0.97% : 0.000002s : 9: predicate.accumulaten_eliminater 0.77% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.82% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.09% : 0.000003s : 15: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.59% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.67% : 0.000001s : 6: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.93% : 0.000001s : 6: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 3: predicate.elim_not_effective 0.42% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_depend_swap 1.79% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 13: predicate.float_depend_g_call 0.56% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 3: predicate.fold_const_symbol 0.76% : 0.000001s : 6: predicate.get_grad_eliminate 0.22% : 0.000000s : 3: predicate.graph_param_transform 0.68% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.29% : 0.000010s : 40: predicate.inline 0.89% : 0.000001s : 6: predicate.inline_without_move 0.40% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 6: predicate.less_batch_normalization 1.84% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.50% : 0.000004s : 25: predicate.load_eliminater 1.01% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.03% : 0.000002s : 3: predicate.mutable_eliminate 0.36% : 0.000001s : 3: predicate.opt_reshape 0.35% : 0.000001s : 3: predicate.parallel_virtual_node 1.62% : 0.000003s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.63% : 0.000001s : 6: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.31% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 6: predicate.remove_not_recompute_node 1.42% : 0.000002s : 16: predicate.replace_applicator 0.58% : 0.000001s : 6: predicate.replace_old_param 0.27% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 3: predicate.row_tensor_eliminate 0.80% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 1.02% : 0.000002s : 6: predicate.shard_identity_eliminate 0.71% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000002s : 6: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 13: predicate.switch_defer_inline 1.97% : 0.000003s : 19: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 43: predicate.switch_simplify 0.92% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.54% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.43% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.04% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 6: predicate.virtual_output_eliminate 0.31% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000431 8 38.74% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 61.26% : 0.000264s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084954 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.27% : 0.003627s : 1: add_attr 4.26% : 0.003616s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000065s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.74% : 0.000629s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.56% : 0.000478s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000476s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.10% : 0.000934s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.02% : 0.000020s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000088s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.63% : 0.002231s : 1: opt_a 0.12% : 0.000102s : 1: opt_after_cconv 0.55% : 0.000467s : 1: opt_after_jit_grad 0.22% : 0.000190s : 1: opt_b 4.94% : 0.004195s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.30% : 0.000251s : 1: renormalize.infer 0.26% : 0.000218s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000038s : 1: rewriter_after_opt_a 0.08% : 0.000069s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000075s : 1: symbol_engine_optimizer 70.35% : 0.059765s : 1: task_emit 0.09% : 0.000074s : 1: 
tuple_transform 7.76% : 0.006594s : 1: type_inference 0.07% : 0.000059s : 1: validate TotalTime = 0.0560727, [24] [bootstrap]: 0.00052892 [type_inference]: 0.00596981 [event_method]: 1.311e-05 [auto_monad]: 5.966e-05 [graph_reusing]: 6.11998e-06 [inline]: 2.29999e-06 [add_attr]: 0.00300542, [1] [add_attr_with_inline]: 0.00299765, [1] [Cycle 1]: 4.676e-05, [2] [tag_attr]: 1.426e-05 [meta_addattr_fg_expand]: 3.76001e-06 [parallel-infer-symbol]: 3.08998e-06 [pre_auto_parallel]: 2.351e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.87999e-06 [optimize]: 0.00395746, [53] [py_interpret_to_execute]: 2.012e-05 [rewriter_before_opt_a]: 5.187e-05 [opt_a]: 0.00201569, [2] [Cycle 1]: 0.00140444, [45] [expand_dump_flag]: 2.78e-06 [switch_simplify]: 2.884e-05 [loop_unroll]: 1.73e-05 [a_1]: 0.00035323 [with_stream_mark]: 1.434e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.83999e-06 [updatestate_assign_eliminate]: 3.71999e-06 [updatestate_loads_eliminate]: 2.93998e-06 [parameter_eliminate]: 1.89999e-06 [a_2]: 8.024e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 2.47001e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 6.17001e-06 [merge_send_recv]: 8.47e-06 [auto_parallel]: 5.67001e-06 [parallel]: 1.956e-05 [flash_sp]: 7.13998e-06 [merge_comm]: 4.03001e-06 [allreduce_fusion]: 3.68e-06 [matmul_add_comm_reduction]: 9.82001e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.46999e-06 [virtual_dataset]: 6.27001e-06 [get_grad_eliminate_]: 5.94999e-06 [virtual_output]: 5.66998e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.62999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.156e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 1.003e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 2.63e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 
2.24001e-06 [after_resolve]: 9.69999e-06 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00039456 [add_forward_monad_depend]: 4.92999e-06 [auto_monad_grad]: 1.97001e-06 [auto_monad_eliminator]: 1.368e-05 [cse]: 2.961e-05 [a_3]: 4.063e-05 [Cycle 2]: 0.00060131, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 7.34002e-06 [loop_unroll]: 5.96998e-06 [a_1]: 0.00011438 [with_stream_mark]: 1.141e-05 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 3.21001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 7.073e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 4.53999e-06 [auto_parallel]: 5.71e-06 [parallel]: 4.12e-06 [flash_sp]: 3.34001e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.00002e-06 [matmul_add_comm_reduction]: 5.26998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.41e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.25999e-06 [merge_forward]: 2.64999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 6.67002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.022e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.52e-06 [set_forward_comm_id_for_comm_node_pass]: 3.20998e-06 [meta_fg_expand]: 1.80001e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 9.5999e-07 [after_resolve]: 8.39002e-06 [a_after_grad]: 7.7e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.45997e-06 [cse]: 1.293e-05 [a_3]: 3.194e-05 [py_interpret_to_execute_after_opt_a]: 7.52998e-06 [slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 3.279e-05 [convert_after_rewriter]: 7.45e-06 [order_py_execute_after_rewriter]: 4.95001e-06 [mutable_eliminate]: 0.00052452 [opt_b]: 0.00018663, [1] [Cycle 1]: 0.00018067, [7] 
[b_1]: 0.00010985 [b_2]: 7.36001e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 2.50002e-07 [cse]: 1.746e-05 [optimize_parallel_all_gather_comm]: 1.696e-05 [overlap_param_gather]: 2.02999e-06 [cconv]: 2.42e-05 [loop_unroll]: 0.0004249 [opt_after_cconv]: 9.65e-05, [1] [Cycle 1]: 9.076e-05, [7] [c_1]: 2.571e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.38002e-06 [cse]: 1.758e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.504e-05 [tuple_transform]: 6.81e-05, [1] [Cycle 1]: 6.379e-05, [4] [d_1]: 3.703e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.20025e-07 [switch_simplify]: 6.53e-06 [partial_unused_args_eliminate]: 2.07001e-06 [add_recomputation]: 4.493e-05 [cse_after_recomputation]: 2.077e-05, [1] [Cycle 1]: 1.65e-05, [1] [cse]: 1.115e-05 [environ_conv]: 5.45001e-06 [swap_dp_allreduce_reducescatter]: 4.92999e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 2.49999e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.12999e-06 [full_micro_interleaved_order_control]: 2.71999e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 1.15001e-06 [interleave_split_concat_branches]: 1.35001e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86998e-06 [control_data_broadcast_order]: 1.2e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.1e-06 [overlap_recompute_and_grad_model_parallel]: 4.89e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.26002e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 4.27998e-06 [overlap_grad_flash_sp]: 1.804e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.19998e-06 [symbol_engine_optimizer]: 7.075e-05, [1] [Cycle 1]: 6.64e-05, [6] [build]: 2.53e-06 [elim_shapecalc]: 8.90999e-06 [elim_not_effective]: 1.151e-05 [opt_reshape]: 6.06998e-06 [fold_const_symbol]: 9.37999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 1.682e-05 [get_jit_bprop_graph]: 1.17e-06 [rewriter_after_jit_bprop_graph]: 3.47002e-06 [opt_after_jit_grad]: 0.00045524 [validate]: 3.626e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0417141 [execute]: 9.96e-06 Sums bootstrap : 0.000529s : 1.02% type_inference : 0.005970s : 11.48% event_method : 0.000013s : 0.03% auto_monad : 0.000060s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000052s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000036s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000468s : 0.90% optimize.opt_a.with_stream_mark : 0.000026s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000151s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.05% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.03% optimize.opt_a.renormalize : 0.000395s : 0.76% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000043s : 0.08% optimize.opt_a.a_3 : 0.000073s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000525s : 1.01% optimize.opt_b.b_1 : 0.000110s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.05% optimize.loop_unroll : 0.000425s : 0.82% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000037s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000455s : 0.88% validate : 0.000036s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.041714s : 80.19% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000143 24 20.04% : 0.000029s : 4: substitution.arithmetic_simplify 1.25% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 3.75% : 0.000005s : 3: substitution.graph_param_transform 66.52% : 0.000095s : 3: substitution.inline 2.16% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.22% : 0.000005s : 4: substitution.remove_not_recompute_node 2.09% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005925 2 92.16% : 0.005461s : 1: type_inference.infer 7.84% : 0.000464s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000093 3 100.00% : 0.000093s : 3: match.inline ------[predicate.] 
0.000146 815 0.90% : 0.000001s : 8: predicate.accumulaten_eliminater 0.92% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 8: predicate.addn_zero_filter 0.77% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.25% : 0.000003s : 14: predicate.arithmetic_simplify 0.90% : 0.000001s : 8: predicate.cast_eliminate 0.69% : 0.000001s : 6: predicate.check_bprop_eliminate 0.66% : 0.000001s : 6: predicate.compare_switch_simplify 0.22% : 0.000000s : 3: predicate.const_output_eliminate 0.71% : 0.000001s : 6: predicate.depend_value_elim 0.87% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 3: predicate.elim_not_effective 0.41% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_depend_swap 1.79% : 0.000003s : 17: predicate.environ_get_eliminate 1.10% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.27% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 3: predicate.fold_const_symbol 0.97% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.79% : 0.000001s : 6: predicate.incorporate_call 0.65% : 0.000001s : 6: predicate.incorporate_call_switch 6.26% : 0.000009s : 37: predicate.inline 0.96% : 0.000001s : 6: predicate.inline_without_move 0.42% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 6: predicate.less_batch_normalization 1.54% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.07% : 0.000002s : 3: predicate.loop_unroll_after_grad 1.97% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 6: predicate.merge_addn 0.69% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 8: predicate.minmaximum_grad 1.20% : 0.000002s : 3: predicate.mutable_eliminate 0.43% : 0.000001s : 3: predicate.opt_reshape 0.37% : 0.000001s : 3: predicate.parallel_virtual_node 1.44% : 0.000002s : 11: predicate.partial_defer_inline 1.31% : 0.000002s : 11: predicate.partial_eliminate 0.88% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.09% : 0.000002s : 8: predicate.reduce_eliminate 2.30% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 6: predicate.remove_not_recompute_node 1.18% : 0.000002s : 14: predicate.replace_applicator 0.73% : 0.000001s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.92% : 0.000001s : 8: predicate.reshape_eliminate 0.66% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.86% : 0.000001s : 6: predicate.same_eliminate 0.52% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 6: predicate.shard_identity_eliminate 0.80% : 0.000001s : 6: predicate.special_op_eliminate 0.98% : 0.000001s : 6: predicate.specialize_transform 0.98% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.91% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.94% : 0.000007s : 38: predicate.switch_simplify 0.93% : 
0.000001s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.61% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.62% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 28: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.75% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 6: predicate.virtual_output_eliminate 0.36% : 0.000001s : 3: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000292 7 42.09% : 0.000123s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.91% : 0.000169s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064368 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.68% : 0.003010s : 1: add_attr 4.66% : 0.003001s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000065s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000567s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.03% : 0.000017s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.67% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.83% : 0.000534s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.29% : 0.000829s : 78: opt.transform.opt_a 0.04% : 0.000024s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000032s : 4: opt.transform.symbol_engine_opt 3.14% : 0.002019s : 1: opt_a 0.16% : 0.000100s : 1: opt_after_cconv 0.72% : 0.000464s : 1: opt_after_jit_grad 0.30% : 0.000190s : 1: opt_b 6.15% : 0.003961s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.04% : 0.000024s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.32% : 0.000207s : 1: renormalize.infer 0.28% : 0.000181s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.09% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000074s : 1: symbol_engine_optimizer 64.84% : 0.041737s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 9.30% : 0.005985s : 1: type_inference 0.09% : 0.000059s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x8-ge],max_mem:14.0M . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x9-pynative],max_mem:14.0M TotalTime = 0.0219571, [24] [bootstrap]: 0.00060426 [type_inference]: 0.00638543 [event_method]: 1.389e-05 [auto_monad]: 6.392e-05 [graph_reusing]: 5.87999e-06 [inline]: 1.86e-06 [add_attr]: 0.0035533, [1] [add_attr_with_inline]: 0.0035433, [1] [Cycle 1]: 4.624e-05, [2] [tag_attr]: 1.543e-05 [meta_addattr_fg_expand]: 4.40999e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 2.582e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00410904, [53] [py_interpret_to_execute]: 2.158e-05 [rewriter_before_opt_a]: 6.31e-05 [opt_a]: 0.00220713, [2] [Cycle 1]: 0.00159719, [45] [expand_dump_flag]: 3.08998e-06 [switch_simplify]: 3.4e-05 [loop_unroll]: 2.043e-05 [a_1]: 0.00044156 [with_stream_mark]: 1.355e-05 [recompute_prepare]: 8.13001e-06 [updatestate_depend_eliminate]: 4.11001e-06 [updatestate_assign_eliminate]: 3.5e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 2.50997e-06 [a_2]: 7.853e-05 [accelerated_algorithm]: 6.94999e-06 [shard]: 2.57001e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 8.74e-06 [auto_parallel]: 6.16998e-06 [parallel]: 2.562e-05 [flash_sp]: 8e-06 [merge_comm]: 4.1e-06 [allreduce_fusion]: 3.88001e-06 [matmul_add_comm_reduction]: 9.44998e-06 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 7.9e-06 [virtual_dataset]: 6.26e-06 [get_grad_eliminate_]: 5.86e-06 [virtual_output]: 
6.21e-06 [merge_forward]: 3.76999e-06 [cell_reuse_recompute_pass]: 1.14003e-06 [offload_activation]: 9.99001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.196e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 1.071e-05 [set_forward_comm_id_for_comm_node_pass]: 3.76999e-06 [meta_fg_expand]: 3.05998e-06 [flash_sp_send_recv_attached]: 2.81999e-06 [receive_attached]: 2.21e-06 [after_resolve]: 9.46e-06 [a_after_grad]: 8.72e-06 [renormalize]: 0.00043733 [add_forward_monad_depend]: 8.27e-06 [auto_monad_grad]: 2.23998e-06 [auto_monad_eliminator]: 4.863e-05 [cse]: 2.883e-05 [a_3]: 4.173e-05 [Cycle 2]: 0.00060069, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.90002e-06 [loop_unroll]: 5.99e-06 [a_1]: 0.00011444 [with_stream_mark]: 9.90002e-06 [recompute_prepare]: 5.96e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 1.02998e-06 [a_2]: 7.198e-05 [accelerated_algorithm]: 5.68002e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.42999e-06 [parallel]: 3.89002e-06 [flash_sp]: 3.38e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 4.98001e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.29001e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 5.01002e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.97001e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 8.07e-06 [a_after_grad]: 8.03999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 
9.39996e-07 [auto_monad_eliminator]: 6.35002e-06 [cse]: 1.417e-05 [a_3]: 3.244e-05 [py_interpret_to_execute_after_opt_a]: 7.41001e-06 [slice_cell_reuse_recomputed_activation]: 2.04999e-06 [rewriter_after_opt_a]: 3.311e-05 [convert_after_rewriter]: 7.53999e-06 [order_py_execute_after_rewriter]: 5.43002e-06 [mutable_eliminate]: 0.00046009 [opt_b]: 0.00018752, [1] [Cycle 1]: 0.00018127, [7] [b_1]: 0.00010936 [b_2]: 7.33999e-06 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 4.89992e-07 [cse]: 1.802e-05 [optimize_parallel_all_gather_comm]: 1.718e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.513e-05 [loop_unroll]: 0.00042079 [opt_after_cconv]: 9.768e-05, [1] [Cycle 1]: 9.179e-05, [7] [c_1]: 2.614e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.51998e-06 [cse]: 1.744e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.518e-05 [tuple_transform]: 6.821e-05, [1] [Cycle 1]: 6.359e-05, [4] [d_1]: 3.688e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 6.44999e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 5.104e-05 [cse_after_recomputation]: 2.193e-05, [1] [Cycle 1]: 1.738e-05, [1] [cse]: 1.188e-05 [environ_conv]: 8.12e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.76999e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.39998e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.47001e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.64001e-06 [reorder_send_recv_between_fp_bp]: 2.78998e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.20999e-06 
[interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.244e-05 [grouped_pairwise_exchange_alltoall]: 1.42e-06 [offloading_packed_experts]: 3.8e-06 [overlap_recompute_and_grad_model_parallel]: 5.35999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.32001e-06 [overlap_grad_ring_attention]: 4.04002e-06 [overlap_grad_flash_sp]: 1.757e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 2.00002e-06 [handle_group_info]: 1.07998e-06 [symbol_engine_optimizer]: 7.075e-05, [1] [Cycle 1]: 6.655e-05, [6] [build]: 2.44001e-06 [elim_shapecalc]: 9.09e-06 [elim_not_effective]: 1.17e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 9.49999e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.94999e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.621e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.91999e-06 [opt_after_jit_grad]: 0.00045156 [validate]: 3.457e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00646063 [execute]: 7.55e-06 Sums bootstrap : 0.000604s : 3.47% type_inference : 0.006385s : 36.69% event_method : 0.000014s : 0.08% auto_monad : 0.000064s : 0.37% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.12% optimize.rewriter_before_opt_a : 0.000063s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify 
: 0.000041s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000556s : 3.20% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000004s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000018s : 0.10% optimize.opt_a.a_after_grad : 
0.000017s : 0.10% optimize.opt_a.renormalize : 0.000437s : 2.51% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000055s : 0.32% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.19% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000460s : 2.64% optimize.opt_b.b_1 : 0.000109s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.14% optimize.loop_unroll : 0.000421s : 2.42% optimize.opt_after_cconv.c_1 : 0.000026s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.09% optimize.tuple_transform.d_1 : 0.000037s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% 
optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000008s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000452s : 2.59% validate : 0.000035s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006461s : 37.13% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000166 26 18.71% : 0.000031s : 5: substitution.arithmetic_simplify 1.16% : 0.000002s : 2: substitution.elim_not_effective 0.88% : 0.000001s : 2: substitution.fold_const_symbol 3.39% : 0.000006s : 3: substitution.graph_param_transform 64.16% : 0.000107s : 3: substitution.inline 1.91% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.67% : 0.000004s : 4: substitution.remove_not_recompute_node 1.81% : 0.000003s : 2: substitution.replace_old_param 5.31% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006334 2 90.45% : 0.005729s : 1: type_inference.infer 9.55% : 0.000605s : 1: type_inference.specialize ------[replace.] 0.000037 4 78.91% : 0.000029s : 3: replace.inline 21.09% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 4 92.81% : 0.000105s : 3: match.inline 7.19% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 883 1.13% : 0.000002s : 9: predicate.accumulaten_eliminater 1.01% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 6: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.84% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 15: predicate.arithmetic_simplify 0.91% : 0.000001s : 9: predicate.cast_eliminate 0.61% : 0.000001s : 6: predicate.check_bprop_eliminate 0.58% : 0.000001s : 6: predicate.compare_switch_simplify 0.20% : 0.000000s : 3: predicate.const_output_eliminate 0.58% : 0.000001s : 6: predicate.depend_value_elim 0.98% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.96% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_depend_swap 1.74% : 0.000003s : 18: predicate.environ_get_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.49% : 0.000004s : 13: predicate.float_depend_g_call 0.55% : 0.000001s : 6: predicate.float_environ_get_switch 0.84% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.70% : 0.000001s : 6: predicate.get_grad_eliminate 0.23% : 0.000000s : 3: predicate.graph_param_transform 0.73% : 0.000001s : 6: predicate.incorporate_call 0.58% : 0.000001s : 6: predicate.incorporate_call_switch 6.39% : 0.000010s : 40: predicate.inline 0.90% : 0.000001s : 6: predicate.inline_without_move 0.43% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.06% : 0.000002s : 6: predicate.less_batch_normalization 1.62% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 25: predicate.load_eliminater 0.97% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 21: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 6: predicate.merge_addn 0.59% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 9: predicate.minmaximum_grad 1.07% : 0.000002s : 3: predicate.mutable_eliminate 0.38% : 0.000001s : 3: predicate.opt_reshape 0.41% : 0.000001s : 3: predicate.parallel_virtual_node 1.63% : 0.000003s : 13: predicate.partial_defer_inline 1.45% : 0.000002s : 13: predicate.partial_eliminate 1.03% : 0.000002s : 9: predicate.print_const_string_wrapper 0.70% : 0.000001s : 6: predicate.reduce_all_const_elim 1.23% : 0.000002s : 9: predicate.reduce_eliminate 2.41% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 6: predicate.remove_not_recompute_node 1.24% : 0.000002s : 16: predicate.replace_applicator 0.56% : 0.000001s : 6: predicate.replace_old_param 0.26% : 0.000000s : 3: predicate.reset_defer_inline 0.94% : 0.000001s : 9: predicate.reshape_eliminate 0.60% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.85% : 0.000001s : 6: predicate.same_eliminate 0.45% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 6: predicate.shard_identity_eliminate 0.67% : 0.000001s : 6: predicate.special_op_eliminate 0.85% : 0.000001s : 6: predicate.specialize_transform 0.92% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 1.92% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 43: predicate.switch_simplify 0.91% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 15: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.58% : 0.000006s : 22: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.39% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.05% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 3: predicate.value_based_eliminate 0.67% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000389 8 47.86% : 0.000186s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.14% : 0.000203s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031128 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.43% : 0.003558s : 1: add_attr 11.39% : 0.003547s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000069s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.06% : 0.000642s : 1: bootstrap 0.09% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.04% : 0.000011s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000469s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.98% : 0.000928s : 78: opt.transform.opt_a 0.08% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.13% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.10% : 0.002210s : 1: opt_a 0.32% : 0.000101s : 1: opt_after_cconv 1.48% : 0.000461s : 1: opt_after_jit_grad 0.61% : 0.000191s : 1: opt_b 13.21% : 0.004113s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 0.71% : 0.000222s : 1: renormalize.infer 0.67% : 0.000208s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000037s : 1: rewriter_after_opt_a 0.22% : 0.000067s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000073s : 1: symbol_engine_optimizer 20.79% : 0.006472s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.56% : 0.006401s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0202365, [24] [bootstrap]: 0.00049866 [type_inference]: 0.0059352 [event_method]: 1.285e-05 [auto_monad]: 6.013e-05 [graph_reusing]: 5.55001e-06 [inline]: 1.99e-06 [add_attr]: 0.00304039, [1] [add_attr_with_inline]: 0.00303159, [1] [Cycle 1]: 5.427e-05, [2] [tag_attr]: 1.429e-05 [meta_addattr_fg_expand]: 4.40999e-06 [parallel-infer-symbol]: 3.09001e-06 [pre_auto_parallel]: 2.42e-05 [insert-virtual-dataset]: 2.42001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.94999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00396044, [53] [py_interpret_to_execute]: 2.031e-05 [rewriter_before_opt_a]: 5.091e-05 [opt_a]: 0.00208189, [2] [Cycle 1]: 0.00144728, [45] [expand_dump_flag]: 2.94001e-06 [switch_simplify]: 2.852e-05 [loop_unroll]: 1.712e-05 [a_1]: 0.00035671 [with_stream_mark]: 1.423e-05 [recompute_prepare]: 8.16002e-06 [updatestate_depend_eliminate]: 3.90998e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 8.463e-05 [accelerated_algorithm]: 7.08e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 6.24999e-06 [merge_send_recv]: 8.82999e-06 [auto_parallel]: 6.73e-06 [parallel]: 2.012e-05 [flash_sp]: 7.39002e-06 [merge_comm]: 4.2e-06 [allreduce_fusion]: 3.55998e-06 [matmul_add_comm_reduction]: 9.24998e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 6.03002e-06 [get_grad_eliminate_]: 5.66e-06 [virtual_output]: 6.07999e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 9.64999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.187e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 1.076e-05 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.86999e-06 [flash_sp_send_recv_attached]: 2.83e-06 [receive_attached]: 2.29001e-06 
[after_resolve]: 1.041e-05 [a_after_grad]: 9.54e-06 [renormalize]: 0.00042376 [add_forward_monad_depend]: 4.60001e-06 [auto_monad_grad]: 1.96998e-06 [auto_monad_eliminator]: 1.321e-05 [cse]: 2.988e-05 [a_3]: 4.102e-05 [Cycle 2]: 0.00062499, [45] [expand_dump_flag]: 7.80012e-07 [switch_simplify]: 7.31999e-06 [loop_unroll]: 5.83002e-06 [a_1]: 0.00011474 [with_stream_mark]: 1.05e-05 [recompute_prepare]: 6.16e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.65002e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 8.952e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.97001e-06 [merge_send_recv]: 4.65001e-06 [auto_parallel]: 5.69999e-06 [parallel]: 4.22e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.10998e-06 [allreduce_fusion]: 2.91e-06 [matmul_add_comm_reduction]: 5.30999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.28002e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 5.19998e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.93998e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.05e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.52e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.49998e-06 [a_after_grad]: 7.68001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.56e-06 [cse]: 1.399e-05 [a_3]: 3.271e-05 [py_interpret_to_execute_after_opt_a]: 7.42998e-06 [slice_cell_reuse_recomputed_activation]: 2.48e-06 [rewriter_after_opt_a]: 3.411e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 5.54e-06 [mutable_eliminate]: 0.00046917 [opt_b]: 0.00018567, [1] [Cycle 1]: 0.00017926, [7] [b_1]: 
0.0001098 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.19999e-06 [renormalize]: 4.59986e-07 [cse]: 1.752e-05 [optimize_parallel_all_gather_comm]: 1.622e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.231e-05 [loop_unroll]: 0.00042091 [opt_after_cconv]: 9.664e-05, [1] [Cycle 1]: 9.062e-05, [7] [c_1]: 2.629e-05 [parameter_eliminate]: 2.53e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.697e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.448e-05 [tuple_transform]: 6.902e-05, [1] [Cycle 1]: 6.441e-05, [4] [d_1]: 3.767e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.36e-06 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 4.466e-05 [cse_after_recomputation]: 2.087e-05, [1] [Cycle 1]: 1.631e-05, [1] [cse]: 1.098e-05 [environ_conv]: 4.90001e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 2.66999e-06 [label_micro_interleaved_index]: 4.58999e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.81e-06 [assign_add_opt]: 1.66e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.23002e-06 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.285e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 3.65998e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.63e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.793e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 2.27999e-06 [handle_group_info]: 1.47001e-06 [symbol_engine_optimizer]: 7.213e-05, [1] [Cycle 1]: 6.73e-05, [6] [build]: 2.26998e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.185e-05 [opt_reshape]: 6.53e-06 [fold_const_symbol]: 9.68997e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.75001e-06 [auto_monad_reorder]: 1.635e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00045477 [validate]: 3.375e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00596492 [execute]: 7.16999e-06 Sums bootstrap : 0.000499s : 3.08% type_inference : 0.005935s : 36.64% event_method : 0.000013s : 0.08% auto_monad : 0.000060s : 0.37% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000051s : 0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.22% optimize.opt_a.loop_unroll : 0.000023s : 0.14% optimize.opt_a.a_1 : 0.000471s : 2.91% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000174s : 1.08% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000424s : 2.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000044s : 0.27% optimize.opt_a.a_3 : 0.000074s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000034s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000469s : 2.90% optimize.opt_b.b_1 : 0.000110s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000421s : 2.60% optimize.opt_after_cconv.c_1 : 0.000026s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000455s : 2.81% validate : 0.000034s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005965s : 36.82% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000145 24 20.30% : 0.000029s : 4: substitution.arithmetic_simplify 1.25% : 0.000002s : 2: substitution.elim_not_effective 1.11% : 0.000002s : 2: substitution.fold_const_symbol 4.15% : 0.000006s : 3: substitution.graph_param_transform 65.28% : 0.000094s : 3: substitution.inline 2.12% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.10% : 0.000004s : 4: substitution.remove_not_recompute_node 2.68% : 0.000004s : 2: substitution.replace_old_param ------[type_inference.] 0.005892 2 92.24% : 0.005435s : 1: type_inference.infer 7.76% : 0.000457s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000093 3 100.00% : 0.000093s : 3: match.inline ------[predicate.] 
0.000147 815 1.02% : 0.000001s : 8: predicate.accumulaten_eliminater 1.09% : 0.000002s : 3: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 6: predicate.addn_check_dump 0.92% : 0.000001s : 8: predicate.addn_zero_filter 0.81% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 14: predicate.arithmetic_simplify 0.84% : 0.000001s : 8: predicate.cast_eliminate 0.71% : 0.000001s : 6: predicate.check_bprop_eliminate 0.62% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.69% : 0.000001s : 6: predicate.depend_value_elim 0.88% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_depend_swap 1.88% : 0.000003s : 17: predicate.environ_get_eliminate 1.11% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.21% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 3: predicate.fold_const_symbol 0.79% : 0.000001s : 6: predicate.get_grad_eliminate 0.28% : 0.000000s : 3: predicate.graph_param_transform 0.79% : 0.000001s : 6: predicate.incorporate_call 0.60% : 0.000001s : 6: predicate.incorporate_call_switch 6.46% : 0.000010s : 37: predicate.inline 1.20% : 0.000002s : 6: predicate.inline_without_move 0.45% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 6: predicate.less_batch_normalization 1.52% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 22: predicate.load_eliminater 1.04% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.01% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 6: predicate.merge_addn 0.71% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 8: predicate.minmaximum_grad 1.23% : 0.000002s : 3: predicate.mutable_eliminate 0.42% : 0.000001s : 3: predicate.opt_reshape 0.56% : 0.000001s : 3: predicate.parallel_virtual_node 1.42% : 0.000002s : 11: predicate.partial_defer_inline 1.35% : 0.000002s : 11: predicate.partial_eliminate 0.96% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.12% : 0.000002s : 8: predicate.reduce_eliminate 2.31% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.60% : 0.000001s : 6: predicate.remove_not_recompute_node 1.19% : 0.000002s : 14: predicate.replace_applicator 0.61% : 0.000001s : 6: predicate.replace_old_param 0.33% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 8: predicate.reshape_eliminate 0.61% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 3: predicate.row_tensor_eliminate 0.83% : 0.000001s : 6: predicate.same_eliminate 0.54% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 6: predicate.shard_identity_eliminate 0.92% : 0.000001s : 6: predicate.special_op_eliminate 0.95% : 0.000001s : 6: predicate.specialize_transform 0.95% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 11: predicate.switch_defer_inline 1.85% : 0.000003s : 17: predicate.switch_layer_defer_inline 4.81% : 0.000007s : 38: predicate.switch_simplify 0.86% : 
0.000001s : 8: predicate.tile_eliminate 0.86% : 0.000001s : 8: predicate.transpose_eliminate 1.59% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 14: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 3: predicate.value_based_eliminate 0.71% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000278 7 38.18% : 0.000106s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.82% : 0.000172s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028669 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.62% : 0.003045s : 1: add_attr 10.59% : 0.003036s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.23% : 0.000065s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.87% : 0.000535s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000429s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.67% : 0.000478s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.98% : 0.000856s : 78: opt.transform.opt_a 0.09% : 0.000025s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.27% : 0.002085s : 1: opt_a 0.35% : 0.000100s : 1: opt_after_cconv 1.62% : 0.000464s : 1: opt_after_jit_grad 0.66% : 0.000189s : 1: opt_b 13.83% : 0.003964s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.79% : 0.000226s : 1: renormalize.infer 0.67% : 0.000191s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.19% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 20.84% : 0.005975s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 20.76% : 0.005951s : 1: type_inference 0.22% : 0.000062s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x9-kbk],max_mem:14.0M . TotalTime = 0.949415, [24] [bootstrap]: 0.00061546 [type_inference]: 0.00676016 [event_method]: 1.56e-05 [auto_monad]: 6.752e-05 [graph_reusing]: 5.88998e-06 [inline]: 2.64999e-06 [add_attr]: 0.00365456, [1] [add_attr_with_inline]: 0.00364364, [1] [Cycle 1]: 5.349e-05, [2] [tag_attr]: 1.55e-05 [meta_addattr_fg_expand]: 4.53999e-06 [parallel-infer-symbol]: 3.51001e-06 [pre_auto_parallel]: 2.654e-05 [insert-virtual-dataset]: 2.74001e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.96e-06 [optimize]: 0.00429371, [53] [py_interpret_to_execute]: 2.182e-05 [rewriter_before_opt_a]: 6.393e-05 [opt_a]: 0.00227476, [2] [Cycle 1]: 0.00165502, [45] [expand_dump_flag]: 2.82002e-06 [switch_simplify]: 3.416e-05 [loop_unroll]: 5.394e-05 [a_1]: 0.00045299 [with_stream_mark]: 1.398e-05 [recompute_prepare]: 7.93001e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 3.18998e-06 [parameter_eliminate]: 1.85001e-06 [a_2]: 8.003e-05 [accelerated_algorithm]: 6.76e-06 [shard]: 2.23002e-06 [meta_shard_fg_expand]: 1.90001e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 8.78001e-06 [auto_parallel]: 6.64999e-06 [parallel]: 2.58e-05 [flash_sp]: 7.71001e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.71001e-06 [matmul_add_comm_reduction]: 9.96e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.33999e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 6.16e-06 [virtual_output]: 5.66e-06 [merge_forward]: 4.02002e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.174e-05 
[merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.64002e-06 [meta_fg_expand]: 2.58003e-06 [flash_sp_send_recv_attached]: 2.64001e-06 [receive_attached]: 2.44001e-06 [after_resolve]: 9.49e-06 [a_after_grad]: 8.59e-06 [renormalize]: 0.00048215 [add_forward_monad_depend]: 9.27001e-06 [auto_monad_grad]: 2.17999e-06 [auto_monad_eliminator]: 1.381e-05 [cse]: 3.072e-05 [a_3]: 4.243e-05 [Cycle 2]: 0.00060984, [45] [expand_dump_flag]: 1.15999e-06 [switch_simplify]: 7.61999e-06 [loop_unroll]: 5.81e-06 [a_1]: 0.0001166 [with_stream_mark]: 1.004e-05 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 3.01001e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 1.02998e-06 [a_2]: 7.212e-05 [accelerated_algorithm]: 6.04999e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 4.62e-06 [auto_parallel]: 6.01e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.23998e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 3.04999e-06 [matmul_add_comm_reduction]: 5.39e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.30002e-06 [virtual_dataset]: 5.58997e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.06997e-06 [merge_forward]: 2.76999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.053e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.93002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63999e-06 [meta_fg_expand]: 1.86e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.2e-06 [a_after_grad]: 7.98001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.14998e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 5.98998e-06 [cse]: 1.565e-05 [a_3]: 3.306e-05 [py_interpret_to_execute_after_opt_a]: 7.8e-06 
[slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 3.309e-05 [convert_after_rewriter]: 6.98e-06 [order_py_execute_after_rewriter]: 4.67998e-06 [mutable_eliminate]: 0.00049301 [opt_b]: 0.00018674, [1] [Cycle 1]: 0.00017997, [7] [b_1]: 0.00010965 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 4.30009e-07 [cse]: 1.779e-05 [optimize_parallel_all_gather_comm]: 1.659e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.412e-05 [loop_unroll]: 0.00043549 [opt_after_cconv]: 9.746e-05, [1] [Cycle 1]: 9.131e-05, [7] [c_1]: 2.536e-05 [parameter_eliminate]: 2.44999e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.98003e-06 [updatestate_loads_eliminate]: 2.56e-06 [cse]: 1.709e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.511e-05 [tuple_transform]: 6.937e-05, [1] [Cycle 1]: 6.481e-05, [4] [d_1]: 3.718e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.64999e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 5.121e-05 [cse_after_recomputation]: 2.18e-05, [1] [Cycle 1]: 1.704e-05, [1] [cse]: 1.155e-05 [environ_conv]: 8.32e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.24003e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.43002e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.26998e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 
[control_data_broadcast_order]: 1.251e-05 [grouped_pairwise_exchange_alltoall]: 1.40999e-06 [offloading_packed_experts]: 4.07998e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.46e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.765e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 1.99999e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.57999e-06 [symbol_engine_optimizer]: 0.00013992, [1] [Cycle 1]: 0.00013554, [6] [build]: 2.85002e-06 [elim_shapecalc]: 8.92e-06 [elim_not_effective]: 1.293e-05 [opt_reshape]: 6.37001e-06 [fold_const_symbol]: 9.51e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.88997e-06 [pipeline_parallel_scheduler]: 1.71998e-06 [auto_monad_reorder]: 1.63e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.0004563 [validate]: 3.481e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.933199 [execute]: 9.55001e-06 Sums bootstrap : 0.000615s : 0.07% type_inference : 0.006760s : 0.72% event_method : 0.000016s : 0.00% auto_monad : 0.000068s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.00% optimize.rewriter_before_opt_a : 0.000064s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.00% optimize.opt_a.loop_unroll : 0.000060s : 0.01% optimize.opt_a.a_1 : 0.000570s : 0.06% optimize.opt_a.with_stream_mark 
: 0.000024s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000152s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000482s : 0.05% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000046s : 0.00% optimize.opt_a.a_3 : 0.000075s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000493s : 0.05% optimize.opt_b.b_1 : 0.000110s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000435s : 0.05% optimize.opt_after_cconv.c_1 : 0.000025s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000037s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000008s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000456s : 0.05% validate : 0.000035s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.933199s : 98.79% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000174 26 19.22% : 0.000033s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 2.88% : 0.000005s : 3: substitution.graph_param_transform 64.44% : 0.000112s : 3: substitution.inline 1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.57% : 0.000004s : 4: substitution.remove_not_recompute_node 1.90% : 0.000003s : 2: substitution.replace_old_param 5.36% : 0.000009s : 1: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006704 2 90.94% : 0.006097s : 1: type_inference.infer 9.06% : 0.000607s : 1: type_inference.specialize ------[replace.] 0.000037 4 78.01% : 0.000029s : 3: replace.inline 21.99% : 0.000008s : 1: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 4 92.90% : 0.000110s : 3: match.inline 7.10% : 0.000008s : 1: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 883 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 0.84% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 6: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.81% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 15: predicate.arithmetic_simplify 1.19% : 0.000002s : 9: predicate.cast_eliminate 0.66% : 0.000001s : 6: predicate.check_bprop_eliminate 0.57% : 0.000001s : 6: predicate.compare_switch_simplify 0.21% : 0.000000s : 3: predicate.const_output_eliminate 0.77% : 0.000001s : 6: predicate.depend_value_elim 0.92% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.10% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.96% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.02% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 3: predicate.elim_not_effective 0.38% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 12: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 12: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 12: predicate.environ_get_depend_swap 1.83% : 0.000003s : 18: predicate.environ_get_eliminate 1.10% : 0.000002s : 12: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 13: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 13: predicate.float_depend_g_call 0.58% : 0.000001s : 6: predicate.float_environ_get_switch 0.82% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 3: predicate.fold_const_symbol 0.74% : 0.000001s : 6: predicate.get_grad_eliminate 0.21% : 0.000000s : 3: predicate.graph_param_transform 0.66% : 0.000001s : 6: predicate.incorporate_call 0.57% : 0.000001s : 6: predicate.incorporate_call_switch 6.31% : 0.000010s : 40: predicate.inline 0.95% : 0.000002s : 6: predicate.inline_without_move 0.38% : 0.000001s : 6: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 6: predicate.less_batch_normalization 1.78% : 0.000003s : 16: 
predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 25: predicate.load_eliminater 1.03% : 0.000002s : 3: predicate.loop_unroll_after_grad 2.17% : 0.000004s : 21: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 15: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 6: predicate.merge_addn 0.61% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 9: predicate.minmaximum_grad 1.08% : 0.000002s : 3: predicate.mutable_eliminate 0.33% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.57% : 0.000003s : 13: predicate.partial_defer_inline 1.47% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 6: predicate.reduce_all_const_elim 1.13% : 0.000002s : 9: predicate.reduce_eliminate 2.38% : 0.000004s : 25: predicate.redundant_stop_gradient_eliminater 0.44% : 0.000001s : 6: predicate.remove_not_recompute_node 1.28% : 0.000002s : 16: predicate.replace_applicator 0.57% : 0.000001s : 6: predicate.replace_old_param 0.28% : 0.000000s : 3: predicate.reset_defer_inline 0.93% : 0.000002s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 3: predicate.row_tensor_eliminate 0.77% : 0.000001s : 6: predicate.same_eliminate 0.48% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 6: predicate.shard_identity_eliminate 0.72% : 0.000001s : 6: predicate.special_op_eliminate 0.82% : 0.000001s : 6: predicate.specialize_transform 0.91% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 13: predicate.switch_defer_inline 2.11% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 43: predicate.switch_simplify 0.94% : 
0.000002s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000003s : 15: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 15: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 15: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 22: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 15: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 21: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 16: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 25: predicate.updatestate_pure_node_eliminater 3.01% : 0.000005s : 31: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 3: predicate.value_based_eliminate 0.68% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 6: predicate.virtual_output_eliminate 0.30% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000384 8 48.60% : 0.000187s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.40% : 0.000198s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.958948 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.38% : 0.003659s : 1: add_attr 0.38% : 0.003647s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000074s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000653s : 1: bootstrap 0.00% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000022s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000445s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000502s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.10% : 0.000975s : 78: opt.transform.opt_a 0.00% : 0.000024s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000088s : 28: opt.transform.opt_b 0.00% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.24% : 0.002278s : 1: opt_a 0.01% : 0.000101s : 1: opt_after_cconv 0.05% : 0.000466s : 1: opt_after_jit_grad 0.02% : 0.000190s : 1: opt_b 0.45% : 0.004298s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000031s : 1: pre_auto_parallel 0.00% : 0.000026s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000019s : 1: remove_dup_value 0.03% : 0.000250s : 1: renormalize.infer 0.02% : 0.000225s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000037s : 1: rewriter_after_opt_a 0.01% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000143s : 1: symbol_engine_optimizer 97.32% : 0.933223s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.71% : 0.006777s : 1: type_inference 0.01% : 0.000058s : 1: validate TotalTime = 0.0560229, [24] [bootstrap]: 0.00049917 [type_inference]: 0.00598549 [event_method]: 1.241e-05 [auto_monad]: 5.857e-05 [graph_reusing]: 5.99999e-06 [inline]: 1.79e-06 [add_attr]: 0.00302194, [1] [add_attr_with_inline]: 0.00301428, [1] [Cycle 1]: 4.758e-05, [2] [tag_attr]: 1.48e-05 [meta_addattr_fg_expand]: 4.13999e-06 [parallel-infer-symbol]: 3.08998e-06 [pre_auto_parallel]: 2.345e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.31e-06 [pipeline_split]: 1.85001e-06 [optimize]: 0.00395735, [53] [py_interpret_to_execute]: 1.924e-05 [rewriter_before_opt_a]: 5.173e-05 [opt_a]: 0.00206476, [2] [Cycle 1]: 0.00145391, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 2.781e-05 [loop_unroll]: 1.735e-05 [a_1]: 0.00035471 [with_stream_mark]: 1.511e-05 [recompute_prepare]: 8.72998e-06 [updatestate_depend_eliminate]: 3.93001e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 8.135e-05 [accelerated_algorithm]: 7.03e-06 [shard]: 1.76003e-06 [meta_shard_fg_expand]: 1.90001e-06 [shard_inline]: 6.49001e-06 [merge_send_recv]: 8.75001e-06 [auto_parallel]: 6.09999e-06 [parallel]: 1.896e-05 [flash_sp]: 7.80998e-06 [merge_comm]: 4.02e-06 [allreduce_fusion]: 3.57002e-06 [matmul_add_comm_reduction]: 9.39e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 6.33e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.85002e-06 [merge_forward]: 4.17003e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.186e-05 [merge_recompute_call_nodes]: 1.61998e-06 [before_grad]: 1.075e-05 [set_forward_comm_id_for_comm_node_pass]: 3.75998e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 2.22999e-06 
[after_resolve]: 1.01e-05 [a_after_grad]: 8.64998e-06 [renormalize]: 0.0004353 [add_forward_monad_depend]: 4.60001e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.372e-05 [cse]: 3.046e-05 [a_3]: 4.207e-05 [Cycle 2]: 0.00060158, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.67001e-06 [a_1]: 0.00011357 [with_stream_mark]: 1.13e-05 [recompute_prepare]: 5.83002e-06 [updatestate_depend_eliminate]: 2.93e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.72001e-06 [parameter_eliminate]: 1.06997e-06 [a_2]: 7.089e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.43002e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.21999e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.30001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.86999e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.27999e-06 [virtual_output]: 5.14e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.062e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.57e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 8.40024e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.60001e-06 [a_after_grad]: 8.17e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 1.06002e-06 [auto_monad_eliminator]: 6.40002e-06 [cse]: 1.348e-05 [a_3]: 3.302e-05 [py_interpret_to_execute_after_opt_a]: 7.53e-06 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 3.233e-05 [convert_after_rewriter]: 6.49001e-06 [order_py_execute_after_rewriter]: 5.25001e-06 [mutable_eliminate]: 0.00047121 [opt_b]: 0.00018777, [1] [Cycle 1]: 0.00018144, [7] 
[b_1]: 0.00011028 [b_2]: 7.68999e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.39992e-07 [cse]: 1.781e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 2.207e-05 [loop_unroll]: 0.00042431 [opt_after_cconv]: 9.793e-05, [1] [Cycle 1]: 9.225e-05, [7] [c_1]: 2.623e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.78998e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.781e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.537e-05 [tuple_transform]: 6.741e-05, [1] [Cycle 1]: 6.294e-05, [4] [d_1]: 3.64e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.48e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.573e-05 [cse_after_recomputation]: 2.238e-05, [1] [Cycle 1]: 1.796e-05, [1] [cse]: 1.235e-05 [environ_conv]: 5.74e-06 [swap_dp_allreduce_reducescatter]: 4.85999e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.93001e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.64999e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.52001e-06 [interleave_parallel_branches]: 1.14003e-06 [overlap_opt_shard_in_pipeline]: 1.31002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.266e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 4.13999e-06 [overlap_recompute_and_grad_model_parallel]: 5.19e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 4.27e-06 [overlap_grad_flash_sp]: 1.958e-05 [begin_end_overlap_inline]: 6.10016e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 2.19999e-06 [handle_group_info]: 1.50999e-06 [symbol_engine_optimizer]: 7.247e-05, [1] [Cycle 1]: 6.823e-05, [6] [build]: 3.02002e-06 [elim_shapecalc]: 8.75001e-06 [elim_not_effective]: 1.22e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 9.41e-06 [renormalize]: 2.69996e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.647e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00046271 [validate]: 3.579e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.0417019 [execute]: 9.79999e-06 Sums bootstrap : 0.000499s : 0.96% type_inference : 0.005985s : 11.51% event_method : 0.000012s : 0.02% auto_monad : 0.000059s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000023s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.04% optimize.rewriter_before_opt_a : 0.000052s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000035s : 0.07% optimize.opt_a.loop_unroll : 0.000023s : 0.04% optimize.opt_a.a_1 : 0.000468s : 0.90% optimize.opt_a.with_stream_mark : 0.000026s : 0.05% optimize.opt_a.recompute_prepare : 0.000015s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000152s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000435s : 0.84% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000044s : 0.08% optimize.opt_a.a_3 : 0.000075s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.06% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000471s : 0.91% optimize.opt_b.b_1 : 0.000110s : 0.21% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000424s : 0.82% optimize.opt_after_cconv.c_1 : 0.000026s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000036s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000463s : 0.89% validate : 0.000036s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.041702s : 80.20% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000142 24 20.22% : 0.000029s : 4: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 3.57% : 0.000005s : 3: substitution.graph_param_transform 65.81% : 0.000094s : 3: substitution.inline 2.59% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.15% : 0.000004s : 4: substitution.remove_not_recompute_node 2.28% : 0.000003s : 2: substitution.replace_old_param ------[type_inference.] 0.005942 2 91.98% : 0.005465s : 1: type_inference.infer 8.02% : 0.000476s : 1: type_inference.specialize ------[replace.] 0.000027 3 100.00% : 0.000027s : 3: replace.inline ------[match.] 0.000092 3 100.00% : 0.000092s : 3: match.inline ------[predicate.] 
0.000145 815 0.87% : 0.000001s : 8: predicate.accumulaten_eliminater 0.82% : 0.000001s : 3: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 6: predicate.addn_check_dump 0.94% : 0.000001s : 8: predicate.addn_zero_filter 0.80% : 0.000001s : 8: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 14: predicate.arithmetic_simplify 0.87% : 0.000001s : 8: predicate.cast_eliminate 0.67% : 0.000001s : 6: predicate.check_bprop_eliminate 0.63% : 0.000001s : 6: predicate.compare_switch_simplify 0.23% : 0.000000s : 3: predicate.const_output_eliminate 0.62% : 0.000001s : 6: predicate.depend_value_elim 0.85% : 0.000001s : 8: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 8: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 8: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 6: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 3: predicate.elim_not_effective 0.43% : 0.000001s : 3: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 11: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 11: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 11: predicate.environ_get_depend_swap 1.84% : 0.000003s : 17: predicate.environ_get_eliminate 1.09% : 0.000002s : 11: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 11: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 6: predicate.float_environ_get_switch 0.91% : 0.000001s : 9: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 3: predicate.fold_const_symbol 0.85% : 0.000001s : 6: predicate.get_grad_eliminate 0.25% : 0.000000s : 3: predicate.graph_param_transform 0.71% : 0.000001s : 6: predicate.incorporate_call 0.61% : 0.000001s : 6: predicate.incorporate_call_switch 6.17% : 0.000009s : 37: predicate.inline 1.02% : 0.000001s : 6: predicate.inline_without_move 0.44% : 0.000001s : 6: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 6: predicate.less_batch_normalization 1.55% : 0.000002s : 14: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 22: predicate.load_eliminater 0.98% : 0.000001s : 3: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000002s : 14: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 6: predicate.merge_addn 0.65% : 0.000001s : 6: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 6: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 8: predicate.minmaximum_grad 1.23% : 0.000002s : 3: predicate.mutable_eliminate 0.43% : 0.000001s : 3: predicate.opt_reshape 0.40% : 0.000001s : 3: predicate.parallel_virtual_node 1.46% : 0.000002s : 11: predicate.partial_defer_inline 1.34% : 0.000002s : 11: predicate.partial_eliminate 0.87% : 0.000001s : 8: predicate.print_const_string_wrapper 0.66% : 0.000001s : 6: predicate.reduce_all_const_elim 1.19% : 0.000002s : 8: predicate.reduce_eliminate 2.26% : 0.000003s : 22: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 6: predicate.remove_not_recompute_node 1.29% : 0.000002s : 14: predicate.replace_applicator 1.07% : 0.000002s : 6: predicate.replace_old_param 0.29% : 0.000000s : 3: predicate.reset_defer_inline 0.90% : 0.000001s : 8: predicate.reshape_eliminate 0.63% : 0.000001s : 6: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 3: predicate.row_tensor_eliminate 0.91% : 0.000001s : 6: predicate.same_eliminate 0.49% : 0.000001s : 6: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 6: predicate.shard_identity_eliminate 0.76% : 0.000001s : 6: predicate.special_op_eliminate 0.92% : 0.000001s : 6: predicate.specialize_transform 1.02% : 0.000001s : 6: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 6: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 3: predicate.switch_call_monad_eliminater 1.28% : 0.000002s : 11: predicate.switch_defer_inline 1.90% : 0.000003s : 17: predicate.switch_layer_defer_inline 5.07% : 0.000007s : 38: predicate.switch_simplify 0.92% : 
0.000001s : 8: predicate.tile_eliminate 0.87% : 0.000001s : 8: predicate.transpose_eliminate 1.54% : 0.000002s : 14: predicate.tuple_list_convert_item_index_to_positive 1.75% : 0.000003s : 14: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 14: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000005s : 20: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 14: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 20: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 14: predicate.tuple_to_list_eliminator_ 2.21% : 0.000003s : 22: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 28: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 3: predicate.value_based_eliminate 0.78% : 0.000001s : 6: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 6: predicate.virtual_output_eliminate 0.32% : 0.000000s : 3: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 3: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000297 7 39.92% : 0.000119s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.08% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064420 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.70% : 0.003027s : 1: add_attr 4.68% : 0.003018s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000064s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.83% : 0.000536s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.67% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.74% : 0.000480s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.30% : 0.000835s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000041s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 3.21% : 0.002068s : 1: opt_a 0.16% : 0.000101s : 1: opt_after_cconv 0.73% : 0.000472s : 1: opt_after_jit_grad 0.30% : 0.000191s : 1: opt_b 6.15% : 0.003961s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.04% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.04% : 0.000023s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000019s : 1: remove_dup_value 0.38% : 0.000246s : 1: renormalize.infer 0.28% : 0.000183s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000036s : 1: rewriter_after_opt_a 0.09% : 0.000056s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000075s : 1: symbol_engine_optimizer 64.77% : 0.041723s : 1: task_emit 0.11% : 0.000070s : 1: 
tuple_transform 9.31% : 0.005999s : 1: type_inference 0.09% : 0.000058s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_scalar_tensor_promotion[True-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[True-dtype_x9-ge],max_mem:14.0M

=============================== warnings summary ===============================
../../../../../../../../usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/classifier/transdata/transdata_classifier.py:222
  /usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/classifier/transdata/transdata_classifier.py:222: DeprecationWarning: invalid escape sequence \B
    """

../../../../../../../../usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/unify_schedule/vector/transdata/common/graph/transdata_graph_info.py:143
  /usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/unify_schedule/vector/transdata/common/graph/transdata_graph_info.py:143: DeprecationWarning: invalid escape sequence \c
    """

../../../../../../../../usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/unify_schedule/vector/transdata/common/graph/transdata_graph_info.py:170
  /usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/unify_schedule/vector/transdata/common/graph/transdata_graph_info.py:170: DeprecationWarning: invalid escape sequence \c
    """

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
    setattr(self, word, getattr(machar, word).flat[0])

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
    return self._float_to_str(self.smallest_subnormal)

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
    setattr(self, word, getattr(machar, word).flat[0])

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
    return self._float_to_str(self.smallest_subnormal)

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2.py:57
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2.py:57: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("batchnorm_fold2")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad.py:56
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad.py:56: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("batchnorm_fold2_grad")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad_reduce.py:48
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad_reduce.py:48: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("batchnorm_fold2_grad_reduce")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul.py:51
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul.py:51: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("correction_mul")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:51
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:51: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("correction_mul_grad")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:143
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:143: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("correction_mul_grad_reduce")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer.py:50
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perlayer")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad.py:92
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad.py:92: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perlayer_grad_d")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad_reduce.py:49
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad_reduce.py:49: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perlayer_grad_d_reduce")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel.py:50
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perchannel")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad.py:91
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad.py:91: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perchannel_grad_d")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad_reduce.py:48
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad_reduce.py:48: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perchannel_grad_d_reduce")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel.py:52
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel.py:52: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_quant_perchannel")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel_grad.py:81
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel_grad.py:81: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_quant_perchannel_grad")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer.py:54
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer.py:54: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_quant_per_layer")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer_grad.py:81
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer_grad.py:81: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_quant_per_layer_grad")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perchannel.py:50
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perchannel.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("minmax_update_perchannel")

../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perlayer.py:50
  /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perlayer.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("minmax_update_perlayer")

test_functional_mul.py::test_mint_mul_scalar_tensor_promotion[2.0-dtype_x0-ge]
  /usr/local/Ascend/cann-8.5.0/python/site-packages/asc_op_compile_base/asc_op_compiler/ascendc_compile_gen_code.py:161: DeprecationWarning: invalid escape sequence \w
    match = re.search(f'{option}=(\w+)', ' '.join(compile_options))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
================= 90 passed, 26 warnings in 648.86s (0:10:48) ==================