test_run_multi_cards_cases-test_multi_cards_level0_cases-log/log/rank_0/info.log
[WARNING] 2025-12-04 10:20:59,833 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
[WARNING] 2025-12-04 10:20:59,872 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
[WARNING] 2025-12-04 10:20:59,876 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
[WARNING] 2025-12-04 10:25:44,037 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
[WARNING] 2025-12-04 10:25:44,076 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
[WARNING] 2025-12-04 10:25:44,080 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
[INFO] 2025-12-04 10:25:47,721 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:557] collect_task_cases: 🔍 Start importing test cases for level: level0 ...
[INFO] 2025-12-04 10:25:47,801 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:562] collect_task_cases: ✅ Finished importing 13 test cases for level: level0. 
Group types: ['TaskType.EIGHT_CARDS_TASK', 'TaskType.FOUR_CARDS_TASK', 'TaskType.TWO_CARDS_TASK'] [INFO] 2025-12-04 10:25:47,801 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:566] collect_task_cases: - Group TaskType.EIGHT_CARDS_TASK: 1 tasks [INFO] 2025-12-04 10:25:47,801 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:566] collect_task_cases: - Group TaskType.FOUR_CARDS_TASK: 5 tasks [INFO] 2025-12-04 10:25:47,801 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:566] collect_task_cases: - Group TaskType.TWO_CARDS_TASK: 7 tasks [INFO] 2025-12-04 10:25:47,802 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:365] run: === Processing Group: 2 (7 tasks) === [INFO] 2025-12-04 10:25:47,802 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:445] schedule_group: 📝 Task execution plan: [INFO] 2025-12-04 10:25:47,802 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] schedule_group: [1] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) [INFO] 2025-12-04 10:25:47,802 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] schedule_group: [2] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_attention/test_self_attention/test_infer_self_attention.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) [INFO] 2025-12-04 10:25:47,802 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] schedule_group: [3] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/test_infer_mlp.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) [INFO] 2025-12-04 10:25:47,802 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] schedule_group: [4] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) [INFO] 2025-12-04 10:25:47,803 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] schedule_group: [5] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py (type: TaskType.TWO_CARDS_TASK, est. 
time: 300s) [INFO] 2025-12-04 10:25:47,803 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] schedule_group: [6] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) [INFO] 2025-12-04 10:25:47,803 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] schedule_group: [7] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3_moe/test_qwen3_moe_infer/test_qwen3_moe_infer.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) [INFO] 2025-12-04 10:25:47,804 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] worker: 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py on cards [0, 1] (port 50000) [Timeout: 600s] [INFO] 2025-12-04 10:25:47,804 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] worker: 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_attention/test_self_attention/test_infer_self_attention.py on cards [2, 3] (port 50001) [Timeout: 600s] [INFO] 2025-12-04 10:25:47,805 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] worker: 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/test_infer_mlp.py on cards [4, 5] (port 50002) [Timeout: 600s] [INFO] 2025-12-04 10:25:47,805 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] worker: 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py on cards [6, 7] (port 50003) [Timeout: 600s] [WARNING] 2025-12-04 10:25:53,214 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:25:53,253 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. 
See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:25:53,257 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:25:53,386 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:25:53,414 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:25:53,424 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:25:53,425 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:25:53,430 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:25:53,452 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:25:53,457 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:25:53,461 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. 
See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:25:53,465 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [INFO] 2025-12-04 10:25:57,028 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_attention/test_self_attention/test_infer_self_attention.py:95] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50001 --log_dir=/tmp/pytest-of-jenkins/pytest-8/test_two_cards_configurations_0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_attention/test_self_attention/run_infer_self_attention.py --batch_size=2 --prefill_seq_len=2 --decode_seq_len=1 --num_heads=2 --num_query_groups=2 --hidden_size=64 --use_flash_attention=true --output_path=/tmp/pytest-of-jenkins/pytest-8/test_two_cards_configurations_0/output_ms.npz --tensor_parallel=2 [INFO] 2025-12-04 10:25:57,264 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:123] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50121 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill [INFO] 2025-12-04 10:25:57,264 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:232] run_test: Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50121 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill [INFO] 2025-12-04 10:25:57,288 
[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py:227] test_two_cards_cases: --- Running Multi-Card Test: model_args={'num_experts': None, 'moe_grouped_gemm': False, 'qk_layernorm': False, 'multi_latent_attention': False, 'qk_l2_norm': False, 'sandwich_norm': False, 'num_layers': 1}, TP=2 --- [INFO] 2025-12-04 10:25:57,289 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py:111] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50003 --log_dir=/tmp/pytest-of-jenkins/pytest-10/test_two_cards_cases_model_arg0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/run_infer_transformer_block.py --batch_size=2 --seq_length=2 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=1 --output_path=/tmp/pytest-of-jenkins/pytest-10/test_two_cards_cases_model_arg0/output_ms.npz --tensor_parallel=2 [INFO] 2025-12-04 10:25:57,289 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py:217] run_test: Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50003 --log_dir=/tmp/pytest-of-jenkins/pytest-10/test_two_cards_cases_model_arg0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/run_infer_transformer_block.py --batch_size=2 --seq_length=2 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=1 --output_path=/tmp/pytest-of-jenkins/pytest-10/test_two_cards_cases_model_arg0/output_ms.npz --tensor_parallel=2 [INFO] 2025-12-04 10:25:57,352 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/test_infer_mlp.py:95] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50002 --log_dir=/tmp/pytest-of-jenkins/pytest-11/test_two_cards_cases_model_arg0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/run_infer_mlp.py --ffn_hidden_size=32 --has_bias=false --gated_linear_unit=true --output_path=/tmp/pytest-of-jenkins/pytest-11/test_two_cards_cases_model_arg0/output_ms.npz --tensor_parallel=2 --input_size=32 [WARNING] 2025-12-04 10:26:06,536 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. 
Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:06,567 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:06,598 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:06,651 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:06,662 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:06,672 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:06,679 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:06,748 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [INFO] 2025-12-04 10:26:12,229 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [INFO] 2025-12-04 10:26:12,872 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [INFO] 2025-12-04 10:26:13,128 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. 
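Note on the "Equivalent shell command for debugging (approximate)" entries above: each test file logs them from its own build_msrun_command_list helper before launching msrun. The sketch below only illustrates that assembly pattern as it appears in this log; the signature and argument names are illustrative assumptions, and the real helpers in the individual test files are the source of truth.

    # Minimal sketch of how the logged msrun commands are assembled
    # (illustrative names, not the actual build_msrun_command_list in the tests).
    def build_msrun_command_list(worker_num, master_port, log_dir, run_script, script_args):
        """Return the msrun invocation as a list of argv tokens."""
        cmd = [
            "msrun",
            f"--worker_num={worker_num}",
            f"--local_worker_num={worker_num}",
            f"--master_port={master_port}",
            f"--log_dir={log_dir}",
            "--join=True",
            str(run_script),
        ]
        # Per-test script arguments follow the run script, e.g. batch_size, tensor_parallel.
        cmd += [f"--{key}={value}" for key, value in script_args.items()]
        return cmd

    # For example, the self-attention case logged at 10:25:57 roughly corresponds to:
    # build_msrun_command_list(2, 50001, "/tmp/.../msrun_log",
    #                          "run_infer_self_attention.py",
    #                          {"batch_size": 2, "tensor_parallel": 2})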
[INFO] 2025-12-04 10:26:26,058 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/test_infer_mlp.py:95] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50002 --log_dir=/tmp/pytest-of-jenkins/pytest-11/test_two_cards_cases_model_arg1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/run_infer_mlp.py --ffn_hidden_size=32 --has_bias=true --gated_linear_unit=true --output_path=/tmp/pytest-of-jenkins/pytest-11/test_two_cards_cases_model_arg1/output_ms.npz --tensor_parallel=2 --input_size=32 [WARNING] 2025-12-04 10:26:35,454 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:35,561 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [INFO] 2025-12-04 10:26:37,097 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py:227] test_two_cards_cases: --- Running Multi-Card Test: model_args={'num_experts': None, 'moe_grouped_gemm': False, 'qk_layernorm': False, 'multi_latent_attention': False, 'qk_l2_norm': False, 'sandwich_norm': False, 'num_layers': 2}, TP=2 --- [INFO] 2025-12-04 10:26:37,098 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py:111] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50003 --log_dir=/tmp/pytest-of-jenkins/pytest-10/test_two_cards_cases_model_arg1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/run_infer_transformer_block.py --batch_size=2 --seq_length=2 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-10/test_two_cards_cases_model_arg1/output_ms.npz --tensor_parallel=2 [INFO] 2025-12-04 10:26:37,098 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py:217] run_test: Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50003 --log_dir=/tmp/pytest-of-jenkins/pytest-10/test_two_cards_cases_model_arg1/msrun_log --join=True 
/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/run_infer_transformer_block.py --batch_size=2 --seq_length=2 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-10/test_two_cards_cases_model_arg1/output_ms.npz --tensor_parallel=2 [INFO] 2025-12-04 10:26:41,342 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:281] worker: ✅ PASSED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_attention/test_self_attention/test_infer_self_attention.py | Time: 53.538s [INFO] 2025-12-04 10:26:41,343 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] worker: 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py on cards [2, 3] (port 50001) [Timeout: 600s] [INFO] 2025-12-04 10:26:42,071 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [WARNING] 2025-12-04 10:26:46,384 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:46,481 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:46,942 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:46,982 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:26:46,986 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. 
See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [INFO] 2025-12-04 10:26:50,872 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py:225] test_two_cards_cases: --- Running Multi-Card Test: model_args={'num_experts': None, 'moe_grouped_gemm': False, 'qk_layernorm': False, 'multi_latent_attention': False, 'qk_l2_norm': False, 'sandwich_norm': False}, TP=2 --- [INFO] 2025-12-04 10:26:50,873 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py:109] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50001 --log_dir=/tmp/pytest-of-jenkins/pytest-12/test_two_cards_cases_model_arg0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/run_infer_transformer_layer.py --batch_size=2 --seq_length=2 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --output_path=/tmp/pytest-of-jenkins/pytest-12/test_two_cards_cases_model_arg0/output_ms.npz --tensor_parallel=2 [INFO] 2025-12-04 10:26:50,873 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py:215] run_test: Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50001 --log_dir=/tmp/pytest-of-jenkins/pytest-12/test_two_cards_cases_model_arg0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/run_infer_transformer_layer.py --batch_size=2 --seq_length=2 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --output_path=/tmp/pytest-of-jenkins/pytest-12/test_two_cards_cases_model_arg0/output_ms.npz --tensor_parallel=2 [INFO] 2025-12-04 10:26:52,423 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. 
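Note on the worker/schedule_group entries: the two-card tasks in this group are dispatched onto free card pairs ([0, 1] through [6, 7]), each with its own master port and a 600 s timeout, and a freed pair is immediately reused for the next queued task; a failure later triggers a global stop that cancels anything not yet started. The following is a rough, self-contained sketch of that dispatch pattern only, with hypothetical names and assumed environment variables; the actual logic lives in test_run_multi_cards_cases.py and may differ.

    import os
    import queue
    import subprocess
    import threading

    # Pool of free (card pair, master port) slots, mirroring the log above.
    slots = queue.Queue()
    for cards, port in ([0, 1], 50000), ([2, 3], 50001), ([4, 5], 50002), ([6, 7], 50003):
        slots.put((cards, port))

    stop_event = threading.Event()  # set on the first failure ("global stop")

    def worker(task_cmd, timeout=600):
        """Run one pytest task on a free card pair; release the pair when done."""
        if stop_event.is_set():
            print(f"Task canceled before start: {task_cmd}")
            return
        cards, port = slots.get()
        env = dict(os.environ,
                   ASCEND_RT_VISIBLE_DEVICES=",".join(map(str, cards)),  # assumed variable
                   ASCEND_PORT_ID=str(port))  # referenced by the Qwen3 test shown below
        try:
            subprocess.run(task_cmd, shell=True, check=True, timeout=timeout, env=env)
            print(f"PASSED: {task_cmd}")
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            print(f"FAILED: {task_cmd}")
            stop_event.set()
        finally:
            slots.put((cards, port))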
[INFO] 2025-12-04 10:26:58,499 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:281] worker: ✅ PASSED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/test_infer_mlp.py | Time: 70.695s [INFO] 2025-12-04 10:26:58,500 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] worker: 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py on cards [4, 5] (port 50002) [Timeout: 600s] [WARNING] 2025-12-04 10:27:00,147 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:27:00,229 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:27:03,934 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:27:03,973 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [WARNING] 2025-12-04 10:27:03,977 [/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. 
See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages [INFO] 2025-12-04 10:27:04,229 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:123] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=65331 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 [INFO] 2025-12-04 10:27:04,230 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:232] run_test: Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=65331 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 [INFO] 2025-12-04 10:27:06,235 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [INFO] 2025-12-04 10:27:07,716 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py:61] test_two_cards_cases: Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50002 --log_dir=./msrun_log_qwen3 --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 [WARNING] 2025-12-04 10:27:13,740 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:27:13,772 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. 
Refrain from using this package or pin to Setuptools<81. [INFO] 2025-12-04 10:27:18,904 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:283] worker: ❌ FAILED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py (exit code 1) | Time: 20.404s [INFO] 2025-12-04 10:27:18,904 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:285] worker: Output: ============================= test session starts ============================== platform linux -- Python 3.9.19, pytest-6.2.5, py-1.11.0, pluggy-1.6.0 -- /home/miniconda3/envs/ci39/bin/python3.9 cachedir: .pytest_cache rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases plugins: forked-1.6.0, anyio-4.9.0, mock-3.14.1, timeout-2.2.0, hydra-core-1.3.2, xdist-1.32.0 collecting ... 2025-12-04 10:27:03,934 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-12-04 10:27:03,973 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages 2025-12-04 10:27:03,977 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages collected 1 item test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py::TestMcoreQwen3ParallelInference::test_two_cards_cases 2025-12-04 10:27:07,716 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py:61] - INFO - Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50002 --log_dir=./msrun_log_qwen3 --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. 
setattr(self, word, getattr(machar, word).flat[0]) /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. return self._float_to_str(self.smallest_subnormal) /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. setattr(self, word, getattr(machar, word).flat[0]) /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. return self._float_to_str(self.smallest_subnormal) Start scheduler process, log file:./msrun_log_qwen3/scheduler.log. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 Start worker process with rank id:0, log file:./msrun_log_qwen3/worker_0.log. Environment variable [RANK_ID=0] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 Start worker process with rank id:1, log file:./msrun_log_qwen3/worker_1.log. Environment variable [RANK_ID=1] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. import pkg_resources [WARNING] ME(1984240:281473122873376,MainProcess):2025-12-04-10:27:12.413.979 [mindspore/parallel/cluster/process_entity/_api.py:268] Distributed job is spawned. Waiting all processes to exit... /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. import pkg_resources Traceback (most recent call last): File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py", line 23, in from transformers import AutoTokenizer File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/__init__.py", line 27, in from . 
import dependency_versions_check File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/dependency_versions_check.py", line 16, in from .utils.versions import require_version, require_version_core File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/__init__.py", line 24, in from .auto_docstring import ( File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/auto_docstring.py", line 30, in from .generic import ModelOutput File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/generic.py", line 465, in _torch_pytree.register_pytree_node( AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node' Traceback (most recent call last): File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py", line 23, in from transformers import AutoTokenizer File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/__init__.py", line 27, in from . import dependency_versions_check File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/dependency_versions_check.py", line 16, in from .utils.versions import require_version, require_version_core File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/__init__.py", line 24, in from .auto_docstring import ( File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/auto_docstring.py", line 30, in from .generic import ModelOutput File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/generic.py", line 465, in _torch_pytree.register_pytree_node( AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node' [ERROR] ME(1984240:281473122873376,MainProcess):2025-12-04-10:27:15.417.674 [mindspore/parallel/cluster/process_entity/_api.py:358] Worker process 1984718 exit with exception. Error code: 1. [WARNING] ME(1984240:281473122873376,MainProcess):2025-12-04-10:27:15.417.820 [mindspore/parallel/cluster/process_entity/_api.py:363] There's worker exits with exception, kill all other workers. [ERROR] ME(1984240:281473122873376,MainProcess):2025-12-04-10:27:15.418.085 [mindspore/parallel/cluster/process_entity/_api.py:378] Scheduler process 1984716 exit with exception. /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. import pkg_resources Traceback (most recent call last): File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py", line 23, in from transformers import AutoTokenizer File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/__init__.py", line 27, in from . 
import dependency_versions_check File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/dependency_versions_check.py", line 16, in -- from .auto_docstring import ( File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/auto_docstring.py", line 30, in from .generic import ModelOutput File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/generic.py", line 465, in _torch_pytree.register_pytree_node( AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node' Traceback (most recent call last): File "/home/jenkins/anaconda3/envs/ci39/bin/msrun", line 8, in sys.exit(main()) File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 191, in main run(args) File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 185, in run process_manager.run() File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 269, in run self.join_processes() File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 382, in join_processes raise RuntimeError("Distributed job exited with exception. Please check logs in " RuntimeError: Distributed job exited with exception. Please check logs in directory: ./msrun_log_qwen3. FAILED =================================== FAILURES =================================== _____________ TestMcoreQwen3ParallelInference.test_two_cards_cases _____________ self = @pytest.mark.level0 def test_two_cards_cases(self): """Test two cards for Qwen3.""" port_id = int(os.environ.get("ASCEND_PORT_ID", random.randint(50000, 65535))) cmd_list = [ "msrun", "--worker_num=2", "--local_worker_num=2", # Should match NPU cards available f"--master_port={port_id}", # Ensure port is unique per test run if parallelized at pytest level "--log_dir=./msrun_log_qwen3", "--join=True"] cmd_list += [ str(self.run_script_path), "--device_num=2" ] cmd = " ".join(cmd_list) logger.info(f"Running command: {cmd}") return_code = os.system(cmd) > assert return_code == 0, "Qwen3 inference st failed." E AssertionError: Qwen3 inference st failed. 
E assert 256 == 0 E +256 E -0 test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py:63: AssertionError =========================== short test summary info ============================ FAILED test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py::TestMcoreQwen3ParallelInference::test_two_cards_cases ======================= 1 failed, 22 warnings in 17.85s ======================== [ERROR] 2025-12-04 10:27:18,905 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:288] worker: 🔥 Triggering global stop due to task failure [INFO] 2025-12-04 10:27:18,905 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:299] worker: 🛑 Task interrupted: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py [INFO] 2025-12-04 10:27:18,905 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:241] worker: 🛑 Task canceled before start: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3_moe/test_qwen3_moe_infer/test_qwen3_moe_infer.py [ERROR] 2025-12-04 10:27:18,905 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:464] schedule_group: ❌ Task failed: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py - Command 'pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py' returned non-zero exit status 1. 
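Note on the Qwen3 failure above: it is an environment mismatch rather than a model bug. The installed transformers calls torch.utils._pytree.register_pytree_node, which only newer torch releases provide (older ones expose the underscore-prefixed _register_pytree_node), so run_qwen3.py dies at import time in every worker and msrun reports a non-zero exit. A small pre-flight check along these lines would surface the mismatch before the distributed job is spawned; it is a diagnostic sketch, not part of the test suite.

    # Fail fast if torch and transformers disagree on the pytree registration API.
    import torch
    import torch.utils._pytree as pytree

    if not hasattr(pytree, "register_pytree_node"):
        raise RuntimeError(
            f"torch {torch.__version__} has no torch.utils._pytree.register_pytree_node; "
            "upgrade torch (or pin an older transformers) before running run_qwen3.py"
        )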
[INFO] 2025-12-04 10:27:19,362 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:281] worker: ✅ PASSED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py | Time: 91.557s [INFO] 2025-12-04 10:27:30,596 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py:225] test_two_cards_cases: --- Running Multi-Card Test: model_args={'num_experts': None, 'moe_grouped_gemm': False, 'qk_layernorm': True, 'multi_latent_attention': False, 'qk_l2_norm': False, 'sandwich_norm': False}, TP=2 --- [INFO] 2025-12-04 10:27:30,596 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py:109] build_msrun_command_list: Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50001 --log_dir=/tmp/pytest-of-jenkins/pytest-12/test_two_cards_cases_model_arg1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/run_infer_transformer_layer.py --batch_size=2 --seq_length=2 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=true --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --output_path=/tmp/pytest-of-jenkins/pytest-12/test_two_cards_cases_model_arg1/output_ms.npz --tensor_parallel=2 [INFO] 2025-12-04 10:27:30,596 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py:215] run_test: Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50001 --log_dir=/tmp/pytest-of-jenkins/pytest-12/test_two_cards_cases_model_arg1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/run_infer_transformer_layer.py --batch_size=2 --seq_length=2 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=true --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --output_path=/tmp/pytest-of-jenkins/pytest-12/test_two_cards_cases_model_arg1/output_ms.npz --tensor_parallel=2 [WARNING] 2025-12-04 10:27:39,713 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:27:39,889 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. 
Refrain from using this package or pin to Setuptools<81. [INFO] 2025-12-04 10:27:46,199 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [INFO] 2025-12-04 10:28:12,486 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:281] worker: ✅ PASSED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py | Time: 91.143s [INFO] 2025-12-04 10:28:17,876 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:283] worker: ❌ FAILED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py (exit code 1) | Time: 150.073s [INFO] 2025-12-04 10:28:17,876 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:285] worker: Output: ============================= test session starts ============================== platform linux -- Python 3.9.19, pytest-6.2.5, py-1.11.0, pluggy-1.6.0 -- /home/miniconda3/envs/ci39/bin/python3.9 cachedir: .pytest_cache rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases plugins: forked-1.6.0, anyio-4.9.0, mock-3.14.1, timeout-2.2.0, hydra-core-1.3.2, xdist-1.32.0 collecting ... 2025-12-04 10:25:53,424 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-12-04 10:25:53,461 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages 2025-12-04 10:25:53,465 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. 
See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages collected 2 items test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py::TestInferGPTModel::test_multi_card_configurations[model_args0-data_keys0-False-1-2] 2025-12-04 10:25:57,264 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:123] - INFO - Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50121 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill 2025-12-04 10:25:57,264 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:232] - INFO - Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50121 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill FAILED test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py::TestInferGPTModel::test_multi_card_configurations[model_args1-data_keys1-False-1-2] 2025-12-04 10:27:04,229 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:123] - INFO - Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=65331 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 
--hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2
2025-12-04 10:27:04,230 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:232] - INFO - Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=65331 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2
FAILED
=================================== FAILURES ===================================
_ TestInferGPTModel.test_multi_card_configurations[model_args0-data_keys0-False-1-2] _

self =
model_args = {'is_prefill': True, 'moe_grouped_gemm': False, 'multi_latent_attention': False, 'num_experts': None, ...}
data_keys = {'output': 'output_standard_layer1'}, expect_error = False
tensor_parallel = 1, pipeline_parallel = 2
tmp_path = PosixPath('/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0')

    @pytest.mark.parametrize(TWO_CARD_TEST_PARAM, TWO_CARD_TEST_CASES)
    @pytest.mark.level0
    def test_multi_card_configurations(self, model_args, data_keys, expect_error,
                                       tensor_parallel, pipeline_parallel, tmp_path):
        """Test two cards with various configurations for GPTModel."""
        num_devices = tensor_parallel * pipeline_parallel
>       self.run_test(
            worker_num=num_devices,
            local_worker_num=num_devices,
            model_args=model_args,
            data_keys=data_keys,
            expect_error=expect_error,
            tmp_path=tmp_path,
            tensor_parallel=tensor_parallel,
            pipeline_parallel=pipeline_parallel,
            port=random.randint(50000, 65535)
        )

test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:244:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:236: in run_test
    self.check_result(output_file_path, model_args, data_keys, cmd_result, expect_error)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
output_file_path = PosixPath('/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz')
model_args = {'is_prefill': True, 'moe_grouped_gemm': False, 'multi_latent_attention': False, 'num_experts': None, ...}
data_keys = {'output': 'output_standard_layer1'}
result = CompletedProcess(args=['msrun', '--worker_num=2', '--local_worker_num=2', '--master_port=50121', '--log_dir=/tmp/pytes...ception. Please check logs in directory: /tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log.\n')
expect_error = False

    def check_result(
            self, output_file_path, model_args, data_keys, result, expect_error
    ):
        """Helper function to check results"""
        if expect_error:
            assert result.returncode != 0, (
                f"Expected an error but test script passed. "
                f"Stdout:\n{result.stdout}\n"
                f"Stderr:\n{result.stderr}"
            )
        else:
>           assert result.returncode == 0, (
                f"Test script failed with non-zero exit code: "
                f"{result.returncode}.\nStdout:\n{result.stdout}\nStderr:\n{result.stderr}"
            )
E AssertionError: Test script failed with non-zero exit code: 1.
E Stdout:
E Start scheduler process, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log/scheduler.log. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill
E Start worker process with rank id:0, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log/worker_0.log. Environment variable [RANK_ID=0] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill
E Start worker process with rank id:1, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log/worker_1.log. Environment variable [RANK_ID=1] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill
E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E setattr(self, word, getattr(machar, word).flat[0])
E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E return self._float_to_str(self.smallest_subnormal)
E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E 2025-12-04 10:26:06,748 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. E 2025-12-04 10:26:06,862 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:10.599.797 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:53028, destination: 127.0.0.1:50121 E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:10.599.944 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:53042, destination: 127.0.0.1:50121 E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:10.600.319 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(1/1200). E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:10.679.466 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:53054, destination: 127.0.0.1:50121 E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:10.679.582 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:53064, destination: 127.0.0.1:50121 E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:10.690.075 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(1/1200). E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:11.100.464 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(2/1200). 
E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:11.190.248 [mindspore/ccsrc/cluster/topology/cluster_context.cc:322] BuildCluster] Cluster is successfully initialized. E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:11.190.321 [mindspore/ccsrc/cluster/topology/cluster_context.cc:431] PostProcess] This node 1 rank id: 1 E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:11.600.669 [mindspore/ccsrc/cluster/topology/cluster_context.cc:322] BuildCluster] Cluster is successfully initialized. E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:11.600.734 [mindspore/ccsrc/cluster/topology/cluster_context.cc:431] PostProcess] This node 0 rank id: 0 E [WARNING] ME(1979741,fffe86fdefa0,python):2025-12-04-10:26:11.999.307 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1979741,fffe6bffefa0,python):2025-12-04-10:26:11.999.690 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup hccl_world_group, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1979741,fffe6bffefa0,python):2025-12-04-10:26:11.999.778 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] ME(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:14.198.462 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:14.198.675 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup hccl_world_group, ranks are [0, 1], default hcclBufferSize: 200 MB. 
E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:14.198.731 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:15.000.852 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group E [WARNING] ME(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:15.000.998 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1979741,fffe6bffefa0,python):2025-12-04-10:26:15.002.513 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group E [WARNING] DEVICE(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:15.002.791 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group pp-0-1 from the meta server node...Retry time: 399/400, sleep 1 E [WARNING] ME(1979741,fffe86fdefa0,python):2025-12-04-10:26:15.002.911 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: hccl_world_group E [WARNING] ME(1979741,fffe86fdefa0,python):2025-12-04-10:26:15.005.038 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1979741,fffe5dcbefa0,python):2025-12-04-10:26:15.005.251 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup pp-0-1, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1979741,fffe5dcbefa0,python):2025-12-04-10:26:15.005.302 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] ME(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:15.503.166 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:15.503.387 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup pp-0-1, ranks are [0, 1], default hcclBufferSize: 200 MB. 
E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:15.503.428 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:15.511.256 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1 E [WARNING] ME(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:15.511.324 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1979741,fffe5dcbefa0,python):2025-12-04-10:26:15.512.421 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1 E [WARNING] ME(1979741,fffe86fdefa0,python):2025-12-04-10:26:15.512.785 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: pp-0-1 E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.474.316 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.562.650 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.838.449 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.893.237 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.974.957 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. 
Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] RUNTIME_FRAMEWORK(1979741,ffffab00b020,python):2025-12-04-10:26:26.196.644 [mindspore/ccsrc/runtime/core/graph_scheduler/base/graph_scheduler.cc:790] BuildAndScheduleGlobalActor] Failed to get DebuggerBackendEnabled, data dump function may not work. E mki_log delete old file:/home/jenkins/ascend/log/atb/atb_1958585_20251203231019.log E [WARNING] ME(1979741,ffffab00b020,python):2025-12-04-10:26:27.048.703 [mindspore/ccsrc/tools/error_handler/error_config.cc:171] operator()] Can find `TRE` in environment var `MS_ENABLE_TFT` E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.318.215 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.403.324 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.672.897 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.729.642 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.803.943 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.865.500 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. 
For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.875.474 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] RUNTIME_FRAMEWORK(1979751,ffff8d01b020,python):2025-12-04-10:26:28.127.987 [mindspore/ccsrc/runtime/core/graph_scheduler/base/graph_scheduler.cc:790] BuildAndScheduleGlobalActor] Failed to get DebuggerBackendEnabled, data dump function may not work. E mki_log delete old file:/home/jenkins/ascend/log/atb/atb_1958671_20251203231019.log E [WARNING] ME(1979751,ffff8d01b020,python):2025-12-04-10:26:28.926.800 [mindspore/ccsrc/tools/error_handler/error_config.cc:171] operator()] Can find `TRE` in environment var `MS_ENABLE_TFT` E [INFO] DISTRIBUTED(1979741,fffed1acefa0,python):2025-12-04-10:26:29.114.964 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_server.cc:222] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1979741,fffed12befa0,python):2025-12-04-10:26:29.114.964 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_client.cc:268] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1979751,fffed2fdefa0,python):2025-12-04-10:26:29.131.135 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_client.cc:268] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1979751,fffed37eefa0,python):2025-12-04-10:26:29.131.135 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_server.cc:222] Start] Event base dispatch success! E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:35.872.311 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:40.872.516 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:45.872.699 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:50.872.830 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:55.872.982 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [ERROR] DISTRIBUTED(1979735,ffff26fdefa0,python):2025-12-04-10:27:00.377.434 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:544] UpdateTopoState] The node: 0 is timed out. It may exit with exception, please check this node's log. E [ERROR] DISTRIBUTED(1979735,ffff26fdefa0,python):2025-12-04-10:27:00.377.491 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:544] UpdateTopoState] The node: 1 is timed out. It may exit with exception, please check this node's log. E [ERROR] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:27:00.873.166 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:95] Finalize] There are 2 abnormal compute graph nodes. 
E Traceback (most recent call last):
E File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 282, in
E main()
E File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 277, in main
E runner = GPTModelRunner(args)
E File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 100, in __init__
E init()
E File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/management.py", line 207, in init
E init_cluster()
E RuntimeError: The total number of timed out node is 2. Timed out node list is: [const vector]{0, 1}, worker 0 is the first one timed out, please check its log.
E
E ----------------------------------------------------
E - C++ Call Stack: (For framework developers)
E ----------------------------------------------------
E mindspore/ccsrc/cluster/topology/meta_server_node.cc:550 UpdateTopoState
E
E Stderr:
E /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E setattr(self, word, getattr(machar, word).flat[0])
E /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E return self._float_to_str(self.smallest_subnormal)
E /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E setattr(self, word, getattr(machar, word).flat[0])
E /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E return self._float_to_str(self.smallest_subnormal)
E [WARNING] ME(1979685:281473581690912,MainProcess):2025-12-04-10:26:01.895.325 [mindspore/parallel/cluster/process_entity/_api.py:268] Distributed job is spawned. Waiting all processes to exit...
E [ERROR] ME(1979685:281473581690912,MainProcess):2025-12-04-10:26:29.924.860 [mindspore/parallel/cluster/process_entity/_api.py:358] Worker process 1979751 exit with exception. Error code: -6.
E [WARNING] ME(1979685:281473581690912,MainProcess):2025-12-04-10:26:29.925.046 [mindspore/parallel/cluster/process_entity/_api.py:363] There's worker exits with exception, kill all other workers.
E [ERROR] ME(1979685:281473581690912,MainProcess):2025-12-04-10:27:02.769.485 [mindspore/parallel/cluster/process_entity/_api.py:378] Scheduler process 1979735 exit with exception.
E Traceback (most recent call last):
E File "/home/jenkins/anaconda3/envs/ci39/bin/msrun", line 8, in
E sys.exit(main())
E File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 191, in main
E run(args)
E File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 185, in run
E process_manager.run()
E File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 269, in run
E self.join_processes()
E File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 382, in join_processes
E raise RuntimeError("Distributed job exited with exception. Please check logs in "
E RuntimeError: Distributed job exited with exception. Please check logs in directory: /tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log.
E
E assert 1 == 0
E +1
E -0

test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:188: AssertionError
_ TestInferGPTModel.test_multi_card_configurations[model_args1-data_keys1-False-1-2] _

self =
model_args = {'is_prefill': False, 'moe_grouped_gemm': False, 'multi_latent_attention': False, 'num_experts': None, ...}
data_keys = {'output': 'output_standard_layer1'}, expect_error = False
tensor_parallel = 1, pipeline_parallel = 2
tmp_path = PosixPath('/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1')

    @pytest.mark.parametrize(TWO_CARD_TEST_PARAM, TWO_CARD_TEST_CASES)
    @pytest.mark.level0
    def test_multi_card_configurations(self, model_args, data_keys, expect_error,
                                       tensor_parallel, pipeline_parallel, tmp_path):
        """Test two cards with various configurations for GPTModel."""
        num_devices = tensor_parallel * pipeline_parallel
>       self.run_test(
            worker_num=num_devices,
            local_worker_num=num_devices,
            model_args=model_args,
            data_keys=data_keys,
            expect_error=expect_error,
            tmp_path=tmp_path,
            tensor_parallel=tensor_parallel,
            pipeline_parallel=pipeline_parallel,
            port=random.randint(50000, 65535)
        )

test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:244:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:236: in run_test
    self.check_result(output_file_path, model_args, data_keys, cmd_result, expect_error)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
output_file_path = PosixPath('/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz')
model_args = {'is_prefill': False, 'moe_grouped_gemm': False, 'multi_latent_attention': False, 'num_experts': None, ...}
data_keys = {'output': 'output_standard_layer1'}
result = CompletedProcess(args=['msrun', '--worker_num=2', '--local_worker_num=2', '--master_port=65331', '--log_dir=/tmp/pytes...ception. Please check logs in directory: /tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log.\n')
expect_error = False

    def check_result(
            self, output_file_path, model_args, data_keys, result, expect_error
    ):
        """Helper function to check results"""
        if expect_error:
            assert result.returncode != 0, (
                f"Expected an error but test script passed. "
                f"Stdout:\n{result.stdout}\n"
                f"Stderr:\n{result.stderr}"
            )
        else:
>           assert result.returncode == 0, (
                f"Test script failed with non-zero exit code: "
                f"{result.returncode}.\nStdout:\n{result.stdout}\nStderr:\n{result.stderr}"
            )
E AssertionError: Test script failed with non-zero exit code: 1.
E Stdout:
E Start scheduler process, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log/scheduler.log. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2
E Start worker process with rank id:0, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log/worker_0.log. Environment variable [RANK_ID=0] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2
E Start worker process with rank id:1, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log/worker_1.log. Environment variable [RANK_ID=1] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2
E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E setattr(self, word, getattr(machar, word).flat[0])
E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E return self._float_to_str(self.smallest_subnormal)
E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E setattr(self, word, getattr(machar, word).flat[0])
E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E 2025-12-04 10:27:13,632 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. E 2025-12-04 10:27:13,740 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.371.222 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:40748, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.381.385 [mindspore/ccsrc/cluster/topology/compute_graph_node.cc:175] Register] Failed to connect to the meta server node url: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.381.427 [mindspore/ccsrc/cluster/topology/compute_graph_node.cc:366] ReconnectWithTimeoutWindow] Failed to register and try to reconnect to the meta server. E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:17.814.248 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:40760, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:17.814.388 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:40776, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:17.814.962 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(1/1200). 
E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.881.859 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:40792, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.881.993 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:40796, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.882.455 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(1/1200). E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:18.315.114 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(2/1200). E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:18.382.625 [mindspore/ccsrc/cluster/topology/cluster_context.cc:322] BuildCluster] Cluster is successfully initialized. E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:18.382.683 [mindspore/ccsrc/cluster/topology/cluster_context.cc:431] PostProcess] This node 0 rank id: 0 E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:18.815.315 [mindspore/ccsrc/cluster/topology/cluster_context.cc:322] BuildCluster] Cluster is successfully initialized. E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:18.815.389 [mindspore/ccsrc/cluster/topology/cluster_context.cc:431] PostProcess] This node 1 rank id: 1 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:20.014.335 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 399/400, sleep 1 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:20.514.531 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 398/400, sleep 2 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:21.014.734 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 397/400, sleep 1 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:21.514.916 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 396/400, sleep 1 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:22.015.130 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 395/400, sleep 2 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:22.515.352 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 394/400, sleep 2 E [WARNING] ME(1984383,fffe8d7aefa0,python):2025-12-04-10:27:22.604.276 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group E [WARNING] 
DEVICE(1984383,fffe6afdefa0,python):2025-12-04-10:27:22.604.521 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup hccl_world_group, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1984383,fffe6afdefa0,python):2025-12-04-10:27:22.604.578 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] ME(1984392,fffeb0f5efa0,python):2025-12-04-10:27:23.015.603 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:23.016.041 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup hccl_world_group, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:23.016.101 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] DEVICE(1984383,fffe6afdefa0,python):2025-12-04-10:27:23.817.951 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:23.818.127 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group E [WARNING] ME(1984383,fffe8d7aefa0,python):2025-12-04-10:27:23.818.223 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: hccl_world_group E [WARNING] ME(1984392,fffeb0f5efa0,python):2025-12-04-10:27:23.818.270 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:23.819.986 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group pp-0-1 from the meta server node...Retry time: 399/400, sleep 1 E [WARNING] ME(1984383,fffe8d7aefa0,python):2025-12-04-10:27:23.820.661 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1984383,fffe6a7cefa0,python):2025-12-04-10:27:23.820.830 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup pp-0-1, ranks are [0, 1], default hcclBufferSize: 200 MB. 
E [WARNING] DEVICE(1984383,fffe6a7cefa0,python):2025-12-04-10:27:23.820.871 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] ME(1984392,fffeb0f5efa0,python):2025-12-04-10:27:24.320.179 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:24.320.378 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup pp-0-1, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:24.320.418 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:24.328.455 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1 E [WARNING] ME(1984392,fffeb0f5efa0,python):2025-12-04-10:27:24.328.525 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1984383,fffe6a7cefa0,python):2025-12-04-10:27:24.328.581 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1 E [WARNING] ME(1984383,fffe8d7aefa0,python):2025-12-04-10:27:24.328.756 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: pp-0-1 E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.095.454 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.182.651 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.348.737 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. 
For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.403.695 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.481.120 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] RUNTIME_FRAMEWORK(1984383,ffff88bfb020,python):2025-12-04-10:27:34.682.071 [mindspore/ccsrc/runtime/core/graph_scheduler/base/graph_scheduler.cc:790] BuildAndScheduleGlobalActor] Failed to get DebuggerBackendEnabled, data dump function may not work. E mki_log delete old file:/home/jenkins/ascend/log/atb/atb_1960063_20251203231051.log E [WARNING] ME(1984383,ffff88bfb020,python):2025-12-04-10:27:35.472.320 [mindspore/ccsrc/tools/error_handler/error_config.cc:171] operator()] Can find `TRE` in environment var `MS_ENABLE_TFT` E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:35.754.682 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:35.841.826 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:36.010.497 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:36.069.421 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. 
For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:36.150.491 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] RUNTIME_FRAMEWORK(1984392,ffff8faeb020,python):2025-12-04-10:27:36.426.180 [mindspore/ccsrc/runtime/core/graph_scheduler/base/graph_scheduler.cc:790] BuildAndScheduleGlobalActor] Failed to get DebuggerBackendEnabled, data dump function may not work. E mki_log delete old file:/home/jenkins/ascend/log/atb/atb_1960077_20251203231052.log E [WARNING] ME(1984392,ffff8faeb020,python):2025-12-04-10:27:37.204.433 [mindspore/ccsrc/tools/error_handler/error_config.cc:171] operator()] Can find `TRE` in environment var `MS_ENABLE_TFT` E [INFO] DISTRIBUTED(1984392,fffef77eefa0,python):2025-12-04-10:27:37.410.844 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_server.cc:222] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1984392,fffef6fdefa0,python):2025-12-04-10:27:37.410.844 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_client.cc:268] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1984383,fffee7ffefa0,python):2025-12-04-10:27:37.442.611 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_client.cc:268] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1984383,fffefce8efa0,python):2025-12-04-10:27:37.442.610 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_server.cc:222] Start] Event base dispatch success! E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:27:47.940.764 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:27:52.940.938 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:27:57.941.053 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:28:02.941.228 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:28:07.941.349 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [ERROR] DISTRIBUTED(1984374,ffff02eeefa0,python):2025-12-04-10:28:08.445.429 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:544] UpdateTopoState] The node: 0 is timed out. It may exit with exception, please check this node's log. E [ERROR] DISTRIBUTED(1984374,ffff02eeefa0,python):2025-12-04-10:28:08.445.468 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:544] UpdateTopoState] The node: 1 is timed out. It may exit with exception, please check this node's log. E [ERROR] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:28:12.941.504 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:95] Finalize] There are 2 abnormal compute graph nodes. 
E Traceback (most recent call last): E File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 282, in E main() E File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 277, in main E runner = GPTModelRunner(args) E File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 100, in __init__ E init() E File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/management.py", line 207, in init E init_cluster() E RuntimeError: The total number of timed out node is 2. Timed out node list is: [const vector]{0, 1}, worker 0 is the first one timed out, please check its log. E E ---------------------------------------------------- E - C++ Call Stack: (For framework developers) E ---------------------------------------------------- E mindspore/ccsrc/cluster/topology/meta_server_node.cc:550 UpdateTopoState E E Stderr: E /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E [WARNING] ME(1984000:281473024897056,MainProcess):2025-12-04-10:27:08.866.030 [mindspore/parallel/cluster/process_entity/_api.py:268] Distributed job is spawned. Waiting all processes to exit... E [ERROR] ME(1984000:281473024897056,MainProcess):2025-12-04-10:27:38.897.721 [mindspore/parallel/cluster/process_entity/_api.py:358] Worker process 1984383 exit with exception. Error code: -6. E [WARNING] ME(1984000:281473024897056,MainProcess):2025-12-04-10:27:38.897.871 [mindspore/parallel/cluster/process_entity/_api.py:363] There's worker exits with exception, kill all other workers. E [ERROR] ME(1984000:281473024897056,MainProcess):2025-12-04-10:28:14.616.902 [mindspore/parallel/cluster/process_entity/_api.py:378] Scheduler process 1984374 exit with exception. 
E Traceback (most recent call last): E File "/home/jenkins/anaconda3/envs/ci39/bin/msrun", line 8, in E sys.exit(main()) E File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 191, in main E run(args) E File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 185, in run E process_manager.run() E File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 269, in run E self.join_processes() E File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 382, in join_processes E raise RuntimeError("Distributed job exited with exception. Please check logs in " E RuntimeError: Distributed job exited with exception. Please check logs in directory: /tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log. E E assert 1 == 0 E +1 E -0 test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:188: AssertionError =========================== short test summary info ============================ FAILED test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py::TestInferGPTModel::test_multi_card_configurations[model_args0-data_keys0-False-1-2] FAILED test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py::TestInferGPTModel::test_multi_card_configurations[model_args1-data_keys1-False-1-2] ================== 2 failed, 22 warnings in 147.44s (0:02:27) ================== [INFO] 2025-12-04 10:28:17,877 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:299] worker: 🛑 Task interrupted: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py [ERROR] 2025-12-04 10:28:17,877 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:464] schedule_group: ❌ Task failed: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py - Command 'pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py' returned non-zero exit status 1. 
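For reference, the "Error code: -6" reported above for worker 1984383 follows the return-code convention of Python's process APIs: a negative value means the worker was killed by a signal rather than exiting on its own, and -6 corresponds to SIGABRT. Once that worker died, neither rank could finish mindspore.communication.init(), so the scheduler's heartbeat check declared both compute graph nodes timed out and msrun raised the RuntimeError shown above. A minimal, illustrative way to decode such a code (not part of the test suite):

    import signal

    worker_code = -6                           # "Error code: -6" from the msrun log above
    # Negative return codes from subprocess/multiprocessing mean death by signal;
    # the magnitude is the signal number.
    print(signal.Signals(-worker_code).name)   # SIGABRT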
[WARNING] 2025-12-04 10:28:18,803 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:478] schedule_group: 🔚 Group execution aborted due to failures
[INFO] 2025-12-04 10:28:18,803 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:370] run: Group 2 completed in 151.001s
[INFO] 2025-12-04 10:28:18,803 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:379] run: Total execution INTERRUPTED in 151.002s
test_run_multi_cards_cases-test_multi_cards_level0_cases-log/log/rank_0/error.log:
[ERROR] 2025-12-04 10:27:18,905 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:288] worker: 🔥 Triggering global stop due to task failure
[ERROR] 2025-12-04 10:27:18,905 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:464] schedule_group: ❌ Task failed: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py - Command 'pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py' returned non-zero exit status 1.
[ERROR] 2025-12-04 10:28:17,877 [/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:464] schedule_group: ❌ Task failed: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py - Command 'pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py' returned non-zero exit status 1.
test_run_multi_cards_cases-test_multi_cards_level0_cases-log/log/rank_1/
test_run_multi_cards_cases-test_multi_cards_level0_cases-log/log/rank_1/info.log:
[WARNING] 2025-12-04 10:26:06,497 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
[WARNING] 2025-12-04 10:26:06,660 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
[WARNING] 2025-12-04 10:26:06,796 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API.
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [WARNING] 2025-12-04 10:26:06,862 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [INFO] 2025-12-04 10:26:12,538 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [INFO] 2025-12-04 10:26:12,670 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [INFO] 2025-12-04 10:26:13,036 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [WARNING] 2025-12-04 10:26:35,714 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [INFO] 2025-12-04 10:26:42,271 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [WARNING] 2025-12-04 10:26:46,386 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [INFO] 2025-12-04 10:26:52,285 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [WARNING] 2025-12-04 10:27:00,281 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. [INFO] 2025-12-04 10:27:05,778 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size. [WARNING] 2025-12-04 10:27:13,632 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
[WARNING] 2025-12-04 10:27:39,765 [/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] _showwarnmsg: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
[INFO] 2025-12-04 10:27:45,737 [mindformers/parallel_core/inference/parallel_state.py:297] initialize_moe_model_parallel: expert_model_parallel_size(1) is not equal to tensor_and_data_parallel_size(2), so we will use 2 as the MOE_tensor_parallel_size.
test_run_multi_cards_cases-test_multi_cards_level0_cases-log/log/rank_1/error.log: (empty)
test_run_multi_cards_cases-test_multi_cards_level0_cases-log/test.log:
============================= test session starts ==============================
platform linux -- Python 3.9.19, pytest-6.2.5, py-1.11.0, pluggy-1.6.0
rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases, configfile: ../../../../../../../../sault/virtual_test/virtualenv_0013/sault/config/pytest.ini
plugins: forked-1.6.0, anyio-4.9.0, mock-3.14.1, timeout-2.2.0, hydra-core-1.3.2, xdist-1.32.0
2025-12-04 10:25:44,037 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
2025-12-04 10:25:44,076 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
2025-12-04 10:25:44,080 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
collected 1 item
test_run_multi_cards_cases.py 2025-12-04 10:25:47,721 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:557] - INFO - 🔍 Start importing test cases for level: level0 ...
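The -m 'level0' filter that appears in every scheduled pytest command selects only tests carrying the matching marker; the test bodies quoted later in this log are decorated with @pytest.mark.level0. A minimal sketch of such a marker-gated test, assuming the marker is registered in the pytest.ini referenced above:

    import pytest

    @pytest.mark.level0
    def test_smoke():
        # Collected by `pytest -m 'level0'`; deselected by any other -m expression.
        assert True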
2025-12-04 10:25:47,801 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:562] - INFO - ✅ Finished importing 13 test cases for level: level0. Group types: ['TaskType.EIGHT_CARDS_TASK', 'TaskType.FOUR_CARDS_TASK', 'TaskType.TWO_CARDS_TASK'] 2025-12-04 10:25:47,801 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:566] - INFO - - Group TaskType.EIGHT_CARDS_TASK: 1 tasks 2025-12-04 10:25:47,801 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:566] - INFO - - Group TaskType.FOUR_CARDS_TASK: 5 tasks 2025-12-04 10:25:47,801 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:566] - INFO - - Group TaskType.TWO_CARDS_TASK: 7 tasks 2025-12-04 10:25:47,802 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:365] - INFO - === Processing Group: 2 (7 tasks) === 2025-12-04 10:25:47,802 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:445] - INFO - 📝 Task execution plan: 2025-12-04 10:25:47,802 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] - INFO - [1] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) 2025-12-04 10:25:47,802 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] - INFO - [2] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_attention/test_self_attention/test_infer_self_attention.py (type: TaskType.TWO_CARDS_TASK, est. 
time: 300s) 2025-12-04 10:25:47,802 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] - INFO - [3] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/test_infer_mlp.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) 2025-12-04 10:25:47,802 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] - INFO - [4] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) 2025-12-04 10:25:47,803 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] - INFO - [5] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) 2025-12-04 10:25:47,803 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] - INFO - [6] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py (type: TaskType.TWO_CARDS_TASK, est. time: 300s) 2025-12-04 10:25:47,803 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:447] - INFO - [7] pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3_moe/test_qwen3_moe_infer/test_qwen3_moe_infer.py (type: TaskType.TWO_CARDS_TASK, est. 
time: 300s) 2025-12-04 10:25:47,804 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] - INFO - 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py on cards [0, 1] (port 50000) [Timeout: 600s] 2025-12-04 10:25:47,804 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] - INFO - 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_attention/test_self_attention/test_infer_self_attention.py on cards [2, 3] (port 50001) [Timeout: 600s] 2025-12-04 10:25:47,805 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] - INFO - 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/test_infer_mlp.py on cards [4, 5] (port 50002) [Timeout: 600s] 2025-12-04 10:25:47,805 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] - INFO - 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py on cards [6, 7] (port 50003) [Timeout: 600s] 2025-12-04 10:26:41,342 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:281] - INFO - ✅ PASSED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_attention/test_self_attention/test_infer_self_attention.py | Time: 53.538s 2025-12-04 10:26:41,343 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] - INFO - 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py on cards [2, 3] (port 50001) [Timeout: 600s] 2025-12-04 10:26:58,499 - 
mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:281] - INFO - ✅ PASSED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_mlp/test_infer_mlp.py | Time: 70.695s 2025-12-04 10:26:58,500 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:258] - INFO - 🏃 Running: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py on cards [4, 5] (port 50002) [Timeout: 600s] 2025-12-04 10:27:18,904 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:283] - INFO - ❌ FAILED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py (exit code 1) | Time: 20.404s 2025-12-04 10:27:18,904 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:285] - INFO - Output: ============================= test session starts ============================== platform linux -- Python 3.9.19, pytest-6.2.5, py-1.11.0, pluggy-1.6.0 -- /home/miniconda3/envs/ci39/bin/python3.9 cachedir: .pytest_cache rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases plugins: forked-1.6.0, anyio-4.9.0, mock-3.14.1, timeout-2.2.0, hydra-core-1.3.2, xdist-1.32.0 collecting ... 2025-12-04 10:27:03,934 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-12-04 10:27:03,973 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. 
See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages 2025-12-04 10:27:03,977 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages collected 1 item test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py::TestMcoreQwen3ParallelInference::test_two_cards_cases 2025-12-04 10:27:07,716 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py:61] - INFO - Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50002 --log_dir=./msrun_log_qwen3 --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. setattr(self, word, getattr(machar, word).flat[0]) /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. return self._float_to_str(self.smallest_subnormal) /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. setattr(self, word, getattr(machar, word).flat[0]) /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. return self._float_to_str(self.smallest_subnormal) Start scheduler process, log file:./msrun_log_qwen3/scheduler.log. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 Start worker process with rank id:0, log file:./msrun_log_qwen3/worker_0.log. Environment variable [RANK_ID=0] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 Start worker process with rank id:1, log file:./msrun_log_qwen3/worker_1.log. Environment variable [RANK_ID=1] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py --device_num=2 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. import pkg_resources [WARNING] ME(1984240:281473122873376,MainProcess):2025-12-04-10:27:12.413.979 [mindspore/parallel/cluster/process_entity/_api.py:268] Distributed job is spawned. 
Waiting all processes to exit... /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. import pkg_resources Traceback (most recent call last): File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py", line 23, in from transformers import AutoTokenizer File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/__init__.py", line 27, in from . import dependency_versions_check File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/dependency_versions_check.py", line 16, in from .utils.versions import require_version, require_version_core File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/__init__.py", line 24, in from .auto_docstring import ( File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/auto_docstring.py", line 30, in from .generic import ModelOutput File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/generic.py", line 465, in _torch_pytree.register_pytree_node( AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node' Traceback (most recent call last): File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py", line 23, in from transformers import AutoTokenizer File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/__init__.py", line 27, in from . import dependency_versions_check File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/dependency_versions_check.py", line 16, in from .utils.versions import require_version, require_version_core File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/__init__.py", line 24, in from .auto_docstring import ( File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/auto_docstring.py", line 30, in from .generic import ModelOutput File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/generic.py", line 465, in _torch_pytree.register_pytree_node( AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node' [ERROR] ME(1984240:281473122873376,MainProcess):2025-12-04-10:27:15.417.674 [mindspore/parallel/cluster/process_entity/_api.py:358] Worker process 1984718 exit with exception. Error code: 1. [WARNING] ME(1984240:281473122873376,MainProcess):2025-12-04-10:27:15.417.820 [mindspore/parallel/cluster/process_entity/_api.py:363] There's worker exits with exception, kill all other workers. [ERROR] ME(1984240:281473122873376,MainProcess):2025-12-04-10:27:15.418.085 [mindspore/parallel/cluster/process_entity/_api.py:378] Scheduler process 1984716 exit with exception. /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
import pkg_resources Traceback (most recent call last): File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/run_qwen3.py", line 23, in from transformers import AutoTokenizer File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/__init__.py", line 27, in from . import dependency_versions_check File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/dependency_versions_check.py", line 16, in -- from .auto_docstring import ( File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/auto_docstring.py", line 30, in from .generic import ModelOutput File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/transformers/utils/generic.py", line 465, in _torch_pytree.register_pytree_node( AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node' Traceback (most recent call last): File "/home/jenkins/anaconda3/envs/ci39/bin/msrun", line 8, in sys.exit(main()) File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 191, in main run(args) File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 185, in run process_manager.run() File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 269, in run self.join_processes() File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 382, in join_processes raise RuntimeError("Distributed job exited with exception. Please check logs in " RuntimeError: Distributed job exited with exception. Please check logs in directory: ./msrun_log_qwen3. FAILED =================================== FAILURES =================================== _____________ TestMcoreQwen3ParallelInference.test_two_cards_cases _____________ self = @pytest.mark.level0 def test_two_cards_cases(self): """Test two cards for Qwen3.""" port_id = int(os.environ.get("ASCEND_PORT_ID", random.randint(50000, 65535))) cmd_list = [ "msrun", "--worker_num=2", "--local_worker_num=2", # Should match NPU cards available f"--master_port={port_id}", # Ensure port is unique per test run if parallelized at pytest level "--log_dir=./msrun_log_qwen3", "--join=True"] cmd_list += [ str(self.run_script_path), "--device_num=2" ] cmd = " ".join(cmd_list) logger.info(f"Running command: {cmd}") return_code = os.system(cmd) > assert return_code == 0, "Qwen3 inference st failed." E AssertionError: Qwen3 inference st failed. 
E assert 256 == 0 E +256 E -0 test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py:63: AssertionError =========================== short test summary info ============================ FAILED test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py::TestMcoreQwen3ParallelInference::test_two_cards_cases ======================= 1 failed, 22 warnings in 17.85s ======================== 2025-12-04 10:27:18,905 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:288] - ERROR - 🔥 Triggering global stop due to task failure 2025-12-04 10:27:18,905 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:299] - INFO - 🛑 Task interrupted: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py 2025-12-04 10:27:18,905 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:241] - INFO - 🛑 Task canceled before start: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3_moe/test_qwen3_moe_infer/test_qwen3_moe_infer.py 2025-12-04 10:27:18,905 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:464] - ERROR - ❌ Task failed: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py - Command 'pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_model/test_qwen3/test_qwen3_infer/test_qwen3_infer.py' returned non-zero exit status 1. 
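The "assert 256 == 0" mismatch above comes from how os.system reports failures on POSIX: it returns the raw 16-bit wait status rather than the exit code, so a child that exits with code 1 is reported as 256. A small sketch of decoding it (Python 3.9+):

    import os

    status = os.system("exit 1")                # raw wait status, not the exit code
    print(status)                               # 256 on POSIX
    print(os.waitstatus_to_exitcode(status))    # 1

The underlying failure in this task is the repeated AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node' raised while importing transformers. That attribute only exists in newer torch releases (older builds expose the private _register_pytree_node), so the installed transformers version appears to expect a newer torch than the one in this environment. A quick probe for the mismatch, assuming torch is importable:

    import torch.utils._pytree as pytree

    # True on torch builds new enough for this transformers release, False otherwise.
    print(hasattr(pytree, "register_pytree_node"))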
2025-12-04 10:27:19,362 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:281] - INFO - ✅ PASSED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_block/test_infer_transformer_block.py | Time: 91.557s 2025-12-04 10:28:12,486 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:281] - INFO - ✅ PASSED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_transformer/test_transformer_layer/test_infer_transformer_layer.py | Time: 91.143s 2025-12-04 10:28:17,876 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:283] - INFO - ❌ FAILED: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py (exit code 1) | Time: 150.073s 2025-12-04 10:28:17,876 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:285] - INFO - Output: ============================= test session starts ============================== platform linux -- Python 3.9.19, pytest-6.2.5, py-1.11.0, pluggy-1.6.0 -- /home/miniconda3/envs/ci39/bin/python3.9 cachedir: .pytest_cache rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases plugins: forked-1.6.0, anyio-4.9.0, mock-3.14.1, timeout-2.2.0, hydra-core-1.3.2, xdist-1.32.0 collecting ... 2025-12-04 10:25:53,424 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-12-04 10:25:53,461 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. 
See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages 2025-12-04 10:25:53,465 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/miniconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages collected 2 items test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py::TestInferGPTModel::test_multi_card_configurations[model_args0-data_keys0-False-1-2] 2025-12-04 10:25:57,264 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:123] - INFO - Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=50121 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill 2025-12-04 10:25:57,264 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:232] - INFO - Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=50121 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill FAILED test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py::TestInferGPTModel::test_multi_card_configurations[model_args1-data_keys1-False-1-2] 2025-12-04 10:27:04,229 - 
mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:123] - INFO - Equivalent shell command for debugging (approximate): msrun --worker_num=2 --local_worker_num=2 --master_port=65331 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 2025-12-04 10:27:04,230 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:232] - INFO - Running command: msrun --worker_num=2 --local_worker_num=2 --master_port=65331 --log_dir=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log --join=True /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 FAILED =================================== FAILURES =================================== _ TestInferGPTModel.test_multi_card_configurations[model_args0-data_keys0-False-1-2] _ self = model_args = {'is_prefill': True, 'moe_grouped_gemm': False, 'multi_latent_attention': False, 'num_experts': None, ...} data_keys = {'output': 'output_standard_layer1'}, expect_error = False tensor_parallel = 1, pipeline_parallel = 2 tmp_path = PosixPath('/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0') @pytest.mark.parametrize(TWO_CARD_TEST_PARAM, TWO_CARD_TEST_CASES) @pytest.mark.level0 def test_multi_card_configurations(self, model_args, data_keys, expect_error, tensor_parallel, pipeline_parallel, tmp_path): """Test two cards with various configurations for GPTModel.""" num_devices = tensor_parallel * pipeline_parallel > self.run_test( worker_num=num_devices, local_worker_num=num_devices, model_args=model_args, data_keys=data_keys, expect_error=expect_error, tmp_path=tmp_path, tensor_parallel=tensor_parallel, pipeline_parallel=pipeline_parallel, port=random.randint(50000, 65535) ) test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:244: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:236: in run_test self.check_result(output_file_path, model_args, data_keys, cmd_result, expect_error) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = output_file_path = PosixPath('/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz') model_args = {'is_prefill': True, 'moe_grouped_gemm': False, 'multi_latent_attention': False, 'num_experts': None, ...} data_keys = {'output': 'output_standard_layer1'} result = CompletedProcess(args=['msrun', '--worker_num=2', '--local_worker_num=2', '--master_port=50121', '--log_dir=/tmp/pytes...ception. Please check logs in directory: /tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log.\n') expect_error = False def check_result( self, output_file_path, model_args, data_keys, result, expect_error ): """Helper function to check results""" if expect_error: assert result.returncode != 0, ( f"Expected an error but test script passed. " f"Stdout:\n{result.stdout}\n" f"Stderr:\n{result.stderr}" ) else: > assert result.returncode == 0, ( f"Test script failed with non-zero exit code: " f"{result.returncode}.\nStdout:\n{result.stdout}\nStderr:\n{result.stderr}" ) E AssertionError: Test script failed with non-zero exit code: 1. E Stdout: E Start scheduler process, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log/scheduler.log. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill E Start worker process with rank id:0, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log/worker_0.log. Environment variable [RANK_ID=0] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill E Start worker process with rank id:1, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log/worker_1.log. Environment variable [RANK_ID=1] is exported. 
Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 --is_prefill E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E 2025-12-04 10:26:06,748 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. E 2025-12-04 10:26:06,862 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:10.599.797 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:53028, destination: 127.0.0.1:50121 E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:10.599.944 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:53042, destination: 127.0.0.1:50121 E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:10.600.319 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(1/1200). E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:10.679.466 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:53054, destination: 127.0.0.1:50121 E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:10.679.582 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:53064, destination: 127.0.0.1:50121 E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:10.690.075 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(1/1200). E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:11.100.464 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(2/1200). E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:11.190.248 [mindspore/ccsrc/cluster/topology/cluster_context.cc:322] BuildCluster] Cluster is successfully initialized. E [WARNING] DISTRIBUTED(1979751,ffff8d01b020,python):2025-12-04-10:26:11.190.321 [mindspore/ccsrc/cluster/topology/cluster_context.cc:431] PostProcess] This node 1 rank id: 1 E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:11.600.669 [mindspore/ccsrc/cluster/topology/cluster_context.cc:322] BuildCluster] Cluster is successfully initialized. E [WARNING] DISTRIBUTED(1979741,ffffab00b020,python):2025-12-04-10:26:11.600.734 [mindspore/ccsrc/cluster/topology/cluster_context.cc:431] PostProcess] This node 0 rank id: 0 E [WARNING] ME(1979741,fffe86fdefa0,python):2025-12-04-10:26:11.999.307 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1979741,fffe6bffefa0,python):2025-12-04-10:26:11.999.690 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup hccl_world_group, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1979741,fffe6bffefa0,python):2025-12-04-10:26:11.999.778 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] ME(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:14.198.462 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:14.198.675 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup hccl_world_group, ranks are [0, 1], default hcclBufferSize: 200 MB. 
E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:14.198.731 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:15.000.852 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group E [WARNING] ME(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:15.000.998 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1979741,fffe6bffefa0,python):2025-12-04-10:26:15.002.513 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group E [WARNING] DEVICE(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:15.002.791 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group pp-0-1 from the meta server node...Retry time: 399/400, sleep 1 E [WARNING] ME(1979741,fffe86fdefa0,python):2025-12-04-10:26:15.002.911 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: hccl_world_group E [WARNING] ME(1979741,fffe86fdefa0,python):2025-12-04-10:26:15.005.038 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1979741,fffe5dcbefa0,python):2025-12-04-10:26:15.005.251 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup pp-0-1, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1979741,fffe5dcbefa0,python):2025-12-04-10:26:15.005.302 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] ME(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:15.503.166 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:15.503.387 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup pp-0-1, ranks are [0, 1], default hcclBufferSize: 200 MB. 
E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:15.503.428 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] DEVICE(1979751,fffe5cfedfa0,python):2025-12-04-10:26:15.511.256 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1 E [WARNING] ME(1979751,fffe5d7fdfa0,python):2025-12-04-10:26:15.511.324 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1979741,fffe5dcbefa0,python):2025-12-04-10:26:15.512.421 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1 E [WARNING] ME(1979741,fffe86fdefa0,python):2025-12-04-10:26:15.512.785 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: pp-0-1 E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.474.316 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.562.650 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.838.449 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.893.237 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979741,ffffab00b020,python):2025-12-04-10:26:25.974.957 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. 
Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] RUNTIME_FRAMEWORK(1979741,ffffab00b020,python):2025-12-04-10:26:26.196.644 [mindspore/ccsrc/runtime/core/graph_scheduler/base/graph_scheduler.cc:790] BuildAndScheduleGlobalActor] Failed to get DebuggerBackendEnabled, data dump function may not work. E mki_log delete old file:/home/jenkins/ascend/log/atb/atb_1958585_20251203231019.log E [WARNING] ME(1979741,ffffab00b020,python):2025-12-04-10:26:27.048.703 [mindspore/ccsrc/tools/error_handler/error_config.cc:171] operator()] Can find `TRE` in environment var `MS_ENABLE_TFT` E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.318.215 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.403.324 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.672.897 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.729.642 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.803.943 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.865.500 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. 
For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1979751,ffff8d01b020,python):2025-12-04-10:26:27.875.474 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] RUNTIME_FRAMEWORK(1979751,ffff8d01b020,python):2025-12-04-10:26:28.127.987 [mindspore/ccsrc/runtime/core/graph_scheduler/base/graph_scheduler.cc:790] BuildAndScheduleGlobalActor] Failed to get DebuggerBackendEnabled, data dump function may not work. E mki_log delete old file:/home/jenkins/ascend/log/atb/atb_1958671_20251203231019.log E [WARNING] ME(1979751,ffff8d01b020,python):2025-12-04-10:26:28.926.800 [mindspore/ccsrc/tools/error_handler/error_config.cc:171] operator()] Can find `TRE` in environment var `MS_ENABLE_TFT` E [INFO] DISTRIBUTED(1979741,fffed1acefa0,python):2025-12-04-10:26:29.114.964 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_server.cc:222] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1979741,fffed12befa0,python):2025-12-04-10:26:29.114.964 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_client.cc:268] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1979751,fffed2fdefa0,python):2025-12-04-10:26:29.131.135 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_client.cc:268] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1979751,fffed37eefa0,python):2025-12-04-10:26:29.131.135 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_server.cc:222] Start] Event base dispatch success! E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:35.872.311 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:40.872.516 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:45.872.699 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:50.872.830 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:26:55.872.982 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [ERROR] DISTRIBUTED(1979735,ffff26fdefa0,python):2025-12-04-10:27:00.377.434 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:544] UpdateTopoState] The node: 0 is timed out. It may exit with exception, please check this node's log. E [ERROR] DISTRIBUTED(1979735,ffff26fdefa0,python):2025-12-04-10:27:00.377.491 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:544] UpdateTopoState] The node: 1 is timed out. It may exit with exception, please check this node's log. E [ERROR] DISTRIBUTED(1979735,ffffb5ddb020,python):2025-12-04-10:27:00.873.166 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:95] Finalize] There are 2 abnormal compute graph nodes. 
E       Traceback (most recent call last):
E         File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 282, in <module>
E           main()
E         File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 277, in main
E           runner = GPTModelRunner(args)
E         File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 100, in __init__
E           init()
E         File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/management.py", line 207, in init
E           init_cluster()
E       RuntimeError: The total number of timed out node is 2. Timed out node list is: [const vector]{0, 1}, worker 0 is the first one timed out, please check its log.
E
E       ----------------------------------------------------
E       - C++ Call Stack: (For framework developers)
E       ----------------------------------------------------
E       mindspore/ccsrc/cluster/topology/meta_server_node.cc:550 UpdateTopoState
E
E       Stderr:
E       /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E         setattr(self, word, getattr(machar, word).flat[0])
E       /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E         return self._float_to_str(self.smallest_subnormal)
E       /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E         setattr(self, word, getattr(machar, word).flat[0])
E       /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E         return self._float_to_str(self.smallest_subnormal)
E       [WARNING] ME(1979685:281473581690912,MainProcess):2025-12-04-10:26:01.895.325 [mindspore/parallel/cluster/process_entity/_api.py:268] Distributed job is spawned. Waiting all processes to exit...
E       [ERROR] ME(1979685:281473581690912,MainProcess):2025-12-04-10:26:29.924.860 [mindspore/parallel/cluster/process_entity/_api.py:358] Worker process 1979751 exit with exception. Error code: -6.
E       [WARNING] ME(1979685:281473581690912,MainProcess):2025-12-04-10:26:29.925.046 [mindspore/parallel/cluster/process_entity/_api.py:363] There's worker exits with exception, kill all other workers.
E       [ERROR] ME(1979685:281473581690912,MainProcess):2025-12-04-10:27:02.769.485 [mindspore/parallel/cluster/process_entity/_api.py:378] Scheduler process 1979735 exit with exception.
E       Traceback (most recent call last):
E         File "/home/jenkins/anaconda3/envs/ci39/bin/msrun", line 8, in <module>
E           sys.exit(main())
E         File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 191, in main
E           run(args)
E         File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 185, in run
E           process_manager.run()
E         File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 269, in run
E           self.join_processes()
E         File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 382, in join_processes
E           raise RuntimeError("Distributed job exited with exception. Please check logs in "
E       RuntimeError: Distributed job exited with exception. Please check logs in directory: /tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations0/msrun_log.
E
E       assert 1 == 0
E         +1
E         -0

test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:188: AssertionError
_ TestInferGPTModel.test_multi_card_configurations[model_args1-data_keys1-False-1-2] _

self =
model_args = {'is_prefill': False, 'moe_grouped_gemm': False, 'multi_latent_attention': False, 'num_experts': None, ...}
data_keys = {'output': 'output_standard_layer1'}, expect_error = False
tensor_parallel = 1, pipeline_parallel = 2
tmp_path = PosixPath('/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1')

    @pytest.mark.parametrize(TWO_CARD_TEST_PARAM, TWO_CARD_TEST_CASES)
    @pytest.mark.level0
    def test_multi_card_configurations(self, model_args, data_keys, expect_error,
                                       tensor_parallel, pipeline_parallel, tmp_path):
        """Test two cards with various configurations for GPTModel."""
        num_devices = tensor_parallel * pipeline_parallel
>       self.run_test(
            worker_num=num_devices, local_worker_num=num_devices,
            model_args=model_args, data_keys=data_keys,
            expect_error=expect_error, tmp_path=tmp_path,
            tensor_parallel=tensor_parallel, pipeline_parallel=pipeline_parallel,
            port=random.randint(50000, 65535)
        )

test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:244:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:236: in run_test
    self.check_result(output_file_path, model_args, data_keys, cmd_result, expect_error)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
output_file_path = PosixPath('/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz')
model_args = {'is_prefill': False, 'moe_grouped_gemm': False, 'multi_latent_attention': False, 'num_experts': None, ...}
data_keys = {'output': 'output_standard_layer1'}
result = CompletedProcess(args=['msrun', '--worker_num=2', '--local_worker_num=2', '--master_port=65331', '--log_dir=/tmp/pytes...ception. Please check logs in directory: /tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log.\n')
expect_error = False

    def check_result(
            self, output_file_path, model_args, data_keys, result, expect_error
    ):
        """Helper function to check results"""
        if expect_error:
            assert result.returncode != 0, (
                f"Expected an error but test script passed. 
" f"Stdout:\n{result.stdout}\n" f"Stderr:\n{result.stderr}" ) else: > assert result.returncode == 0, ( f"Test script failed with non-zero exit code: " f"{result.returncode}.\nStdout:\n{result.stdout}\nStderr:\n{result.stderr}" ) E AssertionError: Test script failed with non-zero exit code: 1. E Stdout: E Start scheduler process, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log/scheduler.log. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 E Start worker process with rank id:0, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log/worker_0.log. Environment variable [RANK_ID=0] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 E Start worker process with rank id:1, log file:/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log/worker_1.log. Environment variable [RANK_ID=1] is exported. Execute command: python /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py --batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 --ffn_hidden_size=64 --num_attention_heads=2 --moe_grouped_gemm=false --qk_layernorm=false --multi_latent_attention=false --qk_l2_norm=false --sandwich_norm=false --num_layers=2 --output_path=/tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/output_ms.npz --tensor_parallel=1 --pipeline_parallel=2 E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. 
E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. E setattr(self, word, getattr(machar, word).flat[0]) E /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. E return self._float_to_str(self.smallest_subnormal) E 2025-12-04 10:27:13,632 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. E 2025-12-04 10:27:13,740 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/anaconda3/envs/ci39/lib/python3.9/warnings.py:109] - WARNING - UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.371.222 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:40748, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.381.385 [mindspore/ccsrc/cluster/topology/compute_graph_node.cc:175] Register] Failed to connect to the meta server node url: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.381.427 [mindspore/ccsrc/cluster/topology/compute_graph_node.cc:366] ReconnectWithTimeoutWindow] Failed to register and try to reconnect to the meta server. E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:17.814.248 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:40760, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:17.814.388 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:40776, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:17.814.962 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(1/1200). 
E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.881.859 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:40792, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.881.993 [mindspore/ccsrc/cluster/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:40796, destination: 127.0.0.1:65331 E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:17.882.455 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(1/1200). E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:18.315.114 [mindspore/ccsrc/cluster/topology/cluster_context.cc:319] BuildCluster] Topology build timed out., retry(2/1200). E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:18.382.625 [mindspore/ccsrc/cluster/topology/cluster_context.cc:322] BuildCluster] Cluster is successfully initialized. E [WARNING] DISTRIBUTED(1984383,ffff88bfb020,python):2025-12-04-10:27:18.382.683 [mindspore/ccsrc/cluster/topology/cluster_context.cc:431] PostProcess] This node 0 rank id: 0 E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:18.815.315 [mindspore/ccsrc/cluster/topology/cluster_context.cc:322] BuildCluster] Cluster is successfully initialized. E [WARNING] DISTRIBUTED(1984392,ffff8faeb020,python):2025-12-04-10:27:18.815.389 [mindspore/ccsrc/cluster/topology/cluster_context.cc:431] PostProcess] This node 1 rank id: 1 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:20.014.335 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 399/400, sleep 1 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:20.514.531 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 398/400, sleep 2 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:21.014.734 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 397/400, sleep 1 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:21.514.916 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 396/400, sleep 1 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:22.015.130 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 395/400, sleep 2 E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:22.515.352 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 394/400, sleep 2 E [WARNING] ME(1984383,fffe8d7aefa0,python):2025-12-04-10:27:22.604.276 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group E [WARNING] 
DEVICE(1984383,fffe6afdefa0,python):2025-12-04-10:27:22.604.521 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup hccl_world_group, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1984383,fffe6afdefa0,python):2025-12-04-10:27:22.604.578 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] ME(1984392,fffeb0f5efa0,python):2025-12-04-10:27:23.015.603 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:23.016.041 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup hccl_world_group, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:23.016.101 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] DEVICE(1984383,fffe6afdefa0,python):2025-12-04-10:27:23.817.951 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:23.818.127 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group E [WARNING] ME(1984383,fffe8d7aefa0,python):2025-12-04-10:27:23.818.223 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: hccl_world_group E [WARNING] ME(1984392,fffeb0f5efa0,python):2025-12-04-10:27:23.818.270 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: hccl_world_group E [WARNING] DEVICE(1984392,fffeb0f5efa0,python):2025-12-04-10:27:23.819.986 [mindspore/ccsrc/plugin/cpu/res_manager/collective/ms_collective_comm_lib.cc:254] QueryUniqueID] Retry to lookup the unique id for group pp-0-1 from the meta server node...Retry time: 399/400, sleep 1 E [WARNING] ME(1984383,fffe8d7aefa0,python):2025-12-04-10:27:23.820.661 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1984383,fffe6a7cefa0,python):2025-12-04-10:27:23.820.830 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup pp-0-1, ranks are [0, 1], default hcclBufferSize: 200 MB. 
E [WARNING] DEVICE(1984383,fffe6a7cefa0,python):2025-12-04-10:27:23.820.871 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] ME(1984392,fffeb0f5efa0,python):2025-12-04-10:27:24.320.179 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:957] CreateDeviceCommunicator] Begin initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:24.320.378 [mindspore/ccsrc/plugin/ascend/res_manager/collective/utils.cc:281] GetHcclBufferSize] HcclGroup pp-0-1, ranks are [0, 1], default hcclBufferSize: 200 MB. E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:24.320.418 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:246] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1, hcclBufferSize is 200 MB, hcclDeterministic is 1 E [WARNING] DEVICE(1984392,fffe73ffefa0,python):2025-12-04-10:27:24.328.455 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1 E [WARNING] ME(1984392,fffeb0f5efa0,python):2025-12-04-10:27:24.328.525 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: pp-0-1 E [WARNING] DEVICE(1984383,fffe6a7cefa0,python):2025-12-04-10:27:24.328.581 [mindspore/ccsrc/plugin/ascend/res_manager/collective/ascend_communication_group.cc:261] InitByRootInfoConfig] End to initialize communicator by HcclCommInitRootInfoConfig for pp-0-1 E [WARNING] ME(1984383,fffe8d7aefa0,python):2025-12-04-10:27:24.328.756 [mindspore/ccsrc/runtime/hardware_abstract/collective/collective_manager.cc:968] CreateDeviceCommunicator] End initialize communication group on the device side: pp-0-1 E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.095.454 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.182.651 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.348.737 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. 
For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.403.695 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984383,ffff88bfb020,python):2025-12-04-10:27:34.481.120 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] RUNTIME_FRAMEWORK(1984383,ffff88bfb020,python):2025-12-04-10:27:34.682.071 [mindspore/ccsrc/runtime/core/graph_scheduler/base/graph_scheduler.cc:790] BuildAndScheduleGlobalActor] Failed to get DebuggerBackendEnabled, data dump function may not work. E mki_log delete old file:/home/jenkins/ascend/log/atb/atb_1960063_20251203231051.log E [WARNING] ME(1984383,ffff88bfb020,python):2025-12-04-10:27:35.472.320 [mindspore/ccsrc/tools/error_handler/error_config.cc:171] operator()] Can find `TRE` in environment var `MS_ENABLE_TFT` E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:35.754.682 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:35.841.826 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:36.010.497 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:36.069.421 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. 
For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] PARSER(1984392,ffff8faeb020,python):2025-12-04-10:27:36.150.491 [mindspore/ccsrc/frontend/jit/ps/parse/data_converter.cc:661] CheckAPI] The mint interface reshape was called, and the operators under this interface have different view capabilities on pynative and graph mode. Use this interface with caution in graph mode, as it may produce unexpected results. For more information, please refer to: https://www.mindspore.cn/docs/en/master/features/view.html E E [WARNING] RUNTIME_FRAMEWORK(1984392,ffff8faeb020,python):2025-12-04-10:27:36.426.180 [mindspore/ccsrc/runtime/core/graph_scheduler/base/graph_scheduler.cc:790] BuildAndScheduleGlobalActor] Failed to get DebuggerBackendEnabled, data dump function may not work. E mki_log delete old file:/home/jenkins/ascend/log/atb/atb_1960077_20251203231052.log E [WARNING] ME(1984392,ffff8faeb020,python):2025-12-04-10:27:37.204.433 [mindspore/ccsrc/tools/error_handler/error_config.cc:171] operator()] Can find `TRE` in environment var `MS_ENABLE_TFT` E [INFO] DISTRIBUTED(1984392,fffef77eefa0,python):2025-12-04-10:27:37.410.844 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_server.cc:222] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1984392,fffef6fdefa0,python):2025-12-04-10:27:37.410.844 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_client.cc:268] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1984383,fffee7ffefa0,python):2025-12-04-10:27:37.442.611 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_client.cc:268] Start] Event base dispatch success! E [INFO] DISTRIBUTED(1984383,fffefce8efa0,python):2025-12-04-10:27:37.442.610 [mindspore/ccsrc/cluster/rpc/core/communicator/tcp_server.cc:222] Start] Event base dispatch success! E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:27:47.940.764 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:27:52.940.938 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:27:57.941.053 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:28:02.941.228 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [WARNING] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:28:07.941.349 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:91] Finalize] Cluster currently has 2 alive nodes. E [ERROR] DISTRIBUTED(1984374,ffff02eeefa0,python):2025-12-04-10:28:08.445.429 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:544] UpdateTopoState] The node: 0 is timed out. It may exit with exception, please check this node's log. E [ERROR] DISTRIBUTED(1984374,ffff02eeefa0,python):2025-12-04-10:28:08.445.468 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:544] UpdateTopoState] The node: 1 is timed out. It may exit with exception, please check this node's log. E [ERROR] DISTRIBUTED(1984374,ffff8d42b020,python):2025-12-04-10:28:12.941.504 [mindspore/ccsrc/cluster/topology/meta_server_node.cc:95] Finalize] There are 2 abnormal compute graph nodes. 
E       Traceback (most recent call last):
E         File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 282, in <module>
E           main()
E         File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 277, in main
E           runner = GPTModelRunner(args)
E         File "/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/run_infer_gpt_model.py", line 100, in __init__
E           init()
E         File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/management.py", line 207, in init
E           init_cluster()
E       RuntimeError: The total number of timed out node is 2. Timed out node list is: [const vector]{0, 1}, worker 0 is the first one timed out, please check its log.
E
E       ----------------------------------------------------
E       - C++ Call Stack: (For framework developers)
E       ----------------------------------------------------
E       mindspore/ccsrc/cluster/topology/meta_server_node.cc:550 UpdateTopoState
E
E       Stderr:
E       /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E         setattr(self, word, getattr(machar, word).flat[0])
E       /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E         return self._float_to_str(self.smallest_subnormal)
E       /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
E         setattr(self, word, getattr(machar, word).flat[0])
E       /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
E         return self._float_to_str(self.smallest_subnormal)
E       [WARNING] ME(1984000:281473024897056,MainProcess):2025-12-04-10:27:08.866.030 [mindspore/parallel/cluster/process_entity/_api.py:268] Distributed job is spawned. Waiting all processes to exit...
E       [ERROR] ME(1984000:281473024897056,MainProcess):2025-12-04-10:27:38.897.721 [mindspore/parallel/cluster/process_entity/_api.py:358] Worker process 1984383 exit with exception. Error code: -6.
E       [WARNING] ME(1984000:281473024897056,MainProcess):2025-12-04-10:27:38.897.871 [mindspore/parallel/cluster/process_entity/_api.py:363] There's worker exits with exception, kill all other workers.
E       [ERROR] ME(1984000:281473024897056,MainProcess):2025-12-04-10:28:14.616.902 [mindspore/parallel/cluster/process_entity/_api.py:378] Scheduler process 1984374 exit with exception.
E       Traceback (most recent call last):
E         File "/home/jenkins/anaconda3/envs/ci39/bin/msrun", line 8, in <module>
E           sys.exit(main())
E         File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 191, in main
E           run(args)
E         File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 185, in run
E           process_manager.run()
E         File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 269, in run
E           self.join_processes()
E         File "/home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 382, in join_processes
E           raise RuntimeError("Distributed job exited with exception. Please check logs in "
E       RuntimeError: Distributed job exited with exception. Please check logs in directory: /tmp/pytest-of-jenkins/pytest-9/test_multi_card_configurations1/msrun_log.
E
E       assert 1 == 0
E         +1
E         -0

test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py:188: AssertionError
=========================== short test summary info ============================
FAILED test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py::TestInferGPTModel::test_multi_card_configurations[model_args0-data_keys0-False-1-2]
FAILED test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py::TestInferGPTModel::test_multi_card_configurations[model_args1-data_keys1-False-1-2]
================== 2 failed, 22 warnings in 147.44s (0:02:27) ==================
2025-12-04 10:28:17,877 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:299] - INFO - 🛑 Task interrupted: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py
2025-12-04 10:28:17,877 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:464] - ERROR - ❌ Task failed: pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py - Command 'pytest -vs --disable-warnings -m 'level0' /home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_parallel_core/test_inference/test_base_models/test_gpt_model/test_infer_gpt_model_parallel.py' returned non-zero exit status 1.
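The timed-out-node failures above can be replayed outside the CI scheduler by re-running the msrun invocation that the test logged ("Running command: ..."). Below is a minimal, hypothetical Python sketch of such a replay: the replay_msrun helper, the /tmp/debug_* paths, the timeout and the __main__ wiring are illustrative only and not part of the test suite; the msrun arguments are abbreviated from the command recorded above, and msrun is assumed to be on PATH in a working MindSpore environment.

    import shlex
    import subprocess

    # Hypothetical debugging helper (not part of the test suite): replay the msrun
    # command captured in the log and report its exit code so the cluster-timeout
    # can be reproduced without the pytest harness or the task scheduler.
    def replay_msrun(cmd: str, timeout_s: int = 900) -> int:
        result = subprocess.run(shlex.split(cmd), capture_output=True, text=True, timeout=timeout_s)
        print(result.stdout)
        print(result.stderr)
        return result.returncode

    if __name__ == "__main__":
        # Arguments abbreviated from the "Running command:" entry above; adjust the
        # script path and output paths to a local checkout before running.
        CMD = (
            "msrun --worker_num=2 --local_worker_num=2 --master_port=65331 "
            "--log_dir=/tmp/debug_msrun_log --join=True run_infer_gpt_model.py "
            "--batch_size=2 --seq_length=2 --vocab_size=32 --hidden_size=32 "
            "--ffn_hidden_size=64 --num_attention_heads=2 --num_layers=2 "
            "--output_path=/tmp/debug_output_ms.npz --tensor_parallel=1 --pipeline_parallel=2"
        )
        raise SystemExit(replay_msrun(CMD))

If the replay reproduces the "Topology build timed out" / timed-out-node RuntimeError, the per-rank files under --log_dir (scheduler.log, worker_0.log, worker_1.log) are the logs the error message above points to.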
2025-12-04 10:28:18,803 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:478] - WARNING - 🔚 Group execution aborted due to failures
2025-12-04 10:28:18,803 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:370] - INFO - Group 2 completed in 151.001s
2025-12-04 10:28:18,803 - mindformers/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/output/log[/home/jenkins/mindspore/testcases/testcases/tests/st/networks/large_models/test_multi_cards_cases/test_run_multi_cards_cases.py:379] - INFO - Total execution INTERRUPTED in 151.002s
F
=================================== FAILURES ===================================
________________________ test_multi_cards_level0_cases _________________________

    @arg_mark(plat_marks=['platform_ascend910b'], level_mark='level0', card_mark='allcards', essential_mark='essential')
    def test_multi_cards_level0_cases():
        """
        Feature: Multi-card Level 0 Test Execution
        Description: This test function gathers all task cases labeled as "level0" using the
            `collect_task_cases` function, initializes a scheduler with these cases, and executes
            them by invoking the scheduler's `run` method.
        Expectation: All "level0" multi-card task cases are executed successfully without errors.
        """
        scheduler = collect_task_cases("level0")
        success, total_time = scheduler.run()
        # Check the execution result
>       assert success, "One, or more tasks failed during execution."
E       AssertionError: One, or more tasks failed during execution.
E       assert False

test_run_multi_cards_cases.py:582: AssertionError
=============================== warnings summary ===============================
../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
    setattr(self, word, getattr(machar, word).flat[0])
../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
    return self._float_to_str(self.smallest_subnormal)
../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
    setattr(self, word, getattr(machar, word).flat[0])
../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
    return self._float_to_str(self.smallest_subnormal)

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2.py:57
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2.py:57: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("batchnorm_fold2")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad.py:56
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad.py:56: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("batchnorm_fold2_grad")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad_reduce.py:48
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad_reduce.py:48: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("batchnorm_fold2_grad_reduce")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul.py:51
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul.py:51: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("correction_mul")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:51
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:51: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("correction_mul_grad")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:143
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:143: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("correction_mul_grad_reduce")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer.py:50
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perlayer")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad.py:92
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad.py:92: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perlayer_grad_d")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad_reduce.py:49
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad_reduce.py:49: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perlayer_grad_d_reduce")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel.py:50
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perchannel")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad.py:91
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad.py:91: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perchannel_grad_d")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad_reduce.py:48
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad_reduce.py:48: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_learned_scale_quant_perchannel_grad_d_reduce")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel.py:52
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel.py:52: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_quant_perchannel")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel_grad.py:81
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel_grad.py:81: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_quant_perchannel_grad")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer.py:54
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer.py:54: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_quant_per_layer")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer_grad.py:81
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer_grad.py:81: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("fake_quant_per_layer_grad")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perchannel.py:50
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perchannel.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("minmax_update_perchannel")

../../../../../../../../../miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perlayer.py:50
  /home/miniconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perlayer.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute
    @fusion_manager.register("minmax_update_perlayer")

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED test_run_multi_cards_cases.py::test_multi_cards_level0_cases - Asserti...
================== 1 failed, 22 warnings in 159.58s (0:02:39) ==================
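The top-level failure recorded above follows the flow described in the docstring of test_multi_cards_level0_cases: collect_task_cases gathers the level0 cases, a scheduler runs each grouped pytest command, and the test asserts that every task succeeded. The scheduler source is not part of this log, so the following is only a minimal sketch of that flow under those assumptions; the class name TaskScheduler, its fields, and the helper collect_level0_commands are illustrative, not the actual test_run_multi_cards_cases.py implementation.

    # Minimal sketch (assumption): run grouped pytest commands sequentially and stop the
    # group at the first failure, mirroring the "Group execution aborted" log entries above.
    import subprocess
    import time
    from dataclasses import dataclass, field


    @dataclass
    class TaskScheduler:
        tasks: list                       # pytest command strings for one card-count group
        failed: list = field(default_factory=list)

        def run(self):
            """Run each task; abort the remaining tasks in the group on first failure."""
            start = time.time()
            success = True
            for cmd in self.tasks:
                try:
                    subprocess.run(cmd, shell=True, check=True)
                except subprocess.CalledProcessError as err:
                    # Corresponds to "Task failed ... returned non-zero exit status 1".
                    self.failed.append((cmd, str(err)))
                    success = False
                    break  # "Group execution aborted due to failures"
            return success, time.time() - start


    # The top-level test then reduces to (collect_level0_commands is hypothetical):
    #   scheduler = TaskScheduler(tasks=collect_level0_commands())
    #   success, total_time = scheduler.run()
    #   assert success, "One or more tasks failed during execution."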