============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/msrun, inifile: /home/jenkins/sault/virtual_test/virtualenv_009/sault/config/pytest.ini
plugins: anyio-3.7.1, xdist-1.32.0, forked-1.4.0
collected 1 item

test_entry_msrun.py
[INFO] ATRACE(107727,python3.7):2024-01-10-09:05:52.670.620 [trace_attr.c:105](tid:107727) platform is 1.
[INFO] ATRACE(107727,python3.7):2024-01-10-09:05:52.670.796 [trace_recorder.c:114](tid:107727) use root path: /home/jenkins/ascend/atrace
[INFO] ATRACE(107727,python3.7):2024-01-10-09:05:52.670.838 [trace_signal.c:133](tid:107727) register signal handler for signo 2 succeed.
[INFO] ATRACE(107727,python3.7):2024-01-10-09:05:52.670.848 [trace_signal.c:133](tid:107727) register signal handler for signo 15 succeed.
[INFO] RUNTIME(107727,python3.7):2024-01-10-09:05:53.039.134 [runtime.cc:1159] 107727 GetAicoreNumByLevel: workingDev_=0
[INFO] RUNTIME(107727,python3.7):2024-01-10-09:05:53.039.216 [runtime.cc:4719] 107727 GetVisibleDevices: ASCEND_RT_VISIBLE_DEVICES param was not set
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:05:55.164.617 [mindspore/run_check/_check_version.py:382] MindSpore version 2.3.0 and "hccl" wheel package version 6.3 does not match.
For details, refer to the installation guidelines: https://www.mindspore.cn/install
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:05:55.164.777 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 3
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:05:56.165.601 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 2
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:05:57.166.698 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 1
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:05:59.118.376 [mindspore/common/_decorator.py:40] 'Expand' is deprecated from version 2.1 and will be removed in a future version, use 'BroadcastTo' instead.
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:05:59.120.983 [mindspore/common/_decorator.py:40] 'ScalarToArray' is deprecated from version 2.0 and will be removed in a future version, use 'ops.scalar_to_tensor' instead.
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:06:00.243.603 [mindspore/parallel/cluster/process_entity/_api.py:186] Start worker process with rank id:0, log file:worker_0.log
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:06:00.279.497 [mindspore/parallel/cluster/process_entity/_api.py:186] Start worker process with rank id:1, log file:worker_1.log
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:06:00.321.346 [mindspore/parallel/cluster/process_entity/_api.py:186] Start worker process with rank id:2, log file:worker_2.log
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:06:00.369.930 [mindspore/parallel/cluster/process_entity/_api.py:186] Start worker process with rank id:3, log file:worker_3.log
[WARNING] ME(107727:281473598599184,MainProcess):2024-01-10-09:06:00.370.288 [mindspore/parallel/cluster/process_entity/_api.py:158] Distributed job is spawned. Waiting all processes to exit...
[ERROR] ME(107727:281473598599184,MainProcess):2024-01-10-09:10:45.821.642 [mindspore/parallel/cluster/process_entity/_api.py:212] Worker process 107934 exit with exception.
[ERROR] ME(107727:281473598599184,MainProcess):2024-01-10-09:10:45.821.902 [mindspore/parallel/cluster/process_entity/_api.py:212] Worker process 107935 exit with exception.
[ERROR] ME(107727:281473598599184,MainProcess):2024-01-10-09:10:45.822.072 [mindspore/parallel/cluster/process_entity/_api.py:212] Worker process 107938 exit with exception.
[ERROR] ME(107727:281473598599184,MainProcess):2024-01-10-09:10:45.822.215 [mindspore/parallel/cluster/process_entity/_api.py:220] Analyzing exception log...
IP address found on this node. Address info:{'family': 'inet', 'local': '127.0.0.1', 'prefixlen': 8, 'scope': 'host', 'label': 'lo', 'valid_life_time': 4294967295, 'preferred_life_time': 4294967295}.
Traceback (most recent call last):
  File "/home/jenkins/.local/bin/msrun", line 8, in <module>
    sys.exit(main())
  File "/home/jenkins/.local/lib/python3.7/site-packages/mindspore/parallel/cluster/run.py", line 129, in main
    run(args)
  File "/home/jenkins/.local/lib/python3.7/site-packages/mindspore/parallel/cluster/run.py", line 123, in run
    process_manager.run()
  File "/home/jenkins/.local/lib/python3.7/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 159, in run
    self.join_processes()
  File "/home/jenkins/.local/lib/python3.7/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 222, in join_processes
    raise RuntimeError("Distributed job exited with exception. Please check logs in "
RuntimeError: Distributed job exited with exception. Please check logs in directory: .
[INFO] RUNTIME(107727,python3.7):2024-01-10-09:10:47.192.043 [runtime.cc:1737] 107727 ~Runtime: deconstruct runtime.
F

=================================== FAILURES ===================================
__________________________________ test_msrun __________________________________

    @pytest.mark.level0
    @pytest.mark.platform_arm_ascend_training
    @pytest.mark.platform_x86_ascend_training
    @pytest.mark.env_single
    def test_msrun():
        """
        Feature: 'msrun' launch utility.
        Description: Launch distributed training job with dynamic cluster using msrun.
        Expectation: All workers are successfully spawned and running training.
        """
        return_code = os.system(
            "export GLOG_v=2 && msrun --worker_num=4 --local_worker_num=4 --master_addr=127.0.0.1 --master_port=10969 --join=True "\
            "test_msrun.py --device_target=Ascend --dataset_path=/home/workspace/mindspore_dataset/mnist"
        )
>       assert return_code == 0
E       assert 256 == 0

test_entry_msrun.py:33: AssertionError
=========================== short test summary info ============================
FAILED test_entry_msrun.py::test_msrun - assert 256 == 0
======================== 1 failed in 296.23s (0:04:56) =========================