============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/msrun, inifile: /home/jenkins/sault/virtual_test/virtualenv_009/sault/config/pytest.ini
plugins: anyio-3.7.1, xdist-1.32.0, forked-1.4.0
collected 1 item

test_entry_msrun.py [INFO] ATRACE(119321,python3.7):2024-01-10-16:10:27.084.963 [trace_attr.c:105](tid:119321) platform is 1.
[INFO] ATRACE(119321,python3.7):2024-01-10-16:10:27.085.145 [trace_recorder.c:114](tid:119321) use root path: /home/jenkins/ascend/atrace
[INFO] ATRACE(119321,python3.7):2024-01-10-16:10:27.085.174 [trace_signal.c:133](tid:119321) register signal handler for signo 2 succeed.
[INFO] ATRACE(119321,python3.7):2024-01-10-16:10:27.085.184 [trace_signal.c:133](tid:119321) register signal handler for signo 15 succeed.
[INFO] CORE(119321,ffff9d083440,python3.7):2024-01-10-16:10:27.481.924 [mindspore/core/utils/ms_context.cc:225] set_backend_policy] ms set context backend policy:ge
[INFO] RUNTIME(119321,python3.7):2024-01-10-16:10:27.482.108 [runtime.cc:1159] 119321 GetAicoreNumByLevel: workingDev_=0
[INFO] RUNTIME(119321,python3.7):2024-01-10-16:10:27.482.154 [runtime.cc:4719] 119321 GetVisibleDevices: ASCEND_RT_VISIBLE_DEVICES param was not set
[INFO] DEVICE(119321,ffff9d083440,python3.7):2024-01-10-16:10:27.482.889 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_device_context.cc:580] SetContextSocVersion] The soc version :Ascend910PremiumA
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:10:27.552.329 [mindspore/ccsrc/pybind_api/ir/log_adapter_py.h:34] PyExceptionInitializer] Set exception handler
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:10:27.566.455 [mindspore/ccsrc/pipeline/jit/ps/init.cc:179] pybind11_init__c_expression] Start GraphExecutorPy...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:10:27.567.135 [mindspore/ccsrc/pipeline/jit/ps/init.cc:271] pybind11_init__c_expression] Start ParallelContext...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:10:27.567.785 [mindspore/ccsrc/pipeline/jit/ps/init.cc:379] pybind11_init__c_expression] Start CostModelContext...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:10:27.568.245 [mindspore/ccsrc/pipeline/jit/ps/init.cc:481] pybind11_init__c_expression] Start OffloadContext...
[INFO] DEVICE(119321,ffff9d083440,python3.7):2024-01-10-16:10:27.570.140 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_device_context.cc:580] SetContextSocVersion] The soc version :Ascend910PremiumA
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:29.734.716 [mindspore/run_check/_check_version.py:382] MindSpore version 2.3.0 and "hccl" wheel package version 6.3 does not match. For details, refer to the installation guidelines: https://www.mindspore.cn/install
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:29.734.962 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 3
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:30.736.139 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 2
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:31.737.396 [mindspore/run_check/_check_version.py:396] Please pay attention to the above warning, countdown: 1
[INFO] CORE(119321,ffff9d083440,python3.7):2024-01-10-16:10:32.739.025 [mindspore/core/utils/ms_context.cc:362] SetDeviceTargetFromInner] ms set context device target:Ascend
[INFO] PARALLEL(119321,ffff9d083440,python3.7):2024-01-10-16:10:32.739.103 [mindspore/ccsrc/frontend/parallel/costmodel_context.cc:30] GetInstance] Create costmodel_context
[INFO] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:32.739.226 [mindspore/run_check/_check_version.py:544] Setting the env `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python` to prevent memory overflow during save or load checkpoint file.
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:34.906.415 [mindspore/parallel/cluster/process_entity/_api.py:186] Start worker process with rank id:0, log file:worker_0.log
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:34.955.310 [mindspore/parallel/cluster/process_entity/_api.py:186] Start worker process with rank id:1, log file:worker_1.log
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:35.135.88 [mindspore/parallel/cluster/process_entity/_api.py:186] Start worker process with rank id:2, log file:worker_2.log
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:35.750.01 [mindspore/parallel/cluster/process_entity/_api.py:186] Start worker process with rank id:3, log file:worker_3.log
[WARNING] ME(119321:281473316303936,MainProcess):2024-01-10-16:10:35.754.09 [mindspore/parallel/cluster/process_entity/_api.py:158] Distributed job is spawned. Waiting all processes to exit...
[ERROR] ME(119321:281473316303936,MainProcess):2024-01-10-16:15:14.945.166 [mindspore/parallel/cluster/process_entity/_api.py:212] Worker process 119517 exit with exception.
[ERROR] ME(119321:281473316303936,MainProcess):2024-01-10-16:15:14.945.515 [mindspore/parallel/cluster/process_entity/_api.py:212] Worker process 119518 exit with exception.
[ERROR] ME(119321:281473316303936,MainProcess):2024-01-10-16:15:16.328.937 [mindspore/parallel/cluster/process_entity/_api.py:212] Worker process 119519 exit with exception.
[ERROR] ME(119321:281473316303936,MainProcess):2024-01-10-16:15:16.329.260 [mindspore/parallel/cluster/process_entity/_api.py:220] Analyzing exception log...
IP address found on this node. Address info:{'family': 'inet', 'local': '127.0.0.1', 'prefixlen': 8, 'scope': 'host', 'label': 'lo', 'valid_life_time': 4294967295, 'preferred_life_time': 4294967295}.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.109 [mindspore/ccsrc/pipeline/jit/ps/init.cc:515] operator()] Start register...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.173 [mindspore/ccsrc/pipeline/jit/ps/init.cc:519] operator()] Start mindspore.profiler...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.293 [mindspore/ccsrc/pipeline/jit/ps/init.cc:527] operator()] Start EmbeddingCacheScheduler...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.378 [mindspore/ccsrc/pipeline/jit/ps/init.cc:534] operator()] Start releasing dataset handles...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.451 [mindspore/ccsrc/pipeline/jit/ps/init.cc:537] operator()] End release dataset handles.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.470 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2335] ClearResAtexit] Pipeline clear all resource
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.581 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:290] RecordExitStatus] Status record: system exit.
[INFO] DEBUG(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.627 [mindspore/ccsrc/common/debug/env_config_parser.cc:152] ParseFromFile] The 'env_config_path' in 'mindspore.context.set_context(env_config_path={path})' is empty.
[INFO] ME(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.717 [mindspore/core/mindrt/src/actor/actormgr.cc:153] Finalize] mindrt Actors finish exiting.
[INFO] ME(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.733 [mindspore/core/mindrt/src/actor/actormgr.cc:156] Finalize] mindrt Threads finish exiting.
[INFO] ME(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.747 [mindspore/core/mindrt/src/actor/actormgr.cc:167] Finalize] mindrt IOMGRS finish exiting.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.791 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2207] ClearResPart1] Start Finalize StreamSynchronizer...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.455.821 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2209] ClearResPart1] End Finalize StreamSynchronizer...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.457.307 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:829] ClearRes] Clean executor resource!
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.457.353 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2223] ClearResPart2] Start clear PyNativeExecutor...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.457.520 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2225] ClearResPart2] End clear PyNativeExecutor.
[INFO] GE_ADPT(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.457.587 [mindspore/ccsrc/transform/graph_ir/df_graph_manager.cc:179] ClearGraph] Remove all graphs in GraphManager
[INFO] DEVICE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.668 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_deprecated_interface.cc:385] UnregisterExternalAllocator] The graph_runner is not exist
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.700 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2241] ClearResPart2] Start clear kernel runtime...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.731 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2243] ClearResPart2] End clear kernel runtime.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.748 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2258] ClearResPart2] Start clear device context...
[INFO] ME(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.763 [mindspore/ccsrc/runtime/hardware/device_context_manager.cc:469] ClearDeviceContexts] Release device Ascend_0
[INFO] DEVICE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.854 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_deprecated_interface.cc:317] CloseTsd] Start to close tsd, ref = 0
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.876 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2260] ClearResPart2] End clear device context.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.888 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2262] ClearResPart2] Start clear AnalysisResultCacheMgr...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.901 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2264] ClearResPart2] End clear AnalysisResultCacheMgr.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.911 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2266] ClearResPart2] Start clear AnalysisContext...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.933 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2268] ClearResPart2] End clear AnalysisContext...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.466.944 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2270] ClearResPart2] Start clear AnalysisSchedule...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.650 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2272] ClearResPart2] End clear AnalysisSchedule...
[INFO] DEBUG(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.700 [mindspore/ccsrc/debug/debugger/debugger.cc:101] Debugger] Debugger got device_target: Ascend
[INFO] DEBUG(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.742 [mindspore/ccsrc/debug/debugger/debugger.cc:305] Reset] Release Debugger resource.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.791 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2285] ClearResPart3] Start clear ClearObjectCache...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.803 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2287] ClearResPart3] End clear ClearObjectCache...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.813 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2289] ClearResPart3] Start clear Parser...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.840 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2291] ClearResPart3] End clear Parser...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.851 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2293] ClearResPart3] Start ClearTraceStack...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.906 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2295] ClearResPart3] End ClearTraceStack...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.927 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2297] ClearResPart3] Start clear InterpretNodeRecorder...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.941 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2299] ClearResPart3] End clear InterpretNodeRecorder...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.951 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2301] ClearResPart3] Start clear parallel::entire_costgraph...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.972 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2303] ClearResPart3] End clear parallel::entire_costgraph...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.467.983 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2305] ClearResPart3] Start clear ProtobufLibrary...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.468.302 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2307] ClearResPart3] End clear ProtobufLibrary...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.468.318 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2309] ClearResPart3] Start clear python_adapter...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.468.330 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2311] ClearResPart3] End clear python_adapter.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.468.341 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2315] ClearSingleton] Start clear singleton...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.468.554 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2331] ClearSingleton] End clear singleton.
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.468.567 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2347] ClearResAtexit] Start unload dynamic lib...
[INFO] PIPELINE(119321,ffff9d083440,python3.7):2024-01-10-16:15:16.468.600 [mindspore/ccsrc/pipeline/jit/ps/pipeline.cc:2349] ClearResAtexit] End unload dynamic lib...
Traceback (most recent call last):
  File "/home/jenkins/.local/bin/msrun", line 8, in <module>
    sys.exit(main())
  File "/home/jenkins/.local/lib/python3.7/site-packages/mindspore/parallel/cluster/run.py", line 129, in main
    run(args)
  File "/home/jenkins/.local/lib/python3.7/site-packages/mindspore/parallel/cluster/run.py", line 123, in run
    process_manager.run()
  File "/home/jenkins/.local/lib/python3.7/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 159, in run
    self.join_processes()
  File "/home/jenkins/.local/lib/python3.7/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 222, in join_processes
    raise RuntimeError("Distributed job exited with exception. Please check logs in "
RuntimeError: Distributed job exited with exception. Please check logs in directory: .
[INFO] GE_ADPT(119321,ffff9d083440,python3.7):2024-01-10-16:15:17.061.901 [mindspore/ccsrc/transform/graph_ir/df_graph_manager.cc:261] DeleteGraphRunner] GraphRunner is not exist
[INFO] GE_ADPT(119321,ffff9d083440,python3.7):2024-01-10-16:15:17.061.973 [mindspore/ccsrc/transform/graph_ir/df_graph_manager.cc:225] DeleteGeSession] Ge Session is not exist
[INFO] GE_ADPT(119321,ffff9d083440,python3.7):2024-01-10-16:15:17.061.990 [mindspore/ccsrc/transform/graph_ir/df_graph_manager.cc:179] ClearGraph] Remove all graphs in GraphManager
[INFO] RUNTIME(119321,python3.7):2024-01-10-16:15:17.790.653 [runtime.cc:1737] 119321 ~Runtime: deconstruct runtime.
F

=================================== FAILURES ===================================
__________________________________ test_msrun __________________________________

    @pytest.mark.level0
    @pytest.mark.platform_arm_ascend_training
    @pytest.mark.platform_x86_ascend_training
    @pytest.mark.env_single
    def test_msrun():
        """
        Feature: 'msrun' launch utility.
        Description: Launch distributed training job with dynamic cluster using msrun.
        Expectation: All workers are successfully spawned and running training.
        """
        return_code = os.system(
            "export GLOG_v=1 && msrun --worker_num=4 --local_worker_num=4 --master_addr=127.0.0.1 "\
            "--master_port=10969 --join=True "\
            "test_msrun.py --device_target=Ascend --dataset_path=/home/workspace/mindspore_dataset/mnist"
        )
>       assert return_code == 0
E       assert 256 == 0

test_entry_msrun.py:34: AssertionError
=========================== short test summary info ============================
FAILED test_entry_msrun.py::test_msrun - assert 256 == 0
======================== 1 failed in 292.45s (0:04:52) =========================