LINUX.ORG.RU

amdgpu crashes periodically when running Wine


OS: openSUSE Leap 15.2, kernel 5.3.18-lp152.19-default
Motherboard: Asus PRIME B450M-A, BIOS 2807 02/01/2021
CPU: AMD Athlon 240GE with Radeon Vega Graphics

When a Wine application is launched, the graphical shell becomes unresponsive, either immediately or after some time. Switching to the text consoles does not work, but ssh access works correctly. Updating the OS and the BIOS subjectively reduced the number of crashes, but amdgpu still brings the OS down roughly once every 30 minutes.

The problem occurs only on this configuration (three times, on different PCs). There are about 50 machines in production in total, and the others have no problems. The log shows the following:

Feb 12 16:07:41 m0752lin kwin_x11[2467]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 32058, resource id: 52429857, major code: 18 (>
Feb 12 16:07:41 m0752lin kwin_x11[2467]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 32068, resource id: 52429861, major code: 18 (>
Feb 12 16:07:48 m0752lin kwin_x11[2467]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 36514, resource id: 83886089, major code: 15 (>
Feb 12 16:07:48 m0752lin kwin_x11[2467]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 36519, resource id: 83886089, major code: 18 (>
Feb 12 16:07:48 m0752lin kwin_x11[2467]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 36522, resource id: 83886081, major code: 25 (>
Feb 12 16:07:48 m0752lin kwin_x11[2467]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 36538, resource id: 83886090, major code: 12 (>
Feb 12 16:07:48 m0752lin kwin_x11[2467]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 36539, resource id: 83886081, major code: 15 (>
Feb 12 16:07:48 m0752lin kwin_x11[2467]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 36567, resource id: 52429872, major code: 18 (>
Feb 12 16:07:48 m0752lin kernel: fpu exception: 0000 [#1] SMP NOPTI
Feb 12 16:07:48 m0752lin kernel: CPU: 2 PID: 1939 Comm: X Not tainted 5.3.18-lp152.19-default #1 openSUSE Leap 15.2 (unreleased)
Feb 12 16:07:48 m0752lin kernel: Hardware name: System manufacturer System Product Name/PRIME B450M-A, BIOS 2807 02/01/2021
Feb 12 16:07:48 m0752lin kernel: RIP: 0010:dml1_rq_dlg_get_dlg_params+0xec5/0x3580 [amdgpu]
Feb 12 16:07:48 m0752lin kernel: Code: f3 0f 10 0d 29 35 1d 00 f3 0f 10 05 e9 34 1d 00 48 89 44 24 30 0f b7 84 24 7a 01 00 00 df 6c 24 30 dd 54 24 70 8>
Feb 12 16:07:48 m0752lin kernel: RSP: 0018:ffffbb8680e3b280 EFLAGS: 00210202
Feb 12 16:07:48 m0752lin kernel: RAX: 0000000000001f72 RBX: ffff9ee159b40d88 RCX: 000000000000002c
Feb 12 16:07:48 m0752lin kernel: RDX: 000000000000000f RSI: 0000000000000007 RDI: 0000000000000001
Feb 12 16:07:48 m0752lin kernel: RBP: ffff9ee159b40e4c R08: 0000000000000000 R09: 0000000000000001
Feb 12 16:07:48 m0752lin kernel: R10: 0000000000000019 R11: 0000000000000019 R12: 00000000000002f2
Feb 12 16:07:48 m0752lin kernel: R13: 000000000000a360 R14: 0000000000000600 R15: ffff9ee162588d88
Feb 12 16:07:48 m0752lin kernel: FS:  00007f03440d4ec0(0000) GS:ffff9ee16ae80000(0000) knlGS:0000000000000000
Feb 12 16:07:48 m0752lin kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 12 16:07:48 m0752lin kernel: CR2: 00007f09cb50e958 CR3: 0000000126b84000 CR4: 00000000003406e0
Feb 12 16:07:48 m0752lin kernel: Call Trace:
Feb 12 16:07:48 m0752lin kernel:  dcn_bw_calc_rq_dlg_ttu+0x781/0x870 [amdgpu]
Feb 12 16:07:48 m0752lin kernel:  dcn_validate_bandwidth+0x1895/0x1f80 [amdgpu]
Feb 12 16:07:48 m0752lin kernel:  dc_validate_global_state+0x2c1/0x330 [amdgpu]
Feb 12 16:07:48 m0752lin kernel:  amdgpu_dm_atomic_check+0x7be/0x820 [amdgpu]
Feb 12 16:07:48 m0752lin kernel:  drm_atomic_check_only+0x55d/0x810 [drm]
Feb 12 16:07:48 m0752lin kernel:  ? drm_mode_object_put.part.3+0x1f/0x50 [drm]
Feb 12 16:07:48 m0752lin kernel:  ? drm_atomic_set_property+0x81/0x970 [drm]
Feb 12 16:07:48 m0752lin kernel:  drm_atomic_commit+0x13/0x50 [drm]
Feb 12 16:07:48 m0752lin kernel:  drm_mode_obj_set_property_ioctl+0x24d/0x2e0 [drm]
Feb 12 16:07:48 m0752lin kernel:  ? mutex_lock+0xe/0x30
Feb 12 16:07:48 m0752lin kernel:  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
Feb 12 16:07:48 m0752lin kernel:  drm_ioctl_kernel+0xac/0xf0 [drm]
Feb 12 16:07:48 m0752lin kernel:  drm_ioctl+0x2eb/0x3b0 [drm]
Feb 12 16:07:48 m0752lin kernel:  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
Feb 12 16:07:48 m0752lin kernel:  ? do_iter_write+0xe2/0x190
Feb 12 16:07:48 m0752lin kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Feb 12 16:07:48 m0752lin kernel:  do_vfs_ioctl+0xa0/0x680
Feb 12 16:07:48 m0752lin kernel:  ? __sys_recvmsg+0x8a/0xa0
Feb 12 16:07:48 m0752lin kernel:  ksys_ioctl+0x70/0x80
Feb 12 16:07:48 m0752lin kernel:  __x64_sys_ioctl+0x16/0x20
Feb 12 16:07:48 m0752lin kernel:  do_syscall_64+0x65/0x1f0
Feb 12 16:07:48 m0752lin kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 12 16:07:48 m0752lin kernel: RIP: 0033:0x7f03419c6ac7
Feb 12 16:07:48 m0752lin kernel: Code: b3 66 90 48 8b 05 d1 13 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 0>
Feb 12 16:07:48 m0752lin kernel: RSP: 002b:00007ffdf3e0e938 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
Feb 12 16:07:48 m0752lin kernel: RAX: ffffffffffffffda RBX: 000055c5d9f8a160 RCX: 00007f03419c6ac7
Feb 12 16:07:48 m0752lin kernel: RDX: 00007ffdf3e0e970 RSI: 00000000c01864ba RDI: 000000000000000d
Feb 12 16:07:48 m0752lin kernel: RBP: 00007ffdf3e0e970 R08: 0000000000000053 R09: 000055c5d9f8aa20
Feb 12 16:07:48 m0752lin kernel: R10: 000055c5daf24414 R11: 0000000000003246 R12: 00000000c01864ba
Feb 12 16:07:48 m0752lin kernel: R13: 000000000000000d R14: 0000000000000fff R15: 0000000000000003
Feb 12 16:07:48 m0752lin kernel: Modules linked in: md4 nls_utf8 cifs libarc4 dns_resolver fscache fuse af_packet xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJEC>
Feb 12 16:07:48 m0752lin kernel:  xor amdgpu raid6_pq crc32c_intel amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xh>
Feb 12 16:07:48 m0752lin kernel: ---[ end trace 586259440bdef3bb ]---
Feb 12 16:07:48 m0752lin kernel: RIP: 0010:dml1_rq_dlg_get_dlg_params+0xec5/0x3580 [amdgpu]
Feb 12 16:07:48 m0752lin kernel: Code: f3 0f 10 0d 29 35 1d 00 f3 0f 10 05 e9 34 1d 00 48 89 44 24 30 0f b7 84 24 7a 01 00 00 df 6c 24 30 dd 54 24 70 80 cc 0c d8 f9 <d9> c9 6>
Feb 12 16:07:48 m0752lin kernel: RSP: 0018:ffffbb8680e3b280 EFLAGS: 00210202
Feb 12 16:07:48 m0752lin kernel: RAX: 0000000000001f72 RBX: ffff9ee159b40d88 RCX: 000000000000002c
Feb 12 16:07:48 m0752lin kernel: RDX: 000000000000000f RSI: 0000000000000007 RDI: 0000000000000001
Feb 12 16:07:48 m0752lin kernel: RBP: ffff9ee159b40e4c R08: 0000000000000000 R09: 0000000000000001
Feb 12 16:07:48 m0752lin kernel: R10: 0000000000000019 R11: 0000000000000019 R12: 00000000000002f2
Feb 12 16:07:48 m0752lin kernel: R13: 000000000000a360 R14: 0000000000000600 R15: ffff9ee162588d88
Feb 12 16:07:48 m0752lin kernel: FS:  00007f03440d4ec0(0000) GS:ffff9ee16ae80000(0000) knlGS:0000000000000000
Feb 12 16:07:48 m0752lin kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 12 16:07:48 m0752lin kernel: CR2: 00007f09cb50e958 CR3: 0000000126b84000 CR4: 00000000003406e0
Feb 12 16:19:12 m0752lin systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 30825 (find)
Feb 12 16:19:12 m0752lin systemd[1]: Mounting Arbitrary Executable File Formats File System...
Feb 12 16:19:12 m0752lin systemd[1]: Mounted Arbitrary Executable File Formats File System.
Feb 12 16:36:01 m0752lin plasmashell[2471]: libkcups: Renew-Subscription last error: 0 successful-ok
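
To compare crashes across machines, it may help to pull the faulting module and function out of the kernel-side RIP line. This is only a minimal sketch in Python, assuming the journal text is already available as a string; the helper name crash_signature and the regular expression are illustrative and not part of any existing tool.

```python
import re

# Matches a kernel-side RIP line such as
#   "RIP: 0010:dml1_rq_dlg_get_dlg_params+0xec5/0x3580 [amdgpu]"
# The userspace line "RIP: 0033:0x7f03419c6ac7" has no "+0x.../0x..."
# part and is therefore skipped.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def crash_signature(log_text):
    """Return (module, function) from the first kernel RIP line, or None."""
    for line in log_text.splitlines():
        match = RIP_RE.search(line)
        if match:
            return (match.group("module"), match.group("func"))
    return None


sample = (
    "Feb 12 16:07:48 m0752lin kernel: fpu exception: 0000 [#1] SMP NOPTI\n"
    "Feb 12 16:07:48 m0752lin kernel: RIP: 0010:dml1_rq_dlg_get_dlg_params+0xec5/0x3580 [amdgpu]\n"
    "Feb 12 16:07:48 m0752lin kernel: RIP: 0033:0x7f03419c6ac7\n"
)
print(crash_signature(sample))  # ('amdgpu', 'dml1_rq_dlg_get_dlg_params')
```

If the same pair comes back from every affected machine, that supports the observation that the failures share one cause.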


Last edited by: mnx_vol (total edits: 2)

Here are my fixes; drop the ones you don't need:

processor.max_cstate=1 idle=nomwait rcu_nocbs=0-5 noiswmd amdgpu.audio=0 fsck.mode=force fsck.repair=yes amdgpu.dpm=1 amdgpu.dc=1 zswap.enabled=1 zswap.max_pool_percent=90 scsi_mod.use_blk_mq=1 elevator=bfq amdgpu.ppfeaturemask=0xfffd7fff
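
Since the list mixes GPU options with unrelated ones, it may be easier to split it into name/value pairs and try the amdgpu entries first. The sketch below is a hedged illustration in Python: the helper names parse_cmdline and cleared_bits are made up for this example, and only a shortened subset of the parameters above is used as input.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def cleared_bits(mask, width=32):
    """Return the bit positions that are 0 in a width-bit mask."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


# A shortened subset of the parameters suggested above.
suggested = (
    "processor.max_cstate=1 idle=nomwait amdgpu.audio=0 amdgpu.dpm=1 "
    "amdgpu.dc=1 amdgpu.ppfeaturemask=0xfffd7fff"
)

params = parse_cmdline(suggested)
amdgpu_only = {k: v for k, v in params.items() if k.startswith("amdgpu.")}
print(amdgpu_only)

# Which feature bits does the suggested ppfeaturemask switch off?
print(cleared_bits(int(params["amdgpu.ppfeaturemask"], 16)))  # [15, 17]
```

On a running system the same function can be fed the contents of /proc/cmdline to see which of these options are actually in effect.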

darkenshvein ★★★★★
()
In reply to: comment by darkenshvein

about 50 machines

Right: if at least one of them is identical to this one and works fine, the problem is unlikely to be in the drivers or in Linux.

Nervous ★★★★★
()
In reply to: comment by Nervous

No, they differ in hardware configuration; only the OS and the software set are identical. In all three failures, on different machines, the error code is the same.

mnx_vol
() topic author
In reply to: comment by darkenshvein

I'll try it. I was thinking of swapping the CPU in this motherboard first, but it can be done the other way around too.

mnx_vol
() topic author

Patches for the Vega10 iGPU were accepted into the kernel starting with 5.6, and workarounds in Mesa starting with 20.1. Otherwise you have to patch and tune things manually.
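
A quick way to check a machine against those version thresholds is to compare the leading numeric part of the version string. This is a rough sketch in Python with hypothetical helper names (version_tuple, at_least); distribution kernels often carry backported patches, so the number alone is only a first hint.

```python
import re


def version_tuple(text):
    """Extract the leading dotted version: '5.3.18-lp152.19-default' -> (5, 3, 18)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if not match:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version, minimum):
    """True if the version string is at or above the minimum tuple."""
    return version_tuple(version) >= minimum


KERNEL_MIN = (5, 6)   # kernel threshold named in the comment above
MESA_MIN = (20, 1)    # Mesa threshold named in the comment above

print(at_least("5.3.18-lp152.19-default", KERNEL_MIN))  # False
print(at_least("5.6.0", KERNEL_MIN))                    # True
```

On Linux, platform.release() from the standard library returns the running kernel release string, which can be passed straight to at_least.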

anonymous
()
In reply to: comment by mnx_vol

Hallelujah

A new kernel landed in Debian and the hangs have stopped; before that, only 5.4 ran stably. Now all that's left is to wait until Vulkan games start running in Wine under Wayland.

anonymous
()