Diffstat (limited to 'guides/sysadmin/machines')
| -rw-r--r-- | guides/sysadmin/machines/amdahl30/killing-time.org | 98 |
1 files changed, 55 insertions, 43 deletions
diff --git a/guides/sysadmin/machines/amdahl30/killing-time.org b/guides/sysadmin/machines/amdahl30/killing-time.org
index f781582..be38736 100644
--- a/guides/sysadmin/machines/amdahl30/killing-time.org
+++ b/guides/sysadmin/machines/amdahl30/killing-time.org
@@ -1,17 +1,16 @@
-* Failure
 On November 19 2024, LDLC's off-brand SSD died on me. RIP.
 Re-installed Tumbleweed on the replacement (Kingston SA400S3) on
-November 28. Since then…
-** Performance loss
-Getting uncannily reproducible frame drops (60 ↘ 40±10, movement
-visibly choppy) in Hades Ⅱ when moving toward effects/particles-heavy
-areas. No idea WTF, those areas ran fine before.
+November 28.
+
+Since then, I have been getting uncannily reproducible stuttering and
+frame drops (60↘40±10) in Hades Ⅱ when moving toward effect- or
+particle-heavy areas of the hub rooms (Crossroads, Training Grounds).
+No idea WTF, those areas ran fine before.
 
 - "High" graphics setting at native 1920×1080 resolution.
-  - Tried "Low" graphics, lowered resolution, disabled vsync: symptoms
-    persist.
-- Not forcing any "compatibility tool" version, assuming this yields
-  "Proton Experimental".
+  - Tried "Low" graphics, lowered resolution, disabled vsync, switched
+    to Windowed mode: symptoms persist.
+- Proton Experimental.
   - Tried a couple of old Proton versions: symptoms persist.
 - Reinstalled game & nuked everything under
   - =~/.cache/mesa_shader_cache*=
@@ -23,15 +22,22 @@ areas. No idea WTF, those areas ran fine before.
   in case "stale shaders" were to blame or something.
 - Tumbleweed/Plasma/Wayland session.
   - Tried X11: symptoms persist.
-- Reducing noise with =balooctl6 suspend=, =swapoff -a= (RAM nowhere
-  near exhausted).
+- Reducing noise with
+  - ~balooctl6 suspend~
+  - ~swapoff -a~ (RAM nowhere near exhausted)
 
 Well then.
-*** CPU frequency scaling?
+* CPU frequency scaling?
+(Hey 👋 A warning: this was the first rabbit hole I burrowed into.
+Spoiler alert: nothing I learned here solved the problem. Feel free
+to skip to the next section if you want to know how this ends
+{{{narrator(he wrote\, furiously hoping against hope that he would
+indeed see the end of this someday)}}})
+
 Started by noticing that the Plasma "Power Management" tray widget
-says "Power Profile" is "Not available". Not 100% sure whether that
-was the case with the old installation; maybe I had had something
-configured or installed to enable this?
+says "Power Profile" is "Not available". Not sure whether that was
+the case with the old installation; maybe I had something configured
+or installed to enable this?
 
 Internet says "install and enable power-profiles-daemon", except
 that's on:
@@ -60,10 +66,18 @@ $ powerprofilesctl
     PlatformDriver: placeholder
 #+end_example
 
-Internet says I am missing the right scaling driver, and seems very
-keen on enabling =amd_pstate=, which I do not seem to have available:
+Internet says I am missing the right scaling driver, and sounds very
+keen on enabling =amd_pstate=, which I do not seem to have available.
+=/proc/config.gz= suggests the kernel configuration supports it, but
+=cpupower= does not appear to know about it:
 
 #+begin_example
+$ zcat /proc/config.gz | grep -i pstate
+CONFIG_X86_INTEL_PSTATE=y
+CONFIG_X86_AMD_PSTATE=y
+CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3
+# CONFIG_X86_AMD_PSTATE_UT is not set
+
 $ cpupower frequency-info
 analyzing CPU 5:
   driver: acpi-cpufreq
@@ -81,16 +95,9 @@ analyzing CPU 5:
   boost state support:
     Supported: yes
     Active: no
-
-$ zcat /proc/config.gz | grep -i pstate
-CONFIG_X86_INTEL_PSTATE=y
-CONFIG_X86_AMD_PSTATE=y
-CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3
-# CONFIG_X86_AMD_PSTATE_UT is not set
 #+end_example
 
-=/proc/config.gz= suggests the kernel configuration supports it, but
-=cpupower= does not seem to know about it. =dmesg= offers:
+=dmesg= offers:
 
 #+begin_example
 $ sudo dmesg -H
@@ -105,14 +112,15 @@ Flags: […] cppc […]
 #+end_example
 
 So ACPI problem? Lots of posts mentioning =amd_= parameters on the
-kernel command-line but AFAIU those are stale with newer kernels (6.11
-here) which automatically (attempt to) load the =amd_pstate= driver.
+kernel command-line, but AFAIU those posts are stale with newer
+kernels (6.11 here) which automatically (attempt to) load the
+=amd_pstate= driver.
 
 Went through the UEFI menu and found nothing related to ACPI or
-[[https://forum.level1techs.com/t/amd-p-state-driver/197885/24][X2APIC]]. Skeptical UEFI settings anyway, since I did not change them
-between the old and new installations.
+[[https://forum.level1techs.com/t/amd-p-state-driver/197885/24][X2APIC]]. Skeptical of UEFI settings anyway, since I did not change
+them between the old and new installations.
 
-/Some time later/
+{{{narrator(Some time later)}}}
 
 Probably not ACPI, =dmesg= is choke full of ACPI noise. OTOH, using
 some diagnosis methods from [[https://bugzilla.kernel.org/show_bug.cgi?id=218171][this kernel bug report]]:
@@ -122,10 +130,10 @@ $ find /sys/devices -name '*cppc*'
 🦗
 #+end_example
 
-(=acpidump ; acpixtract ; iasl ; grep -i cpc *.dsl= also yields 🦗,
+(~acpidump ; acpixtract ; iasl ; grep -i cpc *.dsl~ also yields 🦗,
 but =iasl= complains about "unresolved" "control methods", so 🤷)
 
-/Some time later/
+{{{narrator(Some time later)}}}
 
 [[https://wiki.archlinux.org/title/CPU_frequency_scaling#amd_pstate][ArchWiki]] does say "Change /Enable CPPC/ […] from /Auto/ to /Enabled/".
 My UEFI menu tucks that under /Overclocking → Advanced CPU
@@ -199,7 +207,7 @@ No. No it does not; no discernible difference in FPS nor vibes.
 Will assume this new baseline cannot hurt - OT1H "overclocking" is
 scary, OTOH Linux now has a finer handle on the CPU and hopefully will
 not overwork it to death?
-*** Sᴇᴠᴇʀᴀʟ Wᴇᴇᴋꜱ Lᴀᴛᴇʀ
+* Sᴇᴠᴇʀᴀʟ Wᴇᴇᴋꜱ Lᴀᴛᴇʀ
 - [[https://www.gamingonlinux.com/forum/topic/5475/page=1/][ridge reports]] "bad frame pacing on ADMGPU",
   - when vsync is turned off: a non-factor in my testing,
   - lots of useful information in that thread tho and
@@ -213,6 +221,8 @@ not overwork it to death?
   - /lots/ of sysfs noodling there; unfortunately, none of the
     suggested settings for =power_dpm_force_performance_level= &
     =pp_power_profile_mode= change the symptoms.
+  - Since this forum seems full of knowledgeable folks, posted [[https://www.gamingonlinux.com/forum/topic/6437/][a new
+    topic]] there… but then [[https://www.gamingonlinux.com/forum/topic/6463/][the UK OSA dropped]].
 - In [[https://gitlab.freedesktop.org/drm/amd/-/issues/3618#note_2689087][this drm/amd#3618 thread]], @agd5f suggests "6.11 stable
   kernels" include a fix for the issue at hand there and a further
   rework "was
@@ -248,29 +258,31 @@ not overwork it to death?
 - Looking at Steam forums, [[https://steamcommunity.com/app/1145350/discussions/1/596260472619121965/][some folks]] do report FPS drops /shortly
   after the update/:
   #+begin_quote
-  it started fine after the major update, now suddenly im stuck with 40~50 fps with micro sutters
+  it started fine after the major update, now suddenly im stuck with
+  40~50 fps with micro sutters
 
   — December 6 2024
   #+end_quote
 
 - After AMD drivers & Mesa, figured I could look at vkd3d's issue
   tracker. [[https://github.com/doitsujin/dxvk/issues/4436][doitsujin/dxvk#4436]] and
-  [[ValveSoftware/steam-for-linux#11446]] looked somewhat promising:
+  [[https://github.com/ValveSoftware/steam-for-linux/issues/11446][ValveSoftware/steam-for-linux#11446]] looked somewhat promising:
   reports of lag on "KDE Tumbleweed Wayland", reported not long before
   my symptoms began (November 2024)); alas, ~LD_PRELOAD=~ does not
   help.
-  - #+begin_quote
-    Alternatively, remove the offending line in =/usr/share/drirc.d/00-radv-defaults.conf=
-    #+end_quote
+  #+begin_quote
+  Alternatively, remove the offending line in
+  =/usr/share/drirc.d/00-radv-defaults.conf=
+  #+end_quote
 
-    /discovers [[https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/00-radv-defaults.conf][=/usr/share/drirc.d/=]]/
+  {{{narrator(discovers [[https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/00-radv-defaults.conf][=/usr/share/drirc.d/=]])}}}
 
-    Computers were a mistake.
+  Computers were a mistake.
 
 - Peeked at [[https://github.com/HansKristian-Work/vkd3d-proton/blob/master/.github/ISSUE_TEMPLATE/bug_report.md][vkd3d-proton's issue template]] and idly ran with
   ~PROTON_LOG=1~. Over the course of 30 seconds or so, the log file
   gets flooded with 3MB's worth of =trace:unwind:dump_unwind_info= 🤨
-*** This is insane
+* This is insane
 Selected subset of moving parts; "testability" considering ease of
 clean reverts:
 
@@ -297,5 +309,5 @@ Let's throw in:
 
 | Part          | Testability                       |
 |---------------+-----------------------------------|
-| Mobo firmware | 🔥 reports of nuked boot settings |
+| Mobo firmware | 🔥 [[file:maintenance.org::*Firmware updates][reports]] of nuked boot settings |
 |
