• 1 Post
  • 10 Comments
Joined 2 years ago
Cake day: June 14th, 2023

  • Alternate perspective: I use the heck out of session restore, and it has driven me nuts that it hasn’t worked properly under Wayland.

    I tend to use different virtual desktops for different projects, so being able to reboot (because of a kernel update, or needing to load a module or something) without losing that state and having to rebuild it is super valuable.



  • There’s some weirdness there because she did some important but not-very-public work at IBM in the 60s on their ACS/“Project Y” effort, which developed what we’d later call superscalar/multi-issue processors about 20 years before those terms existed. As part of that she wrote a paper on “Dynamic Instruction Scheduling” in 1966 under her pre-transition identity, which is a kind of retroactive first cause for a bunch of computer architecture ideas.

    There was almost nothing public about that work until Mark Smotherman, doing some history-of-computing research in the late 90s, put out a call for information about it, and she produced a huge trove of insider material after deciding it was worth establishing the provenance. There’s a neat long-form LA Times piece about the situation, which is probably the primary source for the history in OP’s link.


  • That’s credible.

    I find the hardware architecture and licensing situation with AMD much more appealing than Nvidia’s and really want to like their cards for compute, but they sure make it challenging to recommend.

    I had to do a little dead reckoning with the list of supported targets to find one that did the right thing with the 12CU RDNA2 680M.

    I’ve been meaning to put my findings on the internet since they might be useful to someone else, and this is as good a place as any.

    This is on a fresh Xubuntu 22.04.4 LTS install, following the official ROCm 6.1 setup instructions, with a Minisforum UM690S (Ryzen 9 6900HX/64GB/1TB) as the target, and with the GPU memory set to 8GB in the EFI before boot so it doesn’t OOM.

    For OpenMP projects, you’ll probably need to install libstdc++-12-dev in addition to the documented packages, because HIP won’t see the cmath headers otherwise (bug). Then the CMakeLists.txt changes for adapting a project with accelerator directives to that target are:

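    list(APPEND CMAKE_PREFIX_PATH /opt/rocm-6.1.0)   # add the ROCm prefix first so find_package can locate HIP
    find_package(hip REQUIRED)
    # build and link with hipcc (ROCm's clang wrapper) so the OpenMP offload flags are understood
    set(CMAKE_CXX_COMPILER ${HIP_HIPCC_EXECUTABLE})
    set(CMAKE_CXX_LINKER   ${HIP_HIPCC_EXECUTABLE})
    # offload OpenMP target regions to the 680M, which is gfx1035
    target_compile_options(yourtargetname PUBLIC "-lm;-fopenmp;-fopenmp-targets=amdgcn-amd-amdhsa;-Xopenmp-target=amdgcn-amd-amdhsa;-march=gfx1035")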
    

    And torch, which I tried because I was curious how it would go (after watching the suggested Docker-based method download 30GB of trash and then fall over, I did the bare-metal install instead), seems to work with PYTORCH_TEST_WITH_ROCM=1 HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 testtorch.py, which is the most confidence-inspiring.

    Also amdgpu_top is your friend for figuring out if you actually have something on the GPU compute pipes or if it’s just lying and running on the CPU.
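For a scripted yes/no check, the amdgpu driver also exposes a coarse utilization figure in sysfs as gpu_busy_percent under /sys/class/drm/cardN/device/. A minimal sketch of polling it while a job runs, assuming your kernel's amdgpu driver provides that attribute (card numbering varies per machine); find_busy_files, read_busy and peak_busy are hypothetical helper names, and this is much cruder than amdgpu_top, which shows far more detail.

```python
import os
import re
import time
from typing import List


def find_busy_files(drm_root: str = "/sys/class/drm") -> List[str]:
    """Return gpu_busy_percent paths for every DRM card that exposes one."""
    if not os.path.isdir(drm_root):
        return []
    paths = []
    for name in sorted(os.listdir(drm_root)):
        # Match card0, card1, ... but skip connector entries like card0-eDP-1.
        if not re.fullmatch(r"card\d+", name):
            continue
        path = os.path.join(drm_root, name, "device", "gpu_busy_percent")
        if os.path.isfile(path):
            paths.append(path)
    return paths


def read_busy(path: str) -> int:
    """Read one utilization sample (0-100) from the sysfs attribute."""
    with open(path) as f:
        return int(f.read().strip())


def peak_busy(path: str, samples: int = 20, interval_s: float = 0.1) -> int:
    """Poll while the workload runs; a peak stuck near 0 suggests it never reached the GPU."""
    peak = 0
    for _ in range(samples):
        peak = max(peak, read_busy(path))
        time.sleep(interval_s)
    return peak


if __name__ == "__main__":
    for p in find_busy_files():
        print(p, peak_busy(p))
```

Start it in a second terminal right after launching the compute job; if the job is quietly running on the CPU, the GPU number stays flat.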


  • Neat.

    I set up some basic compute stuff with the ROCm stack on a 6900HX-based mini computer the other week (mostly to see if it was possible, as there are some image processing workloads a colleague was hoping to accelerate on a similar host) and noticed that the docs occasionally pretend you can use dynamically allocated GTT memory for compute tasks, but I found no evidence of it ever having worked for anyone.

    That machine had flexible firmware and 64GB of RAM stuffed in it so I just shuffled the boot time allocation in the EFI to give 8GB to the GPU to make it work, but it’s not elegant.

    It’s also pretty clumsy to actually make things run: a lot of “set the magic environment variable because the toolchain will mis-detect the architecture of your unsupported card” and “inject this wall of text into your CMake list to override libraries with our cooked versions” to make things work. Then it performs like an old GTX 1060, which on one hand is impressive for an integrated part in a fairly low-wattage machine, and on the other hand means competing with a low-mid range card from 2016.

    Pretty on brand, really: they’ve been fucking up their compute stack since before any other vendor was doing the GPGPU thing (abandoning CTM for Stream in like a year).

    I think the OpenMP route was the least janky of the ways I tried to get something to offload on an APU, but it was also one of the later attempts, so maybe I was just getting used to its shit.
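The “magic environment variable” is HSA_OVERRIDE_GFX_VERSION, and the 10.3.0 value in the torch run from the comment above comes from how gfx target names encode a version: gfx1035 (the 680M) is major 10, minor 3, stepping 5, and the override asks the ROCm runtime to treat the device as 10.3.0, i.e. the supported gfx1030. A minimal sketch of that mapping, assuming the rule of thumb that zeroing the stepping lands on a supported target in the same family (it holds for the RDNA2 gfx103x parts, but is not guaranteed in general); gfx_version, override_for and launch_env are hypothetical helper names, not part of any ROCm tool.

```python
import os
import re
from typing import Dict, Optional, Tuple

# gfx target names: "gfx" + major (1-2 decimal digits) + minor (1 decimal digit)
# + stepping (1 hex digit, e.g. the "a" in gfx90a).
_GFX_RE = re.compile(r"gfx(\d{1,2})(\d)([0-9a-f])")


def gfx_version(target: str) -> Tuple[int, int, int]:
    """Split a target name like 'gfx1035' into (major, minor, stepping)."""
    m = _GFX_RE.fullmatch(target.strip().lower())
    if m is None:
        raise ValueError(f"not a gfx target name: {target!r}")
    major, minor, stepping = m.groups()
    return int(major), int(minor), int(stepping, 16)


def override_for(target: str) -> str:
    """Rule-of-thumb override: keep major.minor, zero the stepping.

    gfx1035 -> "10.3.0" (i.e. pretend to be gfx1030). This only helps when the
    stepping-0 part is a supported target whose code objects actually run on
    your chip; it is a heuristic, not a guarantee.
    """
    major, minor, _ = gfx_version(target)
    return f"{major}.{minor}.0"


def launch_env(target: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) plus the two variables from the torch run."""
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = override_for(target)
    env["PYTORCH_TEST_WITH_ROCM"] = "1"
    return env
```

Launching the test script this way would be subprocess.run(["python3", "testtorch.py"], env=launch_env("gfx1035")), which is equivalent to prefixing the command with the two variables by hand.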


  • Don’t trust that they’re 100% compatible with mainline Linux; ChromeOS carries some weird patches and proprietary stuff up-stack.

    I have a little Dell Chromebook 11 3189 that I did the MrChromebox coreboot + Linux thing on. A couple of years ago I couldn’t get the (weird i2c) input devices to work right; that has since been fixed in upstream coreboot tables and/or Linux, but (as of a couple of months ago) they still don’t play nice with smaller alternative OSes like NetBSD or a Haiku nightly.

    The audio situation is technically functional but still a little rough: the way the codec in Bay Trail/Cherry Trail devices is half chipset, half external occasionally leads to the audio configuration crapping itself in ways that take some patience and/or expertise to deal with (why do I suddenly have 20 inoperable sound cards in my PulseAudio settings?).

    This particular machine also does some goofy bullshit with 2 IMUs in the halves instead of a fold-back sensor, so the rotation/folding stuff via iio sensors is a little quirky.

    But they absolutely are fun, cheap hacker toys that are generally easy targets.
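On the two-IMU folding setup: without a hinge sensor, the fold angle has to be inferred from how gravity looks to each half, which is part of why it’s quirky, since when the hinge axis points straight up or down gravity carries no information about the fold at all. A minimal sketch of that inference, assuming both accelerometers already report in frames whose x axis runs along the hinge (real devices need their iio mount matrices applied first); fold_angle_deg and its min_g threshold are hypothetical, not what the kernel or ChromeOS actually runs.

```python
import math
from typing import Optional, Sequence


def fold_angle_deg(base: Sequence[float], lid: Sequence[float],
                   min_g: float = 2.0) -> Optional[float]:
    """Angle between the two halves about the hinge, from the gravity each sees.

    base, lid: (x, y, z) accelerometer readings in m/s^2, both already rotated
    so that x runs along the hinge (an assumption; apply mount matrices first).
    Returns degrees in [0, 360) with an arbitrary sign convention, or None when
    gravity is (nearly) parallel to the hinge and the angle is unobservable.
    """
    # Only the gravity component perpendicular to the hinge says anything
    # about the fold, so drop x and work in the y-z plane.
    by, bz = base[1], base[2]
    ly, lz = lid[1], lid[2]
    if math.hypot(by, bz) < min_g or math.hypot(ly, lz) < min_g:
        return None  # hinge pointing (nearly) straight up or down
    base_angle = math.degrees(math.atan2(by, bz))
    lid_angle = math.degrees(math.atan2(ly, lz))
    return (lid_angle - base_angle) % 360.0
```

The None case is the practical takeaway: hold the machine with the hinge vertical and any gravity-only scheme has to fall back to its last known state.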



  • They don’t have to be specified in a monolithic fashion, but some things, like the input plumbing and session management examples I made, do have to be specified for software to work when running under different compositors. FD.o basically exists because we already learned this lesson with other compat problems and solved it without putting it in the X monolith; it’s why things like ICCCM and EWMH happened. There were more details than the existing APIs covered that everyone needed to agree on to make software interoperate.

    Competing implementations are great, but once you have significant inertia behind competing implementations that are not compatible, or at least interoperable, you’ve fragmented the already-small Linux market share into a maze of partially incompatible micro-platforms. We’re not going to have compositing and non-compositing; we’re going to have 3ish incompatible sets of protocols for basic functionality: KDE/Qt ([kde]), GNOME/GTK (who aren’t even doing documented protocols), and everyone else (mostly [wlr] extensions).

    Looking at the slow, bitter process of extending or replacing components once implementations that rely on them exist, that’s not something to count on. Remember how it took 15 years of contention to finally transition to D-Bus after CORBA/Bonobo and DCOP? That’s what’s about to happen with things like the incompatible GTK and Qt session management schemes. And that resolution was forced by the old HAL system using D-Bus, not by the other parties involved getting their shit together of their own accord.

    One place we’re about to see innovation is Wayland-stack-bypassing workarounds. Key remapping is currently in that category: the Wayland protocol suite punted, so instead keyd sniffs all the HID traffic at the evdev and/or uinput layer and outputs the rule-edited streams to virtual HID devices. That one does have a certain global elegance (it works on ttys!), but it’s also a layering violation built on privileged processes.
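The core of a keyd-style remapper, between grabbing the physical evdev device and re-emitting through a virtual uinput device, is a pure transformation over (type, code, value) input events. A minimal sketch of just that step, assuming a one-to-one key table; the constants are the standard values from linux/input-event-codes.h, while the device grab, the uinput output, and keyd’s layers and overloads are all omitted, and remap is a hypothetical name, not keyd’s code.

```python
from typing import Iterable, Iterator, Mapping, Tuple

Event = Tuple[int, int, int]  # (type, code, value), as in struct input_event

# Values from linux/input-event-codes.h
EV_SYN = 0x00
EV_KEY = 0x01
KEY_ESC = 1
KEY_LEFTCTRL = 29
KEY_CAPSLOCK = 58


def remap(events: Iterable[Event], table: Mapping[int, int]) -> Iterator[Event]:
    """Rewrite key codes per `table`; pass every other event through untouched.

    For EV_KEY, value is 1 (press), 0 (release) or 2 (autorepeat). Remapping the
    code for all three keeps press/release pairs balanced on the virtual device.
    """
    for etype, code, value in events:
        if etype == EV_KEY and code in table:
            yield (etype, table[code], value)
        else:
            yield (etype, code, value)
```

In a real remapper the surrounding loop would read from the grabbed device, push each event through a step like this, and write the result to the uinput device; the exclusive grab is what keeps the compositor from also seeing the unmapped originals, and it is also why the whole thing needs privileges.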


  • I’ll preface this by saying that Xorg is obviously an unmaintainable mess of legacy decisions and legacy code, and I have both a machine that runs Hyprland and a machine that usually starts Plasma in Wayland mode, so I know the Wayland situation is getting to be more-or-less adequate, with persistent irritations here and there… but Wayland is trauma-driven development. It’s former Xorg developers minimizing their level of responsibility for actual platform code while controlling the protocol spec, and being in a position to give up on X once their preferred successor was in place.

    Essentially all of the platform is being outsourced to other libraries and toolkits, who are all doing their own incompatible things (which is why we have like 8 xdg-desktop-portal back-ends with different sets of deficiencies, because portals were probably designed at the wrong level of abstraction), and all have to figure out how to work around the limitations in the protocols. Or they can spend years bikeshedding about extensions over theoretical security concerns in features that every other remotely modern platform supports.

    Some of that outsourcing has been extremely successful, like Pipewire.

    Some attempts have been less successful, like the ongoing lack of a reasonable way to handle input plumbing in a Wayland environment (think auto-type and network KVM functionality), because they seem to have imagined their libinput prototype spun out of Weston would serve as complete generic input plumbing, and it’s barely adequate for common hardware devices. Hopefully it’s not too late to get something adequate widely standardized, but I’m increasingly afraid we missed the window of opportunity.

    Some things that had to be standardized to actually work - like session management - have been intentionally abdicated, and now KDE and Gnome have each become married to their own mutually-incompatible half solution, so we’re probably boned on that ever working properly until the next “start over to escape our old bad decisions” cycle… which, if history holds, isn’t that far away.

    We’re 15 years into Wayland, and only in the last few years has it gone from “barely a tech demo” broken to “Linux in the early 2000s” broken, and only in the last year to “problems with specific features” broken… and it is only 4 years younger than the XFree86 -> Xorg fork.


  • The near-instant heat-up is a big part of how I ended up with my Bambino and its “ThermoJet” (thermoblock coil thing) heater.

    3s from wake to ready, it takes longer to grind and prep than to heat. I usually pull a blank shot through the clean portafilter into the cup I’m going to pull the shot in so the downstream parts aren’t crashing the temperature, but that’s still seconds.

    Ascaso and Decent have more up-market offerings with thermoblock heaters that are similarly fast but offer more control. I wasn’t compelled to pay 5-10x the price for my needs, and I’m certainly not into that thing at over 100x the price… but it is a great feature that the commercially derived machines don’t have.