📝 Added another blog sub

2023-05-29 02:05:51 -07:00 · 2023-05-29 02:05:51 -07:00 · 4a6bc39fe4
commit 4a6bc39fe4
parent b31a2d3e1a
2 changed files with 473 additions and 0 deletions
--- a/.config/newsboat/my_urls
+++ b/.config/newsboat/my_urls
@@ -54,3 +54,4 @@ file://./rss/bigdinosaur_blog.rss
 file://./rss/justingarrison.xml
 file://./rss/keithjgrant.xml
 file://./rss/mark_manson.xml
+file://./rss/rosenzweig.xml
--- a/.config/newsboat/rss/rosenzweig.xml
+++ b/.config/newsboat/rss/rosenzweig.xml
@@ -0,0 +1,472 @@
+<?xml version='1.0' encoding='UTF-8'?>
+<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0"><channel><title>On Life and Lisp</title><link>https://rosenzweig.io/</link><description>Software freedom, graphics, and gay</description><docs>http://www.rssboard.org/rss-specification</docs><generator>python-feedgen</generator><language>en</language><lastBuildDate>Mon, 22 May 2023 03:13:01 +0000</lastBuildDate><item><title>Growing up Alyssa</title><link>https://rosenzweig.io/blog/growing-up-alyssa.html</link><description>&lt;p&gt;When I was 10, I came out as transgender. I was a girl and I knew it.&lt;/p&gt;
+&lt;p&gt;I was one of the lucky ones.&lt;/p&gt;
+&lt;p&gt;After four painful years, I was fortunate enough to access gender-affirming health care. First testosterone blockers. Later estrogen, the stuff my peers soaked in for years while I threw myself into software development to distract from pain.&lt;/p&gt;
+&lt;p&gt;Despite being old enough to go through the wrong puberty and suffer its permanent changes, it took four years to access the medical fix. Four years of gender therapy, hard talks with doctors, and a lot of determination.&lt;/p&gt;
+&lt;p&gt;There’s a vicious myth that kids just walk into clinics and leave with hormones. Quite the opposite.&lt;/p&gt;
+&lt;p&gt;I was lucky: my parents supported me, and by then we lived near San Francisco, where a gender clinic was willing to take me as a patient.&lt;/p&gt;
+&lt;p&gt;I’m 21 now. I’ll be blunt: if not for gender-affirming care, I don’t know if I would be around. If there would be FOSS graphics drivers for Mali-T860 or the Apple M1.&lt;/p&gt;
+&lt;p&gt;If I were a few years younger, lived in the wrong part of the US, that may well be the reality, because gender-affirming care is banned for minors in conservative areas across the United States. Texas, for example, would threaten to take me from my loving parents under Greg Abbott’s directive.&lt;/p&gt;
+&lt;p&gt;Even now, I’m lucky I don’t live in the wrong place: the medication I’m prescribed is banned for &lt;em&gt;adults&lt;/em&gt; in several American states.&lt;/p&gt;
+&lt;p&gt;I fear the 2024 election. How long until there’s a ban nationwide?&lt;/p&gt;
+&lt;p&gt;In high school, I knew this day might come. I applied to Canadian universities. Canada isn’t perfect, far from it. But stripping trans rights isn’t on the ballot yet.&lt;/p&gt;
+&lt;p&gt;Growing up, we liked visiting Florida.&lt;/p&gt;
+&lt;p&gt;Now there are travel advisories against it.&lt;/p&gt;
+&lt;p&gt;One recent Florida law threatens jail time if a trans person uses the bathroom - &lt;em&gt;any&lt;/em&gt; bathroom - in a public space. I remember in high school, arguing back against “bathroom bills” designed to marginalize trans people. They seem tame next to the vile attacks on trans people championed by Ron DeSantis.&lt;/p&gt;
+&lt;p&gt;What’s next?&lt;/p&gt;
+&lt;p&gt;Does anybody remember the Nuremberg laws?&lt;/p&gt;
+&lt;p&gt;I was raised Jewish. Growing up, we were haunted by the spectre of the Holocaust. I knew queer Germans were in the cross-hairs alongside Jews. I didn’t know that Berlin was a queer centre before Hitler came to power.&lt;/p&gt;
+&lt;p&gt;In high school, I understood if fascists came to power in the United States, I might be first to go. Nazis had a special symbol for people like me: a pink triangle superimposed on a yellow triangle. I was 16 when I wondered if one day I would be forced to wear it.&lt;/p&gt;
+&lt;p&gt;In 2020, Donald Trump used the Nazis’ symbol for political prisoners – forced to be worn in camps – to threaten leftists in a campaign ad.&lt;/p&gt;
+&lt;p&gt;Subtle.&lt;/p&gt;
+&lt;p&gt;You don’t need to like Democrats, but I need you to understand that if you vote Republican in 2024, you vote erasure. You vote oppression. You vote fascism.&lt;/p&gt;
+&lt;p&gt;Maybe you “just have some concerns” about trans kids.&lt;/p&gt;
+&lt;p&gt;I was a trans kid, and I want you to know that DeSantis, Abbott, and Trump were my nightmares. Their policies will lead to the deaths of transgender Americans. With hundreds of GOP-sponsored anti-trans bills and laws simultaneously sweeping the United States, it’s hard to believe this isn’t by design.&lt;/p&gt;
+&lt;p&gt;It doesn’t have to be that way.&lt;/p&gt;
+&lt;p&gt;The trans experience isn’t inherently defined by suffering. Not for trans kids, not for trans adults.&lt;/p&gt;
+&lt;p&gt;When treated with respect, allowed to transition, when we can access the medication we know we need, life can be great.&lt;/p&gt;
+&lt;p&gt;Personally, I have felt virtually no gender-related discomfort in years now.&lt;/p&gt;
+&lt;p&gt;I once recoiled at my reflection. Now I look in the mirror and smile at the cute woman smiling back at me. I’m surrounded by lovely friends, and we support each other. Laugh together. Cry together. Text endless stickers of cartoon sharks together. Past the shared struggle, there is immense trans joy.&lt;/p&gt;
+&lt;p&gt;When we are made to suffer – by banning our medication, arresting us for peeing, legislating our identities out of existence on the road to establishing a theocratic state – that is a policy choice.&lt;/p&gt;
+&lt;p&gt;We’re not asking for much. We don’t want special treatment. We just want respect. Life, liberty, and the pursuit of happiness.&lt;/p&gt;
+&lt;p&gt;Right now I want legislators to get the fuck out of our doctors’ office.&lt;/p&gt;
+&lt;p&gt;I’m on the board overseeing Linux graphics. Half of us are trans. If all you care about is Linux, resist the attacks on trans people.&lt;/p&gt;
+&lt;p&gt;If you have any decency, fight back.&lt;/p&gt;
+&lt;p&gt;It’s your choice.&lt;/p&gt;
+&lt;hr /&gt;
+&lt;p&gt;Selected reading:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;p&gt;&lt;a href="https://www.erininthemorning.com/p/may-anti-trans-legislative-risk-map"&gt;May Anti-Trans Legislative Risk Map&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;li&gt;&lt;p&gt;&lt;a href="https://time.com/5953047/lgbtq-holocaust-stories/"&gt;Why It Took Decades for LGBTQ Stories to Be Included in Holocaust History&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;li&gt;&lt;p&gt;&lt;a href="https://www.vox.com/recode/2020/6/18/21295226/facebook-trump-campaign-nazi-symbol-antifa"&gt;Facebook takes down another Trump campaign ad, this time for Nazi imagery&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;li&gt;&lt;p&gt;&lt;a href="https://naacp.org/articles/naacp-issues-travel-advisory-florida"&gt;NAACP Issues Travel Advisory in Florida&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;li&gt;&lt;p&gt;&lt;a href="https://www.cnn.com/2022/03/01/us/texas-transgender-family-investigation-lawsuit/index.html"&gt;Texas begins investigating parents of transgender teens for child abuse, according to a lawsuit. One parent works in the department involved in the investigations&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;/ul&gt;
+&lt;hr /&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/growing-up-alyssa.html</guid><pubDate>Sun, 21 May 2023 00:00:00 -0500</pubDate></item><item><title>Passing the reins on Panfrost</title><link>https://rosenzweig.io/blog/passing-reins-panfrost.html</link><description>&lt;p&gt;Today is my last day at &lt;a href="https://www.collabora.com/"&gt;Collabora&lt;/a&gt; and my last day leading the &lt;a href="https://docs.mesa3d.org/drivers/panfrost.html"&gt;Panfrost&lt;/a&gt; driver.&lt;/p&gt;
+&lt;p&gt;It’s been a wild ride.&lt;/p&gt;
+&lt;p&gt;In 2017, I began work on the &lt;code&gt;chai&lt;/code&gt; driver for Mali T (Midgard). &lt;code&gt;chai&lt;/code&gt; would later be merged into &lt;a href="https://queer.party/@Lyude"&gt;Lyude Paul&lt;/a&gt;’s and Connor Abbott’s BiOpenly project for Mali G (Bifrost) to form Panfrost.&lt;/p&gt;
+&lt;p&gt;In 2019, I joined Collabora to accelerate work on the driver stack. The initial goal was to run GNOME on a Mali-T860 Chromebook.&lt;/p&gt;
+&lt;p&gt;&lt;a href="https://www.collabora.com/news-and-blog/blog/2019/06/26/gnome-meets-panfrost/"&gt;Huge success&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;&lt;img src="https://rosenzweig.io/glmark-gears-gnome-panfrost-crop.webp" alt="GNOME running on Panfrost in 2019" /&gt;&lt;br /&gt;
+&lt;/p&gt;
+&lt;p&gt;Today, Panfrost supports a broad spectrum of Mali GPUs, conformant to the OpenGL ES 3.1 specification on Mali-G52 and Mali-G57. It’s hard to overstate how far we’ve come. I’ve had the thrills of architecting several backend shader compilers as well as the Gallium-based OpenGL driver, while my dear colleague Boris Brezillon has put together a proof-of-concept Vulkan driver which I think you’ll hear more about soon.&lt;/p&gt;
+&lt;p&gt;Lately, my focus has been ensuring the project can stand on its own four legs. I have every confidence in other Collaborans hacking on Panfrost, including Boris and Italo Nicola. The project has a bright future. It’s time for me to pass the reins.&lt;/p&gt;
+&lt;p&gt;I’m still alive. I plan to continue working on Mesa drivers for a long time, including the common infrastructure upon which Panfrost relies. And I’ll still send the odd Panfrost patch now and then. That said, my focus will shift.&lt;/p&gt;
+&lt;p&gt;I’m not ready to announce what’s in store yet… but maybe you can read between the lines!&lt;/p&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/passing-reins-panfrost.html</guid><pubDate>Mon, 10 Apr 2023 00:00:00 -0500</pubDate></item><item><title>Apple GPU drivers now in Asahi Linux</title><link>https://rosenzweig.io/blog/asahi-gpu-part-7.html</link><description>&lt;p&gt;&lt;a href="https://rosenzweig.io/Quake3.png"&gt;&lt;img src="https://rosenzweig.io/Quake3.webp" /&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;We’re excited to announce our first Apple GPU driver release!&lt;/p&gt;
+&lt;p&gt;We’ve been working hard over the past two years to bring this new driver to everyone, and we’re really proud to finally be here. This is still an alpha driver, but it’s already good enough to run a smooth desktop experience and some games.&lt;/p&gt;
+&lt;p&gt;Read on to find out more about the state of things today, how to install it (it’s an opt-in package), and how to report bugs!&lt;/p&gt;
+&lt;h2 id="status"&gt;Status&lt;/h2&gt;
+&lt;p&gt;This release features work-in-progress OpenGL 2.1 and OpenGL ES 2.0 support for all current Apple M-series systems. That’s enough for hardware acceleration with desktop environments, like GNOME and KDE. It’s also enough for older 3D games, like Quake3 and Neverball. While there’s always room for improvement, the driver is fast enough to run all of the above at 60 frames per second at 4K.&lt;/p&gt;
+&lt;p&gt;Please note: these drivers have not yet passed the OpenGL (ES) conformance tests. There will be bugs!&lt;/p&gt;
+&lt;p&gt;What’s next? Supporting more applications. While OpenGL (ES) 2 suffices for some applications, newer ones (especially games) demand more OpenGL features. OpenGL (ES) 3 brings with it a slew of new features, like multiple render targets, multisampling, and transform feedback. Work on these features is well under way, but they will each take a great deal of additional development effort, and all are needed before OpenGL (ES) 3.0 is available.&lt;/p&gt;
+&lt;p&gt;What about Vulkan? We’re working on it! Although we’re only shipping OpenGL right now, we’re designing with Vulkan in mind. Most of the work we’re putting toward OpenGL will be reused for Vulkan. We estimated that we could ship working OpenGL 2 drivers much sooner than a working Vulkan 1.0 driver, and we wanted to get hardware accelerated desktops into your hands as soon as possible. For the most part, those desktops use OpenGL, so supporting OpenGL first made more sense to us than diving into the Vulkan deep end, only to use Zink to translate OpenGL 2 to Vulkan to run desktops. Plus, there is a large spectrum of OpenGL support, with OpenGL 2.1 containing a fraction of the features of OpenGL 4.6. The same is true for Vulkan: the baseline Vulkan 1.0 profile is roughly equivalent to OpenGL ES 3.1, but applications these days want Vulkan 1.3 with tons of extensions and “optional” features. Zink’s “layering” of OpenGL on top of Vulkan isn’t magic: it can only expose the OpenGL features that the underlying Vulkan driver has. A baseline Vulkan 1.0 driver isn’t even enough to get OpenGL 2.1 on Zink! Zink itself advertises support for OpenGL 4.6, but of course that’s only when paired with Vulkan drivers that support the equivalent of OpenGL 4.6… and that gets us back to a tremendous amount of time and effort.&lt;/p&gt;
+&lt;p&gt;When will OpenGL 3 support be ready? OpenGL 4? Vulkan 1.0? Vulkan 1.3? In community open source projects, it’s said that every time somebody asks when a feature will be done, it delays that feature by a month. Well, a lot of people have been asking…&lt;/p&gt;
+&lt;p&gt;At any rate, for a sneak peek… here is SuperTuxKart’s deferred renderer running at full speed, making liberal use of OpenGL ES 3 features like multiple render targets~&lt;/p&gt;
+&lt;p&gt;&lt;a href="https://rosenzweig.io/SuperTuxKart-Deferred.png"&gt;&lt;img src="https://rosenzweig.io/SuperTuxKart-Deferred.webp" /&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;h2 id="anatomy-of-a-gpu-driver"&gt;Anatomy of a GPU driver&lt;/h2&gt;
+&lt;p&gt;Modern GPUs consist of many distinct “layered” parts. There is…&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;a memory management unit and an interface to submit memory-mapped work to the hardware&lt;/li&gt;
+&lt;li&gt;fixed-function 3D hardware to rasterize triangles, perform depth/stencil testing, and more&lt;/li&gt;
+&lt;li&gt;programmable “shader cores” (like little CPUs with bespoke instruction sets) with work dispatched by the fixed-function hardware&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;This “layered” hardware demands a “layered” graphics driver stack. We need…&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;a kernel driver to map memory and submit memory-mapped work&lt;/li&gt;
+&lt;li&gt;a userspace driver to translate OpenGL and Vulkan calls into hardware-specific data structures in graphics memory&lt;/li&gt;
+&lt;li&gt;a compiler translating shading programming languages like GLSL to the hardware’s instruction set&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;That’s a lot of work, calling for a team effort! Fortunately, that layering gives us natural boundaries to divide work among our small team.&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;a href="https://social.treehouse.systems/@alyssa"&gt;&lt;strong&gt;Alyssa Rosenzweig&lt;/strong&gt;&lt;/a&gt; is writing the OpenGL driver and compiler.&lt;/li&gt;
+&lt;li&gt;&lt;a href="https://vt.social/@lina"&gt;&lt;strong&gt;Asahi Lina&lt;/strong&gt;&lt;/a&gt; is writing the kernel driver and helping with OpenGL.&lt;/li&gt;
+&lt;li&gt;&lt;a href="https://mastodon.social/@dougall"&gt;&lt;strong&gt;Dougall Johnson&lt;/strong&gt;&lt;/a&gt; is reverse-engineering the instruction set with Alyssa.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;Meanwhile, &lt;a href="https://tech.lgbt/@ella"&gt;&lt;strong&gt;Ella Stanforth&lt;/strong&gt;&lt;/a&gt; is working on a Vulkan driver, reusing the kernel driver, the compiler, and some code shared with the OpenGL driver.&lt;/p&gt;
+&lt;p&gt;Of course, we couldn’t build an OpenGL driver in under two years just ourselves. Thanks to the power of free and open source software, we stand on the shoulders of FOSS giants. The compiler implements a “NIR” backend, where NIR is a powerful intermediate representation, including GLSL to NIR translation. The kernel driver uses the “Direct Rendering Manager” (DRM) subsystem of the Linux kernel to minimize boilerplate. Finally, the OpenGL driver implements the “Gallium3D” API inside of &lt;a href="https://mesa3d.org/"&gt;Mesa&lt;/a&gt;, the home for open source OpenGL and Vulkan drivers. Through Mesa and Gallium3D, we benefit from thirty years of OpenGL driver development, with common code translating OpenGL into the much simpler Gallium3D. Thanks to the incredible engineering of NIR, Mesa, and Gallium3D, our ragtag team of reverse-engineers can focus on what’s left: the Apple hardware.&lt;/p&gt;
+&lt;h2 id="installation-instructions"&gt;Installation instructions&lt;/h2&gt;
+&lt;p&gt;To get the new drivers, you need to run the &lt;code&gt;linux-asahi-edge&lt;/code&gt; kernel and also install the &lt;code&gt;mesa-asahi-edge&lt;/code&gt; Mesa package.&lt;/p&gt;
+&lt;pre class="shell"&gt;&lt;code&gt;$ sudo pacman -Syu
+$ sudo pacman -S linux-asahi-edge mesa-asahi-edge
+$ sudo update-grub&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;Since only one version of Mesa can be installed at a time, pacman will prompt you to replace &lt;code&gt;mesa&lt;/code&gt; with &lt;code&gt;mesa-asahi-edge&lt;/code&gt;. This is normal!&lt;/p&gt;
+&lt;p&gt;We also recommend running Wayland instead of Xorg at this point, so if you’re using the KDE Plasma environment, make sure to install the Wayland session:&lt;/p&gt;
+&lt;pre class="shell"&gt;&lt;code&gt;$ sudo pacman -S plasma-wayland-session&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;Then reboot, pick the Wayland session at the top of the login screen (SDDM), and enjoy! You might want to adjust the screen scale factor in &lt;em&gt;System Settings → Display and Monitor&lt;/em&gt; (Plasma Wayland defaults to 100% or 200%, while 150% is often nicer). If you have “Force font DPI” enabled under &lt;em&gt;Appearance → Fonts&lt;/em&gt;, you should disable that (it is saved separately for Wayland and Xorg, and shouldn’t be necessary on Wayland sessions). Log out and back in for these changes to fully apply.&lt;/p&gt;
+&lt;p&gt;Xorg and Xorg-based desktop environments should work, but there are a few known issues:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Expect screen tearing (this might be fixed &lt;a href="https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1006"&gt;soon&lt;/a&gt;)&lt;/li&gt;
+&lt;li&gt;VSync does not work (some KDE animations will be too fast, and GL apps will not limit their FPS even with VSync enabled). This is a limitation of Xorg on the Apple DCP display controllers, which do not support VBlank interrupts.&lt;/li&gt;
+&lt;li&gt;There are still driver bugs triggered by Xorg/KWin. We’re looking into this.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;The &lt;code&gt;linux-asahi-edge&lt;/code&gt; kernel can be installed side-by-side with the standard &lt;code&gt;linux-asahi&lt;/code&gt; package, but both versions should be kept in sync, so make sure to always update your packages together! You can always pick the &lt;code&gt;linux-asahi&lt;/code&gt; kernel in the GRUB boot menu, which will disable GPU acceleration and the DCP display driver.&lt;/p&gt;
+&lt;p&gt;When the packages are updated in the future, it’s possible that graphical apps will stop starting up after an update until you reboot, or they may fall back to software rendering. This is normal. Until the UAPI is stable, we’ll have to break compatibility between Mesa and the kernel every now and then, so you will need to reboot to make things work after updates. In general, if apps &lt;em&gt;do&lt;/em&gt; keep working with acceleration after any particular Mesa update, then it’s probably safe not to reboot, but you should still do it to make sure you’re running the latest kernel!&lt;/p&gt;
+&lt;h2 id="reporting-bugs"&gt;Reporting bugs&lt;/h2&gt;
+&lt;p&gt;Since the driver is still in development, there are lots of known issues and we’re still working hard on improving conformance test results. Please don’t open new bugs for random apps not working! It’s still the early days and we know there’s a lot of work to do. Here’s a quick guide of how to report bugs:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;If you find an app that does not start up at all, please don’t report it as a bug. Lots of apps won’t work because they require a newer GL version than what we support. Please set the &lt;code&gt;LIBGL_ALWAYS_SOFTWARE=1&lt;/code&gt; environment variable for those apps to fall back to software rendering. If it is a popular app that is part of the Arch Linux ARM repository, you can make a comment on &lt;a href="https://github.com/AsahiLinux/linux/issues/73"&gt;this issue&lt;/a&gt; instead, so we can add Mesa quirks to work around the problem.&lt;/li&gt;
+&lt;li&gt;If you run into issues caused by &lt;code&gt;linux-asahi-edge&lt;/code&gt; unrelated to the GPU, please add a comment to &lt;a href="https://github.com/AsahiLinux/linux/issues/70"&gt;this issue&lt;/a&gt;. This includes display output issues! (Resolutions, backlight control, display power control, etc.)&lt;/li&gt;
+&lt;li&gt;If the GPU locks up and all GPU apps stop working, run &lt;code&gt;asahi-diagnose&lt;/code&gt; (for example, from an SSH session), open a new bug on &lt;a href="https://github.com/AsahiLinux/linux"&gt;the AsahiLinux/linux repository&lt;/a&gt;, attach the file generated by that command, and tell us what you were doing that caused the lockup.&lt;/li&gt;
+&lt;li&gt;For other GPU issues (rendering glitches, apps that crash after starting up correctly, and things like that), run &lt;code&gt;asahi-diagnose&lt;/code&gt; and make a comment on &lt;a href="https://github.com/AsahiLinux/linux/issues/72"&gt;this issue&lt;/a&gt;, attaching the file generated by that command. Don’t forget to tell us about your environment!&lt;/li&gt;
+&lt;li&gt;In the future, if a driver update causes a regression (rendering problems or crashes for apps that previously worked properly), you can open a bug &lt;a href="https://gitlab.freedesktop.org/asahi/mesa/-/issues"&gt;directly in the Mesa tracker&lt;/a&gt;.&lt;/li&gt;
+&lt;/ul&gt;
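+&lt;p&gt;As a rough illustration of the software-rendering fallback mentioned in the first bullet, here is a minimal Python sketch of a launcher helper. Only the &lt;code&gt;LIBGL_ALWAYS_SOFTWARE&lt;/code&gt; variable comes from the text above; the helper name &lt;code&gt;software_rendering_env&lt;/code&gt; is hypothetical and not part of any Asahi tooling.&lt;/p&gt;

```python
import os


def software_rendering_env(base_env=None):
    """Return a copy of the environment that asks Mesa for software rendering."""
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # The variable named in the bug-reporting guide above.
    env["LIBGL_ALWAYS_SOFTWARE"] = "1"
    return env
```

+&lt;p&gt;The returned mapping could then be passed as the &lt;code&gt;env&lt;/code&gt; argument of &lt;code&gt;subprocess.run&lt;/code&gt; when starting the affected app, so only that app falls back to software rendering.&lt;/p&gt;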
+&lt;p&gt;We hope you enjoy our driver! Remember, things are still moving quickly, so make sure to update your packages regularly to get updates and bug fixes!&lt;/p&gt;
+&lt;p&gt;&lt;em&gt;Co-written with Asahi Lina. Can you tell who wrote what?&lt;/em&gt;&lt;/p&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/asahi-gpu-part-7.html</guid><pubDate>Wed, 07 Dec 2022 00:00:00 -0500</pubDate></item><item><title>Clip control on the Apple GPU</title><link>https://rosenzweig.io/blog/asahi-gpu-part-6.html</link><description>&lt;figure&gt;
+&lt;img src="/Neverball.webp" alt="" /&gt;&lt;figcaption&gt;Neverball rendered on the Apple M1 GPU with an open source OpenGL driver&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;p&gt;After a year in development, &lt;a href="https://docs.mesa3d.org/drivers/asahi.html"&gt;the open source “Asahi” driver for the Apple GPU&lt;/a&gt; is running real games. There’s more to do, but &lt;a href="https://neverball.org/"&gt;Neverball&lt;/a&gt; is already playable (and a lot of fun!).&lt;/p&gt;
+&lt;p&gt;Neverball uses legacy “fixed function” OpenGL. Rather than supply programmable shaders like OpenGL 2, old OpenGL 1 applications configure a fixed set of graphics effects like fog and alpha testing. Modern GPUs don’t implement these features in hardware. Instead, the driver synthesizes shaders implementing the desired graphics. This translation is complicated, but we get it for “free” as an open source driver in Mesa. If we implement the modern shader pipeline, Mesa will handle fixed function OpenGL for us transparently. That’s a win for open source drivers, and a win for GPU acceleration on &lt;a href="https://asahilinux.org/"&gt;Asahi Linux&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;To implement the modern OpenGL features, we rely on reverse-engineering the behaviour of Apple’s Metal driver, as we don’t have hardware documentation. Although Metal uses the same shader pipeline as OpenGL, it doesn’t support all the OpenGL features that the hardware does, which puts us in a bind. In the past, I’ve relied on &lt;a href="https://rosenzweig.io/blog/asahi-gpu-part-4.html"&gt;educated guesswork&lt;/a&gt; to bridge the gap, but there’s another solution… and it’s a doozy.&lt;/p&gt;
+&lt;p&gt;For motivation, consider the &lt;em&gt;clip space&lt;/em&gt; used in OpenGL. In every other API on the planet, the Z component (depth) of points in the 3D world range from 0 to 1, where 0 is “near” and 1 is “far”. In OpenGL, however, Z ranges from &lt;em&gt;negative 1&lt;/em&gt; to 1. As Metal uses the 0/1 clip space, implementing OpenGL on Metal requires emulating the -1/1 clip space by inserting extra instructions into the vertex shader to transform the Z coordinate. Although this emulation adds overhead, it works for &lt;a href="https://github.com/google/angle"&gt;ANGLE&lt;/a&gt;’s open source implementation of OpenGL ES on Metal.&lt;/p&gt;
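+&lt;p&gt;To make the emulation concrete, here is a minimal Python sketch of the depth remapping such extra vertex shader instructions would perform. It assumes the usual formulation in clip coordinates (before the perspective divide by &lt;code&gt;w&lt;/code&gt;); the function names are hypothetical and this is not taken from ANGLE or Mesa.&lt;/p&gt;

```python
def gl_to_zero_one_clip(z, w):
    """Remap OpenGL clip-space depth (-w to w) onto the 0 to w range."""
    # Done before the perspective divide, so the shift is expressed with w.
    return 0.5 * (z + w)


def ndc_depth(z, w):
    """Perspective divide: clip-space depth to normalized device depth."""
    return z / w
```

+&lt;p&gt;With &lt;code&gt;w = 2&lt;/code&gt;, the OpenGL near plane (&lt;code&gt;z = -2&lt;/code&gt;) lands on depth 0 and the far plane (&lt;code&gt;z = 2&lt;/code&gt;) on depth 1 after the divide, which is the 0/1 convention described above.&lt;/p&gt;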
+&lt;p&gt;Like ANGLE, Apple’s OpenGL driver internally translates to Metal. Because Metal uses the 0 to 1 clip space, it should require this emulation code. Curiously, when we disassemble shaders compiled with their OpenGL implementation, we don’t see any such emulation. That means Apple’s GPU must support -1/1 clip spaces in addition to Metal’s preferred 0/1. The problem is figuring out how to use this other clip space.&lt;/p&gt;
+&lt;p&gt;We expect that there’s a bit toggling between these clip spaces. The logical place for such a bit is the viewport packet, but there’s no obvious difference between the viewport packets emitted by Metal and OpenGL-on-Metal. Ordinarily, we would identify the bit by toggling the clip space in Metal and comparing memory dumps. However, according to Apple’s documentation, there’s no way to change the clip space in Metal.&lt;/p&gt;
+&lt;p&gt;That’s an apparent contradiction. There’s no way to use the -1/1 clip space with Metal, but Apple’s OpenGL-on-Metal translator uses the -1/1 clip space. What gives?&lt;/p&gt;
+&lt;p&gt;Here’s a little secret: there are two graphics APIs called “Metal”. There’s the Metal you know, a limited API that Apple documents for App Store developers, an API that lacks useful features supported by OpenGL and Vulkan.&lt;/p&gt;
+&lt;p&gt;And there’s the Metal that Apple uses themselves, an internal API adding back features that Apple doesn’t want you using. While ANGLE implements OpenGL ES on the documented Metal, Apple can implement OpenGL on the secret Metal.&lt;/p&gt;
+&lt;p&gt;Apple does not publish documentation or headers for this richer Metal API, but if we’re lucky, we can catch a glimpse behind the curtain. The undocumented classes and methods making up the internal Metal API are still available in the production Metal binaries. To use them, we only need the missing headers. Fortunately, Objective-C symbols contain enough information to reconstruct header files, allowing us to experiment with undocumented methods with “extra” functionality inherited from OpenGL.&lt;/p&gt;
+&lt;p&gt;Compared to the desktop GPUs found in Intel Macs, Apple’s own GPU implements a slim, modern feature set mapping well to Metal. Most of the “extra” functionality is emulated. It is interesting to know the emulation happens in their Metal driver instead of their OpenGL frontend, but that’s unsurprising, as it allows their Metal drivers for Intel and AMD GPUs to implement the functionality natively. While this information is fascinating for “macOS hermeneutics”, it won’t help us with our Apple GPU mystery.&lt;/p&gt;
+&lt;p&gt;What &lt;em&gt;will&lt;/em&gt; help us are the catch-all mystery methods named &lt;code&gt;setOpenGLModeEnabled&lt;/code&gt;, apparently enabling “OpenGL mode”.&lt;/p&gt;
+&lt;p&gt;Mystery methods named like that just &lt;em&gt;beg&lt;/em&gt; to be called.&lt;/p&gt;
+&lt;p&gt;The render pipeline descriptor has such a method. That descriptor contains state that can change every draw. In some graphics APIs, like OpenGL with &lt;a href="https://registry.khronos.org/OpenGL/extensions/ARB/ARB_clip_control.txt"&gt;&lt;code&gt;ARB_clip_control&lt;/code&gt;&lt;/a&gt; and Vulkan with &lt;a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_depth_clip_control.html"&gt;&lt;code&gt;VK_EXT_depth_clip_control&lt;/code&gt;&lt;/a&gt;, the application can change the clip space every draw. Ideally, the clip space state would be part of this descriptor.&lt;/p&gt;
+&lt;p&gt;We can test this optimistic guess by augmenting our Metal test bench to call &lt;code&gt;[MTLRenderPipelineDescriptorInternal setOpenGLModeEnabled: YES]&lt;/code&gt;.&lt;/p&gt;
+&lt;p&gt;It feels strange to call this hidden method. It’s stranger when the code compiles and runs just fine.&lt;/p&gt;
+&lt;p&gt;We can then compare traces between OpenGL mode and the normal Metal mode. Seemingly, enabling OpenGL mode toggles a plethora of random unknown bits. Even if one of them is what we want, it’s a bit unsatisfying that the “real” Metal would lack a proper &lt;code&gt;[setClipSpace: MTLMinusOneToOne]&lt;/code&gt; method, rather than this blunt hack reconfiguring a pile of loosely related API behaviours.&lt;/p&gt;
+&lt;p&gt;Alas, for all the random changes in “OpenGL mode”, none seem to affect clipping behaviour.&lt;/p&gt;
+&lt;p&gt;Hope is not yet lost. There’s another &lt;code&gt;setOpenGLModeEnabled&lt;/code&gt; method, this time in the render pass descriptor. Rather than pipeline state that can change every draw, this descriptor’s state can only change in between render passes. Changing that state in between draws would require an expensive flush to main memory, similar to &lt;a href="https://rosenzweig.io/blog/asahi-gpu-part-5.html"&gt;the partial renders seen elsewhere with the Apple GPU&lt;/a&gt;. Nevertheless, it’s worth a shot.&lt;/p&gt;
+&lt;p&gt;Changing our test bench to call &lt;code&gt;[MTLRenderPassDescriptorInternal setOpenGLModeEnabled: YES]&lt;/code&gt;, we find another collection of random bits changed. Most of them are in hardware packets, and none of those seem to control clip space, either.&lt;/p&gt;
+&lt;p&gt;One bit does stand out. It’s not a hardware bit.&lt;/p&gt;
+&lt;p&gt;In addition to the packets that the userspace driver prepares for the hardware, userspace passes to the &lt;em&gt;kernel&lt;/em&gt; a large block of render pass state describing everything from tile size to the depth/stencil buffers. Such a design is unusual. Ordinarily, GPU kernel drivers are only concerned with memory management and scheduling, remaining oblivious of 3D graphics. By contrast, Apple processes this state in the kernel, forwarding the state to the GPU’s firmware to configure the actual hardware.&lt;/p&gt;
+&lt;p&gt;Comparing traces, the render pass “OpenGL mode” sets an unknown bit in this kernel-processed block. If we set the same bit in our OpenGL driver, we find the clip space changes to -1/1. Victory, right?&lt;/p&gt;
+&lt;p&gt;Almost. Because this bit is render pass state, we can’t use it to change the clip space between draws. That’s okay for baseline OpenGL and Vulkan, but it prevents us from efficiently implementing the &lt;code&gt;ARB_clip_control&lt;/code&gt; and &lt;code&gt;VK_EXT_depth_clip_control&lt;/code&gt; extensions. There &lt;em&gt;are&lt;/em&gt; at least three (inefficient) implementations.&lt;/p&gt;
+&lt;p&gt;The first is ignoring the hardware support and emulating one of the clip spaces by inserting extra instructions into the vertex shader when the “wrong” clip space is used. In addition to extra overhead, that requires &lt;em&gt;shader variants&lt;/em&gt; for the different clip spaces.&lt;/p&gt;
+&lt;p&gt;Shader variants are terrible.&lt;/p&gt;
+&lt;p&gt;In new APIs like Vulkan, Metal, and D3D12, everything needed to compile a shader is known up-front as part of a monolithic pipeline. That means pipelines are compiled when they’re created, not when they’re used, and are never recompiled. By contrast, older APIs like OpenGL and D3D11 allow using the same shader with different API states, requiring some drivers to recompile shaders on the fly. Compiling shaders is slow, so shader variants can cause unpredictable drops in an application’s frame rate, familiar to desktop gamers as stuttering. If we use this approach in our OpenGL driver, switching clip modes could cause stuttering due to recompiling shaders. In bad circumstances, that stutter could even happen long after the mode is switched.&lt;/p&gt;
+&lt;p&gt;That option is undesirable, so the second approach is &lt;em&gt;always&lt;/em&gt; inserting emulation instructions that read the desired clip space at run-time, reserving a uniform (push constant) for the transformation. That way, the same shader is usable with either clip space, eliminating shader variants. However, that has even higher overhead than the first method. If an application frequently changes clip spaces within a render pass, this approach will be the most efficient of the three. If it does not, this approach adds constant overhead to &lt;em&gt;every&lt;/em&gt; application. Knowing which approach is better requires the driver to have a magic crystal ball.&lt;a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;The final option is &lt;em&gt;using&lt;/em&gt; the hardware clip space bit and splitting the render pass when the clip space is changed. Here, the shaders are optimal and do not require variants. However, splitting the render pass wastes tremendous memory bandwidth if the application changes clip spaces frequently. Nevertheless, this approach has some support from the &lt;code&gt;ARB_clip_control&lt;/code&gt; specification:&lt;/p&gt;
+&lt;blockquote&gt;
+&lt;p&gt;Some [OpenGL] implementations may introduce a flush when changing the clip control state. Hence frequent clip control changes are not recommended.&lt;/p&gt;
+&lt;/blockquote&gt;
+&lt;p&gt;Each approach has trade-offs. For now, the easiest “option” is sticking our head in the sand and giving up on &lt;code&gt;ARB_clip_control&lt;/code&gt; altogether. The OpenGL extension is optional until we get to OpenGL 4.5. Apple doesn’t implement it in their OpenGL stack. Because &lt;code&gt;ARB_clip_control&lt;/code&gt; is primarily for porting Direct3D games, native OpenGL games are happy without it. Certainly, Neverball doesn’t mind. For now, we can use the hardware bit to use the -1/1 clip space unconditionally in OpenGL and 0/1 unconditionally in Vulkan. That does not require any emulation or flushing, though it prevents us from advertising the extensions.&lt;/p&gt;
+&lt;p&gt;That’s enough to run Neverball on macOS, using our userspace OpenGL driver in Mesa, and Apple’s proprietary kernel driver. There’s a catch: Neverball has to present with the deprecated X11 server on macOS. Years ago, Apple engineers&lt;a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; contributed Mesa support for X11 on macOS (XQuartz), allowing us to run X11 applications with our Mesa driver. However, there’s no support for Apple’s own Cocoa windowing system, meaning native macOS applications won’t work with our driver. There’s also no easy way to run Linux’s newer Wayland display server on macOS. Nevertheless, Neverball does not use Cocoa directly. Instead, it uses the cross-platform &lt;a href="https://github.com/libsdl-org/SDL"&gt;SDL2&lt;/a&gt; library to create its window, which internally uses Cocoa, X11, or Wayland as appropriate for the operating system. With enough sweat and tears, we can build a macOS/X11 version of SDL2 and link Neverball with that.&lt;/p&gt;
+&lt;p&gt;This Neverball/macOS/X11 port was frustrating, especially when the game is one &lt;code&gt;apt install&lt;/code&gt; away on Linux. That’s a job for &lt;a href="https://twitter.com/linaasahi"&gt;Asahi Lina&lt;/a&gt;, who has been hard at work writing a Linux kernel driver for Apple’s GPU. When our work converges, my userspace Mesa driver will run on Linux with her kernel driver to implement a full open source graphics stack for 3D acceleration on Asahi Linux.&lt;/p&gt;
+&lt;p&gt;Please temper your expectations: even with hardware documentation, an optimized Vulkan driver stack (with enough features to layer OpenGL 4.6 with Zink) requires many years of full-time work. At least for now, nobody is working on this driver full time&lt;a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;. Reverse-engineering slows the process considerably. We won’t be playing AAA games any time soon.&lt;/p&gt;
+&lt;p&gt;That said, thanks to the tremendous shared code in Mesa, a basic OpenGL driver is doable by a single person. I’m optimistic that we’ll have native OpenGL 2.1 in Asahi Linux by the end of the year. That’s enough to accelerate your desktop environment and browser. It’s also enough to play older games (like Neverball). Even without fancy features, GPU acceleration means smooth animations and better battery life.&lt;/p&gt;
+&lt;p&gt;In that light, the Asahi Linux future looks bright.&lt;/p&gt;
+&lt;p&gt;&lt;img src="/Neverball2.webp" /&gt;&lt;/p&gt;
+&lt;section class="footnotes" role="doc-endnotes"&gt;
+&lt;hr /&gt;
+&lt;ol&gt;
+&lt;li id="fn1" role="doc-endnote"&gt;&lt;p&gt;This crystal ball is called “Vulkan, Metal, or D3D12”, and it has its own problems.&lt;a href="#fnref1" class="footnote-back" role="doc-backlink"&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;li id="fn2" role="doc-endnote"&gt;&lt;p&gt;Hi Jeremy!&lt;a href="#fnref2" class="footnote-back" role="doc-backlink"&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;li id="fn3" role="doc-endnote"&gt;&lt;p&gt;I work full-time at &lt;a href="https://collabora.com"&gt;Collabora&lt;/a&gt; on my baby, the open source Panfrost driver for Mali GPUs.&lt;a href="#fnref3" class="footnote-back" role="doc-backlink"&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;/ol&gt;
+&lt;/section&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/asahi-gpu-part-6.html</guid><pubDate>Mon, 22 Aug 2022 00:00:00 -0500</pubDate></item><item><title>The Apple GPU and the Impossible Bug</title><link>https://rosenzweig.io/blog/asahi-gpu-part-5.html</link><description>&lt;p&gt;In late 2020, Apple debuted the M1 with Apple’s GPU architecture, AGX, rumoured to be derived from Imagination’s PowerVR series. Since then, &lt;a href="https://asahilinux.org"&gt;we’ve&lt;/a&gt; been reverse-engineering AGX and building open source graphics drivers. Last January, I &lt;a href="https://rosenzweig.io/blog/asahi-gpu-part-2.html"&gt;rendered a triangle&lt;/a&gt; with my own code, but there has since been a heinous bug lurking:&lt;/p&gt;
+&lt;p&gt;The driver fails to render large amounts of geometry.&lt;/p&gt;
+&lt;p&gt;Spinning a cube is fine, low polygon geometry is okay, but detailed models won’t render. Instead, the GPU renders only part of the model and then faults.&lt;/p&gt;
+&lt;figure&gt;
+&lt;img src="/PartialPhong.webp" alt="" /&gt;&lt;figcaption&gt;Partially rendered bunny&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;p&gt;It’s hard to pinpoint how much we can render without faults. It’s not just the geometry complexity that matters. The same geometry can render with simple shaders but fault with complex ones.&lt;/p&gt;
+&lt;p&gt;That suggests rendering detailed geometry with a complex shader “takes too long”, and the GPU is timing out. Maybe it renders only the parts it finished in time.&lt;/p&gt;
+&lt;p&gt;Given the hardware architecture, this explanation is unlikely.&lt;/p&gt;
+&lt;p&gt;This hypothesis is easy to test, because we can control for timing with a shader that takes as long as we like:&lt;/p&gt;
+&lt;div class="sourceCode" id="cb1"&gt;&lt;pre class="sourceCode c"&gt;&lt;code class="sourceCode c"&gt;&lt;span id="cb1-1"&gt;&lt;a href="#cb1-1" aria-hidden="true"&gt;&lt;/a&gt;&lt;span class="cf"&gt;for&lt;/span&gt; (&lt;span class="dt"&gt;int&lt;/span&gt; i = &lt;span class="dv"&gt;0&lt;/span&gt;; i &amp;lt; LARGE_NUMBER; ++i) {&lt;/span&gt;
+&lt;span id="cb1-2"&gt;&lt;a href="#cb1-2" aria-hidden="true"&gt;&lt;/a&gt;    &lt;span class="co"&gt;/* some work to prevent the optimizer from removing the loop */&lt;/span&gt;&lt;/span&gt;
+&lt;span id="cb1-3"&gt;&lt;a href="#cb1-3" aria-hidden="true"&gt;&lt;/a&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;After experimenting with such a shader, we learn…&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;If shaders have a time limit to protect against infinite loops, it’s astronomically high. There’s no way our bunny hits that limit.&lt;/li&gt;
+&lt;li&gt;The symptoms of timing out differ from the symptoms of our driver rendering too much geometry.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;That theory is out.&lt;/p&gt;
+&lt;p&gt;Let’s experiment more. Modifying the shader and seeing where it breaks, we find the only part of the shader contributing to the bug: the amount of data interpolated per vertex. Modern graphics APIs allow specifying “varying” data for each vertex, like the colour or the surface normal. Then, for each triangle the hardware renders, these “varyings” are interpolated across the triangle to provide smooth inputs to the fragment shader, allowing efficient implementation of common graphics techniques like Blinn-Phong shading.&lt;/p&gt;
+&lt;p&gt;Putting the pieces together, what matters is the &lt;em&gt;product&lt;/em&gt; of the number of vertices (geometry complexity) &lt;em&gt;times&lt;/em&gt; amount of data per vertex (“shading” complexity). That product is “total amount of per-vertex data”. The GPU faults if we use too much &lt;em&gt;total&lt;/em&gt; per-vertex data.&lt;/p&gt;
+&lt;p&gt;Why?&lt;/p&gt;
+&lt;p&gt;When the hardware processes each vertex, the vertex shader produces per-vertex data. That data has to &lt;em&gt;go&lt;/em&gt; somewhere. How this works depends on the hardware architecture. Let’s consider common GPU architectures.&lt;a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;Traditional &lt;strong&gt;immediate mode renderers&lt;/strong&gt; render directly into the framebuffer. They first run the vertex shader for each vertex of a triangle, then run the fragment shader for each pixel in the triangle. Per-vertex “varying” data is passed almost directly between the shaders, so immediate mode renderers are efficient for complex scenes.&lt;/p&gt;
+&lt;p&gt;There is a drawback: rendering directly into the framebuffer requires tremendous amounts of memory access to constantly write the results of the fragment shader and to read back results when blending. Immediate mode renderers are suited to discrete, power-hungry desktop GPUs with dedicated video RAM.&lt;/p&gt;
+&lt;p&gt;By contrast, &lt;strong&gt;tile-based deferred renderers&lt;/strong&gt; split rendering into two passes. First, the hardware runs all vertex shaders for the entire frame, not just for a single model. Then the framebuffer is divided into small tiles, and dedicated hardware called a &lt;em&gt;tiler&lt;/em&gt; determines which triangles are in each tile. Finally, for each tile, the hardware runs all relevant fragment shaders and writes the final blended result to memory.&lt;/p&gt;
+&lt;p&gt;Tilers reduce memory traffic required for the framebuffer. As the hardware renders a single tile at a time, it keeps a “cached” copy of that tile of the framebuffer (called the “tilebuffer”). The tilebuffer is small, just a few kilobytes, but tilebuffer access is &lt;em&gt;fast&lt;/em&gt;. Writing to the tilebuffer is cheap, and unlike immediate renderers, blending is almost free. Because main memory access is expensive and mobile GPUs can’t afford dedicated video memory, tilers are suited to mobile GPUs, like Arm’s Mali, Imagination’s PowerVR, and Apple’s AGX.&lt;/p&gt;
+&lt;p&gt;Yes, AGX is a &lt;em&gt;mobile&lt;/em&gt; GPU, designed for the iPhone. The M1 is a screaming fast desktop, but its unified memory and tiler GPU have roots in mobile phones. Tilers work well on the desktop, but there are some drawbacks.&lt;/p&gt;
+&lt;p&gt;First, at the start of a frame, the contents of the tilebuffer are undefined. If the application needs to preserve existing framebuffer contents, the driver needs to load the framebuffer from main memory and store it into the tilebuffer. This is expensive.&lt;/p&gt;
+&lt;p&gt;Second, because all vertex shaders are run before any fragment shaders, the hardware needs a buffer to store the outputs of all vertex shaders. In general, there is much more data required than space inside the GPU, so this buffer must be in main memory. This is also expensive.&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Ah-ha&lt;/strong&gt;. Because AGX is a tiler, it requires a buffer of &lt;em&gt;all&lt;/em&gt; per-vertex data. We fault when we use too &lt;em&gt;much&lt;/em&gt; total per-vertex data, overflowing the buffer.&lt;/p&gt;
+&lt;p&gt;…So how do we allocate a larger buffer?&lt;/p&gt;
+&lt;p&gt;On some tilers, like older versions of Arm’s Mali GPU, the userspace driver computes how large this “varyings” buffer should be and allocates it.&lt;a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; To fix the faults, we can try increasing the sizes of all buffers we allocate, in the hopes that one of them contains the per-vertex data.&lt;/p&gt;
+&lt;p&gt;No dice.&lt;/p&gt;
+&lt;p&gt;It’s prudent to observe what Apple’s Metal driver does. We can cook up a Metal program drawing variable amounts of geometry and trace all GPU memory allocations that Metal performs while running our program. Doing so, we learn that increasing the amount of geometry drawn does &lt;em&gt;not&lt;/em&gt; increase the sizes of any allocated buffers. In fact, it doesn’t change &lt;em&gt;anything&lt;/em&gt; in the command buffer submitted to the kernel, except for the single “number of vertices” field in the draw command.&lt;/p&gt;
+&lt;p&gt;We &lt;em&gt;know&lt;/em&gt; that buffer exists. If it’s not allocated by userspace – and by now it seems that it’s not – it must be allocated by the kernel or firmware.&lt;/p&gt;
+&lt;p&gt;Here’s a funny thought: maybe we don’t specify the size of the buffer at all. Maybe it’s &lt;em&gt;okay&lt;/em&gt; for it to overflow, and there’s a way to handle the overflow.&lt;/p&gt;
+&lt;p&gt;It’s time for a little reconnaissance. Digging through what little public documentation exists for AGX, we learn from one &lt;a href="https://developer.apple.com/videos/play/wwdc2020/10602/"&gt;WWDC presentation&lt;/a&gt;:&lt;/p&gt;
+&lt;blockquote&gt;
+&lt;p&gt;The Tiled Vertex Buffer stores the Tiling phase output, which includes the post-transform vertex data…&lt;/p&gt;
+&lt;p&gt;But it may cause a Partial Render if full. A Partial Render is when the GPU splits the render pass in order to flush the contents of that buffer.&lt;/p&gt;
+&lt;/blockquote&gt;
+&lt;p&gt;Bullseye. The buffer we’re chasing, the “tiled vertex buffer”, can overflow. To cope, the GPU stops accepting new geometry, renders the existing geometry, and restarts rendering.&lt;/p&gt;
+&lt;p&gt;Since partial renders hurt performance, Metal application developers need to know about them to optimize their applications. There should be &lt;em&gt;performance counters&lt;/em&gt; flagging this issue. Poking around, we find two:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Number of partial renders.&lt;/li&gt;
+&lt;li&gt;Number of bytes used of the parameter buffer.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;Wait, what’s a “parameter buffer”?&lt;/p&gt;
+&lt;p&gt;Remember the rumours that AGX is derived from PowerVR? The public PowerVR &lt;a href="https://docs.imgtec.com/Profiling_and_Optimisations/PerfRec/topics/c_PerfRec_parameter_buffer.html"&gt;optimization&lt;/a&gt; &lt;a href="https://github.com/powervr-graphics/WebGL_SDK/blob/4.0/Documentation/Architecture%20Guides/PowerVR.Performance%20Recommendations.pdf"&gt;guides&lt;/a&gt; explain:&lt;/p&gt;
+&lt;blockquote&gt;
+&lt;p&gt;[The] list containing pointers to each vertex passed in from the application… is called the &lt;strong&gt;parameter buffer&lt;/strong&gt; (PB) and is stored in system memory along with the vertex data.&lt;/p&gt;
+&lt;p&gt;Each varying requires additional space in the parameter buffer.&lt;/p&gt;
+&lt;/blockquote&gt;
+&lt;p&gt;The Tiled Vertex Buffer &lt;em&gt;is&lt;/em&gt; the Parameter Buffer. PB is the PowerVR name, TVB is the public Apple name, and PB is still an internal Apple name.&lt;/p&gt;
+&lt;p&gt;What happens when PowerVR overflows the parameter buffer?&lt;/p&gt;
+&lt;p&gt;An old &lt;a href="http://imgtec.eetrend.com/sites/imgtec.eetrend.com/files/download/201402/1458-2110-1385011428.pdf"&gt;PowerVR presentation&lt;/a&gt; says that when the parameter buffer is full, the “render is flushed”, meaning “flushed data must be retrieved from the frame buffer as successive tile renders are performed”. In other words, it performs a partial render.&lt;/p&gt;
+&lt;p&gt;Back to the Apple M1, it seems the hardware is failing to perform a partial render. Let’s revisit the broken render.&lt;/p&gt;
+&lt;figure&gt;
+&lt;img src="/PartialPhong.webp" alt="" /&gt;&lt;figcaption&gt;Partially rendered bunny, again&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;p&gt;Notice &lt;em&gt;parts&lt;/em&gt; of the model are correctly rendered. The parts that are not show only the black clear colour that the scene is filled with at the start. Let’s consider the logical order of events.&lt;/p&gt;
+&lt;p&gt;First, the hardware runs vertex shaders for the bunny until the parameter buffer overflows. This works: the partial geometry is correct.&lt;/p&gt;
+&lt;p&gt;Second, the hardware rasterizes the partial geometry and runs the fragment shaders. This works: the shading is correct.&lt;/p&gt;
+&lt;p&gt;Third, the hardware flushes the partial render to the framebuffer. This must work for us to see anything at all.&lt;/p&gt;
+&lt;p&gt;Fourth, the hardware runs vertex shaders for the rest of the bunny’s geometry. This ought to work: the configuration is identical to the original vertex shaders.&lt;/p&gt;
+&lt;p&gt;Fifth, the hardware rasterizes and shades the rest of the geometry, blending with the old partial render. Because AGX is a tiler, to preserve that existing partial render, the hardware needs to load it back into the tilebuffer. We have no idea how it does this.&lt;/p&gt;
+&lt;p&gt;Finally, the hardware flushes the render to the framebuffer. This should work as it did the first time.&lt;/p&gt;
+&lt;p&gt;The only problematic step is &lt;em&gt;loading the framebuffer back into the tilebuffer after a partial render&lt;/em&gt;. Usually, the driver supplies two “extra” fragment shaders. One clears the tilebuffer at the start, and the other flushes out the tilebuffer contents at the end.&lt;/p&gt;
+&lt;p&gt;If the application needs the existing framebuffer contents preserved, instead of writing a clear colour, the “load tilebuffer” program instead reads from the framebuffer to reload the contents. Handling this requires quite a bit of code, but it works in our driver.&lt;/p&gt;
+&lt;p&gt;Looking closer, AGX requires more auxiliary programs.&lt;/p&gt;
+&lt;p&gt;The “store” program is supplied &lt;em&gt;twice&lt;/em&gt;. I noticed this when initially bringing up the hardware, but the reason for the duplication was unclear. Omitting each copy separately and seeing what breaks, the reason becomes clear: one program flushes the &lt;em&gt;final&lt;/em&gt; render, and the other flushes a &lt;em&gt;partial render&lt;/em&gt;.&lt;a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;…What about the program that loads the framebuffer into the tilebuffer?&lt;/p&gt;
+&lt;p&gt;When a partial render is possible, there are two “load” programs. One writes the clear colour or loads the framebuffer, depending on the application setting. We understand this one. The other &lt;em&gt;always&lt;/em&gt; loads the framebuffer.&lt;/p&gt;
+&lt;p&gt;…Always loads the framebuffer, as in, for loading back with a partial render even if there is a clear at the start of the frame?&lt;/p&gt;
+&lt;p&gt;If this program is the issue, we can confirm easily. Metal must require it to draw the same bunny, so we can write a Metal application drawing the bunny and stomp over its GPU memory to replace this auxiliary load program with one always loading with black.&lt;/p&gt;
+&lt;figure&gt;
+&lt;img src="/Metal-Artefacts.webp" alt="" /&gt;&lt;figcaption&gt;Metal drawing the bunny, stomping over its memory.&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;p&gt;Doing so, Metal fails in a similar way. That means we’re at the root cause. Looking at our own driver code, we don’t specify &lt;em&gt;any&lt;/em&gt; program for this partial render load. Up until now, that’s worked okay. If the parameter buffer is never overflowed, this program is unused. As soon as a partial render is required, however, failing to provide this program means the GPU dereferences a null pointer and faults. That explains our GPU faults at the beginning.&lt;/p&gt;
+&lt;p&gt;Following Metal, we supply our own program to load back the tilebuffer after a partial render…&lt;/p&gt;
+&lt;figure&gt;
+&lt;img src="/BrokenDepth.webp" alt="" /&gt;&lt;figcaption&gt;Bunny with the fix&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;p&gt;…which does &lt;em&gt;not&lt;/em&gt; fix the rendering! Cursed, this GPU. The faults go away, but the render still isn’t quite right for the first few frames, indicating partial renders are still broken. Notice the weird artefacts on the feet.&lt;/p&gt;
+&lt;p&gt;Curiously, the render “repairs itself” after a few frames, suggesting the parameter buffer stops overflowing. This implies the parameter buffer can be resized (by the kernel or by the firmware), and the system is growing the parameter buffer after a few frames in response to overflow. This mechanism makes sense:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;The hardware can’t allocate more parameter buffer space itself.&lt;/li&gt;
+&lt;li&gt;Overflowing the parameter buffer is expensive, as partial renders require tremendous memory bandwidth.&lt;/li&gt;
+&lt;li&gt;Overallocating the parameter buffer wastes memory for applications rendering simple geometry.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;Starting the parameter buffer small and growing in response to overflow provides a balance, reducing the GPU’s memory footprint and minimizing partial renders.&lt;/p&gt;
+&lt;p&gt;Back to our misrendering. There are actually &lt;em&gt;two&lt;/em&gt; buffers being used by our program, a colour buffer (framebuffer)… and a depth buffer. The depth buffer isn’t directly visible, but facilitates the “depth test”, which discards far pixels that are occluded by other close pixels. While the partial render mechanism discards geometry, the depth test discards pixels.&lt;/p&gt;
+&lt;p&gt;That would explain the missing pixels on our bunny. The depth test is broken with partial renders. Why? The depth test depends on the depth buffer, so the depth buffer must &lt;em&gt;also&lt;/em&gt; be stored after a partial render and loaded back when resuming. Comparing a trace from our driver to a trace from Metal, looking for any relevant difference, we eventually stumble on the configuration required to make depth buffer flushes work.&lt;/p&gt;
+&lt;p&gt;And with that, we get our bunny.&lt;/p&gt;
+&lt;figure&gt;
+&lt;img src="/Final.webp" alt="" /&gt;&lt;figcaption&gt;The final Phong shaded bunny&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;section class="footnotes" role="doc-endnotes"&gt;
+&lt;hr /&gt;
+&lt;ol&gt;
+&lt;li id="fn1" role="doc-endnote"&gt;&lt;p&gt;These explanations are massive oversimplifications of how modern GPUs work, but they’re good enough for our purposes here.&lt;a href="#fnref1" class="footnote-back" role="doc-backlink"&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;li id="fn2" role="doc-endnote"&gt;&lt;p&gt;This is a worse idea than it sounds. Starting with the new Valhall architecture, Mali allocates varyings much more efficiently.&lt;a href="#fnref2" class="footnote-back" role="doc-backlink"&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;li id="fn3" role="doc-endnote"&gt;&lt;p&gt;Why the duplication? I have not yet observed Metal using different programs for each. However, for front buffer rendering, partial renders need to be flushed to a temporary buffer for this scheme to work. Of course, you may as well use double buffering at that point.&lt;a href="#fnref3" class="footnote-back" role="doc-backlink"&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;/ol&gt;
+&lt;/section&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/asahi-gpu-part-5.html</guid><pubDate>Fri, 13 May 2022 00:00:00 -0500</pubDate></item><item><title>Software freedom isn't about licenses -- it's about power.</title><link>https://rosenzweig.io/blog/software-freedom-isnt-about-licenses-its-about-power.html</link><description>&lt;p&gt;A restrictive end-user license agreement is one way a company can exert power over the user. When the free software movement was founded thirty years ago, these restrictive licenses were the primary user-hostile power dynamic, so permissive and copyleft licenses emerged as synonyms for software freedom. Licensing does matter; user autonomy is lost with subscription models, revocable licenses, binary-only software, and onerous legal clauses. Yet these issues pertinent to desktop software do not scratch the surface of today’s digital power dynamics.&lt;/p&gt;
+&lt;p&gt;Today, companies exert power over their users by: tracking, selling data, psychological manipulation, intrusive advertising, planned obsolescence, and hostile Digital “Rights” Management (DRM) software. These issues affect every digital user, technically inclined or otherwise, on desktops and smartphones alike.&lt;/p&gt;
+&lt;p&gt;The free software movement promised to right these wrongs via free licenses on the source code, with adherents arguing free licenses provide immunity to these forms of malware since users could modify the code. Unfortunately, most users lack the resources to do so. While the most egregious violations of user freedom come from companies publishing proprietary software, these ills can remain unchecked even in open source programs, and not all proprietary software exhibits these issues. The modern browser is nominally free software containing the trifecta of telemetry, advertisement, and DRM; a retro video game is proprietary software but relatively harmless.&lt;/p&gt;
+&lt;p&gt;As such, it’s not enough to look at the license. It’s not even enough to consider the license and a fixed set of issues endemic to proprietary software; the context matters. Software does not exist in a vacuum. Just as proprietary software tends to integrate with other proprietary software, free software tends to integrate with other free software. Software freedom &lt;em&gt;in context&lt;/em&gt; demands a gentle nudge towards software in user interests, rather than corporate interests.&lt;/p&gt;
+&lt;p&gt;How then should we conceptualize software freedom?&lt;/p&gt;
+&lt;p&gt;Consider the three adherents to free software and open source: hobbyists, corporations, and activists. Individual hobbyists care about tinkering with the software of their choice, emphasizing freely licensed source code. These concerns do not affect those who do not make a sport out of modifying code. There is &lt;em&gt;nothing wrong&lt;/em&gt; with this, but it will never be a household issue.&lt;/p&gt;
+&lt;p&gt;For their part, large corporations claim to love “open source”. No, they do not care about the social movement, only the cost reduction achieved by taking advantage of permissively licensed software. This corporate emphasis on licensing is often to the detriment of software freedom in the broader context. In fact, it is this irony that motivates software freedom beyond the license.&lt;/p&gt;
+&lt;p&gt;It is the activist whose ethos must apply to everyone regardless of technical ability or financial status. There is no shortage of open source software, often of corporate origin, but this is insufficient – it is the power dynamic we must fight.&lt;/p&gt;
+&lt;p&gt;We are not alone. Software freedom is intertwined with contemporary social issues, including copyright reform, privacy, sustainability, and Internet addiction. Each issue arises as a hostile power dynamic between a corporate software author and the user, with complicated interactions with software licensing. Disentangling each issue from licensing provides a framework to address nuanced questions of political reform in the digital era.&lt;/p&gt;
+&lt;p&gt;Copyright reform generalizes the licensing approaches of the free software and free culture movements. Indeed, free licenses empower us to freely use, adapt, remix, and share media and software alike. However, proprietary licenses micromanaging the core of human community and creativity are doomed to fail. Proprietary licenses have had little success preventing the proliferation of the creative works they seek to “protect”, and the rights to adapt and remix media have long been exercised by dedicated fans of proprietary media, producing volumes of fanfiction and fan art. The same observation applies to software: proprietary end-user license agreements have stopped neither file sharing nor reverse-engineering. In fact, a unique creative fandom around proprietary software has emerged in video game modding communities. Regardless of legal concerns, the human imagination and spirit of sharing persists. As such, we need not judge anyone for proprietary software and media in their life; rather, we must work towards copyright reform and free licensing to protect them from copyright overreach.&lt;/p&gt;
+&lt;p&gt;Privacy concerns are also traditional in software freedom discourse. True, secure communications software can never be proprietary, given the possibility of backdoors and impossibility of transparent audits. Unfortunately, the converse fails: there are freely licensed programs that inherently compromise user privacy. Consider third-party clients to centralized unencrypted chat systems. Although two users of such a client privately messaging one another are using only free software, if their messages are being data mined, there is still harm. The need for context is once more underscored.&lt;/p&gt;
+&lt;p&gt;Sustainability is an emergent concern, tying to software freedom via the electronic waste crisis. In the mobile space, where deprecating smartphones after a few short years is the norm and lithium batteries are hanging around in landfills indefinitely, we see the paradox of a freely licensed operating system with an abysmal social track record. A curious implication is the need for free device drivers. Where proprietary drivers force devices into obsolescence shortly after the vendor abandons them in favour of a new product, free drivers enable long-term maintenance. As before, licensing is not enough; the code must also be upstreamed and mainlined. Simply throwing source code over a wall is insufficient to resolve electronic waste, but it is a prerequisite. At risk is the right of a device owner to continue use of a device they have already purchased, even after the manufacturer no longer wishes to support it. This right is desired by climate activists and the dollar-conscious alike; we cannot allow software to override it.&lt;/p&gt;
+&lt;p&gt;Beyond copyright, privacy, and sustainability concerns, no software can be truly “free” if the technology itself shackles us, dumbing us down and driving us to outrage for clicks. Thanks to television culture spilling onto the Internet, the typical citizen has less to fear from government wiretaps than from themselves. For every encrypted message broken by an intelligence agency, thousands of messages are willingly broadcast to the public, seeking instant gratification. Why should a corporation or a government bother snooping into our private lives, if we present them on a silver platter? Indeed, popular open source implementations of corrupt technology do not constitute success, an issue epitomized by free software responses to social media. No, even without proprietary software, centralization, or cruel psychological manipulation, the proliferation of social media still endangers society.&lt;/p&gt;
+&lt;p&gt;Overall, focusing on concrete software freedom issues provides room for nuance, rather than the traditional binary view. End-users may make more informed decisions, with awareness of technologies’ trade-offs beyond the license. Software developers gain a framework to understand how their software fits into the bigger picture, as a free license is necessary but not sufficient for guaranteeing software freedom today. Activists can divide-and-conquer.&lt;/p&gt;
+&lt;p&gt;Many outside of our immediate sphere understand and care about these issues; long-term success requires these allies. Claims of moral superiority by licenses are unfounded and foolish; there is no success backstabbing our friends. Instead, a nuanced approach broadens our reach. While abstract moral philosophies may be intellectually valid, they are inaccessible to all but academics and the most dedicated supporters. Abstractions are perpetually on the political fringe, but these concrete issues are already understood by the general public. Furthermore, we cannot limit ourselves to technical audiences; understanding network topology cannot be a prerequisite to private conversations. Overemphasizing the role of source code and under-emphasizing the power dynamics at play is a doomed strategy; for decades we have tried and failed. In a post-Snowden world, there is too much at stake for more failures. Reforming the specific issues paves the way to software freedom. After all, social change is harder than writing code, but with incremental social reform, licenses become the easy part.&lt;/p&gt;
+&lt;p&gt;The nuanced analysis even helps individual software freedom activists. Purist attempts to refuse non-free technology categorically are laudable, but outside a closed community, going against the grain leads to activist burnout. During the day, employers and schools invariably mandate proprietary software, sometimes used to facilitate surveillance. At night, popular hobbies and social connections are mediated by questionable software, from the DRM in a video game to the surveillance of a chat with a group of friends. Cutting ties with friends and abandoning self-care as a &lt;em&gt;prerequisite&lt;/em&gt; to fighting powerful organizations seems noble, but is futile. Even without politics, there remain technical challenges to using only free software. Layering in other concerns, or perhaps forgoing a smartphone, only amplifies the risk of software freedom burnout.&lt;/p&gt;
+&lt;p&gt;As an application, this approach to software freedom brings to light disparate issues with the modern web raising alarm in the free software community. The traditional issue is proprietary JavaScript, a licensing question, yet considering only JavaScript licensing prompts both imprecise and inaccurate conclusions about web “applications”. Deeper issues include rampant advertising and tracking; the Internet is the largest surveillance network in human history, largely for commercial aims. To some degree, these issues are mitigated by script, advertisement, and tracker blockers; these may be pre-installed in a web browser for harm reduction in pursuit of a gentler web. However, the web’s fatal flaw is yet more fundamental. By design, when a user navigates to a URL, their browser executes &lt;em&gt;whatever&lt;/em&gt; code is piped on the wire. Effectively, the web implies a forced auto-update, regardless of the license of the code. Even if the code is benign, it grows more expensive to run every year, forcing a hardware upgrade cycle that deprecates old hardware which would work if only the web weren’t bloated by corporate interests. A subtler point is the “attention economy” tied into the web. While it’s hard to become addicted to reading in a text-only browser, binge-watching DRM-encumbered television is a different story. Half-hearted advances like “Reading Mode” are limited by the ironic distribution of documents over an app store. On the web, disparate issues of DRM, forced auto-update, privacy, sustainability, and psychological dark patterns converge to a single worst-case scenario for software freedom. The licenses were only the beginning.&lt;/p&gt;
+&lt;p&gt;Nevertheless, there is cause for optimism. Framed appropriately, the fight for software freedom &lt;em&gt;is&lt;/em&gt; winnable. To fight for software freedom, fight for privacy. Fight for copyright reform. Fight for sustainability. Resist psychological dark patterns. At the heart of each is a software freedom battle – keep fighting and we can win.&lt;/p&gt;
+&lt;h2 id="see-also"&gt;See also&lt;/h2&gt;
+&lt;p&gt;&lt;a href="https://techautonomy.org/"&gt;Declaration of Digital Autonomy&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;&lt;a href="https://www.inkandswitch.com/local-first.html"&gt;Local-first software: You own your data, in spite of the cloud&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;&lt;a href="https://www.gnu.org/philosophy/wwworst-app-store.html"&gt;The WWWorst App Store&lt;/a&gt;&lt;/p&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/software-freedom-isnt-about-licenses-its-about-power.html</guid><pubDate>Sun, 28 Mar 2021 00:00:00 -0500</pubDate></item><item><title>Fun and Games with Exposure Notifications</title><link>https://rosenzweig.io/blog/fun-and-games-with-exposure-notifications.html</link><description>&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Exposure_Notification"&gt;&lt;em&gt;Exposure Notifications&lt;/em&gt;&lt;/a&gt; is a protocol developed by Apple and Google for facilitating COVID-19 contact tracing on &lt;em&gt;mobile phones&lt;/em&gt; by exchanging codes with nearby phones over &lt;a href="https://en.wikipedia.org/wiki/Bluetooth"&gt;Bluetooth&lt;/a&gt;, implemented within the Android and iOS operating systems, now available here in Toronto.&lt;/p&gt;
+&lt;p&gt;Wait – phones? Android and iOS only? Can’t my &lt;a href="https://debian.org"&gt;Debian&lt;/a&gt; laptop participate? It has a recent Bluetooth chip. What about phones running GNU/Linux distributions like the &lt;a href="https://en.wikipedia.org/wiki/PinePhone"&gt;PinePhone&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Librem_5"&gt;Librem 5&lt;/a&gt;?&lt;/p&gt;
+&lt;p&gt;Exposure Notifications breaks down neatly into three sections: a Bluetooth layer, some cryptography, and integration with local public health authorities. Linux is up to the task, via &lt;a href="http://www.bluez.org/"&gt;BlueZ&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/OpenSSL"&gt;OpenSSL&lt;/a&gt;, and some &lt;a href="https://en.wikipedia.org/wiki/Python_(programming_language)"&gt;Python&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;Given my background, will this build to be a &lt;a href="/blog/my-name-is-cafe-beverage.html"&gt;reverse-engineering epic&lt;/a&gt; resulting in a novel open stack for a closed system?&lt;/p&gt;
+&lt;p&gt;…&lt;/p&gt;
+&lt;p&gt;Not at all. The specifications for the Exposure Notifications are available for both the &lt;a href="https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ExposureNotification-BluetoothSpecificationv1.2.pdf?1"&gt;Bluetooth protocol&lt;/a&gt; and the &lt;a href="https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ExposureNotification-CryptographySpecificationv1.2.pdf?1"&gt;underlying cryptography&lt;/a&gt;. A &lt;a href="https://github.com/google/exposure-notifications-internals"&gt;partial reference implementation&lt;/a&gt; is available for Android, as is an independent Android implementation in &lt;a href="https://github.com/microg/android_packages_apps_GmsCore"&gt;microG&lt;/a&gt;. In Canada, the key servers run an &lt;a href="https://github.com/cds-snc/covid-alert-server"&gt;open source stack&lt;/a&gt; originally built by Shopify and now maintained by the &lt;a href="https://digital.canada.ca/"&gt;Canadian Digital Service&lt;/a&gt;, including open &lt;a href="https://github.com/cds-snc/covid-alert-server/tree/master/proto"&gt;protocol documentation&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;All in all, this is looking to be a smooth-sailing weekend&lt;a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; project.&lt;/p&gt;
+&lt;p&gt;The devil’s in the details.&lt;/p&gt;
+&lt;h2 id="bluetooth"&gt;Bluetooth&lt;/h2&gt;
+&lt;p&gt;Exposure Notifications operates via Bluetooth Low Energy “advertisements”. Scanning for other devices is as simple as scanning for advertisements, and broadcasting is as simple as advertising ourselves.&lt;/p&gt;
+&lt;p&gt;On an Android phone, this is handled deep within Google Play Services. Can we drive the protocol from userspace on a regular GNU/Linux laptop? It depends. Not all laptops support Bluetooth, not all Bluetooth implementations support Bluetooth Low Energy, and I hear not all Bluetooth Low Energy implementations properly support undirected transmissions (“advertising”).&lt;/p&gt;
+&lt;p&gt;Luckily in my case, I develop on a Debianized Chromebook with a Wi-Fi/Bluetooth module. I’ve never used the Bluetooth, but it turns out the module has full support for advertisements, verified with the &lt;code&gt;lescan&lt;/code&gt; (&lt;strong&gt;L&lt;/strong&gt;ow &lt;strong&gt;E&lt;/strong&gt;nergy &lt;strong&gt;Scan&lt;/strong&gt;) command of the &lt;code&gt;hcitool&lt;/code&gt; Bluetooth utility.&lt;/p&gt;
+&lt;p&gt;&lt;code&gt;hcitool&lt;/code&gt; is a part of BlueZ, the standard Linux library for Bluetooth. Since &lt;code&gt;lescan&lt;/code&gt; is able to detect nearby phones running Exposure Notifications, poring over its source code is a good first step to our implementation. With some minor changes to &lt;code&gt;hcitool&lt;/code&gt; to dump packets as raw hex and to filter for the Exposure Notifications protocol, we can print all nearby Exposure Notifications advertisements. So far, so good.&lt;/p&gt;
+&lt;p&gt;That’s about where the good ends.&lt;/p&gt;
+&lt;p&gt;While scanning is simple with reference code in &lt;code&gt;hcitool&lt;/code&gt;, advertising is complicated by BlueZ’s lack of an interface at the time of writing. While a general “enable advertising” routine exists, routines to set advertising parameters and data per the Exposure Notifications specification are unavailable. This is not a showstopper, since BlueZ is itself an open source userspace library. We can drive the Bluetooth module the same way BlueZ does internally, filling in the necessary gaps in the API, while continuing to use BlueZ for the heavy-lifting.&lt;/p&gt;
+&lt;p&gt;Some care is needed to multiplex scanning and advertising within a single thread while remaining power efficient. The key is that advertising, once configured, is handled entirely in hardware without CPU intervention. On the other hand, scanning does require CPU involvement, but it is &lt;em&gt;not&lt;/em&gt; necessary to scan continuously. Since COVID-19 is thought to transmit from &lt;em&gt;sustained&lt;/em&gt; exposure, we only need to scan every few minutes. (Food for thought: how does this connect to the sampling theorem?)&lt;/p&gt;
+&lt;p&gt;Thus we can order our operations as:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Configure advertising&lt;/li&gt;
+&lt;li&gt;Scan for devices&lt;/li&gt;
+&lt;li&gt;Wait for several minutes&lt;/li&gt;
+&lt;li&gt;Repeat.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;Since most of the time the program is asleep, this loop is efficient. It additionally allows us to reconfigure advertising every ten to fifteen minutes, in order to change the Bluetooth address to prevent tracking.&lt;/p&gt;
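+&lt;p&gt;The ordering above can be sketched in a few lines of Python. This is only an illustration of the loop, not the C code described here: the &lt;code&gt;configure_advertising&lt;/code&gt; and &lt;code&gt;scan&lt;/code&gt; callbacks are hypothetical stand-ins for the Bluetooth layer, and the five-minute scan period and ten-minute rotation period are assumptions picked for the example.&lt;/p&gt;

```python
import time

SCAN_PERIOD_S = 5 * 60      # assumed: scan once every five minutes
ROTATE_PERIOD_S = 10 * 60   # assumed: rotate address and payload every ten minutes


def run_loop(configure_advertising, scan, iterations,
             sleep=time.sleep, clock=time.monotonic):
    """Run the advertise/scan/sleep cycle a fixed number of times.

    configure_advertising and scan are hypothetical callbacks standing in
    for the Bluetooth layer; sleep and clock are injectable for testing.
    """
    last_rotation = None
    for _ in range(iterations):
        now = clock()
        # Advertising runs in hardware once configured, so it is only
        # touched when the address and payload are due to change.
        if last_rotation is None or now - last_rotation >= ROTATE_PERIOD_S:
            configure_advertising()
            last_rotation = now
        scan()                 # brief, CPU-driven scan for nearby devices
        sleep(SCAN_PERIOD_S)   # the process is asleep most of the time
```

+&lt;p&gt;Passing a fake &lt;code&gt;sleep&lt;/code&gt; and &lt;code&gt;clock&lt;/code&gt; makes the schedule checkable without waiting for real minutes to pass.&lt;/p&gt;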
+&lt;p&gt;All of the above amounts to a few hundred lines of C code, treating the Exposure Notifications packets themselves as opaque random data.&lt;/p&gt;
+&lt;h2 id="cryptography"&gt;Cryptography&lt;/h2&gt;
+&lt;p&gt;Yet the data is far from random; it is the result of a series of operations in terms of secret keys defined by the Exposure Notifications cryptography specification. Every day, a “temporary exposure key” is generated, from which a “rolling proximity identifier key” and an “associated encrypted metadata key” are derived. These are used to generate a “rolling proximity identifier” and the “associated encrypted metadata”, which are advertised over Bluetooth and changed in lockstep with the Bluetooth random addresses.&lt;/p&gt;
+&lt;p&gt;There are lots of moving parts to get right, but each derivation reuses a common cryptographic primitive: HKDF-SHA256 for key derivation, AES-128 for the rolling proximity identifier, and AES-128-CTR for the associated encrypted metadata. Ideally, we would grab a state-of-the-art library of cryptography primitives like &lt;a href="https://nacl.cr.yp.to/"&gt;&lt;code&gt;NaCl&lt;/code&gt;&lt;/a&gt; or &lt;a href="https://doc.libsodium.org/"&gt;&lt;code&gt;libsodium&lt;/code&gt;&lt;/a&gt; and wire everything up.&lt;/p&gt;
+&lt;p&gt;First, some good news: once these routines are written, we can reliably unit test them. Though the specification states that “test vectors… are available upon request”, it isn’t clear &lt;em&gt;who&lt;/em&gt; to request from. But Google’s reference implementation is itself unit-tested, and sure enough, it contains a &lt;a href="https://github.com/google/exposure-notifications-internals/blob/a0394e69c51aa118f5000b8a2c2f15f1f9aedb7d/app/src/androidTest/java/com/google/samples/exposurenotification/testing/TestVectors.java"&gt;&lt;code&gt;TestVectors.java&lt;/code&gt;&lt;/a&gt; file, from which we can grab the vectors for a complete set of unit tests.&lt;/p&gt;
+&lt;p&gt;After patting ourselves on the back for writing unit tests, we’ll need to pick a library to implement the cryptography. Suppose we try &lt;code&gt;NaCl&lt;/code&gt; first. We’ll quickly realize the primitives we need are missing, so we move on to &lt;code&gt;libsodium&lt;/code&gt;, which is backwards-compatible with NaCl. For a moment, this will work – &lt;code&gt;libsodium&lt;/code&gt; has upstream support for HKDF-SHA256. Unfortunately, the version of &lt;code&gt;libsodium&lt;/code&gt; shipping in Debian testing is too old for HKDF-SHA256. Not a big problem – we can backport the implementation, written in terms of the underlying HMAC-SHA256 operations, and move on to the AES.&lt;/p&gt;
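+&lt;p&gt;To show how little is involved in writing HKDF-SHA256 on top of HMAC-SHA256, here is a minimal Python sketch following RFC 5869. It is not the &lt;code&gt;libsodium&lt;/code&gt; code. The &lt;code&gt;derive_keys&lt;/code&gt; helper is hypothetical, and its &lt;code&gt;EN-RPIK&lt;/code&gt; and &lt;code&gt;EN-AEMK&lt;/code&gt; info strings and 16-byte output lengths reflect my reading of the cryptography specification, so treat them as assumptions to check against the spec.&lt;/p&gt;

```python
import hashlib
import hmac


def hkdf_sha256(key, salt, info, length):
    """HKDF (RFC 5869) with SHA-256, built from HMAC-SHA256 alone."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    # Extract: an absent salt defaults to hash_len zero bytes.
    prk = hmac.new(salt or bytes(hash_len), key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_keys(tek):
    """Hypothetical helper: derive the two daily keys from an exposure key."""
    rpik = hkdf_sha256(tek, None, b"EN-RPIK", 16)  # assumed info string
    aemk = hkdf_sha256(tek, None, b"EN-AEMK", 16)  # assumed info string
    return rpik, aemk
```

+&lt;p&gt;Any real use should be checked against the test vectors mentioned above before the output is trusted.&lt;/p&gt;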
+&lt;p&gt;AES is a standard symmetric cipher, so &lt;code&gt;libsodium&lt;/code&gt; has excellent support… for some modes. However standard, AES is not &lt;em&gt;one&lt;/em&gt; cipher; it is a family of ciphers with different key lengths and operating modes, with dramatically different security properties. “AES-128-CTR” in the Exposure Notifications specification is clearly 128-bit AES in CTR (&lt;strong&gt;C&lt;/strong&gt;oun&lt;strong&gt;t&lt;/strong&gt;e&lt;strong&gt;r&lt;/strong&gt;) mode, but what about “AES-128” alone, stated to operate on a “single AES-128 block”?&lt;/p&gt;
+&lt;p&gt;The mode implicitly specified is known as ECB (&lt;strong&gt;E&lt;/strong&gt;lectronic &lt;strong&gt;C&lt;/strong&gt;ode&lt;strong&gt;b&lt;/strong&gt;ook) mode and is known to have fatal security flaws in most applications. Because AES-ECB is generally insecure, &lt;code&gt;libsodium&lt;/code&gt; does not have any support for this cipher mode. Great, now we have &lt;em&gt;two&lt;/em&gt; problems – we have to rewrite our cryptography code against a new library, and we have to consider if there is a vulnerability in Exposure Notifications.&lt;/p&gt;
+&lt;p&gt;ECB’s crucial flaw is that for a given key, identical plaintext will always yield identical ciphertext, regardless of position in the stream. Since AES is block-based, this means identical blocks yield identical ciphertext, leading to &lt;a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#ECB"&gt;trivial cryptanalysis&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;In Exposure Notifications, ECB mode is used only to derive rolling proximity identifiers from the rolling proximity identifier key and the timestamp, by the equation:&lt;/p&gt;
+&lt;pre&gt;&lt;code&gt;RPI_ij = AES_128_ECB(RPIK_i, PaddedData_j)&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;…where &lt;code&gt;PaddedData&lt;/code&gt; is a function of the quantized timestamp. Thus the issue is avoided, as every plaintext will be unique (since timestamps are monotonically increasing, unless you’re trying to contact trace &lt;em&gt;Back to the Future&lt;/em&gt;).&lt;/p&gt;
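+&lt;p&gt;The plaintext block is small enough to sketch in Python. Assuming my reading of the cryptography specification is right, &lt;code&gt;PaddedData&lt;/code&gt; is the ASCII string &lt;code&gt;EN-RPI&lt;/code&gt;, six zero bytes, and the interval number as a little-endian 32-bit integer, where the interval number is Unix time divided into ten-minute windows. The function names are made up for illustration, and the AES step is left out because it needs a third-party library.&lt;/p&gt;

```python
import struct

INTERVAL_S = 600  # assumed: timestamps are quantized into ten-minute windows


def interval_number(unix_time):
    """Quantize a Unix timestamp into a ten-minute interval number."""
    return int(unix_time) // INTERVAL_S


def padded_data(enin):
    """Build the single 16-byte block that would be encrypted into an identifier.

    Assumed layout: b"EN-RPI", six zero bytes, little-endian uint32 interval.
    """
    return b"EN-RPI" + bytes(6) + struct.pack("<I", enin)
```

+&lt;p&gt;Two different intervals always give two different blocks, which is the property that sidesteps the usual ECB weakness.&lt;/p&gt;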
+&lt;p&gt;Nevertheless, &lt;code&gt;libsodium&lt;/code&gt; doesn’t know that, so we’ll need to resort to a ubiquitous cryptography library that doesn’t, uh, take security quite so seriously…&lt;/p&gt;
+&lt;p&gt;I’ll leave &lt;a href="https://en.wikipedia.org/wiki/Heartbleed"&gt;the implications&lt;/a&gt; up to your imagination.&lt;/p&gt;
+&lt;h2 id="database"&gt;Database&lt;/h2&gt;
+&lt;p&gt;While the Bluetooth and cryptography sections are governed by upstream specifications, making sense of the data requires tracking a significant amount of state. At &lt;em&gt;minimum&lt;/em&gt;, we must:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Record received packets (the Rolling Proximity Identifier and the Associated Encrypted Metadata).&lt;/li&gt;
+&lt;li&gt;Query received packets for diagnosed identifiers.&lt;/li&gt;
+&lt;li&gt;Record our Temporary Exposure Keys.&lt;/li&gt;
+&lt;li&gt;Query our keys to upload if we are diagnosed.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;If we were so inclined, we could handwrite all the serialization and concurrency logic and hope we don’t have a bug that results in COVID-19 mayhem.&lt;/p&gt;
+&lt;p&gt;A better idea is to grab &lt;a href="https://sqlite.org/index.html"&gt;SQLite&lt;/a&gt;, perhaps &lt;a href="https://sqlite.org/mostdeployed.html"&gt;the most deployed software in the world&lt;/a&gt;, and express these actions as SQL queries. The database persists to disk, and we can even express natural unit tests with a synthetic in-memory database.&lt;/p&gt;
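+&lt;p&gt;As a rough illustration of the four operations listed above, here is a Python sketch using the standard library’s &lt;code&gt;sqlite3&lt;/code&gt; module and an in-memory database. The table names, column names, and helper functions are all hypothetical; the schema of the real daemon may look quite different.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema: one table for packets we heard, one for our own keys.
SCHEMA = """
CREATE TABLE IF NOT EXISTS received (
    rpi BLOB NOT NULL, aem BLOB NOT NULL, seen_at INTEGER NOT NULL);
CREATE TABLE IF NOT EXISTS own_keys (
    tek BLOB NOT NULL, interval_start INTEGER NOT NULL);
"""


def open_db(path=":memory:"):
    """Open (or create) the database; the default is a throwaway in-memory one."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def record_received(db, rpi, aem, seen_at):
    db.execute("INSERT INTO received VALUES (?, ?, ?)", (rpi, aem, seen_at))


def record_own_key(db, tek, interval_start):
    db.execute("INSERT INTO own_keys VALUES (?, ?)", (tek, interval_start))


def find_exposures(db, diagnosed_rpis):
    """Return (rpi, seen_at) rows whose identifier appears in diagnosed_rpis."""
    hits = []
    for rpi in diagnosed_rpis:
        hits += db.execute(
            "SELECT rpi, seen_at FROM received WHERE rpi = ?", (rpi,)).fetchall()
    return hits


def keys_to_upload(db, since_interval):
    """Return our own keys from since_interval onwards, oldest first."""
    rows = db.execute(
        "SELECT tek, interval_start FROM own_keys "
        "WHERE interval_start >= ? ORDER BY interval_start", (since_interval,))
    return rows.fetchall()
```

+&lt;p&gt;Because &lt;code&gt;open_db()&lt;/code&gt; defaults to &lt;code&gt;:memory:&lt;/code&gt;, each unit test can start from an empty synthetic database.&lt;/p&gt;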
+&lt;p&gt;With this infrastructure, we’re now done with the primary daemon, recording Exposure Notification identifiers to the database and broadcasting our own identifiers. That’s not interesting if we never &lt;em&gt;do&lt;/em&gt; anything with that data, though. Onwards!&lt;/p&gt;
+&lt;h2 id="key-retrieval"&gt;Key retrieval&lt;/h2&gt;
+&lt;p&gt;Once per day, Exposure Notifications implementations are expected to query the server for Temporary Exposure Keys associated with diagnosed COVID-19 cases. From these keys, the cryptography implementation can reconstruct the associated Rolling Proximity Identifiers, for which we can query the database to detect if we have been exposed.&lt;/p&gt;
+&lt;p&gt;Per Google’s &lt;a href="https://developers.google.com/android/exposure-notifications/exposure-key-file-format"&gt;documentation&lt;/a&gt;, the servers are expected to return a &lt;code&gt;zip&lt;/code&gt; file containing two files:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;code&gt;export.bin&lt;/code&gt;: a container serialized as &lt;a href="https://en.wikipedia.org/wiki/Protocol_Buffers"&gt;Protocol Buffers&lt;/a&gt; containing Diagnosis Keys&lt;/li&gt;
+&lt;li&gt;&lt;code&gt;export.sig&lt;/code&gt;: a signature for the export with the public health agency’s key&lt;/li&gt;
+&lt;/ul&gt;
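+&lt;p&gt;Unpacking such an archive takes only the Python standard library. The sketch below assumes the archive has already been downloaded into memory; &lt;code&gt;read_export&lt;/code&gt; is a hypothetical helper. The 16-byte header it splits off is my understanding of Google’s file format documentation and should be verified there. Decoding the payload itself needs the protocol buffer definitions and is not shown.&lt;/p&gt;

```python
import io
import zipfile

HEADER_LEN = 16  # assumed: fixed-width header preceding the protobuf payload


def read_export(zip_bytes):
    """Split a downloaded export archive into (header, payload, signature)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        export = archive.read("export.bin")
        signature = archive.read("export.sig")
    header, payload = export[:HEADER_LEN], export[HEADER_LEN:]
    return header, payload, signature
```

+&lt;p&gt;The payload and signature are returned as raw bytes, so a caller can parse or ignore each one independently.&lt;/p&gt;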
+&lt;p&gt;The signature is not terribly interesting to us. On Android, it appears the system pins the public keys of recognized public health agencies as an integrity check for the received file. However, this public key is given directly to Google; we don’t appear to have an easy way to access it.&lt;/p&gt;
+&lt;p&gt;Does it matter? For our purposes, it’s unlikely. The Canadian key retrieval server is already transport-encrypted via HTTPS, so tampering with the data would already require compromising a certificate authority in addition to intercepting the requests to &lt;a href="https://canada.ca" class="uri"&gt;https://canada.ca&lt;/a&gt;. Broadly speaking, that limits attackers to nation-states, and since Canada has no reason to attack its own infrastructure, that limits our threat model to foreign nation-states. International intelligence agencies probably have better uses of resources than getting people to take extra COVID tests.&lt;/p&gt;
+&lt;p&gt;It’s worth noting other countries’ implementations could serve this zip file over plaintext HTTP, in which case this signature check becomes important.&lt;/p&gt;
+&lt;p&gt;Focusing then on &lt;code&gt;export.bin&lt;/code&gt;, we may import the relevant protocol buffer definitions to extract the keys for matching against our database. Since this requires only read-only access to the database and executes infrequently, we can safely perform this work from a separate process written in a higher-level language like Python, interfacing with the cryptography routines over the Python &lt;a href="https://docs.python.org/3/library/ctypes.html"&gt;foreign function interface &lt;code&gt;ctypes&lt;/code&gt;&lt;/a&gt;. Extraction is easy with the Python protocol buffers implementation, and downloading should be as easy as a &lt;code&gt;GET&lt;/code&gt; request with the standard library’s &lt;code&gt;urllib&lt;/code&gt;, right?&lt;/p&gt;
+&lt;p&gt;Here we hit a gotcha: the retrieval endpoint is guarded behind an &lt;a href="https://en.wikipedia.org/wiki/HMAC"&gt;HMAC&lt;/a&gt;, requiring authentication to download the &lt;code&gt;zip&lt;/code&gt;. The protocol documentation states:&lt;/p&gt;
+&lt;blockquote&gt;
+&lt;p&gt;Of course there’s no reliable way to truly authenticate these requests in an environment where millions of devices have immediate access to them upon downloading an Application: this scheme is purely to make it much more difficult to casually scrape these keys.&lt;/p&gt;
+&lt;/blockquote&gt;
+&lt;p&gt;Ah, security by obscurity. Calculating the HMAC itself is simple given the documentation, but it requires a “secret” HMAC key specific to the server. As the documentation is aware, this key is hardly secret, but it’s not available on the Canadian Digital Service’s &lt;a href="https://github.com/cds-snc/covid-alert-app"&gt;official repositories&lt;/a&gt;. Interoperating with the upstream servers would require some “extra” tricks.&lt;/p&gt;
+&lt;p&gt;Out of purely academic interest, we can write and debug our implementation without any such authorization by running our own sandbox server. Minus the configuration, the server source is available, so after spinning up a virtual machine and fighting with Go versioning, we can test our Python script.&lt;/p&gt;
+&lt;p&gt;Speaking of a personal sandbox…&lt;/p&gt;
+&lt;h2 id="key-upload"&gt;Key upload&lt;/h2&gt;
+&lt;p&gt;There is one essential edge case to the contact tracing implementation, one that we &lt;em&gt;can’t&lt;/em&gt; test against the Canadian servers. And edge cases matter. In effect, the entire Exposure Notifications infrastructure is designed for the edge cases. If you don’t care about edge cases, you don’t care about digital contact tracing (so please, stay at home).&lt;/p&gt;
+&lt;p&gt;The key feature – and key edge case – is uploading Temporary Exposure Keys to the Canadian key server in case of a COVID-19 diagnosis. This upload requires an alphanumeric code generated by a healthcare provider upon diagnosis, so if we used the shared servers, we couldn’t test an implementation. With our sandbox, we can generate as many alphanumeric codes as we’d like.&lt;/p&gt;
+&lt;p&gt;Once sandboxed, there isn’t much to the implementation itself: the keys are snarfed out of the SQLite database, we handshake with the server over protocol buffers marshaled over POST requests, and we throw in some public-key cryptography via the Python bindings to &lt;code&gt;libsodium&lt;/code&gt;.&lt;/p&gt;
+&lt;p&gt;This functionality neatly fits into a second dedicated Python script which does &lt;em&gt;not&lt;/em&gt; interface with the main library. It’s exposed as a command line interface with flow resembling that of the mobile application, adhering reasonably to the UNIX philosophy. Admittedly I’m not sure wrestling with the command line is top on the priority list of a Linux hacker ill with COVID-19. Regardless, the interface is suitable for higher-level (graphical) abstractions.&lt;/p&gt;
+&lt;p&gt;Problem solved, but of course there’s a gotcha: if the request is malformed, an error should be generated as a key robustness feature. Unfortunately, while developing the script against my sandbox, a bug led the request to be dropped unexpectedly, rather than returning with an error message. On the server implemented in &lt;a href="https://en.wikipedia.org/wiki/Go_(programming_language)"&gt;Go&lt;/a&gt;, there was an apparent &lt;code&gt;nil&lt;/code&gt; dereference. Oops. Fixing this isn’t necessary for this project, but it’s still a bug, even if it requires a COVID-19 diagnosis to trigger. So I went and did the Canadian thing and sent a pull request.&lt;/p&gt;
+&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
+&lt;p&gt;All in all, we end up with a Linux implementation of Exposure Notifications functional in Ontario, Canada. What’s next? Perhaps supporting contact tracing systems elsewhere in the world – patches welcome. Closer to home, while the implementation is functional, its aesthetics are not (yet) anything to write home about – perhaps we could write a touch-based interface for mobile Linux environments like &lt;a href="https://en.wikipedia.org/wiki/KDE_Plasma_5#Plasma_Mobile"&gt;Plasma Mobile&lt;/a&gt; and &lt;a href="https://developer.puri.sm/Librem5/Software_Reference/Environments/Phosh.html"&gt;Phosh&lt;/a&gt;, maybe even running it on an Android flagship flashed with &lt;a href="https://en.wikipedia.org/wiki/PostmarketOS"&gt;postmarketOS&lt;/a&gt; to go full circle.&lt;/p&gt;
+&lt;p&gt;&lt;a href="https://gitlab.freedesktop.org/alyssa/liben"&gt;Source code for &lt;code&gt;liben&lt;/code&gt; is available&lt;/a&gt; for anyone who dares go near. Compiling from source is straightforward but necessary at the time of writing. As for packaging?&lt;/p&gt;
+&lt;p&gt;Here’s hoping COVID-19 contact tracing will be obsolete by the time &lt;code&gt;liben&lt;/code&gt; hits Debian stable.&lt;/p&gt;
+&lt;section class="footnotes" role="doc-endnotes"&gt;
+&lt;hr /&gt;
+&lt;ol&gt;
+&lt;li id="fn1" role="doc-endnote"&gt;&lt;p&gt;Today (Monday) is Labour Day, so this is a 3-day weekend. But I started on Saturday and posted this today, so it &lt;em&gt;technically&lt;/em&gt; counts.&lt;a href="#fnref1" class="footnote-back" role="doc-backlink"&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
+&lt;/ol&gt;
+&lt;/section&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/fun-and-games-with-exposure-notifications.html</guid><pubDate>Mon, 07 Sep 2020 00:00:00 -0500</pubDate></item><item><title>The Federation Fallacy</title><link>https://rosenzweig.io/blog/the-federation-fallacy.html</link><description>&lt;p&gt;Throughout the free software community, an unbridled aura of justified mistrust fills the air: mistrust of large corporations, mistrust of governments, and of course, mistrust of proprietary software. Each mistrust is connected by a critical thread: centralisation.&lt;/p&gt;
+&lt;p&gt;Thus, permeating the community are calls for decentralisation. To attack the information silos, corporate conglomerates, and governmental surveillance, decentralisation calls for &lt;em&gt;individuals&lt;/em&gt; to host servers for their own computing, rather than defaulting to the servers of those rich in data.&lt;/p&gt;
+&lt;p&gt;In the decentralised dream, every user hosts their own server. Every toddler and grandmother is required to become their own system administrator. This dream is an accessibility nightmare, for if advanced technical skills are the price to privacy, all but the technocratic elite are walled off from freedom.&lt;/p&gt;
+&lt;p&gt;Federation is a compromise. Rather than everyone hosting their own systems, ideally every technically able person would host a system for themselves and for their friends, and everyone’s systems could connect. If I’m technically able, I can host an “instance” not only for myself but also my loved ones around me. In theory, through federation my friends and family could take back their computing from the conglomerates, by trusting me and ceding power to me to cover the burden of their system administration.&lt;/p&gt;
+&lt;p&gt;What a dream.&lt;/p&gt;
+&lt;p&gt;Federated systems are all around us. The classic example is e-mail. I host my own e-mail server, so I have the privilege of managing my own e-mail address. According to the ideas of decentralisation, for an e-mail user to be fully free, they should host their own e-mail server.&lt;/p&gt;
+&lt;p&gt;But do &lt;em&gt;you&lt;/em&gt; host your own mail server? Do your friends? Does your &lt;em&gt;grandmother&lt;/em&gt;? Setting up a mail server is often time-consuming, ad hoc, and brittle; despite technical literacy and the hours I poured in, I continue to have problems with my e-mail delivery. Of course, e-mail is &lt;em&gt;technically&lt;/em&gt; federated, but for pragmatic reasons, most people’s personal e-mail is controlled by a centralised service like Google Gmail. Is Google a small individual you know personally?&lt;/p&gt;
+&lt;p&gt;There’s no surprise large companies administer most e-mail accounts; it is the expected consequence of &lt;em&gt;economies of scale&lt;/em&gt;. Due to the overhead of running a mail server, it makes economic sense to centralise. Decentralisation, while certainly possible, is impractical for e-mail; accordingly, there are considerable privacy and censorship risks for many e-mail users, submitting to the rules of a massive service provider.&lt;/p&gt;
+&lt;p&gt;But maybe, due to its configuration complexity, e-mail could be an outlier. What about the federated chat protocol, XMPP?&lt;/p&gt;
+&lt;p&gt;Well? How many people do &lt;em&gt;you&lt;/em&gt; know who run their own XMPP server?&lt;/p&gt;
+&lt;p&gt;The protocol family claims over one billion users; you may unwittingly be one of them. But almost every one of those users is connecting not to the federated, open paradise, but rather to a walled garden. Of course, a few users connect to their personal server via a free software client, but most connect to &lt;em&gt;Facebook’s&lt;/em&gt; walled garden, via &lt;em&gt;Facebook&lt;/em&gt;’s app: WhatsApp.&lt;/p&gt;
+&lt;p&gt;Yes, &lt;em&gt;internally&lt;/em&gt; the ever-popular, ever-proprietary WhatsApp was built on a federated protocol. &lt;a href="https://en.wikipedia.org/wiki/WhatsApp#Technical"&gt;Inside WhatsApp is XMPP&lt;/a&gt;, but WhatsApp users are isolated from non-WhatsApp users. Likewise, users preferring free software for practical or ideological reasons are isolated from their friends on WhatsApp. Federation was baked into the genes of this protocol, but while the original “freedom-preserving” chat system claims a few technically advanced adherents, the freedom of the masses was unfortunately lost to corporate interests.&lt;/p&gt;
+&lt;p&gt;Chat protocols aside, arguably the web itself is a classic, if overlooked, example of a federated system. The web, as a collection of interlinked websites backed by the &lt;a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol"&gt;Hypertext Transfer Protocol&lt;/a&gt;, contains the two essential features of a federated, decentralised system. Every author is their own publisher with their own web server, decentralising the web. Similarly, authors are encouraged to link their pages to other pages on other parts of the Internet, hosted by other authors, and in exchange, other pages tend to link to them, creating an interconnected – federated? – system. The fabric of the Internet itself follows the decentralised technical structure.&lt;/p&gt;
+&lt;p&gt;Yet the web is no friend of freedom. The overwhelming majority of web traffic is to commercial silos without users’ best interests at heart. It is certainly possible, if complicated, to host a web server oneself. But practically, the English-speaking web is undeniably centralised around Silicon Valley enterprises, the same enterprises that pose considerable threats to user freedom without the possibility of informed consent. Decentralisation specifically seeks to push &lt;em&gt;back&lt;/em&gt; against these Internet masters, yet these centralised giants are operating within the technical framework of a decentralised system.&lt;/p&gt;
+&lt;p&gt;Admittedly, the web is nebulous as an example of federation. What do we make of &lt;a href="https://joinmastodon.org/"&gt;Mastodon&lt;/a&gt;, a freedom-respecting federated alternative to Twitter, whose network of servers hosted by diverse individuals and organizations is lovingly known as the “fediverse”?&lt;/p&gt;
+&lt;p&gt;Mastodon is perhaps the most promising example of a federated system staying true to its grassroots ideals, so far lacking the tell-tale signs of large-scale corporate tampering. If any federated system can work, let it be Mastodon.&lt;/p&gt;
+&lt;p&gt;But let’s look at the data first. Unlike some other federated systems, a comprehensive list of federated Mastodon instances (servers) is available for browsing online at &lt;a href="https://instances.social"&gt;instances.social&lt;/a&gt;. Some non-Mastodon microblogging services are mixed in, as they federate with Mastodon, knitting together the larger “fediverse”. Nevertheless, some simple scripting allows the machine-readable list to be downloaded and processed.&lt;/p&gt;
+&lt;p&gt;Analysing the data, after filtering out instances with no users, we see there are 3,070 listed instances. So far, sounds good – that is 3,070 times as many theoretically independent instances as Twitter.&lt;/p&gt;
+&lt;p&gt;Looking at user count, we see there are just under two million user accounts registered. In other words, there is an average of 642 users per instance. Already the federation fantasy is feeling shaky. How could one system administrator maintain close friendships with over six hundred people? How could so many people trust a single administrator with their private status updates and direct messages? It seems improbable, if not impossible. With the amount of time required to maintain intimacy with 640 people, there would be no time left in a day to maintain a large Mastodon instance!&lt;/p&gt;
+&lt;p&gt;But what’s really striking about the data is the &lt;em&gt;distribution&lt;/em&gt; of users across the instances. In the federation fantasy, given 3,070 instances, we hope that 50% of the user base is spread across 1,535 instances, and the other 50% across the other 1,535 instances. That is, in fantasy land, every instance would host an equal number of users (642).&lt;/p&gt;
+&lt;p&gt;In reality, guess how many instances encompass half of the user base. Maybe 1,000? Alright, there are some big instances in there, so perhaps 100? Well, there are a lot of really tiny instances mixed in, so possibly only 20?&lt;/p&gt;
+&lt;p&gt;The answer?&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Three&lt;/strong&gt;.&lt;/p&gt;
+&lt;p&gt;Just &lt;em&gt;three&lt;/em&gt; instances encompass 50.8% of users.&lt;/p&gt;
+&lt;p&gt;The most popular instance in the Mastodon universe sports over &lt;em&gt;half a million users&lt;/em&gt;. The runner-up is &lt;a href="https://mastodon.social"&gt;mastodon.social&lt;/a&gt;, the flagship instance run by the developer, clocking in at just over three hundred thousand. These two instances alone encompass 41.2% of users.&lt;/p&gt;
+&lt;p&gt;It is true that many Mastodon users have multiple accounts spread out across different instances, so it may be more reliable to check by a metric like number of status updates. Unfortunately, results there are similar: half of reported status updates correspond to a mere five instances.&lt;/p&gt;
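+&lt;p&gt;For readers who want to repeat this kind of count, here is a minimal sketch of the computation described above: drop empty instances, rank the rest by size, and count how many of the largest are needed to reach half of all users. The function name &lt;code&gt;concentration&lt;/code&gt; and the numbers in &lt;code&gt;toy_counts&lt;/code&gt; are made up for illustration; they are not the instances.social data, and this is not the script used for the figures in this post.&lt;/p&gt;

```python
from itertools import accumulate


def concentration(user_counts, fraction=0.5):
    """Summarise how concentrated users are across instances.

    Returns (number of active instances, mean users per active instance,
    number of largest instances needed to hold `fraction` of all users).
    Assumes at least one instance has users.
    """
    # Filter out instances with no users, then rank from largest to smallest.
    active = sorted((n for n in user_counts if n > 0), reverse=True)
    total = sum(active)
    target = fraction * total
    # Walk down the ranking until the running total reaches the target share.
    needed = next(
        rank
        for rank, running in enumerate(accumulate(active), start=1)
        if running >= target
    )
    return len(active), total / len(active), needed


# Hypothetical user counts, NOT real data.
toy_counts = [400, 250, 120, 80, 60, 40, 30, 20, 0, 0]
print(concentration(toy_counts))  # (8, 125.0, 2)
```

+&lt;p&gt;In the toy list, eight instances are active and average 125 users each, yet the two largest already hold more than half of the users. The same function could be fed status counts instead of user counts.&lt;/p&gt;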
+&lt;p&gt;Overall, Mastodon certainly encourages people to use its &lt;em&gt;technical&lt;/em&gt; ability to host their own instances for themselves, their friends, and their shared interest groups. True, federation technically allows all 3,070 instances to connect to each other, subject to the policies of each instance owner. But practically, what have we achieved when the typical Mastodon user has an account on one of just &lt;em&gt;three&lt;/em&gt; instances?&lt;/p&gt;
+&lt;p&gt;If we define its success by decentralisation, unfortunately – unsurprisingly – Mastodon has failed. Unless you use an obscure micro instance, what good is decentralisation anyway?&lt;/p&gt;
+&lt;p&gt;&lt;a href="/Mastodon-Users.png"&gt;&lt;img src="/Mastodon-Users.png" alt="Mastodon users per instance" /&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;Visually, the above graph shows user count on the Y-axis, where the X-axis corresponds to the instance popularity ranking. So, the most popular instance is the left-most point, the second most popular instance is a hair to the right of it, and all the way on the right is the tiniest instance. In total, the plot shows the distribution of users on (primarily Mastodon) instances.&lt;/p&gt;
+&lt;p&gt;In the federated ideal, where all instances are created equal, the graph should be a horizontal line showing each instance with 642 people.&lt;/p&gt;
+&lt;p&gt;Of course, due to substantial inequality among instance sizes, we expect to see a &lt;a href="https://en.wikipedia.org/wiki/Power_law"&gt;power-law distribution&lt;/a&gt;, with a spike on the left that quickly falls and tapers out. Power laws govern much of the real world; many phenomena behave according to this unequal distribution.&lt;/p&gt;
+&lt;p&gt;But look at that graph. Calling this distribution a power law would be generous to say the least. There is a &lt;em&gt;massive&lt;/em&gt; spike corresponding to just a few instances, and the rest of the graph is nearly invisible to the naked eye, so tiny and so overshadowed by just a few giants. Frankly, this distribution is closer to the &lt;a href="https://en.wikipedia.org/wiki/Dirac_delta_function"&gt;Dirac delta function&lt;/a&gt; than a power law.&lt;/p&gt;
+&lt;p&gt;Ultimately, there are two types of production-quality networked systems: those designed for centralisation, and those designed for federation. But from the tales of e-mail, XMPP, the web, and Mastodon, it is clear that &lt;em&gt;federation does not mean decentralisation&lt;/em&gt;. Each federated system analyzed above only became production-quality and accessible to the masses at the immense cost of &lt;em&gt;centralisation&lt;/em&gt;. Each service retains the theoretical ability to federate with tiny self-hosted servers, but the vast majority of users are &lt;em&gt;de facto&lt;/em&gt; concentrated about a few major servers. E-mail concentrated on Google, XMPP concentrated on Facebook, the web concentrated on Silicon Valley, and Mastodon concentrated onto a few flagship instances.&lt;/p&gt;
+&lt;p&gt;Indeed, it seems all networked systems tend towards centralisation as the natural consequence of growth. Some systems, both legitimate and illegitimate, are intentionally designed for centralisation. Other systems, like those in the Mastodon universe, are specifically designed to avoid centralisation, but even these succumb to the centralised black hole as their user bases grow towards the event horizon.&lt;/p&gt;
+&lt;p&gt;Consequently, it is not enough to build systems that “can” federate: all four case studies above &lt;em&gt;can&lt;/em&gt; federate but de facto stay centralised. Nor is it enough to “save ourselves”, self-hosting our own decentralised digital islands, while ignoring the reality of the masses. We cannot close our eyes and rest, content with freedom in our personal bubble, ignoring the reality of our non-technical friends and family who do not enjoy the same luxuries of privacy and free speech. We cannot ignore their struggles after resolving our own, justifying our behaviour to ourselves given that there is no technical obstacle to their digital freedom “if only” they abandoned their convenience and dedicated themselves to learning system administration and software debugging. With or without decentralised free software, within the technocracy our non-technical loved ones are barred from liberty over their own lives. Decentralisation is certainly better than dependence on centralised corporate conglomerates, but &lt;em&gt;for whom?&lt;/em&gt; A society free for the few is a society in chains.&lt;/p&gt;
+&lt;p&gt;Nevertheless, as we mourn the centralised fates of these promising systems and yearn to improve the future, we must understand that &lt;em&gt;centralisation is inevitable&lt;/em&gt;. For those of us immersed in the decentralised dream, this fact is uncomfortable to face, but until we do, we will never move on to build systems that are truly free. The fact is, as networked systems designers our choice is rarely “concentrated power or dispersed power?”, but rather “where will power centralise?” If we quixotically opt to cede power over our creations via decentralisation, inevitably someone else will fill the power gap.&lt;/p&gt;
+&lt;p&gt;Alas, naive decentralisation is an experiment in anarchy. Anarchy, whether real or cyber, is a zone where life is “nasty, brutish, and short”; there is no guarantee of true freedom. In any anarchy, soon the power hungry will fill in the void. Remember, history teaches these new despots often establish totalitarian regimes no better than those originally overturned. Did microblogging suffer a parallel fate? If the explosion of Mastodon dethroned Twitter, it did so at the cost of establishing a new instance as the new king.&lt;/p&gt;
+&lt;p&gt;For freedom’s sake, face the facts: federation is dead.&lt;/p&gt;
+&lt;p&gt;But there is hope. In political history, the birth of free nations typically contains three stages: dictatorship, briefly overthrown to anarchy, reformed to democracy. Democracy balances the will of the people with the efficiency of centralisation. Democratic freedom is incompatible with an oligarchic authority accumulating power and imposing its will on others. But neither is democracy a free-for-all; some degree of centralisation is acceptable and even useful for protecting liberty, provided it’s centralised around a legitimate, democratic institution.&lt;/p&gt;
+&lt;p&gt;So it is in cyberspace. As the Internet blossomed into chains, the lucrative, exploitative practices of the Silicon Valley giants created digital totalitarianism, a system offering profits for the few at the expense of freedom of the many: &lt;strong&gt;information dictatorship&lt;/strong&gt;.&lt;/p&gt;
+&lt;p&gt;The decentralisation movement understands this oppression as a consequence of unjust organized power, and platforms like Mastodon are successfully overthrowing this information dictatorship, in its absence creating &lt;strong&gt;information anarchy&lt;/strong&gt;.&lt;/p&gt;
+&lt;p&gt;While the story does not end with anarchy, it need not end with a return to the status quo or to the aristocracy of the technical elite. No, in the power vacuum left by the victory of decentralisation lies the opportunity to create something new, something beautiful, something promising true digital freedom for everyone. After the fall of the information dictatorship, balancing in our fingertips is the precious opportunity to create &lt;strong&gt;information democracy&lt;/strong&gt;.&lt;/p&gt;
+&lt;p&gt;Like its real-world counterpart, information democracy is partially centralised for efficiency but encourages community participation for legitimacy. This primarily centralised democratic model is the only realistic option, evident from the structure of free nations in the real world. Yes, there is centralisation required, but we cannot continue to paint centralisation as the virtual bogeyman. Spreading fear is a natural knee-jerk reaction to the illegitimate corporatocracy, but we cannot succumb to fear. Whether physical or virtual, centralisation is &lt;em&gt;not evil&lt;/em&gt;; it is merely morally neutral.&lt;/p&gt;
+&lt;p&gt;That said, when we ultimately do centralise, we must ensure that the central organization belongs to us, the users, not to special interests. On this point, both corporate and decentralised services alike fail. Yes, corporate interests are oligarchies, bowing to money, but so too is the decentralised technocracy, bowing to arcane technical know-how. In a truly democratic digital space, participation must be accessible to &lt;em&gt;everyone&lt;/em&gt;, not just those with money or expertise. If my grandmother cannot participate in the administration of the technology she uses, her best interests may not be adequately represented by the technology. Whether she is beholden to a corporation or to a system administrator, that is not digital freedom.&lt;/p&gt;
+&lt;p&gt;These features distinguish information democracy from its predecessors, creating a system not only more practical but also freer than the federated anarchy. In any democracy, centralised power is wielded to protect freedom, but in decentralised anarchy, no power is wielded at all. Anarchy fantasizes that freedom protects itself; inevitably, life in anarchy is life in chains, for any freedom gained is temporary as the system collapses under the &lt;a href="https://en.wikipedia.org/wiki/Tragedy_of_the_commons"&gt;tragedy of the commons&lt;/a&gt; and the &lt;a href="https://en.wikipedia.org/wiki/Paradox_of_tolerance"&gt;paradox of tolerance&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;Granted, information democracy is not a perfect system; in a virtual world composed primarily of information oligarchies, it is natural to be wary of its potential to stray from its ideals. Nevertheless, the few flaws of information democracy directly mirror the flaws of physical democracy; most objections to information democracy were raised centuries ago during the spread of real-world democracies. Indeed, democracy is not perfect, but it is the only just system we have. Fear cannot blind us, painting perfect as the enemy of good. However flawed, democracy is the only option for a free world.&lt;/p&gt;
+&lt;p&gt;Without further ado, if proprietary services are dictatorships and decentralised services oligarchic anarchies, imagine democratic microblogging: a flagship instance with a clear set of liberal rules, reached by mutual consensus in the community. When necessary, these rules are enforced by an elected moderation team voted in by the community, though generally bureaucracy should be minimized. The platform is entirely free software, so anyone technical can contribute via code. The server exposes a public API, allowing third-party clients to flourish without encumbering progress on the “official” client. Similarly, although the service is primarily centralised, the servers may permit federation if relevant. Ideally, when sensitive data like private messages is involved, communications are end-to-end encrypted and possibly even peer-to-peer, minimizing the centralised server’s power.&lt;/p&gt;
+&lt;p&gt;Sound good? These criteria are non-exhaustive but illustrate the potential for information democracy. More importantly, sound familiar? Many of these ideals are &lt;em&gt;already embodied by Mastodon&lt;/em&gt;. Mastodon may not have lived up to its federated fantasy, and judging it by its own meter stick of decentralisation, it failed. But judging it by the standards of an information democracy, regardless of its inadvertent centralisation, Mastodon is &lt;em&gt;a success story&lt;/em&gt;.&lt;/p&gt;
+&lt;p&gt;True, Mastodon is de facto centralised, but despite the size of the largest instances, it retains the ability to federate with other Mastodon instances. Further, Mastodon is able to federate with other free software friendly networks via a pair of common protocols, creating the familial fabric of the “fediverse”. Centralization and federation can certainly co-exist in harmony to improve efficiency while retaining user choice.&lt;/p&gt;
+&lt;p&gt;Nevertheless, the most successful information democracy is orders of magnitude larger than Mastodon. A centralised site with a documented set of rules reached by community consensus, hosted by a non-profit funded by donations from the user base itself. A community governed by administrators elected by participants in the community. A site whose source code is licensed as free software, allowing specialized adaptations to flourish, complementing rather than harming the main instance. A site that embraces &lt;a href="https://en.wikipedia.org/wiki/Free_culture_movement"&gt;free culture&lt;/a&gt; to ensure its fruits are accessible to all of humanity in perpetuity.&lt;/p&gt;
+&lt;p&gt;Wikipedia.&lt;/p&gt;
+&lt;p&gt;The Wikipedia projects embody the ideals of information democracy. Yes, Wikipedia is centralised; indeed, its software does not directly federate with other smaller wikis, many of which mirror Wikipedia articles – and that’s okay. Wikipedia derives its user freedom not from decentralisation, but from participatory democracy structures. Anyone can edit; with some exceptions to keep logistics manageable, a new user’s first edit is valued as much as an administrator’s one-millionth. Typically, conflicts are resolved not from a top-down dictatorship, despite the centralisation, but rather from bottom-up consensus seeking. &lt;a href="https://en.wikipedia.org/wiki/WP:DEMOCRACY"&gt;Wikipedia is not an experiment in democracy&lt;/a&gt;, but it nevertheless operates as an information democracy, no experimentation needed.&lt;/p&gt;
+&lt;p&gt;To protect the digital freedom of &lt;em&gt;everyone&lt;/em&gt;, not just the wealthy or the technically inclined, we need &lt;strong&gt;information democracy&lt;/strong&gt;. To see the future of Internet liberty for the few, look not to Silicon Valley, for overwhelming commercial interests can never adequately protect the user. To see the future of liberty for the many, look not to the obscurity of XMPP, for arcane technical voodoo can never be wielded by those who need it most. To see the free future, look to Wikipedia.&lt;/p&gt;
+&lt;p&gt;Whether online or in the real world, rejecting dictatorships is not enough for freedom.&lt;/p&gt;
+&lt;p&gt;We must endorse democracy.&lt;/p&gt;
+&lt;p&gt;&lt;em&gt;Thank you to April, Connor, Florrie, and Natalia for soundboarding and reading early drafts.&lt;/em&gt;&lt;/p&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/the-federation-fallacy.html</guid><pubDate>Sun, 03 Mar 2019 00:00:00 -0500</pubDate></item><item><title>Hilariously Fast Volume Computation with the Divergence Theorem</title><link>https://rosenzweig.io/blog/hilariously-fast-volume-computation-with-the-divergence-theorem.html</link><description>&lt;p&gt;(No, there won’t be jokes.)&lt;/p&gt;
+&lt;p&gt;The following presents a fast algorithm for volume computation of a simple, closed, triangulated 3D mesh. The restriction to simple, closed meshes comes from the divergence theorem, on which the derivation relies. Further extensions may generalise to other meshes as well, although that is presently out of scope.&lt;/p&gt;
+&lt;p&gt;We begin with the definition of volume as the triple integral over a region of the constant one:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[V = \iiint_R 1 \mathrm{d}V\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;Let &lt;span class="math inline"&gt;\(\mathbf{F}\)&lt;/span&gt; be a vector field on &lt;span class="math inline"&gt;\(\mathbb{R}^3\)&lt;/span&gt; whose divergence is equal to one. For the purposes of this paper, we choose:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[\mathbf{F}(x, y, z) = \langle x, 0, 0 \rangle\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;It can easily be verified that&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[\mathrm{div} \mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z} = 1 + 0 + 0 = 1\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;Therefore,&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[V = \iiint_R 1 \mathrm{d}V = \iiint_R \mathrm{div} \mathbf{F}(x, y, z) \mathrm{d}V\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;By the Divergence Theorem, this is equal to the surface integral:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[V = \iint_S \mathbf{F}(x, y, z) \cdot \mathrm{d}\mathbf{S}\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;This surface integral, defined over the surface S of the 3D mesh, is equal to the sum of its piecewise triangle parts. Let &lt;span class="math inline"&gt;\(T_i\)&lt;/span&gt; denote the surface of the &lt;span class="math inline"&gt;\(i\)&lt;/span&gt;’th triangle in the mesh. Then,&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[V = \sum_{i = 0} \iint_{T_i} \mathbf{F}(x, y, z) \cdot \mathrm{d}\mathbf{S}\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;Let &lt;span class="math inline"&gt;\(T_{in}\)&lt;/span&gt; represent the &lt;span class="math inline"&gt;\(n\)&lt;/span&gt;’th vertex of the &lt;span class="math inline"&gt;\(i\)&lt;/span&gt;’th triangle. Let &lt;span class="math inline"&gt;\(\Delta_1\)&lt;/span&gt; equal the vector difference &lt;span class="math inline"&gt;\(T_{i1} - T_{i0}\)&lt;/span&gt;, and &lt;span class="math inline"&gt;\(\Delta_2\)&lt;/span&gt; likewise equal &lt;span class="math inline"&gt;\(T_{i2} - T_{i0}\)&lt;/span&gt;; where the triangle index needs to be explicit, these are written &lt;span class="math inline"&gt;\(\Delta_{i1}\)&lt;/span&gt; and &lt;span class="math inline"&gt;\(\Delta_{i2}\)&lt;/span&gt;. Each individual triangle &lt;span class="math inline"&gt;\(T_i\)&lt;/span&gt; may thus be parametrised as:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[\mathbf{r}(u, v) = T_{i0} + u\Delta_1 + v\Delta_2\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;Then, simple differentiation yields:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[\mathbf{r}_u = \Delta_1\]&lt;/span&gt; &lt;span class="math display"&gt;\[\mathbf{r}_v = \Delta_2\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;Therefore,&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[\mathbf{r}_u \times \mathbf{r}_v = \Delta_1 \times \Delta_2\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;Thus, the surface integral can be rewritten in terms of this parametrisation, substituting in the definition of &lt;span class="math inline"&gt;\(\mathbf{F}\)&lt;/span&gt; as needed:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[V = \sum_{i = 0} \iint_{T_i} \mathbf{F}(x, y, z) \cdot (\mathbf{r}_u \times \mathbf{r}_v) dA\]&lt;/span&gt; &lt;span class="math display"&gt;\[= \sum_{i = 0} \iint_{T_i} \mathbf{F}(x, y, z) \cdot (\Delta_{i1} \times \Delta_{i2}) dA\]&lt;/span&gt; &lt;span class="math display"&gt;\[= \sum_{i = 0} \iint_{T_i} \langle x, 0, 0 \rangle \cdot (\Delta_{i1} \times \Delta_{i2}) dA\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;This cross product is constant throughout the triangle and easy to calculate from the vertex data. Only the X component of the cross product needs to be calculated; the other components are multiplied by the zero components of &lt;span class="math inline"&gt;\(\mathbf{F}\)&lt;/span&gt; in the dot product and so contribute nothing. &lt;span class="math inline"&gt;\(V\)&lt;/span&gt; can thus be rewritten as:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[V = \sum_{i = 0} (\Delta_{i1} \times \Delta_{i2})_x \iint_{T_i} x dA\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;We now focus on the surface integral &lt;span class="math inline"&gt;\(\iint_{T_i} x dA\)&lt;/span&gt;. Expanding with the parametrisation yields:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[\iint_{T_i} x dA = \int_{0}^{1} \int_{0}^{1-u} x dv du = \int_{0}^{1} \int_{0}^{1-u} (T_{i0x} + u \Delta_{i1x} + v \Delta_{i2x}) dv du\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;This integral can be directly evaluated, treating vertex data as constants:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[\int_{0}^{1} \int_{0}^{1-u} (T_{i0x} + u \Delta_{i1x} + v \Delta_{i2x}) dv du\]&lt;/span&gt; &lt;span class="math display"&gt;\[= T_{i0x} \int_{0}^{1} \int_{0}^{1-u} dv du + \Delta_{i1x} \int_{0}^{1} \int_{0}^{1-u} u dv du + \Delta_{i2x} \int_{0}^{1} \int_{0}^{1-u} v dv du\]&lt;/span&gt; &lt;span class="math display"&gt;\[= T_{i0x} (\frac{1}{2}) + \Delta_{i1x} (\frac{1}{6}) + \Delta_{i2x} (\frac{1}{6})\]&lt;/span&gt; &lt;span class="math display"&gt;\[= T_{i0x} (\frac{1}{2}) + (T_{i1x} - T_{i0x})(\frac{1}{6}) + (T_{i2x} - T_{i0x})(\frac{1}{6})\]&lt;/span&gt; &lt;span class="math display"&gt;\[= T_{i0x} (\frac{1}{6}) + (T_{i1x})(\frac{1}{6}) + (T_{i2x})(\frac{1}{6})\]&lt;/span&gt; &lt;span class="math display"&gt;\[= \frac{1}{6}(T_{i0x} + T_{i1x} + T_{i2x})\]&lt;/span&gt;&lt;/p&gt;
+&lt;p&gt;Substituting into the original sum and pulling out a constant factor of &lt;span class="math inline"&gt;\(\frac{1}{6}\)&lt;/span&gt; to avoid the inner loop division, this yields the following compact formula for the volume:&lt;/p&gt;
+&lt;p&gt;&lt;span class="math display"&gt;\[V = \frac{1}{6} \sum_{i = 0} (\Delta_{i1} \times \Delta_{i2})_x (T_{i0x} + T_{i1x} + T_{i2x})\]&lt;/span&gt;&lt;/p&gt;
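+&lt;p&gt;The formula translates almost line for line into code. The following is a minimal sketch rather than a tuned implementation; the names &lt;code&gt;mesh_volume&lt;/code&gt; and &lt;code&gt;tetrahedron&lt;/code&gt; are illustrative, and it assumes each triangle is given as three (x, y, z) vertices wound counter-clockwise when seen from outside the mesh, so that the cross product points along the outward normal the divergence theorem expects.&lt;/p&gt;

```python
def mesh_volume(triangles):
    """Volume of a closed triangle mesh with outward-facing winding.

    `triangles` is an iterable of (t0, t1, t2) triples, where each vertex
    is an (x, y, z) tuple.
    """
    total = 0.0
    for t0, t1, t2 in triangles:
        # Only the y and z parts of the two edge vectors are needed,
        # because only the x component of the cross product is used.
        d1y, d1z = t1[1] - t0[1], t1[2] - t0[2]
        d2y, d2z = t2[1] - t0[1], t2[2] - t0[2]
        cross_x = d1y * d2z - d1z * d2y
        # Multiply by the sum of the x coordinates of the three vertices.
        total += cross_x * (t0[0] + t1[0] + t2[0])
    # The constant factor of one sixth is applied once, outside the loop.
    return total / 6.0


# Illustrative test mesh: the tetrahedron with corners at the origin and
# the three unit points on the axes, whose volume is 1/6.
O, A, B, C = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
tetrahedron = [(O, B, A), (O, A, C), (O, C, B), (A, B, C)]
print(mesh_volume(tetrahedron))  # 1/6, about 0.1667
```

+&lt;p&gt;Reversing the winding of every triangle flips the sign of the result, so a negative volume is a hint that the mesh is wound inside out.&lt;/p&gt;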
+&lt;h2 id="performance-analysis"&gt;Performance analysis&lt;/h2&gt;
+&lt;p&gt;The final algorithm involves neither numerical integration nor differentiation. In contrast to common naive algorithms for volume, which are equivalent to rendering the mesh and then sampling the render, an expensive operation, there is only a single loop in this algorithm, over the triangles. Thus, this algorithm for volume computation is O(n) in the number of triangles. Furthermore, the per-triangle calculation is similarly efficient: given the natural expansion of the cross product, the inner part contains seven additions (counting subtractions as additions) and three multiplications, plus one further addition per triangle to accumulate the sum. On the outside of the loop is only a single multiplication. Thus, for a mesh of &lt;span class="math inline"&gt;\(n\)&lt;/span&gt; triangles, the algorithm requires &lt;span class="math inline"&gt;\(8n - 1\)&lt;/span&gt; additions and &lt;span class="math inline"&gt;\(3n + 1\)&lt;/span&gt; multiplications, or &lt;span class="math inline"&gt;\(11n\)&lt;/span&gt; floating point operations. This is &lt;em&gt;very&lt;/em&gt; fast.&lt;/p&gt;
+&lt;p&gt;For a ballpark number, if volume needs to be calculated every frame in a high-performance 60 frames per second application, without the aid of a GPU, only using the CPU capabilities of a &lt;a href="https://raspberrypi.stackexchange.com/questions/55862/what-is-the-performance-and-the-performance-per-watt-of-raspberry-pi-3-in-gflops"&gt;$35 Raspberry Pi&lt;/a&gt;, around 30 million triangles could be measured every frame.&lt;/p&gt;
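+&lt;p&gt;The arithmetic behind a ballpark like this is short enough to write down. The sketch below assumes the loop runs at full arithmetic throughput and ignores memory bandwidth entirely; the helper names are made up for illustration, and no measured throughput figure is implied.&lt;/p&gt;

```python
FLOPS_PER_TRIANGLE = 11  # from the operation count above


def required_flops(triangles, frames_per_second=60):
    """Floating point operations per second needed to measure
    `triangles` triangles on every frame."""
    return triangles * FLOPS_PER_TRIANGLE * frames_per_second


def triangles_per_frame(flops_per_second, frames_per_second=60):
    """Triangles that fit in one frame at a given sustained throughput."""
    return flops_per_second / (frames_per_second * FLOPS_PER_TRIANGLE)


print(required_flops(30_000_000))  # 19800000000
```

+&lt;p&gt;In other words, 30 million triangles per frame at 60 frames per second corresponds to roughly 19.8 billion floating point operations per second.&lt;/p&gt;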
+&lt;h2 id="motivation"&gt;Motivation&lt;/h2&gt;
+&lt;p&gt;The vector calculus exam is soon, and I need to study. Plus, who doesn’t love 3D graphics?!&lt;/p&gt;
+&lt;p&gt;&lt;del&gt;I would be (pleasantly) surprised if the algorithm is novel.&lt;/del&gt; Further research &lt;em&gt;after&lt;/em&gt; posting reveals the paper &lt;a href="http://chenlab.ece.cornell.edu/Publication/Cha/icip01_Cha.pdf"&gt;Efficient Feature Extraction for 2D/3D Objects in Mesh Representation&lt;/a&gt; by Cha Zhang and Tsuhan Chen, which appears to describe the same algorithm, although the derivation is different. It was fun while it lasted!&lt;/p&gt;
+</description><guid isPermaLink="true">https://rosenzweig.io/blog/hilariously-fast-volume-computation-with-the-divergence-theorem.html</guid><pubDate>Fri, 16 Feb 2018 00:00:00 -0500</pubDate></item></channel></rss>