<p>Pid Eins | https://0pointer.net/blog/ | Thu, 03 Nov 2022 00:00:00 +0100 | Linux Boot Partitions | https://0pointer.net/blog/linux-boot-partitions.html</p> <h1>💽 Linux Boot Partitions and How to Set Them Up 🚀</h1> <p><em>Let’s have a look at how traditional Linux distributions set up <code>/boot/</code> and the ESP, and how this could be improved.</em></p> <p>How Linux distributions have traditionally set up their “boot” file systems varies to some degree, but the most common choice has been to have a separate partition mounted to <code>/boot/</code>. Usually the partition is formatted as a Linux file system such as ext2/ext3/ext4. The partition contains the kernel images, the initrd and various boot loader resources. Some distributions, like Debian and Ubuntu, also store ancillary files associated with the kernel here, such as <code>kconfig</code> or <code>System.map</code>. Such a traditional boot partition is only defined within the context of the distribution, and typically not immediately recognizable as such when looking just at the partition table (i.e. it uses the generic Linux partition type UUID).</p> <p>With the arrival of UEFI a new partition relevant for boot appeared, the <em>EFI System Partition</em> (ESP). This partition is defined by the firmware environment, but typically accessed by Linux to install or update boot loaders. The choice of file system is not up to Linux, but effectively mandated by the UEFI specifications: vFAT. In theory it could be formatted as other file systems too. However, this would require the firmware to support file systems other than vFAT. This is rare and firmware specific though, as vFAT is the only file system mandated by the UEFI specification. 
In other words, vFAT is the only file system which is guaranteed to be universally supported.</p> <p>There’s a major overlap in the type of data typically stored in the ESP and in the traditional boot partition mentioned earlier: a variety of boot loader resources as well as kernels/initrds.</p> <p>Unlike the traditional boot partition, the ESP is easily recognizable in the partition table via its GPT partition type UUID. The ESP is also a <em>shared resource</em>: all OSes installed on the same disk will share it and put their boot resources into it (as opposed to the traditional boot partition, of which there is one per installed Linux OS, and only that one will put resources there).</p> <p>To summarize, the most common setup on typical Linux distributions is something like this:</p> <table> <tr> <th>Type</th> <th>Linux Mount Point</th> <th>File System Choice</th> </tr> <tr> <td>Linux “Boot” Partition</td> <td><code>/boot/</code></td> <td>Any Linux File System, typically ext2/ext3/ext4</td> </tr> <tr> <td>ESP</td> <td><code>/boot/efi/</code></td> <td>vFAT</td> </tr> </table> <p>As mentioned, not all distributions or local installations agree on this. For example, it’s probably worth mentioning that some distributions decided to put kernels onto the root file system of the OS itself. For this setup to work the boot loader itself [sic!] must implement a non-trivial part of the storage stack. This may have to include RAID, storage drivers, networked storage, volume management, disk encryption, and Linux file systems. Leaving aside the conceptual argument that complex storage stacks don’t belong in boot loaders, there are very practical problems with this approach. Reimplementing the Linux storage stack in all its combinations is a massive amount of work. It took decades to implement what we have on Linux now, and it will take a similar amount of work to catch up in the boot loader’s reimplementation. 
Moreover, there’s a political complication: some Linux file system communities made clear they have no interest in supporting a second file system implementation that is not maintained as part of the Linux kernel.</p> <p>What’s interesting is that the <code>/boot/efi/</code> mount point is nested below the <code>/boot/</code> mount point. This effectively means that to access the ESP the Boot partition must exist and be mounted first. A system with just an ESP and without a Boot partition hence doesn’t fit well into the current model. The Boot partition will also have to carry an empty “efi” directory that can be used as the inner mount point, and serves no other purpose.</p> <p>Given that the traditional boot partition and the ESP may carry similar data (i.e. boot loader resources, kernels, initrds) one may wonder why they are separate concepts. Historically, this was the easiest way to make the pre-UEFI way of booting Linux systems compatible with UEFI: conceptually, the ESP can be seen as just a minor addition to the status quo ante that way. Today, primarily two reasons remain:</p> <ul> <li> <p>Some distributions see a benefit in support for complex Linux file system concepts such as hardlinks, symlinks, SELinux labels/extended attributes and so on when storing boot loader resources. – I personally believe that making use of features in the boot file systems that the firmware environment cannot really make sense of is very clearly not advisable. The UEFI file system APIs know no symlinks, and what is SELinux to UEFI anyway? Moreover, putting more than the absolute minimum of simple data files into such file systems immediately raises questions about how to authenticate them comprehensively (including all fancy metadata) cryptographically on use (see below).</p> </li> <li> <p>On real-life systems that ship with non-Linux OSes the ESP often comes pre-installed with a size too small to carry multiple Linux kernels and initrds. 
As growing the size of an existing ESP is problematic (for example, because there’s no space available immediately after the ESP, or because some low-quality firmware reacts badly to the ESP changing size) placing the kernel in a separate, secondary partition (i.e. the boot partition) circumvents these space issues.</p> </li> </ul> <h2>File System Choices</h2> <p>We already mentioned that the ESP effectively has to be vFAT, as that is what UEFI (more or less) guarantees. The file system choice for the boot partition is not quite as restricted, but using arbitrary Linux file systems is not really an option either. The file system must be accessible by both the boot loader and the Linux OS. Hence only file systems that are available in both can be used. Note that such secondary implementations of Linux file systems in the boot environment – limited as they may be – are not typically welcomed or supported by the maintainers of the canonical file system implementation in the upstream Linux kernel. Modern file systems are notoriously complicated and delicate and simply don’t belong in boot loaders.</p> <p>In a trusted boot world, the two file systems for the ESP and the <code>/boot/</code> partition should be considered <em>untrusted</em>: any code or essential data read from them must be authenticated cryptographically before use. And even more, the file system structures themselves are also untrusted. The file system driver reading them must be careful not to be exploitable by a rogue file system image. 
Effectively this means a simple file system (for which a driver can be more easily validated and reviewed) is generally a better choice than a complex file system (Linux file system communities made it pretty clear that robustness against rogue file system images is outside of their scope and not what is being tested for).</p> <p>Some approaches tried to address the fact that boot partitions are untrusted territory by encrypting them via a mechanism compatible with LUKS, and adding decryption capabilities to the boot loader so it can access them. This misses the point though, as encryption does not imply authentication, and only authentication is typically desired. The boot loader and kernel code are typically Open Source anyway, and hence there’s little value in attempting to keep secret what is already public knowledge. Moreover, encryption implies the existence of an encryption key. Physically typing in the decryption key on a keyboard might still be acceptable on desktop systems with a single human user in front, but outside of that scenario unlock via TPM, PKCS#11 or network services is typically required. And even on the desktop FIDO2 unlocking is probably the future. Implementing all the technologies these unlocking mechanisms require in the boot loader is not realistic, unless the boot loader shall become a full OS on its own as it would require subsystems for FIDO2, PKCS#11, USB, Bluetooth network, smart card access, and so on.</p> <h2>File System Access Patterns</h2> <p>Note that traditionally both mentioned partitions were read-only during most parts of the boot. Only later, once the OS is up, write access was required to implement OS or boot loader updates. In today’s world things have become a bit more complicated. 
A modern OS might want to require some limited write access already in the boot loader, to implement boot counting/boot assessment/automatic fallback (e.g., if the same kernel fails to boot 3 times, automatically revert to an older kernel), or to maintain an early storage-based random seed. This means that even though the file system is <em>mostly read-only,</em> we need limited write access after all.</p> <p>vFAT cannot compete with modern Linux file systems such as <code>btrfs</code> when it comes to data safety guarantees. It’s not a journaled file system, and does not use CoW or any form of checksumming. This means when used for the system boot process we need to be particularly careful when accessing it, and in particular when making changes to it (i.e., trying to keep changes local to single sectors). It is essential to use write patterns that minimize the chance of file system corruption. Checking the file system (“<code>fsck</code>”) before modification (and probably also reading) is important, as is ensuring the file system is put into a “clean” state as quickly as possible after each modification.</p> <p>Code quality of the firmware in typical systems is known to not always be great. When relying on the file system driver included in the firmware it’s hence a good idea to limit use to operations that have a better chance of being correctly implemented. For example, when writing from the UEFI environment it might be wise to avoid any operation that requires allocation algorithms, but instead focus on access patterns that only overwrite already written data, and do not require allocation of new space for the data.</p> <p>Besides write access from the boot loader code (as described above) these file systems will require write access from the OS, to facilitate boot loader and kernel/initrd updates. 
These types of accesses are generally not fully random accesses (i.e., never partial file updates) but usually mean adding new files as a whole, and removing old files as a whole. Existing files are typically not modified once created, though they might be replaced wholly by newer versions.</p> <h2>Boot Loader Updates</h2> <p>Note that the update cycle frequencies for boot loaders and for kernels/initrds are probably similar these days. While kernels are still vastly more complex than boot loaders, security issues are regularly found in both. This holds in particular because boot loaders (through “shim” and similar components) carry certificate/keyring and denylist information, which typically requires frequent updates. Update cycles hence have to be expected regularly.</p> <h2>Boot Partition Discovery</h2> <p>The traditional boot partition was not recognizable by looking just at the partition table. On MBR systems it was directly referenced from the boot sector of the disk, and on EFI systems from information stored in the ESP. This is less than ideal since by losing this entrypoint information the system becomes unbootable. It’s typically a better, more robust idea to make boot partitions recognizable as such in the partition table directly. This is done for the ESP via the GPT partition type UUID. 
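</p> <p>As a minimal sketch of what “recognizable in the partition table” means in practice, the following Python snippet classifies a partition by its GPT partition type UUID. The function name <code>classify_partition()</code>, the table name and the role labels are illustrative assumptions, not part of any existing tool. The three UUIDs are the published type UUIDs for the ESP, for the XBOOTLDR partition type that is introduced later in this text, and for generic Linux file system data.</p>
<pre><code class="language-python"># Illustrative lookup table: the labels on the right are names chosen for
# this sketch, the keys are published GPT partition type UUIDs.
KNOWN_PARTITION_TYPES = {
    "c12a7328-f81f-11d2-ba4b-00a0c93ec93b": "esp",
    "bc13c2ff-59e6-4262-a352-b275fd6f7172": "xbootldr",
    "0fc63daf-8483-4772-8e79-3d69d8477de4": "generic-linux",
}


def classify_partition(type_uuid):
    """Map a GPT partition type UUID to a boot-related role, if any."""
    return KNOWN_PARTITION_TYPES.get(type_uuid.strip().lower(), "unknown")


# A traditional boot partition carries the generic Linux type UUID, so the
# partition table alone cannot tell it apart from any other Linux partition.
print(classify_partition("C12A7328-F81F-11D2-BA4B-00A0C93EC93B"))
print(classify_partition("0FC63DAF-8483-4772-8E79-3D69D8477DE4"))
</code></pre>
<p>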
For traditional boot partitions this was not done though.</p> <h2>Current Situation Summary</h2> <p>Let’s try to summarize the above:</p> <ul> <li> <p>Currently, typical deployments use <strong>two distinct boot partitions</strong>, often using two distinct file system implementations</p> </li> <li> <p>Firmware effectively dictates the existence of the ESP, and the use of <strong>vFAT</strong></p> </li> <li> <p>In the userspace view: the ESP <strong>mount is nested</strong> below the general Boot partition mount</p> </li> <li> <p>Resources stored in both partitions are primarily kernel/initrd, and boot loader resources</p> </li> <li> <p>The mandatory use of vFAT brings certain <strong>data safety challenges</strong>, as does the quality of firmware file system driver code</p> </li> <li> <p><strong>During boot limited write access</strong> is needed, during OS runtime more comprehensive write access is needed (though still not fully random).</p> </li> <li> <p>Less restricted but still <strong>limited write patterns from OS environment</strong> (only full file additions/updates/removals, during OS/boot loader updates)</p> </li> <li> <p>Boot loaders should not implement complex storage stacks.</p> </li> <li> <p>ESP can be <strong>auto-discovered</strong> from the partition table, traditional boot partition cannot.</p> </li> <li> <p>ESP and the traditional boot partition are protected cryptographically neither in structure nor in contents. It is expected that loaded files are individually authenticated after being read.</p> </li> <li> <p>The ESP is a <strong>shared resource</strong>; the traditional boot partition is a resource specific to each installed Linux OS on the same disk.</p> </li> </ul> <h2>How to Do it Better</h2> <p>Now that we have discussed many of the issues with the status quo ante, let’s see how we can do things better:</p> <ul> <li> <p>Two partitions for essentially the same data is a bad idea. 
Given they carry data very similar or identical in nature, the common case should be to have only one (but see below).</p> </li> <li> <p>Two file system implementations are worse than one. Given that vFAT is more or less mandated by UEFI and the only format universally understood by all players, and thus has to be used anyway, it might as well be the only file system that is used.</p> </li> <li> <p>Data safety is unnecessarily bad so far: both ESP and boot partition are continuously mounted from the OS, even though access is pretty restricted: outside of update cycles access is typically not required.</p> </li> <li> <p>All partitions should be auto-discoverable/self-descriptive</p> </li> <li> <p>The two partitions should not be exposed as nested mounts to userspace</p> </li> </ul> <p>To be more specific, here’s what I think a better way to set this all up would look like:</p> <ul> <li> <p>Whenever possible, only have <strong>one boot partition</strong>, not two. On EFI systems, make it the ESP. On non-EFI systems use an XBOOTLDR partition instead (see below). Only have both in the case where a Linux OS is installed on a system that already contains an OS with an ESP that is too small to carry sufficient kernels/initrds. When a system contains an XBOOTLDR partition put kernels/initrds on that, otherwise on the ESP.</p> </li> <li> <p>Instead of the vaguely defined, traditional Linux “boot” partition use the <strong>XBOOTLDR</strong> partition type as defined by the <a href="https://uapi-group.org/specifications/specs/discoverable_partitions_specification/">Discoverable Partitions Specification</a>. This ensures the partition is discoverable, and can be automatically mounted by things like <a href="https://www.freedesktop.org/software/systemd/man/systemd-gpt-auto-generator.html"><code>systemd-gpt-auto-generator</code></a>. 
Use XBOOTLDR only if you have to, i.e., when dealing with systems that lack UEFI (and where the ESP hence has no value) or to address the mentioned size issues with the ESP. Note that unlike the traditional boot partition the XBOOTLDR partition is a shared resource, i.e., shared between multiple parallel Linux OS installations on the same disk. Because of this it is typically wise to place a per-OS directory at the top of the XBOOTLDR file system to avoid conflicts.</p> </li> <li> <p>Use <strong>vFAT</strong> for both partitions, it’s the only thing universally understood among relevant firmwares and Linux. It’s simple enough to be useful for untrusted storage. Or to say this differently: writing a file system driver that is not easily vulnerable to rogue disk images is much easier for vFAT than for let’s say btrfs. – But the choice of vFAT implies some care needs to be taken to address the data safety issues it brings, see below.</p> </li> <li> <p>Mount the two partitions via the “<strong>automount</strong>” logic. For example, via systemd’s <a href="https://www.freedesktop.org/software/systemd/man/systemd.automount.html">automount</a> units, with a very short idle time-out (one second or so). This improves data safety immensely, as the file systems will remain mounted (and thus possibly in a “dirty” state) only for very short periods of time, when they are actually accessed – and all that while the fact that they are not mounted continuously is mostly not noticeable for applications as the file system paths remain continuously around. Given that the backing file system (vFAT) has poor data safety properties, it is essential to shorten the access for unclean file system state as much as possible. 
In fact, this is what the aforementioned <code>systemd-gpt-auto-generator</code> logic actually does by default.</p> </li> <li> <p>Whenever mounting one of the two partitions, do a file system check (<strong>fsck</strong>; in fact this is also what <code>systemd-gpt-auto-generator</code> does by default, hooked into the automount logic, to run on first access). This ensures that even if the file system is in an unclean state it is restored to be clean when needed, i.e., on first access.</p> </li> <li> <p>Do not mount the two partitions <strong>nested</strong>, i.e., no more <code>/boot/efi/</code>. First of all, as mentioned above, it should be possible (and is desirable) to only have one of the two. Hence it is simply a bad idea to require the other as well, just to be able to mount it. More importantly though, by nesting them, automounting is complicated, as it is necessary to trigger the first automount to establish the second automount, which defeats the point of automounting them in the first place. Use the two distinct mount points <code>/efi/</code> (for the ESP) and <code>/boot/</code> (for XBOOTLDR) instead. You might have guessed, but that too is what <code>systemd-gpt-auto-generator</code> does by default.</p> </li> <li> <p>When making additions or updates to ESP/XBOOTLDR from the OS make sure to create a file and write it in full, then <code>syncfs()</code> the whole file system, then rename to give it its final name, and <code>syncfs()</code> again. Proceed similarly when removing files.</p> </li> <li> <p>When writing from the boot loader environment/UEFI to ESP/XBOOTLDR, do not append to files or create new files. 
Instead overwrite already allocated file contents (for example to maintain a random seed file) or rename already allocated files to include information in the file name (and ideally do not increase the file name in length; for example to maintain boot counters).</p> </li> <li> <p>Consider adopting <a href="https://0pointer.net/blog/brave-new-trusted-boot-world.html">UKIs</a>, which minimize the number of files that need to be updated on the ESP/XBOOTLDR during OS/kernel updates (ideally down to 1)</p> </li> <li> <p>Consider adopting <a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"><code>systemd-boot</code></a>, which minimizes the number of files that need to be updated on boot loader updates (ideally down to 1)</p> </li> <li> <p>Consider removing any mention of ESP/XBOOTLDR from <code>/etc/fstab</code>, and just let <code>systemd-gpt-auto-generator</code> do its thing.</p> </li> <li> <p>Stop implementing file systems, complex storage, disk encryption, … in your boot loader.</p> </li> </ul> <p>Implementing things like that you gain:</p> <ul> <li> <p><strong>Simplicity</strong>: only one file system implementation, typically only one partition and mount point</p> </li> <li> <p><strong>Robust auto-discovery</strong> of all partitions, no need to even configure <code>/etc/fstab</code></p> </li> <li> <p><strong>Data safety</strong> guarantees as good as possible, given the circumstances</p> </li> </ul> <p>To summarize this in a table:</p> <table> <tr> <th>Type</th> <th>Linux Mount Point</th> <th>File System Choice</th> <th>Automount</th> </tr> <tr> <td>ESP</td> <td><code>/efi/</code></td> <td>vFAT</td> <td>yes</td> </tr> <tr> <td>XBOOTLDR</td> <td><code>/boot/</code></td> <td>vFAT</td> <td>yes</td> </tr> </table> <p>A note regarding modern boot loaders that implement the <a href="https://uapi-group.org/specifications/specs/boot_loader_specification/">Boot Loader Specification</a>: both partitions are explicitly listed in the specification as 
sources for both Type #1 and Type #2 boot menu entries. Hence, if you use such a modern boot loader (e.g. systemd-boot) these two partitions are the preferred location for boot loader resources, kernels and initrds anyway.</p> <h1>Addendum: You got RAID?</h1> <p>You might wonder, what about RAID setups and the ESP? This comes up regularly in discussions: how to set up the ESP so that (software) RAID1 (mirroring) can be done on the ESP. Long story short: I’d strongly advise against using RAID on the ESP. Firmware typically doesn’t have native RAID support, and given that firmware and boot loader can write to the file systems involved, any attempt to use software RAID on them will mean that a boot cycle might corrupt the RAID sync, and immediately requires a re-synchronization after boot. If RAID1 backing for the ESP is really necessary, the only way to implement that safely would be to implement this as a driver for UEFI – but that creates certain bootstrapping issues (i.e., where to place the driver if not the ESP, a file system the driver is supposed to be used for), and also reimplements a considerable component of the OS storage stack in firmware mode, which seems problematic.</p> <p>So what to do instead? My recommendation would be to solve this via userspace tooling. If redundant disk support shall be implemented for the ESP, then create separate ESPs on all disks, and synchronize them on the file system level instead of the block level. Or in other words, the tools that install/update/manage kernels or boot loaders should be taught to maintain multiple ESPs instead of one. Copy the kernels/boot loader files to all of them, and remove them from all of them. Under the assumption that the goal of RAID is a more reliable system this should be the best way to achieve that, as it doesn’t pretend the firmware could do things it actually cannot do. 
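</p> <p>A minimal sketch of such file-level synchronization, assuming Linux and that each ESP is already mounted at a known path: the helper name <code>install_to_all_esps()</code> and its parameters are hypothetical. It follows the write-in-full, sync, then rename pattern recommended earlier in this text, but substitutes <code>os.fsync()</code> on the file and on its directory for <code>syncfs()</code>, because the Python standard library does not wrap <code>syncfs()</code>.</p>
<pre><code class="language-python">import os
import tempfile


def install_to_all_esps(source_path, esp_mount_points, relative_target):
    """Copy one file to the same relative path below every ESP mount point."""
    with open(source_path, "rb") as source:
        payload = source.read()
    for mount_point in esp_mount_points:
        target = os.path.join(mount_point, relative_target)
        target_dir = os.path.dirname(target)
        os.makedirs(target_dir, exist_ok=True)
        # Write the complete file under a temporary name first.
        fd, temp_path = tempfile.mkstemp(dir=target_dir, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as temp_file:
                temp_file.write(payload)
                temp_file.flush()
                os.fsync(temp_file.fileno())
            # Give the file its final name only once its data is on disk.
            os.replace(temp_path, target)
        except BaseException:
            if os.path.exists(temp_path):
                os.unlink(temp_path)
            raise
        # Sync the containing directory so the rename itself is persisted.
        dir_fd = os.open(target_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
</code></pre>
<p>The loop treats every ESP identically and never depends on the state of another one, so a disk that fails halfway through an update leaves the remaining ESPs either fully updated or untouched.</p> <p>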
Moreover it minimizes the complexity of the boot loader, shifting the syncing logic to userspace, where it’s typically easier to get right.</p> <h1>Addendum: Networked Boot</h1> <p>The discussion above focuses on booting up from a local disk. When thinking about networked boot I think two scenarios are particularly relevant:</p> <ol> <li> <p>PXE-style network booting. I think in this mode of operation focus should be on directly booting a single UKI image instead of a boot loader. This sidesteps the whole issue of maintaining any boot partition at all, and simplifies the boot process greatly. In scenarios where this is not sufficient, and an interactive boot menu or other boot loader features are desired, it might be a good idea to take inspiration from the UKI concept, and build a single boot loader EFI binary (such as systemd-boot), and include the UKIs for the boot menu items and other resources inside it via PE sections. Or in other words, build a single boot loader binary that is “supercharged” and contains all auxiliary resources in its own PE sections. (Note: this does not exist, it’s an idea I intend to explore with systemd-boot). Benefit: a single file has to be downloaded via PXE/TFTP, not more. Disadvantage: unused resources are downloaded unnecessarily. Either way: in this context there is no local storage, and the ESP/XBOOTLDR discussion above is without relevance.</p> </li> <li> <p>Initrd-style network booting. In this scenario the boot loader and kernel/initrd (better: UKI) are available on a local disk. The initrd then configures the network and transitions to a network share or file system on a network block device for the root file system. 
In this case the discussion above applies, and in fact the ESP or XBOOTLDR partition would be the only partition available locally on disk.</p> </li> </ol> <p>And this is all I have for today.</p> <p>Lennart Poettering | Thu, 03 Nov 2022 00:00:00 +0100 | tag:0pointer.net,2022-11-03:/blog/linux-boot-partitions.html | projects</p> <p>Brave New Trusted Boot World | https://0pointer.net/blog/brave-new-trusted-boot-world.html</p> <h1>🔐 Brave New Trusted Boot World 🚀</h1> <p><em>This document looks at the boot process of general purpose Linux distributions. It covers the status quo and how we envision Linux boot to work in the future with a focus on robustness and simplicity.</em></p> <p>This document will assume that the reader has comprehensive familiarity with TPM 2.0 security chips and their capabilities (e.g., PCRs, measurements, SRK), boot loaders, the <code>shim</code> binary, Linux, initrds, UEFI Firmware, PE binaries, and SecureBoot.</p> <h2>Problem Description</h2> <p>Status quo ante of the boot logic on typical Linux distributions:</p> <ul> <li> <p>Most popular Linux distributions generate <code>initrds</code> locally, and they are unsigned, thus not protected through SecureBoot (since that would require local SecureBoot key enrollment, which is generally not done), nor TPM PCRs.</p> </li> <li> <p>Boot chain is typically Firmware → <a href="https://github.com/rhboot/shim"><code>shim</code></a> → <code>grub</code> → Linux kernel → <code>initrd</code> (<code>dracut</code> or similar) → root file system</p> </li> <li> <p>Firmware’s UEFI SecureBoot protects shim, shim’s key management protects grub and kernel. No code signing protects initrd. 
initrd acquires the key for encrypted root fs from the user (or TPM/FIDO2/PKCS11).</p> </li> <li> <p><code>shim</code>/<code>grub</code>/kernel is measured into TPM PCR 4, among other stuff</p> </li> <li> <p>EFI TPM event log reports measured data into TPM PCRs, and can be used to reconstruct and validate state of TPM PCRs from the used resources.</p> </li> <li> <p>No userspace components are typically measured, except for what IMA measures</p> </li> <li> <p>New kernels require locally generating new boot loader scripts and generating a new initrd each time. OS updates thus mean fragile generation of multiple resources and copying multiple files into the boot partition.</p> </li> </ul> <p>Problems with the status quo ante:</p> <ul> <li> <p>initrd typically unlocks root file system encryption, but is not protected <em>whatsoever</em>, and trivial to attack and modify offline</p> </li> <li> <p>OS updates are brittle: PCR values of grub are very hard to pre-calculate, as grub measures chosen control flow path, not just code images. PCR values vary wildly, and OS provided resources are not measured into separate PCRs. 
Grub’s PCR measurements might be useful up to a point to reason about the boot after the fact, for the most basic remote attestation purposes, but useless for calculating them ahead of time during the OS build process (which would be desirable to be able to bind secrets to future expected PCR state, for example to bind secrets to an OS in a way that it remains accessible even after that OS is updated).</p> </li> <li> <p>Updates of a boot loader are not robust, require multi-file updates of ESP and boot partition, and regeneration of boot scripts</p> </li> <li> <p>No rollback protection (no way to cryptographically invalidate access to TPM-bound secrets on OS updates)</p> </li> <li> <p>Remote attestation of running software is needlessly complex since initrds are generated locally and thus basically are guaranteed to vary on each system.</p> </li> <li> <p>Locking resources maintained by arbitrary user apps to TPM state (PCRs) is not realistic for general purpose systems, since PCRs will change on every OS update, and there’s no mechanism to re-enroll each such resource before every OS update, and remove the old enrollment after the update.</p> </li> <li> <p>There is no concept to cryptographically invalidate/revoke secrets for an older OS version once updated to a new OS version. 
An attacker thus can always access the secrets generated on old OSes if they manage to exploit an old version of the OS — even if a newer version already has been deployed.</p> </li> </ul> <p>Goals of the new design:</p> <ul> <li> <p>Provide a <strong>fully signed execution path</strong> from firmware to userspace, no exceptions</p> </li> <li> <p>Provide a <strong>fully measured execution path</strong> from firmware to userspace, no exceptions</p> </li> <li> <p><strong>Separate out TPM PCRs assignments</strong>, by “owner” of measured resources, so that resources can be bound to them in a fine-grained fashion.</p> </li> <li> <p>Allow <strong>easy pre-calculation of expected PCR values</strong> based on booted kernel/initrd, configuration, local identity of the system</p> </li> <li> <p><strong>Rollback protection</strong></p> </li> <li> <p>Simple &amp; robust updates: <strong>one updated file per concept</strong></p> </li> <li> <p><strong>Updates without requiring re-enrollment/local preparation</strong> of the TPM-protected resources (no more “brittle” PCR hashes that must be propagated into every TPM-protected resource on each OS update)</p> </li> <li> <p>System ready for easy <strong>remote attestation</strong>, to prove validity of booted OS, configuration and local identity</p> </li> <li> <p>Ability to <strong>bind secrets to specific phases of the boot</strong>, e.g. 
the root fs encryption key should be retrievable from the TPM only in the initrd, but not after the host transitioned into the root fs.</p> </li> <li> <p>Reasonably <strong>secure, automatic, unattended unlocking</strong> of disk encryption secrets should be possible.</p> </li> <li> <p>“Democratize” use of PCR policies by defining PCR register meanings, and making binding to them robust against updates, so that <strong>external projects</strong> can safely and securely bind their own data to them (or use them for remote attestation) without risking breakage whenever the OS is updated.</p> </li> <li> <p>Build around <strong>TPM 2.0</strong> (with graceful fallback for TPM-less systems if desired, but TPM 1.2 support is out of scope)</p> </li> </ul> <p>Considered attack scenarios and considerations:</p> <ul> <li> <p>Evil Maid: neither online nor offline (i.e. “at rest”), physical access to a storage device should enable an attacker to read the user’s plaintext data on disk (confidentiality); neither online nor offline, physical access to a storage device should allow undetected modification/backdooring of user data or OS (integrity), or exfiltration of secrets.</p> </li> <li> <p>TPMs are assumed to be reasonably “secure”, i.e. can securely store/encrypt secrets. Communication to TPM is not “secure” though and must be protected on the wire.</p> </li> <li> <p>Similarly, the CPU is assumed to be reasonably “secure”</p> </li> <li> <p>SecureBoot is assumed to be reasonably “secure” to permit validated boot up to and including shim+boot loader+kernel (but see discussion below)</p> </li> <li> <p>All user data must be encrypted <em>and</em> authenticated. 
All vendor and administrator data must be authenticated.</p> </li> <li> <p>It is assumed all software involved regularly contains vulnerabilities and requires frequent updates to address them, plus regular revocation of old versions.</p> </li> <li> <p>It is further assumed that key material used for signing code by the OS vendor can reasonably be kept secure (via use of HSM, and similar, where secret key information never leaves the signing hardware) and does <em>not</em> require frequent roll-over.</p> </li> </ul> <h2>Proposed Construction</h2> <p>Central to the proposed design is the concept of a <strong>Unified Kernel Image (UKI)</strong>. These UKIs are the combination of a Linux kernel image, an initrd, a UEFI boot stub program (and further resources, see below) into one single UEFI PE file that can either be directly invoked by the UEFI firmware (which is useful in particular in some cloud/Confidential Computing environments) or through a boot loader (which is generally useful to implement support for multiple kernel versions, with interactive or automatic selection of image to boot into, potentially with automatic fallback management to increase robustness).</p> <h2>UKI Components</h2> <p>Specifically, UKIs typically consist of the following resources:</p> <ol> <li> <p>A UEFI boot stub that is a small piece of code still running in UEFI mode and that transitions into the Linux kernel included in the UKI (e.g., as implemented in <a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"><code>sd-stub</code></a>, see below)</p> </li> <li> <p>The Linux kernel to boot in the <code>.linux</code> PE section</p> </li> <li> <p>The initrd that the kernel shall unpack and invoke in the <code>.initrd</code> PE section</p> </li> <li> <p>A kernel command line string, in the <code>.cmdline</code> PE section</p> </li> <li> <p>Optionally, information describing the OS this kernel is intended for, in the <code>.osrel</code> PE section (derived from 
<code>/etc/os-release</code> of the booted OS). This is useful for presentation of the UKI in the boot loader menu, and ordering it against other entries, using the included version information.</p> </li> <li> <p>Optionally, information describing kernel release information (i.e. <code>uname -r</code> output) in the <code>.uname</code> PE section. This is also useful for presentation of the UKI in the boot loader menu, and ordering it against other entries.</p> </li> <li> <p>Optionally, a boot splash to bring to screen before transitioning into the Linux kernel in the <code>.splash</code> PE section</p> </li> <li> <p>Optionally, a compiled Devicetree database file, for systems which need it, in the <code>.dtb</code> PE section</p> </li> <li> <p>Optionally, the public key in PEM format that matches the signatures of the <code>.pcrsig</code> PE section (see below), in a <code>.pcrpkey</code> PE section.</p> </li> <li> <p>Optionally, a JSON file encoding expected PCR 11 hash values seen from userspace once the UKI has booted up, along with signatures of these expected PCR 11 hash values, matching a specific public key in the <code>.pcrsig</code> PE section. (Note: we use plural for “values” and “signatures” here, as this JSON file will typically carry a separate value and signature for each PCR bank for PCR 11, i.e. one pair of value and signature for the SHA1 bank, and another pair for the SHA256 bank, and so on. This ensures when enrolling or unlocking a TPM-bound secret we’ll always have a signature around matching the banks available locally (after all, which banks the local hardware supports is up to the hardware). 
For the sake of simplifying this already overly complex topic, we’ll pretend in the rest of the text there was only one PCR signature per UKI we have to care about, even if this is not actually the case.)</p> </li> </ol> <p>Given that UKIs are regular UEFI PE files, they can be signed as one for SecureBoot, protecting all of the individual resources listed above at once, and their combination. Standard Linux tools such as <code>sbsigntool</code> and <code>pesign</code> can be used to sign UKI files.</p> <p>UKIs wrap all of the above data in a single file, hence all of the above components can be updated in one go through single file atomic updates, which is useful given that the primary expected storage place for these UKIs is the EFI System Partition (ESP), which is a vFAT file system, with its limited data safety guarantees.</p> <p>UKIs can be generated via a single, relatively simple <code>objcopy</code> invocation that glues the listed components together, generating one PE binary that then can be signed for SecureBoot. (For details on building these, see below.)</p> <p>Note that the primary location to place UKIs in is the EFI System Partition (or an otherwise firmware accessible file system). This typically means a vFAT file system of some form. Hence an effective UKI size limit of just under 4GiB is in place, as a FAT32 file system cannot store files of 4GiB or larger.</p> <h2>Basic UEFI Stub Execution Flow</h2> <p>The mentioned UEFI stub program will execute the following operations in UEFI mode before transitioning into the Linux kernel that is included in its <code>.linux</code> PE section:</p> <ol> <li> <p>The PE sections listed are searched for in the invoked UKI the stub is part of, and superficially validated (i.e. general file format is in order).</p> </li> <li> <p>All PE sections listed above of the invoked UKI are measured into TPM PCR 11. This TPM PCR is expected to be all zeroes before the UKI initializes. 
Pre-calculation is thus very straight-forward if the resources included in the PE image are known. (Note: as a single exception the <code>.pcrsig</code> PE section is excluded from this measurement, as it is supposed to carry the expected result of the measurement, and thus cannot also be input to it; see below for further details about this section.)</p> </li> <li> <p>If the <code>.splash</code> PE section is included in the UKI it is brought onto the screen</p> </li> <li> <p>If the <code>.dtb</code> PE section is included in the UKI it is activated using the Devicetree UEFI “fix-up” protocol</p> </li> <li> <p>If a command line was passed from the boot loader to the UKI executable and SecureBoot is enabled, the passed command line is discarded and the one from the <code>.cmdline</code> PE section is used. If SecureBoot is disabled and a command line was passed it is used in place of the one from <code>.cmdline</code>. Either way the used command line is measured into TPM PCR 12. (This of course takes away the local user’s flexibility to control the kernel command line. In many scenarios this is probably considered beneficial, but in others it is not, and some flexibility might be desired. Thus, this concept probably needs to be extended sooner or later, to allow more flexible kernel command line policies to be enforced via definitions embedded into the UKI. For example: allowing definition of multiple kernel command lines the user/boot menu can select one from; allowing additional allowlisted parameters to be specified; or even optionally allowing any verification of the kernel command line to be turned off even in SecureBoot mode. It would then be up to the builder of the UKI to decide on the policy of the kernel command line.)</p> </li> <li> <p>It will set a couple of volatile EFI variables to inform userspace about executed TPM PCR measurements (and which PCR registers were used), and other execution properties. 
(For example: the EFI variable <code>StubPcrKernelImage</code> in the <code>4a67b082-0a4c-41cf-b6c7-440b29bb8c4f</code> vendor namespace indicates the PCR register used for the UKI measurement, i.e. the value “11”).</p> </li> <li> <p>An initrd cpio archive is dynamically synthesized from the <code>.pcrsig</code> and <code>.pcrpkey</code> PE section data (this is later passed to the invoked Linux kernel as additional initrd, to be overlaid with the main initrd from the <code>.initrd</code> section). The two files are later available in the <code>/.extra/</code> directory in the initrd context.</p> </li> <li> <p>The Linux kernel from the <code>.linux</code> PE section is invoked with a combined initrd that is composed of the blob from the <code>.initrd</code> PE section, the dynamically generated initrd containing the <code>.pcrsig</code> and <code>.pcrpkey</code> PE sections, and possibly some additional components like sysexts or syscfgs.</p> </li> </ol> <h2>TPM PCR Assignments</h2> <p>In the construction above we take possession of two PCR registers previously unused on generic Linux distributions:</p> <ul> <li> <p>TPM <strong>PCR 11</strong> shall contain measurements of all components of the UKI (with the exception of the <code>.pcrsig</code> PE section, see above). This PCR will also contain measurements of the boot phase once userspace takes over (see below).</p> </li> <li> <p>TPM <strong>PCR 12</strong> shall contain measurements of the used kernel command line. 
(Plus potentially other forms of parameterization/configuration passed into the UKI, not discussed in this document)</p> </li> </ul> <p>On top of that we intend to define two more PCR registers like this:</p> <ul> <li> <p>TPM <strong>PCR 15</strong> shall contain measurements of the volume encryption key of the root file system of the OS.</p> </li> <li> <p>[TPM <strong>PCR 13</strong> shall contain measurements of additional extension images for the initrd, to enable a modularized initrd – not covered by this document]</p> </li> </ul> <p>(See the <a href="https://uapi-group.org/specifications/specs/linux_tpm_pcr_registry/">Linux TPM PCR Registry</a> for an overview how these four PCRs fit into the list of Linux PCR assignments.)</p> <p>For all four PCRs the assumption is that they are zero before the UKI initializes, and only the data that the UKI and the OS measure into them is included. This makes pre-calculating them straightforward: given a specific set of UKI components, it is immediately clear what PCR values can be expected in PCR 11 once the UKI booted up. Given a kernel command line (and other parameterization/configuration) it is clear what PCR values are expected in PCR 12.</p> <p>Note that these four PCRs are defined by the conceptual “owner” of the resources measured into them. PCR 11 only contains resources the <strong>OS vendor</strong> controls. Thus it is straight-forward for the OS vendor to pre-calculate and then cryptographically sign the expected values for PCR 11. The PCR 11 values will be identical on all systems that run the same version of the UKI. PCR 12 only contains resources the <strong>administrator</strong> controls, thus the administrator can pre-calculate PCR values, and they will be correct on all instances of the OS that use the same parameters/configuration. PCR 15 only contains resources inherently local to the <strong>local system</strong>, i.e. 
the cryptographic key material that encrypts the root file system of the OS.</p> <p>Separating out these three roles does not imply these actually need to be separate when used. However, the assumption is that in many popular environments these three roles should be separate.</p> <p>By separating out these PCRs by the owner’s role, it becomes straightforward to remotely attest, individually, to the software that runs on a node (PCR 11), the configuration it uses (PCR 12) or the identity of the system (PCR 15). Moreover, it becomes straightforward to robustly and securely encrypt data so that it can only be unlocked on a specific set of systems that share the same OS, or the same configuration, or have a specific identity – or a combination thereof.</p> <p>Note that the mentioned PCRs are so far not typically used on generic Linux-based operating systems, to our knowledge. Windows uses them, but given that Windows and Linux should typically not be included in the same boot process this should be unproblematic, as Windows’ use of these PCRs should thus not conflict with ours.</p> <p>To summarize:</p> <table> <tr> <th>PCR</th> <th>Purpose</th> <th>Owner</th> <th>Expected Value before UKI boot</th> <th>Pre-Calculable</th> </tr> <tr> <td><strong>11</strong></td> <td>Measurement of <strong>UKI components</strong> and <strong>boot phases</strong></td> <td>OS Vendor</td> <td>Zero</td> <td>Yes<br/>(at UKI build time)</td> </tr> <tr> <td><strong>12</strong></td> <td>Measurement of <strong>kernel command line,</strong> additional <strong>kernel runtime configuration</strong> such as systemd credentials, systemd syscfg images</td> <td>Administrator</td> <td>Zero</td> <td>Yes<br/>(when system configuration is assembled)</td> </tr> <tr> <td><strong>13</strong> </td> <td><strong>System Extension Images</strong> of initrd<br/>(and possibly more)</td> <td>(Administrator)</td> <td>Zero</td> <td>Yes<br/>(when set of extensions is assembled)</td> </tr> <tr> <td><strong>15</strong> </td> 
<td>Measurement of <strong>root file system volume key</strong><br/>(Possibly later more: measurement of root file system UUIDs and labels and of the machine ID <code>/etc/machine-id</code>)</td> <td>Local System</td> <td>Zero</td> <td>Yes<br/>(after first boot once all such IDs are determined)</td> </tr> </table> <h2>Signature Keys</h2> <p>In the model above, two private/public key pairs are particularly relevant:</p> <ul> <li> <p>The SecureBoot key to sign the UKI PE executable with. This controls permissible choices of OS/kernel.</p> </li> <li> <p>The key to sign the expected PCR 11 values with. Signatures made with this key will end up in the <code>.pcrsig</code> PE section. The public key part will end up in the <code>.pcrpkey</code> PE section.</p> </li> </ul> <p>Typically the key pair for the PCR 11 signatures should be chosen with a narrow focus, reused for exactly one specific OS (e.g. “Fedora Desktop Edition”) and the series of UKIs that belong to it (all the way through all the versions of the OS). The SecureBoot signature key can be used with a broader focus, if desired. By keeping the PCR 11 signature key narrow in focus one can ensure that secrets bound to the signature key can only be unlocked on the narrow set of UKIs desired.</p> <h2>TPM Policy Use</h2> <p>Depending on the intended access policy to a resource protected by the TPM, one or more of the PCRs described above should be selected to bind TPM policy to.</p> <p>For example, the root file system encryption key should likely be bound to TPM PCR 11, so that it can only be unlocked if a specific set of UKIs is booted (it should then, once acquired, be measured into PCR 15, as discussed above, so that later TPM objects can be bound to it, further down the chain). 
With the model described above this is reasonably straight-forward to do:</p> <ul> <li> <p>When userspace wants to bind disk encryption to a specific series of UKIs (“<strong>enrollment</strong>”), it looks for the public key passed to the <code>initrd</code> in the <code>/.extra/</code> directory (which as discussed above originates in the <code>.pcrpkey</code> PE section of the UKI). The relevant userspace component (e.g. <code>systemd</code>) is then responsible for generating a random key to be used as the symmetric encryption key for the storage volume (let’s call it the <em>disk encryption key</em> here, <em>DEK</em>). The TPM is then used to encrypt (“seal”) the DEK with its internal Storage Root Key (TPM SRK). A TPM2 policy is bound to the encrypted DEK. The policy enforces that the DEK may only be decrypted if a valid signature is provided that matches the state of PCR 11 and the public key provided in the <code>/.extra/</code> directory of the <code>initrd</code>. The plaintext DEK is passed to the kernel to implement disk encryption (e.g. LUKS/dm-crypt). (Alternatively, hardware disk encryption can be used too, e.g. Intel MKTME, AMD SME or even OPAL, all of which are outside of the scope of this document.) The TPM-encrypted version of the DEK which the TPM returned is written to the encrypted volume’s superblock.</p> </li> <li> <p>When userspace wants to <strong>unlock</strong> disk encryption on a specific UKI, it looks for the signature data passed to the initrd in the <code>/.extra/</code> directory (which as discussed above originates in the <code>.pcrsig</code> PE section of the UKI). It then reads the encrypted version of the DEK from the superblock of the encrypted volume. The signature and the encrypted DEK are then passed to the TPM. The TPM then checks if the current PCR 11 state matches the supplied signature from the <code>.pcrsig</code> section and the public key used during enrollment. 
If all checks out it decrypts (“unseals”) the DEK and passes it back to the OS, where it is then passed to the kernel which implements the symmetric part of disk encryption.</p> </li> </ul> <p>Note that in this scheme the encrypted volume’s DEK is <strong>not</strong> bound to specific literal PCR hash values, but to a public key against which signatures of PCR hash values are verified.</p> <p>Also note that the state of PCR 11 only matters during unlocking. It is not used or checked when enrolling.</p> <p>In this scenario:</p> <ul> <li> <p>Inputs to the TPM part of the <strong>enrollment</strong> process are the TPM’s internal SRK, the plaintext DEK provided by the OS, and the public key against which signatures of expected PCR values are later verified, also provided by the OS. – Output is the encrypted (“sealed”) DEK.</p> </li> <li> <p>Inputs to the TPM part of the <strong>unlocking</strong> process are the TPM’s internal SRK, the current TPM PCR 11 values, the public key used during enrollment, a signature that matches both these PCR values and the public key, and the encrypted DEK. – Output is the plaintext (“unsealed”) DEK.</p> </li> </ul> <p>Note that sealing/unsealing is done entirely on the TPM chip; the host OS just provides the inputs (well, only the inputs that the TPM chip doesn’t know already on its own), and receives the outputs. With the exception of the plaintext DEK, none of the inputs/outputs are sensitive, and they can safely be stored in the open. On the wire the plaintext DEK is protected via TPM parameter encryption (not discussed in detail here: though important, it is not in scope for this document).</p> <p>TPM PCR 11 is the most important of the mentioned PCRs, and its use is thus explained in detail here. The other mentioned PCRs can be used in similar ways, but signatures/public keys must be provided via other means.</p> <p>This scheme builds on the functionality Linux’ LUKS2 provides, i.e. 
key management supporting multiple slots, and the ability to embed arbitrary metadata in the encrypted volume’s superblock. Note that this means the TPM2-based logic explained here doesn’t have to be the only way to unlock an encrypted volume. For example, in many setups it is wise to enroll both this TPM-based mechanism and an additional “<em>recovery key</em>” (i.e. a high-entropy computer generated passphrase the user can provide manually in case they lose access to the TPM and need to access their data), of which either can be used to unlock the volume.</p> <h2>Boot Phases</h2> <p>Secrets needed during boot-up (such as the root file system encryption key) should typically not be accessible anymore afterwards, to protect them from access if a system is attacked during runtime. To implement this the scheme above is extended in one way: at certain milestones of the boot process additional fixed “words” should be measured into PCR 11. These milestones are placed at conceptual security boundaries, i.e. whenever code transitions from a higher privileged context to a less privileged context.</p> <p>Specifically:</p> <ul> <li> <p>When the initrd initializes (“<code>enter-initrd</code>”)</p> </li> <li> <p>When the initrd transitions into the root file system (“<code>leave-initrd</code>”)</p> </li> <li> <p>When the early boot phase of the OS on the root file system has completed, i.e. 
all storage and file systems have been set up and mounted, immediately before regular services are started (“<code>sysinit</code>”)</p> </li> <li> <p>When the OS on the root file system has completed the boot process far enough to allow unprivileged users to log in (“<code>ready</code>”)</p> </li> <li> <p>When the OS begins shutting down (“<code>shutdown</code>”)</p> </li> <li> <p>When the service manager is mostly finished with shutting down and is about to pass control to the final phase of the shutdown logic (“<code>final</code>”)</p> </li> </ul> <p>By measuring these additional words into PCR 11 the distinct phases of the boot process can be distinguished in a relatively straight-forward fashion and the expected PCR values in each phase can be determined.</p> <p>The phases are measured into PCR 11 (as opposed to some other PCR) mostly because available PCRs are scarce, and the boot phases defined are typically specific to a chosen OS, and hence fit well with the other data measured into PCR 11: the UKI, which is also specific to the OS. 
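</p> <p>The measurement chain described in this section can be modelled in a few lines. The following is a minimal sketch, not the actual systemd implementation: it assumes the SHA256 bank, treats each UKI component as an opaque stand-in byte string (real tools measure more than the raw payload), and uses the hypothetical helper names <code>extend</code> and <code>expected_values</code>. It shows how one expected PCR 11 value per boot phase falls out of the same hash chain:</p>

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes in the SHA256 bank


def extend(pcr, data):
    """Model a PCR extension: new = H(old || H(data))."""
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + digest).digest()


def expected_values(components, phases):
    """Return the expected PCR value (hex) after each boot phase word."""
    pcr = bytes(PCR_SIZE)  # assumption from the text: all zeroes before the UKI runs
    for blob in components:  # UKI components first (stand-in byte strings here)
        pcr = extend(pcr, blob)
    values = {}
    for word in phases:  # then one fixed word per boot phase, in boot order
        pcr = extend(pcr, word.encode())
        values[word] = pcr.hex()
    return values


# Placeholder inputs: real components would be the PE section contents.
components = [b"kernel-stand-in", b"initrd-stand-in", b"cmdline-stand-in"]
phases = ["sysinit", "shutdown", "final"]  # three of the words listed above

for word, value in expected_values(components, phases).items():
    print(word, value)
```

<p>Because every value depends on everything measured before it, a policy that only accepts the value expected in an early phase stops matching as soon as a later word has been measured.</p> <p>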
The OS vendor both generates the UKI and defines the boot phases, and thus can safely and reliably pre-calculate/sign the expected PCR values for each phase of the boot.</p> <h2>Revocation/Rollback Protection</h2> <p>In order to secure secrets stored at rest, in particular in environments where unattended decryption shall be possible, it is essential that an attacker cannot use old, known-buggy – but properly signed – versions of software to access them.</p> <p>Specifically, if disk encryption is bound to an OS vendor (via UKIs that include expected PCR values, signed with the vendor’s key) there must be a mechanism to lock out old versions of the OS or UKI from accessing TPM based secrets once it is determined that the old version is vulnerable.</p> <p>To implement this we propose making use of one of the “counters” TPM 2.0 devices provide: integer registers that are persistent in the TPM and can only be increased on request of the OS, but never be decreased. When sealing resources to the TPM, a policy may be declared to the TPM that restricts how the resources can later be unlocked: here we use one that requires that along with the expected PCR values (as discussed above) a counter integer range is provided to the TPM chip, along with a suitable signature covering both, matching the public key provided during sealing. The sealing/unsealing mechanism described above is thus extended: the signature passed to the TPM during unsealing now covers both the expected PCR values and the expected counter range. To be able to use a signature associated with a UKI provided by the vendor to unseal a resource, the counter must thus be increased to at least the lower end of the range the signature is for. By doing so the ability is lost to unseal the resource for signatures associated with older versions of the UKI, because their upper end of the range disables access once the counter has been increased far enough. 
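</p> <p>The interplay between the monotonic counter and the signed range can be illustrated with a small model. This is a sketch under stated assumptions, not TPM code: the class and function names are hypothetical, the bounds are treated as inclusive, the numbers are made up, and signature verification and PCR checks are left out entirely.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedRange:
    """Counter range from a UKI's signed payload (bounds assumed inclusive)."""
    lower: int
    upper: int


class MonotonicCounter:
    """Stand-in for a TPM counter: it can be raised, never lowered."""

    def __init__(self, value=0):
        self.value = value

    def raise_to(self, target):
        self.value = max(self.value, target)


def try_unseal(counter, signed):
    """Raise the counter to the lower bound if needed, then check the range."""
    counter.raise_to(signed.lower)
    return counter.value in range(signed.lower, signed.upper + 1)


old_uki = SignedRange(lower=10, upper=12)  # made-up numbers for illustration
new_uki = SignedRange(lower=13, upper=14)  # lower bound set past old_uki's upper bound

counter = MonotonicCounter()
print(try_unseal(counter, old_uki))  # counter becomes 10, inside 10..12
print(try_unseal(counter, new_uki))  # counter becomes 13, inside 13..14
print(try_unseal(counter, old_uki))  # counter stays 13, outside 10..12
```

<p>In this model the three calls print <code>True</code>, <code>True</code> and <code>False</code>: once the newer UKI has pushed the counter past the older UKI’s upper bound, the older signature can no longer be used.</p> <p>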
By carefully choosing the upper and lower end of the counter range whenever the PCR values for a UKI shall be signed it is thus possible to ensure that updates can invalidate prior versions’ access to resources. By placing some space between the upper and lower end of the range it is possible to allow a controlled level of fallback UKI support, with clearly defined milestones where fallback to older versions of a UKI is not permitted anymore.</p> <p>Example: a hypothetical distribution FooOS releases a regular stream of UKI kernels 5.1, 5.2, 5.3, … It signs the expected PCR values for these kernels with a key pair it maintains in an HSM. When signing UKI 5.1 it includes information directed at the TPM in the signed data declaring that the TPM counter must be at least 100 and at most 120 (i.e. in the range 100…120) in order for the signature to be used. Thus, when the UKI is booted up and used for unlocking an encrypted volume the unlocking code must first increase the counter to 100 if needed, as the TPM will otherwise refuse to unlock the volume. The next release, UKI 5.2, is a feature release, i.e. reverting back to the old kernel locally is acceptable. It thus does not increase the lower bound, but it increases the upper bound for the counter in the signature payload, thus encoding a valid range 100…121 in the signed payload. Now a major security vulnerability is discovered in UKI 5.1. A new UKI 5.3 is prepared that fixes this issue. It is now essential that UKI 5.1 can no longer be used to unlock the TPM secrets. Thus UKI 5.3 will bump the lower bound to 121, and increase the upper bound by one, thus allowing a range 121…122. Or in other words: for each new UKI release the signed data shall include a counter range declaration where the upper bound is increased by one. 
The lower bound is left as-is between releases, except when an old version shall be cut off, in which case it is bumped to one above the upper bound used by the release being cut off.</p> <h2>UKI Generation</h2> <p>As mentioned earlier, UKIs are the combination of various resources into one PE file. Pre-existing tools can generate most of these individual components. For example the included kernel image can be generated with the usual Linux kernel build system. The initrd included in the UKI can be generated with existing tools such as <code>dracut</code> and similar. Once the basic components (<code>.linux</code>, <code>.initrd</code>, <code>.cmdline</code>, <code>.splash</code>, <code>.dtb</code>, <code>.osrel</code>, <code>.uname</code>) have been acquired the combination process works roughly like this:</p> <ol> <li> <p>The expected PCR 11 hashes (and signatures for them) for the UKI are calculated. The tool for that takes all basic UKI components and a signing key as input, and generates a JSON object as output that includes both the literal expected PCR hash values and a signature for them, for all selected TPM2 banks.</p> </li> <li> <p>The EFI stub binary is now combined with the basic components, the generated JSON PCR signature object from the first step (in the <code>.pcrsig</code> section) and the public key for it (in the <code>.pcrpkey</code> section). This is done via a simple “<code>objcopy</code>” invocation resulting in a single UKI PE binary.</p> </li> <li> <p>The resulting EFI PE binary is then signed for SecureBoot (via a tool such as <a href="https://git.kernel.org/pub/scm/linux/kernel/git/jejb/sbsigntools.git/"><code>sbsign</code></a> or similar).</p> </li> </ol> <p>Note that the UKI model implies pre-built initrds. 
How to generate these (and securely extend and parameterize them) is outside of the scope of this document, but a related document will be provided highlighting these concepts.</p> <h2>Protection Coverage of SecureBoot Signing and PCRs</h2> <p>The scheme discussed here touches both SecureBoot code signing and TPM PCR measurements. These two distinct mechanisms cover separate parts of the boot process.</p> <p>Specifically:</p> <ul> <li> <p>Firmware/Shim SecureBoot signing covers bootloader and UKI</p> </li> <li> <p>TPM PCR 11 covers the UKI components and boot phase</p> </li> <li> <p>TPM PCR 12 covers admin configuration</p> </li> <li> <p>TPM PCR 15 covers the local identity of the host</p> </li> </ul> <p>Note that this means SecureBoot coverage ends once the system transitions from the initrd into the root file system. It is assumed that trust and integrity have been established before this transition by some means, for example LUKS/dm-crypt/dm-integrity, ideally bound to PCR 11 (i.e. UKI and boot phase).</p> <p>A robust and secure update scheme for PCR 11 (i.e. UKI) has been described above, which allows binding TPM-locked resources to a UKI. For PCR 12 no such scheme is currently designed, but might be added later (use case: permit access to certain secrets only if the system runs with configuration signed by a specific set of keys). Given that resources measured into PCR 15 typically aren’t updated (or if they are updated loss of access to other resources linked to them is desired) no update scheme should be necessary for it.</p> <p>This document focuses on the three PCRs discussed above. Disk encryption and other userspace may choose to also bind to other PCRs. However, doing so means the PCR brittleness issue returns that this design is supposed to remove. 
PCRs defined by the various firmware UEFI/TPM specifications generally do not know any concept for signatures of expected PCR values.</p> <p>It is known that the industry-adopted SecureBoot signing keys are too broad to act as more than a denylist for known bad code. It is thus probably a good idea to enroll vendor SecureBoot keys wherever possible (e.g. in environments where the hardware is very well known, and VM environments), to raise the bar on preparing rogue UKI-like PE binaries that will result in PCR values that match expectations but actually contain bad code. Discussion about that is however outside of the scope of this document.</p> <h2>Whole OS embedded in the UKI</h2> <p>The above is written under the assumption that the UKI embeds an initrd whose job it is to set up the root file system: find it, validate it, cryptographically unlock it and similar. Once the root file system is found, the system transitions into it.</p> <p>While this is the traditional design and likely what most systems will use, it is also possible to embed a regular root file system into the UKI and avoid any transition to an on-disk root file system. In this mode the whole OS would be encapsulated in the UKI, and signed/measured as one. In such a scenario the whole of the OS must be loaded into RAM and remain there, which typically restricts the general usability of such an approach. However, for specific purposes this might be the design of choice, for example to implement self-sufficient recovery or provisioning systems.</p> <h1>Proposed Implementations &amp; Current Status</h1> <p>The toolset for most of the above is already implemented in systemd and related projects in one way or another. 
Specifically:</p> <ol> <li> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"><code>systemd-stub</code></a> (or short: <code>sd-stub</code>) component implements the discussed UEFI stub program</p> </li> <li> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd-measure.html"><code>systemd-measure</code></a> tool can be used to pre-calculate expected PCR 11 values given the UKI components and can sign the result, as discussed in the UKI Image Generation section above.</p> </li> <li> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptenroll.html"><code>systemd-cryptenroll</code></a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptsetup@.service.html"><code>systemd-cryptsetup</code></a> tools can be used to bind a LUKS2 encrypted file system volume to a TPM and PCR 11 public key/signatures, according to the scheme described above. (The two components also implement a “<em>recovery key</em>” concept, as discussed above)</p> </li> <li> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd-pcrphase.service.html"><code>systemd-pcrphase</code></a> component measures specific words into PCR 11 at the discussed phases of the boot process.</p> </li> <li> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd-creds.html"><code>systemd-creds</code></a> tool may be used to encrypt/decrypt data objects called “credentials” that can be passed into services and booted systems, and are automatically decrypted (if needed) immediately before service invocation. Encryption is typically bound to the local TPM, to ensure the data cannot be recovered elsewhere.</p> </li> </ol> <p>Note that <a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"><code>systemd-stub</code></a> (i.e. 
the UEFI code glued into the UKI) is distinct from <a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"><code>systemd-boot</code></a> (i.e. the UEFI boot loader that can manage multiple UKIs and other boot menu items and implements automatic fallback, an interactive menu and a programmatic interface for the OS among other things). One can be used without the other – both <code>sd-stub</code> without <code>sd-boot</code> and vice versa – though they integrate nicely if used in combination.</p> <p>Note that the mechanisms described are relatively generic, and can be implemented and consumed in other software too. systemd should be considered a reference implementation, though one that has found comprehensive adoption across Linux distributions.</p> <p>Some concepts discussed above are currently not implemented or not yet merged. Specifically:</p> <ol> <li> <p>The rollback protection logic is currently not implemented.</p> </li> <li> <p>The mentioned measurement of the root file system volume key to PCR 15 is implemented, but not merged into the systemd main branch yet.</p> </li> </ol> <h1>The UAPI Group</h1> <p>We recently started a new group for discussing concepts and specifications of basic OS components, including UKIs as described above. It’s called <a href="https://uapi-group.org/">the UAPI Group</a>. Please have a look at the various documents and specifications already available there, and expect more to come. Contributions welcome!</p> <h2>Glossary</h2> <blockquote> <p><strong>TPM</strong></p> </blockquote> <p><em>Trusted Platform Module</em>; a security chip found in many modern systems, both physical systems and increasingly also in virtualized environments. Traditionally a discrete chip on the mainboard but today often implemented in firmware, and lately directly in the CPU SoC.</p> <blockquote> <p><strong>PCR</strong></p> </blockquote> <p><em>Platform Configuration Register</em>; a set of registers on a TPM that are initialized to zero at boot. 
The firmware and OS can “<em>extend</em>” these registers with hashes of data used during the boot process and afterwards. “Extension” means the supplied data is first cryptographically hashed. The resulting hash value is then combined with the previous value of the PCR and the combination hashed again. The result will become the new value of the PCR. By doing this iteratively for all parts of the boot process (always with the data that will be used next during the boot process) a concept of “<em>Measured Boot</em>” can be implemented: as long as every element in the boot chain measures (i.e. extends into the PCR) the next part of the boot like this, the resulting PCR values will prove cryptographically that only a certain set of boot components can have been used to boot up. A standards compliant TPM usually has 24 PCRs, but more than half of those are already assigned specific meanings by the firmware. Some of the others may be used by the OS, of which we use four in the concepts discussed in this document.</p> <blockquote> <p><strong>Measurement</strong></p> </blockquote> <p>The act of “<em>extending</em>” a PCR with some data object.</p> <blockquote> <p><strong>SRK</strong></p> </blockquote> <p><em>Storage Root Key</em>; a special cryptographic key generated by a TPM that never leaves the TPM, and can be used to encrypt/decrypt data passed to the TPM.</p> <blockquote> <p><strong>UKI</strong></p> </blockquote> <p><em>Unified Kernel Image</em>; the concept this document is about. A combination of kernel, <code>initrd</code> and other resources. 
See above.</p> <blockquote> <p><strong>SecureBoot</strong></p> </blockquote> <p>A mechanism where every software component involved in the boot process is cryptographically signed and checked against a set of public keys stored in the mainboard hardware, implemented in firmware, before it is used.</p> <blockquote> <p><strong>Measured Boot</strong></p> </blockquote> <p>A boot process where each component measures (i.e., hashes and extends into a TPM PCR, see above) the next component it will pass control to before doing so. This serves two purposes: it can be used to bind security policy for encrypted secrets to the resulting PCR values (or signatures thereof, see above), and it can be used to reason about used software after the fact, for example for the purpose of remote attestation.</p> <blockquote> <p><strong>initrd</strong></p> </blockquote> <p>Short for “<em>initial RAM disk</em>”, which – strictly speaking – is a misnomer today, because no RAM disk is involved anymore, but rather a <code>tmpfs</code> file system instance. Also known as “<code>initramfs</code>”, which is also misleading, given the file system is not <code>ramfs</code> anymore, but <code>tmpfs</code> (both of which are in-memory file systems on Linux, with different semantics). The <code>initrd</code> is passed to the Linux kernel and is basically a file system tree in a <code>cpio</code> archive. The kernel unpacks the image into a <code>tmpfs</code> (i.e., into an in-memory file system), and then executes a binary from it. It thus contains the binaries for the first userspace code the kernel invokes. 
Typically, the <code>initrd</code>’s job is to find the actual root file system, unlock it (if encrypted), and transition into it.</p> <blockquote> <p><strong>UEFI</strong></p> </blockquote> <p>Short for “<em>Unified Extensible Firmware Interface</em>”, it is a widely adopted standard for PC firmware, with native support for <em>SecureBoot</em> and Measured Boot.</p> <blockquote> <p><strong>EFI</strong></p> </blockquote> <p>More or less synonymous with UEFI, IRL.</p> <blockquote> <p><strong>Shim</strong></p> </blockquote> <p>A boot component originating in the Linux world, which in a way extends the public key database SecureBoot maintains (which is under the control of Microsoft) with a second layer (which is under the control of the Linux distributions and of the owner of the physical device).</p> <blockquote> <p><strong>PE</strong></p> </blockquote> <p><em>Portable Executable</em>; a file format for executable binaries, originally from the Windows world, but also used by UEFI firmware. PE files may contain code and data, categorized in labeled “sections”.</p> <blockquote> <p><strong>ESP</strong></p> </blockquote> <p><em>EFI System Partition</em>; a special partition on a storage medium in which the firmware looks for UEFI PE binaries to execute at boot.</p> <blockquote> <p><strong>HSM</strong></p> </blockquote> <p><em>Hardware Security Module</em>; a piece of hardware that can generate and store secret cryptographic keys, and execute operations with them, without the keys leaving the hardware (though this is configurable). TPMs can act as HSMs.</p> <blockquote> <p><strong>DEK</strong></p> </blockquote> <p><em>Disk Encryption Key</em>; a symmetric cryptographic key used for unlocking disk encryption, i.e. passed to LUKS/dm-crypt for activating an encrypted storage volume.</p> <blockquote> <p><strong>LUKS2</strong></p> </blockquote> <p><em>Linux Unified Key Setup Version 2</em>; a specification for a superblock for encrypted volumes widely used on Linux. 
LUKS2 is the default on-disk format for the <code>cryptsetup</code> suite of tools. It provides flexible key management with multiple independent key slots and allows embedding arbitrary metadata in a JSON format in the superblock.</p> <h2>Thanks</h2> <p><em>I’d like to thank Alain Gefflaut, Anna Trikalinou, Christian Brauner, Daan de Meyer, Luca Boccassi, Zbigniew Jędrzejewski-Szmek for reviewing this text.</em></p>Lennart PoetteringMon, 24 Oct 2022 00:00:00 +0200tag:0pointer.net,2022-10-24:/blog/brave-new-trusted-boot-world.htmlprojectsFitting Everything Togetherhttps://0pointer.net/blog/fitting-everything-together.html<p><em>TLDR: Hermetic <code>/usr/</code> is awesome; let's popularize image-based OSes with modernized security properties built around immutability, SecureBoot, TPM2, adaptability, auto-updating, factory reset, uniformity – built from traditional distribution packages, but deployed via images.</em></p> <p>Over the past years, systemd gained a number of components for building Linux-based operating systems. While these components individually have been adopted by many distributions and products for specific purposes, we did not publicly communicate a broader vision of how they should all fit together in the long run. In this blog story I hope to provide that from my personal perspective, i.e. explain how I <em>personally</em> would build an OS and where I <em>personally</em> think OS development with Linux should go.</p> <p>I figure this is going to be a longer blog story, but I hope it will be equally enlightening. Please understand though that everything I write about OS design here is my personal opinion, and not one of my employer.</p> <p>For the last 12 years or so I have been working on Linux OS development, mostly around <code>systemd</code>. In all those years I had a lot of time thinking about the Linux platform, and specifically traditional Linux distributions and their strengths and weaknesses. 
I have seen many attempts to reinvent Linux distributions in one way or another, to varying success. After all this most would probably agree that the traditional RPM or dpkg/apt-based distributions still define the Linux platform more than others (for 25+ years now), even though some Linux-based OSes (Android, ChromeOS) probably outnumber the installations overall.</p> <p>And over all those 12 years I kept wondering, how would <em>I</em> actually build an OS for a system or for an appliance, and what are the components necessary to achieve that. And most importantly, how can we make these components generic enough so that they are useful in generic/traditional distributions too, and in other use cases than my own.</p> <h1>The Project</h1> <p>Before figuring out how I would build an OS it's probably good to figure out what type of OS I actually want to build, what purpose I intend to cover. I think a desktop OS is probably the most interesting. Why is that? Well, first of all, I use one of these for my job every single day, so I care immediately, it's my primary tool of work. But more importantly: I think building a desktop OS is one of the most complex overall OS projects you can work on, simply because desktops are so much more versatile and variable than servers or embedded devices. If one figures out the desktop case, I think there's a lot more to learn from, and reuse in the server or embedded case, than going the other way. After all, there's a reason why so much of the widely accepted Linux userspace stack comes from people with a desktop background (including systemd, BTW).</p> <p>So, let's see how <em>I</em> would build a desktop OS. If you press me hard, and ask me why I would do that given that ChromeOS already exists and more or less is a Linux desktop OS: there's plenty I am missing in ChromeOS, but most importantly, I am a lot more interested in building something people can easily and naturally rebuild and hack on, i.e. 
Google-style over-the-wall open source with its skewed power dynamic is not particularly attractive to me. I much prefer building this within the framework of a proper open source community, out in the open, and basing all this strongly on the status quo ante, i.e. the existing distributions. I think it is crucial to provide a clear avenue to build a modern OS based on the existing distribution model, if there shall ever be a chance to make this interesting for a larger audience.</p> <p>(Let me underline though: even though I am going to focus on a desktop here, most of this is directly relevant for servers as well, in particular container host OSes and suchlike, or embedded devices, e.g. car IVI systems and so on.)</p> <h1>Design Goals</h1> <ol> <li> <p>First and foremost, I think the focus must be on an image-based design rather than a package-based one. For robustness and security it is essential to operate with reproducible, immutable images that describe the OS or large parts of it in full, rather than operating always with fine-grained RPM/dpkg style packages. That's not to say that packages are not relevant (I actually think they matter a lot!), but I think they should be less of a tool for deploying code but more one of building the objects to deploy. A different way to see this: any OS built like this must be easy to replicate in a large number of instances, with minimal variability. Regardless if we talk about desktops, servers or embedded devices: focus for my OS should be on "cattle", not "pets", i.e. that from the start it's trivial to reuse the well-tested, cryptographically signed combination of software over a large set of devices the same way, with a maximum of bit-exact reuse and a minimum of local variances.</p> </li> <li> <p>The trust chain matters, from the boot loader all the way to the apps. This means all code that is run must be cryptographically validated before it is run. 
All storage must be cryptographically protected: public data must be integrity checked; private data must remain confidential.</p> <p>This is in fact where big distributions currently fail pretty badly. I would go as far as saying that SecureBoot on Linux distributions is mostly security theater at this point, if you so will. That's because the initrd that unlocks your FDE (i.e. the cryptographic concept that protects the rest of your system) is not signed or protected in any way. It's trivial to modify for an attacker with access to your hard disk in an undetectable way, and collect your FDE passphrase. The involved bureaucracy around the implementation of UEFI SecureBoot of the big distributions is to a large degree pointless if you ask me, given that once the kernel is assumed to be in a good state, as the next step the system invokes completely unsafe code with full privileges.</p> <p>This is a fault of current Linux distributions though, not of SecureBoot in general. Other OSes use this functionality in more useful ways, and we should correct that too.</p> </li> <li> <p>Pretty much the same thing: offline security matters. I want my data to be reasonably safe at rest, i.e. cryptographically inaccessible even when I leave my laptop in my hotel room, suspended.</p> </li> <li> <p>Everything should be cryptographically measured, so that remote attestation is supported for as much software shipped on the OS as possible.</p> </li> <li> <p>Everything should be self-descriptive, have single sources of truth that are closely attached to the object itself, instead of stored externally.</p> </li> <li> <p>Everything should be self-updating. Today we know that software is never bug-free, and thus requires a continuous update cycle. Not only the OS itself, but also any extensions, services and apps running on it.</p> </li> <li> <p>Everything should be robust with respect to aborted OS operations, power loss and so on. 
It should be robust towards hosed OS updates (regardless if the download process failed, or the image was buggy), and not require user interaction to recover from them.</p> </li> <li> <p>There must always be a way to put the system back into a well-defined, guaranteed safe state ("factory reset"). This includes that all sensitive data from earlier uses becomes cryptographically inaccessible.</p> </li> <li> <p>The OS should enforce clear separation between vendor resources, system resources and user resources: conceptually and when it comes to cryptographic protection.</p> </li> <li> <p>Things should be adaptive: the system should come up and make the best of the system it runs on, adapt to the storage and hardware. Moreover, the system should support execution on bare metal equally well as execution in a VM environment and in a container environment (i.e. <code>systemd-nspawn</code>).</p> </li> <li> <p>Things should not require explicit installation, i.e. every image should be a live image. For installation it should be sufficient to <code>dd</code> an OS image onto disk. Thus, strong focus on "instantiate on first boot", rather than "instantiate before first boot".</p> </li> <li> <p>Things should be reasonably minimal. The image the system starts its life with should be quick to download, and not include resources that can as well be created locally later.</p> </li> <li> <p>System identity, local cryptographic keys and so on should be generated locally, not be pre-provisioned, so that no leak of sensitive data is possible during the transport onto the system.</p> </li> <li> <p>Things should be reasonably democratic and hackable. It should be easy to fork an OS, to modify an OS and still get reasonable cryptographic protection. Modifying your OS should not necessarily imply that your "warranty is voided" and you lose all good properties of the OS, if you so will.</p> </li> <li> <p>Things should be reasonably modular. 
The privileged part of the core OS must be extensible, including on the individual system. It's not sufficient to support extensibility just through high-level UI applications.</p> </li> <li> <p>Things should be reasonably uniform, i.e. ideally the same formats and cryptographic properties are used for all components of the system, regardless if for the host OS itself or the payloads it receives and runs.</p> </li> <li> <p>Even taking all these goals into consideration, it should still be close to traditional Linux distributions, and take advantage of what they are really good at: integration and security update cycles.</p> </li> </ol> <p>Now that we know our goals and requirements, let's start designing the OS along these lines.</p> <h1>Hermetic <code>/usr/</code></h1> <p>First of all the OS resources (code, data files, …) should be <em>hermetic</em> in an immutable <code>/usr/</code>. This means that a <code>/usr/</code> tree should carry everything needed to set up the minimal set of directories and files outside of <code>/usr/</code> to make the system work. This <code>/usr/</code> tree can then be mounted read-only into the writable root file system that then will eventually carry the local configuration, state and user data in <code>/etc/</code>, <code>/var/</code> and <code>/home/</code> as usual.</p> <p>Thankfully, modern distributions are surprisingly close to working without issues in such a hermetic context. Specifically, Fedora works mostly just fine: it has adopted the <code>/usr/</code> merge and the declarative <a href="https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html"><code>systemd-sysusers</code></a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles-setup.service.html"><code>systemd-tmpfiles</code></a> components quite comprehensively, which means the directory trees outside of <code>/usr/</code> are automatically generated as needed if missing. 
In particular <code>/etc/passwd</code> and <code>/etc/group</code> (and related files) are appropriately populated, should they be missing entries.</p> <p>In my model a hermetic OS is hence comprehensively defined within <code>/usr/</code>: combine the <code>/usr/</code> tree with an empty, otherwise unpopulated root file system, and it will boot up successfully, automatically adding the strictly necessary files, and resources that are necessary to boot up.</p> <p>Monopolizing vendor OS resources and definitions in an immutable <code>/usr/</code> opens multiple doors to us:</p> <ul> <li> <p>We can apply <code>dm-verity</code> to the whole <code>/usr/</code> tree, i.e. guarantee structural, cryptographic integrity on the whole vendor OS resources at once, with full file system metadata.</p> </li> <li> <p>We can implement updates to the OS easily: by implementing an A/B update scheme on the <code>/usr/</code> tree we can update the OS resources atomically and robustly, while leaving the rest of the OS environment untouched.</p> </li> <li> <p>We can implement factory reset easily: erase the root file system and reboot. The hermetic OS in <code>/usr/</code> has all the information it needs to set up the root file system afresh — exactly like in a new installation.</p> </li> </ul> <h1>Initial Look at the Partition Table</h1> <p>So let's have a look at a suitable partition table, taking a hermetic <code>/usr/</code> into account. 
Let's conceptually start with a table of four entries:</p> <ol> <li> <p>A UEFI System Partition (required by firmware to boot)</p> </li> <li> <p>Immutable, Verity-protected, signed file system with the <code>/usr/</code> tree in version A</p> </li> <li> <p>Immutable, Verity-protected, signed file system with the <code>/usr/</code> tree in version B</p> </li> <li> <p>A writable, encrypted root file system</p> </li> </ol> <p>(This is just for initial illustration here, as we'll see later it's going to be a bit more complex in the end.)</p> <p>The <a href="https://systemd.io/DISCOVERABLE_PARTITIONS">Discoverable Partitions Specification</a> provides suitable partition type UUIDs for all of the above partitions. Which is great, because it makes the image self-descriptive: simply by looking at the image's GPT table we know what to mount where. This means we do not need a manual <code>/etc/fstab</code>, and a multitude of tools such as <code>systemd-nspawn</code> and similar can operate directly on the disk image and boot it up.</p> <h1>Booting</h1> <p>Now that we have a rough idea how to organize the partition table, let's look a bit at how to boot into that. In my model "unified kernels" are the way to go, specifically those implementing <a href="https://systemd.io/BOOT_LOADER_SPECIFICATION">Boot Loader Specification Type #2</a>. These are basically kernel images that have an initial RAM disk attached to them, as well as a kernel command line, a boot splash image and possibly more, all wrapped into a single UEFI PE binary. By combining these into one we achieve two goals: they become extremely easy to update (i.e. drop in one file, and you update kernel+initrd) and more importantly, you can sign them as one for the purpose of UEFI SecureBoot.</p> <p>In my model, each version of such a kernel would be associated with exactly one version of the <code>/usr/</code> tree: both are always updated at the same time. 
An update then becomes relatively simple: drop in one new <code>/usr/</code> file system plus one kernel, and the update is complete.</p> <p>The boot loader used for all this would be <a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html">systemd-boot</a>, of course. It's a very simple loader, and implements the aforementioned boot loader specification. This means it requires no explicit configuration or anything: it's entirely sufficient to drop in one such unified kernel file, and it will be picked up, and be made a candidate to boot into.</p> <p>You might wonder how to configure the root file system to boot from with such a unified kernel that contains the kernel command line and is signed as a whole and thus immutable. The idea here is to use the <code>usrhash=</code> kernel command line option implemented by <a href="https://www.freedesktop.org/software/systemd/man/systemd-veritysetup-generator.html">systemd-veritysetup-generator</a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd-fstab-generator.html">systemd-fstab-generator</a>. It does two things: it will search and set up a <code>dm-verity</code> volume for the <code>/usr/</code> file system, and then mount it. It takes the root hash value of the <code>dm-verity</code> Merkle tree as the parameter. This hash is then also used to find the <code>/usr/</code> partition in the GPT partition table, under the assumption that the partition UUIDs are derived from it, as per the suggestions in the discoverable partitions specification (see above).</p> <p><code>systemd-boot</code> (if not told otherwise) will do a version sort of the kernel image files it finds, and then automatically boot the newest one. 
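</p> <p>The derivation of partition UUIDs from the Verity root hash mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming the convention suggested by the Discoverable Partitions Specification (first 128 bits of the root hash for the data partition, last 128 bits for the Verity hash partition). The function name and the sample hash are made up for this example, and the byte order used when formatting the UUIDs is an assumption:</p>

```python
import hashlib
import uuid
from typing import Tuple


def partition_uuids_from_root_hash(root_hash_hex: str) -> Tuple[uuid.UUID, uuid.UUID]:
    """Derive (data partition UUID, Verity partition UUID) from a root hash.

    Hypothetical helper: takes the first and the last 128 bits of the hash,
    interpreting the bytes in straight order (an assumption of this sketch).
    """
    raw = bytes.fromhex(root_hash_hex)
    if len(raw) < 32:
        raise ValueError("expected a root hash of at least 256 bits")
    data_uuid = uuid.UUID(bytes=raw[:16])     # first 128 bits
    verity_uuid = uuid.UUID(bytes=raw[-16:])  # last 128 bits
    return data_uuid, verity_uuid


# Placeholder value only: a real root hash comes out of the dm-verity
# Merkle tree computation, not out of hashing an arbitrary string.
sample_root_hash = hashlib.sha256(b"placeholder, not a real Merkle tree").hexdigest()

usr_uuid, usr_verity_uuid = partition_uuids_from_root_hash(sample_root_hash)
print("usrhash=" + sample_root_hash)
print("/usr/ partition UUID:       ", usr_uuid)
print("/usr/ Verity partition UUID:", usr_verity_uuid)
```

<p>With such a scheme a single hash on the kernel command line is enough to identify both partitions, which is why no further configuration is needed.</p> <p>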
Picking a specific kernel to boot will also fixate which version of the <code>/usr/</code> tree to boot into, because — as mentioned — the Verity root hash of it is built into the kernel command line the unified kernel image contains.</p> <p>In my model I'd place the kernels directly into the UEFI System Partition (ESP), in order to simplify things. (<code>systemd-boot</code> also supports reading them from a separate boot partition, but let's not complicate things needlessly, at least for now.)</p> <p>So, with all this, we now already have a boot chain that goes something like this: once the boot loader is run, it will pick the newest kernel, which includes the initial RAM disk and a secure reference to the <code>/usr/</code> file system to use. This is already great. But a <code>/usr/</code> alone won't make us happy, we also need a root file system. In my model, that file system would be writable, and the <code>/etc/</code> and <code>/var/</code> hierarchies would be located directly on it. Since these trees potentially contain secrets (SSH keys, …) the root file system needs to be encrypted. We'll use LUKS2 for this, of course. In my model, I'd bind this to the TPM2 chip (for compatibility with systems lacking one, we can find a suitable fallback, which then provides weaker guarantees, see below). A TPM2 is a security chip available in most modern PCs. Among other things it contains a persistent secret key that can be used to encrypt data, in a way that only if you possess access to it and can prove you are using validated software you can decrypt it again. The cryptographic measuring I mentioned earlier is what allows this to work. But … let's not get lost too much in the details of TPM2 devices, that'd be material for a novel, and this blog story is going to be way too long already.</p> <p>What does using a TPM2 bound key for unlocking the root file system get us? 
We can encrypt the root file system with it, and you can only read or make changes to the root file system if you also possess the TPM2 chip and run our validated version of the OS. This protects us against an <em>evil</em> <em>maid</em> scenario to some level: an attacker cannot just copy the hard disk of your laptop while you leave it in your hotel room, because unless the attacker also steals the TPM2 device it cannot be decrypted. The attacker can also not just modify the root file system, because such changes would be detected on next boot because they aren't done with the right cryptographic key.</p> <p>So, now we have a system that already can boot up somewhat completely, and run userspace services. All code that is run is verified in some way: the <code>/usr/</code> file system is Verity protected, and the root hash of it is included in the kernel that is signed via UEFI SecureBoot. And the root file system is locked to the TPM2 where the secret key is only accessible if our signed OS + <code>/usr/</code> tree is used.</p> <p>(One brief intermission here: so far all the components I am referencing here exist already, and have been shipped in <code>systemd</code> and other projects already, including the TPM2 based disk encryption. There's one thing missing here however at the moment that still needs to be developed (happy to take PRs!): right now TPM2 based LUKS2 unlocking is bound to PCR hash values. This is hard to work with when implementing updates — what we'd need instead is unlocking by signatures of PCR hashes. TPM2 supports this, but we don't support it yet in our <code>systemd-cryptsetup</code> + <code>systemd-cryptenroll</code> stack.)</p> <p>One of the goals mentioned above is that cryptographic key material should always be generated locally on first boot, rather than pre-provisioned. 
This of course has implications for the encryption key of the root file system: if we want to boot into this system we need the root file system to exist, and thus a key already generated that it is encrypted with. But where precisely would we generate it if we have no installer which could generate it while installing (as is done in traditional Linux distribution installers)? My proposed solution here is to use <a href="https://www.freedesktop.org/software/systemd/man/systemd-repart.html"><code>systemd-repart</code></a>, which is a declarative, purely additive repartitioner. It can run from the initrd to create and format partitions on boot, before transitioning into the root file system. It can also format the partitions it creates and encrypt them, automatically enrolling a TPM2-bound key.</p> <p>So, let's revisit the partition table we mentioned earlier. Here's what in my model we'd actually ship in the initial image:</p> <ol> <li> <p>A UEFI System Partition (ESP)</p> </li> <li> <p>An immutable, Verity-protected, signed file system with the <code>/usr/</code> tree in version A</p> </li> </ol> <p>And that's already it. No root file system, no B <code>/usr/</code> partition, nothing else. Only two partitions are shipped: the ESP with the <code>systemd-boot</code> loader and one unified kernel image, and the A version of the <code>/usr/</code> partition. Then, on first boot <code>systemd-repart</code> will notice that the root file system doesn't exist yet, and will create it, encrypt it, and format it, and enroll the key into the TPM2. It will also create the second <code>/usr/</code> partition (B) that we'll need for later A/B updates (which will be created empty for now, until the first update operation actually takes place, see below). Once done the initrd will combine the fresh root file system with the shipped <code>/usr/</code> tree, and transition into it. 
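</p> <p>The "declarative, purely additive" behaviour described above can be modelled roughly as follows. This is a toy model in Python, not how <code>systemd-repart</code> is implemented or configured: the <code>Definition</code> class, the <code>plan()</code> function and the partition type labels are hypothetical names chosen for illustration only.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Definition:
    """One declared partition (hypothetical model, not a real config format)."""
    type: str              # e.g. "esp", "usr", "root"
    encrypt: bool = False  # whether a newly created partition gets encrypted


def plan(existing: List[str], definitions: List[Definition]) -> List[Definition]:
    """Return the definitions for which a partition still has to be created.

    Purely additive: partitions that already exist are matched against the
    definitions by type and are never modified or removed.
    """
    unmatched = list(existing)
    to_create = []
    for definition in definitions:
        if definition.type in unmatched:
            unmatched.remove(definition.type)  # an existing partition satisfies it
        else:
            to_create.append(definition)       # missing, so create it
    return to_create


definitions = [
    Definition("esp"),
    Definition("usr"),                 # version A, shipped in the image
    Definition("usr"),                 # version B, created empty on first boot
    Definition("root", encrypt=True),  # created and encrypted on first boot
]

# First boot: the image ships only the ESP and the A version of /usr/.
for definition in plan(["esp", "usr"], definitions):
    print("create:", definition)

# Later boots: everything exists already, so nothing is left to do.
print("later boot:", plan(["esp", "usr", "usr", "root"], definitions))
```

<p>The point of the additive approach is that running the same declarations on every boot is harmless: once all declared partitions exist, the plan is empty.</p> <p>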
Because the OS is hermetic in <code>/usr/</code> and contains all the <code>systemd-tmpfiles</code> and <code>systemd-sysusers</code> information it can then set up the root file system properly and create any directories and symlinks (and maybe a few files) necessary to operate.</p> <p>Besides the fact that the root file system's encryption keys are generated on the system we boot from and never leave it, it is also pretty nice that the root file system will be sized dynamically, taking into account the physical size of the backing storage. This is perfect, because on first boot the image will automatically adapt to what it has been <code>dd</code>'ed onto.</p> <h1>Factory Reset</h1> <p>This is a good point to talk about the factory reset logic, i.e. the mechanism to place the system back into a known good state. This is important for two reasons: in our laptop use case, once you want to pass the laptop to someone else, you want to ensure your data is fully and comprehensively erased. Moreover, if you have reason to believe your device was hacked you want to revert the device to a known good state, i.e. ensure that exploits cannot persist. <code>systemd-repart</code> already has a mechanism for it. In the declarations of the partitions the system should have, entries may be marked to be candidates for erasing on factory reset. The actual factory reset is then requested by one of two means: by specifying a specific kernel command line option (which is not too interesting here, given we lock that down via UEFI SecureBoot; but then again, one could also add a second kernel to the ESP that is identical to the first, with the only difference that it lists this command line option: thus when the user selects this entry it will initiate a factory reset), or via an EFI variable that can be set and is honoured on the immediately following boot. So here's how a factory reset would then go down: once the factory reset is requested it's enough to reboot. 
On the subsequent boot <code>systemd-repart</code> runs from the initrd, where it will honour the request and erase the partitions marked for erasing. Once that is complete the system is back in the state we shipped the system in: only the ESP and the <code>/usr/</code> file system will exist, but the root file system is gone. And from here we can continue as on the original first boot: create a new root file system (and any other partitions), and encrypt/set it up afresh.</p> <p>So now we have a nice setup, where everything is either signed or encrypted securely. The system can adapt to the system it is booted on automatically on first boot, and can easily be brought back into a well defined state identical to the way it was shipped in.</p> <h1>Modularity</h1> <p>But of course, such a monolithic, immutable system is only useful for very specific purposes. If <code>/usr/</code> can't be written to, – at least in the traditional sense – one cannot just go and install a new software package that one needs. So here two goals are superficially conflicting: on one hand one wants modularity, i.e. the ability to add components to the system, and on the other immutability, i.e. that precisely this is prohibited.</p> <p>So let's see what I propose as a middle ground in my model. First, what's the precise use case for such modularity? I see a couple of different ones:</p> <ol> <li> <p>For some cases it is necessary to extend the system itself at the lowest level, so that the components added in extend (or maybe even replace) the resources shipped in the base OS image, so that they live in the same namespace, and are subject to the same security restrictions and privileges. Exposure to the details of the base OS and its interface for this kind of modularity is at the maximum.</p> <p>Example: a module that adds a debugger or tracing tools into the system. 
Or maybe an optional hardware driver module.</p> </li> <li> <p>In other cases, more isolation is preferable: instead of extending the system resources directly, additional services shall be added in that bring their own files, can live in their own namespace (but with "windows" into the host namespaces), however still are system components, and provide services to other programs, whether local or remote. Exposure to the details of the base OS for this kind of modularity is restricted: it mostly focuses on the ability to consume and provide IPC APIs from/to the system. Components of this type can still be highly privileged, but the level of integration is substantially smaller than for the type explained above.</p> <p>Example: a module that adds a specific VPN connection service to the OS.</p> </li> <li> <p>Finally, there's the actual payload of the OS. This stuff is relatively isolated from the OS and definitely from each other. It mostly consumes OS APIs, and generally doesn't provide OS APIs. This kind of stuff runs with minimal privileges, and in its own namespace of concepts.</p> <p>Example: a desktop app, for reading your emails.</p> </li> </ol> <p>Of course, the lines between these three types of modules are blurry, but I think distinguishing them does make sense, as I think different mechanisms are appropriate for each. So here's what I'd propose in my model to use for this.</p> <ol> <li> <p>For the system extension case I think the <a href="https://www.freedesktop.org/software/systemd/man/systemd-sysext.html"><code>systemd-sysext</code></a> images are appropriate. This tool operates on system extension images that are very similar to the host's disk image: they also contain a <code>/usr/</code> partition, protected by Verity. However, they just include additions to the host image: binaries that extend the host. When such a system extension image is activated, it is merged via an immutable <code>overlayfs</code> mount into the host's <code>/usr/</code> tree. 
Thus any file shipped in such a system extension will suddenly appear as if it was part of the host OS itself. For optional components that should be considered part of the OS more or less this is a very simple and powerful way to combine an immutable OS with an immutable extension. Note that most likely extensions for an OS matching this tool should be built at the same time within the same update cycle scheme as the host OS itself. After all, the files included in the extensions will have dependencies on files in the system OS image, and care must be taken that these dependencies remain in order.</p> </li> <li> <p>For adding in additional somewhat isolated system services in my model, <a href="https://systemd.io/PORTABLE_SERVICES">Portable Services</a> are the proposed tool of choice. Portable services are in most ways just like regular system services; they could be included in the system OS image or an extension image. However, portable services use <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootImage="><code>RootImage=</code></a> to run off separate disk images, thus within their own namespace. Images set up this way have various ways to integrate into the host OS, as they are in most ways regular system services, which just happen to bring their own directory tree. Also, unlike regular system services, for them sandboxing is opt-out rather than opt-in. In my model, here too the disk images are Verity protected and thus immutable. Just like the host OS they are GPT disk images that come with a <code>/usr/</code> partition and Verity data, along with signing.</p> </li> <li> <p>Finally, the actual payload of the OS, i.e. the apps. To be useful in real life here it is important to hook into existing ecosystems, so that a large set of apps are available. Given that on Linux flatpak (or on servers OCI containers) are the established format that pretty much won they are probably the way to go. 
That said, I think both of these mechanisms have relatively weak properties, in particular when it comes to security, since immutability/measurements and similar are not provided. This means, unlike for system extensions and portable services a complete trust chain with attestation and per-app cryptographically protected data is much harder to implement sanely.</p> </li> </ol> <p>What I'd like to underline here is that the main system OS image, as well as the system extension images and the portable service images are put together the same way: they are GPT disk images, with one immutable file system and associated Verity data. The latter two should also contain a PKCS#7 signature for the top-level Verity hash. This uniformity has many benefits: you can use the same tools to build and process these images, but most importantly: by using a single way to validate them throughout the stack (i.e. Verity, in the latter cases with PKCS#7 signatures), validation and measurement is straightforward. In fact it's so obvious that we don't even have to implement it in systemd: the kernel has direct support for this Verity signature checking natively already (IMA).</p> <p>So, by composing a system at runtime from a host image, extension images and portable service images we have a nicely modular system where every single component is cryptographically validated on every single IO operation, and every component is measured, in its entire combination, directly in the kernel's IMA subsystem.</p> <p>(Of course, once you add the desktop apps or OCI containers on top, then these properties are lost further down the chain. But well, a lot is already won, if you can close the chain that far down.)</p> <p>Note that system extensions are not designed to replicate the fine grained packaging logic of RPM/dpkg. 
Of course, <code>systemd-sysext</code> is a generic tool, so you can use it for whatever you want, but there's a reason it does not bring support for a dependency language: the goal here is not to replicate traditional Linux packaging (we have that already, in RPM/dpkg, and I think they are actually OK for what they do) but to provide delivery of larger, coarser sets of functionality, in lockstep with the underlying OS' life-cycle and in particular with no interdependencies, except on the underlying OS.</p> <p>Also note that depending on the use case it might make sense to also use system extensions to modularize the <code>initrd</code> step. This is probably less relevant for a desktop OS, but for server systems it might make sense to package up support for specific complex storage in a <code>systemd-sysext</code> system extension, which can be applied to the initrd that is built into the unified kernel. (In fact, we have been working on implementing signed yet modular initrd support for general purpose Fedora this way.)</p> <p>Note that portable services are composable from system extensions too, by the way. This makes them even more useful, as you can share a common runtime between multiple portable services, or even use the host image as a common runtime for portable services. In this model a common runtime image is shared between one or more system extensions, and composed at runtime via an <code>overlayfs</code> instance.</p> <h1>More Modularity: Secondary OS Installs</h1> <p>Having an immutable, cryptographically locked down host OS is great I think, and if we have some moderate modularity on top, that's also great. But oftentimes it's useful to be able to depart from that for some specific use cases, i.e. to provide a bridge that allows workloads designed around RPM/dpkg package management to coexist reasonably nicely with such an immutable host.</p> <p>For this purpose in my model I'd propose using <code>systemd-nspawn</code> containers.
The containers are focused on OS containerization, i.e. they allow you to run a full OS with init system and everything as payload (unlike for example Docker containers which focus on a single service, and where running a full OS in it is a mess).</p> <p>Running <code>systemd-nspawn</code> containers for such secondary OS installs has various nice properties. One of course is that <code>systemd-nspawn</code> supports the same level of cryptographic image validation that we rely on for the host itself. Thus, to some level the whole OS trust chain is reasonably recursive if desired: the firmware validates the OS, and the OS can validate a secondary OS installed within it. In fact, we can run our trusted OS recursively on itself and get similar security guarantees! Besides these security aspects, <code>systemd-nspawn</code> also has really nice properties when it comes to integration with the host. For example the <code>--bind-user=</code> permits binding a host user record and their directory into a container as a simple one step operation. This makes it extremely easy to have a single user and <code>$HOME</code> but share it concurrently with the host <em>and</em> a zoo of secondary OSes in <code>systemd-nspawn</code> containers, which each could run different distributions even.</p> <h1>Developer Mode</h1> <p>Superficially, an OS with an immutable <code>/usr/</code> appears much less <em>hackable</em> than an OS where everything is writable. Moreover, an OS where everything must be signed and cryptographically validated makes it hard to insert your own code, given you are unlikely to possess access to the signing keys.</p> <p>To address this issue other systems have supported a "developer" mode: when entered the security guarantees are disabled, and the system can be freely modified, without cryptographic validation. 
While that's a great concept to have I doubt it's what most developers really want: the cryptographic properties of the OS are great after all, it sucks having to give them up once developer mode is activated.</p> <p>In my model I'd thus propose two different approaches to this problem. First of all, I think there's value in allowing users to additively extend/override the OS via local developer <a href="https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html">system extensions</a>. With this scheme the underlying cryptographic validation would remain intact, but (if this form of developer mode is explicitly enabled) the developer could add in more resources from local storage that are not tied to the OS builder's chain of trust, but to a local one (i.e. simply backed by encrypted storage of some form).</p> <p>The second approach is to make it easy to extend (or in fact replace) the set of trusted validation keys, with local ones that are under the control of the user, in order to make it easy to operate with kernel, OS, extension, portable service or container images signed by the local developer without involvement of the OS builder. This is relatively easy to do for components down the trust chain, i.e. the elements further up the chain should optionally allow additional certificates to validate with.</p> <p>(Note that systemd currently has no explicit support for a "developer" mode like this. I think we should add that sooner or later however.)</p> <h1>Democratizing Code Signing</h1> <p>Closely related to the question of developer mode is the question of code signing. If you ask me, the status quo of UEFI SecureBoot code signing in the major Linux distributions is pretty sad.
The work to get stuff signed is massive, but in effect it delivers very little in return: because initrds are entirely unprotected, and reside on partitions lacking any form of cryptographic integrity protection, any attacker can trivially modify the boot process of any such Linux system and freely collect the FDE passphrases entered. There's little value in signing the boot loader and kernel in a complex bureaucracy if it then happily loads entirely unprotected code that processes the actually relevant security credentials: the FDE keys.</p> <p>In my model, through use of unified kernels this important gap is closed, hence UEFI SecureBoot code signing becomes an integral part of the boot chain from firmware to the host OS. Unfortunately, code signing and having something a user can locally hack are to some level conflicting. However, I think we can improve the situation here, and put more emphasis on enrolling developer keys in the trust chain easily. Specifically, I see one relevant approach here: enrolling keys directly in the firmware is something that we should make less of a theoretical exercise and more something we can realistically deploy. See <a href="https://github.com/systemd/systemd/pull/20255#issuecomment-1098334694">this work in progress</a> making this more automatic and eventually safe. Other approaches are conceivable (including some that build on existing MokManager infrastructure), but given the politics involved, are harder to conclusively implement.</p> <h1>Running the OS itself in a container</h1> <p>What I explain above is put together with running on a bare metal system in mind. However, one of the stated goals is to make the OS adaptive enough to also run in a container environment (specifically: <code>systemd-nspawn</code>) nicely.
Booting a disk image on bare metal or in a VM generally means that the UEFI firmware validates and invokes the boot loader, and the boot loader invokes the kernel which then transitions into the final system. This is different for containers: here the container manager immediately calls the init system, i.e. PID 1. Thus the validation logic must be different: cryptographic validation must be done by the container manager. In my model this is solved by shipping the OS image not only with a Verity data partition (as is already necessary for the UEFI SecureBoot trust chain, see above), but also with another partition, containing a PKCS#7 signature of the root hash of said Verity partition. This of course is exactly what I propose for both the system extension and portable service image. Thus, in my model the images for all three uses are put together the same way: an immutable <code>/usr/</code> partition, accompanied by a Verity partition and a PKCS#7 signature partition. The OS image itself then has two ways "into" the trust chain: either through the signed unified kernel in the ESP (which is used for bare metal and VM boots) <em>or</em> by using the PKCS#7 signature stored in the partition (which is used for container/<code>systemd-nspawn</code> boots).</p> <h1>Parameterizing Kernels</h1> <p>A fully immutable and signed OS has to establish trust in the user data it makes use of before doing so. In the model I describe here, for <code>/etc/</code> and <code>/var/</code> we do this via disk encryption of the root file system (in combination with integrity checking). But the point where the root file system is mounted comes relatively late in the boot process, and thus cannot be used to parameterize the boot itself. 
In many cases it's important to be able to parameterize the boot process however.</p> <p>For example, for the implementation of the developer mode indicated above it's useful to be able to pass this fact safely to the initrd, in combination with other fields (e.g. hashed root password for allowing in-initrd logins for debug purposes). After all, if the initrd is pre-built by the vendor and signed as a whole together with the kernel it cannot be modified to carry such data directly (which is in fact how parameterizing of the initrd to a large degree was traditionally done).</p> <p>In my model this is achieved through <a href="https://systemd.io/CREDENTIALS/">system credentials</a>, which allow passing parameters to systems (and services for that matter) in an encrypted and authenticated fashion, bound to the TPM2 chip. This means that we can securely pass data into the initrd so that it can be authenticated and decrypted only on the system it is intended for and with the unified kernel image it was intended for.</p> <h1>Swap</h1> <p>In my model the OS would also carry a swap partition, for the simple reason that only then <a href="https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html"><code>systemd-oomd.service</code></a> can provide the best results. Also see <a href="https://chrisdown.name/2018/01/02/in-defence-of-swap.html">In defence of swap: common misconceptions</a>.</p> <h1>Updating Images</h1> <p>We have a rough idea how the system shall be organized now, let's next focus on the deployment cycle: software needs regular update cycles, and software that is not updated regularly is a security problem. Thus, I am sure that any modern system must be automatically updated, without this requiring avoidable user interaction.</p> <p>In my model, this is the job for <a href="https://www.freedesktop.org/software/systemd/man/systemd-sysupdate.html">systemd-sysupdate</a>.
It's a relatively simple A/B image updater: it operates either on partitions, on regular files in a directory, or on subdirectories in a directory. Each entry has a version (which is encoded in the GPT partition label for partitions, and in the filename for regular files and directories): whenever an update is initiated the oldest version is erased, and the newest version is downloaded.</p> <p>With the setup described above a system update becomes a really simple operation. On each update the <code>systemd-sysupdate</code> tool downloads a <code>/usr/</code> file system partition, an accompanying Verity partition, a PKCS#7 signature partition, and drops it into the host's partition table (where it possibly replaces the oldest version so far stored there). Then it downloads a unified kernel image and drops it into the EFI System Partition's <code>/EFI/Linux</code> (as per Boot Loader Specification; possibly erase the oldest such file there). And that's already the whole update process: four files are downloaded from the server, unpacked and put in the most straightforward of ways into the partition table or file system. Unlike in other OS designs there's no mechanism required to explicitly switch to the newer version, the aforementioned <code>systemd-boot</code> logic will automatically pick the newest kernel once it is dropped in.</p> <p>Above we talked a lot about modularity, and how to put systems together as a combination of a host OS image, system extension images for the initrd and the host, portable service images and <code>systemd-nspawn</code> container images. I already emphasized that these image files are actually always the same: GPT disk images with partition definitions that match the Discoverable Partition Specification. This comes very handy when thinking about updating: we can use the exact same <code>systemd-sysupdate</code> tool for updating these other images as we use for the host image. 
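</p> <p>The A/B slot rotation described above (erase the oldest version, drop in the newest) can be sketched in a few lines of Python. This is a minimal illustration and not how <code>systemd-sysupdate</code> is actually implemented: the <code>fooOS_</code> label prefix, the function names and the purely numeric dotted version parsing are all assumptions made for the example, while <code>_empty</code> stands for a slot that carries no valid payload.</p>

```python
# Hypothetical sketch of A/B slot selection; not systemd-sysupdate's real code.
EMPTY_LABEL = "_empty"  # marks a slot that currently carries no valid payload


def parse_version(label, prefix="fooOS_"):
    """Turn a label such as 'fooOS_0.7' into a comparable tuple like (0, 7)."""
    if not label.startswith(prefix):
        raise ValueError("unexpected partition label: " + label)
    return tuple(int(part) for part in label[len(prefix):].split("."))


def pick_target_slot(labels):
    """Return the index of the slot that the next update should overwrite."""
    # An empty slot is always preferred, so no installed version is lost.
    for index, label in enumerate(labels):
        if label == EMPTY_LABEL:
            return index
    # Otherwise the oldest installed version is sacrificed.
    return min(range(len(labels)), key=lambda i: parse_version(labels[i]))


def apply_update(labels, new_version, prefix="fooOS_"):
    """Return the slot labels after installing new_version into the chosen slot."""
    updated = list(labels)
    updated[pick_target_slot(labels)] = prefix + new_version
    return updated


slots = ["fooOS_0.7", EMPTY_LABEL]
slots = apply_update(slots, "0.8")
print(slots)  # ['fooOS_0.7', 'fooOS_0.8']
slots = apply_update(slots, "0.9")
print(slots)  # ['fooOS_0.9', 'fooOS_0.8']
```

<p>Note that nothing in this sketch records which slot is "active": the labels alone carry all the state, which is what keeps the update step so simple.</p> <p>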
The uniformity of the on-disk format allows us to update them uniformly too.</p> <h1>Boot Counting + Assessment</h1> <p>Automatic OS updates do not come without risks: if they happen automatically, and an update goes wrong this might mean your system might be automatically updated into a brick. This of course is less than ideal. Hence it is essential to address this reasonably automatically. In my model, there's systemd's <a href="https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT">Automatic Boot Assessment</a> for that. The mechanism is simple: whenever a new unified kernel image is dropped into the system it will be stored with a small integer counter value included in the filename. Whenever the unified kernel image is selected for booting by <code>systemd-boot</code>, it is decreased by one. Once the system booted up successfully (which is determined by userspace) the counter is removed from the file name (which indicates "this entry is known to work"). If the counter ever hits zero, this indicates that it tried to boot it a couple of times, and each time failed, thus is apparently "bad". In this case <code>systemd-boot</code> will not consider the kernel anymore, and revert to the next older (that doesn't have a counter of zero).</p> <p>By sticking the boot counter into the filename of the unified kernel we can directly attach this information to the kernel, and thus need not concern ourselves with cleaning up secondary information about the kernel when the kernel is removed. Updating with a tool like <code>systemd-sysupdate</code> remains a very simple operation hence: drop one old file, add one new file.</p> <h1>Picking the Newest Version</h1> <p>I already mentioned that <code>systemd-boot</code> automatically picks the newest unified kernel image to boot, by looking at the version encoded in the filename. 
This is done via a simple <a href="https://man7.org/linux/man-pages/man3/strverscmp.3.html"><code>strverscmp()</code></a> call (well, truth be told, it's a modified version of that call, different from the one implemented in libc, because real-life package managers use more complex rules for comparing versions these days, and hence it made sense to do that here too). The concept of having multiple entries of some resource in a directory, and picking the newest one automatically is a powerful concept, I think. It means adding/removing new versions is extremely easy (as we discussed above, in <code>systemd-sysupdate</code> context), and allows stateless determination of what to use.</p> <p>If <code>systemd-boot</code> can do that, what about system extension images, portable service images, or <code>systemd-nspawn</code> container images that do not actually use <code>systemd-boot</code> as the entrypoint? All these tools actually implement the very same logic, but on the partition level: if multiple suitable <code>/usr/</code> partitions exist, then the newest is determined by comparing the GPT partition label of them.</p> <p>This is in a way the counterpart to the <code>systemd-sysupdate</code> update logic described above: we always need a way to determine which partition to actually then use after the update took place: and this becomes very easy each time: enumerate possible entries, pick the newest as per the (modified) <code>strverscmp()</code> result.</p> <h1>Home Directory Management</h1> <p>In my model the device's users and their home directories are managed by <a href="https://www.freedesktop.org/software/systemd/man/systemd-homed.service.html"><code>systemd-homed</code></a>. This means they are relatively self-contained and can be migrated easily between devices. The numeric UID assignment for each user is done at the moment of login only, and the files in the home directory are mapped as needed via a <code>uidmap</code> mount. 
It also allows us to protect the data of each user individually with a credential that belongs to the user itself, i.e. instead of binding confidentiality of the user's data to the system-wide full-disk-encryption each user gets their own encrypted home directory where the user's authentication token (password, FIDO2 token, PKCS#11 token, recovery key…) is used as authentication and decryption key for the user's data. This brings a major improvement for security as it means the user's data is cryptographically inaccessible except when the user is actually logged in.</p> <p>It also allows us to correct another major issue with traditional Linux systems: the way data encryption works during system suspend. Traditionally on Linux the disk encryption credentials (e.g. the LUKS passphrase) are kept in memory also when the system is suspended. This is a bad choice for security, since many (most?) of us probably never turn off our laptops but suspend them instead. But if the decryption key is always present in unencrypted form during the suspended time, then it could potentially be read from there by a sufficiently equipped attacker.</p> <p>By encrypting the user's home directory with the user's authentication token we can first safely "suspend" the home directory before going to the system suspend state (i.e. flush out the cryptographic keys needed to access it). This means any process currently accessing the home directory will be frozen for the time of the suspend, but that's expected anyway during a system suspend cycle. Why is this better than the status quo ante? In this model the home directory's cryptographic key material is erased during suspend, but it can be safely reacquired on resume, from system code. If the system is only encrypted as a whole however, then the system code itself couldn't reauthenticate the user, because it would be frozen too.
By separating home directory encryption from the root file system encryption we can avoid this problem.</p> <h1>Partition Setup</h1> <p>So we discussed the organization of the partitions of OS images multiple times above, each time focusing on a specific aspect. Let's now summarize how this should look all together.</p> <p>In my model, the initial, shipped OS image should look roughly like this:</p> <ul> <li>(1) An EFI System Partition, with <code>systemd-boot</code> as boot loader and one unified kernel</li> <li>(2) A <code>/usr/</code> partition (version "A"), with a label <code>fooOS_0.7</code> (under the assumption we called our project <code>fooOS</code> and the image version is <code>0.7</code>).</li> <li>(3) A Verity partition for the <code>/usr/</code> partition (version "A"), with the same label</li> <li>(4) A partition carrying the Verity root hash for the <code>/usr/</code> partition (version "A"), along with a PKCS#7 signature of it, also with the same label</li> </ul> <p>On first boot this is augmented by <code>systemd-repart</code> like this:</p> <ul> <li>(5) A second <code>/usr/</code> partition (version "B"), initially with a label <code>_empty</code> (which is the label <code>systemd-sysupdate</code> uses to mark partitions that currently carry no valid payload)</li> <li>(6) A Verity partition for that (version "B"), similar to the above case, also labelled <code>_empty</code></li> <li>(7) And ditto a Verity root hash partition with a PKCS#7 signature (version "B"), also labelled <code>_empty</code></li> <li>(8) A root file system, encrypted and locked to the TPM2</li> <li>(9) A home file system, integrity protected via a key also in TPM2 (encryption is unnecessary, since <code>systemd-homed</code> adds that on its own, and it's nice to avoid duplicate encryption)</li> <li>(10) A swap partition, encrypted and locked to the TPM2</li> </ul> <p>Then, on the first OS update the partitions 5, 6, 7 are filled with a new version of the OS (let's
say <code>0.8</code>) and thus get their label updated to <code>fooOS_0.8</code>. After a boot, this version is active.</p> <p>On a subsequent update the three partitions labelled <code>fooOS_0.7</code> get wiped and replaced by <code>fooOS_0.9</code> and so on.</p> <p>On factory reset, the partitions 8, 9, 10 are deleted, so that <code>systemd-repart</code> recreates them, using a new set of cryptographic keys.</p> <p>Here's a graphic that hopefully illustrates the partition table from shipped image, through first boot, multiple update cycles and eventual factory reset:</p> <p><a href="images/partitions.svg"><img alt="Partitions Overview" src="images/partitions.svg" width="640"></a></p> <h1>Trust Chain</h1> <p>So let's summarize the intended chain of trust (for bare metal/VM boots) that ensures every piece of code in this model is signed and validated, and any system secret is locked to TPM2.</p> <ol> <li> <p>First, firmware (or possibly shim) authenticates <code>systemd-boot</code>.</p> </li> <li> <p>Once <code>systemd-boot</code> picks a unified kernel image to boot, it is also authenticated by firmware/shim.</p> </li> <li> <p>The unified kernel image contains an initrd, which is the first userspace component that runs. It finds any system extensions passed into the initrd, and sets them up through Verity.
The kernel will validate the Verity root hash signature of these system extension images against its usual keyring.</p> </li> <li> <p>The initrd also finds credentials passed in, then securely unlocks (which means: decrypts + authenticates) them with a secret from the TPM2 chip, locked to the kernel image itself.</p> </li> <li> <p>The kernel image also contains a kernel command line which contains a <code>usrhash=</code> option that pins the root hash of the <code>/usr/</code> partition to use.</p> </li> <li> <p>The initrd then unlocks the encrypted root file system, with a secret bound to the TPM2 chip.</p> </li> <li> <p>The system then transitions into the main system, i.e. the combination of the Verity protected <code>/usr/</code> and the encrypted root file system. It then activates two more encrypted (and/or integrity protected) volumes for <code>/home/</code> and swap, also with a secret tied to the TPM2 chip.</p> </li> </ol> <p>Here's an attempt to illustrate the above graphically:</p> <p><a href="images/trustchain.svg"><img alt="Trust Chain" src="images/trustchain.svg" width="640"></a></p> <p>This is the trust chain of the basic OS. Validation of system extension images, portable service images and <code>systemd-nspawn</code> container images always takes place the same way: the kernel validates these Verity images along with their PKCS#7 signatures against the kernel's keyring.</p> <h1>File System Choice</h1> <p>In the above I left the choice of file systems unspecified. For the immutable <code>/usr/</code> partitions <code>squashfs</code> might be a good candidate, but any other that works nicely in a read-only fashion and generates reproducible results is a good choice, too.
The home directories as managed by <code>systemd-homed</code> should certainly use <code>btrfs</code>, because it's the only general purpose file system supporting online grow and shrink, which <code>systemd-homed</code> can take advantage of to manage storage.</p> <p>For the root file system <code>btrfs</code> is likely also the best idea. That's because we intend to use LUKS/<code>dm-crypt</code> underneath, which by default only provides confidentiality, not authenticity of the data (unless combined with <code>dm-integrity</code>). As <code>btrfs</code> (unlike xfs/ext4) does full data checksumming it's probably the best choice here, since it means we don't have to use <code>dm-integrity</code> (which comes at a higher performance cost).</p> <h1>OS Installation vs. OS Instantiation</h1> <p>In the discussion above a lot of focus was put on setting up the OS and completing the partition layout and such on first boot. This means installing the OS becomes as simple as <code>dd</code>-ing (i.e. "streaming") the shipped disk image onto the final HDD medium. Simple, isn't it?</p> <p>Of course, such a scheme is just <em>too</em> simple for many setups in real life. Whenever multi-boot is required (i.e. co-installing an OS implementing this model with another unrelated one), <code>dd</code>-ing a disk image onto the HDD is going to overwrite user data that was supposed to be kept around.</p> <p>To cover this case, in my model, we'd use <code>systemd-repart</code> (again!) to allow streaming the source disk image onto the target HDD in a smarter, additive way. The tool after all is purely additive: it will add in partitions or grow them if they are missing or too small. <code>systemd-repart</code> already has all the necessary provisions to not only create a partition on the target disk, but also copy blocks from a raw installer disk.
An install operation would then become a two step process: one invocation of <code>systemd-repart</code> that adds the <code>/usr/</code> partition, its Verity partition and the signature partition to the target medium, each populated with a copy of the same partition of the installer medium, and one invocation of <code>bootctl</code> that installs the <code>systemd-boot</code> boot loader in the ESP. (Well, there's one thing missing here: the unified OS kernel also needs to be dropped into the ESP. For now, this can be done with a simple <code>cp</code> call. In the long run, this should probably be something <code>bootctl</code> can do as well, if told so.)</p> <p>So, with this we have a simple scheme to cover all bases: we can either just <code>dd</code> an image to disk, or we can stream an image onto an existing HDD, adding a couple of new partitions and files to the ESP.</p> <p>Of course, in reality things are more complex than that even: there's a good chance that the existing ESP is simply too small to carry multiple unified kernels. In my model, the way to address this is by shipping two slightly different <code>systemd-repart</code> partition definition file sets: the <em>ideal</em> case when the ESP is large enough, and a <em>fallback</em> case, where it isn't and where we then add in an additional XBOOTLDR partition (as per the Discoverable Partitions Specification). In that mode the ESP carries the boot loader, but the unified kernels are stored in the XBOOTLDR partition. This scenario is not quite as simple as the XBOOTLDR-less scenario described first, but is equally well supported in the various tools.
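</p> <p>One way to choose between the two definition sets is simply to try them in order. The following is a minimal Python sketch of that idea, not an installer: the directory names, the device path and the helper names are made up for illustration, and the demo at the end uses a fake runner so that nothing is repartitioned. Only the <code>--definitions=</code> and <code>--dry-run=no</code> switches are taken from <code>systemd-repart</code> itself.</p>

```python
import subprocess


def repart_command(definitions_dir, device):
    """Build a systemd-repart invocation for one partition definition set."""
    return ["systemd-repart", "--definitions=" + definitions_dir, "--dry-run=no", device]


def install_partitions(device, definition_dirs, run=subprocess.run):
    """Try each definition set in order and return the first one that applies."""
    for definitions_dir in definition_dirs:
        result = run(repart_command(definitions_dir, device))
        if result.returncode == 0:
            return definitions_dir
    raise RuntimeError("no partition definition set could be applied to " + device)


class FakeResult:
    """Stand-in for subprocess.CompletedProcess, used only by the demo below."""

    def __init__(self, returncode):
        self.returncode = returncode


def fake_run(command):
    # Pretend the ideal layout does not fit, so that only the fallback succeeds.
    return FakeResult(1 if "--definitions=/example/repart.ideal.d" in command else 0)


chosen = install_partitions(
    "/dev/example",
    ["/example/repart.ideal.d", "/example/repart.fallback.d"],
    run=fake_run,
)
print(chosen)  # /example/repart.fallback.d
```

<p>Passing the runner in as a parameter keeps the decision logic testable without touching a real disk; a real installer would use the default <code>subprocess.run</code> and still has to install the boot loader and copy the unified kernel afterwards.</p> <p>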
Note that <code>systemd-repart</code> can be told size constraints on the partitions it shall create or augment, thus to implement this scheme it's enough to invoke the tool with the fallback partition scheme if invocation with the ideal scheme fails.</p> <p>Either way: regardless of how the partitions, the boot loader and the unified kernels ended up on the system's hard disk, on first boot the code paths are the same again: <code>systemd-repart</code> will be called to augment the partition table with the root file system, and properly encrypt it, as was already discussed earlier here. This means: all cryptographic key material used for disk encryption is generated on first boot only, the installer phase does not encrypt anything.</p> <h1>Live Systems vs. Installer Systems vs. Installed Systems</h1> <p>Traditionally on Linux three types of systems were common: "installed" systems, i.e. those that are stored on the main storage of the device and are the primary place people spend their time in; "installer" systems which are used to install them and whose job is to copy and set up the packages that make up the installed system; and "live" systems, which were a middle ground: a system that behaves like an installed system in most ways, but lives on removable media.</p> <p>In my model I'd like to remove the distinction between these three concepts as much as possible: each of these three images should carry the exact same <code>/usr/</code> file system, and should be suitable to be replicated the same way. Once installed the resulting image can also act as an installer for another system, and so on, creating a certain "viral" effect: if you have one image or installation it's automatically something you can replicate 1:1 with a simple <code>systemd-repart</code> invocation.</p> <h1>Building Images According to this Model</h1> <p>The above explains how the image should look and how its first boot and update cycle will modify it.
But this leaves one question unanswered: how to actually build the initial image for OS instances according to this model?</p> <p>Note that there's nothing too special about the images following this model: they are ultimately just GPT disk images with Linux file systems, following the Discoverable Partition Specification. This means you can use any set of tools of your choice that can put together compliant GPT disk images.</p> <p>I personally would use <a href="https://github.com/systemd/mkosi"><code>mkosi</code></a> for this purpose though. It's designed to generate compliant images, and has a rich toolset for SecureBoot and signed/Verity file systems already in place.</p> <p>What is key here is that this model doesn't depart from RPM and dpkg, instead it builds on top of that: in this model they are excellent for putting together images on the build host, but deployment onto the runtime host does not involve individual packages.</p> <p>I think one cannot overestimate the value traditional distributions bring, regarding security, integration and general polishing. The concepts I describe above are inherited from this, but depart from the idea that distribution packages are a runtime concept and make them a build-time concept instead.</p> <p>Note that the above is pretty much independent from the underlying distribution.</p> <h1>Final Words</h1> <p>I have no illusions, general purpose distributions are not going to adopt this model as their default any time soon, and it's not even my goal that they do that. The above is <em>my</em> <em>personal</em> vision, and I don't expect people to buy into it 100%, and that's fine. However, what I am interested in is finding the overlaps, i.e. work with people who buy 50% into this vision, and share the components.</p> <p>My goals here thus are to:</p> <ol> <li> <p>Get distributions to move to a model where images like this can be built from the distribution easily.
Specifically this means that distributions make their OS hermetic in <code>/usr/</code>.</p> </li> <li> <p>Find the overlaps, share components with other projects to revisit how distributions are put together. This is already happening, see <code>systemd-tmpfiles</code> and <code>systemd-sysusers</code> support in various distributions, but I think there's more to share.</p> </li> <li> <p>Make people interested in building actual real-world images based on general purpose distributions adhering to the model described above. I'd love a "GnomeBook" image with full trust properties, that is built from <em>true</em> Linux distros, such as Fedora or ArchLinux.</p> </li> </ol> <h1>FAQ</h1> <ol> <li> <p><em>What about <code>ostree</code>? Doesn't <code>ostree</code> already deliver what this blog story describes?</em></p> <p><code>ostree</code> is fine technology, but with respect to security and robustness properties it's not too interesting I think, because unlike image-based approaches it cannot really deliver integrity/robustness guarantees over the whole tree easily. To be able to trust an <code>ostree</code> setup you have to establish trust in the underlying file system first, and the complexity of the file system makes that challenging. To provide an effective offline-secure trust chain through the whole depth of the stack it is essential to cryptographically validate every single I/O operation. In an image-based model this is trivially easy, but in the <code>ostree</code> model it's not possible with current file system technology, and even if this is added in one way or another in the future (though I am not aware of anyone doing on-access file-based integrity that spans a whole hierarchy of files and is compatible with <code>ostree</code>'s hardlink farm model) I think validation is still at too high a level, since Linux file system developers made very clear their implementations are not robust to rogue images. 
(There's <a href="https://github.com/ostreedev/ostree-rs-ext/issues/288">this stuff planned</a>, but doing structural authentication ahead of time instead of on access makes the idea too weak – and I'd expect too slow – in my eyes.)</p> <p>With my design I want to deliver similar security guarantees as ChromeOS does, but <code>ostree</code> is much weaker there, and I see no prospect of this changing. In a way <code>ostree</code>'s integrity checks are similar to RPM's and enforced on download rather than on access. In the model I suggest above, it's always on access, and thus safe against offline attacks (i.e. evil maid attacks). In today's world, I think offline security is absolutely necessary though.</p> <p>That said, <code>ostree</code> does have some benefits over the model described above: it naturally shares file system inodes if many of the modules/images involved share the same data. It's thus more space efficient on disk (and thus also in RAM/cache to some degree) by default. In my model it would be up to the image builders to minimize shipping overly redundant disk images, by making good use of suitably composable system extensions.</p> </li> <li> <p><em>What about configuration management?</em></p> <p>At first glance immutable systems and configuration management don't go that well together. However, do note that in the model I propose above the root file system with all its contents, including <code>/etc/</code> and <code>/var/</code>, is actually writable and can be modified like on any other typical Linux distribution. The only exception is <code>/usr/</code> where the immutable OS is hermetic. 
That means configuration management tools should work just fine in this model – up to the point where they are used to install additional RPM/dpkg packages, because that's something not allowed in the model above: packages need to be installed at image build time and thus on the image build host, not the runtime host.</p> </li> <li> <p><em>What about non-UEFI and non-TPM2 systems?</em></p> <p>The above is designed around the feature set of contemporary PCs, and this means UEFI and TPM2 being available (simply because the PC is pretty much defined by the Windows platform, and current versions of Windows require both).</p> <p>I think it's important to make the best of the features of today's PC hardware, and then find suitable fallbacks on more limited hardware. Specifically this means: if there's desire to implement something like this on non-UEFI or non-TPM2 hardware we should look for suitable fallbacks for the individual functionality, but generally try to add glue to the old systems so that conceptually they behave more like the new systems instead of the other way round. Or in other words: most of the above is not strictly tied to UEFI or TPM2, and for many cases there are <em>already</em> reasonable fallbacks in place for more limited systems. Of course, without TPM2 many of the security guarantees will be weakened.</p> </li> <li> <p><em>How would you name an OS built that way?</em></p> <p>I think a desktop OS built this way, if it has the GNOME desktop, should of course be called <em>GnomeBook</em>, to mimic the <em>ChromeBook</em> name. ;-)</p> <p>But in general, I'd call hermetic, adaptive, immutable OSes like this "<em>particles</em>".</p> </li> </ol> <h1>How can you help?</h1> <ol> <li> <p><em>Help making Distributions Hermetic in <code>/usr/</code>!</em></p> <p>One of the core ideas of the approach described above is to make the OS <em>hermetic</em> in <code>/usr/</code>, i.e. 
make it carry a comprehensive description of what needs to be set up outside of it when instantiated. Specifically, this means that system users that are needed are declared in <code>systemd-sysusers</code> snippets, and skeleton files and directories are created via <code>systemd-tmpfiles</code>. Moreover additional partitions should be declared via <code>systemd-repart</code> drop-ins.</p> <p>At this point some distributions (such as Fedora) are (probably more by accident than on purpose) already mostly hermetic in <code>/usr/</code>, at least for the most basic parts of the OS. However, this is not complete: many daemons require to have specific resources set up in <code>/var/</code> or <code>/etc/</code> before they can work, and the relevant packages do not carry <code>systemd-tmpfiles</code> descriptions that add them if missing. So there are two ways you could help here: politically, it would be highly relevant to convince distributions that an OS that is hermetic in <code>/usr/</code> is highly desirable and it's a worthy goal for packagers to get there. More specifically, it would be desirable if RPM/dpkg packages would ship with enough <code>systemd-tmpfiles</code> information so that configuration files the packages strictly need for operation are symlinked (or copied) from <code>/usr/share/factory/</code> if they are missing (even better of course would be if packages from their upstream sources on would just work with an empty <code>/etc/</code> and <code>/var/</code>, and create themselves what they need and default to good defaults in absence of configuration files).</p> <p>Note that distributions that adopted <code>systemd-sysusers</code>, <code>systemd-tmpfiles</code> and the <code>/usr/</code> merge are already quite close to providing an OS that is hermetic in <code>/usr/</code>. 
These were the big, the major advancements: making the image fully hermetic should be less controversial – at least that's my guess.</p> <p>Also note that making the OS hermetic in <code>/usr/</code> is not just useful in scenarios like the above. It also means that stuff <a href="https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html">like this</a> and <a href="https://0pointer.net/blog/running-an-container-off-the-host-usr.html">like this</a> can work well.</p> </li> <li> <p><em>Fill in the gaps!</em></p> <p>I already mentioned a couple of missing bits and pieces in the implementation of the overall vision. In the <code>systemd</code> project we'd be delighted to review/merge any PRs that fill in the voids.</p> </li> <li> <p><em>Build your own OS like this!</em></p> <p>Of course, while we built all these building blocks and they have been adopted to various levels and various purposes in the various distributions, no one has so far built an OS that puts things together just like that. It would be excellent if we had communities that work on building images like what I propose above. I.e. if you want to work on making a secure GnomeBook as I suggest above a reality, that would be more than welcome.</p> <p>What could this look like specifically? Pick an existing distribution, write a set of <code>mkosi</code> descriptions plus some additional drop-in files, and then build this on some build infrastructure. 
While doing so, report the gaps, and help us address them.</p> </li> </ol> <h1>Further Documentation of Used Components and Concepts</h1> <ol> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"><code>systemd-tmpfiles</code></a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html"><code>systemd-sysusers</code></a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"><code>systemd-boot</code></a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"><code>systemd-stub</code></a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-sysext.html"><code>systemd-sysext</code></a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-portabled.service.html"><code>systemd-portabled</code></a>, <a href="https://systemd.io/PORTABLE_SERVICES">Portable Services Introduction</a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-repart.html"><code>systemd-repart</code></a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"><code>systemd-nspawn</code></a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-sysupdate.html"><code>systemd-sysupdate</code></a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-creds.html"><code>systemd-creds</code></a>, <a href="https://systemd.io/CREDENTIALS">System and Service Credentials</a></li> <li><a href="https://www.freedesktop.org/software/systemd/man/systemd-homed.service.html"><code>systemd-homed</code></a></li> <li><a href="https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT">Automatic Boot Assessment</a></li> <li><a href="https://systemd.io/BOOT_LOADER_SPECIFICATION">Boot Loader Specification</a></li> <li><a href="https://systemd.io/DISCOVERABLE_PARTITIONS">Discoverable Partitions Specification</a></li> <li><a href="https://systemd.io/BUILDING_IMAGES">Safely 
Building Images</a></li> </ol> <h1>Earlier Blog Stories Related to this Topic</h1> <ol> <li><a href="https://0pointer.net/blog/authenticated-boot-and-disk-encryption-on-linux.html">The Strange State of Authenticated Boot and Disk Encryption on Generic Linux Distributions</a></li> <li><a href="https://0pointer.net/blog/the-wondrous-world-of-discoverable-gpt-disk-images.html">The Wondrous World of Discoverable GPT Disk Images</a></li> <li><a href="https://0pointer.net/blog/unlocking-luks2-volumes-with-tpm2-fido2-pkcs11-security-hardware-on-systemd-248.html">Unlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248</a></li> <li><a href="https://0pointer.net/blog/walkthrough-for-portable-services.html">Portable Services with systemd v239</a></li> <li><a href="https://0pointer.net/blog/mkosi-a-tool-for-generating-os-images.html">mkosi — A Tool for Generating OS Images</a></li> </ol> <p>And that's all for now.</p> <p>Lennart Poettering, Tue, 03 May 2022 00:00:00 +0200</p> <h1><a href="https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html">Testing my System Code in /usr/ Without Modifying /usr/</a></h1> <p><a href="https://0pointer.net/blog/running-an-container-off-the-host-usr.html">I recently blogged</a> about how to run a volatile <code>systemd-nspawn</code> container from your host's <code>/usr/</code> tree, for quickly testing stuff in your host environment, sharing your home directory, but all that without making a single modification to your host, and on an isolated node.</p> <p>The one-liner discussed in that blog story is great for testing during system software development. 
Let's have a look at another <code>systemd</code> tool that I regularly use to test things during <code>systemd</code> development, in a relatively safe environment, but still taking full benefit of my host's setup.</p> <p>For a while now, systemd has been shipping with a simple component called <a href="https://www.freedesktop.org/software/systemd/man/systemd-sysext.html"><code>systemd-sysext</code></a>. Its primary use case goes something like this: on one hand OSes with immutable <code>/usr/</code> hierarchies are fantastic for security, robustness, updating and simplicity, but on the other hand not being able to quickly add stuff to <code>/usr/</code> is just annoying.</p> <p><code>systemd-sysext</code> is supposed to bridge this contradiction: when invoked it will merge a bunch of "system extension" images into <code>/usr/</code> (and <code>/opt/</code> as a matter of fact) through the use of read-only <code>overlayfs</code>, making all files shipped in the image instantly and <em>atomically</em> appear in <code>/usr/</code> during runtime – as if they always had been there. Now, let's say you are building your locked down OS, with an immutable <code>/usr/</code> tree, and it comes without the ability to log in, without debugging tools, without anything you want and need when trying to debug and fix something in the system. With <code>systemd-sysext</code> you could use a system extension image that contains all this, drop it into the system, and activate it with <code>systemd-sysext</code> so that it genuinely extends the host system.</p> <p>(There are many other use cases for this tool, for example, you could build systems that way that at their base use a generic image, but by installing one or more system extensions get extended with additional, more specific functionality, or drivers, or similar. 
The tool is generic, use it for whatever you want, but for now let's not get lost in listing all the possibilities.)</p> <p>What's particularly nice about the tool is that it supports automatically discovered <code>dm-verity</code> images, with signatures and everything. So you can even do this in a fully authenticated, measured, safe way. But I am digressing…</p> <p>Now that we (hopefully) have a rough understanding of what <code>systemd-sysext</code> is and does, let's discuss how specifically we can use this in the context of system software development, to safely use and test bleeding edge development code – built freshly from your project's build tree – in your host OS without having to risk that the host OS is corrupted or becomes unbootable by stuff that didn't quite yet work the way it was envisioned:</p> <p>The images <code>systemd-sysext</code> merges into <code>/usr/</code> can be of two kinds: disk images with a file system/verity/signature, or simple, plain directory trees. To make these images available to the tool, they can be placed or symlinked into <code>/usr/lib/extensions/</code>, <code>/var/lib/extensions/</code>, <code>/run/extensions/</code> (and a bunch of others). So if we now install our freshly built development software into a subdirectory of those paths, then that's entirely sufficient to make them valid system extension images in the sense of <code>systemd-sysext</code>, and thus they can be merged into <code>/usr/</code> to try them out.</p> <p>To be more specific: when I develop <code>systemd</code> itself, here's what I do regularly, to see how my new development version would behave on my host system. As preparation I checked out the systemd development git tree first of course, hacked around in it a bit, then built it with meson/ninja. 
And now I want to test what I just built:</p> <div class="highlight"><pre><span></span><code>sudo DESTDIR=/run/extensions/systemd-test meson install -C build --quiet --no-rebuild &amp;&amp; sudo systemd-sysext refresh --force </code></pre></div> <p>Explanation: first, we'll install my current build tree as a system extension into <code>/run/extensions/systemd-test/</code>. And then we apply it to the host via the <code>systemd-sysext refresh</code> command. This command will search for all installed system extension images in the aforementioned directories, then unmount (i.e. "unmerge") any previously merged dirs from <code>/usr/</code> and then freshly mount (i.e. "merge") the new set of system extensions on top of <code>/usr/</code>. And just like that, I have installed my development tree of <code>systemd</code> into the host OS, and all that without actually modifying/replacing even a single file on the host at all. Nothing here actually hit the disk!</p> <p>Note that all this works on any system really, it is not necessary that the underlying OS even is designed with immutability in mind. Just because the tool was developed with immutable systems in mind it doesn't mean you couldn't use it on traditional systems where <code>/usr/</code> is mutable as well. In fact, my development box actually runs regular Fedora, i.e. is RPM-based and thus has a mutable <code>/usr/</code> tree. 
As long as system extensions are applied the whole of <code>/usr/</code> becomes read-only though.</p> <p>Once I am done testing, when I want to revert to how things were without the image installed, it is sufficient to call:</p> <div class="highlight"><pre><span></span><code>sudo systemd-sysext unmerge </code></pre></div> <p>And there you go, all files my development tree generated are gone again, and the host system is as it was before (and <code>/usr/</code> mutable again, in case one is on a traditional Linux distribution).</p> <p>Also note that a reboot (regardless of whether it is a <em>clean</em> one or an <em>abnormal</em> shutdown) will undo the whole thing automatically, since we installed our build tree into <code>/run/</code> after all, i.e. a <code>tmpfs</code> instance that is flushed on boot. And given that the <code>overlayfs</code> merge is a runtime thing, too, the whole operation was executed without any persistence. Isn't that great?</p> <p>(You might wonder why I specified <code>--force</code> on the <code>systemd-sysext refresh</code> line earlier. That's because <code>systemd-sysext</code> actually does some minimal version compatibility checks when applying system extension images. For that it will compare the host's <code>/etc/os-release</code> file with the extension's <code>/usr/lib/extension-release.d/extension-release.&lt;name&gt;</code>, and refuse operation if the image is not actually built for the host OS version. Here we don't want to bother with dropping that file in there, we <em>know</em> already that the extension image is compatible with the host, as we just built it on it. <code>--force</code> allows us to skip the version check.)</p> <p>You might wonder: what about the combination of the idea from the previous blog story (regarding running containers off the host <code>/usr/</code> tree) with system extensions? Glad you asked. Right now we have no support for this, but it's high on our TODO list (patches welcome, of course!). i.e. 
a new switch for <code>systemd-nspawn</code> called <code>--system-extension=</code> that would allow merging one or more such extensions into the container tree booted would be stellar. With that, with a single command I could run a container off my host OS but with a development version of systemd dropped in, all without any persistence. How awesome would that be?</p> <p>(Oh, and in case you wonder, all of this only works with distributions that have completed the <code>/usr/</code> merge. On legacy distributions that didn't do that and still place parts of <code>/usr/</code> all over the hierarchy the above won't work, since merging <code>/usr/</code> trees via <code>overlayfs</code> is pretty pointless if the OS is not hermetic in <code>/usr/</code>.)</p> <p>And that's all for now. Happy hacking!</p> <p>Lennart Poettering, Wed, 27 Apr 2022 00:00:00 +0200</p> <h1><a href="https://0pointer.net/blog/running-an-container-off-the-host-usr.html">Running a Container off the Host /usr/</a></h1> <p>Apparently, in some parts of <a href="https://lwn.net/Articles/890219/">this world</a>, the <code>/usr/</code>-merge transition is still ongoing. Let's take the opportunity to have a look at one specific way to take advantage of the <code>/usr/</code>-merge (and associated work) IRL.</p> <p>I develop system-level software as you might know. Oftentimes I want to run my development code on my PC but be reasonably sure it cannot destroy or otherwise negatively affect my host system. Now I could set up a container tree for that, and boot into that. But often I am too lazy for that, I don't want to bother with a slow package manager setting up a new OS tree for me. 
So here's what I often do instead — and this only works because of the <code>/usr/</code>-merge.</p> <p>I run a command like the following (without any preparatory work):</p> <div class="highlight"><pre><span></span><code>systemd-nspawn <span class="se">\</span> --directory<span class="o">=</span>/ <span class="se">\</span> --volatile<span class="o">=</span>yes <span class="se">\</span> -U <span class="se">\</span> --set-credential<span class="o">=</span>passwd.hashed-password.root:<span class="k">$(</span>mkpasswd mysecret<span class="k">)</span> <span class="se">\</span> --set-credential<span class="o">=</span>firstboot.locale:C.UTF-8 <span class="se">\</span> --bind-user<span class="o">=</span>lennart <span class="se">\</span> -b </code></pre></div> <p>And then I very quickly get a login prompt on a container that runs the exact same software as my host — but is also isolated from the host. I do <em>not</em> need to prepare any separate OS tree or anything else. It <em>just</em> works. And my host user <code>lennart</code> is <em>just</em> there, ready for me to log into.</p> <p>So here's what these <a href="https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"><code>systemd-nspawn</code></a> options specifically do:</p> <ul> <li> <p><code>--directory=/</code> tells <code>systemd-nspawn</code> to run off the host OS' file hierarchy. That smells like danger of course, running two OS instances off the same directory hierarchy. But don't be scared, because:</p> </li> <li> <p><code>--volatile=yes</code> enables volatile mode. Specifically this means what we configured with <code>--directory=/</code> as root file system is slightly rearranged. 
Instead of mounting that tree as it is, we'll mount a <code>tmpfs</code> instance as actual root file system, and then mount the <code>/usr/</code> subdirectory of the specified hierarchy into the <code>/usr/</code> subdirectory of the container file hierarchy in read-only fashion – and <em>only</em> that directory. So now we have a container directory tree that is basically empty, but imports all host OS binaries and libraries into its <code>/usr/</code> tree. All software installed on the host is also available in the container with no manual work. This mechanism only works because on <code>/usr/</code>-merged OSes vendor resources are monopolized at a single place: <code>/usr/</code>. It's sufficient to share that one directory with the container to get a second instance of the host OS running. Note that this means <code>/etc/</code> and <code>/var/</code> will be entirely empty initially when this second system boots up. Thankfully, forward looking distributions (such as Fedora) have adopted <a href="https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"><code>systemd-tmpfiles</code></a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html"><code>systemd-sysusers</code></a> quite pervasively, so that system users and files/directories required for operation are created automatically should they be missing. Thus, even though at boot the mentioned directories are initially empty, once the system is booted up they are sufficiently populated for things to just work.</p> </li> <li> <p><code>-U</code> means we'll enable user namespacing, in fully automatic mode. This does three things: it picks a free host UID range dynamically for the container, then sets up user namespacing for the container processes mapping host UID range to UIDs 0…65534 in the container. It then sets up a similar UID mapped mount on the <code>/usr/</code> tree of the container. 
Net effect: file ownerships as set on the host OS tree appear as if they belong to the very <em>same</em> users inside of the container environment, except that we use user namespacing for everything, and thus the users are <em>actually</em> neatly isolated from the host.</p> </li> <li> <p><code>--set-credential=passwd.hashed-password.root:$(mkpasswd mysecret)</code> passes a <em>credential</em> to the container. Credentials are bits of data that you can pass to systemd services and whole systems. They are actually awesome concepts (e.g. they support TPM2 authentication/encryption that just works!) but I am not going to go into details around that, given it's off-topic in this specific scenario. Here we just take advantage of the fact that <code>systemd-sysusers</code> looks for a credential called <code>passwd.hashed-password.root</code> to initialize the root password of the system from. We set it to <code>mysecret</code>. This means once the system is booted up we can log in as <code>root</code> with the supplied password. Yay. (Remember, <code>/etc/</code> is initially empty on this container, and thus also carries no <code>/etc/passwd</code> or <code>/etc/shadow</code>, and thus has no root user record, and thus no root password.)</p> <p><a href="https://linux.die.net/man/1/mkpasswd"><code>mkpasswd</code></a> is a tool that converts a plain text password into a UNIX hashed password, which is what this specific credential expects.</p> </li> <li> <p>Similarly, <code>--set-credential=firstboot.locale:C.UTF-8</code> tells the <a href="https://www.freedesktop.org/software/systemd/man/systemd-firstboot.html"><code>systemd-firstboot</code></a> service in the container to initialize <code>/etc/locale.conf</code> with this locale.</p> </li> <li> <p><code>--bind-user=lennart</code> binds the host user <code>lennart</code> into the container, also as user <code>lennart</code>. This does two things: it mounts the host user's home directory into the container. 
It also copies a minimal user record of the specified user into the container that <a href="https://www.freedesktop.org/software/systemd/man/nss-systemd.html"><code>nss-systemd</code></a> then picks up and includes in the regular user database. This means, once the container is booted up I can log in as <code>lennart</code> with my regular password, and once I have logged in I will see my regular host home directory, and can make changes to it. Yippieh! (This does a couple more things, such as UID mapping and things, but let's not get lost in too many details.)</p> </li> </ul> <p>So, if I run this, I will very quickly get a login prompt, where I can log in as my regular user. I have full access to my host home directory, but otherwise everything is nicely isolated from the host, and changes outside of the home directory are either prohibited or are volatile, i.e. go to a <code>tmpfs</code> instance whose lifetime is bound to the container's lifetime: when I shut down the container I just started, then any changes outside of my user's home directory are lost.</p> <p>Note that while here I use <code>--volatile=yes</code> in combination with <code>--directory=/</code> you can actually use it on any OS hierarchy, i.e. just about any directory that contains OS binaries.</p> <p>Similarly, the <code>--bind-user=</code> stuff works with any OS hierarchy too (but do note that only systemd 249 and newer will pick up the user records passed to the container that way, i.e. 
this requires at least v249 both on the host and in the container to work).</p> <p>Or in short: the possibilities are endless!</p> <h2>Requirements</h2> <p>For this all to work, you need:</p> <ol> <li> <p>A recent kernel (5.15 should suffice, as it brings UID mapped mounts for the most common file systems, so that <code>-U</code> and <code>--bind-user=</code> can work well.)</p> </li> <li> <p>A recent systemd (249 should suffice, which brings <code>--bind-user=</code>, and a <code>-U</code> switch backed by UID mapped mounts).</p> </li> <li> <p>A distribution that adopted the <code>/usr/</code>-merge, <code>systemd-tmpfiles</code> and <code>systemd-sysusers</code> so that the directory hierarchy and user databases are automatically populated when empty at boot. (Fedora 35 should suffice.)</p> </li> </ol> <h2>Limitations</h2> <p>While a lot of today's software actually works well out of the box on systems that come up with an unpopulated <code>/etc/</code> and <code>/var/</code>, and either falls back to reasonable built-in defaults, or deploys <code>systemd-tmpfiles</code> to create what is missing, things aren't perfect: some software typically installed on desktop OSes will fail to start when invoked in such a container, and be visible as ugly failed services, but it won't stop me from logging in and using the system for what I want to use it. It would be excellent to get that fixed, though. This can either be fixed in the relevant software upstream (i.e. if opening your configuration file fails with <code>ENOENT</code>, then just default to reasonable defaults), or in the distribution packaging (i.e. 
add a <a href="https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html"><code>tmpfiles.d/</code></a> file that copies or symlinks in skeleton configuration from <code>/usr/share/factory/etc/</code> via the <code>C</code> or <code>L</code> line types).</p> <p>And then there's certain software dealing with hardware management and similar that simply cannot reasonably work in a container (as device APIs on Linux are generally not virtualized for containers). It would be excellent if software like that would be updated to carry <code>ConditionVirtualization=!container</code> or <code>ConditionPathIsReadWrite=/sys</code> conditionalization in their unit files, so that it is automatically – cleanly – skipped when executed in such a container environment.</p> <p>And that's all for now.</p> <p>Lennart Poettering, Wed, 06 Apr 2022 00:00:00 +0200</p> <p><a href="https://0pointer.net/blog/authenticated-boot-and-disk-encryption-on-linux.html">Authenticated Boot and Disk Encryption on Linux</a></p> <h1>The Strange State of Authenticated Boot and Disk Encryption on Generic Linux Distributions</h1> <p><em>TL;DR: Linux has been supporting Full Disk Encryption (FDE) and technologies such as UEFI SecureBoot and TPMs for a long time. However, the way they are set up by most distributions is not as secure as it should be, and in some ways quite frankly weird. In fact, right now, your data is probably more secure if stored on current ChromeOS, Android, Windows or MacOS devices, than it is on typical Linux distributions.</em></p> <p>Generic Linux distributions (i.e. Debian, Fedora, Ubuntu, …) adopted Full Disk Encryption (FDE) more than 15 years ago, with the LUKS/cryptsetup infrastructure. It was a big step forward to a more secure environment. Almost ten years ago the big distributions started adding UEFI SecureBoot to their boot process. 
Support for Trusted Platform Modules (TPMs) has been added to the distributions a long time ago as well — but even though many PCs/laptops these days have TPM chips on-board it's generally not used in the default setup of generic Linux distributions.</p> <p>How these technologies currently fit together on generic Linux distributions doesn't really make too much sense to me — and falls short of what they could actually deliver. In this story I'd like to have a closer look at why I think that, and what I propose to do about it.</p> <h2>The Basic Technologies</h2> <p>Let's have a closer look what these technologies actually deliver:</p> <ol> <li> <p>LUKS/<code>dm-crypt</code>/<code>cryptsetup</code> provide disk encryption, and optionally data authentication. Disk encryption means that reading the data in clear-text form is only possible if you possess a secret of some form, usually a password/passphrase. Data authentication means that no one can make changes to the data on disk unless they possess a secret of some form. Most distributions only enable the former though — the latter is a more recent addition to LUKS/cryptsetup, and is not used by default on most distributions (though it probably should be). Closely related to LUKS/<code>dm-crypt</code> is <code>dm-verity</code> (which can authenticate immutable volumes) and <code>dm-integrity</code> (which can authenticate writable volumes, among other things).</p> </li> <li> <p>UEFI SecureBoot provides mechanisms for authenticating boot loaders and other pre-OS binaries before they are invoked. If those boot loaders then authenticate the next step of booting in a similar fashion there's a chain of trust which can ensure that only code that has some level of trust associated with it will run on the system. Authentication of boot loaders is done via cryptographic signatures: the OS/boot loader vendors cryptographically sign their boot loader binaries. 
The cryptographic certificates that may be used to validate these signatures are then signed by Microsoft, and since Microsoft's certificates are basically built into all of today's PCs and laptops this will provide some basic trust chain: if you want to modify the boot loader of a system you must have access to the private key used to sign the code (or to the private keys further up the certificate chain).</p> </li> <li> <p>TPMs do many things. For this text we'll focus on one facet: they can be used to protect secrets (for example for use in disk encryption, see above), that are released only if the code that booted the host can be authenticated in some form. This works roughly like this: every component that is used during the boot process (i.e. code, certificates, configuration, …) is hashed with a cryptographic hash function before it is used. The resulting hash is written to some small volatile memory the TPM maintains that is write-only (the so-called Platform Configuration Registers, "PCRs"): each step of the boot process will write hashes of the resources needed by the next part of the boot process into these PCRs. The PCRs cannot be written freely: the hashes written are combined with what is already stored in the PCRs — also through hashing — and the result of that then replaces the previous value. Effectively this means: only if every component involved in the boot matches expectations will the hash values exposed in the TPM PCRs match the expected values too. And if you then use those values to unlock the secrets you want to protect you can guarantee that the key is only released to the OS if the expected OS and configuration is booted. The process of hashing the components of the boot process and writing that to the TPM PCRs is called "measuring". 
What's also important to mention is that the secrets are not only protected by these PCR values but encrypted with a "seed key" that is generated on the TPM chip itself, and cannot leave the TPM (at least so goes the theory). The idea is that you cannot read out a TPM's seed key, and thus you cannot duplicate the chip: unless you possess the original, physical chip you cannot retrieve the secret it might be able to unlock for you. Finally, TPMs can enforce a limit on unlock attempts per time ("anti-hammering"): this makes it hard to brute force things: if you can only execute a certain number of unlock attempts within some specific time then brute forcing will be prohibitively slow.</p> </li> </ol> <h2>How Linux Distributions use these Technologies</h2> <p>As mentioned already, Linux distributions adopted the first two of these technologies widely, the third one not so much.</p> <p>So typically, here's how the boot process of Linux distributions works these days:</p> <ol> <li> <p>The UEFI firmware invokes a piece of code called "shim" (which is stored in the EFI System Partition — the "ESP" — of your system), that more or less is just a list of certificates compiled into code form. The shim is signed with the aforementioned Microsoft key, that is built into all PCs/laptops. This list of certificates then can be used to validate the next step of the boot process. The shim is measured by the firmware into the TPM. (Well, the shim can do a bit more than what I describe here, but this is outside of the focus of this article.)</p> </li> <li> <p>The shim then invokes a boot loader (often Grub) that is signed by a private key owned by the distribution vendor. The boot loader is stored in the ESP as well, plus some other places (i.e. possibly a separate boot partition). The corresponding certificate is included in the list of certificates built into the shim. 
The boot loader components are also measured into the TPM.</p> </li> <li> <p>The boot loader then invokes the kernel and passes it an initial RAM disk image (initrd), which contains initial userspace code. The kernel itself is signed by the distribution vendor too. It's also validated via the shim. The initrd is not validated, though (!). The kernel is measured into the TPM, the initrd sometimes too.</p> </li> <li> <p>The kernel unpacks the initrd image, and invokes what is contained in it. Typically, the initrd then asks the user for a password for the encrypted root file system. The initrd then uses that to set up the encrypted volume. No code authentication or TPM measurements take place.</p> </li> <li> <p>The initrd then transitions into the root file system. No code authentication or TPM measurements take place.</p> </li> <li> <p>When the OS itself is up the user is prompted for their user name, and their password. If correct, this will unlock the user account: the system is now ready to use. At this point no code authentication, no TPM measurements take place. Moreover, the user's password is not used to unlock any data, it's used only to allow or deny the login attempt — the user's data has already been decrypted a long time ago, by the initrd, as mentioned above.</p> </li> </ol> <p>What you'll notice here of course is that code validation happens for the shim, the boot loader and the kernel, but not for the initrd or the main OS code anymore. TPM measurements might go one step further: the initrd is measured sometimes too, if you are lucky. Moreover, you might notice that the disk encryption password and the user password are inquired by code that is not validated, and is thus not safe from external manipulation. 
You might also notice that even though TPM measurements of boot loader/OS components are done nothing actually ever makes use of the resulting PCRs in the typical setup.</p> <h2>Attack Scenarios</h2> <p>Of course, before determining whether the setup described above makes sense or not, one should have an idea what one actually intends to protect against.</p> <p>The most basic attack scenario to focus on is probably that you want to be reasonably sure that if someone steals your laptop that contains all your data then this data remains confidential. The model described above probably delivers that to some degree: the full disk encryption when used with a reasonably strong password should make it hard for the laptop thief to access the data. The data is as secure as the password used is strong. The attacker might attempt to brute force the password, thus if the password is not chosen carefully the attacker might be successful.</p> <p>Two more interesting attack scenarios go something like this:</p> <ol> <li> <p>Instead of stealing your laptop the attacker takes the harddisk from your laptop while you aren't watching (e.g. while you went for a walk and left it at home or in your hotel room), makes a copy of it, and then puts it back. You'll never notice they did that. The attacker then analyzes the data in their lab, maybe trying to brute force the password. In this scenario you won't even know that your data is at risk, because for you nothing changed — unlike in the basic scenario above. If the attacker manages to break your password they have full access to the data included on it, i.e. everything you so far stored on it, but not necessarily on what you are going to store on it later. This scenario is worse than the basic one mentioned above, for the simple fact that you won't know that you might be attacked. 
(This scenario could be extended further: maybe the attacker has a chance to watch you type in your password or so, effectively lowering the password strength.)</p> </li> <li> <p>Instead of stealing your laptop the attacker takes the harddisk from your laptop while you aren't watching, inserts backdoor code on it, and puts it back. In this scenario you won't know your data is at risk, because physically everything is as before. What's really bad though is that the attacker gets access to anything you do on your laptop, both the data already on it, and whatever you will do in the future.</p> </li> </ol> <p>I think in particular this backdoor attack scenario is something we should be concerned about. We know for a fact that attacks like that happen all the time (Pegasus, industrial espionage, …), hence we should make them hard.</p> <h2>Are we Safe?</h2> <p>So, does the scheme so far implemented by generic Linux distributions protect us against the latter two scenarios? Unfortunately not at all. Because distributions set up disk encryption the way they do, and only bind it to a user password, an attacker can easily duplicate the disk, and then attempt to brute force your password. What's worse: since code authentication ends at the kernel — and the initrd is not authenticated anymore —, backdooring is trivially easy: an attacker can change the initrd any way they want, without having to fight any kind of protections. And given that FDE unlocking is implemented in the initrd, and it's the initrd that asks for the encryption password things are just too easy: an attacker could trivially insert some code that picks up the FDE password as you type it in and sends it wherever they want. 
And not just that: since once they are in they are in, they can do anything they like for the rest of the system's lifecycle, with full privileges — including installing backdoors for versions of the OS or kernel that are installed on the device in the future, so that their backdoor remains open for as long as they like.</p> <p>That is sad of course. It's particularly sad given that the other popular OSes all address this much better. ChromeOS, Android, Windows and MacOS all have way better built-in protections against attacks like this. And it's why one can certainly claim that your data is probably better protected right now if you store it on those OSes than it is on generic Linux distributions.</p> <p>(Yeah, I know that there are some niche distros which do this better, and some hackers hack their own. But I care about general purpose distros here, i.e. the big ones, that most people base their work on.)</p> <p>Note that there are more problems with the current setup. For example, it's really weird that during boot the user is queried for an FDE password which actually protects their data, and then once the system is up they are queried again – now asking for a username, and another password. And the weird thing is that this second authentication that appears to be user-focused doesn't really protect the user's data anymore — at that moment the data is already unlocked and accessible. 
The username/password query is supposed to be useful in multi-user scenarios of course, but how does that make any sense, given that these multiple users would all have to know a disk encryption password that unlocks the whole thing during the FDE step, and thus they have access to every user's data anyway if they make an offline copy of the harddisk?</p> <h2>Can we do better?</h2> <p>Of course we can, and that is what this story is actually supposed to be about.</p> <p>Let's first figure out what the minimal issues we should fix are (at least in my humble opinion):</p> <ol> <li> <p>The initrd must be authenticated before being booted into. (And measured unconditionally.)</p> </li> <li> <p>The OS binary resources (i.e. <code>/usr/</code>) must be authenticated before being booted into. (But don't need to be encrypted, since everyone has the same anyway, there's nothing to hide here.)</p> </li> <li> <p>The OS configuration and state (i.e. <code>/etc/</code> and <code>/var/</code>) must be encrypted, and authenticated before they are used. The encryption key should be bound to the TPM device; i.e. system data should be locked to a security concept belonging to the system, not the user.</p> </li> <li> <p>The user's home directory (i.e. <code>/home/lennart/</code> and similar) must be encrypted and authenticated. The unlocking key should be bound to a user password or user security token (FIDO2 or PKCS#11 token); i.e. user data should be locked to a security concept belonging to the user, not the system.</p> </li> </ol> <p>Or to summarize this differently:</p> <ol> <li> <p>Every single component of the boot process and OS needs to be authenticated, i.e. 
all of shim (done), boot loader (done), kernel (done), initrd (missing so far), OS binary resources (missing so far), OS configuration and state (missing so far), the user's home (missing so far).</p> </li> <li> <p>Encryption is necessary for the OS configuration and state (bound to TPM), and for the user's home directory (bound to a user password or user security token).</p> </li> </ol> <h2>In Detail</h2> <p>Let's see how we can achieve the above in more detail.</p> <h3>How to Authenticate the initrd</h3> <p>At the moment initrds are generated on the installed host via scripts (dracut and similar) that try to figure out a minimal set of binaries and configuration data to build an initrd that contains just enough to be able to find and set up the root file system. What is included in the initrd hence depends highly on the individual installation and its configuration. Pretty likely no two initrds generated that way will be fully identical due to this. This model clearly has benefits: the initrds generated this way are very small and minimal, and support exactly what is necessary for the system to boot, and not less or more. It comes with serious drawbacks too though: the generation process is fragile and sometimes more akin to black magic than following clear rules: the generator script natively has to understand a myriad of storage stacks to determine what needs to be included and what not. It also means that authenticating the image is hard: given that each individual host gets a different specialized initrd, it means we cannot just sign the initrd with the vendor key like we sign the kernel. If we want to keep this design we'd have to figure out some other mechanism (e.g. a per-host signature key – that is generated locally; or by authenticating it with a message authentication code bound to the TPM). 
While these approaches are certainly thinkable, I am not convinced they actually are a good idea though: locally and dynamically generated per-host initrds is something we probably should move away from.</p> <p>If we move away from locally generated initrds, things become a lot simpler. If the distribution vendor generates the initrds on their build systems then it can be attached to the kernel image itself, and thus be signed and measured along with the kernel image, without any further work. This simplicity is simply lovely. Besides robustness and reproducibility this gives us an easy route to authenticated initrds.</p> <p>But of course, nothing is really that simple: working with vendor-generated initrds means that we can't adjust them anymore to the specifics of the individual host: if we pre-build the initrds and include them in the kernel image in immutable fashion then it becomes harder to support complex, more exotic storage or to parameterize it with local network server information, credentials, passwords, and so on. Now, for my simple laptop use-case these things don't matter, there's no need to extend/parameterize things, laptops and their setups are not that wildly different. But what to do about the cases where we want both: extensibility to cover for less common storage subsystems (iscsi, LVM, multipath, drivers for exotic hardware…) and parameterization?</p> <p>Here's a proposal how to achieve that: let's build a basic initrd into the kernel as suggested, but then do two things to make this scheme both extensible and parameterizable, without compromising security.</p> <ol> <li> <p>Let's define a way how the basic initrd can be extended with additional files, which are stored in separate "extension images". 
The basic initrd should be able to discover these extension images, authenticate them and then activate them, thus extending the initrd with additional resources on-the-fly.</p> </li> <li> <p>Let's define a way how we can safely pass additional parameters to the kernel/initrd (and actually the rest of the OS, too) in an authenticated (and possibly encrypted) fashion. Parameters in this context can be anything specific to the local installation, i.e. server information, security credentials, certificates, SSH server keys, or even just the root password that shall be able to unlock the root account in the initrd …</p> </li> </ol> <p>In such a scheme we should be able to deliver everything we are looking for:</p> <ol> <li> <p>We'll have a full trust chain for the code: the boot loader will authenticate and measure the kernel and basic initrd. The initrd extension images will then be authenticated by the basic initrd image.</p> </li> <li> <p>We'll have authentication for all the parameters passed to the initrd.</p> </li> </ol> <p>This so far sounds very unspecific? Let's make it more specific by looking closer at the components I'd suggest to be used for this logic:</p> <ol> <li> <p>For a few months now the <code>systemd</code> suite has contained a subsystem implementing <em>system</em> <em>extensions</em> (v248). System extensions are ultimately just disk images (for example a squashfs file system in a GPT envelope) that can <em>extend</em> an underlying OS tree. Extending in this regard means they simply add additional files and directories into the OS tree, i.e. below <code>/usr/</code>. For a longer explanation see <a href="https://www.freedesktop.org/software/systemd/man/systemd-sysext.html">systemd-sysext(8)</a>. When a system extension is activated it is simply mounted and then merged into the main <code>/usr/</code> tree via a read-only overlayfs mount. 
Now what's particularly nice about them in the context we are talking about here is that the extension images may carry <em>dm-verity</em> authentication data, and PKCS#7 signatures (<a href="https://github.com/systemd/systemd/pull/20691">once this is merged, that is, i.e. v250</a>).</p> </li> <li> <p>The <code>systemd</code> suite also contains a concept called service "credentials". These are small pieces of information passed to services in a secure way. One key feature of these credentials is that they can be encrypted and authenticated in a very simple way with a key bound to the TPM (v250). See <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#LoadCredential=ID:PATH">LoadCredentialEncrypted=</a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd-creds.html">systemd-creds(1)</a> for details. They are great for safely storing SSL private keys and similar on your system, but they also come in handy for parameterizing initrds: an encrypted credential is just a file that can only be decoded if the right TPM is around with the right PCR values set.</p> </li> <li> <p>The <code>systemd</code> suite contains a component called <a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html">systemd-stub(7)</a>. It's an EFI stub, i.e. a small piece of code that is attached to a kernel image, and turns the kernel image into a regular EFI binary that can be directly executed by the firmware (or a boot loader). This stub has a number of nice features (for example, it can show a boot splash before invoking the Linux kernel itself and such). 
<a href="https://github.com/systemd/systemd/pull/20789">Once this work is merged (v250)</a> the stub will support one more feature: it will automatically search for system extension image files and credential files next to the kernel image file, measure them and pass them on to the main initrd of the host.</p> </li> </ol> <p>Putting this together we have a nice way to provide fully authenticated kernel images, initrd images and initrd extension images; as well as encrypted and authenticated parameters via the credentials logic.</p> <p>How would a distribution actually make use of this? A distribution vendor would pre-build the basic initrd, and glue it into the kernel image, and sign that as a whole. Then, for each supposed extension of the basic initrd (e.g. one for iscsi support, one for LVM, one for multipath, …), the vendor would use a tool such as <a href="https://github.com/systemd/mkosi">mkosi</a> to build an extension image, i.e. a GPT disk image containing the files in squashfs format, a Verity partition that authenticates it, plus a PKCS#7 signature partition that validates the root hash for the dm-verity partition, and that can be checked against a key provided by the boot loader or main initrd. Then, any parameters for the initrd will be encrypted using <a href="https://www.freedesktop.org/software/systemd/man/systemd-creds.html">systemd-creds encrypt -T</a>. The resulting encrypted credentials and the initrd extension images are then simply placed next to the kernel image in the ESP (or boot partition). Done.</p> <p>This checks all boxes: everything is authenticated and measured, the credentials also encrypted. Things remain extensible and modular, can be pre-built by the vendor, and installation is as simple as dropping in one file for each extension and/or credential.</p> <h3>How to Authenticate the Binary OS Resources</h3> <p>Let's now have a look how to authenticate the Binary OS resources, i.e. the stuff you find in <code>/usr/</code>, i.e. 
the stuff traditionally shipped to the user's system via RPMs or DEBs.</p> <p>I think there are three relevant ways how to authenticate this:</p> <ol> <li> <p>Make <code>/usr/</code> a <code>dm-verity</code> volume. <code>dm-verity</code> is a concept implemented in the Linux kernel that provides authenticity to read-only block devices: every read access is cryptographically verified against a <em>top-level</em> <em>hash</em> <em>value</em>. This top-level hash is typically a 256-bit value that you can either encode in the kernel image you are using, or cryptographically sign (<a href="https://github.com/systemd/systemd/pull/20691">which is particularly nice once this is merged</a>). I think this is actually the best approach since it makes the <code>/usr/</code> tree entirely immutable in a very simple way. However, this also means that the whole of <code>/usr/</code> needs to be updated at once, i.e. the traditional <code>rpm</code>/<code>apt</code> based update logic cannot work in this mode.</p> </li> <li> <p>Make <code>/usr/</code> a <code>dm-integrity</code> volume. <code>dm-integrity</code> is a concept provided by the Linux kernel that offers integrity guarantees to writable block devices, i.e. in some ways it can be considered to be a bit like <code>dm-verity</code> while permitting write access. It can be used in three ways, one of which I think is particularly relevant here. The first way is with a simple hash function in "stand-alone" mode: this is not too interesting here, it just provides greater data safety for file systems that don't hash check their files' data on their own. The second way is in combination with <code>dm-crypt</code>, i.e. with disk encryption. 
In this case it adds authenticity to confidentiality: only if you know the right secret you can read and make changes to the data, and any attempt to make changes without knowing this secret key will be detected as IO error on next read by those in possession of the secret (more about this below). The third way is the one I think is most interesting here: in "stand-alone" mode, but with a keyed hash function (e.g. HMAC). What's this good for? This provides authenticity without encryption: if you make changes to the disk without knowing the secret this will be noticed on the next read attempt of the data and result in IO errors. This mode provides what we want (authenticity) and doesn't do what we don't need (encryption). Of course, the secret key for the HMAC must be provided somehow, I think ideally by the TPM.</p> </li> <li> <p>Make <code>/usr/</code> a <code>dm-crypt</code> (LUKS) + <code>dm-integrity</code> volume. This provides both authenticity and encryption. The latter isn't typically needed for <code>/usr/</code> given that it generally contains no secret data: anyone can download the binaries off the Internet anyway, and the sources too. By encrypting this you'll waste CPU cycles, but beyond that it doesn't hurt much. (Admittedly, some people might want to hide the precise set of packages they have installed, since it of course does reveal a bit of information about you: i.e. what you are working on, maybe what your job is – think: if you are a hacker you have hacking tools installed – and similar). Going this way might simplify things in some cases, as it means you don't have to distinguish "OS binary resources" (i.e. <code>/usr/</code>) and "OS configuration and state" (i.e. <code>/etc/</code> + <code>/var/</code>, see below), and just make it the same volume. Here too, the secret key must be provided somehow, I think ideally by the TPM.</p> </li> </ol> <p>All three approaches are valid. 
The first approach has my primary sympathies, but for distributions not willing to abandon client-side updates via RPM/dpkg this is not an option, in which case I would propose the other two approaches for these cases.</p> <p>The LUKS encryption key (and in case of <code>dm-integrity</code> standalone mode the key for the keyed hash function) should be bound to the TPM. Why the TPM for this? You could also use a user password, a FIDO2 or PKCS#11 security token — but I think TPM is the right choice: why that? To reduce the requirement for repeated authentication, i.e. that you first have to provide the disk encryption password, and then you have to login, providing another password. It should be possible that the system boots up unattended and then only one authentication prompt is needed to unlock the user's data properly. The TPM provides a way to do this in a reasonably safe and fully unattended way. Also, when we stop considering just the laptop use-case for a moment: on servers interactive disk encryption prompts don't make much sense — the fact that TPMs can provide secrets without this requiring user interaction and thus the ability to work in entirely unattended environments is quite desirable. Note that <a href="https://www.freedesktop.org/software/systemd/man/crypttab.html">crypttab(5)</a> as implemented by <code>systemd</code> (v248) provides native support for authentication via password, via TPM2, via PKCS#11 or via FIDO2, so the choice is ultimately all yours.</p> <h3>How to Encrypt/Authenticate OS Configuration and State</h3> <p>Let's now look at the OS configuration and state, i.e. the stuff in <code>/etc/</code> and <code>/var/</code>. It probably makes sense to not consider these two hierarchies independently but instead just consider this to be the root file system. 
If the OS binary resources are in a separate file system it is then mounted onto the <code>/usr/</code> sub-directory of the root file system.</p> <p>The OS configuration and state (or: root file system) should be both encrypted and authenticated: it might contain secret keys, user passwords, privileged logs and similar. This data matters and contains plenty of data that should remain confidential.</p> <p>The encryption of choice here is <code>dm-crypt</code> (LUKS) + <code>dm-integrity</code> similar as discussed above, again with the key bound to the TPM.</p> <p>If the OS binary resources are protected the same way it is safe to merge these two volumes and have a single partition for both (see above).</p> <h3>How to Encrypt/Authenticate the User's Home Directory</h3> <p>The data in the user's home directory should be encrypted, and bound to the user's preferred token of authentication (i.e. a password or FIDO2/PKCS#11 security token). As mentioned, in the traditional mode of operation the user's home directory is not individually encrypted, but only encrypted because FDE is in use. The encryption key for that is a system wide key though, not a per-user key. And I think that's a problem, as mentioned (and probably not even generally understood by our users). We should correct that and ensure that the user's password is what unlocks the user's data.</p> <p>In the <code>systemd</code> suite we provide a service <a href="https://www.freedesktop.org/software/systemd/man/systemd-homed.service.html">systemd-homed(8)</a> (v245) that implements this in a safe way: each user gets its own LUKS volume stored in a loopback file in <code>/home/</code>, and this is enough to synthesize a user account. The encryption password for this volume is the user's account password, thus it's really the password provided at login time that unlocks the user's data. <code>systemd-homed</code> also supports other mechanisms of authentication, in particular PKCS#11/FIDO2 security tokens. 
It also provides support for other storage back-ends (such as fscrypt), but I'd always suggest to use the LUKS back-end since it's the only one providing the comprehensive confidentiality guarantees one wants for a UNIX-style home directory.</p> <p>Note that there's one special caveat here: if the user's home directory (e.g. <code>/home/lennart/</code>) is encrypted and authenticated, what about the file system this data is stored on, i.e. <code>/home/</code> itself? If that dir is part of the root file system this would result in double encryption: first the data is encrypted with the TPM root file system key, and then again with the per-user key. Such double encryption is a waste of resources, and unnecessary. I'd thus suggest to make <code>/home/</code> its own <code>dm-integrity</code> volume with a HMAC, keyed by the TPM. This means the data stored directly in <code>/home/</code> will be authenticated but not encrypted. That's good not only for performance, but also has practical benefits: it allows extracting the encrypted volume of the various users in case the TPM key is lost, as a way to recover from dead laptops or similar.</p> <p>Why authenticate <code>/home/</code>, if it only contains per-user home directories that are authenticated on their own anyway? That's a valid question: it's because the kernel file system maintainers made clear that Linux file system code is not considered safe against rogue disk images, and is not tested for that; this means before you mount anything you need to establish trust in some way because otherwise there's a risk that the act of mounting might exploit your kernel.</p> <h3>Summary of Resources and their Protections</h3> <p>So, let's now put this all together. 
Here's a table showing the various resources we deal with, and how I think they should be protected (in my idealized world).</p> <table> <thead> <tr> <th>Resource</th> <th>Needs Authentication</th> <th>Needs Encryption</th> <th>Suggested Technology</th> <th>Validation/Encryption Keys/Certificates acquired via</th> <th>Stored where</th> </tr> </thead> <tbody> <tr> <td>Shim</td> <td>yes</td> <td>no</td> <td>SecureBoot signature verification</td> <td>firmware certificate database</td> <td>ESP</td> </tr> <tr> <td>Boot loader</td> <td>yes</td> <td>no</td> <td>ditto</td> <td>firmware certificate database/shim</td> <td>ESP/boot partition</td> </tr> <tr> <td>Kernel</td> <td>yes</td> <td>no</td> <td>ditto</td> <td>ditto</td> <td>ditto</td> </tr> <tr> <td>initrd</td> <td>yes</td> <td>no</td> <td>ditto</td> <td>ditto</td> <td>ditto</td> </tr> <tr> <td>initrd parameters</td> <td>yes</td> <td>yes</td> <td>systemd TPM encrypted credentials</td> <td>TPM</td> <td>ditto</td> </tr> <tr> <td>initrd extensions</td> <td>yes</td> <td>no</td> <td><code>systemd-sysext</code> with Verity+PKCS#7 signatures</td> <td>firmware/initrd certificate database</td> <td>ditto</td> </tr> <tr> <td>OS binary resources</td> <td>yes</td> <td>no</td> <td><code>dm-verity</code></td> <td>root hash linked into kernel image, or firmware/initrd certificate database</td> <td>top-level partition</td> </tr> <tr> <td>OS configuration and state</td> <td>yes</td> <td>yes</td> <td><code>dm-crypt</code> (LUKS) + <code>dm-integrity</code></td> <td>TPM</td> <td>top-level partition</td> </tr> <tr> <td><code>/home/</code> itself</td> <td>yes</td> <td>no</td> <td><code>dm-integrity</code> with HMAC</td> <td>TPM</td> <td>top-level partition</td> </tr> <tr> <td>User home directories</td> <td>yes</td> <td>yes</td> <td><code>dm-crypt</code> (LUKS) + <code>dm-integrity</code> in loopback files</td> <td>User password/FIDO2/PKCS#11 security token</td> <td>loopback file inside <code>/home</code> partition</td> </tr> </tbody> 
</table> <p>This should provide all the desired guarantees: everything is authenticated, and the individualized per-host or per-user data is also encrypted. No double encryption takes place. The encryption keys/verification certificates are stored/bound to the most appropriate infrastructure.</p> <p>Does this address the three attack scenarios mentioned earlier? I think so, yes. The basic attack scenario I described is addressed by the fact that <code>/var/</code>, <code>/etc/</code> and <code>/home/*/</code> are encrypted. Brute forcing the former two is harder than in the status quo ante model, since a high entropy key is used instead of one derived from a user provided password. Moreover, the "anti-hammering" logic of the TPM will make brute forcing prohibitively slow. The home directories are protected by the user's password or ideally a personal FIDO2/PKCS#11 security token in this model. Of course, a password isn't better security-wise than the status quo ante. But given the FIDO2/PKCS#11 support built into <code>systemd-homed</code> it should be easier to lock down the home directories securely.</p> <p>Binding encryption of <code>/var/</code> and <code>/etc/</code> to the TPM also addresses the first of the two more advanced attack scenarios: a copy of the hard disk is useless without the physical TPM chip, since the seed key is sealed into that. (And even if the attacker had the chance to watch you type in your password, it won't help unless they possess access to the TPM chip.) For the home directory this attack is not addressed as long as a plain password is used. 
However, since binding home directories to FIDO2/PKCS#11 tokens is built into <code>systemd-homed</code> things should be safe here too — provided the user actually possesses and uses such a device.</p> <p>The backdoor attack scenario is addressed by the fact that every resource in play now is authenticated: it's hard to backdoor the OS if there's no component that isn't verified by signature keys or TPM secrets the attacker hopefully doesn't know.</p> <p>For general purpose distributions that focus on updating the OS per RPM/dpkg the idealized model above won't work out, since (as mentioned) this implies an immutable <code>/usr/</code>, and thus requires updating <code>/usr/</code> via an atomic update operation. For such distros a setup like the following is probably more realistic, but see above.</p> <table> <thead> <tr> <th>Resource</th> <th>Needs Authentication</th> <th>Needs Encryption</th> <th>Suggested Technology</th> <th>Validation/Encryption Keys/Certificates acquired via</th> <th>Stored where</th> </tr> </thead> <tbody> <tr> <td>Shim</td> <td>yes</td> <td>no</td> <td>SecureBoot signature verification</td> <td>firmware certificate database</td> <td>ESP</td> </tr> <tr> <td>Boot loader</td> <td>yes</td> <td>no</td> <td>ditto</td> <td>firmware certificate database/shim</td> <td>ESP/boot partition</td> </tr> <tr> <td>Kernel</td> <td>yes</td> <td>no</td> <td>ditto</td> <td>ditto</td> <td>ditto</td> </tr> <tr> <td>initrd</td> <td>yes</td> <td>no</td> <td>ditto</td> <td>ditto</td> <td>ditto</td> </tr> <tr> <td>initrd parameters</td> <td>yes</td> <td>yes</td> <td>systemd TPM encrypted credentials</td> <td>TPM</td> <td>ditto</td> </tr> <tr> <td>initrd extensions</td> <td>yes</td> <td>no</td> <td><code>systemd-sysext</code> with Verity+PKCS#7 signatures</td> <td>firmware/initrd certificate database</td> <td>ditto</td> </tr> <tr> <td>OS binary resources, configuration and state</td> <td>yes</td> <td>yes</td> <td><code>dm-crypt</code> (LUKS) + 
<code>dm-integrity</code></td> <td>TPM</td> <td>top-level partition</td> </tr> <tr> <td><code>/home/</code> itself</td> <td>yes</td> <td>no</td> <td><code>dm-integrity</code> with HMAC</td> <td>TPM</td> <td>top-level partition</td> </tr> <tr> <td>User home directories</td> <td>yes</td> <td>yes</td> <td><code>dm-crypt</code> (LUKS) + <code>dm-integrity</code> in loopback files</td> <td>User password/FIDO2/PKCS#11 security token</td> <td>loopback file inside <code>/home</code> partition</td> </tr> </tbody> </table> <p>This means there's only one root file system that contains all of <code>/etc/</code>, <code>/var/</code> and <code>/usr/</code>.</p> <h2>Recovery Keys</h2> <p>When binding encryption to TPMs one problem that arises is what strategy to adopt if the TPM is lost, due to hardware failure: if I need the TPM to unlock my encrypted volume, what do I do if I need the data but lost the TPM?</p> <p>The answer here is supporting recovery keys (this is similar to how other OSes approach this). Recovery keys are pretty much the same concept as passwords. The main difference is that they are computer-generated rather than user-chosen. Because of that they typically have much higher entropy (which makes them more annoying to type in, i.e. you want to use them only when you must, not day-to-day). By having higher entropy they are useful in combination with TPM, FIDO2 or PKCS#11 based unlocking: unlike a combination with passwords they do not compromise the higher strength of protection that TPM/FIDO2/PKCS#11 based unlocking is supposed to provide.</p> <p>Current versions of <a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptenroll.html">systemd-cryptenroll(1)</a> implement a recovery key concept in an attempt to address this problem. You may enroll any combination of TPM chips, PKCS#11 tokens, FIDO2 tokens, recovery keys and passwords on the same LUKS volume. 
When enrolling a recovery key it is generated and shown on screen both in text form and as a QR code you can scan off screen if you like. The idea is to write down/store this recovery key at a safe place so that you can use it when you need it. Note that such recovery keys can be entered wherever a LUKS password is requested, i.e. after generation they behave pretty much the same as a regular password.</p> <h2>TPM PCR Brittleness</h2> <p>Locking devices to TPMs and enforcing a PCR policy with this (i.e. configuring the TPM key to be unlockable only if certain PCRs match certain values, and thus requiring the OS to be in a certain state) brings a problem with it: TPM PCR brittleness. If the key you want to unlock with the TPM requires the OS to be in a specific state (i.e. that all OS components' hashes match certain expectations or similar) then doing OS updates might have the effect of making your key inaccessible: the OS updates will cause the code to change, and thus the hashes of the code, and thus certain PCRs. (Thankfully, you enrolled a recovery key, as described above, so this doesn't mean you lost your data, right?)</p> <p>To address this I'd suggest three strategies:</p> <ol> <li> <p>Most importantly: don't actually use the TPM PCRs that contain code hashes. There are actually <a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptenroll.html#id-1.6.3.9.2.2">multiple PCRs defined</a>, each containing measurements of different aspects of the boot process. My recommendation is to bind keys to PCR 7 only, a PCR that contains measurements of the UEFI SecureBoot certificate databases. Thus, the keys will remain accessible as long as these databases remain the same, and updates to code will not affect it (updates to the certificate databases will, and they do happen too, though hopefully much less frequently than code updates). Does this reduce security? 
Not much, no, because the code that's run is after all not just measured but also validated via code signatures, and those signatures are validated with the aforementioned certificate databases. Thus binding an encrypted TPM key to PCR 7 should enforce a similar level of trust in the boot/OS code as binding it to a PCR with hashes of specific versions of that code. I.e. using PCR 7 means you say "all code signed by these vendors is allowed to unlock my key" while using a PCR that contains code hashes means "only this exact version of my code may access my key".</p> </li> <li> <p>Use LUKS key management to enroll multiple versions of the TPM keys in relevant volumes, to support multiple versions of the OS code (or multiple versions of the certificate database, as discussed above). Specifically: whenever an update is done that might result in changing the relevant PCRs, pre-calculate the new PCRs, and enroll them in an additional LUKS slot on the relevant volumes. This means that the unlocking keys tied to the TPM remain accessible in both states of the system. Eventually, once rebooted after the update, remove the old slots.</p> </li> <li> <p>If these two strategies didn't work out (maybe because the OS/firmware was updated outside of OS control, or the update mechanism was aborted at the wrong time) and the TPM PCRs changed unexpectedly, and the user now needs to use their recovery key to regain access to the OS, let's handle this gracefully and automatically reenroll the current TPM PCRs at boot, after the recovery key checked out, so that for future boots everything is in order again.</p> </li> </ol> <p>Other approaches can work too: for example, some OSes simply remove TPM PCR policy protection of disk encryption keys altogether immediately before OS or firmware updates, and then reenable it right after. Of course, this opens a time window where the key bound to the TPM is much less protected than people might assume. 
I'd try to avoid such a scheme if possible.</p> <h2>Anything Else?</h2> <p>So, given that we are talking about idealized systems: I personally actually think the ideal OS would be much simpler, and thus more secure than this:</p> <p>I'd try to ditch the Shim, and instead focus on enrolling the distribution vendor keys directly in the UEFI firmware certificate list. This is actually supported by all firmwares too. This has various benefits: it's no longer necessary to bind everything to Microsoft's root key, you can just enroll your own stuff and thus make sure only what you want to trust is trusted and nothing else. To make an approach like this easier, we have been working on doing automatic enrollment of these keys from the <code>systemd-boot</code> boot loader, see <a href="https://github.com/systemd/systemd/pull/20255">this work in progress for details</a>. This way the Firmware will authenticate the boot loader/kernel/initrd without any further component for this in place.</p> <p>I'd also not bother with a separate boot partition, and just use the ESP for everything. The ESP is required anyway by the firmware, and is good enough for storing the few files we need.</p> <h2>FAQ</h2> <h3>Can I implement all of this in my distribution today?</h3> <p>Probably not. While the big issues have mostly been addressed there's a lot of integration work still missing. As you might have seen I linked some PRs that haven't even been merged into our tree yet, and definitely not been released yet or even entered the distributions.</p> <h3>Will this show up in Fedora/Debian/Ubuntu soon?</h3> <p>I don't know. I am making a proposal how these things might work, and am working on getting various building blocks for this into shape. What the distributions do is up to them. 
But even if they don't follow the recommendations I make 100%, or don't want to use the building blocks I propose I think it's important they start thinking about this, and yes, I think they should be thinking about defaulting to setups like this.</p> <p>Work for measuring/signing initrds on Fedora has been started, <a href="https://raw.githubusercontent.com/keszybz/mkosi-initrd-talk/main/mkosi-initrd.pdf">here's a slide deck with some information about it</a>.</p> <h3>But isn't a TPM evil?</h3> <p>Some corners of the community tried (unfortunately successfully to some degree) to paint TPMs/Trusted Computing/SecureBoot as generally evil technologies that stop us from using our systems the way we want. That idea is rubbish though, I think. We should focus on what it can deliver for us (and that's a lot I think, see above), and appreciate the fact we can actually use it to kick out perceived evil empires from our devices instead of being subjected to them. Yes, the way SecureBoot/TPMs are defined puts <em>you</em> in the driver seat if you want — and you may enroll your own certificates to keep out everything you don't like.</p> <h3>What if my system doesn't have a TPM?</h3> <p>TPMs are becoming quite ubiquitous, in particular as the upcoming Windows versions will require them. In general I think we should focus on modern, fully equipped systems when designing all this, and then find fall-backs for more limited systems. Frankly it feels as if so far the design approach for all this was the other way round: try to make the new stuff work like the old rather than the old like the new (I mean, to me it appears this thinking is the main raison d'être for the Grub boot loader).</p> <p>More specifically, on the systems where we have no TPM we ultimately cannot provide the same security guarantees as for those which have. So depending on the resource to protect we should fall back to different TPM-less mechanisms. 
For example, if we have no TPM then the root file system should probably be encrypted with a user provided password, typed in at boot as before. And for the encrypted boot credentials we probably should simply not encrypt them, and place them in the ESP unencrypted.</p> <p>Effectively this means: without TPM you'll still get protection regarding the basic attack scenario, as before, but not the other two.</p> <h3>What if my system doesn't have UEFI?</h3> <p>Many of the mechanisms explained above taken individually do not require UEFI. But of course the chain of trust suggested above requires something like UEFI SecureBoot. If your system lacks UEFI it's probably best to find work-alikes to the technologies suggested above, but I doubt I'll be able to help you there.</p> <h3>rpm/dpkg already cryptographically validates all packages at installation time (<code>gpg</code>), why would I need more than that?</h3> <p>This type of package validation happens once: at the moment of installation (or update) of the package, but not anymore when the data installed is actually used. Thus when an attacker manages to modify the package data after installation and before use they can make any change they like without this ever being noticed. Such package download validation does address certain attack scenarios (i.e. man-in-the-middle attacks on network downloads), but it doesn't protect you from attackers with physical access, as described in the attack scenarios above.</p> <p>Systems such as <code>ostree</code> aren't better than rpm/dpkg regarding this BTW, their data is not validated on use either, but only during download or when processing tree checkouts.</p> <p>Key really here is that the scheme explained here provides <em>offline</em> protection for the data "at rest" — even someone with physical access to your device cannot easily make changes that aren't noticed on next use. 
rpm/dpkg/ostree provide <em>online</em> protection only: as long as the system remains up, and all OS changes are done through the intended program code-paths, and no one has physical access everything should be good. In today's world I am sure this is not good enough though. As mentioned most modern OSes provide offline protection for the data at rest in one way or another. Generic Linux distributions are terribly behind on this.</p> <h3>This is all so desktop/laptop focused, what about servers?</h3> <p>I am pretty sure servers should provide similar security guarantees as outlined above. In a way servers are a much simpler case: there are no users and no interactivity. Thus the discussion of <code>/home/</code> and what it contains and of user passwords doesn't matter. However, the authenticated initrd and the unattended TPM-based encryption I think are very important for servers too, in a trusted data center environment. It provides security guarantees so far not given by Linux server OSes.</p> <h3>I'd like to help with this, or discuss/comment on this</h3> <p>Submit patches or reviews through <a href="https://github.com/systemd/systemd">GitHub</a>. 
General discussion about this is best done on the <a href="https://lists.freedesktop.org/mailman/listinfo/systemd-devel">systemd mailing list</a>.</p> <p>Lennart Poettering, Thu, 23 Sep 2021 00:00:00 +0200, tag:0pointer.net,2021-09-23:/blog/authenticated-boot-and-disk-encryption-on-linux.html, projects</p> <p>The Wondrous World of Discoverable GPT Disk Images, https://0pointer.net/blog/the-wondrous-world-of-discoverable-gpt-disk-images.html</p> <p><em>TL;DR: Tag your GPT partitions with the right, descriptive partition types, and the world will become a better place.</em></p> <p>A number of years ago we started the <a href="https://systemd.io/DISCOVERABLE_PARTITIONS">Discoverable Partitions Specification</a> which defines <a href="https://en.wikipedia.org/wiki/GUID_Partition_Table">GPT</a> partition type UUIDs and partition flags for the various partitions Linux systems typically deal with. Before the specification all Linux partitions usually just used the same type, basically saying "Hey, I am a Linux partition" and not much else. With this specification the GPT partition type, flags and label system becomes a lot more expressive, as it can tell you:</p> <ol> <li>What kind of data a partition contains (i.e. is this swap data, a file system or Verity data?)</li> <li>What the purpose/mount point of a partition is (i.e. is this a <code>/home/</code> partition or a root file system?)</li> <li>What CPU architecture a partition is intended for (i.e. is this a root partition for x86-64 or for aarch64?)</li> <li>Shall this partition be mounted automatically? (i.e. without specifically being configured via <code>/etc/fstab</code>)</li> <li>And if so, shall it be mounted read-only?</li> <li>And if so, shall the file system be grown to its enclosing partition size, if smaller?</li> <li>Which partition contains the newer version of the same data (i.e. 
multiple root file systems, with different versions)</li> </ol> <p>By embedding all of this information inside the GPT partition table disk images become self-descriptive: without requiring any other source of information (such as <code>/etc/fstab</code>) if you look at a compliant GPT disk image it is clear how an image is put together and how it should be used and mounted. This self-descriptiveness in particular breaks one philosophical weirdness of traditional Linux installations: the original source of information which file system the root file system is, typically is embedded in the root file system itself, in <code>/etc/fstab</code>. Thus, in a way, in order to know what the root file system is you need to know what the root file system is. 🤯 🤯 🤯</p> <p>(Of course, the way this recursion is traditionally broken up is by then copying the root file system information from <code>/etc/fstab</code> into the boot loader configuration, resulting in a situation where the primary source of information for this — i.e. <code>/etc/fstab</code> — is actually mostly irrelevant, and the secondary source — i.e. 
the copy in the boot loader — becomes the configuration that actually matters.)</p> <p>Today, the GPT partition type UUIDs defined by the specification have been adopted quite widely, by distributions and their installers, as well as a variety of partitioning tools and other tools.</p> <p>In this article I want to highlight how the various tools the <a href="https://systemd.io/">systemd</a> project provides make use of the concepts the specification introduces.</p> <p>But before we start with that, let's underline why tagging partitions with these descriptive partition type UUIDs (and the associated partition flags) is a good thing, besides the philosophical points made above.</p> <ol> <li> <p>Simplicity: in particular OS installers become simpler — adjusting <code>/etc/fstab</code> as part of the installation is not necessary anymore, as the partitioning step already put all information into place for assembling the system properly at boot. i.e. installing doesn't mean that you always have to get <code>fdisk</code> <em>and</em> <code>/etc/fstab</code> into place, the former suffices entirely.</p> </li> <li> <p>Robustness: since partition tables mostly remain static after installation the chance of corruption is much lower than if the data is stored in file systems (e.g. in <code>/etc/fstab</code>). Moreover by associating the metadata directly with the objects it describes the chance of things getting out of sync is reduced. (i.e. if you lose <code>/etc/fstab</code>, or forget to rerun your initrd builder you still know what a partition is supposed to be just by looking at it.)</p> </li> <li> <p>Programmability: if partitions are self-descriptive it's much easier to automatically process them with various tools. 
In fact, this blog story is mostly about that: various systemd tools can naturally process disk images prepared like this.</p> </li> <li> <p>Alternative entry points: on traditional disk images, the boot loader needs to be told which kernel command line option <code>root=</code> to use, which then provides access to the root file system, where <code>/etc/fstab</code> is then found which describes the rest of the file systems. Where precisely <code>root=</code> is configured for the boot loader highly depends on the boot loader and distribution used, and is typically encoded in a Turing complete programming language (Grub…). This makes it very hard to automatically determine the right root file system to use, to implement alternative entry points to the system. By alternative entry points I mean other ways to boot the disk image, specifically for running it as a <code>systemd-nspawn</code> container — but this extends to other mechanisms where the boot loader may be bypassed to boot up the system, for example <code>qemu</code> when configured without a boot loader.</p> </li> <li> <p>User friendliness: it's simply a lot nicer for the user looking at a partition table if the partition table explains what is what, instead of just saying "Hey, this is a Linux partition!" and nothing else.</p> </li> </ol> <h1>Uses for the concept</h1> <p>Now that we have cleared up the Why?, let's have a closer look at how this is currently used and exposed in <code>systemd</code>'s various components.</p> <h2>Use #1: Running a disk image in a container</h2> <p>If a disk image follows the Discoverable Partitions Specification then <a href="https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"><code>systemd-nspawn</code></a> has all it needs to just boot it up. 
Specifically, if you have a GPT disk image in a file <code>foobar.raw</code> and you want to boot it up in a container, just run <code>systemd-nspawn -i foobar.raw -b</code>, and that's it (you can specify a block device like <code>/dev/sdb</code> too if you like). It becomes easy and natural to prepare disk images that can be booted either on a physical machine, inside a virtual machine manager or inside such a container manager: the necessary meta-information is included in the image, easily accessible before actually looking into its file systems.</p> <h2>Use #2: Booting an OS image on bare-metal without <code>/etc/fstab</code> or kernel command line <code>root=</code></h2> <p>If a disk image follows the specification in many cases you can remove <code>/etc/fstab</code> (or never even install it), as the basic information needed is already included in the partition table. The <a href="https://www.freedesktop.org/software/systemd/man/systemd-gpt-auto-generator.html"><code>systemd-gpt-auto-generator</code></a> logic implements automatic discovery of the root file system as well as all auxiliary file systems. (Note that the former requires an initrd that uses systemd; some more conservative distributions do not support that yet, unfortunately.) Effectively this means you can boot up a kernel/initrd with an entirely empty kernel command line, and the initrd will automatically find the root file system (by looking for a suitably marked partition on the same drive the EFI System Partition was found on).</p> <p>(Note, if <code>/etc/fstab</code> or <code>root=</code> exist and contain relevant information they always take precedence over the automatic logic. 
This is in particular useful to tweak things by specifying additional mount options and such.)</p> <h2>Use #3: Mounting a complex disk image for introspection or manipulation</h2> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd-dissect.html"><code>systemd-dissect</code></a> tool may be used to introspect and manipulate OS disk images that implement the specification. If you pass the path to a disk image (or block device) it will extract various bits of useful information from the image (e.g. what OS is this? what partitions to mount?) and display it.</p> <p>With the <code>--mount</code> switch a disk image (or block device) can be mounted to some location. This is useful for looking at what is inside it, or changing its contents. This will dissect the image and then automatically mount all contained file systems matching their GPT partition description to the right places, so that you subsequently could <code>chroot</code> into it. (But why <code>chroot</code> if you can just use <code>systemd-nspawn</code>? 😎)</p> <h2>Use #4: Copying files in and out of a disk image</h2> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd-dissect.html"><code>systemd-dissect</code></a> tool also has two switches <code>--copy-from</code> and <code>--copy-to</code> which allow copying files out of or into a compliant disk image, taking all included file systems and the resulting mount hierarchy into account.</p> <h2>Use #5: Running services directly off a disk image</h2> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootImage="><code>RootImage=</code></a> setting in service unit files accepts paths to compliant disk images (or block device nodes), and can mount them automatically, running service binaries directly off them (in <code>chroot()</code> style). 
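</p> <p>As a minimal sketch of what such a unit could look like (the unit name, image path and binary name below are hypothetical, chosen only for illustration):</p>
<pre><code># foobar.service (hypothetical example unit)
[Unit]
Description=Example service running directly off a disk image

[Service]
# Path to a compliant GPT disk image (assumed location)
RootImage=/var/lib/images/foobar.raw
# Resolved inside the mounted image, not on the host
ExecStart=/usr/bin/foobard
</code></pre>
<p>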
In fact, this is the base for the <a href="https://systemd.io/PORTABLE_SERVICES">Portable Service</a> concept of systemd.</p> <h2>Use #6: Provisioning disk images</h2> <p><code>systemd</code> provides various tools that can run operations provisioning disk images in an "offline" mode. Specifically:</p> <h3><code>systemd-tmpfiles</code></h3> <p>With the <code>--image=</code> switch <a href="https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"><code>systemd-tmpfiles</code></a> can directly operate on a disk image, and for example create all directories and other inodes defined in its declarative configuration files included in the image. This can be useful for example to set up the <code>/var/</code> or <code>/etc/</code> tree according to such configuration before first boot.</p> <h3><code>systemd-sysusers</code></h3> <p>Similarly, the <code>--image=</code> switch of <a href="https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html"><code>systemd-sysusers</code></a> tells the tool to read the declarative system user specifications included in the image and synthesize system users from it, writing them to the <code>/etc/passwd</code> (and related) files in the image. 
This is useful for provisioning these users before the first boot, for example to ensure UID/GID numbers are pre-allocated, and such allocations are not delayed until first boot.</p> <h3><code>systemd-machine-id-setup</code></h3> <p>The <code>--image=</code> switch of <a href="https://www.freedesktop.org/software/systemd/man/systemd-machine-id-setup.html"><code>systemd-machine-id-setup</code></a> may be used to provision a fresh machine ID into <a href="https://www.freedesktop.org/software/systemd/man/machine-id.html"><code>/etc/machine-id</code></a> of a disk image, before first boot.</p> <h3><code>systemd-firstboot</code></h3> <p>The <code>--image=</code> switch of <a href="https://www.freedesktop.org/software/systemd/man/systemd-firstboot.html"><code>systemd-firstboot</code></a> may be used to set various basic system settings (such as root password, locale information, hostname, …) on the specified disk image, before booting it up.</p> <h2>Use #7: Extracting log information</h2> <p>The <a href="https://www.freedesktop.org/software/systemd/man/journalctl.html"><code>journalctl</code></a> switch <code>--image=</code> may be used to show the journal log data included in a disk image (or, as usual, the specified block device). This is very useful for analyzing failed systems offline, as it gives direct access to the logs without any further, manual analysis.</p> <h2>Use #8: Automatic repartitioning/growing of file systems</h2> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd-repart.html"><code>systemd-repart</code></a> tool may be used to repartition a disk or image in a declarative and additive way. 
One primary use-case for it is to run during boot on physical or VM systems to grow the root file system to the disk size, or to add, format, encrypt and populate additional partitions at boot.</p> <p>With its <code>--image=</code> switch the tool may operate on compliant disk images in an <em>offline</em> mode of operation: it will then read the partition definitions that shall be grown or created off the image itself, and then apply them to the image. This is particularly useful in combination with the <code>--size=</code> switch, which allows growing disk images to the specified size.</p> <p>Specifically, consider the following work-flow: you download a minimized disk image <code>foo.raw</code> that contains only the minimized root file system (and maybe an ESP, if you want to boot it on bare-metal, too). You then run <code>systemd-repart --image=foo.raw --size=15G</code> to enlarge the image to 15G, based on the declarative rules defined in the <a href="https://www.freedesktop.org/software/systemd/man/repart.d.html"><code>repart.d/</code></a> drop-in files included in the image (this means this can grow the root partition, and/or add in more partitions, for example for <code>/srv</code> or so, maybe encrypted with a locally generated key or so). 
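</p> <p>As a rough sketch, the drop-ins shipped in such an image might look like the following two files (the file names and the 1G size are assumptions made for illustration; how free space is actually distributed depends on the weights and size limits configured):</p>
<pre><code># /usr/lib/repart.d/50-root.conf (hypothetical)
# Matches the existing root partition, which may then grow into free space
[Partition]
Type=root

# /usr/lib/repart.d/60-srv.conf (hypothetical)
# Adds a /srv partition of fixed size and formats it
[Partition]
Type=srv
Format=ext4
SizeMinBytes=1G
SizeMaxBytes=1G
</code></pre>
<p>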
Then, you proceed to boot it up with <code>systemd-nspawn --image=foo.raw -b</code>, making use of the full 15G.</p> <h1>Versioning + Multi-Arch</h1> <p>Disk images implementing this specification can carry OS executables in one of three ways:</p> <ol> <li> <p>Only a root file system</p> </li> <li> <p>Only a <code>/usr/</code> file system (in which case the root file system is automatically picked as <code>tmpfs</code>).</p> </li> <li> <p>Both a root and a <code>/usr/</code> file system (in which case the two are combined, the <code>/usr/</code> file system mounted into the root file system, and the former possibly in read-only fashion)</p> </li> </ol> <p>They may also contain OS executables for different architectures, permitting "multi-arch" disk images that can safely boot up on multiple CPU architectures. As the root and <code>/usr/</code> partition type UUIDs are specific to architectures this is easily done by including one such partition for <code>x86-64</code>, and another for <code>aarch64</code>. 
If the image is now used on an <code>x86-64</code> system the former partition is used automatically, on <code>aarch64</code> the latter.</p> <p>Moreover, these OS executables may be contained in different versions, to implement a simple versioning scheme: when tools such as <code>systemd-nspawn</code> or <code>systemd-gpt-auto-generator</code> dissect a disk image, and they find two or more root or <code>/usr/</code> partitions of the same type UUID, they will automatically pick the one whose GPT partition label (a 36 character free-form string every GPT partition may have) is the newest according to <a href="https://man7.org/linux/man-pages/man3/strverscmp.3.html"><code>strverscmp()</code></a> (OK, truth be told, we don't use <code>strverscmp()</code> as-is, but a modified version with some more modern syntax and semantics, but conceptually identical).</p> <p>This logic allows implementing a very simple and natural A/B update scheme: an updater can drop multiple versions of the OS into separate root or <code>/usr/</code> partitions, always updating the partition label to the version included there-in once the download is complete. All of the tools described here will then honour this, and always automatically pick the newest version of the OS.</p> <h1>Verity</h1> <p>When building modern OS appliances, security is highly relevant. Specifically, <em>offline</em> security matters: an attacker with physical access should have a difficult time modifying the OS in a way that isn't noticed. i.e. 
think of a car or a cell network base station: these appliances are usually parked/deployed in environments attackers can get physical access to. It's essential that in this case the OS itself is sufficiently protected, so that the attacker cannot just mount the OS file system image, make modifications (inserting a backdoor, spying software or similar) and have the system otherwise continue to run without this being immediately detected.</p> <p>A great way to implement offline security is via Linux' <code>dm-verity</code> subsystem: it allows securely binding immutable disk IO to a single, short trusted hash value: if an attacker manages to modify the disk image offline, the modified disk image won't match the trusted hash anymore, and will not be trusted anymore (depending on policy this then just results in IO errors being generated, or in an automatic reboot/power-off).</p> <p>The Discoverable Partitions Specification declares how to include Verity validation data in disk images, and how to relate them to the file systems they protect, thus making it very easy to deploy and work with such protected images. For example <code>systemd-nspawn</code> supports a <code>--root-hash=</code> switch, which accepts the Verity root hash and then will automatically assemble <code>dm-verity</code> with this, automatically matching up the payload and verity partitions. (Alternatively, just place a <code>.roothash</code> file next to the image file).</p> <h1>Future</h1> <p>The above already is a powerful tool set for working with disk images. 
However, there are some more areas I'd like to extend this logic to:</p> <h2><code>bootctl</code></h2> <p>Similar to the other tools mentioned above, <a href="https://www.freedesktop.org/software/systemd/man/bootctl.html"><code>bootctl</code></a> (which is a tool to interface with the boot loader, and install/update systemd's own EFI boot loader <a href="https://www.freedesktop.org/software/systemd/man/bootctl.html"><code>sd-boot</code></a>) should learn a <code>--image=</code> switch, to make installation of the boot loader on disk images easy and natural. It would automatically find the ESP and other relevant partitions in the image, and copy the boot loader binaries into them (or update them).</p> <h2><code>coredumpctl</code></h2> <p>Similar to the existing <code>journalctl --image=</code> logic the <code>coredumpctl</code> tool should also gain an <code>--image=</code> switch for extracting coredumps from compliant disk images. The combination of <code>journalctl --image=</code> and <code>coredumpctl --image=</code> would make it exceptionally easy to work with OS disk images of appliances and extracting logging and debugging information from them after failures.</p> <p>And that's all for now. Please refer to the specification and the man pages for further details. If your distribution's installer does not yet tag the GPT partition it creates with the right GPT type UUIDs, consider asking them to do so.</p> <p>Thank you for your time.</p>Lennart PoetteringFri, 11 Jun 2021 00:00:00 +0200tag:0pointer.net,2021-06-11:/blog/the-wondrous-world-of-discoverable-gpt-disk-images.htmlprojectsFile Descriptor Limitshttps://0pointer.net/blog/file-descriptor-limits.html<p><em>TL;DR: don't use <code>select()</code> + bump the <code>RLIMIT_NOFILE</code> soft limit to the hard limit in your modern programs.</em></p> <p>The primary way to reference, allocate and pin runtime OS resources on Linux today are file descriptors ("fds"). 
Originally they were used to reference open files and directories and maybe a bit more, but today they may be used to reference almost any kind of runtime resource in Linux userspace, including open devices, memory (<a href="https://man7.org/linux/man-pages/man2/memfd_create.2.html"><code>memfd_create(2)</code></a>), timers (<a href="https://man7.org/linux/man-pages/man2/timerfd_create.2.html"><code>timerfd_create(2)</code></a>) and even processes (with the new <a href="https://man7.org/linux/man-pages/man2/pidfd_open.2.html"><code>pidfd_open(2)</code></a> system call). In a way, the philosophically skewed UNIX concept of "everything is a file" through the proliferation of fds actually acquires a bit of sensible meaning: "everything <em>has</em> a file <em>descriptor</em>" is certainly a much better motto to adopt.</p> <p>Because of this proliferation of fds, non-trivial modern programs tend to have to deal with substantially more fds at the same time than they traditionally did. Today, you'll often encounter real-life programs that have a few thousand fds open at the same time.</p> <p>As with most runtime resources on Linux, limits are enforced on file descriptors: once you hit the resource limit configured via <a href="https://man7.org/linux/man-pages/man2/getrlimit.2.html"><code>RLIMIT_NOFILE</code></a> any attempt to allocate more is refused with the <code>EMFILE</code> error — until you close a couple of those you already have open.</p> <p>Because fds weren't such a universal concept traditionally, the limit of <code>RLIMIT_NOFILE</code> used to be quite low. Specifically, when the Linux kernel first invokes userspace it still sets <code>RLIMIT_NOFILE</code> to a low value of 1024 (soft) and 4096 (hard). (Quick explanation: the <em>soft</em> limit is what matters and causes the <code>EMFILE</code> issues, the <em>hard</em> limit is a secondary limit that processes may bump their soft limit to — if they like — without requiring further privileges to do so. 
Bumping the limit further would require privileges however.) A limit of 1024 fds made fds a <em>scarce</em> resource: APIs tried to be careful with using fds, since you simply couldn't have that many of them at the same time. This resulted in some questionable coding decisions and concepts at various places: often secondary descriptors that are very similar to fds — but were not actually fds — were introduced (e.g. inotify watch descriptors), simply to spare them the low limits enforced on true fds. Or code tried to aggressively close fds when not absolutely needing them (e.g. <code>ftw()</code>/<code>nftw()</code>), losing the nice + stable "pinning" effect of open fds.</p> <p>Worse though is that certain OS level APIs were designed having only the low limits in mind. The worst offender is the BSD/POSIX <a href="https://man7.org/linux/man-pages/man2/select.2.html"><code>select(2)</code></a> system call: it only works with fds in the numeric range of 0…1023 (i.e. up to <code>FD_SETSIZE</code>-1). If you have an fd outside of this range, tough luck: <code>select()</code> won't work, and only if you are lucky you'll detect that and can handle it somehow.</p> <p>Linux fds are exposed as simple integers, and for most calls it is guaranteed that the lowest unused integer is allocated for new fds. Thus, as long as the <code>RLIMIT_NOFILE</code> soft limit is set to 1024 everything remains compatible with <code>select()</code>: the resulting fds will also be below 1024. Yay. 
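</p> <p>To make the <code>select()</code> range limit concrete, here is a minimal Python sketch (an illustration added for this text, not code from systemd). It assumes Linux, a hard limit above 1024, and an <code>FD_SETSIZE</code> of 1024, which Python does not expose and which is therefore hard-coded. The helper name <code>select_rejects_high_fd()</code> is made up for this example. Note that CPython range-checks fds and raises <code>ValueError</code>, which is the "lucky" case; plain C code calling <code>FD_SET()</code> typically gets no such check.</p> <div class="highlight"><pre><span></span><code>import fcntl
import os
import resource
import select

# Assumption: the usual Linux/glibc value. Python does not expose FD_SETSIZE.
FD_SETSIZE = 1024


def select_rejects_high_fd():
    """Return True if select() refuses an fd numbered FD_SETSIZE or above.

    Returns None if no such fd can be allocated under the current hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    read_end, write_end = os.pipe()
    high_fd = None
    try:
        try:
            # Raising the soft limit up to the hard limit needs no privileges.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
            # F_DUPFD returns the lowest free fd that is at least FD_SETSIZE.
            high_fd = fcntl.fcntl(read_end, fcntl.F_DUPFD, FD_SETSIZE)
        except (OSError, ValueError):
            return None
        try:
            select.select([high_fd], [], [], 0)
        except ValueError:
            return True  # CPython refuses fds that do not fit into an fd_set
        return False
    finally:
        for fd in (read_end, write_end, high_fd):
            if fd is not None:
                os.close(fd)
        # Lowering the soft limit again is always permitted.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print("select() rejected the high fd:", select_rejects_high_fd())
</code></pre></div> <p>The sketch restores the original soft limit before returning, so it can be run inside a larger program without leaving the raised limit behind.</p> <p>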
If we'd bump the soft limit above this threshold though and at some point in time an fd higher than the threshold is allocated, this fd would not be compatible with <code>select()</code> anymore.</p> <p>Because of that, indiscriminately increasing the soft <code>RLIMIT_NOFILE</code> resource limit today for every userspace process is problematic: as long as there's userspace code still using <code>select()</code> doing so will risk triggering hard-to-handle, hard-to-debug errors all over the place.</p> <p>However, given the nowadays ubiquitous use of fds for all kinds of resources (did you know, an eBPF program is an fd? and a cgroup too? and attaching an eBPF program to cgroup is another fd? …), we'd really like to raise the limit anyway. 🤔</p> <p>So before we continue thinking about this problem, let's make the problem more complex (…uh, I mean… "more exciting") first. Having just one hard and one soft per-process limit on fds is boring. Let's add more limits on fds to the mix. Specifically on Linux there are two system-wide sysctls: <code>fs.nr_open</code> and <code>fs.file-max</code>. (Don't ask me why one uses a dash and the other an underscore, or why there are two of them...) On today's kernels they kinda lost their relevance. They had some originally, because fds weren't accounted by any other counter. But today, the kernel tracks fds mostly as small pieces of memory allocated on userspace requests — because that's ultimately what they are —, and thus charges them to the memory accounting done anyway.</p> <p>So now, we have four limits (actually: five if you count the memory accounting) on the same kind of resource, and all of them make a resource artificially scarce that we don't want to be scarce. So what to do?</p> <p>Back in systemd v240 already (i.e. 2019) we decided to do something about it. Specifically:</p> <ul> <li> <p>Automatically at boot we'll now bump the two sysctls to their maximum, making them effectively ineffective. This one was easy. 
We got rid of two pretty much redundant knobs. Nice!</p> </li> <li> <p>The <code>RLIMIT_NOFILE</code> hard limit is bumped substantially to 512K. Yay, cheap fds! <em>You</em> may have an fd, and <em>you</em>, and <em>you</em> as well, <em>everyone</em> may have an fd!</p> </li> <li> <p>But … we left the soft <code>RLIMIT_NOFILE</code> limit at 1024. We weren't quite ready to break all programs still using <code>select()</code> in 2019 yet. But it's not as bad as it might sound I think: given the hard limit is bumped every program can easily opt-in to a larger number of fds, by setting the soft limit to the hard limit early on — without requiring privileges.</p> </li> </ul> <p>So effectively, with this approach fds should be much less scarce (at least for programs that opt into that), and the limits should be much easier to configure, since there are only two knobs now one really needs to care about:</p> <ul> <li> <p>Configure the <code>RLIMIT_NOFILE</code> hard limit to the maximum number of fds you actually want to allow a process.</p> </li> <li> <p>In the program code then either bump the soft to the hard limit, or not. If you do, you basically declare "I understood the problem, I promise to not use <code>select()</code>, drown me in fds please!". If you don't then effectively everything remains as it always was.</p> </li> </ul> <p>Apparently this approach worked, since the negative feedback on the change was even scarcer than fds traditionally were (ha, fun!). We got reports from pretty much only two projects that were bitten by the change (one being a JVM implementation): they already bumped their soft limit automatically to their hard limit during program initialization, and then allocated an array with one entry per possible fd. 
With the new high limit this resulted in one massive allocation that traditionally was just a few K, and this caused memory checks to be hit.</p> <p>Anyway, here's the takeaway of this blog story:</p> <ul> <li> <p>Don't use <code>select()</code> anymore in 2021. Use <code>poll()</code>, <code>epoll</code>, <code>io_uring</code>, …, but for heaven's sake don't use <code>select()</code>. It might have been all the rage in the 1990s but it doesn't scale and is simply not designed for today's programs. I wish the man page of <code>select()</code> would make clearer how icky it is and that there are plenty of preferable APIs.</p> </li> <li> <p>If you hack on a program that potentially uses a lot of fds, add <a href="https://github.com/systemd/systemd/blob/e7901aba1480db21e06e21cef4f6486ad71b2ec5/src/basic/rlimit-util.c#L373">some simple code</a> somewhere to its start-up that bumps the <code>RLIMIT_NOFILE</code> soft limit to the hard limit. But if you do this, you have to make sure your code (and any code that you link to from it) refrains from using <code>select()</code>. (Note: there's at least one glibc NSS plugin using <code>select()</code> internally. Given that NSS modules can end up being loaded into pretty much <em>any</em> process such modules should probably be considered just buggy.)</p> </li> <li> <p>If said program you hack on forks off foreign programs, make sure to reset the <code>RLIMIT_NOFILE</code> soft limit <a href="https://github.com/systemd/systemd/blob/e7901aba1480db21e06e21cef4f6486ad71b2ec5/src/basic/rlimit-util.c#L394">back to 1024</a> for them. Just because your program might be fine with fds &gt;= 1024 it doesn't mean that those foreign programs are. And unfortunately <code>RLIMIT_NOFILE</code> is inherited down the process tree unless explicitly set.</p> </li> </ul> <p>And that's all I have for today. 
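</p> <p>As a postscript, the last two takeaways can be sketched in a few lines of Python using the standard <code>resource</code> and <code>subprocess</code> modules. This is a rough illustration under stated assumptions, not a port of the systemd code linked above: the function names are invented for this example, and it assumes a Linux system and a single-threaded caller (Python's <code>preexec_fn</code> hook is not safe to combine with threads).</p> <div class="highlight"><pre><span></span><code>import resource
import subprocess
import sys

# The traditional soft limit that programs still using select() rely on.
LEGACY_SOFT_LIMIT = 1024


def bump_nofile_soft_limit():
    """Raise the RLIMIT_NOFILE soft limit to the hard limit; return the old pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Moving the soft limit up to the hard limit requires no privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return soft, hard


def reset_nofile_soft_limit():
    """Set the soft limit back to 1024, or to the hard limit if that is lower."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = LEGACY_SOFT_LIMIT
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))


def run_foreign(argv):
    """Run a foreign program with the legacy soft limit restored."""
    # preexec_fn runs in the child between fork() and exec(), so only the
    # child sees the lowered soft limit; the parent keeps its raised one.
    return subprocess.run(argv, preexec_fn=reset_nofile_soft_limit,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    bump_nofile_soft_limit()
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    child = run_foreign([sys.executable, "-c", probe])
    print("parent soft limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("child soft limit:", child.stdout.strip())
</code></pre></div> <p>Only the soft limit is touched in both directions; the hard limit stays whatever the service manager or administrator configured, which is the knob meant for policy.</p> <p>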
I hope this was enlightening.</p>Lennart PoetteringWed, 19 May 2021 00:00:00 +0200tag:0pointer.net,2021-05-19:/blog/file-descriptor-limits.htmlprojectsUnlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248https://0pointer.net/blog/unlocking-luks2-volumes-with-tpm2-fido2-pkcs11-security-hardware-on-systemd-248.html<p><em>TL;DR: It's now easy to unlock your LUKS2 volume with a FIDO2 security token (e.g. YubiKey, Nitrokey FIDO2, AuthenTrend ATKey.Pro). And TPM2 unlocking is easy now too.</em></p> <p>Blogging is a lot of work, and a lot less fun than hacking. I mostly focus on the latter because of that, but from time to time I guess stuff is just too interesting to not be blogged about. Hence here, finally, another blog story about exciting new features in systemd.</p> <p>With the upcoming systemd v248 the <a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptsetup@.service.html"><code>systemd-cryptsetup</code></a> component of systemd (which is responsible for assembling encrypted volumes during boot) gained direct support for unlocking encrypted storage with three types of security hardware:</p> <ol> <li> <p>Unlocking with FIDO2 security tokens (well, at least with those which implement the <code>hmac-secret</code> extension; most do). i.e. your YubiKeys (series 5 and above), Nitrokey FIDO2, AuthenTrend ATKey.Pro and such.</p> </li> <li> <p>Unlocking with TPM2 security chips (pretty ubiquitous on non-budget PCs/laptops/…)</p> </li> <li> <p>Unlocking with PKCS#11 security tokens, i.e. your smartcards and older YubiKeys (the ones that implement PIV). (Strictly speaking this was supported on older systemd already, but was a lot more "manual".)</p> </li> </ol> <p>For completeness' sake, let's keep in mind that the component also allows unlocking with these more traditional mechanisms:</p> <ol> <li> <p>Unlocking interactively with a user-entered passphrase (i.e. 
the way most people probably already deploy it, supported since about forever)</p> </li> <li> <p>Unlocking via key file on disk (optionally on removable media plugged in at boot), supported since forever.</p> </li> <li> <p>Unlocking via a key acquired through trivial <code>AF_UNIX</code>/<code>SOCK_STREAM</code> socket IPC. (Also new in v248)</p> </li> <li> <p>Unlocking via <em>recovery</em> <em>keys</em>. These are pretty much the same thing as a regular passphrase (and in fact can be entered wherever a passphrase is requested) — the main difference being that they are always generated by the computer, and thus have guaranteed high entropy, typically higher than user-chosen passphrases. They are generated in a way that makes them easy to type, in many cases even if the local key map is misconfigured. (Also new in v248)</p> </li> </ol> <p>In this blog story, let's focus on the first three items, i.e. those that talk to specific types of hardware for implementing unlocking.</p> <p>To make working with security tokens and TPM2 easy, a new, small tool was added to the systemd tool set: <a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptenroll.html">systemd-cryptenroll</a>. Its only purpose is to make it easy to enroll your security token/chip of choice into an encrypted volume. It works with any LUKS2 volume, and embeds a tiny bit of meta-information into the LUKS2 header with parameters necessary for the unlock operation.</p> <h1>Unlocking with FIDO2</h1> <p>So, let's see how this fits together in the FIDO2 case. Most likely this is what you want to use if you have one of these fancy FIDO2 tokens (which need to implement the <code>hmac-secret</code> extension, as mentioned). Let's say you already have your LUKS2 volume set up, and previously unlocked it with a simple passphrase. 
Plug in your token, and run:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemd-cryptenroll --fido2-device<span class="o">=</span>auto /dev/sda5 </code></pre></div> <p>(Replace <code>/dev/sda5</code> with the underlying block device of your volume).</p> <p>This will enroll the key as an additional way to unlock the volume, and embeds all necessary information for it in the LUKS2 volume header. Before we can unlock the volume with this at boot, we need to allow FIDO2 unlocking via <a href="https://www.freedesktop.org/software/systemd/man/crypttab.html"><code>/etc/crypttab</code></a>. For that, find the right entry for your volume in that file, and edit it like so:</p> <div class="highlight"><pre><span></span><code>myvolume /dev/sda5 - fido2-device=auto </code></pre></div> <p>Replace <code>myvolume</code> and <code>/dev/sda5</code> with the right volume name, and underlying device of course. Key here is the <code>fido2-device=auto</code> option you need to add to the fourth column in the file. It tells <code>systemd-cryptsetup</code> to use the FIDO2 metadata now embedded in the LUKS2 header, wait for the FIDO2 token to be plugged in at boot (utilizing <code>systemd-udevd</code>, …) and unlock the volume with it.</p> <p>And that's it already. Easy-peasy, no?</p> <p>Note that all of this doesn't modify the FIDO2 token itself in any way. Moreover you can enroll the same token in as many volumes as you like. Since all enrollment information is stored in the LUKS2 header (and not on the token) there are no bounds on any of this. (OK, well, admittedly, there's a cap on LUKS2 key slots per volume, i.e. you can't enroll more than a bunch of keys per volume.)</p> <h1>Unlocking with PKCS#11</h1> <p>Let's now have a closer look how the same works with a PKCS#11 compatible security token or smartcard. For this to work, you need a device that can store an RSA key pair. I figure most security tokens/smartcards that implement PIV qualify. 
How you actually get the keys onto the device might differ though. Here's how you do this for any YubiKey that implements the PIV feature:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>ykman piv reset <span class="gp"># </span>ykman piv generate-key -a RSA2048 9d pubkey.pem <span class="gp"># </span>ykman piv generate-certificate --subject <span class="s2">&quot;Knobelei&quot;</span> 9d pubkey.pem <span class="gp"># </span>rm pubkey.pem </code></pre></div> <p>(This chain of commands erases what was stored in PIV feature of your token before, be careful!)</p> <p>For tokens/smartcards from other vendors a different series of commands might work. Once you have a key pair on it, you can enroll it with a LUKS2 volume like so:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemd-cryptenroll --pkcs11-token-uri<span class="o">=</span>auto /dev/sda5 </code></pre></div> <p>Just like the same command's invocation in the FIDO2 case this enrolls the security token as an additional way to unlock the volume, any passphrases you already have enrolled remain enrolled.</p> <p>For the PKCS#11 case you need to edit your <code>/etc/crypttab</code> entry like this:</p> <div class="highlight"><pre><span></span><code>myvolume /dev/sda5 - pkcs11-uri=auto </code></pre></div> <p>If you have a security token that implements both PKCS#11 PIV and FIDO2 I'd probably enroll it as FIDO2 device, given it's the more contemporary, future-proof standard. Moreover, it requires no special preparation in order to get an RSA key onto the device: FIDO2 keys typically <em>just</em> <em>work</em>.</p> <h1>Unlocking with TPM2</h1> <p>Most modern (non-budget) PC hardware (and other kind of hardware too) nowadays comes with a TPM2 security chip. In many ways a TPM2 chip is a smartcard that is soldered onto the mainboard of your system. 
Unlike your usual USB-connected security tokens you thus cannot remove them from your PC, which means they address quite a different security scenario: they aren't immediately comparable to a physical key you can take with you that unlocks some door; rather, they are a key you leave at the door, but one that refuses to be turned by anyone but you.</p> <p>Even though this sounds a lot weaker than the FIDO2/PKCS#11 model, TPM2 still brings benefits for securing your systems: because the cryptographic key material stored in TPM2 devices cannot be extracted (at least that's the theory), if you bind your hard disk encryption to it, it means attackers cannot just copy your disk and analyze it offline — they always need access to the TPM2 chip too to have a chance to acquire the necessary cryptographic keys. Thus, they can still steal your whole PC and analyze it, but they cannot just copy the disk without you noticing and analyze the copy.</p> <p>Moreover, you can bind the ability to unlock the harddisk to specific software versions: for example you could say that only your trusted Fedora Linux can unlock the device, but not any arbitrary OS some hacker might boot from a USB stick they plugged in. Thus, if you trust your OS vendor, you can entrust storage unlocking to the vendor's OS together with your TPM2 device, and thus can be reasonably sure intruders cannot decrypt your data unless they both hack your OS vendor <em>and</em> steal/break your TPM2 chip.</p> <p>Here's how you enroll your LUKS2 volume with your TPM2 chip:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemd-cryptenroll --tpm2-device<span class="o">=</span>auto --tpm2-pcrs<span class="o">=</span><span class="m">7</span> /dev/sda5 </code></pre></div> <p>This looks almost as straightforward as the two earlier <code>systemd-cryptenroll</code> command lines — if it weren't for the <code>--tpm2-pcrs=</code> part. 
With that option you can specify to which TPM2 PCRs you want to bind the enrollment. TPM2 PCRs are a set of (typically 24) hash values that every TPM2 equipped system at boot calculates from all the software that is invoked during the boot sequence, in a secure, unfakeable way (this is called "measurement"). If you bind unlocking to a specific value of a specific PCR you thus require that the system follows the same sequence of software at boot to re-acquire the disk encryption key. Sounds complex? Well, that's because it is.</p> <p>For now, let's see how we have to modify your <code>/etc/crypttab</code> to unlock via TPM2:</p> <div class="highlight"><pre><span></span><code>myvolume /dev/sda5 - tpm2-device=auto </code></pre></div> <p>This part is easy again: the <code>tpm2-device=</code> option is what tells <code>systemd-cryptsetup</code> to use the TPM2 metadata from the LUKS2 header and to wait for the TPM2 device to show up.</p> <h1>Bonus: Recovery Key Enrollment</h1> <p>FIDO2, PKCS#11 and TPM2 security tokens and chips pair well with recovery keys: since you don't need to type in your password every day anymore it makes sense to get rid of it, and instead enroll a high-entropy recovery key you then print out or scan off screen and store in a safe, physical location. i.e. forget about good ol' passphrase-based unlocking, go for FIDO2 plus recovery key instead! Here's how you do it:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemd-cryptenroll --recovery-key /dev/sda5 </code></pre></div> <p>This will generate a key, enroll it in the LUKS2 volume, show it to you on screen and generate a QR code you may scan off screen if you like. The key has high entropy, and can be entered wherever you can enter a passphrase. Because of that you don't have to modify <code>/etc/crypttab</code> to make the recovery key work.</p> <h1>Future</h1> <p>There's still plenty of room for further improvement in all of this. 
In particular for the TPM2 case: what the text above doesn't really mention is that binding your encrypted volume unlocking to specific software versions (i.e. kernel + initrd + OS versions) actually sucks hard: if you naively update your system to newer versions you might lose access to your TPM2 enrolled keys (which isn't terrible, after all you did enroll a recovery key — <em>right</em>? — which you then can use to regain access). To solve this some more integration with distributions would be necessary: whenever they upgrade the system they'd have to make sure to enroll the TPM2 again — with the PCR hashes matching the new version. And whenever they remove an old version of the system they need to remove the old TPM2 enrollment. Alternatively TPM2 also knows a concept of <em>signed</em> PCR hash values. In this mode the distro could just ship a set of PCR signatures which would unlock the TPM2 keys. (But quite frankly I don't really see the point: whether you drop in a signature file on each system update, or enroll a new set of PCR hashes in the LUKS2 header doesn't make much of a difference). Either way, to make TPM2 enrollment smooth some more integration work with your distribution's system update mechanisms needs to happen. And yes, because of this OS updating complexity the example above — where I referenced your trusty Fedora Linux — doesn't actually work IRL (yet? hopefully…). Nothing updates the enrollment automatically after you initially enrolled it, hence after the first kernel/initrd update you have to manually re-enroll things again, and again, and again … after every update.</p> <p>The TPM2 could also be used for other kinds of key policies that we might look into adding later, too. For example, Windows uses TPM2 stuff to allow short (4 digits or so) "PINs" for unlocking the harddisk, i.e. kind of a low-entropy password you type in. 
The reason this is reasonably safe is that in this case the PIN is passed to the TPM2 which enforces that not more than some limited amount of unlock attempts may be made within some time frame, and that after too many attempts the PIN is invalidated altogether. This makes dictionary attacks harder (they would normally be easier given the short length of the PINs).</p> <h1>Postscript</h1> <p>(BTW: Yubico sent me two YubiKeys for testing, Nitrokey a Nitrokey FIDO2, and AuthenTrend three ATKey.Pro tokens, thank you! — That's why you see all those references to YubiKey/Nitrokey/AuthenTrend devices in the text above: it's the hardware I had to test this with. That said, I also tested the FIDO2 stuff with a SoloKey I bought, where it also worked fine. And yes, you!, other vendors!, who might be reading this, please send me your security tokens <em>for</em> <em>free</em>, too, and I might test things with them as well. No promises though. And I am not going to give them back, if you do, sorry. ;-))</p>Lennart PoetteringWed, 13 Jan 2021 00:00:00 +0100tag:0pointer.net,2021-01-13:/blog/unlocking-luks2-volumes-with-tpm2-fido2-pkcs11-security-hardware-on-systemd-248.htmlprojectsASG! 2019 CfP Re-Opened!https://0pointer.net/blog/asg-2019-cfp-re-opened.html<p><large><b>The All Systems Go! 2019 Call for Participation Re-Opened for ONE DAY!</b></large></p> <p>Due to popular request we have re-opened the Call for Participation (CFP) for <a href="https://all-systems-go.io/">All Systems Go! 2019</a> for one day. It will close again <em>TODAY</em>, on the 15th of July 2019, midnight Central European Summer Time! If you missed the deadline so far, we’d like to invite you to submit your proposals for consideration to <a href="https://cfp.all-systems-go.io/ASG2019/cfp">the CFP submission site</a> quickly! 
(And yes, this is the last extension, there's not going to be any more extensions.)</p> <p><img src="https://pbs.twimg.com/profile_banners/869627937145802752/1551356869/1500x500" alt="ASG image" width="1000" height="333"/></p> <p>All Systems Go! is everybody's favourite low-level Userspace Linux conference, taking place in Berlin, Germany in September 20-22, 2019.</p> <p>For more information please visit <a href="https://all-systems-go.io/">our conference website</a>!</p>Lennart PoetteringMon, 15 Jul 2019 00:00:00 +0200tag:0pointer.net,2019-07-15:/blog/asg-2019-cfp-re-opened.htmlprojectsWalkthrough for Portable Services in Gohttps://0pointer.net/blog/walkthrough-for-portable-services-in-go.html<h1>Portable Services Walkthrough (Go Edition)</h1> <p>A few months ago I posted <a href="http://0pointer.net/blog/walkthrough-for-portable-services.html">a blog story with a walkthrough of systemd Portable Services</a>. The example service given was written in C, and the image was built with <a href="https://github.com/systemd/mkosi"><code>mkosi</code></a>. In this blog story I'd like to revisit the exercise, but this time focus on a different aspect: modern programming languages like Go and Rust push users a lot more towards static linking of libraries than the usual dynamic linking preferred by C (at least in the way C is used by traditional Linux distributions).</p> <p>Static linking means we can greatly simplify image building: if we don't have to link against shared libraries during runtime we don't have to include them in the portable service image. And that means pretty much all need for building an image from a Linux distribution of some kind goes away as we'll have next to no dependencies that would require us to rely on a distribution package manager or distribution packages. In fact, as it turns out, we only need as few as three files in the portable service image to be fully functional.</p> <p>So, let's have a closer look how such an image can be put together. 
All of the following is available in <a href="https://github.com/systemd/portable-walkthrough-go">this git repository</a>.</p> <h2>A Simple Go Service</h2> <p>Let's start with a simple Go service, an HTTP service that simply counts how often a page from it is requested. Here are the sources: <a href="https://github.com/systemd/portable-walkthrough-go/blob/master/main.go">main.go</a> — note that I am not a seasoned Go programmer, hence please be gracious.</p> <p>The service implements systemd's socket activation protocol, and thus can receive bound TCP listener sockets from systemd, using the <code>$LISTEN_PID</code> and <code>$LISTEN_FDS</code> environment variables.</p> <p>The service will store the counter data in the directory indicated in the <code>$STATE_DIRECTORY</code> environment variable, which happens to be an environment variable current systemd versions set based on the <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RuntimeDirectory="><code>StateDirectory=</code></a> setting in service files.</p> <h1>Two Simple Unit Files</h1> <p>When a service shall be managed by systemd a unit file is required. 
Since the service we are putting together shall be socket activatable, we even have two: <a href="https://github.com/systemd/portable-walkthrough-go/blob/master/portable-walkthrough-go.service"><code>portable-walkthrough-go.service</code></a> (the description of the service binary itself) and <a href="https://github.com/systemd/portable-walkthrough-go/blob/master/portable-walkthrough-go.socket"><code>portable-walkthrough-go.socket</code></a> (the description of the sockets to listen on for the service).</p> <p>These units are not particularly remarkable: the <code>.service</code> file primarily contains the command line to invoke and a <code>StateDirectory=</code> setting to make sure the service when invoked gets its own private state directory under <code>/var/lib/</code> (and the <code>$STATE_DIRECTORY</code> environment variable is set to the resulting path). The <code>.socket</code> file simply lists 8088 as TCP/IP port to listen on.</p> <h1>An OS Description File</h1> <p>OS images (and that includes portable service images) generally should include an <a href="https://www.freedesktop.org/software/systemd/man/os-release.html"><code>os-release</code></a> file. Usually, that is provided by the distribution. Since we are building an image without any distribution let's write our <a href="https://github.com/systemd/portable-walkthrough-go/blob/master/os-release">own version of such a file</a>. Later on we can use the <code>portablectl inspect</code> command to have a look at this metadata of our image.</p> <h1>Putting it All Together</h1> <p>The four files described above are already every file we need to build our image. Let's now put the portable service image together. For that I've written a <a href="https://github.com/systemd/portable-walkthrough-go/blob/master/Makefile"><code>Makefile</code></a>. It contains two relevant rules: the first one builds the static binary from the Go program sources. 
The second one then puts together a <code>squashfs</code> file system combining the following:</p> <ol> <li>The compiled, statically linked service binary</li> <li>The two systemd unit files</li> <li>The <code>os-release</code> file</li> <li>A couple of empty directories such as <code>/proc/</code>, <code>/sys/</code>, <code>/dev/</code> and so on that need to be over-mounted with the respective kernel API file system. We need to create them as empty directories here since Linux insists on directories to exist in order to over-mount them, and since the image we are building is going to be an immutable read-only image (<code>squashfs</code>) these directories cannot be created dynamically when the portable image is mounted.</li> <li>Two empty files <code>/etc/resolv.conf</code> and <code>/etc/machine-id</code> that can be over-mounted with the same files from the host.</li> </ol> <p>And that's already it. After a quick <code>make</code> we'll have our portable service image <code>portable-walkthrough-go.raw</code> and are ready to go.</p> <h1>Trying it out</h1> <p>Let's now attach the portable service image to our host system:</p> <div class="highlight"><pre><span></span><code><span class="err">#</span><span class="w"> </span><span class="n">portablectl</span><span class="w"> </span><span class="n">attach</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">raw</span><span class="w"></span> <span class="p">(</span><span class="n">Matching</span><span class="w"> </span><span class="n">unit</span><span class="w"> </span><span class="n">files</span><span class="w"> </span><span class="k">with</span><span class="w"> </span><span class="k">prefix</span><span class="w"> </span><span class="s1">&#39;portable-walkthrough-go&#39;</span><span class="p">.)</span><span 
class="w"></span> <span class="n">Created</span><span class="w"> </span><span class="n">directory</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="p">.</span><span class="w"></span> <span class="n">Created</span><span class="w"> </span><span class="n">directory</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">socket</span><span class="p">.</span><span class="n">d</span><span class="p">.</span><span class="w"></span> <span class="n">Written</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">socket</span><span class="p">.</span><span class="n">d</span><span class="o">/</span><span class="mi">20</span><span class="o">-</span><span class="n">portable</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="w"></span> <span class="n">Copied</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span 
class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">socket</span><span class="p">.</span><span class="w"></span> <span class="n">Created</span><span class="w"> </span><span class="n">directory</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">service</span><span class="p">.</span><span class="n">d</span><span class="p">.</span><span class="w"></span> <span class="n">Written</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">service</span><span class="p">.</span><span class="n">d</span><span class="o">/</span><span class="mi">20</span><span class="o">-</span><span class="n">portable</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="w"></span> <span class="n">Created</span><span class="w"> </span><span class="n">symlink</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span 
class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">service</span><span class="p">.</span><span class="n">d</span><span class="o">/</span><span class="mi">10</span><span class="o">-</span><span class="n">profile</span><span class="p">.</span><span class="n">conf</span><span class="w"> </span><span class="err">→</span><span class="w"> </span><span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="n">portable</span><span class="o">/</span><span class="n">profile</span><span class="o">/</span><span class="k">default</span><span class="o">/</span><span class="n">service</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="w"></span> <span class="n">Copied</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">service</span><span class="p">.</span><span class="w"></span> <span class="n">Created</span><span class="w"> </span><span class="n">symlink</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">portables</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">raw</span><span class="w"> </span><span class="err">→</span><span class="w"> </span><span 
class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">lennart</span><span class="o">/</span><span class="n">projects</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">raw</span><span class="p">.</span><span class="w"></span> </code></pre></div> <p>The portable service image is now attached to the host, which means we can now go and start it (or even enable it):</p> <div class="highlight"><pre><span></span><code><span class="err">#</span><span class="w"> </span><span class="n">systemctl</span><span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">socket</span><span class="w"></span> </code></pre></div> <p>Let's see if our little web service works, by doing an HTTP request on port 8088:</p> <div class="highlight"><pre><span></span><code># curl localhost:8088 Hello! You are visitor #1! </code></pre></div> <p>Let's try this again, to check if it counts correctly:</p> <div class="highlight"><pre><span></span><code># curl localhost:8088 Hello! You are visitor #2! </code></pre></div> <p>Nice! It worked. 
Let's now stop the service again, and detach the image again:</p> <div class="highlight"><pre><span></span><code><span class="err">#</span><span class="w"> </span><span class="n">systemctl</span><span class="w"> </span><span class="n">stop</span><span class="w"> </span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">service</span><span class="w"> </span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">socket</span><span class="w"></span> <span class="err">#</span><span class="w"> </span><span class="n">portablectl</span><span class="w"> </span><span class="n">detach</span><span class="w"> </span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">service</span><span class="p">.</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span 
class="p">.</span><span class="n">service</span><span class="p">.</span><span class="n">d</span><span class="o">/</span><span class="mi">10</span><span class="o">-</span><span class="n">profile</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">service</span><span class="p">.</span><span class="n">d</span><span class="o">/</span><span class="mi">20</span><span class="o">-</span><span class="n">portable</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">service</span><span class="p">.</span><span class="n">d</span><span class="p">.</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span 
class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">socket</span><span class="p">.</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">socket</span><span class="p">.</span><span class="n">d</span><span class="o">/</span><span class="mi">20</span><span class="o">-</span><span class="n">portable</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">socket</span><span class="p">.</span><span class="n">d</span><span class="p">.</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">portables</span><span class="o">/</span><span class="n">portable</span><span class="o">-</span><span class="n">walkthrough</span><span class="o">-</span><span class="k">go</span><span class="p">.</span><span class="n">raw</span><span class="p">.</span><span class="w"></span> <span class="n">Removed</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span 
class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="p">.</span><span class="n">attached</span><span class="p">.</span><span class="w"></span> </code></pre></div> <p>And there we go, the portable image file is detached from the host again.</p> <h2>A Couple of Notes</h2> <ol> <li> <p>Of course, this is a simplistic example: in real life services will be more than one compiled file, even when statically linked. But you get the idea, and it's very easy to extend the example above to include any additional, auxiliary files in the portable service image.</p> </li> <li> <p>The service is very nicely sandboxed during runtime: while it runs as a regular service on the host (and you thus can watch its logs or do resource management on it like you would do for all other systemd services), it runs in a very restricted environment under a dynamically assigned UID that ceases to exist when the service is stopped again.</p> </li> <li> <p>Originally I wanted to make the service not only socket activatable but also implement exit-on-idle, i.e. add logic so that the service terminates on its own when there has been no ongoing HTTP connection for a while. I couldn't figure out how to do this race-freely in Go though; perhaps an interested reader might want to add that? By combining socket activation with exit-on-idle we can turn this project into an exercise in putting together an extremely resource-friendly and robust service architecture: the service is started only when needed and terminates when no longer needed. 
This would allow packing services at a much higher density even on systems with few resources.</p> </li> <li> <p>While the basic concepts of portable services have been around since systemd 239, it's best to try the above with systemd 241 or newer, as the portable service logic has received a number of fixes since then.</p> </li> </ol> <h2>Further Reading</h2> <p>A low-level document introducing Portable Services is <a href="https://systemd.io/PORTABLE_SERVICES">shipped along with systemd</a>.</p> <p>Please have a look at the <a href="http://0pointer.net/blog/walkthrough-for-portable-services.html">blog story from a few months ago</a> that did something very similar with a service written in C.</p> <p>There are also relevant manual pages: <a href="https://www.freedesktop.org/software/systemd/man/portablectl.html"><code>portablectl(1)</code></a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd-portabled.service.html"><code>systemd-portabled(8)</code></a>.</p>Lennart PoetteringWed, 03 Apr 2019 00:00:00 +0200tag:0pointer.net,2019-04-03:/blog/walkthrough-for-portable-services-in-go.htmlprojectsASG! 2018 Ticketshttps://0pointer.net/blog/asg-2018-tickets.html<p><large><b>All Systems Go! 2018 Tickets Selling Out Quickly!</b></large></p> <p>Buy your tickets for <a href="https://all-systems-go.io/">All Systems Go! 2018</a> soon, as they are quickly selling out! The conference takes place on <em>September 28-30</em>, in <em>Berlin</em>, Germany, in a bit over two weeks.</p> <p>Why should you attend? If you are interested in low-level Linux userspace, then All Systems Go! is the right conference for you. It covers all topics relevant to foundational open-source Linux technologies. 
For details on the covered topics see our schedule <a href="https://cfp.all-systems-go.io/en/ASG2018/public/schedule/2">for day #1</a> and <a href="https://cfp.all-systems-go.io/en/ASG2018/public/schedule/3">for day #2</a>.</p> <p>For more information please visit <a href="https://all-systems-go.io/">our conference website</a>!</p> <p>See you in Berlin!</p>Lennart PoetteringTue, 11 Sep 2018 00:00:00 +0200tag:0pointer.net,2018-09-11:/blog/asg-2018-tickets.htmlprojectsASG! 2018 CfP Closes TODAYhttps://0pointer.net/blog/asg-2018-cfp-closes-today.html<p><large><b>The All Systems Go! 2018 Call for Participation Closes TODAY!</b></large></p> <p>The Call for Participation (CFP) for <a href="https://all-systems-go.io/">All Systems Go! 2018</a> will close <em>TODAY</em>, on the 30th of July! We’d like to invite you to submit your proposals for consideration to <a href="https://cfp.all-systems-go.io/de/ASG2018/cfp">the CFP submission site</a> quickly!</p> <p><img src="https://scontent-frx5-1.xx.fbcdn.net/v/t1.0-9/32372869_2062729060632451_4411941877062828032_o.jpg?_nc_cat=0&oh=112809c076e808ede4dee6e50afe2b99&oe=5B8ACDDF" alt="ASG image" width="512" height="256"/></p> <p>All Systems Go! is everybody's favourite low-level Userspace Linux conference, taking place in Berlin, Germany on September 28-30, 2018.</p> <p>For more information please visit <a href="https://all-systems-go.io/">our conference website</a>!</p>Lennart PoetteringMon, 30 Jul 2018 00:00:00 +0200tag:0pointer.net,2018-07-30:/blog/asg-2018-cfp-closes-today.htmlprojectsASG! 2018 CfP Closes Soonhttps://0pointer.net/blog/asg-2018-cfp-closes-soon.html<p><large><b>The All Systems Go! 2018 Call for Participation Closes in One Week!</b></large></p> <p>The Call for Participation (CFP) for <a href="https://all-systems-go.io/">All Systems Go! 2018</a> will close <em>in one week</em>, on the 30th of July! 
We’d like to invite you to submit your proposals for consideration to <a href="https://cfp.all-systems-go.io/de/ASG2018/cfp">the CFP submission site</a> quickly!</p> <p><img src="https://scontent-frx5-1.xx.fbcdn.net/v/t1.0-9/32372869_2062729060632451_4411941877062828032_o.jpg?_nc_cat=0&oh=112809c076e808ede4dee6e50afe2b99&oe=5B8ACDDF" alt="ASG image" width="512" height="256"/></p> <p>Notification of acceptance and non-acceptance will go out within 7 days of the closing of the CFP.</p> <p>All topics relevant to foundational open-source Linux technologies are welcome. In particular, however, we are looking for proposals including, but not limited to, the following topics:</p> <ul> <li>Low-level container executors and infrastructure</li> <li>IoT and embedded OS infrastructure</li> <li>BPF and eBPF filtering</li> <li>OS, container, IoT image delivery and updating</li> <li>Building Linux devices and applications</li> <li>Low-level desktop technologies</li> <li>Networking</li> <li>System and service management</li> <li>Tracing and performance measuring</li> <li>IPC and RPC systems</li> <li>Security and Sandboxing</li> </ul> <p>While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome, as long as they have a clear and direct relevance for user-space.</p> <p>For more information please visit <a href="https://all-systems-go.io/">our conference website</a>!</p>Lennart PoetteringMon, 23 Jul 2018 00:00:00 +0200tag:0pointer.net,2018-07-23:/blog/asg-2018-cfp-closes-soon.htmlprojectsWalkthrough for Portable Serviceshttps://0pointer.net/blog/walkthrough-for-portable-services.html<h1>Portable Services with systemd v239</h1> <p><a href="https://lists.freedesktop.org/archives/systemd-devel/2018-June/040879.html">systemd v239</a> contains a great number of new features. One of them is first class support for <a href="https://systemd.io/PORTABLE_SERVICES">Portable Services</a>. 
In this blog story I'd like to shed some light on what they are and why they might be interesting for your application.</p> <h2>What are "Portable Services"?</h2> <p>The "Portable Service" concept takes inspiration from classic <code>chroot()</code> environments as well as container management and brings a number of their features to more regular system service management.</p> <p>While the definition of what a "container" really is is hotly debated, I figure people can generally agree that the "container" concept primarily provides two major features:</p> <ol> <li> <p>Resource bundling: a container generally brings its own file system tree along, bundling any shared libraries and other resources it might need along with the main service executables.</p> </li> <li> <p>Isolation and sand-boxing: a container operates in a name-spaced environment that is relatively detached from the host. Besides living in its own file system namespace it usually also has its own user database, process tree and so on. Access from the container to the host is limited with various security technologies.</p> </li> </ol> <p>Of these two concepts the first one is also what traditional UNIX <code>chroot()</code> environments are about.</p> <p>Both resource bundling and isolation/sand-boxing are concepts systemd has implemented to varying degrees for a longer time. Specifically, <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootDirectory="><code>RootDirectory=</code></a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootImage="><code>RootImage=</code></a> have been around for a long time, and so have been the various <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Sandboxing">sand-boxing features</a> systemd provides. 
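</p> <p>To make this concrete, the following unit file fragment is a minimal sketch of how these existing settings can be combined by hand. The image path <code>/var/lib/portables/foobar_4711.raw</code>, the binary path <code>/usr/bin/foobar</code> and the particular choice of sand-boxing options are assumptions for illustration, not something prescribed by systemd.</p> <div class="highlight"><pre><span></span><code># foobar.service (hypothetical example)
[Service]
# Resource bundling: run the binary from inside the disk image
RootImage=/var/lib/portables/foobar_4711.raw
ExecStart=/usr/bin/foobar
# Sand-boxing: a few of the available restrictions
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
</code></pre></div> <p>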
The Portable Services concept builds on that, putting these features together in a new, integrated way to make them more accessible and usable.</p> <h2>OK, so what precisely is a "Portable Service"?</h2> <p>Much like a container image, a portable service on disk can be just a directory tree that contains service executables and all their dependencies, in a hierarchy resembling the normal Linux directory hierarchy. A portable service can also be a raw disk image, holding either a single file system that contains such a tree (which can be mounted via a loop-back block device), or multiple file systems (in which case they need to follow the <a href="https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/">Discoverable Partitions Specification</a> and be located within a GPT partition table). Regardless of whether the portable service on disk is a simple directory tree or a raw disk image, let's call this concept the portable service <em>image</em>.</p> <p>Such images can be generated with any tool typically used for the purpose of installing OSes inside some directory, for example <code>dnf --installroot=</code> or <code>debootstrap</code>. 
Very few requirements are made on these trees, namely the following two:</p> <ol> <li> <p>The tree should carry <a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html">systemd unit files</a> for the relevant services in it.</p> </li> <li> <p>The tree should carry <a href="https://www.freedesktop.org/software/systemd/man/os-release.html"><code>/usr/lib/os-release</code></a> (or <code>/etc/os-release</code>) OS release information.</p> </li> </ol> <p>Of course, as you might notice, OS trees generated from any of today's big distributions generally meet these two requirements without any further modification, as pretty much all of them adopted <code>/usr/lib/os-release</code> and tend to ship their major services with systemd unit files.</p> <p>A portable service image generated like this can be "attached" to or "detached" from a host:</p> <ol> <li> <p>"Attaching" an image to a host is done through the new <a href="https://www.freedesktop.org/software/systemd/man/portablectl.html"><code>portablectl attach</code></a> command. This command dissects the image, reading the <code>os-release</code> information, and searching for unit files in it. It then copies relevant unit files out of the image and into <code>/etc/systemd/system/</code>. After that it augments any copied service unit files in two ways: a drop-in with a <code>RootDirectory=</code> or <code>RootImage=</code> line is added, so that even though the unit files are now available on the host, the services when started run the referenced binaries from the image. It also symlinks in a second drop-in which is called a "profile", which is supposed to carry additional security settings to enforce on the attached services, to ensure the right amount of sand-boxing.</p> </li> <li> <p>"Detaching" an image from the host is done through <code>portablectl detach</code>. 
It reverses the steps above: the unit files copied out are removed again, and so are the two drop-in files generated for them.</p> </li> </ol> <p>While a portable service is attached its relevant unit files are made available on the host like any others: they will appear in <code>systemctl list-unit-files</code>, you can enable and disable them, you can start them and stop them. You can extend them with <code>systemctl edit</code>. You can introspect them. You can apply resource management to them like to any other service, and you can process their logs like those of any other service and so on. That's because they really <em>are</em> native systemd services, except that they come with a 'twist', if you will: they have tougher security by default and store their resources in a root directory or image.</p> <p>And that's already the essence of what Portable Services are.</p> <p>A couple of interesting points:</p> <ol> <li> <p>Even though the focus is on shipping <em>service</em> unit files in portable service images, you can actually ship timer units, socket units, target units and path units in portable services too. This means you can very naturally do time, socket and path based activation. It's also entirely fine to ship multiple service units in the same image, in case you have more complex applications.</p> </li> <li> <p>This concept introduces zero new metadata. Unit files are an existing concept, as are <code>os-release</code> files, and (in case you opt for raw disk images) GPT partition tables are already established too. This also means existing tools to generate images can be reused for building portable service images to a large degree as no completely new artifact types need to be generated.</p> </li> <li> <p>Because the Portable Service concept introduces zero new metadata and just builds on existing security and resource bundling features of systemd it's implemented in a set of distinct tools, relatively disconnected from the rest of systemd. 
Specifically, the main user-facing command is <a href="https://www.freedesktop.org/software/systemd/man/portablectl.html"><code>portablectl</code></a>, and the actual operations are implemented in <a href="https://www.freedesktop.org/software/systemd/man/systemd-portabled.service.html"><code>systemd-portabled.service</code></a>. If you so will, portable services are a true add-on to systemd, just making a specific work-flow nicer to use than with the basic operations systemd otherwise provides. Also note that <code>systemd-portabled</code> provides bus APIs accessible to any program that wants to interface with it, <code>portablectl</code> is just one tool that happens to be shipped along with systemd.</p> </li> <li> <p>Since Portable Services are a feature we only added very recently we wanted to keep some freedom to make changes still. Due to that we decided to install the <code>portablectl</code> command into <code>/usr/lib/systemd/</code> for now, so that it does not appear in <code>$PATH</code> by default. This means, for now you have to invoke it with a full path: <code>/usr/lib/systemd/portablectl</code>. We expect to move it into <code>/usr/bin/</code> very soon though, and make it a fully supported interface of systemd.</p> </li> <li> <p>You may wonder which unit files contained in a portable service image are the ones considered "relevant" and are actually copied out by the <code>portablectl attach</code> operation. Currently, this is derived from the image name. Let's say you have an image stored in a directory <code>/var/lib/portables/foobar_4711/</code> (or alternatively in a raw image <code>/var/lib/portables/foobar_4711.raw</code>). 
In that case the unit files copied out match the pattern <code>foobar*.service</code>, <code>foobar*.socket</code>, <code>foobar*.target</code>, <code>foobar*.path</code>, <code>foobar*.timer</code>.</p> </li> <li> <p>The Portable Services concept does not define any specific method how images get on the deployment machines, that's entirely up to administrators. You can just <code>scp</code> them there, or <code>wget</code> them. You could even package them as RPMs and then deploy them with <code>dnf</code> if you feel adventurous.</p> </li> <li> <p>Portable service images can reside in any directory you like. However, if you place them in <code>/var/lib/portables/</code> then <code>portablectl</code> will find them easily and can show you a list of images you can attach and suchlike.</p> </li> <li> <p>Attaching a portable service image can be done persistently, so that it remains attached on subsequent boots (which is the default), or it can be attached only until the next reboot, by passing <code>--runtime</code> to <code>portablectl</code>.</p> </li> <li> <p>Because portable service images are ultimately just regular OS images, it's natural and easy to build a single image that can be used in three different ways:</p> <ol> <li> <p>It can be attached to any host as a portable service image.</p> </li> <li> <p>It can be booted as OS container, for example in a container manager like <a href="https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"><code>systemd-nspawn</code></a>.</p> </li> <li> <p>It can be booted as host system, for example on bare metal or in a VM manager.</p> </li> </ol> <p>Of course, to qualify for the latter two the image needs to contain more than just the service binaries, the <code>os-release</code> file and the unit files. 
To be bootable in an OS container manager such as <code>systemd-nspawn</code> the image needs to contain an init system of some form, for example <a href="https://www.freedesktop.org/software/systemd/man/systemd.html"><code>systemd</code></a>. To be bootable on bare metal or as a VM it also needs a boot loader of some form, for example <a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"><code>systemd-boot</code></a>.</p> </li> </ol> <h2>Profiles</h2> <p>In the previous section the "profile" concept was briefly mentioned. Since profiles are a major feature of the Portable Services concept, they deserve some focus. A "profile" is ultimately just a pre-defined drop-in file for unit files that are attached to a host. They are supposed to mostly contain sand-boxing and security settings, but may actually contain any other settings, too. When a portable service is attached a suitable profile has to be selected. If none is selected explicitly, the default profile called <code>default</code> is used. systemd ships with four different profiles out of the box:</p> <ol> <li> <p>The <a href="https://github.com/systemd/systemd/blob/master/src/portable/profile/default/service.conf"><code>default</code></a> profile provides a medium level of security. It contains settings to drop capabilities, enforce system call filters, restrict many kernel interfaces and mount various file systems read-only.</p> </li> <li> <p>The <a href="https://github.com/systemd/systemd/blob/master/src/portable/profile/strict/service.conf"><code>strict</code></a> profile is similar to the <code>default</code> profile, but generally uses the most restrictive sand-boxing settings. For example, networking is turned off and access to <code>AF_NETLINK</code> sockets is prohibited.</p> </li> <li> <p>The <a href="https://github.com/systemd/systemd/blob/master/src/portable/profile/trusted/service.conf"><code>trusted</code></a> profile is the least strict of them all.
In fact it imposes almost no restrictions at all. A service run with this profile has basically full access to the host system.</p> </li> <li> <p>The <a href="https://github.com/systemd/systemd/blob/master/src/portable/profile/nonetwork/service.conf"><code>nonetwork</code></a> profile is mostly identical to <code>default</code>, but also turns off network access.</p> </li> </ol> <p>Note that the profile is selected at the time the portable service image is attached, and it applies to all service files attached, in case multiple are shipped in the same image. Thus, the sand-boxing restrictions to enforce are selected by the administrator attaching the image and not the image vendor.</p> <p>Additional profiles can be defined easily by the administrator, if needed. We might also add additional profiles sooner or later to be shipped with systemd out of the box.</p> <h2>What's the use-case for this? If I have containers, why should I bother?</h2> <p>Portable Services are primarily intended to cover use-cases where code should feel more like "extensions" to the host system rather than live in disconnected, separate worlds. The profile concept is supposed to be tunable to the exact right amount of integration or isolation needed for an application.</p> <p>In the container world the concept of "super-privileged containers" has been touted a lot, i.e. containers that run with full privileges. It's precisely that use-case that portable services are intended for: extensions to the host OS, that default to isolation, but can optionally get as much access to the host as needed, and can naturally take advantage of the full functionality of the host. The concept should hence be useful for all kinds of low-level system software that isn't shipped with the OS itself but needs varying degrees of integration with it.
Besides servers and appliances this should be particularly interesting for IoT and embedded devices.</p> <p>Because portable services are just a relatively small extension to the way system services are otherwise managed, they can be treated like regular services for almost all use-cases: they will appear alongside regular services in all tools that can introspect systemd unit data, and can be managed the same way when it comes to logging, resource management, runtime life-cycles and so on.</p> <p>Portable services are a very generic concept. While the original use-case is OS extensions, it's of course entirely up to you and other users to use them in a suitable way of your choice.</p> <h2>Walkthrough</h2> <p>Let's have a look at how all this can be used. We'll start with building a portable service image from scratch, before we attach, enable and start it on a host.</p> <h3>Building a Portable Service image</h3> <p>As mentioned, you can use any tool you like that can create OS trees or raw images for building Portable Service images, for example <code>debootstrap</code> or <code>dnf --installroot=</code>. For this walkthrough we'll use <a href="https://github.com/systemd/mkosi"><code>mkosi</code></a>, which is ultimately just a fancy wrapper around <code>dnf</code> and <code>debootstrap</code> but makes a number of things particularly easy when repeatedly building images from source trees.</p> <p>I have pushed everything necessary to reproduce this walkthrough locally to <a href="https://github.com/systemd/portable-walkthrough">a GitHub repository</a>. Let's check it out:</p> <div class="highlight"><pre><span></span><code>$ git clone https://github.com/systemd/portable-walkthrough.git </code></pre></div> <p>Let's have a look in the repository:</p> <ol> <li> <p>First of all, <a href="https://github.com/systemd/portable-walkthrough/blob/master/walkthroughd.c"><code>walkthroughd.c</code></a> is the main source file of our little service.
To keep things simple it's written in C, but it could be in any language of your choice. The daemon as implemented won't do much: it just starts up and waits for <code>SIGTERM</code>, at which point it will shut down. It's ultimately useless, but hopefully illustrates how this all fits together. The C code has no dependencies besides libc.</p> </li> <li> <p><a href="https://github.com/systemd/portable-walkthrough/blob/master/walkthroughd.service"><code>walkthroughd.service</code></a> is a systemd unit file that starts our little daemon. It's a simple service, hence the unit file is trivial.</p> </li> <li> <p><a href="https://github.com/systemd/portable-walkthrough/blob/master/Makefile"><code>Makefile</code></a> is a short make build script to build the daemon binary. It's pretty trivial, too: it just takes the C file and builds a binary from it. It can also install the daemon. It places the binary in <code>/usr/local/lib/walkthroughd/walkthroughd</code> (why not in <code>/usr/local/bin</code>? because it's not a user-facing binary but a system service binary), and its unit file in <code>/usr/local/lib/systemd/walkthroughd.service</code>. If you want to test the daemon on the host you can now simply run <code>make</code> and then <code>./walkthroughd</code> in order to check everything works.</p> </li> <li> <p><a href="https://github.com/systemd/portable-walkthrough/blob/master/mkosi.default"><code>mkosi.default</code></a> is a file that tells <code>mkosi</code> how to build the image. We opt for a Fedora-based image here (but we might as well have used Debian here, or any other supported distribution).
We need no particular packages during runtime (after all we only depend on libc), but during the build phase we need gcc and make, hence these are the only packages we list in <code>BuildPackages=</code>.</p> </li> <li> <p><a href="https://github.com/systemd/portable-walkthrough/blob/master/mkosi.build"><code>mkosi.build</code></a> is a shell script that is invoked during mkosi's build logic. All it does is invoke <code>make</code> and <code>make install</code> to build and install our little daemon, and afterwards it extends the distribution-supplied <code>/etc/os-release</code> file with an additional field that describes our portable service a bit.</p> </li> </ol> <p>Let's now use this to build the portable service image. For that we use the <a href="https://github.com/systemd/mkosi">mkosi</a> tool. It's sufficient to invoke it without parameters to build the first image: it will automatically discover <code>mkosi.default</code> and <code>mkosi.build</code>, which tell it what to do. (Note that if you work on a project like this for a longer time, <code>mkosi -if</code> is probably the better command to use, as it speeds up building substantially by using an incremental build mode.) <code>mkosi</code> will download the necessary RPMs, and put them all together. It will build our little daemon inside the image and after all that's done it will output the resulting image: <code>walkthroughd_1.raw</code>.</p> <p>Because we opted to build a GPT raw disk image in <code>mkosi.default</code> this file is actually a raw disk image containing a GPT partition table. You can use <code>fdisk -l walkthroughd_1.raw</code> to enumerate the partition table.
You can also use <code>systemd-nspawn -i walkthroughd_1.raw</code> to explore the image quickly if you need.</p> <h2>Using the Portable Service Image</h2> <p>Now that we have a portable service image, let's see how we can attach, enable and start the service included within it.</p> <p>First, let's attach the image:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>/usr/lib/systemd/portablectl attach ./walkthroughd_1.raw <span class="gp gp-VirtualEnv">(Matching unit files with prefix &#39;walkthroughd&#39;.)</span> <span class="go">Created directory /etc/systemd/system/walkthroughd.service.d.</span> <span class="go">Written /etc/systemd/system/walkthroughd.service.d/20-portable.conf.</span> <span class="go">Created symlink /etc/systemd/system/walkthroughd.service.d/10-profile.conf → /usr/lib/systemd/portable/profile/default/service.conf.</span> <span class="go">Copied /etc/systemd/system/walkthroughd.service.</span> <span class="go">Created symlink /etc/portables/walkthroughd_1.raw → /home/lennart/projects/portable-walkthrough/walkthroughd_1.raw.</span> </code></pre></div> <p>The command will show you exactly what it has been doing: it just copied the main service file out, and added the two drop-ins, as expected.</p> <p>Let's see if the unit is now available on the host, just like a regular unit, as promised:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemctl status walkthroughd.service <span class="go">● walkthroughd.service - A simple example service</span> <span class="go"> Loaded: loaded (/etc/systemd/system/walkthroughd.service; disabled; vendor preset: disabled)</span> <span class="go"> Drop-In: /etc/systemd/system/walkthroughd.service.d</span> <span class="go"> └─10-profile.conf, 20-portable.conf</span> <span class="go"> Active: inactive (dead)</span> </code></pre></div> <p>Nice, it worked. We see that the unit file is available and that systemd correctly discovered the two drop-ins.
The unit is neither enabled nor started, however. Yes, attaching a portable service image doesn't imply enabling or starting. It just means the unit files contained in the image are made available to the host. It's up to the administrator to then enable them (so that they are automatically started when needed, for example at boot), and/or start them (in case they shall run right away).</p> <p>Let's now enable and start the service in one step:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemctl <span class="nb">enable</span> --now walkthroughd.service <span class="go">Created symlink /etc/systemd/system/multi-user.target.wants/walkthroughd.service → /etc/systemd/system/walkthroughd.service.</span> </code></pre></div> <p>Let's check if it's running:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemctl status walkthroughd.service <span class="go">● walkthroughd.service - A simple example service</span> <span class="go"> Loaded: loaded (/etc/systemd/system/walkthroughd.service; enabled; vendor preset: disabled)</span> <span class="go"> Drop-In: /etc/systemd/system/walkthroughd.service.d</span> <span class="go"> └─10-profile.conf, 20-portable.conf</span> <span class="go"> Active: active (running) since Wed 2018-06-27 17:55:30 CEST; 4s ago</span> <span class="go"> Main PID: 45003 (walkthroughd)</span> <span class="go"> Tasks: 1 (limit: 4915)</span> <span class="go"> Memory: 4.3M</span> <span class="go"> CGroup: /system.slice/walkthroughd.service</span> <span class="go"> └─45003 /usr/local/lib/walkthroughd/walkthroughd</span> <span class="go">Jun 27 17:55:30 sigma walkthroughd[45003]: Initializing.</span> </code></pre></div> <p>Perfect! We can see that the service is now enabled and running.
The daemon is running as PID 45003.</p> <p>Now that we verified that all is good, let's stop, disable and detach the service again:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemctl disable --now walkthroughd.service <span class="go">Removed /etc/systemd/system/multi-user.target.wants/walkthroughd.service.</span> <span class="gp"># </span>/usr/lib/systemd/portablectl detach ./walkthroughd_1.raw <span class="go">Removed /etc/systemd/system/walkthroughd.service.</span> <span class="go">Removed /etc/systemd/system/walkthroughd.service.d/10-profile.conf.</span> <span class="go">Removed /etc/systemd/system/walkthroughd.service.d/20-portable.conf.</span> <span class="go">Removed /etc/systemd/system/walkthroughd.service.d.</span> <span class="go">Removed /etc/portables/walkthroughd_1.raw.</span> </code></pre></div> <p>And finally, let's see that it's really gone:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>systemctl status walkthroughd <span class="go">Unit walkthroughd.service could not be found.</span> </code></pre></div> <p>Perfect! It worked!</p> <p>I hope the above gets you started with Portable Services. 
If you have further questions, please contact <a href="https://lists.freedesktop.org/mailman/listinfo/systemd-devel">our mailing list</a>.</p> <h2>Further Reading</h2> <p>A more low-level document explaining details is <a href="https://systemd.io/PORTABLE_SERVICES">shipped along with systemd</a>.</p> <p>There are also relevant manual pages: <a href="https://www.freedesktop.org/software/systemd/man/portablectl.html"><code>portablectl(1)</code></a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd-portabled.service.html"><code>systemd-portabled(8)</code></a>.</p> <p>For further information about <code>mkosi</code> see <a href="https://github.com/systemd/mkosi">its homepage</a>.</p>Lennart PoetteringWed, 27 Jun 2018 00:00:00 +0200tag:0pointer.net,2018-06-27:/blog/walkthrough-for-portable-services.htmlprojectsAll Systems Go! 2018 CfP Openhttps://0pointer.net/blog/all-systems-go-2018-cfp-open.html<p><large><b>The All Systems Go! 2018 Call for Participation is Now Open!</b></large></p> <p>The Call for Participation (CFP) for <a href="https://all-systems-go.io/">All Systems Go! 2018</a> is now open. We’d like to invite you to submit your proposals for consideration to <a href="https://cfp.all-systems-go.io/de/ASG2018/cfp">the CFP submission site</a>.</p> <p><img src="https://scontent-frx5-1.xx.fbcdn.net/v/t1.0-9/32372869_2062729060632451_4411941877062828032_o.jpg?_nc_cat=0&oh=112809c076e808ede4dee6e50afe2b99&oe=5B8ACDDF" alt="ASG image" width="512" height="256"/></p> <p>The CFP will close on July 30th. Notification of acceptance and non-acceptance will go out within 7 days of the closing of the CFP.</p> <p>All topics relevant to foundational open-source Linux technologies are welcome. 
In particular, however, we are looking for proposals including, but not limited to, the following topics:</p> <ul> <li>Low-level container executors and infrastructure</li> <li>IoT and embedded OS infrastructure</li> <li>BPF and eBPF filtering</li> <li>OS, container, IoT image delivery and updating</li> <li>Building Linux devices and applications</li> <li>Low-level desktop technologies</li> <li>Networking</li> <li>System and service management</li> <li>Tracing and performance measuring</li> <li>IPC and RPC systems</li> <li>Security and Sandboxing</li> </ul> <p>While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome, as long as they have a clear and direct relevance for user-space.</p> <p>For more information please visit <a href="https://all-systems-go.io/">our conference website</a>!</p>Lennart PoetteringMon, 21 May 2018 00:00:00 +0200tag:0pointer.net,2018-05-21:/blog/all-systems-go-2018-cfp-open.htmlprojectsAll Systems Go! 2017 Videos Online!https://0pointer.net/blog/all-systems-go-2017-videos-online.html<p>For those living under a rock, the videos from everybody's favourite Userspace Linux Conference <a href="https://all-systems-go.io/"><em>All Systems Go!</em> 2017</a> are now available online.</p> <p><a href="https://media.ccc.de/b/conferences/all_systems_go/2017">All videos</a></p> <p>The videos for my own two talks are available here:</p> <p><a href="https://media.ccc.de/v/ASG2017-125-synchronizing_images_with_casync">Synchronizing Images with casync</a> (<a href="http://0pointer.de/public/casync-asg2017.pdf">Slides</a>)</p> <p><a href="https://media.ccc.de/v/ASG2017-101-containers_without_a_container_manager_with_systemd">Containers without a Container Manager, with systemd</a> (<a href="http://0pointer.de/public/systemd-asg2017.pdf">Slides</a>)</p> <p>Of course, this is the stellar work of the <a href="https://c3voc.de/">CCC VOC</a> folks, who are hard to beat when it comes to videotaping of community 
conferences.</p> <p><a href="https://all-systems-go.io/"><img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/></a></p>Lennart PoetteringTue, 24 Oct 2017 00:00:00 +0200tag:0pointer.net,2017-10-24:/blog/all-systems-go-2017-videos-online.htmlprojectsAttending and Speaking at GNOME.Asia 2017 Summithttps://0pointer.net/blog/attending-and-speaking-at-gnomeasia-2017-summit.html<p>The <a href="https://2017.gnome.asia/">GNOME.Asia Summit 2017</a> organizers invited me to speak at their conference in Chongqing, China, and it was an excellent event! Here's my brief report:</p> <p><img src="https://wiki.gnome.org/Travel/Policy?action=AttachFile&do=get&target=sponsored-badge-shadow.png" width="230" height="230"/></p> <p>Because we arrived one day early in Chongqing, my GNOME friends Sri, Matthias, Jonathan, David and I started our journey with an excursion to the <a href="https://en.wikipedia.org/wiki/Dazu_Rock_Carvings">Dazu Rock Carvings</a>, a short bus trip from Chongqing, and an excellent (and sometimes quite surprising) sight. I mean, where else can you see a buddha with 1000+ hands, and centuries old, holding a Nexus 5 cell phone? Here's proof:</p> <p><a href="http://0pointer.de/public/chongqing/big/IMG_0234.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0234.jpg" width="167" height="250"/></a></p> <p>The GNOME.Asia schedule was excellent, with various good talks, including some about Flatpak, Endless OS, rpm-ostree, Blockchains and more. My own talk was about <em>The Path to a Fully Protected GNOME Desktop OS Image</em> (<a href="http://0pointer.de/public/systemd-gnomeasia2017.pdf">Slides available here</a>). In the hallway track I did my best to advocate <a href="https://github.com/systemd/casync">casync</a> to whoever was willing to listen, and I think enough were ;-).
As we all know attending conferences is at least as much about the hallway track as about the talks, and GNOME.Asia was a fantastic way to meet the Chinese GNOME and Open Source communities.</p> <p>The day after the conference the organizers of GNOME.Asia organized a Chongqing day trip. A particular highlight was the ubiquitous hot pot, sometimes with the local speciality: fresh pig brain.</p> <p>Here are some random photos from the trip: sights, food, social event and more.</p> <p><a href="http://0pointer.de/public/chongqing/big/IMG_0409.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0409.jpg" width="135" height="250"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0265.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0265.jpg" width="167" height="250"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0183.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0183.jpg" width="177" height="250"/></a> <a href="http://0pointer.de/public/chongqing/handy/esel.jpg"><img src="http://0pointer.de/public/chongqing/handy/esel-klein.jpg" width="240" height="320"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0273.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0273.jpg" width="167" height="250"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0164.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0164.jpg" width="167" height="250"/></a> <a href="http://0pointer.de/public/chongqing/handy/hotpot.jpg"><img src="http://0pointer.de/public/chongqing/handy/hotpot-klein.jpg" width="240" height="320"/></a></p> <p><a href="http://0pointer.de/public/chongqing/big/IMG_0176.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0176.jpg" width="250" height="152"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0150.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0150.jpg" width="250" height="195"/></a> <a
href="http://0pointer.de/public/chongqing/big/IMG_0216.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0216.jpg" width="250" height="167"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0326.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0326.jpg" width="250" height="169"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0371.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0371.jpg" width="250" height="167"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0442.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0442.jpg" width="250" height="167"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0480.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0480.jpg" width="250" height="177"/></a> <a href="http://0pointer.de/public/chongqing/big/IMG_0536.jpg"><img src="http://0pointer.de/public/chongqing/small/IMG_0536.jpg" width="250" height="94"/></a></p> <p>I'd like to thank the GNOME Foundation for funding my trip to GNOME.Asia. And that's all for now. But let me close with a piece of old Chinese wisdom:</p> <p><a href="http://0pointer.de/public/chongqing/handy/wahlspruch.jpg"><img src="http://0pointer.de/public/chongqing/handy/wahlspruch-klein.jpg" width="320" height="210"/></a></p> <p><big><i>&nbsp;&nbsp;&nbsp;The Trials Of A Long Journey Always Feeling, Civilized Travel Pass Reputation.</i></big></p>Lennart PoetteringTue, 24 Oct 2017 00:00:00 +0200tag:0pointer.net,2017-10-24:/blog/attending-and-speaking-at-gnomeasia-2017-summit.htmlprojectsIP Accounting and Access Lists with systemdhttps://0pointer.net/blog/ip-accounting-and-access-lists-with-systemd.html<p><em>TL;DR: systemd can now do per-service IP traffic accounting, as well as access control for IP address ranges.</em></p> <p>Last Friday we released <a href="https://lists.freedesktop.org/archives/systemd-devel/2017-October/039589.html">systemd 235</a>.
<a href="http://0pointer.net/blog/dynamic-users-with-systemd.html">I already blogged about its Dynamic User feature in detail</a>, but there's one more piece of new functionality that I think deserves special attention: IP accounting and access control.</p> <p>Before v235 systemd already provided per-unit resource management hooks for a number of different kinds of resources: consumed CPU time, disk I/O, memory usage and number of tasks. With v235 another kind of resource can be controlled per-unit with systemd: network traffic (specifically IP).</p> <p>Three new unit file settings have been added in this context:</p> <ol> <li> <p><a href="https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#IPAccounting="><code>IPAccounting=</code></a> is a boolean setting. If enabled for a unit, all IP traffic sent and received by processes associated with it is counted both in terms of bytes and of packets.</p> </li> <li> <p><a href="https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#IPAddressAllow=ADDDRESS%5B/PREFIXLENGTH%5D%E2%80%A6"><code>IPAddressDeny=</code></a> takes an IP address prefix (that means: an IP address with a network mask). All traffic from and to this address will be prohibited for processes of the service.</p> </li> <li> <p><a href="https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#IPAddressAllow=ADDDRESS%5B/PREFIXLENGTH%5D%E2%80%A6"><code>IPAddressAllow=</code></a> is the matching positive counterpart to <code>IPAddressDeny=</code>. All traffic matching this IP address/network mask combination will be allowed, even if otherwise listed in <code>IPAddressDeny=</code>.</p> </li> </ol> <p>The three options are thin wrappers around kernel functionality introduced with Linux 4.11: the control group eBPF hooks. The actual work is done by the kernel, systemd just provides a number of new settings to configure this facet of it. 
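</p> <p>As a rough illustration of how the three settings listed above combine, here is a minimal sketch of a unit file fragment. The binary path and the address ranges are made up for this example and are not taken from any real deployment. With it, all IP traffic of the service is counted, traffic to and from <code>10.1.2.0/24</code> is permitted even though the enclosing <code>10.0.0.0/8</code> range is denied, and traffic matching neither list is left alone by these two settings:</p> <div class="highlight"><pre><span></span><code><span class="c1"># Hypothetical example fragment, for illustration only</span>
<span class="k">[Service]</span>
<span class="c1"># Placeholder binary, substitute your own daemon</span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/usr/local/bin/example-daemon</span>
<span class="c1"># Count bytes and packets sent and received by the service</span>
<span class="na">IPAccounting</span><span class="o">=</span><span class="s">yes</span>
<span class="c1"># Prohibit the whole (assumed) internal range ...</span>
<span class="na">IPAddressDeny</span><span class="o">=</span><span class="s">10.0.0.0/8</span>
<span class="c1"># ... except for this one (assumed) subnet, which takes precedence</span>
<span class="na">IPAddressAllow</span><span class="o">=</span><span class="s">10.1.2.0/24</span>
</code></pre></div> <p>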
Note that cgroup/eBPF is unrelated to classic Linux firewalling, i.e. NetFilter/<code>iptables</code>. It's up to you whether you use one or the other, or both in combination (or of course neither).</p> <h1>IP Accounting</h1> <p>Let's have a closer look at the IP accounting logic mentioned above. Let's write a simple unit <code>/etc/systemd/system/ip-accounting-test.service</code>:</p> <div class="highlight"><pre><span></span><code><span class="k">[Service]</span><span class="w"></span> <span class="na">ExecStart</span><span class="o">=</span><span class="s">/usr/bin/ping 8.8.8.8</span><span class="w"></span> <span class="na">IPAccounting</span><span class="o">=</span><span class="s">yes</span><span class="w"></span> </code></pre></div> <p>This simple unit invokes the <a href="http://man7.org/linux/man-pages/man8/ping.8.html">ping(8)</a> command to send a series of ICMP/IP ping packets to the IP address 8.8.8.8 (which is the Google DNS server IP; we use it for testing here, since it's easy to remember, reachable everywhere and known to react to ICMP pings; any other IP address responding to pings would be fine to use, too). The <code>IPAccounting=</code> option is used to turn on IP accounting for the unit.</p> <p>Let's start this service after writing the file. 
Let's then have a look at the status output of <code>systemctl</code>:</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemctl daemon-reload</span> <span class="c1"># systemctl start ip-accounting-test</span> <span class="c1"># systemctl status ip-accounting-test</span> ● ip-accounting-test.service Loaded: loaded <span class="o">(</span>/etc/systemd/system/ip-accounting-test.service<span class="p">;</span> static<span class="p">;</span> vendor preset: disabled<span class="o">)</span> Active: active <span class="o">(</span>running<span class="o">)</span> since Mon <span class="m">2017</span>-10-09 <span class="m">18</span>:05:47 CEST<span class="p">;</span> 1s ago Main PID: <span class="m">32152</span> <span class="o">(</span>ping<span class="o">)</span> IP: 168B <span class="k">in</span>, 168B out Tasks: <span class="m">1</span> <span class="o">(</span>limit: <span class="m">4915</span><span class="o">)</span> CGroup: /system.slice/ip-accounting-test.service └─32152 /usr/bin/ping <span class="m">8</span>.8.8.8 Okt <span class="m">09</span> <span class="m">18</span>:05:47 sigma systemd<span class="o">[</span><span class="m">1</span><span class="o">]</span>: Started ip-accounting-test.service. Okt <span class="m">09</span> <span class="m">18</span>:05:47 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: PING <span class="m">8</span>.8.8.8 <span class="o">(</span><span class="m">8</span>.8.8.8<span class="o">)</span> <span class="m">56</span><span class="o">(</span><span class="m">84</span><span class="o">)</span> bytes of data. 
Okt <span class="m">09</span> <span class="m">18</span>:05:47 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">1</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">29</span>.2 ms Okt <span class="m">09</span> <span class="m">18</span>:05:48 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">2</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">28</span>.0 ms </code></pre></div> <p>This shows the <code>ping</code> command running — it's currently at its second ping cycle as we can see in the logs at the end of the output. More interesting however is the <code>IP:</code> line further up showing the current IP byte counters. It currently shows 168 bytes have been received, and 168 bytes have been sent. That the two counters are at the same value is not surprising: ICMP ping requests and responses are supposed to have the same size. 
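</p> <p>The figure of 168 bytes can be reproduced by hand, assuming the counters are taken at the IP layer and hence include the IPv4 and ICMP headers, which is what the numbers suggest. The <code>56(84)</code> in the <code>ping</code> output above gives the payload size and the resulting IP packet size, and two ping cycles had completed at the time of the status call:</p> <div class="highlight"><pre><span></span><code>20 bytes IPv4 header + 8 bytes ICMP header + 56 bytes payload = 84 bytes per packet
2 packets × 84 bytes per packet = 168 bytes in each direction
</code></pre></div> <p>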
Note that this line is shown only if <code>IPAccounting=</code> is turned on for the service, as only then this data is collected.</p> <p>Let's wait a bit, and invoke <code>systemctl status</code> again:</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemctl status ip-accounting-test</span> ● ip-accounting-test.service Loaded: loaded <span class="o">(</span>/etc/systemd/system/ip-accounting-test.service<span class="p">;</span> static<span class="p">;</span> vendor preset: disabled<span class="o">)</span> Active: active <span class="o">(</span>running<span class="o">)</span> since Mon <span class="m">2017</span>-10-09 <span class="m">18</span>:05:47 CEST<span class="p">;</span> 4min 28s ago Main PID: <span class="m">32152</span> <span class="o">(</span>ping<span class="o">)</span> IP: <span class="m">22</span>.2K <span class="k">in</span>, <span class="m">22</span>.2K out Tasks: <span class="m">1</span> <span class="o">(</span>limit: <span class="m">4915</span><span class="o">)</span> CGroup: /system.slice/ip-accounting-test.service └─32152 /usr/bin/ping <span class="m">8</span>.8.8.8 Okt <span class="m">09</span> <span class="m">18</span>:10:07 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">260</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">27</span>.7 ms Okt <span class="m">09</span> <span class="m">18</span>:10:08 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">261</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span 
class="nv">time</span><span class="o">=</span><span class="m">28</span>.0 ms Okt <span class="m">09</span> <span class="m">18</span>:10:09 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">262</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">33</span>.8 ms Okt <span class="m">09</span> <span class="m">18</span>:10:10 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">263</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">48</span>.9 ms Okt <span class="m">09</span> <span class="m">18</span>:10:11 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">264</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">27</span>.2 ms Okt <span class="m">09</span> <span class="m">18</span>:10:12 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">265</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">27</span>.0 ms Okt <span class="m">09</span> <span class="m">18</span>:10:13 sigma 
ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">266</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">26</span>.8 ms Okt <span class="m">09</span> <span class="m">18</span>:10:14 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">267</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">27</span>.4 ms Okt <span class="m">09</span> <span class="m">18</span>:10:15 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">268</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">29</span>.7 ms Okt <span class="m">09</span> <span class="m">18</span>:10:16 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">269</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">27</span>.6 ms </code></pre></div> <p>As we can see, after 269 pings the counters are much higher: at 22K.</p> <p>Note that while <code>systemctl status</code> shows only the byte counters, packet counters are kept as well. 
Use the low-level <code>systemctl show</code> command to query the current raw values of the in and out packet and byte counters:</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemctl show ip-accounting-test -p IPIngressBytes -p IPIngressPackets -p IPEgressBytes -p IPEgressPackets</span> <span class="nv">IPIngressBytes</span><span class="o">=</span><span class="m">37776</span> <span class="nv">IPIngressPackets</span><span class="o">=</span><span class="m">449</span> <span class="nv">IPEgressBytes</span><span class="o">=</span><span class="m">37776</span> <span class="nv">IPEgressPackets</span><span class="o">=</span><span class="m">449</span> </code></pre></div> <p>Of course, the same information is also available via the D-Bus APIs. If you want to process this data further consider talking proper D-Bus, rather than scraping the output of <code>systemctl show</code>.</p> <p>Now, let's stop the service again:</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemctl stop ip-accounting-test</span> </code></pre></div> <p>When a service with such accounting turned on terminates, a log line about all its consumed resources is written to the logs. Let's check with <code>journalctl</code>:</p> <div class="highlight"><pre><span></span><code><span class="c1"># journalctl -u ip-accounting-test -n 5</span> -- Logs begin at Thu <span class="m">2016</span>-08-18 <span class="m">23</span>:09:37 CEST, end at Mon <span class="m">2017</span>-10-09 <span class="m">18</span>:17:02 CEST. 
-- Okt <span class="m">09</span> <span class="m">18</span>:15:50 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">603</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">26</span>.9 ms Okt <span class="m">09</span> <span class="m">18</span>:15:51 sigma ping<span class="o">[</span><span class="m">32152</span><span class="o">]</span>: <span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">604</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">27</span>.2 ms Okt <span class="m">09</span> <span class="m">18</span>:15:52 sigma systemd<span class="o">[</span><span class="m">1</span><span class="o">]</span>: Stopping ip-accounting-test.service... Okt <span class="m">09</span> <span class="m">18</span>:15:52 sigma systemd<span class="o">[</span><span class="m">1</span><span class="o">]</span>: Stopped ip-accounting-test.service. Okt <span class="m">09</span> <span class="m">18</span>:15:52 sigma systemd<span class="o">[</span><span class="m">1</span><span class="o">]</span>: ip-accounting-test.service: Received <span class="m">49</span>.5K IP traffic, sent <span class="m">49</span>.5K IP traffic </code></pre></div> <p>The last line shown is the interesting one, that shows the accounting data. 
It's actually a structured log message, and among its metadata fields it contains the more comprehensive raw data:</p> <div class="highlight"><pre><span></span><code><span class="c1"># journalctl -u ip-accounting-test -n 1 -o verbose</span> -- Logs begin at Thu <span class="m">2016</span>-08-18 <span class="m">23</span>:09:37 CEST, end at Mon <span class="m">2017</span>-10-09 <span class="m">18</span>:18:50 CEST. -- Mon <span class="m">2017</span>-10-09 <span class="m">18</span>:15:52.649028 CEST <span class="o">[</span><span class="nv">s</span><span class="o">=</span>89a2cc877fdf4dafb2269a7631afedad<span class="p">;</span><span class="nv">i</span><span class="o">=</span>14d7<span class="p">;</span><span class="nv">b</span><span class="o">=</span>4c7e7adcba0c45b69d612857270716d3<span class="p">;</span><span class="nv">m</span><span class="o">=</span>137592e75e<span class="p">;</span><span class="nv">t</span><span class="o">=</span>55b1f81298605<span class="p">;</span><span class="nv">x</span><span class="o">=</span>c3c9b57b28c9490e<span class="o">]</span> <span class="nv">PRIORITY</span><span class="o">=</span><span class="m">6</span> <span class="nv">_BOOT_ID</span><span class="o">=</span>4c7e7adcba0c45b69d612857270716d3 <span class="nv">_MACHINE_ID</span><span class="o">=</span>e87bfd866aea4ae4b761aff06c9c3cb3 <span class="nv">_HOSTNAME</span><span class="o">=</span>sigma <span class="nv">SYSLOG_FACILITY</span><span class="o">=</span><span class="m">3</span> <span class="nv">SYSLOG_IDENTIFIER</span><span class="o">=</span>systemd <span class="nv">_UID</span><span class="o">=</span><span class="m">0</span> <span class="nv">_GID</span><span class="o">=</span><span class="m">0</span> <span class="nv">_TRANSPORT</span><span class="o">=</span>journal <span class="nv">_PID</span><span class="o">=</span><span class="m">1</span> <span class="nv">_COMM</span><span class="o">=</span>systemd <span class="nv">_EXE</span><span class="o">=</span>/usr/lib/systemd/systemd <span 
class="nv">_CAP_EFFECTIVE</span><span class="o">=</span>3fffffffff <span class="nv">_SYSTEMD_CGROUP</span><span class="o">=</span>/init.scope <span class="nv">_SYSTEMD_UNIT</span><span class="o">=</span>init.scope <span class="nv">_SYSTEMD_SLICE</span><span class="o">=</span>-.slice <span class="nv">CODE_FILE</span><span class="o">=</span>../src/core/unit.c <span class="nv">_CMDLINE</span><span class="o">=</span>/usr/lib/systemd/systemd --switched-root --system --deserialize <span class="m">25</span> <span class="nv">_SELINUX_CONTEXT</span><span class="o">=</span>system_u:system_r:init_t:s0 <span class="nv">UNIT</span><span class="o">=</span>ip-accounting-test.service <span class="nv">CODE_LINE</span><span class="o">=</span><span class="m">2115</span> <span class="nv">CODE_FUNC</span><span class="o">=</span>unit_log_resources <span class="nv">MESSAGE_ID</span><span class="o">=</span>ae8f7b866b0347b9af31fe1c80b127c0 <span class="nv">INVOCATION_ID</span><span class="o">=</span>98a6e756fa9d421d8dfc82b6df06a9c3 <span class="nv">IP_METRIC_INGRESS_BYTES</span><span class="o">=</span><span class="m">50880</span> <span class="nv">IP_METRIC_INGRESS_PACKETS</span><span class="o">=</span><span class="m">605</span> <span class="nv">IP_METRIC_EGRESS_BYTES</span><span class="o">=</span><span class="m">50880</span> <span class="nv">IP_METRIC_EGRESS_PACKETS</span><span class="o">=</span><span class="m">605</span> <span class="nv">MESSAGE</span><span class="o">=</span>ip-accounting-test.service: Received <span class="m">49</span>.6K IP traffic, sent <span class="m">49</span>.6K IP traffic <span class="nv">_SOURCE_REALTIME_TIMESTAMP</span><span class="o">=</span><span class="m">1507565752649028</span> </code></pre></div> <p>The interesting fields of this log message are of course <code>IP_METRIC_INGRESS_BYTES=</code>, <code>IP_METRIC_INGRESS_PACKETS=</code>, <code>IP_METRIC_EGRESS_BYTES=</code>, <code>IP_METRIC_EGRESS_PACKETS=</code> that show the consumed data.</p> <p>The log 
message carries a <a href="https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html#MESSAGE_ID=">message ID</a> that may be used to quickly search for all such resource log messages (<code>ae8f7b866b0347b9af31fe1c80b127c0</code>). We can combine a search term for messages of this ID with <code>journalctl</code>'s <code>-u</code> switch to quickly find out about the resource usage of any invocation of a specific service. Let's try:</p> <div class="highlight"><pre><span></span><code><span class="c1"># journalctl -u ip-accounting-test MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0</span> -- Logs begin at Thu <span class="m">2016</span>-08-18 <span class="m">23</span>:09:37 CEST, end at Mon <span class="m">2017</span>-10-09 <span class="m">18</span>:25:27 CEST. -- Okt <span class="m">09</span> <span class="m">18</span>:15:52 sigma systemd<span class="o">[</span><span class="m">1</span><span class="o">]</span>: ip-accounting-test.service: Received <span class="m">49</span>.6K IP traffic, sent <span class="m">49</span>.6K IP traffic </code></pre></div> <p>Of course, the output above shows only one message at the moment, since we started the service only once, but a new one will appear every time you start and stop it again.</p> <p>The IP accounting logic is also hooked up with <a href="https://www.freedesktop.org/software/systemd/man/systemd-run.html"><code>systemd-run</code></a>, which is useful for transiently running a command as a systemd service with IP accounting turned on. 
Let's try it:</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemd-run -p IPAccounting=yes --wait wget https://cfp.all-systems-go.io/en/ASG2017/public/schedule/2.pdf</span> Running as unit: run-u2761.service Finished with result: success Main processes terminated with: <span class="nv">code</span><span class="o">=</span>exited/status<span class="o">=</span><span class="m">0</span> Service runtime: 878ms IP traffic received: <span class="m">231</span>.0K IP traffic sent: <span class="m">3</span>.7K </code></pre></div> <p>This uses <a href="https://linux.die.net/man/1/wget"><code>wget</code></a> to download <a href="https://cfp.all-systems-go.io/en/ASG2017/public/schedule/2.pdf">the PDF version of the 2nd day schedule</a> of everybody's favorite Linux user-space conference <a href="https://all-systems-go.io/">All Systems Go! 2017</a> (BTW, have you already <a href="https://all-systems-go.io/#tickets">booked your ticket</a>? We are very close to selling out, be quick!). The IP traffic this command generated was 231K ingress and 4K egress. In the <code>systemd-run</code> command line two parameters are important. First of all, we use <code>-p IPAccounting=yes</code> to turn on IP accounting for the transient service (as above). And secondly we use <code>--wait</code> to tell <code>systemd-run</code> to wait for the service to exit. If <code>--wait</code> is used, <code>systemd-run</code> will also show you various statistics about the service that just ran and terminated, including the IP statistics you are seeing if IP accounting has been turned on.</p> <p>It's fun to combine this sort of IP accounting with interactive transient units. Let's try that:</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemd-run -p IPAccounting=1 -t /bin/sh</span> Running as unit: run-u2779.service Press ^<span class="o">]</span> three <span class="nb">times</span> within 1s to disconnect TTY. 
sh-4.4# dnf update … sh-4.4# dnf install firefox … sh-4.4# <span class="nb">exit</span> Finished with result: success Main processes terminated with: <span class="nv">code</span><span class="o">=</span>exited/status<span class="o">=</span><span class="m">0</span> Service runtime: <span class="m">5</span>.297s IP traffic received: …B IP traffic sent: …B </code></pre></div> <p>This uses <code>systemd-run</code>'s <code>--pty</code> switch (or <code>-t</code> for short), which opens an interactive pseudo-TTY connection to the invoked service process, which is a Bourne shell in this case. Doing this means we have a full, comprehensive shell with job control and everything. Since the shell is running as part of a service with IP accounting turned on, all IP traffic we generate or receive will be accounted for. And as soon as we exit the shell, we'll see what it consumed. (For the sake of brevity I actually didn't paste the whole output above, but truncated core parts. Try it out for yourself, if you want to see the output in full.)</p> <p>Sometimes it might make sense to turn on IP accounting for a unit that is already running. For that, use <code>systemctl set-property foobar.service IPAccounting=yes</code>, which will instantly turn on accounting for it. Note that it won't count retroactively though: only the traffic sent/received after the point in time you turned it on will be collected. You may turn off accounting for the unit with the same command, passing <code>IPAccounting=no</code> instead.</p> <p>Of course, sometimes it's interesting to collect IP accounting data for all services, and turning on <code>IPAccounting=yes</code> in every single unit is cumbersome. To deal with that there's a global option <a href="https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html#DefaultCPUAccounting="><code>DefaultIPAccounting=</code></a> available which can be set in <code>/etc/systemd/system.conf</code>.</p> <h1>IP Access Lists</h1> <p>So much about IP accounting. 
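</p> <p>Before moving on, here is a minimal sketch of the global switch mentioned above. The option name and the file path are taken from the text; placing the option in the <code>[Manager]</code> section follows the usual layout of <code>system.conf</code>, and the comment lines are my own annotation, not something systemd requires:</p> <div class="highlight"><pre><span></span><code># /etc/systemd/system.conf (sketch, all other settings omitted)
[Manager]
# Collect IP accounting data by default for units that support it
DefaultIPAccounting=yes
</code></pre></div> <p>Since the global option only changes the default, an individual unit should still be able to opt out again by setting <code>IPAccounting=no</code> for itself.</p> <p>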
Let's now have a look at IP access control with systemd 235. As mentioned above, the two new unit file settings, <code>IPAddressAllow=</code> and <code>IPAddressDeny=</code>, may be used for that. They operate in the following way:</p> <ol> <li> <p>If the source address of an incoming packet or the destination address of an outgoing packet matches one of the IP addresses/network masks in the relevant unit's <code>IPAddressAllow=</code> setting then it will be allowed to go through.</p> </li> <li> <p>Otherwise, if a packet matches an <code>IPAddressDeny=</code> entry configured for the service it is dropped.</p> </li> <li> <p>If the packet matches neither of the above it is allowed to go through.</p> </li> </ol> <p>Or in other words, <code>IPAddressDeny=</code> implements a blacklist, but <code>IPAddressAllow=</code> takes precedence.</p> <p>Let's try that out. Let's modify our last example above in order to get a transient service running an interactive shell which has such an access list set:</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemd-run -p IPAddressDeny=any -p IPAddressAllow=8.8.8.8 -p IPAddressAllow=127.0.0.0/8 -t /bin/sh</span> Running as unit: run-u2850.service Press ^<span class="o">]</span> three <span class="nb">times</span> within 1s to disconnect TTY. sh-4.4# ping <span class="m">8</span>.8.8.8 -c1 PING <span class="m">8</span>.8.8.8 <span class="o">(</span><span class="m">8</span>.8.8.8<span class="o">)</span> <span class="m">56</span><span class="o">(</span><span class="m">84</span><span class="o">)</span> bytes of data. 
<span class="m">64</span> bytes from <span class="m">8</span>.8.8.8: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">1</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">59</span> <span class="nv">time</span><span class="o">=</span><span class="m">27</span>.9 ms --- <span class="m">8</span>.8.8.8 ping statistics --- <span class="m">1</span> packets transmitted, <span class="m">1</span> received, <span class="m">0</span>% packet loss, <span class="nb">time</span> 0ms rtt min/avg/max/mdev <span class="o">=</span> <span class="m">27</span>.957/27.957/27.957/0.000 ms sh-4.4# ping <span class="m">8</span>.8.4.4 -c1 PING <span class="m">8</span>.8.4.4 <span class="o">(</span><span class="m">8</span>.8.4.4<span class="o">)</span> <span class="m">56</span><span class="o">(</span><span class="m">84</span><span class="o">)</span> bytes of data. ping: sendmsg: Operation not permitted ^C --- <span class="m">8</span>.8.4.4 ping statistics --- <span class="m">1</span> packets transmitted, <span class="m">0</span> received, <span class="m">100</span>% packet loss, <span class="nb">time</span> 0ms sh-4.4# ping <span class="m">127</span>.0.0.2 -c1 PING <span class="m">127</span>.0.0.1 <span class="o">(</span><span class="m">127</span>.0.0.2<span class="o">)</span> <span class="m">56</span><span class="o">(</span><span class="m">84</span><span class="o">)</span> bytes of data. 
<span class="m">64</span> bytes from <span class="m">127</span>.0.0.2: <span class="nv">icmp_seq</span><span class="o">=</span><span class="m">1</span> <span class="nv">ttl</span><span class="o">=</span><span class="m">64</span> <span class="nv">time</span><span class="o">=</span><span class="m">0</span>.116 ms --- <span class="m">127</span>.0.0.2 ping statistics --- <span class="m">1</span> packets transmitted, <span class="m">1</span> received, <span class="m">0</span>% packet loss, <span class="nb">time</span> 0ms rtt min/avg/max/mdev <span class="o">=</span> <span class="m">0</span>.116/0.116/0.116/0.000 ms sh-4.4# <span class="nb">exit</span> </code></pre></div> <p>The access list we set up uses <code>IPAddressDeny=any</code> in order to define an IP white-list: all traffic will be prohibited for the session, except for what is explicitly white-listed. In this command line, we white-listed two address prefixes: 8.8.8.8 (with no explicit network mask, which means the mask with all bits turned on is implied, i.e. <code>/32</code>), and 127.0.0.0/8. Thus, the service can communicate with Google's DNS server and everything on the local loop-back, but nothing else. The commands run in this interactive shell show this: First we try pinging 8.8.8.8 which happily responds. Then, we try to ping 8.8.4.4 (that's Google's other DNS server, but excluded from this white-list), and as we see it is immediately refused with an <em>Operation not permitted</em> error. As last step we ping 127.0.0.2 (which is on the local loop-back), and we see it works fine again, as expected.</p> <p>In the example above we used <code>IPAddressDeny=any</code>. The <code>any</code> identifier is a shortcut for writing 0.0.0.0/0 ::/0, i.e. it's a shortcut for <em>everything</em>, on both IPv4 and IPv6. A number of other such shortcuts exist. 
For example, instead of spelling out <code>127.0.0.0/8</code> we could also have used the more descriptive shortcut <code>localhost</code> which is expanded to 127.0.0.0/8 ::1/128, i.e. everything on the local loopback device, on both IPv4 and IPv6.</p> <p>Being able to configure IP access lists individually for each unit is pretty nice already. However, typically one wants to configure this comprehensively, not just for individual units, but for a set of units in one go or even the system as a whole. In systemd, that's possible by making use of <a href="https://www.freedesktop.org/software/systemd/man/systemd.slice.html"><code>.slice</code></a> units (for those who don't know systemd that well, slice units are a concept for organizing services in a hierarchical tree for the purpose of resource management): the IP access list in effect for a unit is the combination of the individual IP access lists configured for the unit itself and those of all slice units it is contained in.</p> <p>By default, system services are assigned to <a href="https://www.freedesktop.org/software/systemd/man/systemd.special.html#system.slice"><code>system.slice</code></a>, which in turn is a child of the root slice <a href="https://www.freedesktop.org/software/systemd/man/systemd.special.html#-.slice"><code>-.slice</code></a>. Either of these two slice units is hence suitable for locking down <em>all</em> system services at once. If an access list is configured on <code>system.slice</code> it will only apply to system services; however, if configured on <code>-.slice</code> it will apply to all user processes of the system, including all user session processes (i.e. 
which are by default assigned to <code>user.slice</code> which is a child of <code>-.slice</code>) in addition to the system services.</p> <p>Let's make use of this:</p> <div class="highlight"><pre><span></span><code># systemctl set-property system.slice IPAddressDeny=any IPAddressAllow=localhost # systemctl set-property apache.service IPAddressAllow=10.0.0.0/8 </code></pre></div> <p>The two commands above first turn off all IP communication for all system services (with the exception of loop-back traffic), and then explicitly white-list 10.0.0.0/8 (which could refer to the local company network, you get the idea), but only for the Apache service.</p> <h1>Use-cases</h1> <p>After playing around a bit with this, let's talk about use-cases. Here are a few ideas:</p> <ol> <li> <p>The IP access list logic can in many ways provide a more modern replacement for the venerable <a href="https://en.wikipedia.org/wiki/TCP_Wrapper">TCP Wrapper</a>, but unlike TCP Wrapper it applies to all IP sockets of a service unconditionally, and requires no explicit support in any way in the service's code: no patching required. On the other hand, TCP wrappers have a number of features this scheme cannot cover. Most importantly, systemd's IP access lists operate solely on the level of IP addresses and network masks; there is no way to configure access by DNS name (though quite frankly, that is a very dubious feature anyway, as doing networking, unsecured networking even, in order to restrict networking sounds quite questionable, at least to me).</p> </li> <li> <p>It can also replace (or augment) some facets of IP firewalling, i.e. Linux NetFilter/<code>iptables</code>. Right now, systemd's access lists are of course a lot more minimal than NetFilter, but they have one major benefit: they understand the service concept, and thus are a lot more context-aware than NetFilter. 
Classic firewalls, such as NetFilter, derive most service context from the IP port number alone, but we live in a world where IP port numbers are a lot more dynamic than they used to be. As one example, a BitTorrent client or server may use any IP port it likes for its file transfer, and writing IP firewalling rules matching that precisely is hence hard. With systemd's IP access lists, implementing this is easy: just set the list for your BitTorrent service unit, and all is good.</p> <p>Let me stress though that you should be careful when comparing NetFilter with systemd's IP access list logic; it's really like comparing apples and oranges. To start with, the IP access list logic has a clearly local focus: it only knows what a local service is and manages access to it. NetFilter on the other hand may run on border gateways, at a point where the traffic flowing through is pure IP, carrying no information about a systemd unit concept or anything like that.</p> </li> <li> <p>It's a simple way to lock down distribution/vendor supplied system services by default. For example, if you ship a service that you know never needs to access the network, then simply set <code>IPAddressDeny=any</code> (possibly combined with <code>IPAddressAllow=localhost</code>) for it, and it will live in a very tight networking sand-box it cannot escape from. systemd itself makes use of this for a number of its services by default now. For example, the logging service <code>systemd-journald.service</code>, the login manager <code>systemd-logind</code> or the core-dump processing unit <code>systemd-coredump@.service</code> all have such a rule set out-of-the-box, because we know that none of these services should be able to access the network, under any circumstances.</p> </li> <li> <p>Because the IP access list logic can be combined with transient units, it can be used to quickly and effectively sandbox arbitrary commands, and even include them in shell pipelines and such. 
For example, let's say we don't trust our <a href="https://linux.die.net/man/1/curl"><code>curl</code></a> implementation (maybe it got modified locally by a hacker, and phones home?), but want to use it anyway to download <a href="http://0pointer.de/public/casync-kinvolk2017.pdf">the slides of my most recent casync talk</a> in order to print them, but want to make sure it doesn't connect anywhere except where we tell it to (and to make this even more fun, let's minimize privileges further, by setting <a href="http://0pointer.net/blog/dynamic-users-with-systemd.html"><code>DynamicUser=yes</code></a>):</p> <div class="highlight"><pre><span></span><code># systemd-resolve 0pointer.de 0pointer.de: 85.214.157.71 2a01:238:43ed:c300:10c3:bcf3:3266:da74 -- Information acquired via protocol DNS in 2.8ms. -- Data is authenticated: no # systemd-run --pipe -p IPAddressDeny=any \ -p IPAddressAllow=85.214.157.71 \ -p IPAddressAllow=2a01:238:43ed:c300:10c3:bcf3:3266:da74 \ -p DynamicUser=yes \ curl http://0pointer.de/public/casync-kinvolk2017.pdf | lp </code></pre></div> </li> </ol> <p>So much about use-cases. This is by no means a comprehensive list of what you can do with it; after all, both IP accounting and IP access lists are very generic concepts. But I do hope the above sparks your imagination.</p> <h1>What does that mean for packagers?</h1> <p>IP accounting and IP access control are primarily concepts for the local administrator. 
However, as suggested above, it's a very good idea to ship services that by design have no network-facing functionality with an access list of <code>IPAddressDeny=any</code> (and possibly <code>IPAddressAllow=localhost</code>), in order to improve the out-of-the-box security of our systems.</p> <p>An option for security-minded distributions might be a more radical approach: ship the system with <code>-.slice</code> or <code>system.slice</code> configured to <code>IPAddressDeny=any</code> by default, and ask the administrator to punch holes into that for each network-facing service with <code>systemctl set-property … IPAddressAllow=…</code>. But of course, that's only an option for distributions willing to break compatibility with what came before.</p> <h1>Notes</h1> <p>A couple of additional notes:</p> <ol> <li> <p>IP accounting and access lists may be mixed with socket activation. In this case, it's a good idea to configure access lists and accounting for both the socket unit that activates and the service unit that is activated, as both units maintain fully separate settings. Note that IP accounting and access lists configured on the socket unit apply to all sockets created on behalf of that unit, and even if these sockets are passed on to the activated services, they will still remain in effect and belong to the socket unit. This also means that IP traffic done on such sockets will be accounted to the socket unit, not the service unit. The fact that IP access lists are maintained separately for the kernel sockets created on behalf of the socket unit and for the kernel sockets created by the service code itself enables some interesting uses. 
For example, it's possible to set a relatively open access list on the socket unit, but a very restrictive access list on the service unit, thus making the sockets configured through the socket unit the only way in and out of the service.</p> </li> <li> <p>systemd's IP accounting and access lists apply to IP sockets only, not to sockets of any other address families. That also means that <code>AF_PACKET</code> (i.e. raw) sockets are not covered. This means it's a good idea to combine IP access lists with <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RestrictAddressFamilies="><code>RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6</code></a> in order to lock this down.</p> </li> <li> <p>You may wonder if the per-unit resource log message and <code>systemd-run --wait</code> may also show you details about other types of resources consumed by a service. The answer is yes: if you turn on <code>CPUAccounting=</code> for a service, you'll also see a summary of consumed CPU time in the log message and the command output. And we are planning to hook up <code>IOAccounting=</code> the same way too, soon.</p> </li> <li> <p>Note that IP accounting and access lists aren't entirely free. systemd inserts an eBPF program into the IP pipeline to make this functionality work. However, eBPF execution has already been optimized for speed in recent kernel versions, and given that it is currently a focus of interest for many, I'd expect it to be optimized even further, so that the cost for enabling these features will be negligible, if it isn't already.</p> </li> <li> <p>IP accounting is currently not recursive. That means you cannot use a slice unit to join the accounting of multiple units into one. 
This is something we definitely want to add, but it requires some more kernel work first.</p> </li> <li> <p>You might wonder how the <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#PrivateNetwork="><code>PrivateNetwork=</code></a> setting relates to <code>IPAddressDeny=any</code>. Superficially they have similar effects: they make the network unavailable to services. However, looking more closely there are a number of differences. <code>PrivateNetwork=</code> is implemented using Linux network name-spaces. As such it entirely detaches all networking of a service from the host, including non-IP networking. It does so by creating a private little environment the service lives in, where communication with itself is still allowed though. In addition, using the <a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html#JoinsNamespaceOf="><code>JoinsNamespaceOf=</code></a> dependency, additional services may be added to the same environment, thus permitting communication with each other but not with anything outside of this group. <code>IPAddressAllow=</code> and <code>IPAddressDeny=</code> are much less invasive. First of all they apply to IP networking only, and can match against specific IP addresses. A service running with <code>PrivateNetwork=</code> turned off but <code>IPAddressDeny=any</code> turned on may enumerate the network interfaces and their configured IP addresses even though it cannot actually do any IP communication. On the other hand, if you turn on <code>PrivateNetwork=</code> all network interfaces besides <code>lo</code> disappear. Long story short: depending on your use-case one, the other, both or neither might be suitable for sand-boxing of your service. If possible I'd always turn on both, for best security, and that's what we do for all of systemd's own long-running services.</p> </li> </ol> <p>And that's all for now. 
Have fun with per-unit IP accounting and access lists!</p> <p>Lennart Poettering, Mon, 09 Oct 2017 00:00:00 +0200 (projects)</p> <h1><a href="https://0pointer.net/blog/dynamic-users-with-systemd.html">Dynamic Users with systemd</a></h1> <p><em>TL;DR: you may now configure systemd to dynamically allocate a UNIX user ID for service processes when it starts them and release it when it stops them. It's pretty secure, mixes well with transient services, socket activated services and service templating.</em></p> <p>Today we released <a href="https://lists.freedesktop.org/archives/systemd-devel/2017-October/039589.html">systemd 235</a>. Among other improvements this greatly extends the dynamic user logic of systemd. Dynamic users are a powerful but little known concept, supported in its basic form since systemd 232. With this blog story I hope to make it a bit better known.</p> <p>The UNIX <em>user</em> concept is the most basic and well-understood security concept in POSIX operating systems. It is UNIX/POSIX' primary security concept, the one everybody can agree on, and most security concepts that came after it (such as process capabilities, SELinux and other MACs, user name-spaces, …) in some form or another build on it, extend it or at least interface with it. If you build a Linux kernel with all security features turned off, the user concept is pretty much the one you'll still retain.</p> <p>Originally, the user concept was introduced to make multi-user systems a reality, i.e. systems enabling multiple <em>human</em> users to share the same system at the same time, cleanly separating their resources and protecting them from each other. The majority of today's UNIX systems don't really use the user concept like that anymore though. Most of today's systems probably have only one actual human user (or even less!), but their user databases (<code>/etc/passwd</code>) list a good number more entries than that. 
Today, the majority of UNIX users in most environments are <em>system users</em>, i.e. users that are not the technical representation of a human sitting in front of a PC anymore, but the security identity a system service — an executable program — runs as. Even though traditional, simultaneous multi-user systems slowly became less relevant, their ground-breaking basic concept became the cornerstone of UNIX security. The OS is nowadays partitioned into isolated services — and each service runs as its own system user, and thus within its own, minimal security context.</p> <p>The people behind the Android OS realized the relevance of the UNIX user concept as the primary security concept on UNIX, and took its use even further: on Android not only system services take benefit of the UNIX user concept, but each UI app gets its own, individual user identity too — thus neatly separating app resources from each other, and protecting app processes from each other, too.</p> <p>Back in the more traditional Linux world things are a bit less advanced in this area. Even though users are the quintessential UNIX security concept, allocation and management of system users is still a pretty limited, raw and static affair. In most cases, RPM or DEB package installation scripts allocate a fixed number of (usually one) system users when you install the package of a service that wants to take benefit of the user concept, and from that point on the system user remains allocated on the system and is never deallocated again, even if the package is later removed again. Most Linux distributions limit the number of system users to 1000 (which isn't particularly a lot). Allocating a system user is hence expensive: the number of available users is limited, and there's no defined way to dispose of them after use. 
If you make use of system users too liberally, you are very likely to run out of them sooner rather than later.</p> <p>You may wonder why system users are generally not deallocated when the package that registered them is uninstalled from a system (at least on most distributions). The reason for that is one relevant property of the user concept (you might even want to call this a <em>design flaw</em>): user IDs are <em>sticky</em> to files (and other objects such as IPC objects). If a service running as a specific system user creates a file at some location, and is then terminated and its package and user removed, then the created file still belongs to the numeric ID ("UID") the system user originally got assigned. When the next system user is allocated and — due to ID recycling — happens to get assigned the same numeric ID, then it will also gain access to the file, and that's generally considered a problem, given that the file belonged to a potentially very different service once upon a time, and likely should not be readable or changeable by anything coming after it. Distributions hence tend to avoid UID recycling which means system users remain registered forever on a system after they have been allocated once.</p> <p>The above is a description of the status quo ante. Let's now focus on what systemd's dynamic user concept brings to the table, to improve the situation.</p> <h1>Introducing Dynamic Users</h1> <p>With systemd dynamic users we hope to make it easier and cheaper to allocate system users on-the-fly, thus substantially increasing the possible uses of this core UNIX security concept.</p> <p>If you write a systemd service unit file, you may enable the dynamic user logic for it by setting the <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#DynamicUser="><code>DynamicUser=</code></a> option in its <code>[Service]</code> section to <code>yes</code>. 
If you do, a system user is dynamically allocated the instant the service binary is invoked, and released again when the service terminates. The user is automatically allocated from the UID range 61184–65519, by looking for a so far unused UID.</p> <p>Now you may wonder, how does this concept deal with the sticky user issue discussed above? In order to counter the problem, two strategies easily come to mind:</p> <ol> <li> <p>Prohibit the service from creating any files/directories or IPC objects.</p> </li> <li> <p>Automatically remove the files/directories or IPC objects the service created when it shuts down.</p> </li> </ol> <p>In systemd we implemented both strategies, but for different parts of the execution environment. Specifically:</p> <ol> <li> <p>Setting <code>DynamicUser=yes</code> implies <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#ProtectSystem="><code>ProtectSystem=strict</code></a> and <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#ProtectHome="><code>ProtectHome=read-only</code></a>. These sand-boxing options turn off write access to pretty much the whole OS directory tree, with a few relevant exceptions, such as the API file systems <code>/proc</code>, <code>/sys</code> and so on, as well as <code>/tmp</code> and <code>/var/tmp</code>. (BTW: setting these two options on your regular services that do not use <code>DynamicUser=</code> is a good idea too, as it drastically reduces the exposure of the system to exploited services.)</p> </li> <li> <p>Setting <code>DynamicUser=yes</code> implies <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#PrivateTmp="><code>PrivateTmp=yes</code></a>. This option sets up <code>/tmp</code> and <code>/var/tmp</code> for the service in a way that it gets its own, disconnected version of these directories, that are not shared by other services, and whose life-cycle is bound to the service's own life-cycle. 
Thus if the service goes down, the user is removed and all its temporary files and directories with it. (BTW: as above, consider setting this option for your regular services that do not use <code>DynamicUser=</code> too, it's a great way to lock things down security-wise.)</p> </li> <li> <p>Setting <code>DynamicUser=yes</code> implies <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RemoveIPC="><code>RemoveIPC=yes</code></a>. This option ensures that when the service goes down all SysV and POSIX IPC objects (shared memory, message queues, semaphores) owned by the service's user are removed. Thus, the life-cycle of the IPC objects is bound to the life-cycle of the dynamic user and service, too. (BTW: yes, here too, consider using this in your regular services, too!)</p> </li> </ol> <p>With these four settings in effect, services with dynamic users are nicely sand-boxed. They cannot create files or directories, except in <code>/tmp</code> and <code>/var/tmp</code>, where they will be removed automatically when the service shuts down, as will any IPC objects created. Sticky ownership of files/directories and IPC objects is hence dealt with effectively.</p> <p>The <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RuntimeDirectory="><code>RuntimeDirectory=</code></a> option may be used to open up a bit the sandbox to external programs. If you set it to a directory name of your choice, it will be created below <code>/run</code> when the service is started, and removed in its entirety when it is terminated. The ownership of the directory is assigned to the service's dynamic user. This way, a dynamic user service can expose API interfaces (AF_UNIX sockets, …) to other services at a well-defined place and again bind the life-cycle of it to the service's own run-time. 
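</p> <p>To make this concrete, here is a minimal sketch of a unit file combining the settings discussed so far. The binary path <code>/usr/bin/mydaemond</code> is just a placeholder, and the implied options are spelled out as comments only as a reminder of what <code>DynamicUser=yes</code> brings along:</p> <div class="highlight"><pre><span></span><code>[Service]
# Placeholder binary, substitute your own daemon here
ExecStart=/usr/bin/mydaemond
DynamicUser=yes
# Implied by DynamicUser=yes, no need to set these explicitly:
#   ProtectSystem=strict
#   ProtectHome=read-only
#   PrivateTmp=yes
#   RemoveIPC=yes
# Created below /run at start, removed again at stop
RuntimeDirectory=foobar
</code></pre></div> <p>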
Example: set <code>RuntimeDirectory=foobar</code> in your service, and watch how a directory <code>/run/foobar</code> appears at the moment you start the service, and disappears the moment you stop it again. (BTW: Much like the other settings discussed above, <code>RuntimeDirectory=</code> may be used outside of the <code>DynamicUser=</code> context too, and is a nice way to run any service with a properly owned, life-cycle-managed run-time directory.)</p> <h1>Persistent Data</h1> <p>Of course, a service running in such an environment (although already very useful for many cases!), has a major limitation: it cannot leave persistent data around it can reuse on a later run. As pretty much the whole OS directory tree is read-only to it, there's simply no place it could put the data that survives from one service invocation to the next.</p> <p>With systemd 235 this limitation is removed: there are now three new settings: <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RuntimeDirectory="><code>StateDirectory=</code></a>, <code>LogsDirectory=</code> and <code>CacheDirectory=</code>. In many ways they operate like <code>RuntimeDirectory=</code>, but create sub-directories below <code>/var/lib</code>, <code>/var/log</code> and <code>/var/cache</code>, respectively. There's one major difference beyond that however: directories created that way are <em>persistent</em>, they will survive the run-time cycle of a service, and thus may be used to store data that is supposed to stay around between invocations of the service.</p> <p>Of course, the obvious question to ask now is: how do these three settings deal with the <em>sticky file ownership problem</em>?</p> <p>For that we lifted a concept from container managers. 
Container managers have a very similar problem: each container and the host typically end up using a very similar set of numeric UIDs, and unless user name-spacing is deployed this means that host users might be able to access the data of specific containers that also have a user by the same numeric UID assigned, even though it actually refers to a very different identity in a different context. (Actually, it's even worse than just getting access, due to the existence of <code>setuid</code> file bits, access might translate to privilege elevation.) The way container managers protect the container images from the host (and from each other to some level) is by placing the container trees below a <em>boundary</em> directory, with very restrictive access modes and ownership (0700 and <code>root:root</code> or so). A host user hence cannot take advantage of the files/directories of a container user of the same UID inside of a local container tree, simply because the boundary directory makes it impossible to even reference files in it. After all on UNIX, in order to get access to a specific path you need access to every single component of it.</p> <p>How is that applied to dynamic user services? Let's say <code>StateDirectory=foobar</code> is set for a service that has <code>DynamicUser=</code> turned off. The instant the service is started, <code>/var/lib/foobar</code> is created as state directory, owned by the service's user and remains in existence when the service is stopped. If the same service now is run with <code>DynamicUser=</code> turned on, the implementation is slightly altered. Instead of a directory <code>/var/lib/foobar</code> a symbolic link by the same path is created (owned by root), pointing to <code>/var/lib/private/foobar</code> (the latter being owned by the service's dynamic user). The <code>/var/lib/private</code> directory is created as boundary directory: it's owned by <code>root:root</code>, and has a restrictive access mode of 0700. 
Both the symlink and the service's state directory survive the service's life-cycle. The state directory continues to be owned by the now disposed dynamic UID; however, it is protected from other host users (and other services which might get the same dynamic UID assigned due to UID recycling) by the boundary directory.</p> <p>The obvious question to ask now is: but if the boundary directory prohibits access to the directory from unprivileged processes, how can the service itself which runs under its own dynamic UID access it anyway? This is achieved by invoking the service process in a slightly modified mount name-space: it will see most of the file hierarchy the same way as everything else on the system (modulo <code>/tmp</code> and <code>/var/tmp</code> as mentioned above), except for <code>/var/lib/private</code>, which is over-mounted with a read-only <code>tmpfs</code> file system instance, with a slightly more liberal access mode permitting the service read access. Inside of this <code>tmpfs</code> file system instance another mount is placed: a bind mount to the host's real <code>/var/lib/private/foobar</code> directory, onto the same name. Putting this together, this means that superficially everything looks the same and is available at the same place on the host and from inside the service, but two important changes have been made: the <code>/var/lib/private</code> boundary directory lost its restrictive character inside the service, and has been emptied of the state directories of any other service, thus making the protection complete. Note that the symlink <code>/var/lib/foobar</code> hides the fact that the boundary directory is used (making it little more than an implementation detail), as the directory is available this way under the same name as it would be if <code>DynamicUser=</code> was not used. 
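</p> <p>As a sketch, the dynamic-user variant of the service described above might be declared like this (the binary path is a placeholder; the comments merely restate the layout described in the text):</p> <div class="highlight"><pre><span></span><code>[Service]
# Placeholder binary, substitute your own daemon here
ExecStart=/usr/bin/mydaemond
DynamicUser=yes
# Reachable as /var/lib/foobar (a symlink owned by root),
# backed by /var/lib/private/foobar (owned by the dynamic user)
StateDirectory=foobar
</code></pre></div> <p>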
Long story short: for the daemon and from the view from the host the indirection through <code>/var/lib/private</code> is mostly transparent.</p> <p>This logic of course raises another question: what happens to the state directory if a dynamic user service is started with a state directory configured, gets UID X assigned on this first invocation, then terminates and is restarted and now gets UID Y assigned on the second invocation, with X ≠ Y? On the second invocation the directory — and all the files and directories below it — will still be owned by the original UID X so how could the second instance running as Y access it? Our way out is simple: systemd will recursively change the ownership of the directory and everything contained within it to UID Y before invoking the service's executable.</p> <p>Of course, such recursive ownership changing (<code>chown()</code>ing) of whole directory trees can become expensive (though according to my experiences, IRL and for most services it's much cheaper than you might think), hence in order to optimize behavior in this regard, the allocation of dynamic UIDs has been tweaked in two ways to avoid the necessity to do this expensive operation in most cases: firstly, when a dynamic UID is allocated for a service an allocation loop is employed that starts out with a UID hashed from the service's name. This means a service by the same name is likely to always use the same numeric UID. That means that a stable service name translates into a stable dynamic UID, and that means recursive file ownership adjustments can be skipped (of course, after validation). Secondly, if the configured state directory already exists, and is owned by a suitable currently unused dynamic UID, it's preferably used above everything else, thus maximizing the chance we can avoid the <code>chown()</code>ing. 
(That all said, ultimately we have to face it, the currently available UID space of 4K+ is very small still, and conflicts are pretty likely sooner or later, thus a <code>chown()</code>ing has to be expected every now and then when this feature is used extensively).</p> <p>Note that <code>CacheDirectory=</code> and <code>LogsDirectory=</code> work very similarly to <code>StateDirectory=</code>. The only difference is that they manage directories below the <code>/var/cache</code> and <code>/var/log</code> directories, and their boundary directories hence are <code>/var/cache/private</code> and <code>/var/log/private</code>, respectively.</p> <h1>Examples</h1> <p>So, after all this introduction, let's have a look at how this all can be put together. Here's a trivial example:</p> <div class="highlight"><pre><span></span><code><span class="c1"># cat &gt; /etc/systemd/system/dynamic-user-test.service &lt;&lt;EOF</span> <span class="o">[</span>Service<span class="o">]</span> <span class="nv">ExecStart</span><span class="o">=</span>/usr/bin/sleep <span class="m">4711</span> <span class="nv">DynamicUser</span><span class="o">=</span>yes EOF <span class="c1"># systemctl daemon-reload</span> <span class="c1"># systemctl start dynamic-user-test</span> <span class="c1"># systemctl status dynamic-user-test</span> ● dynamic-user-test.service Loaded: loaded <span class="o">(</span>/etc/systemd/system/dynamic-user-test.service<span class="p">;</span> static<span class="p">;</span> vendor preset: disabled<span class="o">)</span> Active: active <span class="o">(</span>running<span class="o">)</span> since Fri <span class="m">2017</span>-10-06 <span class="m">13</span>:12:25 CEST<span class="p">;</span> 3s ago Main PID: <span class="m">2967</span> <span class="o">(</span>sleep<span class="o">)</span> Tasks: <span class="m">1</span> <span class="o">(</span>limit: <span class="m">4915</span><span class="o">)</span> CGroup: /system.slice/dynamic-user-test.service └─2967 /usr/bin/sleep <span 
class="m">4711</span> Okt <span class="m">06</span> <span class="m">13</span>:12:25 sigma systemd<span class="o">[</span><span class="m">1</span><span class="o">]</span>: Started dynamic-user-test.service. <span class="c1"># ps -e -o pid,comm,user | grep 2967</span> <span class="m">2967</span> sleep dynamic-user-test <span class="c1"># id dynamic-user-test</span> <span class="nv">uid</span><span class="o">=</span><span class="m">64642</span><span class="o">(</span>dynamic-user-test<span class="o">)</span> <span class="nv">gid</span><span class="o">=</span><span class="m">64642</span><span class="o">(</span>dynamic-user-test<span class="o">)</span> <span class="nv">groups</span><span class="o">=</span><span class="m">64642</span><span class="o">(</span>dynamic-user-test<span class="o">)</span> <span class="c1"># systemctl stop dynamic-user-test</span> <span class="c1"># id dynamic-user-test</span> id: ‘dynamic-user-test’: no such user </code></pre></div> <p>In this example, we create a unit file with <code>DynamicUser=</code> turned on, start it, check if it's running correctly, have a look at the service process' user (which is named like the service; systemd does this automatically if the service name is suitable as user name, and you didn't configure any user name to use explicitly), stop the service and verify that the user ceased to exist too.</p> <p>That's already pretty cool. Let's step it up a notch, by doing the same in an interactive <em>transient</em> service (for those who don't know systemd well: a transient service is a service that is defined and started dynamically at run-time, for example via the <a href="https://www.freedesktop.org/software/systemd/man/systemd-run.html"><code>systemd-run</code></a> command from the shell. 
Think: run a service without having to write a unit file first):</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemd-run --pty --property=DynamicUser=yes --property=StateDirectory=wuff /bin/sh</span> Running as unit: run-u15750.service Press ^<span class="o">]</span> three <span class="nb">times</span> within 1s to disconnect TTY. sh-4.4$ id <span class="nv">uid</span><span class="o">=</span><span class="m">63122</span><span class="o">(</span>run-u15750<span class="o">)</span> <span class="nv">gid</span><span class="o">=</span><span class="m">63122</span><span class="o">(</span>run-u15750<span class="o">)</span> <span class="nv">groups</span><span class="o">=</span><span class="m">63122</span><span class="o">(</span>run-u15750<span class="o">)</span> <span class="nv">context</span><span class="o">=</span>system_u:system_r:initrc_t:s0 sh-4.4$ ls -al /var/lib/private/ total <span class="m">0</span> drwxr-xr-x. <span class="m">3</span> root root <span class="m">60</span> <span class="m">6</span>. Okt <span class="m">13</span>:21 . drwxr-xr-x. <span class="m">1</span> root root <span class="m">852</span> <span class="m">6</span>. Okt <span class="m">13</span>:21 .. drwxr-xr-x. <span class="m">1</span> run-u15750 run-u15750 <span class="m">8</span> <span class="m">6</span>. Okt <span class="m">13</span>:22 wuff sh-4.4$ ls -ld /var/lib/wuff lrwxrwxrwx. <span class="m">1</span> root root <span class="m">12</span> <span class="m">6</span>. Okt <span class="m">13</span>:21 /var/lib/wuff -&gt; private/wuff sh-4.4$ ls -ld /var/lib/wuff/ drwxr-xr-x. <span class="m">1</span> run-u15750 run-u15750 <span class="m">0</span> <span class="m">6</span>. 
Okt <span class="m">13</span>:21 /var/lib/wuff/ sh-4.4$ <span class="nb">echo</span> hello &gt; /var/lib/wuff/test sh-4.4$ <span class="nb">exit</span> <span class="nb">exit</span> <span class="c1"># id run-u15750</span> id: ‘run-u15750’: no such user <span class="c1"># ls -al /var/lib/private</span> total <span class="m">0</span> drwx------. <span class="m">1</span> root root <span class="m">66</span> <span class="m">6</span>. Okt <span class="m">13</span>:21 . drwxr-xr-x. <span class="m">1</span> root root <span class="m">852</span> <span class="m">6</span>. Okt <span class="m">13</span>:21 .. drwxr-xr-x. <span class="m">1</span> <span class="m">63122</span> <span class="m">63122</span> <span class="m">8</span> <span class="m">6</span>. Okt <span class="m">13</span>:22 wuff <span class="c1"># ls -ld /var/lib/wuff</span> lrwxrwxrwx. <span class="m">1</span> root root <span class="m">12</span> <span class="m">6</span>. Okt <span class="m">13</span>:21 /var/lib/wuff -&gt; private/wuff <span class="c1"># ls -ld /var/lib/wuff/</span> drwxr-xr-x. <span class="m">1</span> <span class="m">63122</span> <span class="m">63122</span> <span class="m">8</span> <span class="m">6</span>. Okt <span class="m">13</span>:22 /var/lib/wuff/ <span class="c1"># cat /var/lib/wuff/test</span> hello </code></pre></div> <p>The above invokes an interactive shell as transient service <code>run-u15750.service</code> (<code>systemd-run</code> picked that name automatically, since we didn't specify anything explicitly) with a dynamic user whose name is derived automatically from the service name. Because <code>StateDirectory=wuff</code> is used, a persistent state directory for the service is made available as <code>/var/lib/wuff</code>. In the interactive shell running inside the service, the <code>ls</code> commands show the <code>/var/lib/private</code> boundary directory and its contents, as well as the symlink that is placed for the service. 
Finally, before exiting the shell, a file is created in the state directory. Back in the original command shell we check if the user is still allocated: it is not, of course, since the service ceased to exist when we exited the shell and with it the dynamic user associated with it. From the host we check the state directory of the service, with similar commands as we did from inside of it. We see that things are set up pretty much the same way in both cases, except for two things: first of all the user/group of the files is now shown as raw numeric UIDs instead of the user/group names derived from the unit name. That's because the user ceased to exist at this point, and <code>ls</code> shows the raw UID for files owned by users that don't exist. Secondly, the access mode of the boundary directory is different: when we look at it from outside of the service it is not readable by anyone but root, when we looked from inside we saw it being world-readable.</p> <p>Now, let's see how things look if we start another transient service, reusing the state directory from the first invocation:</p> <div class="highlight"><pre><span></span><code><span class="c1"># systemd-run --pty --property=DynamicUser=yes --property=StateDirectory=wuff /bin/sh</span> Running as unit: run-u16087.service Press ^<span class="o">]</span> three <span class="nb">times</span> within 1s to disconnect TTY. sh-4.4$ cat /var/lib/wuff/test hello sh-4.4$ ls -al /var/lib/wuff/ total <span class="m">4</span> drwxr-xr-x. <span class="m">1</span> run-u16087 run-u16087 <span class="m">8</span> <span class="m">6</span>. Okt <span class="m">13</span>:22 . drwxr-xr-x. <span class="m">3</span> root root <span class="m">60</span> <span class="m">6</span>. Okt <span class="m">15</span>:42 .. -rw-r--r--. <span class="m">1</span> run-u16087 run-u16087 <span class="m">6</span> <span class="m">6</span>. 
Okt <span class="m">13</span>:22 <span class="nb">test</span> sh-4.4$ id <span class="nv">uid</span><span class="o">=</span><span class="m">63122</span><span class="o">(</span>run-u16087<span class="o">)</span> <span class="nv">gid</span><span class="o">=</span><span class="m">63122</span><span class="o">(</span>run-u16087<span class="o">)</span> <span class="nv">groups</span><span class="o">=</span><span class="m">63122</span><span class="o">(</span>run-u16087<span class="o">)</span> <span class="nv">context</span><span class="o">=</span>system_u:system_r:initrc_t:s0 sh-4.4$ <span class="nb">exit</span> <span class="nb">exit</span> </code></pre></div> <p>Here, <code>systemd-run</code> picked a different auto-generated unit name, but the used dynamic UID is still the same, as it was read from the pre-existing state directory, and was otherwise unused. As we can see the test file we generated earlier is accessible and still contains the data we left in there. Do note that the user name is different this time (as it is derived from the unit name, which is different), but the UID it is assigned to is the same one as on the first invocation. We can thus see that the mentioned optimization of the UID allocation logic (i.e. that we start the allocation loop from the UID owner of any existing state directory) took effect, so that no recursive <code>chown()</code>ing was required.</p> <p>And that's the end of our example, which hopefully illustrated a bit how this concept and implementation works.</p> <h1>Use-cases</h1> <p>Now that we had a look at how to enable this logic for a unit and how it is implemented, let's discuss where this actually could be useful in real life.</p> <ul> <li> <p>One major benefit of dynamic user IDs is that running a privilege-separated service leaves no artifacts in the system. A system user is allocated and made use of, but it is discarded automatically in a safe and secure way after use, in a fashion that is safe for later recycling. 
Thus, quickly invoking a short-lived service for processing some job can be protected properly through a user ID without having to pre-allocate it and without this draining the available UID pool any longer than necessary.</p> </li> <li> <p>In many cases, starting a service no longer requires package-specific preparation. Or in other words, quite often <code>useradd</code>/<code>mkdir</code>/<code>chown</code>/<code>chmod</code> invocations in "<code>post-inst</code>" package scripts, as well as <a href="https://www.freedesktop.org/software/systemd/man/sysusers.d.html"><code>sysusers.d</code></a> and <a href="https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html"><code>tmpfiles.d</code></a> drop-ins become unnecessary, as the <code>DynamicUser=</code> and <code>StateDirectory=</code>/<code>CacheDirectory=</code>/<code>LogsDirectory=</code> logic can do the necessary work automatically, on-demand and with a well-defined life-cycle.</p> </li> <li> <p>By combining dynamic user IDs with the transient unit concept, new creative ways of sand-boxing are made available. For example, let's say you don't trust the correct implementation of the <code>sort</code> command. You can now lock it into a simple, robust, dynamic UID sandbox with a simple <code>systemd-run</code> and still integrate it into a shell pipeline like any other command. Here's an example, showcasing a shell pipeline whose middle element runs as a dynamically on-the-fly allocated UID, which is released when the pipeline ends.</p> <div class="highlight"><pre><span></span><code># cat some-file.txt | systemd-run --pipe --property=DynamicUser=1 sort -u | grep -i foobar &gt; some-other-file.txt </code></pre></div> </li> <li> <p>By combining dynamic user IDs with the systemd templating logic it is now possible to do much more fine-grained and fully automatic UID management. 
For example, let's say you have a template unit file <code>/etc/systemd/system/foobard@.service</code>:</p> <div class="highlight"><pre><span></span><code><span class="k">[Service]</span><span class="w"></span> <span class="na">ExecStart</span><span class="o">=</span><span class="s">/usr/bin/myfoobarserviced</span><span class="w"></span> <span class="na">DynamicUser</span><span class="o">=</span><span class="s">1</span><span class="w"></span> <span class="na">StateDirectory</span><span class="o">=</span><span class="s">foobar/%i</span><span class="w"></span> </code></pre></div> <p>Now, let's say you want to start one instance of this service for each of your customers. All you need to do now for that is:</p> <div class="highlight"><pre><span></span><code><span class="err">#</span><span class="w"> </span><span class="n">systemctl</span><span class="w"> </span><span class="n">enable</span><span class="w"> </span><span class="n">foobard</span><span class="nv">@customerxyz</span><span class="p">.</span><span class="n">service</span><span class="w"> </span><span class="c1">--now</span> </code></pre></div> <p>And you are done. (Invoke this as many times as you like, each time replacing <code>customerxyz</code> by some customer identifier, you get the idea.)</p> </li> <li> <p>By combining dynamic user IDs with socket activation you may easily implement a system where each incoming connection is served by a process instance running as a different, fresh, newly allocated UID within its own sandbox. 
Here's an example <code>waldo.socket</code>:</p> <div class="highlight"><pre><span></span><code><span class="k">[Socket]</span><span class="w"></span> <span class="na">ListenStream</span><span class="o">=</span><span class="s">2048</span><span class="w"></span> <span class="na">Accept</span><span class="o">=</span><span class="s">yes</span><span class="w"></span> </code></pre></div> <p>With a matching <code>waldo@.service</code>:</p> <div class="highlight"><pre><span></span><code><span class="k">[Service]</span><span class="w"></span> <span class="na">ExecStart</span><span class="o">=</span><span class="s">-/usr/bin/myservicebinary</span><span class="w"></span> <span class="na">DynamicUser</span><span class="o">=</span><span class="s">yes</span><span class="w"></span> </code></pre></div> <p>With the two unit files above, systemd will listen on TCP/IP port 2048, and for each incoming connection invoke a fresh instance of <code>waldo@.service</code>, each time utilizing a different, new, dynamically allocated UID, neatly isolated from any other instance.</p> </li> <li> <p>Dynamic user IDs combine very well with state-less systems, i.e. systems that come up with an unpopulated <code>/etc</code> and <code>/var</code>. A service using dynamic user IDs and the <code>StateDirectory=</code>, <code>CacheDirectory=</code>, <code>LogsDirectory=</code> and <code>RuntimeDirectory=</code> concepts will implicitly allocate the users and directories it needs for running, right at the moment where it needs it.</p> </li> </ul> <p>Dynamic users are a very generic concept, hence a multitude of other uses are thinkable; the list above is just supposed to trigger your imagination.</p> <h1>What does this mean for you as a packager?</h1> <p>I am pretty sure that a large number of services shipped with today's distributions could benefit from using <code>DynamicUser=</code> and <code>StateDirectory=</code> (and related settings). 
It often allows removal of <code>post-inst</code> packaging scripts altogether, as well as any <code>sysusers.d</code> and <code>tmpfiles.d</code> drop-ins by unifying the needed declarations in the unit file itself. Hence, as a packager please consider switching your unit files over. That said, there are a number of conditions where <code>DynamicUser=</code> and <code>StateDirectory=</code> (and friends) cannot or should not be used. To name a few:</p> <ol> <li> <p>Services that need to write to files outside of <code>/run/&lt;package&gt;</code>, <code>/var/lib/&lt;package&gt;</code>, <code>/var/cache/&lt;package&gt;</code>, <code>/var/log/&lt;package&gt;</code>, <code>/var/tmp</code>, <code>/tmp</code>, <code>/dev/shm</code> are generally incompatible with this scheme. This rules out daemons that upgrade the system as one example, as that involves writing to <code>/usr</code>.</p> </li> <li> <p>Services that maintain a herd of processes with different user IDs. Some SMTP services are like this. If your service has such a <em>super-server</em> design, UID management needs to be done by the super-server itself, which rules out systemd doing its dynamic UID magic for it.</p> </li> <li> <p>Services which run as root (obviously…) or are otherwise privileged.</p> </li> <li> <p>Services that need to live in the same mount name-space as the host system (for example, because they want to establish mount points visible system-wide). As mentioned <code>DynamicUser=</code> implies <code>ProtectSystem=</code>, <code>PrivateTmp=</code> and related options, which all require the service to run in its own mount name-space.</p> </li> <li> <p>Your focus is older distributions, i.e. distributions that do not have systemd 232 (for <code>DynamicUser=</code>) or systemd 235 (for <code>StateDirectory=</code> and friends) yet.</p> </li> <li> <p>If your distribution's packaging guides don't allow it. 
Consult your packaging guides, and possibly start a discussion on your distribution's mailing list about this.</p> </li> </ol> <h1>Notes</h1> <p>A couple of additional, random notes about the implementation and use of these features:</p> <ol> <li> <p>Do note that allocating or deallocating a dynamic user leaves <code>/etc/passwd</code> untouched. A dynamic user is added into the user database through the glibc NSS module <a href="https://www.freedesktop.org/software/systemd/man/nss-systemd.html"><code>nss-systemd</code></a>, and this information never hits the disk.</p> </li> <li> <p>On traditional UNIX systems it was the job of the daemon process itself to drop privileges, while the <code>DynamicUser=</code> concept is designed around the service manager (i.e. systemd) being responsible for that. That said, since v235 there's a way to marry <code>DynamicUser=</code> and such services which want to drop privileges on their own. For that, turn on <code>DynamicUser=</code> and set <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#User="><code>User=</code></a> to the user name the service wants to <code>setuid()</code> to. This has the effect that systemd will allocate the dynamic user under the specified name when the service is started. Then, prefix the command line you specify in <a href="https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStart="><code>ExecStart=</code></a> with a single <code>!</code> character. If you do, the user is allocated for the service, but the daemon binary is invoked as <code>root</code> instead of the allocated user, under the assumption that the daemon changes its UID on its own the right way. Note that after registration the user will show up instantly in the user database, and is hence resolvable like any other by the daemon process. 
Example: <code>ExecStart=!/usr/bin/mydaemond</code></p> </li> <li> <p>You may wonder why systemd uses the UID range 61184–65519 for its dynamic user allocations (side note: in hexadecimal this reads as 0xEF00–0xFFEF). That's because distributions (specifically Fedora) tend to allocate regular users from below the 60000 range, and we don't want to step into that. We also want to stay away from 65535 and a bit around it, as some of these UIDs have special meanings (65535 is often used as special value for "invalid" or "no" UID, as it is identical to the 16bit value -1; 65534 is generally mapped to the "nobody" user, and is where some kernel subsystems map unmappable UIDs). Finally, we want to stay within the 16bit range. In a user name-spacing world each container tends to have much less than the full 32bit UID range available that Linux kernels theoretically provide. Everybody apparently can agree that a container should at least cover the 16bit range though — already to include a <code>nobody</code> user. (And quite frankly, I am pretty sure assigning 64K UIDs per container is nicely systematic, as the higher 16bit of the 32bit UID values this way become a container ID, while the lower 16bit become the logical UID within each container, if you still follow what I am babbling here…). And before you ask: no, this range cannot be changed right now, it's compiled in. We might change that eventually however.</p> </li> <li> <p>You might wonder what happens if you already used UIDs from the 61184–65519 range on your system for other purposes. systemd should handle that mostly fine, as long as that usage is properly registered in the user database: when allocating a dynamic user we pick a UID, see if it is currently used somehow, and if yes pick a different one, until we find a free one. Whether a UID is used right now or not is checked through NSS calls. Moreover the IPC object lists are checked to see if there are any objects owned by the UID we are about to pick.
This means systemd will avoid using UIDs you have assigned otherwise. Note however that this of course makes the pool of available UIDs smaller, and in the worst cases this means that allocating a dynamic user might fail because there simply are no unused UIDs in the range.</p> </li> <li> <p>If not specified otherwise the name for a dynamically allocated user is derived from the service name. Not everything that's valid in a service name is valid in a user-name however, and in some cases a randomized name is used instead to deal with this. Often it makes sense to pick the user names to register explicitly. For that use <code>User=</code> and choose whatever you like.</p> </li> <li> <p>If you pick a user name with <code>User=</code> and combine it with <code>DynamicUser=</code> and the user already exists statically it will be used for the service and the dynamic user logic is automatically disabled. This permits automatic up- and downgrades between static and dynamic UIDs. For example, it provides a nice way to move a system from static to dynamic UIDs in a compatible way: as long as you select the same <code>User=</code> value before and after switching <code>DynamicUser=</code> on, the service will continue to use the statically allocated user if it exists, and will only operate in the dynamic mode if it does not. This is useful for other cases as well, for example to adapt a service that normally would use a dynamic user to concepts that require statically assigned UIDs, for example to marry classic UID-based file system quota with such services.</p> </li> <li> <p>systemd always allocates a pair of dynamic UID and GID at the same time, with the same numeric ID.</p> </li> <li> <p>If the Linux kernel had a "shiftfs" or similar functionality, i.e.
a way to mount an existing directory to a second place, but map the exposed UIDs/GIDs in some way configurable at mount time, this would be excellent for the implementation of <code>StateDirectory=</code> in conjunction with <code>DynamicUser=</code>. It would make the recursive <code>chown()</code>ing step unnecessary, as the host version of the state directory could simply be mounted into the service's mount name-space, with a shift applied that maps the directory's owner to the service's UID/GID. But I don't have high hopes in this regard, as all work being done in this area appears to be bound to user name-spacing — which is a concept not used here (and I guess one could say user name-spacing is probably more a source of problems than a solution to one, but you are welcome to disagree on that).</p> </li> </ol> <p>And that's all for now. Enjoy your dynamic users!</p>Lennart PoetteringFri, 06 Oct 2017 00:00:00 +0200tag:0pointer.net,2017-10-06:/blog/dynamic-users-with-systemd.htmlprojectsAll Systems Go! 2017 Schedule Publishedhttps://0pointer.net/blog/all-systems-go-2017-schedule-published.html<p><large><b>The All Systems Go! 2017 schedule has been published!</b></large></p> <p>I am happy to announce that we have published the <a href="https://all-systems-go.io/">All Systems Go! 2017</a> schedule!
We are very happy with the large number and the quality of the submissions we got, and the resulting schedule is exceptionally strong.</p> <p>Without further ado:</p> <p><a href="https://cfp.all-systems-go.io/en/ASG2017/public/schedule/2">Here's the schedule for the first day (Saturday, 21st of October).</a></p> <p><a href="https://cfp.all-systems-go.io/en/ASG2017/public/schedule/3">And here's the schedule for the second day (Sunday, 22nd of October).</a></p> <p>Here are a couple of keywords from the topics of the talks: <strong>1password</strong>, <strong>azure</strong>, <strong>bluetooth</strong>, <strong>build systems</strong>, <strong>casync</strong>, <strong>cgroups</strong>, <strong>cilium</strong>, <strong>cockpit</strong>, <strong>containers</strong>, <strong>ebpf</strong>, <strong>flatpak</strong>, <strong>habitat</strong>, <strong>IoT</strong>, <strong>kubernetes</strong>, <strong>landlock</strong>, <strong>meson</strong>, <strong>OCI</strong>, <strong>rkt</strong>, <strong>rust</strong>, <strong>secureboot</strong>, <strong>skydive</strong>, <strong>systemd</strong>, <strong>testing</strong>, <strong>tor</strong>, <strong>varlink</strong>, <strong>virtualization</strong>, <strong>wifi</strong>, and more.</p> <p>Our speakers are from all across the industry: Chef, CoreOS, Covalent, Facebook, Google, Intel, Kinvolk, Microsoft, Mozilla, Pantheon, Pengutronix, Red Hat, SUSE and more.</p> <p><a href="https://all-systems-go.io/"><img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/></a></p> <p>For further information about All Systems Go! visit our <a href="http://all-systems-go.io/">conference web site</a>.</p> <p>Make sure to buy your ticket for All Systems Go! 2017 now! A limited number of tickets are left at this point, so make sure you get yours before we are all sold out!
<a href="https://all-systems-go.io/#tickets">Find all details here.</a></p> <p>See you in Berlin!</p>Lennart PoetteringWed, 27 Sep 2017 00:00:00 +0200tag:0pointer.net,2017-09-27:/blog/all-systems-go-2017-schedule-published.htmlprojectsAll Systems Go! 2017 CfP Closes Soon!https://0pointer.net/blog/all-systems-go-2017-cfp-closes-soon.html<p><large><b>The All Systems Go! 2017 Call for Participation is Closing on September 3rd!</b></large></p> <p>Please make sure to get your presentation proposals for <i>All Systems Go! 2017</i> in now! The CfP closes on Sunday!</p> <p><a href="https://all-systems-go.io/"><img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/></a></p> <p>In case you haven't heard about <i>All Systems Go!</i> yet, here's a quick reminder what kind of conference it is, and why you should attend and speak there:</p> <p><i>All Systems Go!</i> is an Open Source community conference focused on the projects and technologies at the foundation of modern Linux systems — specifically low-level user-space technologies. Its goal is to provide a friendly and collaborative gathering place for individuals and communities working to push these technologies forward. <i>All Systems Go! 2017</i> takes place in <b>Berlin, Germany</b> on <b>October 21st+22nd</b>. <i>All Systems Go!</i> is a 2-day event with 2-3 talks happening in parallel.
Full presentation slots are 30-45 minutes in length and lightning talk slots are 5-10 minutes.</p> <p>In particular, we are looking for sessions including, but not limited to, the following topics:</p> <ul><li>Low-level container executors and infrastructure</li> <li>IoT and embedded OS infrastructure</li> <li>OS, container, IoT image delivery and updating</li> <li>Building Linux devices and applications</li> <li>Low-level desktop technologies</li> <li>Networking</li> <li>System and service management</li> <li>Tracing and performance measuring</li> <li>IPC and RPC systems</li> <li>Security and Sandboxing</li></ul> <p>While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome too, as long as they have a clear and direct relevance for user-space.</p> <p>To submit your proposal now please visit our <a href="https://cfp.all-systems-go.io/en/ASG2017/events/new">CFP submission web site</a>.</p> <p>For further information about All Systems Go! visit our <a href="http://all-systems-go.io/">conference web site</a>.</p> <p><i>systemd.conf</i> will not take place this year in lieu of <i>All Systems Go!</i>. <i>All Systems Go!</i> welcomes all projects that contribute to Linux user space, which, of course, includes systemd. Thus, anything you think was appropriate for submission to <i>systemd.conf</i> is also fitting for <i>All Systems Go</i>!</p>Lennart PoetteringWed, 30 Aug 2017 00:00:00 +0200tag:0pointer.net,2017-08-30:/blog/all-systems-go-2017-cfp-closes-soon.htmlprojectsAll Systems Go! 2017 Speakershttps://0pointer.net/blog/all-systems-go-2017-speakers.html<p><large><b>The All Systems Go! 2017 Headline Speakers Announced!</b></large></p> <p>Don't forget to send in your submissions to the All Systems Go! 2017 CfP! 
Proposals are accepted until <b>September 3rd</b>!</p> <p>A couple of headline speakers have been announced now:</p> <ul><li><b>Alban Crequy</b> (Kinvolk)</li> <li><b>Brian "Redbeard" Harrington</b> (CoreOS)</li> <li><b>Gianluca Borello</b> (Sysdig)</li> <li><b>Jon Boulle</b> (NStack/CoreOS)</li> <li><b>Martin Pitt</b> (Debian)</li> <li><b>Thomas Graf</b> (covalent.io/Cilium)</li> <li><b>Vincent Batts</b> (Red Hat/OCI)</li> <li>(and yours truly)</li> </ul> <p>These folks will also review your submissions as part of the papers committee!</p> <p><a href="https://all-systems-go.io/"><img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/></a></p> <p><i>All Systems Go!</i> is an Open Source community conference focused on the projects and technologies at the foundation of modern Linux systems — specifically low-level user-space technologies. Its goal is to provide a friendly and collaborative gathering place for individuals and communities working to push these technologies forward.</p> <p><i>All Systems Go! 2017</i> takes place in <b>Berlin, Germany</b> on <b>October 21st+22nd</b>.</p> <p>To submit your proposal now please visit our <a href="https://cfp.all-systems-go.io/en/ASG2017/events/new">CFP submission web site</a>.</p> <p>For further information about All Systems Go! 
visit our <a href="http://all-systems-go.io/">conference web site</a>.</p>Lennart PoetteringThu, 10 Aug 2017 00:00:00 +0200tag:0pointer.net,2017-08-10:/blog/all-systems-go-2017-speakers.htmlprojectscasync Videohttps://0pointer.net/blog/casync-video.html<h1>Video of my casync Presentation @ kinvolk</h1> <p>The great folks at <a href="https://kinvolk.io/">kinvolk</a> have uploaded a <a href="https://www.youtube.com/watch?v=JnNkBJ6pr9s">video of my casync presentation at their offices last week</a>.</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/JnNkBJ6pr9s" frameborder="0" allowfullscreen></iframe> <p>The <a href="http://0pointer.de/public/casync-kinvolk2017.pdf">slides are available</a> as well.</p> <p>Enjoy!</p>Lennart PoetteringTue, 18 Jul 2017 00:00:00 +0200tag:0pointer.net,2017-07-18:/blog/casync-video.htmlprojectsmkosi — A Tool for Generating OS Imageshttps://0pointer.net/blog/mkosi-a-tool-for-generating-os-images.html<h1>Introducing mkosi</h1> <p>After blogging about <a href="http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html"><code>casync</code></a> I realized I never blogged about the <a href="https://github.com/systemd/mkosi"><code>mkosi</code></a> tool that combines nicely with it. <code>mkosi</code> has been around for a while already, and it's time to make it a bit better known. <code>mkosi</code> stands for <em>Make Operating System Image</em>, and is a tool for precisely that: generating an OS tree or image that can be booted.</p> <p>Yes, there are many tools like <code>mkosi</code>, and a number of them are quite well known and popular. But <code>mkosi</code> has a number of features that I think make it interesting for a variety of use-cases that other tools don't cover that well.</p> <h1>What is mkosi?</h1> <p>What are those use-cases, and what precisely sets <code>mkosi</code> apart?
<code>mkosi</code> is definitely a tool with a focus on developers' needs for building OS images, for testing and debugging, but also for generating production images with cryptographic protection. A typical use-case would be to add a <code>mkosi.default</code> file to an existing project (for example, one written in C or Python), thus making it easy to generate an OS image for it. <code>mkosi</code> will put together the image with development headers and tools, compile your code in it, run your test suite, then throw away the image again, and build a new one, this time without development headers and tools, and install your build artifacts in it. This final image is then "production-ready", and only contains your built program and the minimal set of packages you configured otherwise. Such an image could then be deployed with <code>casync</code> (or any other tool of course) to be delivered to your set of servers, or IoT devices or whatever you are building.</p> <p><code>mkosi</code> is supposed to be <em>legacy-free</em>: the focus is clearly on today's technology, not yesteryear's. Specifically this means that we'll generate GPT partition tables, not MBR/DOS ones. When you tell <code>mkosi</code> to generate a bootable image for you, it will make it bootable on EFI, not on legacy BIOS.
The GPT images generated follow specifications such as the <a href="https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/">Discoverable Partitions Specification</a>, so that <code>/etc/fstab</code> can remain unpopulated and tools such as <code>systemd-nspawn</code> can automatically dissect the image and boot from it.</p> <p>So, let's have a look at the specific images it can generate:</p> <ol> <li>Raw GPT disk image, with ext4 as root</li> <li>Raw GPT disk image, with btrfs as root</li> <li>Raw GPT disk image, with a read-only squashfs as root</li> <li>A plain directory on disk containing the OS tree directly (this is useful for creating generic container images)</li> <li>A btrfs subvolume on disk, similar to the plain directory</li> <li>A tarball of a plain directory</li> </ol> <p>When any of the GPT choices above are selected, a couple of additional options are available:</p> <ol> <li>A swap partition may be added in</li> <li>The system may be made bootable on EFI systems</li> <li>Separate partitions for <code>/home</code> and <code>/srv</code> may be added in</li> <li>The root, <code>/home</code> and <code>/srv</code> partitions may be optionally encrypted with LUKS</li> <li>The root partition may be protected using <code>dm-verity</code>, thus making offline attacks on the generated system hard</li> <li>If the image is made bootable, the <code>dm-verity</code> root hash is automatically added to the kernel command line, and the kernel together with its initial RAM disk and the kernel command line is optionally cryptographically signed for UEFI SecureBoot</li> </ol> <p>Note that <code>mkosi</code> is distribution-agnostic. It currently can build images based on the following Linux distributions:</p> <ol> <li>Fedora</li> <li>Debian</li> <li>Ubuntu</li> <li>ArchLinux</li> <li>openSUSE</li> </ol> <p>Note though that not all distributions are supported at the same feature level currently.
Also, as <code>mkosi</code> is based on <code>dnf --installroot</code>, <code>debootstrap</code>, <code>pacstrap</code> and <code>zypper</code>, and those tools are not packaged universally on all distributions, you might not be able to build images for all those distributions on arbitrary host distributions.</p> <p>The GPT images are put together in a way that they aren't just compatible with UEFI systems, but also with VM and container managers (that is, at least the smart ones, i.e. VM managers that know UEFI, and container managers that grok GPT disk images) to a large degree. In fact, the idea is that you can use <code>mkosi</code> to build a single GPT image that may be used to:</p> <ol> <li>Boot on bare-metal boxes</li> <li>Boot in a VM</li> <li>Boot in a <code>systemd-nspawn</code> container</li> <li>Directly run a systemd service off, using systemd's <code>RootImage=</code> unit file setting</li> </ol> <p>Note that in all four cases the <code>dm-verity</code> data is automatically used if available to ensure the image is not tampered with (yes, you read that right, <code>systemd-nspawn</code> and systemd's <code>RootImage=</code> setting automatically do <code>dm-verity</code> these days if the image has it.)</p> <h1>Mode of Operation</h1> <p>The simplest way to use <code>mkosi</code> is to invoke it without parameters (as root):</p> <div class="highlight"><pre><span></span><code># mkosi </code></pre></div> <p>Without any configuration this will create a GPT disk image for you, will call it <code>image.raw</code> and drop it in the current directory. The distribution used will be the same one as your host runs.</p> <p>Of course in most cases you want more control over how the image is put together, i.e. select package sets, select the distribution, size partitions and so on.
Most of that you can actually specify on the command line, but it is recommended to instead create a couple of <code>mkosi.$SOMETHING</code> files and directories in some directory. Then, simply change to that directory and run <code>mkosi</code> without any further arguments. The tool will then look in the current working directory for these files and directories and make use of them (similar to how <code>make</code> looks for a <code>Makefile</code>…). Every single file/directory is optional, but if they exist they are honored. Here's a list of the files/directories <code>mkosi</code> currently looks for:</p> <ol> <li> <p><code>mkosi.default</code> — This is the main configuration file, here you can configure what kind of image you want, which distribution, which packages and so on.</p> </li> <li> <p><code>mkosi.extra/</code> — If this directory exists, then <code>mkosi</code> will copy everything inside it into the images built. You can place arbitrary directory hierarchies in here, and they'll be copied over whatever is already in the image, after it was put together by the distribution's package manager. This is the best way to drop additional static files into the image, or override distribution-supplied ones.</p> </li> <li> <p><code>mkosi.build</code> — This executable file is supposed to be a build script. When it exists, <code>mkosi</code> will build two images, one after the other in the mode already mentioned above: the first version is the build image, and may include various build-time dependencies such as a compiler or development headers. The build script is also copied into it, and then run inside it. The script should then build whatever shall be built and place the result in <code>$DESTDIR</code> (don't worry, popular build tools such as Automake or Meson all honor <code>$DESTDIR</code> anyway, so there's not much to do here explicitly). It may also run a test suite, or anything else you like. 
After the script has finished, the build image is removed again, and a second image (the <em>final</em> image) is built. This time, no development packages are included, and the build script is not copied into the image again — however, the build artifacts from the first run (i.e. those placed in <code>$DESTDIR</code>) are copied into the image.</p> </li> <li> <p><code>mkosi.postinst</code> — If this executable script exists, it is invoked inside the image (inside a <code>systemd-nspawn</code> invocation) and can adjust the image as it likes at a very late point in the image preparation. If <code>mkosi.build</code> exists, i.e. the dual-phased development build process is used, then this script will be invoked twice: once inside the build image and once inside the final image. The first parameter passed to the script clarifies which phase it is run in.</p> </li> <li> <p><code>mkosi.nspawn</code> — If this file exists, it should contain a container configuration file for <code>systemd-nspawn</code> (see <a href="https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html">systemd.nspawn(5)</a> for details), which shall be shipped along with the final image and shall be included in the check-sum calculations (see below).</p> </li> <li> <p><code>mkosi.cache/</code> — If this directory exists, it is used as package cache directory for the builds. This directory is effectively bind mounted into the image at build time, in order to speed up building images. The package installers of the various distributions will place their package files here, so that subsequent runs can reuse them.</p> </li> <li> <p><code>mkosi.passphrase</code> — If this file exists, it should contain a pass-phrase to use for the LUKS encryption (if that's enabled for the image built).
This file should not be readable by other users.</p> </li> <li> <p><code>mkosi.secure-boot.crt</code> and <code>mkosi.secure-boot.key</code> should be an X.509 key pair to use for signing the kernel and initrd for UEFI SecureBoot, if that's enabled.</p> </li> </ol> <h1>How to use it</h1> <p>So, let's come back to our most trivial example, without any of the <code>mkosi.$SOMETHING</code> files around:</p> <div class="highlight"><pre><span></span><code># mkosi </code></pre></div> <p>As mentioned, this will create an image file <code>image.raw</code> in the current directory. How do we use it? Of course, we could <code>dd</code> it onto some USB stick and boot it on a bare-metal device. However, it's much simpler to first run it in a container for testing:</p> <div class="highlight"><pre><span></span><code># systemd-nspawn -bi image.raw </code></pre></div> <p>And there you go: the image should boot up, and just work for you.</p> <p>Now, let's make things more interesting. Let's still not use any of the <code>mkosi.$SOMETHING</code> files around:</p> <div class="highlight"><pre><span></span><code># mkosi -t raw_btrfs --bootable -o foobar.raw
# systemd-nspawn -bi foobar.raw
</code></pre></div> <p>This is similar to the above, but we made three changes: it's no longer GPT + <code>ext4</code>, but GPT + <code>btrfs</code>. Moreover, the system is made bootable on UEFI systems, and finally, the output is now called <code>foobar.raw</code>.</p> <p>Because this system is bootable on UEFI systems, we can run it in KVM:</p> <div class="highlight"><pre><span></span><code>qemu-kvm -m 512 -smp 2 -bios /usr/share/edk2/ovmf/OVMF_CODE.fd -drive format=raw,file=foobar.raw </code></pre></div> <p>This will look very similar to the <code>systemd-nspawn</code> invocation, except that this uses full VM virtualization rather than container virtualization. (Note that the way to run a UEFI qemu/kvm instance appears to change all the time and is different on the various distributions.
It's quite annoying, and I can't really tell you what the right qemu command line is to make this work on your system.)</p> <p>Of course, it's not all raw GPT disk images with <code>mkosi</code>. Let's try a plain directory image:</p> <div class="highlight"><pre><span></span><code># mkosi -d fedora -t directory -o quux
# systemd-nspawn -bD quux
</code></pre></div> <p>Of course, if you generate the image as a plain directory you can't boot it on bare-metal just like that, nor run it in a VM.</p> <p>A more complex command line is the following:</p> <div class="highlight"><pre><span></span><code># mkosi -d fedora -t raw_squashfs --checksum --xz --package=openssh-clients --package=emacs </code></pre></div> <p>In this mode we explicitly pick Fedora as the distribution to use, ask <code>mkosi</code> to generate a compressed GPT image with a root squashfs, compress the result with <code>xz</code>, and generate a <code>SHA256SUMS</code> file with the hashes of the generated artifacts. The image will contain the SSH client as well as everybody's favorite editor.</p> <p>Now, let's make use of the various <code>mkosi.$SOMETHING</code> files. Let's say we are working on some Automake-based project and want to make it easy to generate a disk image off the development tree with the version you are hacking on.
Create a configuration file:</p> <div class="highlight"><pre><span></span><code><span class="err">#</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="n">mkosi</span><span class="p">.</span><span class="k">default</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="n">EOF</span><span class="w"></span>
<span class="o">[</span><span class="n">Distribution</span><span class="o">]</span><span class="w"></span>
<span class="n">Distribution</span><span class="o">=</span><span class="n">fedora</span><span class="w"></span>
<span class="k">Release</span><span class="o">=</span><span class="mi">24</span><span class="w"></span>
<span class="o">[</span><span class="n">Output</span><span class="o">]</span><span class="w"></span>
<span class="nf">Format</span><span class="o">=</span><span class="n">raw_btrfs</span><span class="w"></span>
<span class="n">Bootable</span><span class="o">=</span><span class="n">yes</span><span class="w"></span>
<span class="o">[</span><span class="n">Packages</span><span class="o">]</span><span class="w"></span>
<span class="err">#</span><span class="w"> </span><span class="n">The</span><span class="w"> </span><span class="n">packages</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">appear</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="k">both</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">build</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">final</span><span class="w"> </span><span class="nc">image</span><span class="w"></span>
<span class="n">Packages</span><span class="o">=</span><span class="n">openssh</span><span class="o">-</span><span class="n">clients</span><span class="w"> </span><span class="n">httpd</span><span class="w"></span>
<span class="err">#</span><span class="w"> </span><span class="n">The</span><span class="w"> </span><span class="n">packages</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">appear</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">build</span><span class="w"> </span><span class="nc">image</span><span class="p">,</span><span class="w"> </span><span class="n">but</span><span class="w"> </span><span class="n">absent</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">final</span><span class="w"> </span><span class="nc">image</span><span class="w"></span>
<span class="n">BuildPackages</span><span class="o">=</span><span class="n">make</span><span class="w"> </span><span class="n">gcc</span><span class="w"> </span><span class="n">libcurl</span><span class="o">-</span><span class="n">devel</span><span class="w"></span>
<span class="n">EOF</span><span class="w"></span>
</code></pre></div> <p>And let's add a build script:</p> <div class="highlight"><pre><span></span><code><span class="c1"># cat &gt; mkosi.build &lt;&lt;EOF</span><span class="w"></span>
<span class="c1">#!/bin/sh</span><span class="w"></span>
<span class="p">.</span><span class="o">/</span><span class="n">autogen</span><span class="p">.</span><span class="n">sh</span><span class="w"></span>
<span class="p">.</span><span class="o">/</span><span class="n">configure</span><span class="w"> </span><span class="o">--</span><span class="n">prefix</span><span class="o">=/</span><span class="n">usr</span><span class="w"></span>
<span class="n">make</span><span class="w"> </span><span class="o">-</span><span class="n">j</span><span class="w"> </span><span class="n n-Quoted">`nproc`</span><span class="w"></span>
<span class="n">make</span><span class="w"> </span><span class="k">install</span><span class="w"></span>
<span class="n">EOF</span><span class="w"></span>
<span class="c1"># chmod +x mkosi.build</span><span class="w"></span>
</code></pre></div> <p>And with all that in place we can now build our project into a disk image, simply by typing:</p> <div class="highlight"><pre><span></span><code># mkosi </code></pre></div> <p>Let's try it out:</p> <div class="highlight"><pre><span></span><code># systemd-nspawn -bi image.raw </code></pre></div> <p>Of course, if you do this you'll notice that building an image like this can be quite slow. And slow build times are actively hurtful to your productivity as a developer. Hence let's make things a bit faster. First, let's make use of a package cache shared between runs:</p> <div class="highlight"><pre><span></span><code># mkdir mkosi.cache </code></pre></div> <p>Building images now should already be substantially faster (and generate less network traffic) as the packages will now be downloaded only once and reused. However, you'll notice that unpacking all those packages and the rest of the work is still quite slow. But <code>mkosi</code> can help you with that. Simply use <code>mkosi</code>'s incremental build feature. In this mode <code>mkosi</code> will make a copy of the build and final images immediately before dropping in your build sources or artifacts, so that building an image becomes a lot quicker: instead of always starting totally from scratch a build will now reuse everything it can reuse from a previous run, and immediately begin with building your sources rather than the build image to build your sources in.
To enable the incremental build feature use <code>-i</code>:</p> <div class="highlight"><pre><span></span><code># mkosi -i </code></pre></div> <p>Note that if you use this option, the package list is not updated anymore from your distribution's servers, as the cached copy is made after all packages are installed, and hence until you actually delete the cached copy the distribution's network servers aren't contacted again and no RPMs or DEBs are downloaded. This means the distribution you use becomes "frozen in time" this way. (Which might be a bad thing, but also a good thing, as it makes things kinda reproducible.)</p> <p>Of course, if you run <code>mkosi</code> a couple of times you'll notice that it won't overwrite the generated image when it already exists. You can either delete the file yourself first (<code>rm image.raw</code>) or let <code>mkosi</code> do it for you right before building a new image, with <code>mkosi -f</code>. You can also tell <code>mkosi</code> to not only remove any such pre-existing images, but also remove any cached copies of the incremental feature, by using <code>-f</code> twice.</p> <p>I wrote <code>mkosi</code> originally in order to test systemd, and quickly generate a disk image of various distributions with the most current systemd version from git, without all that affecting my host system. I regularly use <code>mkosi</code> for that today, in incremental mode. 
The two commands I use most in that context are:</p> <div class="highlight"><pre><span></span><code># <span class="nv">mkosi</span> <span class="o">-</span><span class="k">if</span> <span class="o">&amp;&amp;</span> <span class="nv">systemd</span><span class="o">-</span><span class="nv">nspawn</span> <span class="o">-</span><span class="nv">bi</span> <span class="nv">image</span>.<span class="nv">raw</span> </code></pre></div> <p>And sometimes:</p> <div class="highlight"><pre><span></span><code># mkosi -iff &amp;&amp; systemd-nspawn -bi image.raw </code></pre></div> <p>The latter I use only if I want to regenerate everything based on the very newest set of RPMs provided by Fedora, instead of a cached snapshot of it.</p> <p>BTW, the <code>mkosi</code> files for systemd are included in the systemd git tree: <a href="https://github.com/systemd/systemd/blob/master/.mkosi/mkosi.fedora"><code>mkosi.default</code></a> and <a href="https://github.com/systemd/systemd/blob/master/mkosi.build"><code>mkosi.build</code></a>. This way, any developer who wants to quickly test something with current systemd git, or wants to prepare a patch based on it and test it can check out the systemd repository and simply run <code>mkosi</code> in it and a few minutes later he has a bootable image he can test in <code>systemd-nspawn</code> or KVM. <code>casync</code> has similar files: <a href="https://github.com/systemd/casync/blob/master/mkosi.default"><code>mkosi.default</code></a>, <a href="https://github.com/systemd/casync/blob/master/mkosi.build"><code>mkosi.build</code></a>.</p> <h1>Random Interesting Features</h1> <ol> <li> <p>As mentioned already, <code>mkosi</code> will generate <code>dm-verity</code> enabled disk images if you ask for it. For that use the <code>--verity</code> switch on the command line or <code>Verity=</code> setting in <code>mkosi.default</code>. Of course, <code>dm-verity</code> implies that the root volume is read-only. 
In this mode the top-level <code>dm-verity</code> hash will be placed alongside the output disk image in a file named the same way, but with the <code>.roothash</code> suffix. If the image is to be created bootable, the root hash is also included on the kernel command line in the <code>roothash=</code> parameter, which current systemd versions can use to both find and activate the root partition in a <code>dm-verity</code> protected way. BTW: it's a good idea to combine this <code>dm-verity</code> mode with the <code>raw_squashfs</code> image mode, to generate a genuinely protected, compressed image suitable for running in your IoT device.</p> </li> <li> <p>As indicated above, <code>mkosi</code> can automatically create a check-sum file <code>SHA256SUMS</code> for you (<code>--checksum</code>) covering all the files it outputs (which could be the image file itself, a matching <code>.nspawn</code> file using the <code>mkosi.nspawn</code> file mentioned above, as well as the <code>.roothash</code> file for the <code>dm-verity</code> root hash). It can then optionally sign this with <code>gpg</code> (<code>--sign</code>). Note that <code>systemd</code>'s <code>machinectl pull-tar</code> and <code>machinectl pull-raw</code> commands can download these files and the <code>SHA256SUMS</code> file automatically and verify things on download. In other words: what <code>mkosi</code> outputs is perfectly ready for downloads using these two <code>systemd</code> commands.</p> </li> <li> <p>As mentioned, <code>mkosi</code> is big on supporting UEFI SecureBoot. To make use of that, place your X.509 key pair in two files <code>mkosi.secureboot.crt</code> and <code>mkosi.secureboot.key</code>, and set <code>SecureBoot=</code> or <code>--secure-boot</code>. If so, <code>mkosi</code> will sign the kernel/initrd/kernel command line combination during the build. 
Of course, if you use this mode, you should also use <code>Verity=</code>/<code>--verity</code>, otherwise the setup makes only partial sense. Note that <code>mkosi</code> will not help you with actually enrolling the keys you use in your UEFI BIOS.</p> </li> <li> <p><code>mkosi</code> has minimal support for git checkouts: when it recognizes it is run in a git checkout and you use the <code>mkosi.build</code> script stuff, the source tree will be copied into the build image, but with all files excluded by <code>.gitignore</code> removed.</p> </li> <li> <p>There's support for encryption in place. Use <code>--encrypt=</code> or <code>Encrypt=</code>. Note that the UEFI ESP is never encrypted though, and the root partition only if explicitly requested. The <code>/home</code> and <code>/srv</code> partitions are unconditionally encrypted if that's enabled.</p> </li> <li> <p>Images may be built with all documentation removed.</p> </li> <li> <p>The password for the root user and additional kernel command line arguments may be configured for the image to generate.</p> </li> </ol> <h1>Minimum Requirements</h1> <p>Current <code>mkosi</code> requires Python 3.5, and has a number of dependencies, listed in the <a href="https://github.com/systemd/mkosi/blob/master/README.md"><code>README</code></a>. Most notably you need a somewhat recent systemd version to make use of its full feature set: systemd 233. 
Older versions are already packaged for various distributions, but much of what I describe above is only available in the most recent release <code>mkosi 3</code>.</p> <p>The UEFI SecureBoot support requires <code>sbsign</code> which currently isn't available in Fedora, but there's <a href="https://copr.fedorainfracloud.org/coprs/msekleta/sbsigntool/">a COPR</a>.</p> <h1>Future</h1> <p>It is my intention to continue turning <code>mkosi</code> into a tool suitable for:</p> <ol> <li>Testing and debugging projects</li> <li>Building images for secure devices</li> <li>Building portable service images</li> <li>Building images for secure VMs and containers</li> </ol> <p>One of the biggest goals I have for the future is to teach <code>mkosi</code> and <code>systemd</code>/<code>sd-boot</code> native support for A/B IoT style partition setups. The idea is that the combination of <code>systemd</code>, <code>casync</code> and <code>mkosi</code> provides generic building blocks for building secure, auto-updating devices, even though all pieces may be used individually, too.</p> <h1>FAQ</h1> <ol> <li> <p><strong>Why are you reinventing the wheel again? This is exactly like <code>$SOMEOTHERPROJECT</code>!</strong> — Well, to my knowledge there's no tool that integrates this nicely with your project's development tree, and can do <code>dm-verity</code> and UEFI SecureBoot and all that stuff for you. So nope, I don't think this is exactly like <code>$SOMEOTHERPROJECT</code>, thank you very much.</p> </li> <li> <p><strong>What about creating MBR/DOS partition images?</strong> — That's really out of focus for me. This is an exercise in figuring out how generic OSes and devices in the future should be built and an attempt to commoditize OS image building. And no, the future doesn't speak MBR, sorry. That said, I'd be quite interested in adding support for booting on Raspberry Pi, possibly using a hybrid approach, i.e. 
using a GPT disk label, but arranging things in a way that the Raspberry Pi boot protocol (which is built around DOS partition tables) can still work.</p> </li> <li> <p><strong>Is this portable?</strong> — Well, depends what you mean by <em>portable</em>. No, this tool runs on Linux only, and as it uses <code>systemd-nspawn</code> during the build process it doesn't run on non-<code>systemd</code> systems either. But then again, you should be able to create images for any architecture you like with it, but of course if you want the image bootable on bare-metal systems only systems doing UEFI are supported (but <code>systemd-nspawn</code> should still work fine on them).</p> </li> <li> <p><strong>Where can I get this stuff?</strong> — Try <a href="https://github.com/systemd/mkosi">GitHub</a>. And some distributions carry packaged versions, but I think none of them carries the current v3 yet.</p> </li> <li> <p><strong>Is this a systemd project?</strong> — Yes, it's hosted under the <a href="https://github.com/systemd">systemd GitHub umbrella</a>. And yes, during run-time <code>systemd-nspawn</code> in a current version is required. But no, the code-bases are separate otherwise, already because <code>systemd</code> is a C project, and <code>mkosi</code> Python.</p> </li> <li> <p><strong>Requiring systemd 233 is a pretty steep requirement, no?</strong> — Yes, but the feature we need kind of matters (<code>systemd-nspawn</code>'s <code>--overlay=</code> switch), and again, this isn't supposed to be a tool for legacy systems.</p> </li> <li> <p><strong>Can I run the resulting images in LXC or Docker?</strong> — Humm, I am not an LXC nor Docker guy. If you select <code>directory</code> or <code>subvolume</code> as image type, LXC should be able to boot the generated images just fine, but I didn't try. Last time I looked, Docker doesn't permit running proper init systems as PID 1 inside the container, as they define their own run-time without intention to emulate a proper system. 
Hence, no, I don't think it will work, at least not with an unpatched Docker version. That said, again, don't ask me questions about Docker, it's not precisely my area of expertise, and quite frankly I am not a fan. To my knowledge neither LXC nor Docker is able to run containers directly off GPT disk images, hence the various <code>raw_xyz</code> image types are definitely not compatible with either. That means if you want to generate a single raw disk image that can be booted unmodified both in a container and on bare-metal, then <code>systemd-nspawn</code> is the container manager to go for (specifically, its <code>-i</code>/<code>--image=</code> switch).</p> </li> </ol> <h1>Should you care? Is this a tool for you?</h1> <p>Well, that's up to you really.</p> <p>If you hack on some complex project and need a quick way to compile and run your project on a specific current Linux distribution, then <code>mkosi</code> is an excellent way to do that. Simply drop the <code>mkosi.default</code> and <code>mkosi.build</code> files in your <code>git</code> tree and everything will be easy. 
(And of course, as indicated above: if the project you are hacking on happens to be called <code>systemd</code> or <code>casync</code> be aware that those files are already part of the git tree — you can just use them.)</p> <p>If you hack on some embedded or IoT device, then <code>mkosi</code> is a great choice too, as it will make it reasonably easy to generate secure images that are protected against offline modification, by using <code>dm-verity</code> and UEFI SecureBoot.</p> <p>If you are an administrator and need a nice way to build images for a VM or <code>systemd-nspawn</code> container, or a portable service then <code>mkosi</code> is an excellent choice too.</p> <p>If you care about legacy computers, old distributions, non-<code>systemd</code> init systems, old VM managers, Docker, … then no, <code>mkosi</code> is not for you, but there are plenty of well-established alternatives around that cover that nicely.</p> <p>And never forget: <code>mkosi</code> is an Open Source project. We are happy to accept your patches and other contributions.</p> <p>Oh, and one unrelated last thing: don't forget to <a href="https://cfp.all-systems-go.io/en/ASG2017/events/new">submit your talk proposal</a> and/or <a href="https://ti.to/all-systems-go/all-systems-go">buy a ticket</a> for <a href="https://all-systems-go.io/">All Systems Go! 2017 in Berlin</a> — the conference where things like <code>systemd</code>, <code>casync</code> and <code>mkosi</code> are discussed, along with a variety of other Linux userspace projects used for building systems.</p>Lennart PoetteringWed, 28 Jun 2017 00:00:00 +0200tag:0pointer.net,2017-06-28:/blog/mkosi-a-tool-for-generating-os-images.htmlprojectsAll Systems Go! 2017 CfP Openhttps://0pointer.net/blog/all-systems-go-2017-cfp-open.html<p><large><b>The All Systems Go! 2017 Call for Participation is Now Open!</b></large></p> <p>We’d like to invite presentation proposals for <i>All Systems Go! 
2017</i>!</p> <p><a href="https://all-systems-go.io/"><img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/></a></p> <p><i>All Systems Go!</i> is an Open Source community conference focused on the projects and technologies at the foundation of modern Linux systems — specifically low-level user-space technologies. Its goal is to provide a friendly and collaborative gathering place for individuals and communities working to push these technologies forward.</p> <p><i>All Systems Go! 2017</i> takes place in <b>Berlin, Germany</b> on <b>October 21st+22nd</b>.</p> <p><i>All Systems Go!</i> is a 2-day event with 2-3 talks happening in parallel. Full presentation slots are 30-45 minutes in length and lightning talk slots are 5-10 minutes.</p> <p>We are now accepting submissions for presentation proposals. In particular, we are looking for sessions including, but not limited to, the following topics:</p> <ul><li>Low-level container executors and infrastructure</li> <li>IoT and embedded OS infrastructure</li> <li>OS, container, IoT image delivery and updating</li> <li>Building Linux devices and applications</li> <li>Low-level desktop technologies</li> <li>Networking</li> <li>System and service management</li> <li>Tracing and performance measuring</li> <li>IPC and RPC systems</li> <li>Security and Sandboxing</li></ul> <p>While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome too, as long as they have a clear and direct relevance for user-space.</p> <p>Please submit your proposals by <b>September 3rd</b>. Notification of acceptance will be sent out 1-2 weeks later.</p> <p>To submit your proposal now please visit our <a href="https://cfp.all-systems-go.io/en/ASG2017/events/new">CFP submission web site</a>.</p> <p>For further information about All Systems Go! 
visit our <a href="http://all-systems-go.io/">conference web site</a>.</p> <p><i>systemd.conf</i> will not take place this year; <i>All Systems Go!</i> takes its place. <i>All Systems Go!</i> welcomes all projects that contribute to Linux user space, which, of course, includes systemd. Thus, anything you think was appropriate for submission to <i>systemd.conf</i> is also fitting for <i>All Systems Go!</i></p>Lennart PoetteringTue, 20 Jun 2017 00:00:00 +0200tag:0pointer.net,2017-06-20:/blog/all-systems-go-2017-cfp-open.htmlprojectscasync — A tool for distributing file system imageshttps://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html<h1>Introducing casync</h1> <p>In the past months I have been working on a new project: <a href="https://github.com/systemd/casync/"><code>casync</code></a>. <code>casync</code> takes inspiration from the popular <a href="https://rsync.samba.org/"><code>rsync</code></a> file synchronization tool as well as the probably even more popular <a href="https://git-scm.com/"><code>git</code></a> revision control system. It combines the idea of the <code>rsync</code> algorithm with the idea of <code>git</code>-style content-addressable file systems, and creates a new system for efficiently storing and delivering file system images, optimized for high-frequency update cycles over the Internet. 
Its current focus is on delivering IoT, container, VM, application, portable service or OS images, but I hope to extend it later in a generic fashion to become useful for backups and home directory synchronization as well (but more about that later).</p> <p>The basic technological building blocks <code>casync</code> is built from are neither new nor particularly innovative (at least not anymore), however the way <code>casync</code> combines them is different from existing tools, and that's what makes it useful for a variety of use-cases that other tools can't cover that well.</p> <h1>Why?</h1> <p>I created <code>casync</code> after studying how today's popular tools store and deliver file system images. To briefly name a few: Docker has a layered tarball approach, <a href="https://ostree.readthedocs.io/en/latest/">OSTree</a> serves the individual files directly via HTTP and maintains packed deltas to speed up updates, while other systems operate on the block layer and place raw <code>squashfs</code> images (or other archival file systems, such as ISO9660) for download on HTTP shares (in the better cases combined with <a href="http://zsync.moria.org.uk/"><code>zsync</code></a> data).</p> <p>None of these approaches appeared fully convincing to me when used in high-frequency update cycle systems. In such systems, it is important to optimize towards a couple of goals:</p> <ol> <li>Most importantly, make updates cheap traffic-wise (for this most tools use image deltas of some form)</li> <li>Put boundaries on disk space usage on servers (keeping deltas between all version combinations clients might want to run updates between would require keeping an exponentially growing number of deltas on servers)</li> <li>Put boundaries on disk space usage on clients</li> <li>Be friendly to Content Delivery Networks (CDNs), i.e. serve neither too many small nor too many overly large files, and only require the most basic form of HTTP. 
Provide the repository administrator with high-level knobs to tune the average file size delivered.</li> <li>Simplicity to use for users, repository administrators and developers</li> </ol> <p>I don't think any of the tools mentioned above are really good on more than a small subset of these points.</p> <p>Specifically: Docker's layered tarball approach dumps the "delta" question onto the feet of the image creators: the best way to make your image downloads minimal is basing your work on an existing image clients might already have, and inherit its resources, maintaining full history. Here, revision control (a tool for the developer) is intermingled with update management (a concept for optimizing production delivery). As container histories grow individual deltas are likely to stay small, but on the other hand a brand-new deployment usually requires downloading the full history onto the deployment system, even though there's no use for it there, and likely requires substantially more disk space and download sizes.</p> <p>OSTree's serving of individual files is unfriendly to CDNs (as many small files in file trees cause an explosion of HTTP GET requests). To counter that OSTree supports placing pre-calculated delta images between selected revisions on the delivery servers, which means a certain amount of revision management, that leaks into the clients.</p> <p>Delivering direct <code>squashfs</code> (or other file system) images is almost beautifully simple, but of course means every update requires a full download of the newest image, which is both bad for disk usage and generated traffic. Enhancing it with <code>zsync</code> makes this a much better option, as it can reduce generated traffic substantially at very little cost of history/meta-data (no explicit deltas between a large number of versions need to be prepared server side). 
On the other hand server requirements in disk space and functionality (HTTP Range requests) are minus points for the use-case I am interested in.</p> <p>(Note: all the mentioned systems have great properties, and it's not my intention to badmouth them. The only point I am trying to make is that for the use case I care about, file system image delivery with high-frequency update cycles, each system comes with certain drawbacks.)</p> <h1>Security &amp; Reproducibility</h1> <p>Besides the issues pointed out above I wasn't happy with the security and reproducibility properties of these systems. In today's world where security breaches involving hacking and breaking into connected systems happen every day, an image delivery system that cannot make strong guarantees regarding data integrity is out of date. Specifically, the tarball format is famously nondeterministic: the very same file tree can result in any number of different valid serializations depending on the tool used, its version and the underlying OS and file system. Some <code>tar</code> implementations attempt to correct that by guaranteeing that each file tree maps to exactly one valid serialization, but such a property is always only specific to the tool used. I strongly believe that any good update system must guarantee on every single link of the chain that there's only one valid representation of the data to deliver, that can easily be verified.</p> <h1>What casync Is</h1> <p>So much for the background on why I created <code>casync</code>. Now, let's have a look at what <code>casync</code> actually is like, and what it does. 
Here's the brief technical overview:</p> <p>Encoding: Let's take a large linear data stream, split it into variable-sized chunks (the size of each being a function of the chunk's contents), and store these chunks in individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value may be used as the key for retrieving the full chunk data. Let's call this directory a "chunk store". At the same time, generate a "chunk index" file that lists these chunk hash values plus their respective chunk sizes in a simple linear array. The chunking algorithm is supposed to create chunks of variable, but similar size from the data stream, and do so in a way that the same data results in the same chunks even if placed at varying offsets. For more information <a href="https://moinakg.wordpress.com/2013/06/22/high-performance-content-defined-chunking/">see this blog story</a>.</p> <p>Decoding: Let's take the chunk index file, and reassemble the large linear data stream by concatenating the uncompressed chunks retrieved from the chunk store, keyed by the listed chunk hash values.</p> <p>As an extra twist, we introduce a well-defined, reproducible, random-access serialization format for file trees (think: a more modern <code>tar</code>), to permit efficient, stable storage of complete file trees in the system, simply by serializing them and then passing them into the encoding step explained above.</p> <p>Finally, let's put all this on the network: for each image you want to deliver, generate a chunk index file and place it on an HTTP server. Do the same with the chunk store, and share it between the various index files you intend to deliver.</p> <p>Why bother with all of this? Streams with similar contents will result in mostly the same chunk files in the chunk store. This means it is very efficient to store many related versions of a data stream in the same chunk store, thus minimizing disk usage. 
Moreover, when transferring linear data streams, chunks already known on the receiving side can be made use of, thus minimizing network traffic.</p> <p>Why is this different from <code>rsync</code> or OSTree, or similar tools? Well, one major difference between <code>casync</code> and those tools is that we remove file boundaries before chunking things up. This means that small files are lumped together with their siblings and large files are chopped into pieces, which permits us to recognize similarities in files and directories beyond file boundaries, and makes sure our chunk sizes are pretty evenly distributed, without the file boundaries affecting them.</p> <p>The "chunking" algorithm is based on the buzhash rolling hash function. SHA256 is used as strong hash function to generate digests of the chunks. xz is used to compress the individual chunks.</p> <p>Here's a diagram, hopefully explaining a bit how the encoding process works, were it not for my crappy drawing skills:</p> <p><a href="http://0pointer.de/public/casync.png"><img src="http://0pointer.de/public/casync.png" width="800" height="862" alt="Diagram"/></a></p> <p>The diagram shows the encoding process from top to bottom. It starts with a block device or a file tree, which is then serialized and chunked up into variable-sized blocks. The compressed chunks are then placed in the chunk store, while a chunk index file is written listing the chunk hashes in order. (The original SVG of this graphic may be found <a href="http://0pointer.de/public/casync.svg">here</a>.)</p> <h1>Details</h1> <p>Note that <code>casync</code> operates on two different layers, depending on the use-case of the user:</p> <ol> <li> <p>You may use it on the block layer. In this case the raw block data on disk is taken as-is, read directly from the block device, split into chunks as described above, compressed, stored and delivered.</p> </li> <li> <p>You may use it on the file system layer. 
In this case, the file tree serialization format mentioned above comes into play: the file tree is serialized depth-first (much like <code>tar</code> would do it) and then split into chunks, compressed, stored and delivered.</p> </li> </ol> <p>The fact that it may be used on both the block and file system layer opens it up for a variety of different use-cases. In the VM and IoT ecosystems shipping images as block-level serializations is more common, while in the container and application world file-system-level serializations are more typically used.</p> <p>Chunk index files referring to block-layer serializations carry the <code>.caibx</code> suffix, while chunk index files referring to file system serializations carry the <code>.caidx</code> suffix. Note that you may also use <code>casync</code> as direct <code>tar</code> replacement, i.e. without the chunking, just generating the plain linear file tree serialization. Such files carry the <code>.catar</code> suffix. Internally <code>.caibx</code> are identical to <code>.caidx</code> files, the only difference is semantical: <code>.caidx</code> files describe a <code>.catar</code> file, while <code>.caibx</code> files may describe any other blob. Finally, chunk stores are directories carrying the <code>.castr</code> suffix.</p> <h1>Features</h1> <p>Here are a couple of other features <code>casync</code> has:</p> <ol> <li> <p>When downloading a new image you may use <code>casync</code>'s <code>--seed=</code> feature: each block device, file, or directory specified is processed using the same chunking logic described above, and is used as preferred source when putting together the downloaded image locally, avoiding network transfer of it. This of course is useful whenever updating an image: simply specify one or more old versions as seed and only download the chunks that truly changed since then. Note that using seeds requires no history relationship between seed and the new image to download. 
This has major benefits: you can even use it to speed up downloads of relatively foreign and unrelated data. For example, when downloading a container image built using Ubuntu you can use your Fedora host OS tree in <code>/usr</code> as seed, and <code>casync</code> will automatically use whatever it can from that tree, for example timezone and locale data that tends to be identical between distributions. Example: <code>casync extract http://example.com/myimage.caibx --seed=/dev/sda1 /dev/sda2</code>. This will place the block-layer image described by the indicated URL in the <code>/dev/sda2</code> partition, using the existing <code>/dev/sda1</code> data as seeding source. An invocation like this could typically be used by IoT systems with an A/B partition setup. Example 2: <code>casync extract http://example.com/mycontainer-v3.caidx --seed=/srv/container-v1 --seed=/srv/container-v2 /srv/container-v3</code> is very similar but operates on the file system layer, and uses two old container versions to seed the new version.</p> </li> <li> <p>When operating on the file system level, the user has fine-grained control over the meta-data included in the serialization. This is relevant since different use-cases tend to require a different set of saved/restored meta-data. For example, when shipping OS images, file access bits/ACLs and ownership matter, while file modification times hurt. When doing personal backups OTOH file ownership matters little but file modification times are important. Moreover different backing file systems support different feature sets, and storing more information than necessary might make it impossible to validate a tree against an image if the meta-data cannot be replayed in full. Due to this, <code>casync</code> provides a set of <code>--with=</code> and <code>--without=</code> parameters that allow fine-grained control of the data stored in the file tree serialization, including the granularity of modification times and more. 
The precise set of selected meta-data features is also always part of the serialization, so that seeding can work correctly and automatically.</p> </li> <li> <p><code>casync</code> tries to be as accurate as possible when storing file system meta-data. This means that besides the usual baseline of file meta-data (file ownership and access bits), and more advanced features (extended attributes, ACLs, file capabilities) a number of more exotic data is stored as well, including Linux <a href="https://linux.die.net/man/1/chattr">chattr(1)</a> file attributes, as well as <a href="https://en.wikipedia.org/wiki/File_attribute#DOS_and_Windows">FAT file attributes</a> (you may wonder why the latter? — EFI is FAT, and <code>/efi</code> is part of the comprehensive serialization of any host). In the future I intend to extend this further, for example storing <code>btrfs</code> sub-volume information where available. Note that as described above every single type of meta-data may be turned off and on individually, hence if you don't need FAT file bits (and I figure it's pretty likely you don't), then they won't be stored.</p> </li> <li> <p>The user creating <code>.caidx</code> or <code>.caibx</code> files may control the desired average chunk length (before compression) freely, using the <code>--chunk-size=</code> parameter. Smaller chunks increase the number of generated files in the chunk store and increase HTTP GET load on the server, but also ensure that sharing between similar images is improved, as identical patterns in the images stored are more likely to be recognized. By default <code>casync</code> will use a 64K average chunk size. Tweaking this can be particularly useful when adapting the system to specific CDNs, or when delivering compressed disk images such as <code>squashfs</code> (see below).</p> </li> <li> <p>Emphasis is placed on making all invocations reproducible, well-defined and strictly deterministic. 
As mentioned above this is a requirement to reach the intended security guarantees, but is also useful for many other use-cases. For example, the <code>casync digest</code> command may be used to calculate a hash value identifying a specific directory in all desired detail (use <code>--with=</code> and <code>--without=</code> to pick the desired detail). Moreover the <code>casync mtree</code> command may be used to generate a BSD <a href="https://www.freebsd.org/cgi/man.cgi?mtree(5)">mtree(5)</a> compatible manifest of a directory tree, <code>.caidx</code> or <code>.catar</code> file.</p> </li> <li> <p>The file system serialization format is nicely composable. By this I mean that the serialization of a file tree is the concatenation of the serializations of all files and file sub-trees located at the top of the tree, with zero meta-data references from any of these serializations into the others. This property is essential to ensure maximum reuse of chunks when similar trees are serialized.</p> </li> <li> <p>When extracting file trees or disk image files, <code>casync</code> will automatically create <a href="http://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html">reflinks</a> from any specified seeds if the underlying file system supports it (such as <code>btrfs</code>, <code>ocfs</code>, and future <code>xfs</code>). After all, instead of copying the desired data from the seed, we can just tell the file system to link up the relevant blocks. This works both when extracting <code>.caidx</code> and <code>.caibx</code> files — the latter of course only when the extracted disk image is placed in a regular raw image file on disk, rather than directly on a plain block device, as plain block devices do not know the concept of reflinks.</p> </li> <li> <p>Optionally, when extracting file trees, <code>casync</code> can create traditional UNIX hard-links for identical files in specified seeds (<code>--hardlink=yes</code>). 
This works on all UNIX file systems, and can save substantial amounts of disk space. However, this only works for very specific use-cases where disk images are considered read-only after extraction, as any changes made to one tree will propagate to all other trees sharing the same hard-linked files, as that's the nature of hard-links. In this mode, <code>casync</code> exposes OSTree-like behavior, which is built heavily around read-only hard-link trees.</p> </li> <li> <p><code>casync</code> tries to be smart when choosing what to include in file system images. Implicitly, file systems such as procfs and sysfs are excluded from serialization, as they expose API objects, not real files. Moreover, the "nodump" (<code>+d</code>) <a href="https://linux.die.net/man/1/chattr">chattr(1)</a> flag is honored by default, permitting users to mark files to exclude from serialization.</p> </li> <li> <p>When creating and extracting file trees <code>casync</code> may apply an automatic or explicit UID/GID shift. This is particularly useful when transferring container images for use with Linux user name-spacing.</p> </li> <li> <p>In addition to local operation, <code>casync</code> currently supports HTTP, HTTPS, FTP and ssh natively for downloading chunk index files and chunks (the ssh mode requires installing <code>casync</code> on the remote host, though an sftp mode not requiring that should be easy to add). When creating index files or chunks, only ssh is supported as remote back-end.</p> </li> <li> <p>When operating on block-layer images, you may expose locally or remotely stored images as local block devices. Example: <code>casync mkdev http://example.com/myimage.caibx</code> exposes the disk image described by the indicated URL as local block device in <code>/dev</code>, which you then may use the usual block device tools on, such as mount or fdisk (only read-only though). 
Chunks are downloaded on access with high priority, and at low priority when idle in the background. Note that in this mode, <code>casync</code> also plays a role similar to "dm-verity", as all blocks are validated against the strong digests in the chunk index file before passing them on to the kernel's block layer. This feature is implemented through Linux' NBD kernel facility.</p> </li> <li> <p>Similar, when operating on file-system-layer images, you may mount locally or remotely stored images as regular file systems. Example: <code>casync mount http://example.com/mytree.caidx /srv/mytree</code> mounts the file tree image described by the indicated URL as a local directory <code>/srv/mytree</code>. This feature is implemented through Linux' FUSE kernel facility. Note that special care is taken that the images exposed this way can be packed up again with <code>casync make</code> and are guaranteed to return the bit-by-bit exact same serialization again that it was mounted from. No data is lost or changed while passing things through FUSE (OK, strictly speaking this is a lie, we do lose ACLs, but that's hopefully just a temporary gap to be fixed soon).</p> </li> <li> <p>In IoT A/B fixed size partition setups the file systems placed in the two partitions are usually much smaller than the partition size, in order to keep some room for later, larger updates. <code>casync</code> is able to analyze the super-block of a number of common file systems in order to determine the actual size of a file system stored on a block device, so that writing a file system to such a partition and reading it back again will result in reproducible data. 
Moreover this speeds up the seeding process, as there's little point in seeding the white-space after the file system within the partition.</p> </li> </ol> <h1>Example Command Lines</h1> <p>Here's how to use <code>casync</code>, explained with a few examples:</p> <div class="highlight"><pre><span></span><code>$ casync make foobar.caidx /some/directory </code></pre></div> <p>This will create a chunk index file <code>foobar.caidx</code> in the local directory, and populate the chunk store directory <code>default.castr</code> located next to it with the chunks of the serialization (you can change the name for the store directory with <code>--store=</code> if you like). This command operates on the file-system level. A similar command operating on the block level:</p> <div class="highlight"><pre><span></span><code>$ casync make foobar.caibx /dev/sda1 </code></pre></div> <p>This command creates a chunk index file <code>foobar.caibx</code> in the local directory describing the current contents of the <code>/dev/sda1</code> block device, and populates <code>default.castr</code> in the same way as above. Note that you may as well read a raw disk image from a file instead of a block device:</p> <div class="highlight"><pre><span></span><code>$ casync make foobar.caibx myimage.raw </code></pre></div> <p>To reconstruct the original file tree from the <code>.caidx</code> file and the chunk store of the first command, use:</p> <div class="highlight"><pre><span></span><code>$ casync extract foobar.caidx /some/other/directory </code></pre></div> <p>And similar for the block-layer version:</p> <div class="highlight"><pre><span></span><code>$ casync extract foobar.caibx /dev/sdb1 </code></pre></div> <p>or, to extract the block-layer version into a raw disk image:</p> <div class="highlight"><pre><span></span><code>$ casync extract foobar.caibx myotherimage.raw </code></pre></div> <p>The above are the most basic commands, operating on local data only. 
Now let's make this more interesting, and reference remote resources:</p> <div class="highlight"><pre><span></span><code>$ casync extract http://example.com/images/foobar.caidx /some/other/directory </code></pre></div> <p>This extracts the specified <code>.caidx</code> onto a local directory. This of course assumes that <code>foobar.caidx</code> was uploaded to the HTTP server in the first place, along with the chunk store. You can use any command you like to accomplish that, for example <code>scp</code> or <code>rsync</code>. Alternatively, you can let <code>casync</code> do this directly when generating the chunk index:</p> <div class="highlight"><pre><span></span><code>$ casync make ssh.example.com:images/foobar.caidx /some/directory </code></pre></div> <p>This will use ssh to connect to the <code>ssh.example.com</code> server, and then places the <code>.caidx</code> file and the chunks on it. Note that this mode of operation is "smart": this scheme will only upload chunks currently missing on the server side, and not re-transmit what already is available.</p> <p>Note that you can always configure the precise path or URL of the chunk store via the <code>--store=</code> option. 
If you do not do that, then the store path is automatically derived from the path or URL: the last component of the path or URL is replaced by <code>default.castr</code>.</p> <p>Of course, when extracting <code>.caidx</code> or <code>.caibx</code> files from remote sources, using a local seed is advisable:</p> <div class="highlight"><pre><span></span><code>$ casync extract http://example.com/images/foobar.caidx --seed<span class="o">=</span>/some/existing/directory /some/other/directory </code></pre></div> <p>Or on the block layer:</p> <div class="highlight"><pre><span></span><code>$ casync extract http://example.com/images/foobar.caibx --seed<span class="o">=</span>/dev/sda1 /dev/sdb2 </code></pre></div> <p>When creating chunk indexes on the file system layer <code>casync</code> will by default store meta-data as accurately as possible. Let's create a chunk index with reduced meta-data:</p> <div class="highlight"><pre><span></span><code>$ casync make foobar.caidx --with<span class="o">=</span>sec-time --with<span class="o">=</span>symlinks --with<span class="o">=</span>read-only /some/dir </code></pre></div> <p>This command will create a chunk index for a file tree serialization that has three features above the absolute baseline supported: 1s granularity time-stamps, symbolic links and a single read-only bit. 
In this mode, all the other meta-data bits are not stored, including nanosecond time-stamps, full UNIX permission bits, file ownership or even ACLs or extended attributes.</p> <p>Now let's make a <code>.caidx</code> file available locally as a mounted file system, without extracting it:</p> <div class="highlight"><pre><span></span><code>$ casync mount http://example.com/images/foobar.caidx /mnt/foobar </code></pre></div> <p>And similar, let's make a <code>.caibx</code> file available locally as a block device:</p> <div class="highlight"><pre><span></span><code>$ casync mkdev http://example.com/images/foobar.caibx </code></pre></div> <p>This will create a block device in <code>/dev</code> and print the used device node path to STDOUT.</p> <p>As mentioned, <code>casync</code> is big on reproducibility. Let's make use of that to calculate a digest identifying a very specific version of a file tree:</p> <div class="highlight"><pre><span></span><code>$ casync digest . </code></pre></div> <p>This digest will include all meta-data bits <code>casync</code> and the underlying file system know about. Usually, to make this useful you want to configure exactly what meta-data to include:</p> <div class="highlight"><pre><span></span><code>$ casync digest --with<span class="o">=</span>unix . </code></pre></div> <p>This makes use of the <code>--with=unix</code> shortcut for selecting meta-data fields. Specifying <code>--with=unix</code> selects all meta-data that traditional UNIX file systems support. 
It is a shortcut for writing out: <code>--with=16bit-uids --with=permissions --with=sec-time --with=symlinks --with=device-nodes --with=fifos --with=sockets</code>.</p> <p>Note that when calculating digests or creating chunk indexes you may also use the negative <code>--without=</code> option to remove specific features but start from the most precise:</p> <div class="highlight"><pre><span></span><code>$ casync digest --without<span class="o">=</span>flag-immutable </code></pre></div> <p>This generates a digest with the most accurate meta-data, but leaves one feature out: <a href="https://linux.die.net/man/1/chattr">chattr(1)</a>'s immutable (<code>+i</code>) file flag.</p> <p>To list the contents of a <code>.caidx</code> file use a command like the following:</p> <div class="highlight"><pre><span></span><code>$ casync list http://example.com/images/foobar.caidx </code></pre></div> <p>or</p> <div class="highlight"><pre><span></span><code>$ casync mtree http://example.com/images/foobar.caidx </code></pre></div> <p>The former command will generate a brief list of files and directories, not too different from <code>tar t</code> or <code>ls -al</code> in its output. The latter command will generate a BSD <a href="https://www.freebsd.org/cgi/man.cgi?mtree(5)">mtree(5)</a> compatible manifest. Note that <code>casync</code> actually stores substantially more file meta-data than <code>mtree</code> files can express, though.</p> <h1>What casync isn't</h1> <ol> <li> <p><code>casync</code> is not an attempt to minimize serialization and downloaded deltas to the extreme. Instead, the tool is supposed to find a good middle ground, that is good on traffic and disk space, but not at the price of convenience or requiring explicit revision control. 
If you care about updates that are absolutely minimal, there are binary delta systems around that might be an option for you, such as <a href="https://www.chromium.org/developers/design-documents/software-updates-courgette">Google's Courgette</a>.</p> </li> <li> <p><code>casync</code> is not a replacement for <code>rsync</code>, or <code>git</code> or <code>zsync</code> or anything like that. They have very different use-cases and semantics. For example, <code>rsync</code> permits you to directly synchronize two file trees remotely. <code>casync</code> just cannot do that, and it is unlikely it ever will.</p> </li> </ol> <h1>Where next?</h1> <p><code>casync</code> is supposed to be a generic synchronization tool. Its primary focus for now is delivery of OS images, but I'd like to make it useful for a couple of other use-cases, too. Specifically:</p> <ol> <li> <p>To make the tool useful for backups, encryption is missing. I have pretty concrete plans how to add that. When implemented, the tool might become an alternative to <a href="https://restic.github.io/"><code>restic</code></a>, <a href="https://borgbackup.readthedocs.io/">BorgBackup</a> or <a href="https://www.tarsnap.com/"><code>tarsnap</code></a>.</p> </li> <li> <p>Right now, if you want to deploy <code>casync</code> in real-life, you still need to validate the downloaded <code>.caidx</code> or <code>.caibx</code> file yourself, for example with some <code>gpg</code> signature. It is my intention to integrate with <code>gpg</code> in a minimal way so that signing and verifying chunk index files is done automatically.</p> </li> <li> <p>In the longer run, I'd like to build an automatic synchronizer for <code>$HOME</code> between systems from this. Each <code>$HOME</code> instance would be stored automatically in regular intervals in the cloud using casync, and conflicts would be resolved locally.</p> </li> <li> <p><code>casync</code> is written in a shared library style, but it is not yet built as one. 
Specifically this means that almost all of <code>casync</code>'s functionality is supposed to be available as C API soon, and applications can process <code>casync</code> files on every level. It is my intention to make this library useful enough so that it will be easy to write a module for GNOME's <code>gvfs</code> subsystem in order to make remote or local <code>.caidx</code> files directly available to applications (as an alternative to <code>casync mount</code>). In fact the idea is to make this all flexible enough that even the remoting back-ends can be replaced easily, for example to replace <code>casync</code>'s default HTTP/HTTPS back-ends built on CURL with GNOME's own HTTP implementation, in order to share cookies, certificates, … There's also an alternative method to integrate with <code>casync</code> in place already: simply invoke <code>casync</code> as a sub-process. <code>casync</code> will inform you about a certain set of state changes using a mechanism compatible with <a href="https://www.freedesktop.org/software/systemd/man/sd_notify.html">sd_notify(3)</a>. In future it will also propagate progress data this way and more.</p> </li> <li> <p>I intend to add a new seeding back-end that sources chunks from the local network. After downloading the new <code>.caidx</code> file off the Internet <code>casync</code> would then search for the listed chunks on the local network first before retrieving them from the Internet. This should speed things up on all installations that have multiple similar systems deployed in the same network.</p> </li> </ol> <p>Further plans are listed tersely in the <a href="https://github.com/systemd/casync/blob/master/TODO">TODO</a> file.</p> <h1>FAQ:</h1> <ol> <li> <p><strong><em>Is this a systemd project?</em></strong> — <code>casync</code> is hosted under the github <a href="https://github.com/systemd/systemd">systemd</a> umbrella, and the projects share the same coding style. 
However, the code-bases are distinct and without interdependencies, and <code>casync</code> works fine both on systemd systems and systems without it.</p> </li> <li> <p><strong><em>Is <code>casync</code> portable?</em></strong> — At the moment: no. I only run Linux and that's what I code for. That said, I am open to accepting portability patches (unlike for systemd, which doesn't really make sense on non-Linux systems), as long as they don't interfere too much with the way <code>casync</code> works. Specifically this means that I am not too enthusiastic about merging portability patches for OSes lacking the <a href="http://man7.org/linux/man-pages/man2/open.2.html">openat(2)</a> family of APIs.</p> </li> <li> <p><strong><em>Does <code>casync</code> require reflink-capable file systems to work, such as <code>btrfs</code>?</em></strong> — No it doesn't. The reflink magic in <code>casync</code> is employed when the file system permits it, and it's good to have it, but it's not a requirement, and <code>casync</code> will implicitly fall back to copying when it isn't available. Note that <code>casync</code> supports a number of file system features on a variety of file systems that aren't available everywhere, for example FAT's system/hidden file flags or <code>xfs</code>'s <code>projinherit</code> file flag.</p> </li> <li> <p><strong><em>Is <code>casync</code> stable?</em></strong> — I just tagged the first, initial release. While I have been working on it for quite some time and it is quite featureful, this is the first time I advertise it publicly, and it hence received very little testing outside of its own test suite. I am also not fully ready to commit to the stability of the current serialization or chunk index format. I don't see any breakages coming for it though. <code>casync</code> is pretty light on documentation right now, and does not even have a man page. 
I also intend to correct that soon.</p> </li> <li> <p><strong><em>Are the <code>.caidx</code>/<code>.caibx</code> and <code>.catar</code> file formats open and documented?</em></strong> — <code>casync</code> is Open Source, so if you want to know the precise format, have a look at the sources for now. It's definitely my intention to add comprehensive docs for both formats however. Don't forget this is just the initial version right now.</p> </li> <li> <p><strong><em><code>casync</code> is just like <code>$SOMEOTHERTOOL</code>! Why are you reinventing the wheel (again)?</em></strong> — Well, because <code>casync</code> <em>isn't</em> "just like" some other tool. I am pretty sure I did my homework, and that there is no tool just like <code>casync</code> right now. The tools coming closest are probably <code>rsync</code>, <code>zsync</code>, <code>tarsnap</code>, <code>restic</code>, but they are quite different beasts each.</p> </li> <li> <p><strong><em>Why did you invent your own serialization format for file trees? Why don't you just use <code>tar</code>?</em></strong> — That's a good question, and other systems — most prominently <code>tarsnap</code> — do that. However, as mentioned above <code>tar</code> doesn't enforce reproducibility. It also doesn't really do random access: if you want to access some specific file you need to read every single byte stored before it in the <code>tar</code> archive to find it, which is of course very expensive. The serialization <code>casync</code> implements places a focus on reproducibility, random access, and meta-data control. Much like traditional <code>tar</code> it can still be generated and extracted in a stream fashion though.</p> </li> <li> <p><strong><em>Does <code>casync</code> save/restore SELinux/SMACK file labels?</em></strong> — At the moment not. 
That's not because I wouldn't want it to, but simply because I am not a guru of either of these systems, and didn't want to implement something I do not fully grok nor can test. If you look at the sources you'll find that there are already some definitions in place that keep room for them though. I'd be delighted to accept a patch implementing this fully.</p> </li> <li> <p><strong><em>What about delivering <code>squashfs</code> images? How well does chunking work on compressed serializations?</em></strong> – That's a very good point! Usually, if you apply a chunking algorithm to a compressed data stream (let's say a <code>tar.gz</code> file), then changing a single bit at the front will propagate into the entire remainder of the file, so that minimal changes will explode into major changes. Thankfully this doesn't apply that strictly to <code>squashfs</code> images, as it provides random access to files and directories and thus breaks up the compression streams in regular intervals to make seeking easy. This fact is beneficial for systems employing chunking, such as <code>casync</code>, as this means single bit changes might affect their vicinity but will not explode in an unbounded fashion. In order to achieve best results when delivering <code>squashfs</code> images through <code>casync</code> the block sizes of <code>squashfs</code> and the chunk sizes of <code>casync</code> should be matched up (using <code>casync</code>'s <code>--chunk-size=</code> option). How precisely to choose both values is left as a research subject for the user, for now.</p> </li> <li> <p><strong><em>What does the name <code>casync</code> mean?</em></strong> – It's a synchronizing tool, hence the <code>-sync</code> suffix, following <code>rsync</code>'s naming. It makes use of the content-addressable concept of <code>git</code> hence the <code>ca-</code> prefix.</p> </li> <li> <p><strong><em>Where can I get this stuff? Is it already packaged? 
</em></strong> – Check out the sources on <a href="https://github.com/systemd/casync/">GitHub</a>. I just tagged the <a href="https://github.com/systemd/casync/releases/tag/v1">first version</a>. Martin Pitt has <a href="https://plus.google.com/+MartinPitti/posts/8YMp3xNh1q7">packaged <code>casync</code> for Ubuntu</a>. There is also an <a href="https://aur.archlinux.org/packages/casync-git/">ArchLinux package</a>. Zbigniew Jędrzejewski-Szmek has prepared a <a href="https://apps.fedoraproject.org/packages/casync">Fedora RPM</a> that hopefully will soon be included in the distribution.</p> </li> </ol> <h1>Should you care? Is this a tool for you?</h1> <p>Well, that's up to you really. If you are involved with projects that need to deliver IoT, VM, container, application or OS images, then maybe this is a great tool for you — but other options exist, some of which are linked above.</p> <p>Note that <code>casync</code> is an Open Source project: if it doesn't do exactly what you need, prepare a patch that adds what you need, and we'll consider it.</p> <p>If you are interested in the project and would like to talk about this in person, I'll be presenting <code>casync</code> soon at <a href="https://www.meetup.com/linux-technologies-berlin/events/240909087/">Kinvolk's Linux Technologies Meetup</a> in Berlin, Germany. You are invited. 
I also intend to talk about it at <a href="https://all-systems-go.io/">All Systems Go!</a>, also in Berlin.</p>Lennart PoetteringTue, 20 Jun 2017 00:00:00 +0200tag:0pointer.net,2017-06-20:/blog/casync-a-tool-for-distributing-file-system-images.htmlprojectsAvoiding CVE-2016-8655 with systemdhttps://0pointer.net/blog/avoiding-cve-2016-8655-with-systemd.html<h1>Avoiding CVE-2016-8655 with systemd</h1> <p>Just a quick note: on recent versions of <a href="https://www.freedesktop.org/wiki/Software/systemd/">systemd</a> it is relatively easy to block the vulnerability described in <a href="http://seclists.org/oss-sec/2016/q4/607">CVE-2016-8655</a> for individual services.</p> <p>Since systemd release v211 there's an option <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RestrictAddressFamilies="><code>RestrictAddressFamilies=</code></a> for service unit files which takes away the right to create sockets of specific address families for processes of the service. In your unit file, add <code>RestrictAddressFamilies=~AF_PACKET</code> to the <code>[Service]</code> section to make <code>AF_PACKET</code> unavailable to it (i.e. a blacklist), which is sufficient to close the attack path. Safer of course is a whitelist of address families which you can define by dropping the <code>~</code> character from the assignment. 
Here's a trivial example:</p> <div class="highlight"><pre><span></span><code><span class="err">…</span><span class="w"></span> <span class="o">[</span><span class="n">Service</span><span class="o">]</span><span class="w"></span> <span class="n">ExecStart</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">mydaemon</span><span class="w"></span> <span class="n">RestrictAddressFamilies</span><span class="o">=</span><span class="n">AF_INET</span><span class="w"> </span><span class="n">AF_INET6</span><span class="w"> </span><span class="n">AF_UNIX</span><span class="w"></span> <span class="err">…</span><span class="w"></span> </code></pre></div> <p>This restricts access to socket families, so that the service may access only <code>AF_INET</code>, <code>AF_INET6</code> or <code>AF_UNIX</code> sockets, which is usually the right, minimal set for most system daemons. (<code>AF_INET</code> is the low-level name for the IPv4 address family, <code>AF_INET6</code> for the IPv6 address family, and <code>AF_UNIX</code> for local UNIX socket IPC).</p> <p><a href="https://github.com/systemd/systemd/blob/8e458bfe4e2aa36c939db62561b2a59206d78577/NEWS#L45">Starting with systemd v232</a> we added <code>RestrictAddressFamilies=</code> to all of systemd's own unit files, always with the minimal set of socket address families appropriate.</p> <p>With the upcoming v233 release we'll provide a second method for blocking this vulnerability. Using <a href="https://github.com/systemd/systemd/pull/4536"><code>RestrictNamespaces=</code></a> it is possible to limit which types of Linux namespaces a service may get access to. Use <code>RestrictNamespaces=yes</code> to prohibit access to any kind of namespace, or set <code>RestrictNamespaces=net ipc</code> (or similar) to restrict access to a specific set (in this case: network and IPC namespaces). 
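</p> <p>As a minimal sketch of how the two mitigations discussed here might be combined in one unit, consider the fragment below. It reuses the placeholder daemon path from the example above, which is purely illustrative, and it assumes a systemd version new enough to know <code>RestrictNamespaces=</code> (v233 or later, per the text). Only the two <code>Restrict…=</code> assignments are taken from this post; whether they suit a given daemon depends on what that daemon actually needs.</p> <div class="highlight"><pre><span></span><code>[Service]
# Placeholder binary, for illustration only
ExecStart=/usr/bin/mydaemon
# Blacklist variant: take away AF_PACKET sockets, leave all other families alone
RestrictAddressFamilies=~AF_PACKET
# Prohibit access to any kind of namespace
RestrictNamespaces=yes
</code></pre></div> <p>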
Given that user namespaces have been a major source of security vulnerabilities in the past months it's probably a good idea to block namespaces on all services which don't need them (which is probably most of them).</p> <p>Of course, ideally, distributions such as Fedora, as well as upstream developers would turn on the various sandboxing settings systemd provides, such as these, by default, since they know best which kind of address families or namespaces a specific daemon needs.</p>Lennart PoetteringWed, 07 Dec 2016 00:00:00 +0100tag:0pointer.net,2016-12-07:/blog/avoiding-cve-2016-8655-with-systemd.htmlprojectssystemd.conf 2016 Over Nowhttps://0pointer.net/blog/systemdconf-2016-over-now.html<h1>systemd.conf 2016 is Over Now!</h1> <p>A few days ago <a href="https://systemd.io/">systemd.conf 2016</a> ended, our second conference of this kind. I personally enjoyed this conference a lot: the talks, the atmosphere, the audience, the organization, the location, they all were excellent!</p> <p>I'd like to take the opportunity to thank everybody involved. In particular I'd like to thank <em>Chris</em>, <em>Daniel</em>, <em>Sandra</em> and <em>Henrike</em> for organizing the conference, your work was stellar!</p> <p>I'd also like to thank our sponsors, without which the conference couldn't take place like this, of course. In particular I'd like to thank our gold sponsor, <strong>Red Hat</strong>, our organizing sponsor <strong>Kinvolk</strong>, as well as our silver sponsors <strong>CoreOS</strong> and <strong>Facebook</strong>. I'd also like to thank our bronze sponsors <strong>Collabora</strong>, <strong>OpenSUSE</strong>, <strong>Pantheon</strong>, <strong>Pengutronix</strong>, our supporting sponsor <strong>Codethink</strong> and last but not least our media sponsor <strong>Linux Magazin</strong>. 
Thank you all!</p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/redhat.png" width="300" height="97"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/kinvolk_logo.png" width="300" height="187"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/coreos.png" width="300" height="116"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/facebook-logo.png" width="300" height="113"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/collabora.png" width="300" height="169"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/opensuse-logo.png" width="300" height="190"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/pantheon.png" width="300" height="106"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/pengutronix.png" width="300" height="84"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/codethink-logo.png" width="300" height="88"></p> <p><img src="https://conf.systemd.io/media/imgs/sponsors/linux-magazin.png" width="300" height="126"></p> <p>I'd also like to thank the <a href="https://c3voc.de/">Video Operation Center ("VOC")</a> for their amazing work on live-streaming the conference and making all talks available on YouTube. It's amazing how efficient the VOC is, it's simply stunning! Thank you guys!</p> <p>In case you missed this year's iteration of the conference, please have a look at our <strong><a href="https://www.youtube.com/channel/UCvq_RgZp3kljp9X8Io9Z1DA">YouTube Channel</a></strong>. You'll find all of this year's talks there, as well as the ones from last year. (For example, my welcome talk is available <a href="https://www.youtube.com/watch?v=DUUbFGNZ1vI">here</a>). 
Enjoy!</p> <p>We hope to see you again next year, for systemd.conf 2017 in Berlin!</p>Lennart PoetteringWed, 05 Oct 2016 00:00:00 +0200tag:0pointer.net,2016-10-05:/blog/systemdconf-2016-over-now.htmlprojectssystemd.conf 2016 Workshop Tickets Availablehttps://0pointer.net/blog/systemdconf-2016-workshop-tickets-available.html<h1>Tickets for systemd 2016 Workshop day still available!</h1> <p>We still have a number of tickets for the workshop day of <a href="https://conf.systemd.io/">systemd.conf 2016</a> available. If you are a newcomer to systemd, and would like to learn about various systemd facilities, or if you already know your way around, but would like to know more: this is the best chance to do so. The workshop day is the 28th of September, one day before the main conference, at the betahaus in Berlin, Germany. The schedule for the day is available <a href="https://cfp.systemd.io/en/systemdconf_2016/public/schedule/0">here</a>. There are five interesting, extensive sessions, run by the systemd hackers themselves. Who better to learn systemd from, than the folks who wrote it?</p> <p>Note that the workshop day and the main conference days require different tickets. 
(Also note: there are still a few tickets available for the main conference!).</p> <p><a href="https://ti.to/systemdconf/systemdconf-2016">Buy a ticket here.</a></p> <p>See you in Berlin!</p>Lennart PoetteringSun, 18 Sep 2016 00:00:00 +0200tag:0pointer.net,2016-09-18:/blog/systemdconf-2016-workshop-tickets-available.htmlprojectsPreliminary systemd.conf 2016 Schedulehttps://0pointer.net/blog/preliminary-systemdconf-2016-now-available.html<h1>A Preliminary systemd.conf 2016 Schedule is Now Available!</h1> <p>We have just published a first, preliminary version of the <a href="https://cfp.systemd.io/en/systemdconf_2016/public/schedule/1">systemd.conf 2016 schedule</a>. There is a small number of white slots in the schedule still, because we're missing confirmation from a small number of presenters. 
The missing talks will be added in as soon as they are confirmed.</p> <p>The schedule consists of 5 workshops by high-profile speakers during the workshop day, 22 exciting talks during the main conference days, followed by one full day of hackfests.</p> <p>Please sign up for the conference soon! Only a limited number of tickets are available, hence make sure to secure yours quickly before they run out! (Last year we sold out.) <a href="https://ti.to/systemdconf/systemdconf-2016">Please sign up here for the conference!</a></p>Lennart PoetteringTue, 16 Aug 2016 00:00:00 +0200tag:0pointer.net,2016-08-16:/blog/preliminary-systemdconf-2016-now-available.htmlprojectsFINAL REMINDER! systemd.conf 2016 CfP Ends on Monday!https://0pointer.net/blog/final-reminder-systemdconf-2016-cfp-ends-on-monday.html<p>Please note that the <a href="https://conf.systemd.io/">systemd.conf 2016</a> Call for Participation ends on Monday, on <strong>Aug. 1st</strong>! Please send in your talk proposal by then! We’ve already got a good number of excellent submissions, but we are very interested in yours, too!</p> <p><a href="http://systemd.io/"><img src="http://0pointer.de/public/systemdconf2016.png" width="750" height="349" border="0"></a></p> <p>We are looking for talks on all facets of systemd: deployment, maintenance, administration, development. Regardless of whether you use it in the cloud, on embedded, on IoT, on the desktop, on mobile, in a container or on the server: we are interested in your submissions!</p> <p>In addition to proposals for talks for the main conference, we are looking for proposals for <strong>workshop sessions</strong> held during our Workshop Day (the first day of the conference). The workshop format consists of a day of 2-3h training sessions, that may cover any systemd-related topic you'd like. We are both interested in submissions from the developer community as well as submissions from organizations making use of systemd! 
Introductory workshop sessions are particularly welcome, as the Workshop Day is intended to open up our conference to newcomers and people who aren't systemd gurus yet, but would like to become more fluent.</p> <p>For further details on the submissions we are looking for and the CfP process, please consult the <a href="https://cfp.systemd.io/en/systemdconf_2016/cfp/session/new">CfP page</a> and submit your proposal using the provided form!</p> <p><strong>ALSO:</strong> Please sign up for the conference soon! Only a <strong>limited</strong> number of tickets are available, hence make sure to secure yours quickly before they run out! (Last year we sold out.) <a href="https://ti.to/systemdconf/systemdconf-2016">Please sign up here for the conference!</a></p> <p><strong>AND OF COURSE:</strong> We are also looking for more sponsors for systemd.conf! If you are working on systemd-related projects, or make use of it in your company, <a href="https://conf.systemd.io/files/systemdconf2016SponsorshipProspectus.pdf">please consider <strong>becoming a sponsor</strong> of systemd.conf 2016</a>! Without our sponsors we couldn't organize systemd.conf 2016!</p> <p>Thank you very much, and see you in Berlin!</p>Lennart PoetteringThu, 28 Jul 2016 00:00:00 +0200tag:0pointer.net,2016-07-28:/blog/final-reminder-systemdconf-2016-cfp-ends-on-monday.htmlprojectsREMINDER! systemd.conf 2016 CfP Ends in Two Weeks!https://0pointer.net/blog/reminder-systemdconf-2016-cfp-ends-in-two-weeks.html<p>Please note that the <a href="https://conf.systemd.io/">systemd.conf 2016</a> Call for Participation ends in less than two weeks, on <strong>Aug. 1st</strong>! Please send in your talk proposal by then! We’ve already got a good number of excellent submissions, but we are interested in yours even more!</p> <p>We are looking for talks on all facets of systemd: deployment, maintenance, administration, development. 
Regardless of whether you use it in the cloud, on embedded, on IoT, on the desktop, on mobile, in a container or on the server: we are interested in your submissions!</p> <p>In addition to proposals for talks for the main conference, we are looking for proposals for <strong>workshop sessions</strong> held during our Workshop Day (the first day of the conference). The workshop format consists of a day of 2-3h training sessions, that may cover any systemd-related topic you'd like. We are both interested in submissions from the developer community as well as submissions from organizations making use of systemd! Introductory workshop sessions are particularly welcome, as the Workshop Day is intended to open up our conference to newcomers and people who aren't systemd gurus yet, but would like to become more fluent.</p> <p>For further details on the submissions we are looking for and the CfP process, please consult the <a href="https://cfp.systemd.io/en/systemdconf_2016/cfp/session/new">CfP page</a> and submit your proposal using the provided form!</p> <p>And keep in mind:</p> <p><strong>REMINDER:</strong> Please sign up for the conference soon! Only a <strong>limited</strong> number of tickets are available, hence make sure to secure yours quickly before they run out! (Last year we sold out.) <a href="https://ti.to/systemdconf/systemdconf-2016">Please sign up here for the conference!</a></p> <p><strong>AND OF COURSE:</strong> We are also looking for more sponsors for systemd.conf! If you are working on systemd-related projects, or make use of it in your company, <a href="https://conf.systemd.io/files/systemdconf2016SponsorshipProspectus.pdf">please consider <strong>becoming a sponsor</strong> of systemd.conf 2016</a>! 
Without our sponsors we couldn't organize systemd.conf 2016!</p> <p>Thank you very much, and see you in Berlin!</p>Lennart PoetteringTue, 19 Jul 2016 00:00:00 +0200tag:0pointer.net,2016-07-19:/blog/reminder-systemdconf-2016-cfp-ends-in-two-weeks.htmlprojectsCfP is now openhttps://0pointer.net/blog/cfp-is-now-open.html<h1>The systemd.conf 2016 Call for Participation is Now Open!</h1> <p>We’d like to invite presentation and workshop proposals for <a href="https://systemd.io/">systemd.conf 2016</a>!</p> <p>The conference will consist of three parts:</p> <ul> <li>One day of <b>workshops</b>, consisting of in-depth (2-3hr) training and learning-by-doing sessions (Sept. 28th)</li> <li>Two days of regular <b>talks</b> (Sept. 29th-30th)</li> <li>One day of <b>hackfest</b> (Oct. 1st)</li> </ul> <p>We are now accepting submissions for the first three days: proposals for workshops, training sessions and regular talks. In particular, we are looking for sessions including, but not limited to, the following topics:</p> <ul> <li>Use Cases: systemd in today’s and tomorrow’s <b>devices</b> and <b>applications</b></li> <li>systemd and <b>containers</b>, in the cloud and on <b>servers</b></li> <li>systemd in <b>distributions</b></li> <li><b>Embedded</b> systemd and in <b>IoT</b></li> <li>systemd on the <b>desktop</b></li> <li><b>Networking</b> with systemd</li> <li>… and everything else related to <a href="https://www.freedesktop.org/wiki/Software/systemd/">systemd</a></li> </ul> <p>Please submit your proposals by <strong>August 1st, 2016</strong>. 
Notification of acceptance will be sent out 1-2 weeks later.</p> <p>If submitting a workshop proposal, please contact <a href="mailto:info@systemd.io">the organizers</a> for more details.</p> <p>To submit a talk, please visit <a href="https://cfp.systemd.io/en/systemdconf_2016/cfp/session/new">our CfP submission page</a>.</p> <p>For further information on systemd.conf 2016, please visit <a href="https://systemd.io/">our conference web site</a>.</p>Lennart PoetteringThu, 12 May 2016 00:00:00 +0200tag:0pointer.net,2016-05-12:/blog/cfp-is-now-open.htmlprojectsAnnouncing systemd.conf 2016https://0pointer.net/blog/announcing-systemdconf-2016.html<h1>Announcing systemd.conf 2016</h1> <p>We are happy to announce the 2016 installment of systemd.conf, the conference of the systemd project!</p> <p>After our successful first conference in 2015, we’d like to hold the event a second time in 2016. The conference will take place from <strong>September 28th</strong> until <strong>October 1st</strong>, 2016 at <strong>betahaus</strong> in <strong>Berlin, Germany</strong>. The event is a few days before LinuxCon Europe, which is also held in Berlin this year. This year, the conference will consist of two days of presentations, a one-day hackfest and one day of hands-on training sessions.</p> <p>The website is online now, please visit <a href="https://conf.systemd.io">https://conf.systemd.io/</a>.</p> <p>Tickets at early-bird prices are available already. Purchase them at <a href="https://ti.to/systemdconf/systemdconf-2016">https://ti.to/systemdconf/systemdconf-2016</a>.</p> <p>The Call for Presentations will open soon, and we are looking forward to your submissions! A separate announcement will be published as soon as the CfP is open.</p> <p>systemd.conf 2016 is organized jointly by the <strong>systemd community</strong> and <strong>kinvolk.io</strong>.</p> <p>We are looking for sponsors! 
We’ve got early commitments from some of last year’s sponsors: <strong>Collabora</strong>, <strong>Pengutronix</strong> &amp; <strong>Red Hat</strong>. Please see the web site for details about how your company may become a sponsor, too.</p> <p>If you have any questions, please contact us at <a href="mailto:info@systemd.io">info@systemd.io</a>.</p>Lennart PoetteringMon, 04 Apr 2016 00:00:00 +0200tag:0pointer.net,2016-04-04:/blog/announcing-systemdconf-2016.htmlprojectsIntroducing sd-eventhttps://0pointer.net/blog/introducing-sd-event.html<h1>The Event Loop API of libsystemd</h1> <p>When we began working on <a href="https://wiki.freedesktop.org/www/Software/systemd/">systemd</a> we built it around a hand-written ad-hoc event loop, wrapping <a href="http://man7.org/linux/man-pages/man7/epoll.7.html">Linux epoll</a>. The more our project grew the more we realized the limitations of using raw epoll:</p> <ul> <li> <p>As we used <a href="http://man7.org/linux/man-pages/man2/timerfd_create.2.html">timerfd</a> for our timer events, each event source cost one file descriptor and we had many of them! File descriptors are a scarce resource on UNIX, as <a href="http://man7.org/linux/man-pages/man2/setrlimit.2.html">RLIMIT_NOFILE</a> is typically set to 1024 or similar, limiting the number of available file descriptors per process to 1021, which isn't particularly a lot.</p> </li> <li> <p>Ordering of event dispatching became a nightmare. In many cases, we wanted to make sure that a certain kind of event would always be dispatched before another kind of event, if both happen at the same time. For example, when the last process of a service dies, we might be notified about that via a SIGCHLD signal, via an <a href="http://www.freedesktop.org/software/systemd/man/sd_notify.html">sd_notify() "STATUS="</a> message, and via a control group notification. 
We wanted to get these events in the right order, to know when it's safe to process and subsequently release the runtime data systemd keeps about the service or process: it shouldn't be done if there are still events about it pending.</p> </li> <li> <p>For each program we added to the systemd project we noticed we were adding similar code, over and over again, to work with epoll's complex interfaces. For example, finding the right file descriptor and callback function to dispatch an epoll event to, without running into invalidated pointer issues, is outright difficult and requires non-trivial code.</p> </li> <li> <p>Integrating child process watching into our event loops was much more complex than one could hope, and even more so if child process events should be ordered against each other and unrelated kinds of events.</p> </li> </ul> <p>Eventually, we started working on <a href="the-new-sd-bus-api-of-systemd.html">sd-bus</a>. At the same time we decided to seize the opportunity, put together a proper event loop API in C, and then not only port sd-bus on top of it, but also the rest of systemd. The result of this is <a href="http://www.freedesktop.org/software/systemd/man/sd-event.html">sd-event</a>. After almost two years of development we declared sd-event stable in systemd version 221, and published it as an official API of libsystemd.</p> <h2>Why?</h2> <p><a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-event.h">sd-event.h</a>, of course, is not the first event loop API around, and it doesn't implement any really novel concepts. When we started working on it we tried to do our homework, and checked the various existing event loop APIs, maybe looking for candidates to adopt instead of doing our own, and to learn about the strengths and weaknesses of the various existing implementations. 
Ultimately, we found no implementation that could deliver what we needed, or where it would be easy to add the missing bits: as usual in the systemd project, we wanted something that allows us access to all the Linux-specific bits, instead of limiting itself to the least common denominator of UNIX. We weren't looking for an abstraction API, but simply one that makes epoll usable in system code.</p> <p>With this blog story I'd like to take the opportunity to introduce you to sd-event, and explain why it might be a good candidate to adopt as the event loop implementation in your project, too.</p> <p>So, here are some features it provides:</p> <ul> <li> <p>I/O event sources, based on epoll's file descriptor watching, including edge-triggered events (EPOLLET). See <a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_io.html">sd_event_add_io(3)</a>.</p> </li> <li> <p>Timer event sources, based on <code>timerfd_create()</code>, supporting the <code>CLOCK_MONOTONIC</code>, <code>CLOCK_REALTIME</code>, <code>CLOCK_BOOTTIME</code> clocks, as well as the <code>CLOCK_REALTIME_ALARM</code> and <code>CLOCK_BOOTTIME_ALARM</code> clocks that can resume the system from suspend. When creating timer events a required accuracy parameter may be specified which allows coalescing of timer events to minimize power consumption. For each clock only a single timer file descriptor is kept, and all timer events are multiplexed with a priority queue. See <a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_time.html">sd_event_add_time(3)</a>.</p> </li> <li> <p>UNIX process signal events, based on <a href="http://man7.org/linux/man-pages/man2/signalfd.2.html">signalfd(2)</a>, including full support for real-time signals, and queued parameters. 
See <a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_signal.html">sd_event_add_signal(3)</a>.</p> </li> <li> <p>Child process state change events, based on <a href="http://man7.org/linux/man-pages/man2/waitid.2.html">waitid(2)</a>. See <a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_child.html">sd_event_add_child(3)</a>.</p> </li> <li> <p>Static event sources, of three types: defer, post and exit, for invoking calls in each event loop iteration, after other event sources or at event loop termination. See <a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_defer.html">sd_event_add_defer(3)</a>.</p> </li> <li> <p>Event sources may be assigned a 64-bit priority value that controls the order in which event sources are dispatched if multiple are pending simultaneously. See <a href="http://www.freedesktop.org/software/systemd/man/sd_event_source_set_priority.html">sd_event_source_set_priority(3)</a>.</p> </li> <li> <p>The event loop may automatically send watchdog notification messages to the service manager. See <a href="http://www.freedesktop.org/software/systemd/man/sd_event_set_watchdog.html">sd_event_set_watchdog(3)</a>.</p> </li> <li> <p>The event loop may be integrated into foreign event loops, such as the GLib one. The event loop API is hence composable, the same way the underlying epoll logic is. See <a href="http://www.freedesktop.org/software/systemd/man/sd_event_get_fd.html">sd_event_get_fd(3)</a> for an example.</p> </li> <li> <p>The API is fully OOM safe.</p> </li> <li> <p>A complete set of documentation in UNIX man page format is available, with <a href="http://www.freedesktop.org/software/systemd/man/sd-event.html">sd-event(3)</a> as the entry page.</p> </li> <li> <p>It's pretty widely available, and requires no extra dependencies. 
Since systemd is built on it, most major distributions ship the library in their default install set.</p> </li> <li> <p>After two years of development, and after being used in all of systemd's components, it has received a fair share of testing already, even though we only recently decided to declare it stable and turned it into a public API.</p> </li> </ul> <p>Note that sd-event has some potential drawbacks too:</p> <ul> <li> <p>If portability is essential to you, sd-event is not your best option. sd-event is a wrapper around Linux-specific APIs, and that's visible in the API. For example: our event callbacks receive structures defined by Linux-specific APIs such as signalfd.</p> </li> <li> <p>It's a low-level C API, and it doesn't isolate you from the OS underpinnings. While I like to think that it is relatively nice and easy to use from C, it doesn't compromise on exposing the low-level functionality. It just fills the gaps in what's missing between epoll, timerfd, signalfd and related concepts, and it does not hide that away.</p> </li> </ul> <p>Either way, I believe that sd-event is a great choice when looking for an event loop API, in particular if you work on system-level and embedded software, where functionality like timer coalescing or watchdog support matters.</p> <h2>Getting Started</h2> <p>Here's a short example of how to use sd-event in a simple daemon. 
In this example, we'll not just use <a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-event.h">sd-event.h</a>, but also <a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-daemon.h">sd-daemon.h</a> to implement a system service.</p> <div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;alloca.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;endian.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;errno.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;netinet/in.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;signal.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdbool.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdlib.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;sys/ioctl.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;sys/socket.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;unistd.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;systemd/sd-daemon.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span 
class="cpf">&lt;systemd/sd-event.h&gt;</span><span class="cp"></span> <span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">io_handler</span><span class="p">(</span><span class="n">sd_event_source</span><span class="w"> </span><span class="o">*</span><span class="n">es</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="kt">uint32_t</span><span class="w"> </span><span class="n">revents</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">userdata</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buffer</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kt">ssize_t</span><span class="w"> </span><span class="n">n</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">sz</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="cm">/* UDP enforces a somewhat reasonable maximum datagram size of 64K, we can just allocate the buffer on the stack */</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ioctl</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">FIONREAD</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">sz</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span 
class="p">)</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">-</span><span class="n">errno</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">buffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">alloca</span><span class="p">(</span><span class="n">sz</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">recv</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="n">sz</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">errno</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">EAGAIN</span><span class="p">)</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">-</span><span class="n">errno</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">5</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">memcmp</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;EXIT</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Request a clean exit */</span><span class="w"></span> <span class="w"> </span><span class="n">sd_event_exit</span><span class="p">(</span><span class="n">sd_event_source_get_event</span><span class="p">(</span><span class="n">es</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="n">fwrite</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">stdout</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">fflush</span><span class="p">(</span><span class="n">stdout</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="mi">0</span><span class="p">;</span><span class="w"></span> <span class="p">}</span><span class="w"></span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="k">union</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr_in</span><span class="w"> </span><span class="n">in</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr</span><span class="w"> </span><span class="n">sa</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">sa</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">sd_event_source</span><span class="w"> </span><span class="o">*</span><span class="n">event_source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">sd_event</span><span class="w"> </span><span class="o">*</span><span class="n">event</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="mi">-1</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kt">sigset_t</span><span class="w"> </span><span class="n">ss</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_event_default</span><span class="p">(</span><span class="o">&amp;</span><span class="n">event</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sigemptyset</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ss</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">||</span><span class="w"></span> <span class="w"> </span><span class="n">sigaddset</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ss</span><span class="p">,</span><span class="w"> </span><span class="n">SIGTERM</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">||</span><span class="w"></span> <span class="w"> </span><span class="n">sigaddset</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ss</span><span class="p">,</span><span class="w"> 
</span><span class="n">SIGINT</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="n">errno</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Block SIGTERM first, so that the event loop can handle it */</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sigprocmask</span><span class="p">(</span><span class="n">SIG_BLOCK</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">ss</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="n">errno</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Let&#39;s make use of the default handler and &quot;floating&quot; reference features of 
sd_event_add_signal() */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_event_add_signal</span><span class="p">(</span><span class="n">event</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="n">SIGTERM</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_event_add_signal</span><span class="p">(</span><span class="n">event</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="n">SIGINT</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span 
class="p">;</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Enable automatic service watchdog support */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_event_set_watchdog</span><span class="p">(</span><span class="n">event</span><span class="p">,</span><span class="w"> </span><span class="nb">true</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">socket</span><span class="p">(</span><span class="n">AF_INET</span><span class="p">,</span><span class="w"> </span><span class="n">SOCK_DGRAM</span><span class="o">|</span><span class="n">SOCK_CLOEXEC</span><span class="o">|</span><span class="n">SOCK_NONBLOCK</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">fd</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="n">errno</span><span class="p">;</span><span class="w"></span> <span 
class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="n">sa</span><span class="p">.</span><span class="n">in</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr_in</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="p">.</span><span class="n">sin_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AF_INET</span><span class="p">,</span><span class="w"></span> <span class="w"> </span><span class="p">.</span><span class="n">sin_port</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">htobe16</span><span class="p">(</span><span class="mi">7777</span><span class="p">),</span><span class="w"></span> <span class="w"> </span><span class="p">};</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bind</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">sa</span><span class="p">.</span><span class="n">sa</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">sa</span><span class="p">))</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="o">-</span><span class="n">errno</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_event_add_io</span><span class="p">(</span><span class="n">event</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">event_source</span><span class="p">,</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">EPOLLIN</span><span class="p">,</span><span class="w"> </span><span class="n">io_handler</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="n">sd_notifyf</span><span class="p">(</span><span class="nb">false</span><span class="p">,</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;READY=1</span><span class="se">\n</span><span class="s">&quot;</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;STATUS=Daemon startup completed, processing events.&quot;</span><span class="p">);</span><span class="w"></span> <span class="w"> 
</span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_event_loop</span><span class="p">(</span><span class="n">event</span><span class="p">);</span><span class="w"></span> <span class="nl">finish</span><span class="p">:</span><span class="w"></span> <span class="w"> </span><span class="n">event_source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_event_source_unref</span><span class="p">(</span><span class="n">event_source</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">event</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_event_unref</span><span class="p">(</span><span class="n">event</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">fd</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span> <span class="w"> </span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failure: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span 
class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">EXIT_FAILURE</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">EXIT_SUCCESS</span><span class="p">;</span><span class="w"></span> <span class="p">}</span><span class="w"></span> </code></pre></div> <p>The example above shows how to write a minimal UDP/IP server, that listens on port 7777. Whenever a datagram is received it outputs its contents to STDOUT, unless it is precisely the string <code>EXIT\n</code> in which case the service exits. The service will react to SIGTERM and SIGINT and do a clean exit then. It also notifies the service manager about its completed startup, if it runs under a service manager. 
Finally, it sends watchdog keep-alive messages to the service manager if it asked for that, and if it runs under a service manager.</p> <p>When run as a systemd service, this service's STDOUT will of course be connected to the logging framework, which means the service can act as a minimal UDP-based remote logging service.</p> <p>To compile and link this example, save it as <code>event-example.c</code>, then run:</p> <div class="highlight"><pre><span></span><code>$ gcc event-example.c -o event-example <span class="sb">`</span>pkg-config --cflags --libs libsystemd<span class="sb">`</span> </code></pre></div> <p>For a first test, simply run the resulting binary from the command line, and test it against the following netcat command line:</p> <div class="highlight"><pre><span></span><code>$ nc -u localhost <span class="m">7777</span> </code></pre></div> <p>For the sake of brevity, error checking is minimal; a real-world application should, of course, be more comprehensive. However, it hopefully gets the idea across of how to write a daemon that reacts to external events with sd-event.</p> <p>For further details on the functions used in the example above, please consult the manual pages: <a href="http://www.freedesktop.org/software/systemd/man/sd-event.html">sd-event(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_exit.html">sd_event_exit(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_source_get_event.html">sd_event_source_get_event(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_default.html">sd_event_default(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_signal.html">sd_event_add_signal(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_set_watchdog.html">sd_event_set_watchdog(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_io.html">sd_event_add_io(3)</a>, <a
href="http://www.freedesktop.org/software/systemd/man/sd_notifyf.html">sd_notifyf(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_loop.html">sd_event_loop(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_source_unref.html">sd_event_source_unref(3)</a>, <a href="http://www.freedesktop.org/software/systemd/man/sd_event_unref.html">sd_event_unref(3)</a>.</p> <h2>Conclusion</h2> <p>So, is this the event loop to end all other event loops? Certainly not. I actually believe in "event loop plurality". There are many reasons for that, but most importantly: sd-event is supposed to be an event loop suitable for writing a wide range of applications, but it's definitely not going to solve all event loop problems. For example, while the priority logic is important for many usecases, it comes with drawbacks for others: if not used carefully, high-priority event sources can easily starve low-priority event sources. Also, in order to implement the priority logic, sd-event needs to linearly iterate through the event structures returned by <a href="http://man7.org/linux/man-pages/man2/epoll_wait.2.html">epoll_wait(2)</a> to sort the events by their priority, resulting in worst case O(n*log(n)) complexity on each event loop wakeup (for n = number of file descriptors). Then, to implement priorities fully, sd-event only dispatches a single event before going back to the kernel and asking for new events. sd-event will hence not provide the theoretically possible best scalability to huge numbers of file descriptors. Of course, this could be optimized, by improving epoll, and making it support how today's event loops actually work (after all, this is the problem set all event loops that implement priorities -- including GLib's -- have to deal with), but even then: the design of sd-event is focussed on running one event loop per thread, and it dispatches events strictly ordered.
In many other important usecases a very different design is preferable: one where events are distributed to a set of worker threads and are dispatched out-of-order.</p> <p>Hence, don't mistake sd-event for what it isn't. It's not supposed to unify everybody on a single event loop. It's just supposed to be a very good implementation of an event loop suitable for a large part of the typical usecases.</p> <p>Note that our APIs, including <a href="the-new-sd-bus-api-of-systemd.html">sd-bus</a>, integrate nicely into sd-event event loops, but do not require it, and may be integrated into other event loops too, as long as they support watching for time and I/O events.</p> <p>And that's all for now. If you are considering using sd-event for your project and need help or have questions, please direct them to the <a href="http://lists.freedesktop.org/mailman/listinfo/systemd-devel">systemd mailing list</a>.</p>Lennart PoetteringThu, 19 Nov 2015 00:00:00 +0100tag:0pointer.net,2015-11-19:/blog/introducing-sd-event.htmlprojectssystemd.conf 2015 Summaryhttps://0pointer.net/blog/systemdconf-2015-summary.html<h1>systemd.conf 2015 is Over Now!</h1> <p>Last week our first <a href="https://systemd.events/">systemd.conf</a> conference took place at betahaus, in Berlin, Germany. With almost 100 attendees, a dense schedule of 23 high-quality talks stuffed into a single track on just two days, a productive hackfest and numerous consumed Club-Mates I believe it was quite a success!</p> <p>If you couldn't attend the conference, you may watch all talks on our <a href="https://www.youtube.com/channel/UCvq_RgZp3kljp9X8Io9Z1DA">YouTube Channel</a>. The slides are <a href="https://drive.google.com/open?id=0B-UWEwsUY5PJZXQ2emdsVXJ4OTA">available online</a>, too.</p> <p>Many photos from the conference are available on the <a href="https://plus.google.com/events/gallery/cilbcdfrpbk12h2qe8o18fn7m04">Google Events Page</a>. 
Enjoy!</p> <p>I'd specifically like to thank Daniel Mack, Chris Kühl and Nils Magnus for running the conference, and making sure that it worked out as smoothly as it did! Thank you very much, you did a fantastic job!</p> <p>I'd also specifically like to thank the <a href="http://c3voc.de/">CCC Video Operation Center</a> folks for the excellent video coverage of the conference. Not only did they implement a live-stream for the entire talks part of the conference, but also cut and uploaded videos of all talks to our <a href="https://www.youtube.com/channel/UCvq_RgZp3kljp9X8Io9Z1DA">YouTube Channel</a> within the same day (in fact, within a few hours after the talks finished). That's quite an impressive feat!</p> <p>The folks from LinuxTag e.V. put a lot of time and energy into the organization. It was great to see how well this all worked out! Excellent work!</p> <p>(BTW, LinuxTag e.V. and the CCC Video Operation Center folks are willing to help with the organization of Free Software community events in Germany (and Europe?). Hence, if you need an entity that can do the financial work and other stuff for your Free Software project's conference, consider pinging LinuxTag, they might be willing to help. Similarly, if you are organizing such an event and are thinking about providing video coverage, consider pinging the CCC VOC folks! Both of them get our best recommendations!)</p> <p>I'd also like to thank <a href="https://systemd.events/systemdconf-2015/sponsors">our conference sponsors</a>! Specifically, we'd like to thank our Gold Sponsors <strong>Red Hat</strong> and <strong>CoreOS</strong> for their support.
We'd also like to thank our Silver Sponsor <strong>Codethink</strong>, and our Bronze Sponsors <strong>Pengutronix</strong>, <strong>Pantheon</strong>, <strong>Collabora</strong>, <strong>Endocode</strong>, the <strong>Linux Foundation</strong>, <strong>Samsung</strong> and <strong>Travelping</strong>, as well as our Cooperation Partners <strong>LinuxTag</strong> and <strong>kinvolk.io</strong>, and our Media Partner <strong>Golem.de</strong>.</p> <p>Last but not least I'd really like to thank our speakers and attendees for presenting and participating in the conference. Of course, we put the conference together specifically for you, and we really hope you had as much fun at it as we did!</p> <p>Thank you all for attending, supporting, and organizing <a href="https://systemd.events/">systemd.conf 2015</a>! We are looking forward to seeing you and working with you again at systemd.conf 2016!</p> <p>Thanks!</p>Lennart PoetteringMon, 09 Nov 2015 00:00:00 +0100tag:0pointer.net,2015-11-09:/blog/systemdconf-2015-summary.htmlprojectsSecond Round of systemd.conf 2015 Sponsorshttps://0pointer.net/blog/second-round-of-systemdconf-2015-sponsors.html<h1>Second Round of systemd.conf 2015 Sponsors</h1> <p>We are happy to announce the second round of <a href="https://systemd.events/">systemd.conf 2015</a> sponsors! In addition to those from <a href="http://0pointer.net/blog/first-round-of-systemdconf-2015-sponsors.html">the first announcement</a>, we have:</p> <p>Our second <strong>Gold</strong> sponsor is Red Hat!</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/red"><img src="https://systemd.events/sites/default/files/Red_Hat_RGB-220.png" width="220" height="85"></a></p> <blockquote> <p>What began as a better way to build software—openness, transparency, collaboration—soon shifted the balance of power in an entire industry. The revolution of choice continues.
Today Red Hat® is the world's leading provider of open source solutions, using a community-powered approach to provide reliable and high-performing cloud, virtualization, storage, Linux®, and middleware technologies.</p> </blockquote> <p>A <strong>Bronze</strong> sponsor is <em>Samsung</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/samsung-poland-rd-center"><img src="https://systemd.events/sites/default/files/samsung_logo.png" width="220" height="86"></a></p> <blockquote> <p>From the beginning we have established a very fast pace and are currently one of the biggest and fastest growing modern-technology R&amp;D centers in East-Central Europe. We have started with designing subsystems for digital satellite television, however, we have quickly expanded the scope of our interest. Currently, it includes advanced systems of digital television, platform convergence, mobile systems, smart solutions, and enterprise solutions. Also a vital role in our activity plays the quality and certification center, which controls the conformity of Samsung Electronics products with the highest standards of quality and reliability.</p> </blockquote> <p>A <strong>Bronze</strong> sponsor is <em>travelping</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/travelping"><img src="https://systemd.events/sites/default/files/travelping_logo.png" width="220" height="60"></a></p> <blockquote> <p>Travelping is passionate about networks, communications and devices. We empower our customers to deploy and operate networks using our state of the art products, solutions and services. Our products and solutions are based on our industry proven physical and virtual appliance platforms. These purpose built platforms ensure best in class performance, scalability and reliability combined with consistent end to end management capabilities. 
To build this products, Travelping has developed a own embedded, cross platform Linux distribution called CAROS.io which incorporates the systemd service manager and tools.</p> </blockquote> <p>A <strong>Bronze</strong> sponsor is <em>Collabora</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/collabora"><img src="https://systemd.events/sites/default/files/collabora-logo.png" width="220" height="124"></a></p> <blockquote> <p>Collabora has over 10 years of experience working with top tier OEMs &amp; silicon manufacturers worldwide to develop products based on Open Source software. Through the use of Open Source technologies and methodologies, Collabora helps clients in multiple market segments gain faster time to market and save millions of dollars in licensing and maintenance costs. Collabora has already brought to market several products relying on systemd extensively.</p> </blockquote> <p>A <strong>Bronze</strong> sponsor is <em>Endocode</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/endocode"><img src="https://systemd.events/sites/default/files/endocode-logo.png" width="220" height="52"></a></p> <blockquote> <p>Endocode AG. An employee-owned, software engineering company from Berlin. 
Open Source is our heart and soul.</p> </blockquote> <p>A <strong>Bronze</strong> sponsor is the <em>Linux</em> <em>Foundation</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/linux-foundation"><img src="https://systemd.events/sites/default/files/Linux_Foundation-logo.png" width="220" height="95"></a></p> <blockquote> <p>The Linux Foundation advances the growth of Linux and offers its collaborative principles and practices to any endeavor.</p> </blockquote> <p>We are <strong>Cooperating</strong> with <em>LinuxTag</em> <em>e.V.</em> on the organization:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/linuxtag-ev"><img src="https://systemd.events/sites/default/files/Linuxtag-logo.png" width="220" height="149"></a></p> <blockquote> <p>LinuxTag is Europe's leading organizer of Linux and Open Source events. Born of the community and in business for 20 years, we organize LinuxTag, an annual conference and exhibition attracting thousands of visitors. We also participate and cooperate in organizing workshops, tutorials, seminars, and other events together with and for the Open Source community. Selected events include non-profit workshops, the German Kernel Summit at FrOSCon, participation in the Open Tech Summit, and others. We take care of the organizational framework of systemd.conf 2015. LinuxTag e.V. is a non-profit organization and welcomes donations of ideas and workforce.</p> </blockquote> <p>A <strong>Media</strong> Partner is <em>Golem</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/golem"><img src="https://systemd.events/sites/default/files/golem_logo.png" width="220" height="220"></a></p> <blockquote> <p>Golem.de is an up to date online-publication intended for professional computer users. It provides technology insights of the IT and telecommunications industry. Golem.de offers profound and up to date information on significant and trending topics. 
Online- and IT-Professionals, marketing managers, purchasers, and readers inspired by technology receive substantial information on product, market and branding potentials through tests, interviews und market analysis.</p> </blockquote> <p>We'd like to thank our sponsors for their support! Without sponsors our conference would not be possible!</p> <p>The conference has been SOLD OUT for a few weeks. We no longer accept registrations or paper submissions.</p> <p>For further details about systemd.conf consult the <a href="https://systemd.events/">conference website</a>.</p> <p>See the <a href="http://0pointer.net/blog/first-round-of-systemdconf-2015-sponsors.html">first round of sponsor announcements</a>!</p> <p>See you in Berlin!</p>Lennart PoetteringMon, 19 Oct 2015 00:00:00 +0200tag:0pointer.net,2015-10-19:/blog/second-round-of-systemdconf-2015-sponsors.htmlprojectssystemd.conf close to being sold out!https://0pointer.net/blog/systemdconf-close-to-being-sold-out.html<h1>Only 14 tickets still available!</h1> <p>systemd.conf 2015 is close to being sold out, there are <em>only</em> <em>14</em> <em>tickets</em> <em>left</em> now. If you haven't bought your ticket yet, now is the time to do it, because otherwise it will be too late and all tickets will be gone!</p> <p>Why attend? At this conference you'll get to meet everybody who is involved with the systemd project and learn what they are working on, and where the project will go next. You'll hear from major users and projects working with systemd.
It's the primary forum where you can make yourself heard and get first hand access to everybody who's working on the future of the core Linux userspace!</p> <p>To get an idea about the schedule, please consult our <a href="https://systemd.events/systemdconf-2015/schedule">preliminary schedule</a>.</p> <p>In order to <strong>register</strong> for the conference, please visit <a href="https://systemd.events/systemdconf-2015/registration">the registration page</a>.</p> <p>We are still looking for sponsors. If you'd like to join the ranks of systemd.conf 2015 sponsors, please have a look at our <a href="https://systemd.events/systemdconf-2015/become-sponsor">Becoming a Sponsor</a> page!</p> <p>For further details about systemd.conf consult the <a href="https://systemd.events/">conference website</a>.</p>Lennart PoetteringWed, 23 Sep 2015 00:00:00 +0200tag:0pointer.net,2015-09-23:/blog/systemdconf-close-to-being-sold-out.htmlprojectsPreliminary systemd.conf 2015 Schedulehttps://0pointer.net/blog/preliminary-systemdconf-2015-schedule.html<h1>A Preliminary systemd.conf 2015 Schedule is Now Online!</h1> <p>We are happy to announce that an initial, preliminary version of the <a href="https://systemd.events/systemdconf-2015/schedule">systemd.conf 2015 schedule</a> is now online! (Please ignore that some rows in the schedule link the same session twice on that page. That's a bug in the web site CMS that we are working to fix.)</p> <p>We got an overwhelming number of high-quality submissions during the CfP! Because there were so many good talks we really wanted to accept, we decided to do two full days of talks now, leaving one more day for the hackfest and BoFs. We also shortened many of the slots, to make room for more.
All in all we now have a schedule packed with fantastic presentations!</p> <p>The areas covered range from containers, to system provisioning, stateless systems, distributed init systems, the kdbus IPC, control groups, systemd on the desktop, systemd in embedded devices, configuration management and systemd, and systemd in downstream distributions.</p> <p>We'd like to thank everybody who submitted a presentation proposal!</p> <p>Also, don't forget to <strong>register</strong> for the conference! Only a limited number of registrations are available due to space constraints! <a href="https://systemd.events/systemdconf-2015/registration">Register here!</a>.</p> <p>We are still looking for sponsors. If you'd like to join the ranks of systemd.conf 2015 sponsors, please have a look at our <a href="https://systemd.events/systemdconf-2015/become-sponsor">Becoming a Sponsor</a> page!</p> <p>For further details about systemd.conf consult the <a href="https://systemd.events/">conference website</a>.</p>Lennart PoetteringWed, 16 Sep 2015 00:00:00 +0200tag:0pointer.net,2015-09-16:/blog/preliminary-systemdconf-2015-schedule.htmlprojectssystemd.conf 2015 CfP REMINDERhttps://0pointer.net/blog/systemdconf-2015-cfp-reminder.html<h1>LAST REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!</h1> <p>Here's the last reminder that the systemd.conf 2015 CfP ends on August 31st 11:59:59pm Central European Time (that's Monday next week)! Make sure to submit your proposals by then!</p> <p>Please submit your proposals <a href="https://systemd.events/systemdconf-2015/call-presentations">on our website</a>!</p> <p>And don't forget to register for the conference! Only a limited number of registrations are available due to space constraints!
<a href="https://systemd.events/systemdconf-2015/registration">Register here!</a>.</p> <p>For further details about systemd.conf consult the <a href="https://systemd.events/">conference website</a>.</p>Lennart PoetteringFri, 28 Aug 2015 00:00:00 +0200tag:0pointer.net,2015-08-28:/blog/systemdconf-2015-cfp-reminder.htmlprojectsFirst Round of systemd.conf 2015 Sponsorshttps://0pointer.net/blog/first-round-of-systemdconf-2015-sponsors.html<h1>First Round of systemd.conf 2015 Sponsors</h1> <p>We are happy to announce the first round of <a href="https://systemd.events/">systemd.conf 2015</a> sponsors!</p> <p>Our first <strong>Gold</strong> sponsor is CoreOS!</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/coreos"><img src="https://systemd.events/sites/default/files/coreos-logo.png" width="240" height="105"></a></p> <blockquote> <p>CoreOS develops software for modern infrastructure that delivers a consistent operating environment for distributed applications. CoreOS's commercial offering, Tectonic, is an enterprise-ready platform that combines Kubernetes and the CoreOS stack to run Linux containers. In addition CoreOS is the creator and maintainer of open source projects such as CoreOS Linux, etcd, fleet, flannel and rkt. The strategies and architectures that influence CoreOS allow companies like Google, Facebook and Twitter to run their services at scale with high resilience. 
Learn more about CoreOS here https://coreos.com/, Tectonic here, https://tectonic.com/ or follow CoreOS on Twitter @coreoslinux.</p> </blockquote> <p>A <strong>Silver</strong> sponsor is <em>Codethink</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/codethink"><img src="https://systemd.events/sites/default/files/codethink-logo_0.png" width="220" height="64"></a></p> <blockquote> <p>Codethink is a software services consultancy, focusing on engineering reliable systems for long-term deployment with open source technologies.</p> </blockquote> <p>A <strong>Bronze</strong> sponsor is <em>Pantheon</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/pantheon"><img src="https://systemd.events/sites/default/files/Pantheon_logo.png" width="220" height="91"></a></p> <blockquote> <p>Pantheon is a platform for professional website development, testing, and deployment. Supporting Drupal and WordPress, Pantheon runs over 100,000 websites for the world's top brands, universities, and media organizations on top of over a million containers.</p> </blockquote> <p>A <strong>Bronze</strong> sponsor is <em>Pengutronix</em>:</p> <p><a href="https://systemd.events/systemdconf-2015/sponsors/pengutronix"><img src="https://systemd.events/sites/default/files/pengutronix_0.png" width="220" height="76"></a></p> <blockquote> <p>Pengutronix provides consulting, training and development services for Embedded Linux to customers from the industry. The Kernel Team ports Linux to customer hardware and has more than 3100 patches in the official mainline kernel. In addition to lowlevel ports, the Pengutronix Application Team is responsible for board support packages based on PTXdist or Yocto and deals with system integration (this is where systemd plays an important role). The Graphics Team works on accelerated multimedia tasks, based on the Linux kernel, GStreamer, Qt and web technologies.</p> </blockquote> <p>We'd like to thank our sponsors for their support! 
Without sponsors our conference would not be possible!</p> <p>We'll shortly announce our second round of sponsors, please stay tuned!</p> <p>If you'd like to join the ranks of systemd.conf 2015 sponsors, please have a look at our <a href="https://systemd.events/systemdconf-2015/become-sponsor">Becoming a Sponsor</a> page!</p> <p><strong>Reminder!</strong> The systemd.conf 2015 Call for Presentations ends on Monday, <strong>August 31st</strong>! Please make sure to submit your proposals on the <a href="https://systemd.events/systemdconf-2015/call-presentations">CfP page</a> by then!</p> <p>Also, don't forget to <strong>register</strong> for the conference! Only a limited number of registrations are available due to space constraints! <a href="https://systemd.events/systemdconf-2015/registration">Register here!</a>.</p> <p>For further details about systemd.conf consult the <a href="https://systemd.events/">conference website</a>.</p>Lennart PoetteringTue, 25 Aug 2015 00:00:00 +0200tag:0pointer.net,2015-08-25:/blog/first-round-of-systemdconf-2015-sponsors.htmlprojectssystemd.conf 2015 Call for Presentationshttps://0pointer.net/blog/systemdconf-2015-call-for-presentations.html<h1>REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!</h1> <p>We'd like to remind you that the systemd.conf 2015 Call for Presentations ends on <strong>August 31st</strong>! Please submit your presentation proposals before that date <a href="https://systemd.events/systemdconf-2015/call-presentations">on our website</a>.</p> <p>We are specifically interested in submissions from projects and vendors building today's and tomorrow's <strong>products</strong>, <strong>services</strong> and <strong>devices</strong> with systemd. We'd like to learn about the problems you encounter and the benefits you see!
Hence, if you work for a company using systemd, please submit a presentation!</p> <p>We are also specifically interested in submissions from <strong>downstream</strong> <strong>distribution</strong> <strong>maintainers</strong> of systemd! If you develop or maintain systemd packages in a distribution, please submit a presentation reporting about the state, future and the problems of systemd packaging so that we can improve downstream collaboration!</p> <p>And of course, all talks regarding systemd usage in <strong>containers</strong>, in the <strong>cloud</strong>, on <strong>servers</strong>, on the <strong>desktop</strong>, in <strong>mobile</strong> and in <strong>embedded</strong> are highly welcome! Talks about systemd <strong>networking</strong> and <strong>kdbus</strong> IPC are very welcome too!</p> <p>Please submit your presentations until <em>August 31st</em>!</p> <p>And don't forget to register for the conference! Only a limited number of registrations are available due to space constraints! <a href="https://systemd.events/systemdconf-2015/registration">Register here!</a>.</p> <p>Also, limited travel and entry fee sponsorship is available for community contributors. 
Please contact us for details!</p> <p>For further details about the CfP consult the <a href="https://systemd.events/systemdconf-2015/call-presentations">CfP page</a>.</p> <p>For further details about systemd.conf consult the <a href="https://systemd.events/">conference website</a>.</p>Lennart PoetteringWed, 19 Aug 2015 00:00:00 +0200tag:0pointer.net,2015-08-19:/blog/systemdconf-2015-call-for-presentations.htmlprojectsAnnouncing systemd.conf 2015https://0pointer.net/blog/announcing-systemdconf-2015.html<h1>Announcing systemd.conf 2015</h1> <p>We are happy to announce the inaugural <a href="https://systemd.events/">systemd.conf 2015</a> conference of the <a href="https://wiki.freedesktop.org/www/Software/systemd/">systemd project</a>.</p> <p>The conference takes place November 5th-7th, 2015 in Berlin, Germany.</p> <p>Only a limited number of tickets are available, hence make sure to sign up quickly.</p> <p>For further details consult the <a href="https://systemd.events/">conference website</a>.</p>Lennart PoetteringWed, 29 Jul 2015 00:00:00 +0200tag:0pointer.net,2015-07-29:/blog/announcing-systemdconf-2015.htmlprojectsThe new sd-bus API of systemdhttps://0pointer.net/blog/the-new-sd-bus-api-of-systemd.html<p>With the new <a href="http://lists.freedesktop.org/archives/systemd-devel/2015-June/033170.html">v221 release of systemd</a> we are declaring the <a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-bus.h">sd-bus</a> API shipped with <a href="https://wiki.freedesktop.org/www/Software/systemd/">systemd</a> stable. sd-bus is our minimal <a href="https://en.wikipedia.org/wiki/D-Bus">D-Bus IPC</a> C library, supporting as back-ends both classic socket-based D-Bus and <a href="https://github.com/systemd/kdbus">kdbus</a>. The library has been part of systemd for a while, but has only been used internally, since we wanted to have the liberty to still make API changes without affecting external consumers of the library. 
However, we are now confident enough to commit to a stable API for it, starting with v221.</p> <p>In this blog story I hope to provide you with a quick overview of sd-bus, a short recap of D-Bus and its concepts, as well as a few simple examples of how to write D-Bus clients and services with it.</p> <h1>What is D-Bus again?</h1> <p>Let's start with a quick reminder of what <a href="https://en.wikipedia.org/wiki/D-Bus">D-Bus</a> actually is: it's a powerful, generic IPC system for Linux and other operating systems. It knows concepts like buses, objects, interfaces, methods, signals and properties. It provides you with fine-grained access control, a rich type system, discoverability, introspection, monitoring, reliable multicasting, service activation, file descriptor passing, and more. There are bindings for numerous programming languages that are used on Linux.</p> <p>D-Bus has been a core component of Linux systems for more than 10 years. It is certainly the most widely established high-level local IPC system on Linux. Since systemd's inception it has been the IPC system it exposes its interfaces on. And even before systemd, it was the IPC system Upstart used to expose its interfaces. It is used by GNOME, by KDE and by a variety of system components.</p> <p>D-Bus refers to both <a href="http://dbus.freedesktop.org/doc/dbus-specification.html">a specification</a> and <a href="https://wiki.freedesktop.org/www/Software/dbus/">a reference implementation</a>. The reference implementation provides both a bus server component and a client library. While there are multiple other popular reimplementations of the client library (for both C and other programming languages), the only commonly used server side is the one from the reference implementation. (However, the kdbus project is working on providing an alternative to this server implementation as a kernel component.)</p> <p>D-Bus is mostly used as local IPC, on top of AF_UNIX sockets. 
However, the protocol may be used on top of TCP/IP as well. It does not natively support encryption, hence using D-Bus directly on TCP is usually not a good idea. It is possible to combine D-Bus with a transport like ssh in order to secure it. systemd uses this to make many of its APIs accessible remotely.</p> <p>A frequently asked question about D-Bus is why it exists at all, given that AF_UNIX sockets and FIFOs already exist on UNIX and have been used successfully for a long time. To answer this question let's make a comparison with popular web technology of today: what AF_UNIX/FIFOs are to D-Bus, TCP is to HTTP/REST. While AF_UNIX sockets/FIFOs only shovel raw bytes between processes, D-Bus defines actual message encoding and adds concepts like method call transactions, an object system, security mechanisms, multicasting and more.</p> <p>From our more than 10 years of experience with D-Bus we know today that while there are some areas where we can improve things (and we are working on that, both with kdbus and sd-bus), it generally appears to be a very well designed system that has stood the test of time, aged well and is widely established. Today, if we sat down and designed a completely new IPC system incorporating all the experience and knowledge we gained with D-Bus, I am sure the result would be very close to what D-Bus already is.</p> <p>Or in short: D-Bus is great. If you hack on a Linux project and need a local IPC, it should be your first choice. 
Not only because D-Bus is well designed, but also because there aren't many alternatives that can cover similar functionality.</p> <h1>Where does sd-bus fit in?</h1> <p>Let's discuss why sd-bus exists, how it compares with the other existing C D-Bus libraries and why it might be a library to consider for your project.</p> <p>For C, there are two established, popular D-Bus libraries: libdbus, as it is shipped in the reference implementation of D-Bus, as well as GDBus, a component of GLib, the low-level tool library of GNOME.</p> <p>Of the two, libdbus is the much older one, as it was written at the time the specification was put together. The library was written with a focus on being portable and on being useful as a back-end for higher-level language bindings. Both of these goals required the API to be very generic, resulting in a relatively baroque, hard-to-use API that lacks the bits that make it easy and fun to use from C. It provides the building blocks, but few tools to actually make it straightforward to build a house from them. On the other hand, the library is suitable for most use-cases (for example, it is OOM-safe, making it suitable for writing lowest level system software), and is portable to operating systems like Windows or more exotic UNIXes.</p> <p><a href="https://developer.gnome.org/gio/stable/gdbus-convenience.html">GDBus</a> is a much newer implementation. It was written after considerable experience with using a GLib/GObject wrapper around libdbus. GDBus is implemented from scratch and shares no code with libdbus. Its design differs substantially from libdbus: it contains code generators that make it particularly easy to expose GObject objects on the bus, or to talk to D-Bus objects as GObject objects. It translates D-Bus data types to GVariant, which is GLib's powerful data serialization format. 
If you are used to GLib-style programming then you'll feel right at home, hacking D-Bus services and clients with it is a lot simpler than using libdbus.</p> <p>With sd-bus we now provide a third implementation, sharing no code with either libdbus or GDBus. For us, the focus was on providing kind of a middle ground between libdbus and GDBus: a low-level C library that actually is fun to work with, that has enough syntactic sugar to make it easy to write clients and services with, but on the other hand is more low-level than GDBus/GLib/GObject/GVariant. To be able to use it in systemd's various system-level components it needed to be OOM-safe and minimal. Another major point we wanted to focus on was supporting a kdbus back-end right from the beginning, in addition to the socket transport of the original D-Bus specification ("dbus1"). In fact, we wanted to design the library closer to kdbus' semantics than to dbus1's, wherever they are different, but still cover both transports nicely. In contrast to libdbus or GDBus portability is not a priority for sd-bus, instead we try to make the best of the Linux platform and expose specific Linux concepts wherever that is beneficial. Finally, performance was also an issue (though a secondary one): neither libdbus nor GDBus will win any speed records. We wanted to improve on performance (throughput and latency) -- but simplicity and correctness are more important to us. 
We believe the result of our work delivers on our goals quite nicely: the library is fun to use, supports kdbus and sockets as back-ends, is relatively minimal, and the <a href="http://lists.freedesktop.org/archives/systemd-devel/2015-May/031418.html">performance is substantially better</a> than both libdbus and GDBus.</p> <p>To decide which of the three APIs to use for your C project, here are some short guidelines:</p> <ul> <li> <p>If you hack on a GLib/GObject project, GDBus is definitely your first choice.</p> </li> <li> <p>If portability to non-Linux kernels -- including Windows, Mac OS and other UNIXes -- is important to you, use either GDBus (which more or less means buying into GLib/GObject) or libdbus (which requires a lot of manual work).</p> </li> <li> <p>Otherwise, sd-bus would be my recommended choice.</p> </li> </ul> <p>(I am not covering C++ specifically here, this is all about plain C only. But do note: if you use Qt, then QtDBus is the D-Bus API of choice, being a wrapper around libdbus.)</p> <h1>Introduction to D-Bus Concepts</h1> <p>To the uninitiated D-Bus usually appears to be a relatively opaque technology. It uses lots of concepts that appear unnecessarily complex and redundant at first sight. But actually, they make a lot of sense. Let's have a look:</p> <ul> <li> <p>A <em>bus</em> is where you look for IPC services. There are usually two kinds of buses: a system bus, of which there's exactly one per system, and which is where you'd look for system services; and a user bus, of which there's one per user, and which is where you'd look for user services, like the address book service or the mail program. 
(Originally, the user bus was actually a session bus -- so that you get several of them if you log in many times as the same user --, and on most setups it still is, but we are working on moving things to a true user bus, of which there is only one per user on a system, regardless of how many times that user happens to log in.)</p> </li> <li> <p>A <em>service</em> is a program that offers some IPC API on a bus. A service is identified by a name in reverse domain name notation. Thus, the <code>org.freedesktop.NetworkManager</code> service on the system bus is where NetworkManager's APIs are available and <code>org.freedesktop.login1</code> on the system bus is where <code>systemd-logind</code>'s APIs are exposed.</p> </li> <li> <p>A <em>client</em> is a program that makes use of some IPC API on a bus. It talks to a service, monitors it and generally doesn't provide any services on its own. That said, lines are blurry and many services are also clients to other services. Frequently the term <em>peer</em> is used as a generalization to refer to either a service or a client.</p> </li> <li> <p>An <em>object path</em> is an identifier for an object on a specific service. In a way this is comparable to a C pointer, since that's how you generally reference a C object, if you hack object-oriented programs in C. However, C pointers are just memory addresses, and passing memory addresses around to other processes would make little sense, since they refer to the address space of the service and the client couldn't make sense of them. Thus, the D-Bus designers came up with the object path concept, which is just a string that looks like a file system path. Example: <code>/org/freedesktop/login1</code> is the object path of the 'manager' object of the <code>org.freedesktop.login1</code> service (which, as we remember from above, is still the service <code>systemd-logind</code> exposes). 
Because object paths are structured like file system paths they can be neatly arranged in a tree, so that you end up with a veritable tree of objects. For example, you'll find all user sessions <code>systemd-logind</code> manages below the <code>/org/freedesktop/login1/session</code> sub-tree, with names such as <code>/org/freedesktop/login1/session/_7</code>, <code>/org/freedesktop/login1/session/_55</code> and so on. How services precisely label their objects and arrange them in a tree is completely up to the developers of the services.</p> </li> <li> <p>Each object that is identified by an object path has one or more <em>interfaces</em>. An interface is a collection of signals, methods, and properties (collectively called <em>members</em>), that belong together. The concept of a D-Bus interface is actually pretty much identical to what you know from programming languages such as Java, which also have an interface concept. Which interfaces an object implements is up to the developers of the service. Interface names are in reverse domain name notation, much like service names. (Yes, that's admittedly confusing, in particular since it's pretty common for simpler services to reuse the service name string also as an interface name.) A couple of interfaces are standardized though and you'll find them available on many of the objects offered by the various services. Specifically, those are <code>org.freedesktop.DBus.Introspectable</code>, <code>org.freedesktop.DBus.Peer</code> and <code>org.freedesktop.DBus.Properties</code>.</p> </li> <li> <p>An interface can contain <em>methods</em>. The word "method" is more or less just a fancy word for "function", and is a term used pretty much the same way in object-oriented languages such as Java. The most common interaction between D-Bus peers is that one peer invokes one of these methods on another peer and gets a reply. A D-Bus method takes a couple of parameters, and returns others. 
The parameters are transmitted in a type-safe way, and the type information is included in the introspection data you can query from each object. Usually, method names (and the other member types) follow a <em>CamelCase</em> syntax. For example, <code>systemd-logind</code> exposes an <code>ActivateSession</code> method on the <code>org.freedesktop.login1.Manager</code> interface that is available on the <code>/org/freedesktop/login1</code> object of the <code>org.freedesktop.login1</code> service.</p> </li> <li> <p>A <em>signature</em> describes a set of parameters a function (or signal, property, see below) takes or returns. It's a series of characters that each encode one parameter by its type. The set of types available is pretty powerful. For example, there are simpler types like <code>s</code> for string, or <code>u</code> for unsigned 32-bit integer, but also complex types such as <code>as</code> for an array of strings or <code>a(sb)</code> for an array of structures consisting of one string and one boolean each. See <a href="http://dbus.freedesktop.org/doc/dbus-specification.html#type-system">the D-Bus specification</a> for the full explanation of the type system. The <code>ActivateSession</code> method mentioned above takes a single string as a parameter (the parameter signature is hence <code>s</code>), and returns nothing (the return signature is hence the empty string). Of course, the signature can get a lot more complex, see below for more examples.</p> </li> <li> <p>A <em>signal</em> is another member type that the D-Bus object system knows. Much like a method it has a signature. However, they serve different purposes. While in a method call a single client issues a request on a single service, and that service sends back a response to the client, signals are for general notification of peers. Services send them out when they want to tell one or more peers on the bus that something happened or changed. 
In contrast to method calls and their replies they are hence usually broadcast over a bus. While method calls/replies are used for duplex one-to-one communication, signals are usually used for simplex one-to-many communication (note however that that's not a requirement, they can also be used one-to-one). Example: <code>systemd-logind</code> broadcasts a <code>SessionNew</code> signal from its manager object each time a user logs in, and a <code>SessionRemoved</code> signal every time a user logs out.</p> </li> <li> <p>A <em>property</em> is the third member type that the D-Bus object system knows. It's similar to the property concept known by languages like C#. Properties also have a signature, and are more or less just variables that an object exposes, that can be read or altered by clients. Example: <code>systemd-logind</code> exposes a property <code>Docked</code> of the signature <code>b</code> (a boolean). It reflects whether <code>systemd-logind</code> thinks the system is currently in a docking station of some form (only applies to laptops …).</p> </li> </ul> <p>So much for the various concepts D-Bus knows. Of course, all these new concepts might be overwhelming. Let's look at them from a different perspective. I assume many of the readers have an understanding of today's web technology, specifically HTTP and REST. Let's try to compare the concept of a HTTP request with the concept of a D-Bus method call:</p> <ul> <li> <p>A HTTP request you issue on a specific network. It could be the Internet, or it could be your local LAN, or a company VPN. Depending on which network you issue the request on, you'll be able to talk to a different set of servers. This is not unlike the "bus" concept of D-Bus.</p> </li> <li> <p>On the network you then pick a specific HTTP server to talk to. That's roughly comparable to picking a service on a specific bus.</p> </li> <li> <p>On the HTTP server you then ask for a specific URL. 
The "path" part of the URL (by which I mean everything after the host name of the server, up to the last "/") is pretty similar to a D-Bus object path.</p> </li> <li> <p>The "file" part of the URL (by which I mean everything after the last slash, following the path, as described above), then defines the actual call to make. In D-Bus this could be mapped to an interface and method name.</p> </li> <li> <p>Finally, the parameters of an HTTP call follow the path after the "?"; they map to the signature of the D-Bus call.</p> </li> </ul> <p>Of course, comparing an HTTP request to a D-Bus method call is a bit like comparing apples and oranges. However, I think it's still useful to get a bit of a feeling of what maps to what.</p> <h1>From the shell</h1> <p>So much about the concepts and the gray theory behind them. Let's make this exciting, let's actually see how this feels on a real system.</p> <p>For a while now, systemd has included a tool <code>busctl</code> that is useful to explore and interact with the D-Bus object system. When invoked without parameters, it will show you a list of all peers connected to the system bus. 
(Use <code>--user</code> to see the peers of your user bus instead):</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>busctl <span class="go">NAME PID PROCESS USER CONNECTION UNIT SESSION DESCRIPTION</span> <span class="go">:1.1 1 systemd root :1.1 - - -</span> <span class="go">:1.11 705 NetworkManager root :1.11 NetworkManager.service - -</span> <span class="go">:1.14 744 gdm root :1.14 gdm.service - -</span> <span class="go">:1.4 708 systemd-logind root :1.4 systemd-logind.service - -</span> <span class="go">:1.7200 17563 busctl lennart :1.7200 session-1.scope 1 -</span> <span class="go">[…]</span> <span class="go">org.freedesktop.NetworkManager 705 NetworkManager root :1.11 NetworkManager.service - -</span> <span class="go">org.freedesktop.login1 708 systemd-logind root :1.4 systemd-logind.service - -</span> <span class="go">org.freedesktop.systemd1 1 systemd root :1.1 - - -</span> <span class="go">org.gnome.DisplayManager 744 gdm root :1.14 gdm.service - -</span> <span class="go">[…]</span> </code></pre></div> <p>(I have shortened the output a bit, to keep things brief).</p> <p>The list begins with a list of all peers currently connected to the bus. They are identified by peer names like ":1.11". These are called <em>unique names</em> in D-Bus nomenclature. Basically, every peer has a unique name, and they are assigned automatically when a peer connects to the bus. They are much like an IP address, if you will. You'll notice that a couple of peers are already connected, including our little busctl tool itself as well as a number of system services. The list then shows all actual services on the bus, identified by their service names (as discussed above; to discern them from the unique names these are also called <em>well-known names</em>). In many ways well-known names are similar to DNS host names, i.e. 
they are a friendlier way to reference a peer, but on the lower level they just map to an IP address, or in this comparison the unique name. Much like you can connect to a host on the Internet by either its host name or its IP address, you can also connect to a bus peer either by its unique or its well-known name. (Note that each peer can have as many well-known names as it likes, much like an IP address can have multiple host names referring to it).</p> <p>OK, that's already kinda cool. Try it for yourself, on your local machine (all you need is a recent, systemd-based distribution).</p> <p>Let's now go the next step. Let's see which objects the <code>org.freedesktop.login1</code> service actually offers:</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>busctl tree org.freedesktop.login1 <span class="go">└─/org/freedesktop/login1</span> <span class="go"> ├─/org/freedesktop/login1/seat</span> <span class="go"> │ ├─/org/freedesktop/login1/seat/seat0</span> <span class="go"> │ └─/org/freedesktop/login1/seat/self</span> <span class="go"> ├─/org/freedesktop/login1/session</span> <span class="go"> │ ├─/org/freedesktop/login1/session/_31</span> <span class="go"> │ └─/org/freedesktop/login1/session/self</span> <span class="go"> └─/org/freedesktop/login1/user</span> <span class="go"> ├─/org/freedesktop/login1/user/_1000</span> <span class="go"> └─/org/freedesktop/login1/user/self</span> </code></pre></div> <p>Pretty, isn't it? What's actually even nicer, and which the output does <em>not</em> show is that there's full command line completion available: as you press TAB the shell will auto-complete the service names for you. It's a real pleasure to explore your D-Bus objects that way!</p> <p>The output shows some objects that you might recognize from the explanations above. Now, let's go further. 
Let's see what interfaces, methods, signals and properties one of these objects actually exposes:</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>busctl introspect org.freedesktop.login1 /org/freedesktop/login1/session/_31 <span class="go">NAME TYPE SIGNATURE RESULT/VALUE FLAGS</span> <span class="go">org.freedesktop.DBus.Introspectable interface - - -</span> <span class="go">.Introspect method - s -</span> <span class="go">org.freedesktop.DBus.Peer interface - - -</span> <span class="go">.GetMachineId method - s -</span> <span class="go">.Ping method - - -</span> <span class="go">org.freedesktop.DBus.Properties interface - - -</span> <span class="go">.Get method ss v -</span> <span class="go">.GetAll method s a{sv} -</span> <span class="go">.Set method ssv - -</span> <span class="go">.PropertiesChanged signal sa{sv}as - -</span> <span class="go">org.freedesktop.login1.Session interface - - -</span> <span class="go">.Activate method - - -</span> <span class="go">.Kill method si - -</span> <span class="go">.Lock method - - -</span> <span class="go">.PauseDeviceComplete method uu - -</span> <span class="go">.ReleaseControl method - - -</span> <span class="go">.ReleaseDevice method uu - -</span> <span class="go">.SetIdleHint method b - -</span> <span class="go">.TakeControl method b - -</span> <span class="go">.TakeDevice method uu hb -</span> <span class="go">.Terminate method - - -</span> <span class="go">.Unlock method - - -</span> <span class="go">.Active property b true emits-change</span> <span class="go">.Audit property u 1 const</span> <span class="go">.Class property s &quot;user&quot; const</span> <span class="go">.Desktop property s &quot;&quot; const</span> <span class="go">.Display property s &quot;&quot; const</span> <span class="go">.Id property s &quot;1&quot; const</span> <span class="go">.IdleHint property b true emits-change</span> <span class="go">.IdleSinceHint property t 1434494624206001 emits-change</span> <span 
class="go">.IdleSinceHintMonotonic property t 0 emits-change</span> <span class="go">.Leader property u 762 const</span> <span class="go">.Name property s &quot;lennart&quot; const</span> <span class="go">.Remote property b false const</span> <span class="go">.RemoteHost property s &quot;&quot; const</span> <span class="go">.RemoteUser property s &quot;&quot; const</span> <span class="go">.Scope property s &quot;session-1.scope&quot; const</span> <span class="go">.Seat property (so) &quot;seat0&quot; &quot;/org/freedesktop/login1/seat... const</span> <span class="go">.Service property s &quot;gdm-autologin&quot; const</span> <span class="go">.State property s &quot;active&quot; -</span> <span class="go">.TTY property s &quot;/dev/tty1&quot; const</span> <span class="go">.Timestamp property t 1434494630344367 const</span> <span class="go">.TimestampMonotonic property t 34814579 const</span> <span class="go">.Type property s &quot;x11&quot; const</span> <span class="go">.User property (uo) 1000 &quot;/org/freedesktop/login1/user/_1... const</span> <span class="go">.VTNr property u 1 const</span> <span class="go">.Lock signal - - -</span> <span class="go">.PauseDevice signal uus - -</span> <span class="go">.ResumeDevice signal uuh - -</span> <span class="go">.Unlock signal - - -</span> </code></pre></div> <p>As before, the busctl command supports command line completion, hence both the service name and the object path used are easily put together on the shell simply by pressing TAB. The output shows the methods, properties, signals of one of the session objects that are currently made available by <code>systemd-logind</code>. There's a section for each interface the object knows. The second column tells you what kind of member is shown in the line. The third column shows the signature of the member. In case of method calls that's the input parameters, the fourth column shows what is returned. 
For properties, the fourth column shows their current value.</p> <p>So far, we just explored. Let's take the next step now: let's become active - let's call a method:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>busctl call org.freedesktop.login1 /org/freedesktop/login1/session/_31 org.freedesktop.login1.Session Lock </code></pre></div> <p>I don't think I need to mention this anymore, but anyway: again there's full command line completion available. The third argument is the interface name, the fourth the method name, both can be easily completed by pressing TAB. In this case we picked the <code>Lock</code> method, which activates the screen lock for the specific session. And yup, the instant I pressed enter on this line my screen lock turned on (this only works on desktop environments that correctly hook into <code>systemd-logind</code>. GNOME works fine, and KDE should work too).</p> <p>The <code>Lock</code> method call we picked is very simple, as it takes no parameters and returns none. Of course, it can get more complicated for some calls. Here's another example, this time using one of systemd's own bus calls, to start an arbitrary system unit:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartUnit ss <span class="s2">&quot;cups.service&quot;</span> <span class="s2">&quot;replace&quot;</span> <span class="go">o &quot;/org/freedesktop/systemd1/job/42684&quot;</span> </code></pre></div> <p>This call takes two strings as input parameters, as we denote in the signature string that follows the method name (as usual, command line completion helps you get this right). Following the signature the next two parameters are simply the two strings to pass. The specified signature string hence indicates what comes next. 
systemd's StartUnit method call takes the unit name to start as first parameter, and the mode in which to start it as second. The call returned a single object path value. It is encoded the same way as the input parameters: a signature (just <code>o</code> for the object path) followed by the actual value.</p> <p>Of course, some method call parameters can get a ton more complex, but with <code>busctl</code> it's relatively easy to encode them all. See <a href="http://www.freedesktop.org/software/systemd/man/busctl.html">the man page</a> for details.</p> <p><code>busctl</code> knows a number of other operations. For example, you can use it to monitor D-Bus traffic as it happens (including generating a <code>.cap</code> file for use with Wireshark!) or you can set or get specific properties. However, this blog story was supposed to be about sd-bus, not <code>busctl</code>, hence let's cut this short here, and let me direct you to the man page in case you want to know more about the tool.</p> <p><code>busctl</code> (like the rest of systemd) is implemented using the sd-bus API. Thus it exposes many of the features of sd-bus itself. For example, you can use it to connect to remote or container buses. It understands both kdbus and classic D-Bus, and more!</p> <h1>sd-bus</h1> <p>But enough! Let's get back on topic, let's talk about sd-bus itself.</p> <p>The sd-bus set of APIs is mostly contained in the header file <a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-bus.h">sd-bus.h</a>.</p> <p>Here's a random selection of features of the library, that make it compare well with the other implementations available.</p> <ul> <li> <p>Supports both kdbus and dbus1 as back-end.</p> </li> <li> <p>Has high-level support for connecting to remote buses via ssh, and to buses of local OS containers.</p> </li> <li> <p>Powerful credential model, to implement authentication of clients in services. 
Currently 34 individual fields are supported, from the PID of the client to the cgroup or capability sets.</p> </li> <li> <p>Support for tracking the life-cycle of peers in order to release local objects automatically when all peers referencing them have disconnected.</p> </li> <li> <p>The client builds an efficient decision tree to determine which handlers to deliver an incoming bus message to.</p> </li> <li> <p>Automatically translates D-Bus errors into UNIX-style errors and back (this is lossy though), to ensure best integration of D-Bus into low-level Linux programs.</p> </li> <li> <p>Powerful but lightweight object model for exposing local objects on the bus. Automatically generates introspection as necessary.</p> </li> </ul> <p>The API is currently not fully documented, but we are working on completing the set of manual pages. For details <a href="http://www.freedesktop.org/software/systemd/man/index.html#S">see all pages starting with <code>sd_bus_</code></a>.</p> <h1>Invoking a Method, from C, with sd-bus</h1> <p>So much about the library in general. 
Here's an example of connecting to the bus and issuing a method call:</p> <div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdlib.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;systemd/sd-bus.h&gt;</span><span class="cp"></span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_error</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">SD_BUS_ERROR_NULL</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_message</span><span class="w"> </span><span class="o">*</span><span class="n">m</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus</span><span class="w"> </span><span class="o">*</span><span class="n">bus</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">path</span><span class="p">;</span><span class="w"></span> <span class="w"> 
</span><span class="kt">int</span><span class="w"> </span><span class="n">r</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Connect to the system bus */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_open_system</span><span class="p">(</span><span class="o">&amp;</span><span class="n">bus</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to connect to system bus: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Issue the method call and store the response message in m */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_call_method</span><span class="p">(</span><span class="n">bus</span><span class="p">,</span><span class="w"></span> <span class="w"> </span><span 
class="s">&quot;org.freedesktop.systemd1&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* service to contact */</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;/org/freedesktop/systemd1&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* object path */</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;org.freedesktop.systemd1.Manager&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* interface name */</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;StartUnit&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* method name */</span><span class="w"></span> <span class="w"> </span><span class="o">&amp;</span><span class="n">error</span><span class="p">,</span><span class="w"> </span><span class="cm">/* object to return error in */</span><span class="w"></span> <span class="w"> </span><span class="o">&amp;</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="cm">/* return message on success */</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;ss&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* input signature */</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;cups.service&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* first argument */</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;replace&quot;</span><span class="p">);</span><span class="w"> </span><span class="cm">/* second argument */</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to issue method call: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">.</span><span class="n">message</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Parse the response message */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_message_read</span><span class="p">(</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;o&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">path</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to parse response message: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span 
class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">&quot;Queued service job as %s.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">path</span><span class="p">);</span><span class="w"></span> <span class="nl">finish</span><span class="p">:</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_error_free</span><span class="p">(</span><span class="o">&amp;</span><span class="n">error</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_message_unref</span><span class="p">(</span><span class="n">m</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_unref</span><span class="p">(</span><span class="n">bus</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">EXIT_FAILURE</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">EXIT_SUCCESS</span><span class="p">;</span><span class="w"></span> <span class="p">}</span><span class="w"></span> </code></pre></div> <p>Save this example as <code>bus-client.c</code>, then build it with:</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>gcc bus-client.c -o bus-client <span class="sb">`</span>pkg-config --cflags --libs libsystemd<span class="sb">`</span> 
</code></pre></div> <p>This will generate a binary <code>bus-client</code> you can now run. Make sure to run it as root though, since access to the <code>StartUnit</code> method is privileged:</p> <div class="highlight"><pre><span></span><code><span class="gp"># </span>./bus-client <span class="go">Queued service job as /org/freedesktop/systemd1/job/3586.</span> </code></pre></div> <p>And that's it already, our first example. It showed how we invoked a method call on the bus. The actual function call of the method is very close to the <code>busctl</code> command line we used before. I hope the code excerpt needs little further explanation. It's supposed to give you a taste of how to write D-Bus clients with sd-bus. For more information please have a look at the header file, the man page or even the sd-bus sources.</p> <h1>Implementing a Service, in C, with sd-bus</h1> <p>Of course, just calling a single method is a rather simplistic example. Let's have a look at how to write a bus service. 
We'll write a small calculator service that exposes a single object, which implements an interface with two methods: one to multiply two 64-bit signed integers, and one to divide one 64-bit signed integer by another.</p> <div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdlib.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;errno.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string.h&gt;</span><span class="cp"></span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;systemd/sd-bus.h&gt;</span><span class="cp"></span> <span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">method_multiply</span><span class="p">(</span><span class="n">sd_bus_message</span><span class="w"> </span><span class="o">*</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">userdata</span><span class="p">,</span><span class="w"> </span><span class="n">sd_bus_error</span><span class="w"> </span><span class="o">*</span><span class="n">ret_error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kt">int64_t</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">r</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Read the parameters */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_message_read</span><span class="p">(</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;xx&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">y</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to parse parameters: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">r</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Reply with the response */</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">sd_bus_reply_method_return</span><span class="p">(</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span 
class="n">y</span><span class="p">);</span><span class="w"></span> <span class="p">}</span><span class="w"></span> <span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">method_divide</span><span class="p">(</span><span class="n">sd_bus_message</span><span class="w"> </span><span class="o">*</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">userdata</span><span class="p">,</span><span class="w"> </span><span class="n">sd_bus_error</span><span class="w"> </span><span class="o">*</span><span class="n">ret_error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kt">int64_t</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">r</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Read the parameters */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_message_read</span><span class="p">(</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;xx&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">y</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span 
class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to parse parameters: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">r</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Return an error on division by zero */</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_error_set_const</span><span class="p">(</span><span class="n">ret_error</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;net.poettering.DivisionByZero&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Sorry, can&#39;t allow division by zero.&quot;</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="n">sd_bus_reply_method_return</span><span class="p">(</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">y</span><span class="p">);</span><span class="w"></span> <span class="p">}</span><span class="w"></span> <span class="cm">/* The vtable of our little object, implements the net.poettering.Calculator interface */</span><span class="w"></span> <span class="k">static</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">sd_bus_vtable</span><span class="w"> </span><span class="n">calculator_vtable</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">SD_BUS_VTABLE_START</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="w"></span> <span class="w"> </span><span class="n">SD_BUS_METHOD</span><span class="p">(</span><span class="s">&quot;Multiply&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;xx&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">method_multiply</span><span class="p">,</span><span class="w"> </span><span class="n">SD_BUS_VTABLE_UNPRIVILEGED</span><span class="p">),</span><span class="w"></span> <span class="w"> </span><span class="n">SD_BUS_METHOD</span><span class="p">(</span><span class="s">&quot;Divide&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;xx&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="n">method_divide</span><span class="p">,</span><span class="w"> </span><span class="n">SD_BUS_VTABLE_UNPRIVILEGED</span><span class="p">),</span><span class="w"></span> <span class="w"> </span><span class="n">SD_BUS_VTABLE_END</span><span class="w"></span> <span class="p">};</span><span class="w"></span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_slot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus</span><span class="w"> </span><span class="o">*</span><span class="n">bus</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">r</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Connect to the user bus this time */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_open_user</span><span class="p">(</span><span class="o">&amp;</span><span class="n">bus</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span 
class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to connect to user bus: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Install the object */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_add_object_vtable</span><span class="p">(</span><span class="n">bus</span><span class="p">,</span><span class="w"></span> <span class="w"> </span><span class="o">&amp;</span><span class="n">slot</span><span class="p">,</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;/net/poettering/Calculator&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* object path */</span><span class="w"></span> <span class="w"> </span><span class="s">&quot;net.poettering.Calculator&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* interface name */</span><span class="w"></span> <span class="w"> </span><span class="n">calculator_vtable</span><span class="p">,</span><span class="w"></span> <span class="w"> </span><span class="nb">NULL</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to install object: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Take a well-known service name so that clients can find us */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_request_name</span><span class="p">(</span><span class="n">bus</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;net.poettering.Calculator&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to acquire service name: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(;;)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Process requests */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_process</span><span class="p">(</span><span class="n">bus</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to process bus: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span class="w"></span> <span class="w"> </span><span 
class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="cm">/* we processed a request, try to process another one, right-away */</span><span class="w"></span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="cm">/* Wait for the next request to process */</span><span class="w"></span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd_bus_wait</span><span class="p">(</span><span class="n">bus</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="w"> </span><span class="mi">-1</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Failed to wait on bus: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">r</span><span class="p">));</span><span 
class="w"></span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">finish</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="nl">finish</span><span class="p">:</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_slot_unref</span><span class="p">(</span><span class="n">slot</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">sd_bus_unref</span><span class="p">(</span><span class="n">bus</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">EXIT_FAILURE</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">EXIT_SUCCESS</span><span class="p">;</span><span class="w"></span> <span class="p">}</span><span class="w"></span> </code></pre></div> <p>Save this example as <code>bus-service.c</code>, then build it with:</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>gcc bus-service.c -o bus-service <span class="sb">`</span>pkg-config --cflags --libs libsystemd<span class="sb">`</span> </code></pre></div> <p>Now, let's run it:</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>./bus-service </code></pre></div> <p>In another terminal, let's try to talk to it. Note that this service is now on the user bus, not on the system bus as before. We do this for simplicity reasons: on the system bus access to services is tightly controlled so unprivileged clients cannot request privileged operations. 
On the user bus however things are simpler: as only processes of the user owning the bus can connect no further policy enforcement will complicate this example. Because the service is on the user bus, we have to pass the <code>--user</code> switch on the <code>busctl</code> command line. Let's start with looking at the service's object tree.</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>busctl --user tree net.poettering.Calculator <span class="go">└─/net/poettering/Calculator</span> </code></pre></div> <p>As we can see, there's only a single object on the service, which is not surprising, given that our code above only registered one. Let's see the interfaces and the members this object exposes:</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>busctl --user introspect net.poettering.Calculator /net/poettering/Calculator <span class="go">NAME TYPE SIGNATURE RESULT/VALUE FLAGS</span> <span class="go">net.poettering.Calculator interface - - -</span> <span class="go">.Divide method xx x -</span> <span class="go">.Multiply method xx x -</span> <span class="go">org.freedesktop.DBus.Introspectable interface - - -</span> <span class="go">.Introspect method - s -</span> <span class="go">org.freedesktop.DBus.Peer interface - - -</span> <span class="go">.GetMachineId method - s -</span> <span class="go">.Ping method - - -</span> <span class="go">org.freedesktop.DBus.Properties interface - - -</span> <span class="go">.Get method ss v -</span> <span class="go">.GetAll method s a{sv} -</span> <span class="go">.Set method ssv - -</span> <span class="go">.PropertiesChanged signal sa{sv}as - -</span> </code></pre></div> <p>The sd-bus library automatically added a couple of generic interfaces, as mentioned above. But the first interface we see is actually the one we added! It shows our two methods, and both take "xx" (two 64bit signed integers) as input parameters, and return one "x". Great! 
But does it work?</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Multiply xx <span class="m">5</span> <span class="m">7</span> <span class="go">x 35</span> </code></pre></div> <p>Woohoo! We passed the two integers 5 and 7, and the service actually multiplied them for us and returned a single integer 35! Let's try the other method:</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Divide xx <span class="m">99</span> <span class="m">17</span> <span class="go">x 5</span> </code></pre></div> <p>Oh, wow! It can even do integer division! Fantastic! But let's trick it into dividing by zero:</p> <div class="highlight"><pre><span></span><code><span class="gp">$ </span>busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Divide xx <span class="m">43</span> <span class="m">0</span> <span class="go">Sorry, can&#39;t allow division by zero.</span> </code></pre></div> <p>Nice! It detected this and returned a clean error about it. If you look at the source code example above you'll see precisely how we generated the error.</p> <p>And that's really all I have for today. Of course, the examples I showed are short, and I don't go into detail here on what precisely each line does. However, this is supposed to be a short introduction to D-Bus and sd-bus, and it's already way too long for that …</p> <p>I hope this blog story was useful to you. If you are interested in using sd-bus for your own programs, I hope this gets you started. If you have further questions, check the (incomplete) man pages, and ask us on IRC or the systemd mailing list. 
If you need more examples, have a look at the systemd source tree: all of systemd's many bus services use sd-bus extensively.</p>Lennart PoetteringFri, 19 Jun 2015 00:00:00 +0200tag:0pointer.net,2015-06-19:/blog/the-new-sd-bus-api-of-systemd.htmlprojectsRevisiting How We Put Together Linux Systemshttps://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html<p>In a previous blog story I discussed <a href="http://0pointer.net/blog/projects/stateless.html">Factory Reset, Stateless Systems, Reproducible Systems &amp; Verifiable Systems</a>. I now want to take the opportunity to explain a bit where we want to take this with <a href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a> in the longer run, and what we want to build out of it. This is going to be a longer story, so better grab a cold bottle of <a href="https://en.wikipedia.org/wiki/Club-Mate">Club Mate</a> before you start reading.</p> <p>Traditional Linux distributions are built around packaging systems like RPM or dpkg, and an organization model where upstream developers and downstream packagers are relatively clearly separated: an upstream developer writes code, and puts it somewhere online, in a tarball. A packager then grabs it and turns it into RPMs/DEBs. The user then grabs these RPMs/DEBs and installs them locally on the system. For a variety of uses this is a fantastic scheme: users have a large selection of readily packaged software available, in mostly uniform packaging, from a single source they can trust. In this scheme the distribution vets all software it packages, and as long as the user trusts the distribution all should be good. The distribution takes the responsibility of ensuring the software is not malicious, of fixing security problems in a timely manner and of helping the user if something is wrong.</p> <h1>Upstream Projects</h1> <p>However, this scheme also has a number of problems, and doesn't fit many use-cases of our software particularly well. 
Let's have a look at the problems of this scheme for many upstreams:</p> <ul> <li> <p>Upstream software vendors are fully dependent on downstream distributions to package their stuff. It's the downstream distribution that decides on schedules, packaging details, and how to handle support. Often upstream vendors want much faster release cycles than the downstream distributions follow.</p> </li> <li> <p>Realistic testing is extremely unreliable and next to impossible. Since the end-user can run a variety of different package versions together, and expects the software he runs to just work on any combination, the test matrix explodes. If upstream tests its version on distribution X release Y, then there's no guarantee that that's the precise combination of packages that the end user will eventually run. In fact, it is very unlikely that the end user will, since most distributions probably updated a number of libraries the package relies on by the time the package ends up being made available to the user. The fact that each package can be individually updated by the user, and each user can combine library versions, plug-ins and executables relatively freely, results in a high risk of something going wrong.</p> </li> <li> <p>Since there are so many different distributions in so many different versions around, if upstream tries to build and test software for them it needs to do so for a large number of distributions, which is a massive effort.</p> </li> <li> <p>The distributions are actually quite different in many ways. In fact, they are different in a lot of the most basic functionality. 
For example, the path where x86-64 libraries are placed is different on Fedora and Debian-derived systems.</p> </li> <li> <p>Developing software for a number of distributions and versions is hard: if you want to do it, you need to actually install them, each one of them, manually, and then build your software for each.</p> </li> <li> <p>Since most downstream distributions have strict licensing and trademark requirements (and rightly so), any kind of closed source software (or otherwise non-free) does not fit into this scheme at all.</p> </li> </ul> <p>All of this together makes it really hard for many upstreams to work nicely with the way Linux currently works. Often they try to improve the situation for themselves, for example by bundling libraries, to make their test and build matrices smaller.</p> <h1>System Vendors</h1> <p>The <em>toolbox</em> approach of classic Linux distributions is fantastic for people who want to put together their individual system, nicely adjusted to exactly what they need. However, this is not really how many of today's Linux systems are built, installed or updated. If you build any kind of embedded device, a server system, or even user systems, you frequently do your work based on complete system images that are linearly versioned. You build these images somewhere, and then you replicate them atomically to a larger number of systems. On these systems, you don't install or remove packages, you get a defined set of files, and besides installing or updating the system there is no way to change the set of tools you get.</p> <p>The current Linux distributions are not particularly good at providing for this major use-case of Linux. Their strict focus on individual packages as well as package managers as the end-user install and update tool is incompatible with what many system vendors want.</p> <h1>Users</h1> <p>The classic Linux distribution scheme is frequently not what end users want, either. 
Many users are used to app markets like the ones Android, Windows or iOS/Mac have. Markets are a platform that doesn't package, build or maintain software like distributions do, but simply allows users to quickly find and download the software they need, with the app vendor responsible for keeping the app updated, secured, and all that on the vendor's release cycle. Users tend to be impatient. They want their software quickly, and the fine distinction between trusting a single distribution or a myriad of app developers individually is usually not important for them. The companies behind the marketplaces usually try to improve this trust problem by providing sand-boxing technologies: as a replacement for the distribution that audits, vets, builds and packages the software and thus allows users to trust it to a certain level, these vendors try to find technical solutions to ensure that the software they offer for download can't be malicious.</p> <h1>Existing Approaches To Fix These Problems</h1> <p>Now, all the issues pointed out above are not new, and there are sometimes quite successful attempts to do something about them. Ubuntu Apps, Docker, Software Collections, ChromeOS, CoreOS all fix part of this problem set, usually with a strict focus on one facet of Linux systems. For example, Ubuntu Apps focus strictly on end user (desktop) applications, and don't care about how we build/update/install the OS itself, or containers. Docker OTOH focuses on containers only, and doesn't care about end-user apps. Software Collections tries to focus on the development environments. ChromeOS focuses on the OS itself, but only for end-user devices. CoreOS also focuses on the OS, but only for server systems.</p> <p>The approaches they find are usually good at specific things, and use a variety of different technologies, on different layers. 
However, none of these projects tried to fix these problems in a generic way, for all uses, right in the core components of the OS itself.</p> <p>Linux has achieved tremendous success because its kernel is so generic: you can build supercomputers and tiny embedded devices out of it. It's time we come up with a basic, reusable scheme for solving the problem set described above that is equally generic.</p> <h1>What We Want</h1> <p>The systemd cabal (Kay Sievers, Harald Hoyer, Daniel Mack, Tom Gundersen, David Herrmann, and yours truly) recently met in Berlin about all these things, and tried to come up with a scheme that is somewhat simple, but tries to solve the issues generically, for all use-cases, as part of the systemd project. All that in a way that is somewhat compatible with the current scheme of distributions, to allow a slow, gradual adoption. Also, and that's something one cannot stress enough: the <em>toolbox</em> scheme of classic Linux distributions is actually a good one, and for many cases the right one. However, we need to make sure we make distributions relevant again for <em>all</em> use-cases, not just those of highly individualized systems.</p> <p>Anyway, so let's summarize what we are trying to do:</p> <ul> <li> <p>We want an efficient way that allows vendors to package their software (regardless of whether it is just an app, or the whole OS) directly for the end user, and know the precise combination of libraries and packages it will operate with.</p> </li> <li> <p>We want to allow end users and administrators to install these packages on their systems, regardless of which distribution they have installed.</p> </li> <li> <p>We want a unified solution that ultimately can cover updates for full systems, OS containers, end user apps, programming ABIs, and more. These updates shall be double-buffered (at least). 
This is an absolute necessity if we want to prepare the ground for operating systems that manage themselves, that can update safely without administrator involvement.</p> </li> <li> <p>We want our images to be trustable (i.e. signed). In fact we want a fully trustable OS, with images that can be verified by a full trust chain from the firmware (EFI SecureBoot!), through the boot loader, through the kernel, and initrd. Cryptographically secure verification of the code we execute is relevant on the desktop (like ChromeOS does), but also for apps, for embedded devices and even on servers (in a post-Snowden world, in particular).</p> </li> </ul> <h1>What We Propose</h1> <p>So much for the set of problems, and what we are trying to do. So, now, let's discuss the technical bits we came up with:</p> <p>The scheme we propose is built around the variety of concepts of btrfs and Linux file system name-spacing. btrfs at this point already has a large number of features that fit neatly in our concept, and the maintainers are busy working on a couple of others we want to eventually make use of.</p> <p>As the first part of our proposal we make heavy use of btrfs sub-volumes and introduce a clear naming scheme for them. We name snapshots like this:</p> <ul> <li> <p><code>usr:&lt;vendorid&gt;:&lt;architecture&gt;:&lt;version&gt;</code> -- This refers to a full vendor operating system tree. It's basically a /usr tree (and no other directories), in a specific version, with everything you need to boot it up inside it. The <code>&lt;vendorid&gt;</code> field is replaced by some vendor identifier, maybe a scheme like <code>org.fedoraproject.FedoraWorkstation</code>. The <code>&lt;architecture&gt;</code> field specifies a CPU architecture the OS is designed for, for example <code>x86_64</code>. The <code>&lt;version&gt;</code> field specifies a specific OS version, for example <code>23.4</code>. 
An example sub-volume name could hence look like this: <code>usr:org.fedoraproject.FedoraWorkstation:x86_64:23.4</code></p> </li> <li> <p><code>root:&lt;name&gt;:&lt;vendorid&gt;:&lt;architecture&gt;</code> -- This refers to an <em>instance</em> of an operating system. It's basically a root directory, containing primarily /etc and /var (but possibly more). Sub-volumes of this type do not contain a populated /usr tree though. The <code>&lt;name&gt;</code> field refers to some instance name (maybe the host name of the instance). The other fields are defined as above. An example sub-volume name is <code>root:revolution:org.fedoraproject.FedoraWorkstation:x86_64</code>.</p> </li> <li> <p><code>runtime:&lt;vendorid&gt;:&lt;architecture&gt;:&lt;version&gt;</code> -- This refers to a vendor <em>runtime</em>. A runtime here is supposed to be a set of libraries and other resources that are needed to run apps (for the concept of <em>apps</em> see below), all in a /usr tree. In this regard this is very similar to the <code>usr</code> sub-volumes explained above. However, while a <code>usr</code> sub-volume is a full OS and contains everything necessary to boot, a runtime is really only a set of libraries. You cannot boot it, but you can run apps with it. An example sub-volume name is: <code>runtime:org.gnome.GNOME3_20:x86_64:3.20.1</code></p> </li> <li> <p><code>framework:&lt;vendorid&gt;:&lt;architecture&gt;:&lt;version&gt;</code> -- This is very similar to a vendor runtime, as described above: it contains just a /usr tree, but goes one step further. It additionally contains all development headers, compilers and build tools that allow developing against a specific runtime. For each runtime there should be a framework. When you develop against a specific framework in a specific architecture, then the resulting app will be compatible with the runtime of the same vendor ID and architecture. 
Example: <code>framework:org.gnome.GNOME3_20:x86_64:3.20.1</code></p> </li> <li> <p><code>app:&lt;vendorid&gt;:&lt;runtime&gt;:&lt;architecture&gt;:&lt;version&gt;</code> -- This encapsulates an application bundle. It contains a tree that at runtime is mounted to <code>/opt/&lt;vendorid&gt;</code>, and contains all the application's resources. The <code>&lt;vendorid&gt;</code> could be a string like <code>org.libreoffice.LibreOffice</code>, the <code>&lt;runtime&gt;</code> refers to the vendor ID of the one specific runtime the application is built for, for example <code>org.gnome.GNOME3_20:3.20.1</code>. The <code>&lt;architecture&gt;</code> and <code>&lt;version&gt;</code> refer to the architecture the application is built for, and of course its version. Example: <code>app:org.libreoffice.LibreOffice:GNOME3_20:x86_64:133</code></p> </li> <li> <p><code>home:&lt;user&gt;:&lt;uid&gt;:&lt;gid&gt;</code> -- This sub-volume shall refer to the home directory of the specific user. The <code>&lt;user&gt;</code> field contains the user name, the <code>&lt;uid&gt;</code> and <code>&lt;gid&gt;</code> fields the numeric Unix UIDs and GIDs of the user. The idea here is that in the long run the list of sub-volumes is sufficient as a user database (but see below). Example: <code>home:lennart:1000:1000</code>.</p> </li> </ul> <p>btrfs partitions that adhere to this naming scheme should be clearly identifiable. It is our intention to introduce a new GPT partition type ID for this.</p> <h1>How To Use It</h1> <p>Now that we have introduced this naming scheme, let's see what we can build out of it:</p> <ul> <li> <p>When booting up a system we mount the root directory from one of the <code>root</code> sub-volumes, and then mount /usr from a matching <code>usr</code> sub-volume. <em>Matching</em> here means it carries the same <code>&lt;vendorid&gt;</code> and <code>&lt;architecture&gt;</code>. 
Of course, by default we should pick the matching <code>usr</code> sub-volume with the newest version.</p> </li> <li> <p>When we boot up an OS container, we do exactly the same as when we boot up a regular system: we simply combine a <code>usr</code> sub-volume with a <code>root</code> sub-volume.</p> </li> <li> <p>When we enumerate the system's users we simply go through the list of <code>home</code> snapshots.</p> </li> <li> <p>When a user authenticates and logs in we mount his home directory from his snapshot.</p> </li> <li> <p>When an app is run, we set up a new file system name-space, mount the <code>app</code> sub-volume to <code>/opt/&lt;vendorid&gt;/</code>, and the appropriate <code>runtime</code> sub-volume the app picked to <code>/usr</code>, as well as the user's <code>/home/$USER</code> to its place.</p> </li> <li> <p>When a developer wants to develop against a specific runtime he installs the right framework, and then temporarily transitions into a name space where <code>/usr</code> is mounted from the framework sub-volume, and <code>/home/$USER</code> from his own home directory. In this name space he then runs his build commands. He can build in multiple name spaces at the same time, if he intends to build software for multiple runtimes or architectures at the same time.</p> </li> </ul> <p>Instantiating a new system or OS container (which is exactly the same in this scheme) just consists of creating a new appropriately named <code>root</code> sub-volume. Completely naturally you can share one vendor OS copy in one specific version with a multitude of container instances.</p> <p>Everything is <em>double-buffered</em> (or actually, n-fold-buffered), because <code>usr</code>, <code>runtime</code>, <code>framework</code>, <code>app</code> sub-volumes can exist in multiple versions. 
Of course, by default the execution logic should always pick the newest release of each sub-volume, but it is up to the user to keep multiple versions around, and possibly execute older versions, if he desires to do so. In fact, like on ChromeOS this could even be handled automatically: if a system fails to boot with a newer snapshot, the boot loader can automatically revert back to an older version of the OS.</p> <h1>An Example</h1> <p>Note that as a result this allows installing not only multiple end-user applications into the same btrfs volume, but also multiple operating systems, multiple system instances, multiple runtimes, multiple frameworks. Or to spell this out in an example:</p> <p>Let's say Fedora, Mageia and ArchLinux all implement this scheme, and provide ready-made end-user images. Also, the GNOME, KDE, SDL projects all define a runtime+framework to develop against. Finally, both LibreOffice and Firefox provide their stuff according to this scheme. You can now trivially install all of these into the same btrfs volume:</p> <ul> <li>usr:org.fedoraproject.WorkStation:x86_64:24.7</li> <li>usr:org.fedoraproject.WorkStation:x86_64:24.8</li> <li>usr:org.fedoraproject.WorkStation:x86_64:24.9</li> <li>usr:org.fedoraproject.WorkStation:x86_64:25beta</li> <li>usr:org.mageia.Client:i386:39.3</li> <li>usr:org.mageia.Client:i386:39.4</li> <li>usr:org.mageia.Client:i386:39.6</li> <li>usr:org.archlinux.Desktop:x86_64:302.7.8</li> <li>usr:org.archlinux.Desktop:x86_64:302.7.9</li> <li>usr:org.archlinux.Desktop:x86_64:302.7.10</li> <li>root:revolution:org.fedoraproject.WorkStation:x86_64</li> <li>root:testmachine:org.fedoraproject.WorkStation:x86_64</li> <li>root:foo:org.mageia.Client:i386</li> <li>root:bar:org.archlinux.Desktop:x86_64</li> <li>runtime:org.gnome.GNOME3_20:x86_64:3.20.1</li> <li>runtime:org.gnome.GNOME3_20:x86_64:3.20.4</li> <li>runtime:org.gnome.GNOME3_20:x86_64:3.20.5</li> <li>runtime:org.gnome.GNOME3_22:x86_64:3.22.0</li> 
<li>runtime:org.kde.KDE5_6:x86_64:5.6.0</li> <li>framework:org.gnome.GNOME3_22:x86_64:3.22.0</li> <li>framework:org.kde.KDE5_6:x86_64:5.6.0</li> <li>app:org.libreoffice.LibreOffice:GNOME3_20:x86_64:133</li> <li>app:org.libreoffice.LibreOffice:GNOME3_22:x86_64:166</li> <li>app:org.mozilla.Firefox:GNOME3_20:x86_64:39</li> <li>app:org.mozilla.Firefox:GNOME3_20:x86_64:40</li> <li>home:lennart:1000:1000</li> <li>home:hrundivbakshi:1001:1001</li> </ul> <p>In the example above, we have three vendor operating systems installed. All of them in three versions, and one even in a beta version. We have four system instances around. Two of them of Fedora, maybe one of them we usually boot from, the other we run for very specific purposes in an OS container. We also have the runtimes for two GNOME releases in multiple versions, plus one for KDE. Then, we have the development trees for one version of KDE and GNOME around, as well as two apps, that make use of two releases of the GNOME runtime. Finally, we have the home directories of two users.</p> <p>Now, with the name-spacing concepts we introduced above, we can actually relatively freely mix and match apps and OSes, or develop against specific frameworks in specific versions on any operating system. It doesn't matter if you booted your ArchLinux instance, or your Fedora one, you can execute both LibreOffice and Firefox just fine, because at execution time they get matched up with the right runtime, and all of them are available from all the operating systems you installed. You get the precise runtime that the upstream vendor of Firefox/LibreOffice did their testing with. 
It doesn't matter anymore which distribution you run, and which distribution the vendor prefers.</p> <p>Also, given that the user database is actually encoded in the sub-volume list, it doesn't matter which system you boot, the distribution should be able to find your local users automatically, without any configuration in /etc/passwd.</p> <h1>Building Blocks</h1> <p>With this naming scheme plus the way we can combine the sub-volumes at execution time we have already come quite far, but how do we actually get these sub-volumes onto the final machines, and how do we update them? Well, btrfs has a feature they call "send-and-receive". It basically allows you to "diff" two file system versions, and generate a binary delta. You can generate these deltas on a developer's machine and then push them into the user's system, and he'll get the exact same sub-volume too. This is how we envision installation and updating of operating systems, applications, runtimes, frameworks. At installation time, we simply deserialize an initial send-and-receive delta into our btrfs volume, and later, when a new version is released we just add in the few bits that are new, by dropping in another send-and-receive delta under a new sub-volume name. And we do it exactly the same for the OS itself, for a runtime, a framework or an app. There's no technical distinction anymore. The underlying operation for installing apps, runtimes, frameworks, vendor OSes, as well as the operation for updating them is done the exact same way for all.</p> <p>Of course, keeping multiple full /usr trees around sounds like an awful lot of waste, after all they will contain a lot of very similar data, since a lot of resources are shared between distributions, frameworks and runtimes. However, thankfully btrfs actually is able to de-duplicate this for us. If we add in a new app snapshot, this simply adds in the new files that changed. 
Moreover, different runtimes and operating systems might actually end up sharing the same tree.</p> <p>Even though the example above focuses primarily on the end-user, desktop side of things, the concept is also extremely powerful in server scenarios. For example, it is easy to build your own <code>usr</code> trees and deliver them to your hosts using this scheme. The <code>usr</code> sub-volumes are supposed to be something that administrators can put together. After deserializing them into a couple of hosts, you can trivially instantiate them as OS containers there, simply by adding a new <code>root</code> sub-volume for each instance, referencing the <code>usr</code> tree you just put together. Instantiating OS containers hence becomes as easy as creating a new btrfs sub-volume. And you can still update the images nicely, get fully double-buffered updates and everything.</p> <p>And of course, this scheme also applies well to embedded use-cases. Regardless of whether you build a TV, an IVI system or a phone: you can put together your OS versions as <code>usr</code> trees, and then use btrfs-send-and-receive facilities to deliver them to the systems, and update them there.</p> <p>Many people when they hear the word "btrfs" instantly reply with "is it ready yet?". Thankfully, most of the functionality we really need here is strictly read-only. With the exception of the <code>home</code> sub-volumes (see below) all snapshots are strictly read-only, and are delivered as immutable vendor trees onto the devices. They never are changed. Even if btrfs might still be immature, for this kind of read-only logic it should be more than good enough.</p> <p>Note that this scheme also enables doing <em>fat</em> systems: for example, an installer image could include a Fedora version compiled for x86-64, one for i386, one for ARM, all in the same btrfs volume. 
Due to btrfs' de-duplication they will share as much as possible, and when the image is booted up the right sub-volume is automatically picked. Something similar of course applies to the apps too!</p> <p>This also allows us to implement something that we like to call <em>Operating-System-As-A-Virus</em>. Installing a new system is little more than:</p> <ul> <li>Creating a new GPT partition table</li> <li>Adding an EFI System Partition (FAT) to it</li> <li>Adding a new btrfs volume to it</li> <li>Deserializing a single <code>usr</code> sub-volume into the btrfs volume</li> <li>Installing a boot loader into the EFI System Partition</li> <li>Rebooting</li> </ul> <p>Now, since the only real vendor data you need is the <code>usr</code> sub-volume, you can trivially duplicate this onto any block device you want. Let's say you are a happy Fedora user, and you want to provide a friend with his own installation of this awesome system, all on a USB stick. All you have to do for this is do the steps above, using your installed <code>usr</code> tree as source to copy. And there you go! And you don't have to be afraid that any of your personal data is copied too, as the <code>usr</code> sub-volume is the exact version your vendor provided you with. Or in other words: there's no distinction anymore between installer images and installed systems. It's all the same. Installation becomes replication, nothing more. Live-CDs and installed systems can be fully identical.</p> <p>Note that in this design apps are actually developed against a single, very specific runtime that contains all libraries they can link against (including a specific glibc version!). Any library that is not included in the runtime the developer picked must be included in the app itself. This is similar to how apps on Android declare one very specific Android version they are developed against. 
This greatly simplifies application installation, as there's no dependency hell: each app pulls in one runtime, and the app is actually free to pick which one, as you can have multiple installed, though only one is used by each app.</p> <p>Also note that operating systems built this way will never see "half-updated" systems, as is common when a system is updated using RPM/dpkg. When updating the system the code will either run the old or the new version, but it will never see part of the old files and part of the new files. This is the same for apps, runtimes, and frameworks, too.</p> <h1>Where We Are Now</h1> <p>We are currently working on a lot of the groundwork necessary for this. This scheme relies on the ability to monopolize the vendor OS resources in /usr, which is the key part of what I described in <a href="http://0pointer.net/blog/projects/stateless.html">Factory Reset, Stateless Systems, Reproducible Systems &amp; Verifiable Systems</a> a few weeks back. Then, of course, for the full desktop app concept we need a strong sandbox that does more than just hide files from the file system view. After all, with an app concept like the above the primary interfacing between the executed desktop apps and the rest of the system is via IPC (which is why we work on kdbus and teach it all kinds of sand-boxing features), and the kernel itself. Harald Hoyer has started working on generating the btrfs send-and-receive images based on Fedora.</p> <p>Getting to the full scheme will take a while. Currently we have many of the building blocks ready, but some major items are missing. For example, we push quite a few problems into btrfs that other solutions try to solve in user space. One of them is actually signing/verification of images. The btrfs maintainers are working on adding this to the code base, but currently nothing exists. 
This functionality is essential though to come to a fully verified system where a trust chain exists all the way from the firmware to the apps. Also, to make the <code>home</code> sub-volume scheme fully workable we actually need encrypted sub-volumes, so that the sub-volume's pass-phrase can be used for authenticating users in PAM. This doesn't exist either.</p> <p>Working towards this scheme is a gradual process. Many of the steps we require for this are useful outside of the grand scheme though, which means we can slowly work towards the goal, and our users can already benefit from what we are working on as we go.</p> <p>Also, and most importantly, this is not really a departure from traditional operating systems:</p> <p>Each app and each OS sees a traditional Unix hierarchy with /usr, /home, /opt, /var, /etc. It executes in an environment that is pretty much identical to how it would be run on traditional systems.</p> <p>There's no need to fully move to a system that uses only btrfs and follows strictly this sub-volume scheme. For example, we intend to provide implicit support for systems that are installed on ext4 or xfs, or that are put together with traditional packaging tools such as RPM or dpkg: if the user tries to install a runtime/app/framework/os image on a system that doesn't use btrfs so far, it can just create a loop-back btrfs image in /var, and push the data into that. Even us developers will run our stuff like this for a while, after all this new scheme is not particularly useful for highly individualized systems, and we developers usually tend to run systems like that.</p> <p>Also note that this is in no way a departure from packaging systems like RPM or DEB. Even if the new scheme we propose is used for installing and updating a specific system, it is RPM/DEB that is used to put together the vendor OS tree initially. 
Hence, even in this scheme RPM/DEB are highly relevant, though not strictly as an end-user tool anymore, but as a build tool.</p> <h1>So Let's Summarize Again What We Propose</h1> <ul> <li> <p>We want a unified scheme, how we can install and update OS images, user apps, runtimes and frameworks.</p> </li> <li> <p>We want a unified scheme how you can relatively freely mix OS images, apps, runtimes and frameworks on the same system.</p> </li> <li> <p>We want a fully trusted system, where cryptographic verification of all executed code can be done, all the way to the firmware, as standard feature of the system.</p> </li> <li> <p>We want to allow app vendors to write their programs against very specific frameworks, under the knowledge that they will end up being executed with the exact same set of libraries chosen.</p> </li> <li> <p>We want to allow parallel installation of multiple OSes and versions of them, multiple runtimes in multiple versions, as well as multiple frameworks in multiple versions. And of course, multiple apps in multiple versions.</p> </li> <li> <p>We want everything <em>double buffered</em> (or actually n-fold buffered), to ensure we can reliably update/rollback versions, in particular to safely do automatic updates.</p> </li> <li> <p>We want a system where updating a runtime, OS, framework, or OS container is as simple as adding in a new snapshot and restarting the runtime/OS/framework/OS container.</p> </li> <li> <p>We want a system where we can easily instantiate a number of OS instances from a single vendor tree, with zero difference for doing this on order to be able to boot it on bare metal/VM or as a container.</p> </li> <li> <p>We want to enable Linux to have an open scheme that people can use to build app markets and similar schemes, not restricted to a specific vendor.</p> </li> </ul> <h1>Final Words</h1> <p>I'll be talking about this at LinuxCon Europe in October. 
I originally intended to discuss this at the Linux Plumbers Conference (which I assumed was the right forum for this kind of major plumbing level improvement), and at linux.conf.au, but there was no interest in my session submissions there...</p> <p>Of course this is all work in progress. These are our current ideas we are working towards. As we progress we will likely change a number of things. For example, the precise naming of the sub-volumes might look very different in the end.</p> <p>Of course, we are developers of the systemd project. Implementing this scheme is not just a job for the systemd developers. This is a reinvention how distributions work, and hence needs great support from the distributions. We really hope we can trigger some interest by publishing this proposal now, to get the distributions on board. This after all is explicitly not supposed to be a solution for one specific project and one specific vendor product, we care about making this open, and solving it for the generic case, without cutting corners.</p> <p>If you have any questions about this, you know how you can reach us (IRC, mail, G+, ...).</p> <p>The future is going to be awesome!</p>Lennart PoetteringMon, 01 Sep 2014 00:00:00 +0200tag:0pointer.net,2014-09-01:/blog/revisiting-how-we-put-together-linux-systems.htmlprojectsFUDCON + GNOME.Asia Beijing 2014https://0pointer.net/blog/projects/fudcon-gnomeasia.html <p>Thanks to the funding from FUDCON I had the chance to attend and keynote at the combined <a href="https://fedoraproject.org/wiki/FUDCon:Beijing_2014">FUDCON Beijing 2014</a> and <a href="http://2014.gnome.asia/">GNOME.Asia 2014</a> conference in Beijing, China.</p> <p>My talk was about systemd's present and future, what we achieved and where we are going. In my talk I tried to explain a bit where we are coming from, and how we changed focus from being purely an init system, to more being a set of basic building blocks to build an OS from. 
Most of the talk I talked about where we still intend to take systemd, which areas we believe should be covered by systemd, and of course, also the always difficult question, on where to draw the line and what clearly is outside of the focus of systemd. The slides of my talk you <a href="http://0pointer.de/public/gnomeasia2014.pdf">find online</a>. (No video recording I am aware of, sorry.)</p> <p>The combined conferences were a lot of fun, and as usual, the best discussions I had in the hallway track, discussing Linux and systemd.</p> <p>A number of pictures of the conference are <a href="https://plus.google.com/events/gallery/cqsjvgg7o125tkli6up5d60f83g">now online</a>. Enjoy!</p> <p>After the conference I stayed for a few more days in Beijing, doing a bit of sightseeing. What a fantastic city! The food was amazing, we tried all kinds of fantastic stuff, from Peking duck, to Bullfrog Sechuan style. Yummy. And one of those days I am sure I will find the time to actually sort my photos and put them online, too.</p> <p>I am really looking forward to the next FUDCON/GNOME.Asia!</p> Lennart PoetteringFri, 04 Jul 2014 18:43:00 +0200tag:0pointer.net,2014-07-04:/blog/projects/fudcon-gnomeasia.htmlprojectsFactory Reset, Stateless Systems, Reproducible Systems & Verifiable Systemshttps://0pointer.net/blog/projects/stateless.html <p><small>(Just a small heads-up: I don't blog as much as I used to, I nowadays update my <a href="https://plus.google.com/u/0/+LennartPoetteringTheOneAndOnly/posts">Google+ page</a> a lot more frequently. You might want to subscribe that if you are interested in more frequent technical updates on what we are working on.)</small></p> <p>In the past weeks we have been working on a couple of features for <a href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a> that enable a number of new usecases I'd like to shed some light on. 
Building on the <a href="http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge/"><tt>/usr</tt> merge</a> that a number of distributions have completed we want to bring runtime behaviour of Linux systems to the next level. With the <tt>/usr</tt> merge completed most static vendor-supplied OS data is found exclusively in <tt>/usr</tt>, and only a few additional bits in <tt>/var</tt> and <tt>/etc</tt> are necessary to make a system boot. On this we can build to enable a couple of new features:</p> <ol> <li>A mechanism we call <i>Factory Reset</i> shall flush out <tt>/etc</tt> and <tt>/var</tt>, but keep the vendor-supplied <tt>/usr</tt>, bringing the system back into a well-defined, pristine vendor state with no local state or configuration. This functionality is useful across the board from servers, to desktops, to embedded devices.</li> <li>A <i>Stateless System</i> goes one step further: a system like this never stores <tt>/etc</tt> or <tt>/var</tt> on persistent storage, but always comes up with pristine vendor state. On systems like this every reboot acts as a factory reset. This functionality is particularly useful for simple containers or systems that boot off the network or read-only media, and receive all configuration they need during runtime from vendor packages or protocols like DHCP or are capable of discovering their parameters automatically from the available hardware or periphery.</li> <li><i>Reproducible Systems</i> multiply a vendor image into many containers or systems. Only local configuration or state is stored per-system, while the vendor operating system is pulled in from the same, immutable, shared snapshot. Each system hence has its private <tt>/etc</tt> and <tt>/var</tt> for receiving local configuration, however the OS tree in <tt>/usr</tt> is pulled in via bind mounts (in case of containers) or technologies like NFS (in case of physical systems), or btrfs snapshots from a <i>golden master</i> image. 
This is particularly interesting for containers where the goal is to run thousands of container images from the same OS tree. However, it also has a number of other usecases, for example thin client systems, which can boot the same NFS share a number of times. Furthermore this mechanism is useful to implement very simple OS installers, that simply deserialize a <tt>/usr</tt> snapshot into a file system, install a boot loader, and reboot.</li> <li><i>Verifiable Systems</i> are closely related to stateless systems: if the underlying storage technology can cryptographically ensure that the vendor-supplied OS is trusted and in a consistent state, then it must be made sure that <tt>/etc</tt> or <tt>/var</tt> are either included in the OS image, or simply unnecessary for booting.</li> </ol> <h3>Concepts</h3> <p>A number of Linux-based operating systems have tried to implement some of the schemes described above in one way or another. Particularly interesting are <a href="https://wiki.gnome.org/Projects/OSTree">GNOME's OSTree</a>, <a href="https://coreos.com/">CoreOS</a> and Google's Android and ChromeOS. They generally found different solutions for the specific problems you have when implementing schemes like this, sometimes taking shortcuts that keep only the specific case in mind, and cannot cover the general purpose. With systemd now being at the core of so many distributions and deeply involved in bringing up and maintaining the system we came to the conclusion that we should attempt to add generic support for setups like this to systemd itself, to open this up for the general purpose distributions to build on. We decided to focus on three kinds of systems:</p> <ol> <li>The <i>stateful</i> system, the traditional system as we know it with machine-specific <tt>/etc</tt>, <tt>/usr</tt> and <tt>/var</tt>, all properly populated.</li> <li>Startup without a populated <tt>/var</tt>, but with configured <tt>/etc</tt>. 
(We will call these <i>volatile</i> systems.)</li> <li>Startup without either <tt>/etc</tt> or <tt>/var</tt>. (We will call these <i>stateless</i> systems.)</li> </ol> <p>A factory reset is just a special case of the latter two modes, where the system boots up without <tt>/var</tt> and <tt>/etc</tt> but the next boot is a normal stateful boot like the first described mode. Note that a mode where <tt>/etc</tt> is flushed, but <tt>/var</tt> is not is nothing we intend to cover (why? well, the user ID question becomes much harder, see below, and we simply saw no usecase for it worth the trouble).</p> <h4>Problems</h4> <p>Booting up a system without a populated <tt>/var</tt> is relatively straight-forward. With <a href="http://cgit.freedesktop.org/systemd/systemd/plain/tmpfiles.d/var.conf">a few lines of tmpfiles configuration</a> it is possible to populate <tt>/var</tt> with its basic structure in a way that is sufficient to make a system boot cleanly. systemd version 214 and newer ship with support for this. Of course, support for this scheme in systemd is only a small part of the solution. While a lot of software reconstructs the directory hierarchy it needs in <tt>/var</tt> automatically, much software does not. In cases like this it is necessary to ship a couple of additional tmpfiles lines that set up at boot time the necessary files or directories in <tt>/var</tt> to make the software operate, similar to what RPM or DEB packages would set up at installation time.</p> <p>Booting up a system without a populated <tt>/etc</tt> is a more difficult task. In <tt>/etc</tt> we have a lot of configuration bits that are essential for the system to operate, for example and most importantly system user and group information in <tt>/etc/passwd</tt> and <tt>/etc/group</tt>. 
If the system boots up without <tt>/etc</tt> there must be a way to replicate the minimal information necessary in it, so that the system manages to boot up fully.</p> <p>To make this even more complex, in order to support "offline" updates of <tt>/usr</tt> that are replicated into a number of systems possessing private <tt>/etc</tt> and <tt>/var</tt> there needs to be a way for these directories to be upgraded transparently when necessary, for example by recreating caches like <tt>/etc/ld.so.cache</tt> or adding missing system users to <tt>/etc/passwd</tt> on next reboot.</p> <p>Starting with systemd 215 (yet unreleased, as I type this) we will ship with a number of features in systemd that make <tt>/etc</tt>-less boots functional:</p> <ul> <li><p>A new tool <tt>systemd-sysusers</tt> has been added. It introduces a new drop-in directory <tt>/usr/lib/sysusers.d/</tt>. Minimal descriptions of necessary system users and groups can be placed there. Whenever the tool is invoked it will create these users in <tt>/etc/passwd</tt> and <tt>/etc/group</tt> should they be missing. It is only suitable for creating system users and groups, not for normal users. It will write to the files directly via the appropriate glibc APIs, which is the right thing to do for system users. (For normal users no such APIs exist, as the users might be stored centrally on LDAP or suchlike, and they are out of focus for our usecase.) The major benefit of this tool is that system user definition can happen offline: a package simply has to drop in a new file to register a user. This makes system user registration <i>declarative</i> instead of <i>imperative</i> -- which is how system users are traditionally created from RPM or DEB installation scripts. 
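</p> <p>As a rough illustration, such a drop-in could look like the following sketch. The file name and the user and group names (<tt>exampled</tt>, <tt>examplegrp</tt>) are hypothetical, and the exact syntax shown is an assumption rather than a reference:</p> <pre># /usr/lib/sysusers.d/exampled.conf (hypothetical package)
# Type  Name        ID  GECOS
u       exampled    -   "Example Daemon"
g       examplegrp  -   -</pre> <p>Here a <tt>-</tt> in the ID column is meant to request dynamic allocation from the system range. 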
By being declarative it is easy to replicate the users on next boot to a number of system instances.</p> <p>To make this new tool interesting for packaging scripts we make it easy to alternatively invoke it during package installation time, thus being a good alternative to invocations of <tt>useradd -r</tt> and <tt>groupadd -r</tt>.</p> <p>Some OS designs use a static, fixed user/group list stored in <tt>/usr</tt> as primary database for users/groups, with fixed UID/GID mappings. While this works for specific systems, this cannot cover the general purpose. As the UID/GID range for system users/groups is very small (only containing 998 users and groups on most systems), the best has to be made of this space and only UIDs/GIDs necessary on the specific system should be allocated. This means allocation has to be dynamic and adjust to what is necessary.</p> <p>Also note that this tool has one very nice feature: in addition to fully dynamic, and fully static UID/GID assignment for the users to create, it supports reading UID/GID numbers off existing files in <tt>/usr</tt>, so that vendors can make use of setuid/setgid binaries owned by specific users.</p></li> <li>We also added a <a href="http://cgit.freedesktop.org/systemd/systemd/plain/sysusers.d/systemd.conf.in">default user definition list</a> which creates the most basic users the system and systemd need. Of course, very likely downstream distributions might need to alter this default list, add new entries and possibly map specific users to particular numeric UIDs.</li> <li>A new condition <tt>ConditionNeedsUpdate=</tt> has been added. With this mechanism it is possible to conditionalize execution of services depending on whether <tt>/usr</tt> is newer than <tt>/etc</tt> or <tt>/var</tt>. The idea is that various services that need to be added into the boot process on upgrades make use of this to not delay boot-ups on normal boots, but run as necessary should <tt>/usr</tt> have been updated since the last boot. 
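As a sketch, a service that should only run after such an update could be declared roughly as follows (modelled on the library cache rebuild mentioned below; the description text and the exact command line are assumptions): <pre>[Unit]
Description=Rebuild Dynamic Linker Cache
# Only run if /usr has been updated more recently than /etc
ConditionNeedsUpdate=/etc

[Service]
Type=oneshot
ExecStart=/sbin/ldconfig -X</pre> 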
This is implemented based on the <tt>mtime</tt> timestamp of <tt>/usr</tt>: if the OS has been updated the packaging software should <i>touch</i> the directory, thus informing all instances that an upgrade of <tt>/etc</tt> and <tt>/var</tt> might be necessary.</li> <li>We added a number of service files, that make use of the new <tt>ConditionNeedsUpdate=</tt> switch, and run a couple of services after each update. Among them are the aforementioned <tt>systemd-sysusers</tt> tool, as well as services that rebuild the udev hardware database, the journal catalog database and the library cache in <tt>/etc/ld.so.cache</tt>.</li> <li>If systemd detects an empty <tt>/etc</tt> at early boot it will now use the <a href="http://www.freedesktop.org/software/systemd/man/systemd.preset.html">unit preset</a> information to enable all services by default that the vendor or packager declared. It will then proceed booting.</li> <li>We added <a href="http://cgit.freedesktop.org/systemd/systemd/plain/tmpfiles.d/etc.conf">a new tmpfiles snippet</a> that is able to reconstruct the most basic structure of <tt>/etc</tt> if it is missing.</li> <li>tmpfiles also gained the ability to copy entire directory trees into place should they be missing. This is particularly useful for copying certain essential files or directories into <tt>/etc</tt> without which the system refuses to boot. Currently the most prominent candidates for this are <tt>/etc/pam.d</tt> and <tt>/etc/dbus-1</tt>. In the long run we hope that packages can be fixed so that they always work correctly without configuration in <tt>/etc</tt>. Depending on the software this means that they should come with compiled-in defaults that just work should their configuration file be missing, or that they should fall back to static vendor-supplied configuration in <tt>/usr</tt> that is used whenever <tt>/etc</tt> doesn't have any configuration. Both the PAM and the D-Bus case are probably candidates for the latter. 
Given that there are probably many cases like this we are working with a number of folks to introduce a new directory called <tt>/usr/share/etc</tt> (name is not settled yet) to major distributions, that always contains the full, original, vendor-supplied configuration of all packages. This is very useful here, so that there's an obvious place to copy the original configuration from, but it is also useful completely independently as this provides administrators with an easy place to <tt>diff</tt> their own configuration in <tt>/etc</tt> against to see what local changes are in place.</li> <li><p>We added a new <tt>--tmpfs=</tt> switch to <tt>systemd-nspawn</tt> to make testing of systems with unpopulated <tt>/etc</tt> and <tt>/var</tt> easy. For example, to run a fully stateless container, use a command line like this:</p> <pre># systemd-nspawn -D /srv/mycontainer --read-only --tmpfs=/var --tmpfs=/etc -b</pre> <p>This command line will invoke the container tree stored in <tt>/srv/mycontainer</tt> in a read-only way, but with a (writable) tmpfs mounted to <tt>/var</tt> and <tt>/etc</tt>. With a very recent git snapshot of systemd invoking a Fedora rawhide system should mostly work OK, modulo the D-Bus and PAM problems mentioned above. A later version of <tt>systemd-nspawn</tt> is likely to gain a high-level switch <tt>--mode={stateful|volatile|stateless}</tt> that combines this into a simple switch reusing the vocabulary introduced earlier.</p></li> </ul> <h3>What's Next</h3> <p>Pulling this all together we are very close to making boots with empty <tt>/etc</tt> and <tt>/var</tt> on general purpose Linux operating systems a reality. Of course, while doing the groundwork in systemd gets us some distance, there's a lot of work left. Most importantly: the majority of Linux packages are simply incompatible with this scheme the way they are currently set up. 
They do not work without configuration in <tt>/etc</tt> or state directories in <tt>/var</tt>; they do not drop system user information in <tt>/usr/lib/sysusers.d</tt>. However, we believe it's our job to do the groundwork, and to start somewhere.</p> <p>So what does this mean for the next steps? Of course, currently very little of this is available in any distribution (if only because 215 isn't even released yet). However, this will hopefully change quickly. As soon as that is accomplished we can start working on making the other components of the OS work nicely in this scheme. If you are an upstream developer, please consider making your software work correctly if <tt>/etc</tt> and/or <tt>/var</tt> are not populated. This means:</p> <ul> <li>When you need a state directory in <tt>/var</tt> and it is missing, create it first. If you cannot do that, because you dropped privileges or suchlike, please consider dropping in a tmpfiles snippet that creates the directory with the right permissions early at boot, should it be missing.</li> <li>When you need configuration files in <tt>/etc</tt> to work properly, consider changing your application to work nicely when these files are missing, and automatically fall back to either built-in defaults, or to static vendor-supplied configuration files shipped in <tt>/usr</tt>, so that administrators can override configuration in <tt>/etc</tt> but if they don't the default configuration counts.</li> <li>When you need a system user or group, consider dropping in a file into <tt>/usr/lib/sysusers.d</tt> describing the users. 
(Currently documentation on this is minimal, we will provide more docs on this shortly.)</li> </ul> <p>If you are a packager, you can also help on making this all work:</p> <ul> <li>Ask upstream to implement what we describe above, possibly even preparing a patch for this.</li> <li>If upstream will not make these changes, then consider dropping in tmpfiles snippets that copy the bare minimum of configuration files to make your software work from somewhere in <tt>/usr</tt> into <tt>/etc</tt>.</li> <li>Consider moving from imperative <tt>useradd</tt> commands in packaging scripts, to declarative <tt>sysusers</tt> files. Ideally, this is shipped upstream too, but if that's not possible then simply adding this to packages should be good enough.</li> </ul> <p>Of course, before moving to declarative system user definitions you should consult with your distribution whether their packaging policy even allows that. Currently, most distributions will not, so we have to work to get this changed first.</p> <p>Anyway, so much about what we have been working on and where we want to take this.</p> <h4>Conclusion</h4> <p>Before we finish, let me stress again why we are doing all this:</p> <ol> <li>For end-user machines like desktops, tablets or mobile phones, we want a generic way to implement factory reset, which the user can make use of when the system is broken (saves you support costs), or when he wants to sell it and get rid of his private data, and renew that "fresh car smell".</li> <li>For embedded machines we want a generic way how to reset devices. 
We also want a way for every single boot to be identical to a factory reset, in a stateless system design.</li> <li>For all kinds of systems we want to centralize vendor data in <tt>/usr</tt> so that it can be strictly read-only, and fully cryptographically verified as one unit.</li> <li>We want to enable new kinds of OS installers that simply deserialize a vendor OS <tt>/usr</tt> snapshot into a new file system, install a boot loader and reboot, leaving all first-time configuration to the next boot.</li> <li>We want to enable new kinds of OS updaters that build on this, and manage a number of vendor OS <tt>/usr</tt> snapshots in verified states, and which can then update <tt>/etc</tt> and <tt>/var</tt> simply by rebooting into a newer version.</li> <li>We want to scale container setups naturally, by sharing a single <i>golden master</i> <tt>/usr</tt> tree with a large number of instances that simply maintain their own private <tt>/etc</tt> and <tt>/var</tt> for their private configuration and state, while still allowing clean updates of <tt>/usr</tt>.</li> <li>We want to make thin clients that share <tt>/usr</tt> across the network work by allowing stateless bootups. During all discussions on how <tt>/usr</tt> was to be organized this was frequently mentioned. A setup like this so far only worked in very specific cases, with this scheme we want to make this work in the general case.</li> </ol> <p>Of course, we have no illusions, just doing the groundwork for all of this in systemd doesn't make this all a real-life solution yet. Also, it's very unlikely that all of Fedora (or any other general purpose distribution) will support this scheme for all its packages soon, however, we are quite confident that the idea is convincing, that we need to start somewhere, and that getting the most core packages adapted to this shouldn't be out of reach.</p> <p>Oh, and of course, the concepts behind this are really not new, we know that. 
However, what's new here is that we try to make them available in a general purpose OS core, instead of special purpose systems.</p> <p>Anyway, let's get the ball rolling! Let's make stateless systems a reality!</p> <p>And that's all I have for now. I am sure this leaves a lot of questions open. If you have any, join us on IRC on <tt>#systemd</tt> on freenode or comment on <a href="https://plus.google.com/+LennartPoetteringTheOneAndOnly/posts/hT4jsCkmQzv">Google+</a>.</p> Lennart PoetteringTue, 17 Jun 2014 18:13:00 +0200tag:0pointer.net,2014-06-17:/blog/projects/stateless.htmlprojectsUpcoming Eventshttps://0pointer.net/blog/projects/dates.html <p>You are invited to three events:</p> <p>Christoph Wickert set up a <a href="https://plus.google.com/events/cgbotu8inedql8qlecjo3a6glk8">Fedora 19 Release Party</a> here in Berlin! Please join us on <b>Tuesday, July 2nd</b>.</p> <p>We'll have another <a href="https://plus.google.com/events/ck4p957u79bgm3jeiq8meh1b2ns">Berlin Open Source Meetup</a> on <b>Sunday, July 14th</b>.</p> <p>And finally, there's going to be another <a href="https://plus.google.com/events/cb1urr7jt5p4voutfelci14c5qc">systemd Hackfest</a>, this time colocated with <a href="https://www.guadec.org/">GUADEC</a>, on <b>Tuesday/Wednesday, August 6th/7th</b>.</p> <p>See you soon!</p> Lennart PoetteringMon, 01 Jul 2013 01:04:00 +0200tag:0pointer.net,2013-07-01:/blog/projects/dates.htmlprojectsGNOME.Asia and LinuxCon Japanhttps://0pointer.net/blog/projects/asia-2013.html <p>Two weeks ago I attended GNOME.Asia/Seoul and LinuxCon Japan/Tokyo, thanks to sponsoring by the GNOME Foundation and the Linux Foundation. At GNOME.Asia I spoke about <a href="http://0pointer.de/public/gnome-asia-2013-apps.pdf">Sandboxed Applications for GNOME</a>, and at LinuxCon Japan about <a href="http://0pointer.de/public/linuxcon-japan-2013-systemd.pdf">the first three years of systemd</a>. 
(I think at least the latter one was videotaped, and recordings might show up on the net eventually). I like to believe both talks went pretty well, and helped get the message across to the community what we are working on and what the roadmap for us is, and what we expect from the various projects, and especially GNOME. However, for me personally the <i>hallway track</i> was the most interesting part. The personal Q&amp;A sessions regarding our work on kdbus, cgroups, systemd and related projects were highly interesting. In fact, at both conferences we had something like impromptu hackfests on the topics of kdbus and cgroups, with some conference attendees. I also enjoyed the opportunity to be on Karen's upcoming GNOME podcast, recorded in a session at Gyeongbokgung Palace in Seoul (what better place could there be for a podcast recording?).</p> <p>I'd like to thank the GNOME and Linux foundations for sponsoring my attendance at these conferences. I'd especially like to thank the organizers of GNOME.Asia for their perfectly organized conference!</p> <p><img src="https://live.gnome.org/Travel/Policy?action=AttachFile&amp;do=get&amp;target=sponsored-badge-simple.png" alt="GNOME Travel Badge" /></p> Lennart PoetteringSun, 09 Jun 2013 16:30:00 +0200tag:0pointer.net,2013-06-09:/blog/projects/asia-2013.htmlprojectsIt's Time Again!https://0pointer.net/blog/projects/berlin-open-source-meetup-4.html <p>My fellow Berliners! There's another <a href="https://plus.google.com/events/cnikpv83amqf0mr8cf0ag7f2qus">Berlin Open Source Meetup</a> scheduled for this Sunday! You are invited!</p> <p>See you on Sunday!</p> Lennart PoetteringMon, 08 Apr 2013 10:58:00 +0200tag:0pointer.net,2013-04-08:/blog/projects/berlin-open-source-meetup-4.htmlprojectsWhat Are We Breaking Now?https://0pointer.net/blog/projects/brno.html <p>End of February <a href="http://www.devconf.cz/">devconf.cz</a> took place in Brno, Czech Republic. 
At the conference Kay Sievers, Harald Hoyer and I did two presentations about our work on <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> and about the systemd Journal. These talks were taped and the recordings are now available online.</p> <p>First, here's our talk about <a href="https://www.youtube.com/watch?v=_rrpjYD373A"><i>What Are We Breaking Now?</i></a>, in which we try to give an overview on what we are working on currently in the systemd context, and what we expect to do in the next few months. We cover <a href="http://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames">Predictable Network Interface Names</a>, the <a href="http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec">Boot Loader Spec</a>, kdbus, the Apps framework, and more.</p> <object width="420" height="315"><param name="movie" value="http://www.youtube.com/v/_rrpjYD373A?hl=en_US&amp;version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/_rrpjYD373A?hl=en_US&amp;version=3" type="application/x-shockwave-flash" width="420" height="315" allowscriptaccess="always" allowfullscreen="true"></embed></object> <p>And then, I did my second talk about <a href="https://www.youtube.com/watch?v=i4CACB7paLc"><i>The systemd Journal</i></a>, with a focus on how to practically make use of <tt>journalctl</tt>, as a day-to-day tool for administrators (these practical bits start around 28:40). 
The commands demoed here are all explained in an <a href="http://0pointer.de/blog/projects/journalctl.html">earlier blog story of mine</a>.</p> <object width="420" height="315"><param name="movie" value="http://www.youtube.com/v/i4CACB7paLc?hl=en_US&amp;version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/i4CACB7paLc?hl=en_US&amp;version=3" type="application/x-shockwave-flash" width="420" height="315" allowscriptaccess="always" allowfullscreen="true"></embed></object> <p>Unfortunately, the audience questions are sometimes hard or impossible to understand from the videos, and sometimes the text on the slides is hard to read, but I still believe that the two talks are quite interesting.</p> Lennart PoetteringThu, 14 Mar 2013 16:58:00 +0100tag:0pointer.net,2013-03-14:/blog/projects/brno.htmlprojectssystemd Hackfest!https://0pointer.net/blog/projects/hackfest.html <p>Hey, you, systemd hacker, Fedora hacker! Listen up! This Thu/Fri is the <a href="https://plus.google.com/u/0/events/cnklef88b85tb6tgf6ue3hn32lg">systemd Hackfest</a> in Brno/Czech Rep, right before <a href="http://www.devconf.cz/">devconf.cz</a>! On thursday we'll talk about (and hack on) all things systemd. And the hackfest friday is going to be a <a href="https://fedoraproject.org/wiki/FAD_systemd_2013">Fedora Activity Day</a>, so we'll have a focus on systemd integration into Fedora.</p> <p>You are invited!</p> <p>See you in Brno!</p> Lennart PoetteringMon, 18 Feb 2013 18:59:00 +0100tag:0pointer.net,2013-02-18:/blog/projects/hackfest.htmlprojectsThe Biggest Mythshttps://0pointer.net/blog/projects/the-biggest-myths.html <p>Since we first proposed <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> for inclusion in the distributions it has been frequently discussed in many forums, mailing lists and conferences. 
In these discussions one can often hear certain myths about systemd, that are repeated over and over again, but certainly don't gain any truth by constant repetition. Let's take the time to debunk a few of them:</p> <ol> <li><p><b>Myth: systemd is monolithic.</b></p> <p>If you build systemd with all configuration options enabled you will build 69 individual binaries. These binaries all serve different tasks, and are neatly separated for a number of reasons. For example, we designed systemd with security in mind, hence most daemons run at minimal privileges (using kernel capabilities, for example) and are responsible for very specific tasks only, to minimize their security surface and impact. Also, systemd parallelizes the boot more than any prior solution. This parallelization happens by running more processes in parallel. Thus it is essential that systemd is nicely split up into many binaries and thus processes. In fact, many of these binaries<sup>[1]</sup> are separated out so nicely, that they are very useful outside of systemd, too.</p> <p>A package involving 69 individual binaries can hardly be called <i>monolithic</i>. What is different from prior solutions however, is that we ship more components in a single tarball, and maintain them upstream in a single repository with a unified release cycle.</p></li> <li><p><b>Myth: systemd is about speed.</b></p> <p>Yes, systemd is fast (<a href="https://plus.google.com/108087225644395745666/posts/LyPQgKdntgA">A pretty complete userspace boot-up in ~900ms, anyone?</a>), but that's primarily just a side-effect of doing things <i>right</i>. In fact, we never really sat down and optimized the last tiny bit of performance out of systemd. Instead, we actually frequently knowingly picked the slightly slower code paths in order to keep the code more readable. 
This doesn't mean being fast was irrelevant for us, but reducing systemd to its speed is certainly quite a misconception, since that is certainly not anywhere near the top of our list of goals.</p></li> <li><p><b>Myth: systemd's fast boot-up is irrelevant for servers.</b></p> <p>That is just completely not true. Many administrators actually are keen on reduced downtimes during maintenance windows. In High Availability setups it's kinda nice if the failed machine comes back up really fast. In cloud setups with a large number of VMs or containers the price of slow boots multiplies with the number of instances. Spending minutes of CPU and IO on really slow boots of hundreds of VMs or containers reduces your system's density drastically, heck, it even costs you more energy. Slow boots can be quite financially expensive. Then, fast booting of containers allows you to implement a logic such as <a href="http://0pointer.de/blog/projects/socket-activated-containers.html">socket activated containers</a>, allowing you to drastically increase the density of your cloud system.</p> <p>Of course, in many server setups boot-up is indeed irrelevant, but systemd is supposed to cover the whole range. And yes, I am aware that often it is the server firmware that costs the most time at boot-up, and the OS is fast anyway compared to that, but well, systemd is still supposed to cover the whole range (see above...), and no, not all servers have such bad firmware, and certainly not VMs and containers, which are servers of a kind, too.<sup>[2]</sup></p></li> <li><p><b>Myth: systemd is incompatible with shell scripts.</b></p> <p>This is entirely bogus. <i>We</i> just don't use them for the boot process, because we believe they aren't the best tool for that specific purpose, but that doesn't mean systemd was incompatible with them. 
You can easily run shell scripts as systemd services, heck, you can run scripts written in <i>any</i> language as systemd services, systemd doesn't care the slightest bit what's inside your executable. Moreover, we heavily use shell scripts for our own purposes, for installing, building, testing systemd. And you can stick your scripts in the early boot process, use them for normal services, you can run them at latest shutdown, there are practically no limits.</p></li> <li><p><b>Myth: systemd is difficult.</b></p> <p>This also is entirely nonsense. A systemd platform is actually much simpler than traditional Linuxes because it unifies system objects and their dependencies as systemd units. The configuration file language is very simple, and we got rid of redundant configuration files. We provide uniform tools for much of the configuration of the system. The system is much less conglomerate than traditional Linuxes are. We also have pretty comprehensive documentation (<a href="http://www.freedesktop.org/wiki/Software/systemd">all linked from the homepage</a>) about pretty much every detail of systemd, and this not only covers admin/user-facing interfaces, but also developer APIs.</p> <p>systemd certainly comes with a learning curve. Everything does. However, we like to believe that it is actually simpler to understand systemd than a Shell-based boot for most people. Surprised we say that? Well, as it turns out, Shell is not a pretty language to learn, its syntax is arcane and complex. systemd unit files are substantially easier to understand, they do not expose a programming language, but are simple and declarative by nature. That all said, if you are experienced in shell, then yes, adopting systemd will take a bit of learning.</p> <p>To make learning easy we tried hard to provide the maximum compatibility to previous solutions. 
But not only that, on many distributions you'll find that some of the traditional tools will now even tell you -- while executing what you are asking for -- how you could do it with the newer tools instead, in a possibly nicer way.</p> <p>Anyway, the take-away is that systemd is probably as simple as such a system can be, and that we try hard to make it easy to learn. But yes, if you know sysvinit then adopting systemd will require a bit of learning, but quite frankly if you mastered sysvinit, then systemd should be easy for you.</p></li> <li><p><b>Myth: systemd is not modular.</b></p> <p>Not true at all. At compile time you have a number of <tt>configure</tt> switches to select what you want to build, and what not. And <a href="http://freedesktop.org/wiki/Software/systemd/MinimalBuilds">we document</a> how you can select in even more detail what you need, going beyond our configure switches.</p> <p>This modularity is not totally unlike the one of the Linux kernel, where you can select many features individually at compile time. If the kernel is modular enough for you then systemd should be pretty close, too.</p></li> <li><p><b>Myth: systemd is only for desktops.</b></p> <p>That is certainly not true. With systemd we try to cover pretty much the same range as Linux itself does. While we care for desktop uses, we also care pretty much the same way for server uses, and embedded uses as well. You can bet that Red Hat wouldn't make it a core piece of RHEL7 if it wasn't the best option for managing services on servers.</p> <p>People from numerous companies work on systemd. Car manufacturers build it into cars, Red Hat uses it for a server operating system, and GNOME uses many of its interfaces for improving the desktop. 
You find it in toys, in space telescopes, and in wind turbines.</p> <p>Most features I most recently worked on are probably relevant primarily on servers, such as <a href="http://0pointer.de/blog/projects/socket-activated-containers.html">container support</a>, <a href="http://0pointer.de/blog/projects/resources.html">resource management</a> or the <a href="http://0pointer.de/blog/projects/security.html">security features</a>. We cover desktop systems pretty well already, and there are a number of companies doing systemd development for embedded, some even offer consulting services in it.</p></li> <li><p><b>Myth: systemd was created as a result of the NIH syndrome.</b></p> <p>This is not true. Before we began working on systemd we were pushing for Canonical's Upstart to be widely adopted (and Fedora/RHEL used it too for a while). However, we eventually came to the conclusion that its design was inherently flawed at its core (at least in our eyes: most fundamentally, it leaves dependency management to the admin/developer, instead of solving this hard problem in code), and if something's wrong in the core you better replace it, rather than fix it. This was hardly the only reason though, other things came into play too, such as the licensing/contribution agreement mess around it. NIH wasn't one of the reasons, though...<sup>[3]</sup></p></li> <li><p><b>Myth: systemd is a freedesktop.org project.</b></p> <p>Well, systemd is certainly hosted at fdo, but freedesktop.org is little else but a repository for code and documentation. Pretty much any coder can request a repository there and dump his stuff there (as long as it's somewhat relevant for the infrastructure of free systems). There's no cabal involved, no "standardization" scheme, no project vetting, nothing. It's just a nice, free, reliable place to have your repository. 
In that regard it's a bit like SourceForge, github, kernel.org, just not commercial and without over-the-top requirements, and hence a good place to keep our stuff.</p> <p>So yes, we host our stuff at fdo, but the implied assumption of this myth, that there was a group of people who meet and then agree on how future free systems should look, is entirely bogus.</p></li> <li><p><b>Myth: systemd is not UNIX.</b></p> <p>There's certainly some truth in that. systemd's sources do not contain a single line of code originating from original UNIX. However, we derive inspiration from UNIX, and thus there's a ton of UNIX in systemd. For example, the UNIX idea of "everything is a file" finds reflection in that in systemd all services are exposed at runtime in a kernel file system, the <tt>cgroupfs</tt>. Then, one of the original features of UNIX was multi-seat support, based on built-in terminal support. Text terminals are hardly the state of the art in how you interface with your computer these days however. With systemd we brought native <a href="http://0pointer.de/blog/projects/multi-seat.html">multi-seat</a> support back, but this time with full support for today's hardware, covering graphics, mice, audio, webcams and more, and all that fully automatic, hotplug-capable and without configuration. In fact the design of systemd as a suite of integrated tools that each have their individual purposes but when used together are more than just the sum of the parts, that's pretty much at the core of UNIX philosophy. Then, the way our project is handled (i.e. maintaining much of the core OS in a single git repository) is much closer to the BSD model (which is a true UNIX, unlike Linux) of doing things (where most of the core OS is kept in a single CVS/SVN repository) than things on Linux ever were.</p> <p>Ultimately, UNIX is something different for everybody. For us systemd maintainers it is something we derive inspiration from. 
For others it is a religion, and much like the other world religions there are different readings and understandings of it. Some define UNIX based on specific pieces of code heritage, others see it just as a set of ideas, others as a set of commands or APIs, and even others as a definition of behaviours. Of course, it is impossible to ever make all these people happy.</p> <p>Ultimately the question whether something is UNIX or not matters very little. Being technically excellent is hardly exclusive to UNIX. For us, UNIX is a major influence (heck, the biggest one), but we also have other influences. Hence in some areas systemd will be very UNIXy, and in others a little bit less.</p></li> <li><p><b>Myth: systemd is complex.</b></p> <p>There's certainly some truth in that. Modern computers are complex beasts, and the OS running on it will hence have to be complex too. However, systemd is certainly not more complex than prior implementations of the same components. Much rather, it's simpler, and has less redundancy (see above). Moreover, building a simple OS based on systemd will involve much fewer packages than a traditional Linux did. Fewer packages makes it easier to build your system, gets rid of interdependencies and of much of the different behaviour of every component involved.</p></li> <li><p><b>Myth: systemd is bloated.</b></p> <p>Well, <i>bloated</i> certainly has many different definitions. But in most definitions systemd is probably the opposite of bloat. Since systemd components share a common code base, they tend to share much more code for common code paths. Here's an example: in a traditional Linux setup, sysvinit, start-stop-daemon, inetd, cron, dbus, all implemented a scheme to execute processes with various configuration options in a certain, hopefully clean environment. On systemd the code paths for all of this, for the configuration parsing, as well as the actual execution is shared. 
This means less code, less room for mistakes, less memory and cache pressure, and is thus a very good thing. And as a side-effect you actually get a ton more functionality for it...</p> <p>As mentioned above, systemd is also pretty modular. You can choose at build time which components you need, and which you don't need. People can hence specifically choose the level of "bloat" they want.</p> <p>When you build systemd, it only requires three dependencies: glibc, libcap and dbus. That's it. It can make use of more dependencies, but these are entirely optional.</p> <p>So, yeah, whichever way you look at it, it's really not <i>bloated</i>.</p></li> <li><p><b>Myth: systemd being Linux-only is not nice to the BSDs.</b></p> <p>Completely wrong. The BSD folks are pretty much uninterested in systemd. If systemd was portable, this would change nothing, they still wouldn't adopt it. And the same is true for the other Unixes in the world. Solaris has SMF, BSD has their own "rc" system, and they always maintained it separately from Linux. The init system is very close to the core of the entire OS. And these other operating systems hence define themselves among other things by their core userspace. The assumption that they'd adopt our core userspace if we just made it portable, is completely without any foundation.</p></li> <li><p><b>Myth: systemd being Linux-only makes it impossible for Debian to adopt it as default.</b></p> <p>Debian supports non-Linux kernels in their distribution. systemd won't run on those. Is that a problem though, and should that keep them from adopting systemd as default? Not really. The folks who ported Debian to these other kernels were willing to invest time in a massive porting effort, they set up test and build systems, and patched and built numerous packages for their goal. 
The maintenance of both a systemd unit file and a classic init script for the packaged services is a negligible amount of work compared to that, especially since those scripts more often than not exist already.</p></li> <li><p><b>Myth: systemd could be ported to other kernels if its maintainers just wanted to.</b></p> <p>That is simply not true. Porting systemd to other kernels is not feasible. We just use too many Linux-specific interfaces. For a few one might find replacements on other kernels, some features one might want to turn off, but for most this is not really possible. Here's a small, very incomplete list: <tt>cgroups, fanotify, umount2(), /proc/self/mountinfo </tt>(including notification)<tt>, /proc/swaps </tt>(same)<tt>, udev, netlink, </tt>the structure of<tt> /sys, /proc/$PID/comm, /proc/$PID/cmdline, /proc/$PID/loginuid, /proc/$PID/stat, /proc/$PID/session, /proc/$PID/exe, /proc/$PID/fd, tmpfs, devtmpfs, </tt>capabilities, namespaces of all kinds, various<tt> prctl()s, </tt>numerous<tt> ioctls, </tt>the<tt> mount() </tt>system call and its semantics<tt>, selinux, audit, inotify, statfs, O_DIRECTORY, O_NOATIME, /proc/$PID/root, waitid(), SCM_CREDENTIALS, SCM_RIGHTS, mkostemp(), /dev/input, ...</tt></p> <p>And no, if you look at this list and pick out the few where you can think of obvious counterparts on other kernels, then think again, and look at the others you didn't pick, and the complexity of replacing them.</p></li> <li><p><b>Myth: systemd is not portable for no reason.</b></p> <p>Nonsense! We use the Linux-specific functionality because we need it to implement what we want. Linux has so many features that UNIX/POSIX didn't have, and we want to empower the user with them. 
These features are incredibly useful, but only if they are actually exposed in a friendly way to the user, and that's what we do with systemd.</p></li> <li><p><b>Myth: systemd uses binary configuration files.</b></p> <p>No idea who came up with this crazy myth, but it's absolutely not true. systemd is configured pretty much exclusively via simple text files. A few settings you can also alter with the kernel command line and via environment variables. There's nothing binary in its configuration (not even XML). Just plain, simple, easy-to-read text files.</p></li> <li><p><b>Myth: systemd is a feature creep.</b></p> <p>Well, systemd certainly covers more ground than it used to. It's not just an init system anymore, but the basic userspace building block to build an OS from, but we carefully make sure to keep most of the features optional. You can turn a lot off at compile time, and even more at runtime. Thus you can choose freely how much feature creeping you want.</p></li> <li><p><b>Myth: systemd forces you to do something.</b></p> <p>systemd is not the mafia. It's Free Software, you can do with it whatever you want, and that includes not using it. That's pretty much the opposite of "forcing".</p></li> <li><p><b>Myth: systemd makes it impossible to run syslog.</b></p> <p>Not true, we carefully made sure when <a href="http://0pointer.de/blog/projects/the-journal.html">we introduced the journal</a> that all data is also passed on to any syslog daemon running. In fact, if something changed, then only that syslog gets more complete data now than it got before, since we now cover early boot stuff as well as STDOUT/STDERR of any system service.</p></li> <li><p><b>Myth: systemd is incompatible.</b></p> <p>We try very hard to provide the best possible compatibility with sysvinit. In fact, the vast majority of init scripts should work just fine on systemd, unmodified. 
However, there actually are indeed a few incompatibilities, but we try to <a href="http://www.freedesktop.org/wiki/Software/systemd/Incompatibilities">document these</a> and explain what to do about them. Ultimately every system that is not actually sysvinit itself will have a certain amount of incompatibilities with it since it will not share the exact same code paths.</p> <p>It is our goal to ensure that differences between the various distributions are kept at a minimum. That means unit files usually work just fine on a different distribution than the one you wrote them on, which is a big improvement over classic init scripts which are very hard to write in a way that they run on multiple Linux distributions, due to numerous incompatibilities between them.</p></li> <li><p><b>Myth: systemd is not scriptable, because of its D-Bus use.</b></p> <p>Not true. Pretty much every single D-Bus interface systemd provides is also available in a command line tool, for example in <a href="http://www.freedesktop.org/software/systemd/man/systemctl.html"><tt>systemctl</tt></a>, <a href="http://www.freedesktop.org/software/systemd/man/loginctl.html"><tt>loginctl</tt></a>, <a href="http://www.freedesktop.org/software/systemd/man/timedatectl.html"><tt>timedatectl</tt></a>, <a href="http://www.freedesktop.org/software/systemd/man/hostnamectl.html"><tt>hostnamectl</tt></a>, <a href="http://www.freedesktop.org/software/systemd/man/localectl.html"><tt>localectl</tt></a> and suchlike. You can easily call these tools from shell scripts, they open up pretty much the entire API from the command line with easy-to-use commands.</p> <p>That said, D-Bus actually has bindings for almost any scripting language this world knows. Even from the shell you can invoke arbitrary D-Bus methods with <a href="http://dbus.freedesktop.org/doc/dbus-send.1.html">dbus-send</a> or <a href="http://developer.gnome.org/gio/unstable/gdbus.html">gdbus</a>. 
If anything, this improves scriptability due to the good support of D-Bus in the various scripting languages.</p></li> <li><p><b>Myth: systemd requires you to use some arcane configuration tools instead of allowing you to edit your configuration files directly.</b></p> <p>Not true at all. We offer some configuration tools, and using them gets you a bit of additional functionality (for example, command line completion for all settings!), but there's no need at all to use them. You can always edit the files in question directly if you wish, and that's fully supported. Of course sometimes you need to explicitly reload configuration of some daemon after editing the configuration, but that's pretty much true for most UNIX services.</p></li> <li><p><b>Myth: systemd is unstable and buggy.</b></p> <p>Certainly not according to our data. We have been monitoring the Fedora bug tracker (and some others) closely for a long long time. The number of bugs is very low for such a central component of the OS, especially if you discount the numerous RFE bugs we track for the project. We are pretty good in keeping systemd out of the list of blocker bugs of the distribution. We have a relatively fast development cycle with mostly incremental changes to keep quality and stability high.</p></li> <li><p><b>Myth: systemd is not debuggable.</b></p> <p>False. Some people try to imply that the shell was a good debugger. Well, it isn't really. In systemd we provide you with actual debugging features instead. For example: interactive debugging, verbose tracing, the ability to mask any component during boot, and more. Also, we provide <a href="http://freedesktop.org/wiki/Software/systemd/Debugging">documentation for it</a>.</p> <p>It's certainly well debuggable, we needed that for our own development work, after all. 
But we'll grant you one thing: it uses different debugging tools, we believe more appropriate ones for the purpose, though.</p></li> <li><p><b>Myth: systemd makes changes for the changes' sake.</b></p> <p>Very much untrue. We pretty much exclusively have technical reasons for the changes we make, and we explain them in the various pieces of documentation, wiki pages, blog articles, mailing list announcements. We try hard to avoid making incompatible changes, and if we do we try to document the why and how in detail. And if you wonder about something, just ask us!</p></li> <li><p><b>Myth: systemd is a Red-Hat-only project, is private property of some smart-ass developers, who use it to push their views to the world.</b></p> <p>Not true. Currently, there are 16 hackers with commit powers to the systemd git tree. Of these 16 only six are employed by Red Hat. The 10 others are folks from ArchLinux, from Debian, from Intel, even from Canonical, Mandriva, Pantheon and a number of community folks with full commit rights. And they frequently commit big stuff, major changes. Then, there are 374 individuals with patches in our tree, and they too came from a number of different companies and backgrounds, and many of those have way more than one patch in the tree. The discussions about where we want to take systemd are done in the open, on our IRC channel (<tt>#systemd</tt> on freenode, you are always welcome), on our <a href="http://lists.freedesktop.org/mailman/listinfo/systemd-devel">mailing list</a>, and on public hackfests (<a href="https://plus.google.com/events/cnklef88b85tb6tgf6ue3hn32lg">such as our next one in Brno</a>, you are invited). We regularly attend various conferences, to collect feedback, to explain what we are doing and why, like few others do. 
We <a href="http://0pointer.de/blog">maintain blogs</a>, engage in social networks (<a href="https://plus.google.com/104232583922197692623/posts">we actually have some pretty interesting content on Google+</a>, and our <a href="https://plus.google.com/communities/114587707547576757881">Google+ Community is pretty alive, too</a>.), and try really hard to explain the why and the how of what we do, and to listen to feedback and figure out where the current issues are (for example, from that feedback we compiled this list of often-heard myths about systemd...).</p> <p>What most systemd contributors probably share is a rough idea what a good OS should look like, and the desire to make it happen. However, by the very nature of the project being Open Source, and rooted in the community systemd is just what people want it to be, and if it's not what they want then they can drive the direction with patches and code, and if that's not feasible, then there are numerous other options to use, too, systemd is never exclusive.</p> <p>One goal of systemd is to unify the dispersed Linux landscape a bit. We try to get rid of many of the more pointless differences of the various distributions in various areas of the core OS. As part of that we sometimes adopt schemes that were previously used by only one of the distributions and push it to a level where it's the default of systemd, trying to gently push everybody towards the same set of basic configuration. This is never exclusive though, distributions can continue to deviate from that if they wish, however, if they end up using the well-supported default their work becomes much easier and they might gain a feature or two. Now, as it turns out, more frequently than not we actually adopted schemes that were Debianisms, rather than Fedoraisms/Redhatisms as best supported scheme by systemd. 
For example, systems running systemd now generally store their hostname in <tt>/etc/hostname</tt>, something that used to be specific to Debian and now is used across distributions.</p> <p>One thing we'll grant you though, we sometimes can be smart-asses. We try to be prepared whenever we open our mouth, in order to be able to back up with facts what we claim. That might make us appear as smart-asses.</p> <p>But in general, yes, some of the more influential contributors of systemd work for Red Hat, but they are in the minority, and systemd is a healthy, open community with different interests, different backgrounds, just unified by a few rough ideas where the trip should go, a community where code and its design counts, and certainly not company affiliation.</p></li> <li><p><b>Myth: systemd doesn't support <tt>/usr</tt> split from the root directory.</b></p> <p>Nonsense. Since its beginnings systemd supports the <tt>--with-rootprefix=</tt> option to its <tt>configure</tt> script which allows you to tell systemd to neatly split up the stuff needed for early boot and the stuff needed for later on. All this logic is fully present and we keep it up-to-date right there in systemd's build system.</p> <p>Of course, we still don't think that <a href="http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken">actually booting with <tt>/usr</tt> unavailable is a good idea</a>, but we support this just fine in our build system. This won't fix the inherent problems of the scheme that you'll encounter all across the board, but you can't blame that on systemd, because in systemd we support this just fine.</p></li> <li><p><b>Myth: systemd doesn't allow you to replace its components.</b></p> <p>Not true, you can turn off and replace pretty much any part of systemd, with very few exceptions. 
And those exceptions (such as journald) generally allow you to run an alternative side by side to it, while cooperating nicely with it.</p></li> <li><p><b>Myth: systemd's use of D-Bus instead of sockets makes it intransparent.</b></p> <p>This claim is already contradictory in itself: D-Bus uses sockets as transport, too. Hence whenever D-Bus is used to send something around, a socket is used for that too. D-Bus is mostly a standardized serialization of messages to send over these sockets. If anything this makes it more transparent, since this serialization is well documented, understood and there are numerous tracing tools and language bindings for it. This is very much unlike the usual homegrown protocols the various classic UNIX daemons use to communicate locally.</p></li> </ol> <p>Hmm, did I write I just wanted to debunk a "few" myths? Maybe these were more than just a few... Anyway, I hope I managed to clear up a couple of misconceptions. Thanks for your time.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] For example, <a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"><tt>systemd-detect-virt</tt></a>, <a href="http://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"><tt>systemd-tmpfiles</tt></a>, <a href="http://www.freedesktop.org/software/systemd/man/systemd-udevd.service.html"><tt>systemd-udevd</tt></a> are.</small></p> <p><small>[2] Also, we are trying to do our little part on maybe making this better. 
By exposing boot-time performance of the firmware more prominently in systemd's boot output we hope to shame the firmware writers to clean up their stuff.</small></p> <p><small>[3] And anyways, guess which project includes a library "lib<i>nih</i>" -- Upstart or systemd?<sup>[4]</sup></small></p> <p><small>[4] Hint: it's not systemd!</small></p> Lennart PoetteringSat, 26 Jan 2013 02:43:00 +0100tag:0pointer.net,2013-01-26:/blog/projects/the-biggest-myths.htmlprojectssystemd for Administrators, Part XXhttps://0pointer.net/blog/projects/socket-activated-containers.html <p> <a href="http://0pointer.de/blog/projects/detect-virt.html">This is</a> <a href="http://0pointer.de/blog/projects/resources.html">no</a> <a href="http://0pointer.de/blog/projects/journalctl.html">time</a> <a href="http://0pointer.de/blog/projects/serial-console.html">for</a> <a href="http://0pointer.de/blog/projects/watchdog.html">procrastination,</a> <a href="http://0pointer.de/blog/projects/self-documented-boot.html">here</a> <a href="http://0pointer.de/blog/projects/systemctl-journal.html">is</a> <a href="http://0pointer.de/blog/projects/security.html">already</a> <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">twentieth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Socket 
Activated Internet Services and OS Containers</h4> <p><a href="http://0pointer.de/blog/projects/socket-activation.html">Socket</a> <a href="http://0pointer.de/blog/projects/socket-activation2.html">Activation</a> is an important feature of <a href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a>. When we <a href="http://0pointer.de/blog/projects/systemd.html">first announced</a> systemd we already tried to make the point how great socket activation is for increasing parallelization and robustness of socket services, but also for simplifying the dependency logic of the boot. In this episode I'd like to explain why socket activation is an important tool for drastically improving how many services and even containers you can run on a single system with the same resource usage. Or in other words, how you can drive up the density of customer sites on a system while spending less on new hardware.</p> <h5>Socket Activated Internet Services</h5> <p>First, let's take a step back. What was <i>socket activation</i> again? -- Basically, socket activation simply means that systemd sets up listening sockets (IP or otherwise) on behalf of your services (without these running yet), and then starts (<i>activates</i>) the services as soon as the first connection comes in. Depending on the technology the services might idle for a while after having processed the connection and possible follow-up connections before they exit on their own, so that systemd will again listen on the sockets and activate the services again the next time they are connected to. For the client it is not visible whether the service it is interested in is currently running or not. The service's IP socket stays continuously connectable, no connection attempt ever fails, and all connects will be processed promptly.</p> <p>A setup like this lowers resource usage: as services are only running when needed they only consume resources when required. Many internet sites and services can benefit from that. 
For example, web site hosters will have noticed that of the multitude of web sites that are on the Internet only a tiny fraction gets a continuous stream of requests: the huge majority of web sites still needs to be available all the time but gets requests only very infrequently. With a scheme like socket activation you can take advantage of this. Hosting many of these sites on a single system like this and only activating their services as necessary allows a large degree of over-commit: you can run more sites on your system than the available resources actually allow. Of course, one shouldn't over-commit too much to avoid contention during peak times.</p> <p>Socket activation like this is easy to use in systemd. Many modern Internet daemons already support socket activation out of the box (and for those which don't yet it's <a href="http://0pointer.de/blog/projects/socket-activation.html">not</a> <a href="http://0pointer.de/blog/projects/socket-activation2.html">hard</a> to add). Together with systemd's <a href="http://0pointer.de/blog/projects/instances.html">instantiated units support</a> it is easy to write a pair of service and socket templates that then may be instantiated multiple times, once for each site. Then, (optionally) make use of some of the <a href="http://0pointer.de/blog/projects/security.html">security features</a> of systemd to nicely isolate the customer's site's services from each other (think: each customer's service should only see the home directory of the customer, everybody else's directories should be invisible), and there you go: you now have a highly scalable and reliable server system, that serves a maximum of securely sandboxed services at a minimum of resources, and all nicely done with built-in technology of your OS.</p> <p>This kind of setup is already in production use in a number of companies. 
For example, the great folks at <a href="https://www.getpantheon.com/">Pantheon</a> are running their scalable instant Drupal system on a setup that is similar to this. (In fact, Pantheon's David Strauss pioneered this scheme. David, you rock!)</p> <h5>Socket Activated OS Containers</h5> <p>All of the above can already be done with older versions of systemd. If you use a distribution that is based on systemd, you can right away set up a system like the one explained above. But let's take this one step further. With systemd 197 (to be included in Fedora 19), we added support for socket activating not only individual services, but <i>entire</i> OS containers. And I really have to say it at this point: this is stuff I am really excited about. ;-)</p> <p>Basically, with socket activated OS containers, the host's systemd instance will listen on a number of ports on behalf of a container, for example one for SSH, one for web and one for the database, and as soon as the first connection comes in, it will spawn the container this is intended for, and pass to it all three sockets. Inside of the container, another systemd is running and will accept the sockets and then distribute them further, to the services running inside the container using normal socket activation. The SSH, web and database services will only see the inside of the container, even though they have been activated by sockets that were originally created on the host! Again, to the clients this all is not visible. That an entire OS container is spawned, triggered by a simple network connection, is entirely transparent to the client side.<sup>[1]</sup></p> <p>The OS containers may contain (as the name suggests) a full operating system, that might even be a different distribution than is running on the host. For example, you could run your host on Fedora, but run a number of Debian containers inside of it. 
The OS containers will have their own systemd init system, their own SSH instances, their own process tree, and so on, but will share a number of other facilities (such as memory management) with the host.</p> <p>For now, only systemd's own trivial container manager, <a href="http://0pointer.de/blog/projects/changing-roots">systemd-nspawn</a> has been updated to support this kind of socket activation. We hope that <a href="http://libvirt.org/drvlxc.html">libvirt-lxc</a> will soon gain similar functionality. At this point, let's see in more detail how such a setup is configured in systemd using nspawn:</p> <p>First, please use a tool such as <tt>debootstrap</tt> or yum's <tt>--installroot</tt> to set up a container OS tree<sup>[2]</sup>. The details of that are a bit out-of-focus for this story; there's plenty of documentation around on how to do this. Of course, make sure you have systemd v197 installed inside the container. For accessing the container from the command line, consider using <a href="http://0pointer.de/blog/projects/changing-roots">systemd-nspawn</a> itself. After you have configured everything properly, try to boot it up from the command line with systemd-nspawn's <tt>-b</tt> switch.</p> <p>Assuming you now have a working container that boots up fine, let's write a service file for it, to turn the container into a systemd service on the host you can start and stop. Let's create <tt>/etc/systemd/system/mycontainer.service</tt> on the host:</p> <pre>
[Unit]
Description=My little container

[Service]
ExecStart=/usr/bin/systemd-nspawn -jbD /srv/mycontainer 3
KillMode=process
</pre> <p>This service can already be started and stopped via <tt>systemctl start</tt> and <tt>systemctl stop</tt>. However, there's no nice way to actually get a shell prompt inside the container. So let's add SSH to it, and even more: let's configure SSH so that a connection to the container's SSH port will socket-activate the entire container. 
First, let's begin with telling the host that it shall now listen on the SSH port of the container. Let's create <tt>/etc/systemd/system/mycontainer.socket</tt> on the host:</p> <pre>
[Unit]
Description=The SSH socket of my little container

[Socket]
ListenStream=23
</pre> <p>If we start this unit with <tt>systemctl start</tt> on the host then it will listen on port 23, and as soon as a connection comes in it will activate our container service we defined above. We pick port 23 here, instead of the usual 22, as our host's SSH is already listening on that. nspawn virtualizes the process list and the file system tree, but does not actually virtualize the network stack, hence we just pick different ports for the host and the various containers here.</p> <p>Of course, the system inside the container doesn't yet know what to do with the socket it gets passed due to socket activation. If you'd now try to connect to the port, the container would start up but the incoming connection would be immediately closed since the container can't handle it yet. Let's fix that!</p> <p>All that's necessary for that is to teach SSH inside the container socket activation. For that let's simply write a pair of socket and service units for SSH. Let's create <tt>/etc/systemd/system/sshd.socket</tt> in the container:</p> <pre>[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=23
Accept=yes</pre> <p>Then, let's add the matching SSH service file <tt>/etc/systemd/system/sshd@.service</tt> in the container:</p> <pre>[Unit]
Description=SSH Per-Connection Server for %I

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket</pre> <p>Then, make sure to hook <tt>sshd.socket</tt> into the <tt>sockets.target</tt> so that the unit is started automatically when the container boots up:</p> <pre>ln -s /etc/systemd/system/sshd.socket /etc/systemd/system/sockets.target.wants/</pre> <p>And that's it. 
If we now activate <tt>mycontainer.socket</tt> on the host, the host's systemd will bind the socket and we can connect to it. If we do this, the host's systemd will activate the container, and pass the socket in to it. The container's systemd will then take the socket and match it up with <tt>sshd.socket</tt> inside the container. As there's still our incoming connection queued on it, it will then immediately trigger an instance of <tt>sshd@.service</tt>, and we'll have our login.</p> <p>And that's already everything there is to it. You can easily add additional sockets to listen on to <tt>mycontainer.socket</tt>. Everything listed therein will be passed to the container on activation, and will be matched up as well as possible with all socket units configured inside the container. Sockets that cannot be matched up will be closed, and sockets that aren't passed in but are configured for listening will be bound by the container's systemd instance.</p> <p>So, let's take a step back again. What did we gain through all of this? Well, basically, we can now offer a number of full OS containers on a single host, and the containers can offer their services without running continuously. The density of OS containers on the host can hence be increased drastically.</p> <p>Of course, this only works for kernel-based virtualization, not for hardware virtualization, i.e. something like this can only be implemented on systems such as libvirt-lxc or nspawn, but not in qemu/kvm.</p> <p>If you have a number of containers set up like this, here's one cool thing the journal allows you to do. If you pass <tt>-m</tt> to <tt>journalctl</tt> on the host, it will automatically discover the journals of all local containers and interleave them on display. Nifty, eh?</p> <p>With systemd 197 you have everything to set up your own socket activated OS containers on-board. 
However, there are a couple of improvements we're likely to add soon: for example, right now even if all services inside the container exit on idle, the container still will stay around, and we really should make it exit on idle too, if all its services exited and no logins are around. As it turns out we already have much of the infrastructure for this around: we can reuse the auto-suspend functionality we added for laptops: detecting when a laptop is idle and suspending it then is a very similar problem to detecting when a container is idle and shutting it down then.</p> <p>Anyway, this blog story is already way too long. I hope I haven't lost you half-way already with all this talk of virtualization, sockets, services, different OSes and stuff. I hope this blog story is a good starting point for setting up powerful highly scalable server systems. If you want to know more, consult the documentation and drop by our IRC channel. Thank you!</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] And BTW, <a href="https://plus.google.com/115547683951727699051/posts/cVrLAJ8HYaP">this is another reason</a> why fast boot times the way systemd offers them are actually a really good thing on servers, too.</small></p> <p><small>[2] To make it easy: you need a command line such as <tt>yum --releasever=19 --nogpg --installroot=/srv/mycontainer/ --disablerepo='*' --enablerepo=fedora install systemd passwd yum fedora-release vim-minimal </tt> to install Fedora, and <tt>debootstrap --arch=amd64 unstable /srv/mycontainer/</tt> to install Debian. Also see the bottom of <a href="http://www.freedesktop.org/software/systemd/man/systemd-nspawn.html">systemd-nspawn(1)</a>. Also note that auditing is currently broken for containers, and if enabled in the kernel will cause all kinds of errors in the container. 
Use <tt>audit=0</tt> on the host's kernel command line to turn it off.</small></p> Lennart PoetteringWed, 09 Jan 2013 18:58:00 +0100tag:0pointer.net,2013-01-09:/blog/projects/socket-activated-containers.htmlprojectssystemd for Administrators, Part XIXhttps://0pointer.net/blog/projects/detect-virt.html <p> <a href="http://0pointer.de/blog/projects/resources.html">Happy</a> <a href="http://0pointer.de/blog/projects/journalctl.html">new</a> <a href="http://0pointer.de/blog/projects/serial-console.html">year</a> <a href="http://0pointer.de/blog/projects/watchdog.html">2013!</a> <a href="http://0pointer.de/blog/projects/self-documented-boot.html">Here</a> <a href="http://0pointer.de/blog/projects/systemctl-journal.html">is</a> <a href="http://0pointer.de/blog/projects/security.html">now</a> <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">nineteenth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Detecting Virtualization</h4> <p>When we started working on <a href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a> we had a closer look at what the various existing init scripts used on Linux were actually doing. 
Among other things we noticed that a number of them were checking explicitly whether they were running in a virtualized environment (i.e. in a kvm, VMWare, LXC guest or suchlike) or not. Some init scripts disabled themselves in such cases<sup>[1]</sup>, others enabled themselves only in such cases<sup>[2]</sup>. Frequently, it would probably have been a better idea to check for other conditions rather than explicitly checking for virtualization, but after looking at this from all sides we came to the conclusion that in many cases explicitly conditionalizing services based on detected virtualization is a valid thing to do. As a result we added a new configuration option to systemd that can be used to conditionalize services this way: <a href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html"><tt>ConditionVirtualization</tt></a>; we also added a small tool that can be used in shell scripts to detect virtualization: <a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"><tt>systemd-detect-virt(1)</tt></a>; and finally, we added a minimal bus interface to query this from other applications.</p> <p>Detecting whether your code is run inside a virtualized environment <a href="http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/virt.c#n30">is actually not that hard</a>. Depending on what precisely you want to detect it's little more than running the CPUID instruction and maybe checking a few files in <tt>/sys</tt> and <tt>/proc</tt>. The complexity is mostly about knowing the strings to look for, and keeping this list up-to-date. Currently, the virtualization detection code in systemd can detect the following virtualization systems:</p> <ul><li><p>Hardware virtualization (i.e. VMs):</p> <ul><li>qemu</li> <li>kvm</li> <li>vmware</li> <li>microsoft</li> <li>oracle</li> <li>xen</li> <li>bochs</li> </ul></li> <li><p>Same-kernel virtualization (i.e. 
containers):</p> <ul><li>chroot</li> <li>openvz</li> <li>lxc</li> <li>lxc-libvirt</li> <li><a href="http://0pointer.de/blog/projects/changing-roots">systemd-nspawn</a></li> </ul></li></ul> <p>Let's have a look at how one may make use of this functionality.</p> <h5>Conditionalizing Units</h5> <p>Adding <a href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html"><tt>ConditionVirtualization</tt></a> to the <tt>[Unit]</tt> section of a unit file is enough to conditionalize it depending on which virtualization is used or whether one is used at all. Here's an example:</p> <pre>[Unit]
Description=My Foobar Service (runs only on guests)
ConditionVirtualization=yes

[Service]
ExecStart=/usr/bin/foobard</pre> <p>Instead of specifying "<tt>yes</tt>" or "<tt>no</tt>" it is possible to specify the ID of a specific virtualization solution (Example: "<tt>kvm</tt>", "<tt>vmware</tt>", ...), or either "<tt>container</tt>" or "<tt>vm</tt>" to check whether the kernel is virtualized or the hardware. Also, checks can be prefixed with an exclamation mark ("!") to invert a check. For further details see the <a href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html">manual page</a>.</p> <h5>In Shell Scripts</h5> <p>In shell scripts it is easy to check for virtualized systems with the <a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"><tt>systemd-detect-virt(1)</tt></a> tool. Here's an example:</p> <pre>if systemd-detect-virt -q ; then
        echo "Virtualization is used:" `systemd-detect-virt`
else
        echo "No virtualization is used."
fi</pre> <p>If this tool is run it will return with an exit code of zero (success) if a virtualization solution has been found, non-zero otherwise. It will also print a short identifier of the used virtualization solution, which can be suppressed with <tt>-q</tt>. Also, with the <tt>-c</tt> and <tt>-v</tt> parameters it is possible to detect only kernel or only hardware virtualization environments. 
For further details see the <a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html">manual page</a>.</p> <h5>In Programs</h5> <p>Whether virtualization is available is also exported on the system bus:</p> <pre>$ gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1 --method org.freedesktop.DBus.Properties.Get org.freedesktop.systemd1.Manager Virtualization
(&lt;'systemd-nspawn'&gt;,)</pre> <p>This property contains the empty string if no virtualization is detected. Note that some container environments cannot be detected directly from unprivileged code. That's why we expose this property on the bus rather than providing a library -- the bus implicitly solves the privilege problem quite nicely.</p> <p>Note that all of this will only ever detect and return information about the "inner-most" virtualization solution. If you stack virtualization ("We must go deeper!") then these interfaces will expose the one the code is most directly interfacing with. Specifically that means that if a container solution is used inside of a VM, then only the container is generally detected and returned.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] For example: running a certain device management service in a container environment that has no access to any physical hardware makes little sense.</small></p> <p><small>[2] For example: some VM solutions work best if certain vendor-specific userspace components are running that connect the guest with the host in some way.</small></p> Lennart PoetteringTue, 08 Jan 2013 21:19:00 +0100tag:0pointer.net,2013-01-08:/blog/projects/detect-virt.htmlprojectsThird Berlin Open Source Meetuphttps://0pointer.net/blog/projects/berlin-open-source-meetup-3.html <p>The Third <a href="https://plus.google.com/u/0/events/c3f3a8go99cn72n8rsosbj7djks">Berlin Open Source Meetup</a> is going to take place on Sunday, January 20th. 
You are invited!</p> <p>It's a public event, so everybody is welcome, and please feel free to invite others!</p> Lennart PoetteringThu, 03 Jan 2013 23:20:00 +0100tag:0pointer.net,2013-01-03:/blog/projects/berlin-open-source-meetup-3.htmlprojectsfoss.in Needs Your Funding!https://0pointer.net/blog/projects/fossin2012-2.html <p>One of the most exciting conferences in the Free Software world, <a href="http://foss.in/">foss.in</a> in Bangalore, India has <a href="http://atulchitnis.net/2012/sponsoring-foss-in/">trouble finding enough sponsoring</a> for this year's edition. <a href="http://foss.in/2012/take-one-speakers-at-foss-in2012">Many speakers from all around the Free Software world</a> (including yours truly) have signed up to present at the event, and the conference would appreciate any corporate funding they can get!</p> <p><a href="http://atulchitnis.net/2012/sponsoring-foss-in/">Please check if your company can help</a> and <a href="http://foss.in/sponsors">contact the organizers</a> for details!</p> <p>See you in Bangalore!</p> <p><a href="http://foss.in"><img src="http://foss.in/wp-content/uploads/2008/11/speaking_250px.jpg" alt="FOSS.IN" width="250" height="250" border="0" /></a></p> Lennart PoetteringThu, 15 Nov 2012 13:05:00 +0100tag:0pointer.net,2012-11-15:/blog/projects/fossin2012-2.htmlprojectssystemd for Developers IIIhttps://0pointer.net/blog/projects/journal-submit.html <p>Here's the third episode <a href="http://0pointer.de/blog/projects/socket-activation.html">of my</a> <a href="http://0pointer.de/blog/projects/socket-activation2.html"><i>systemd for Developers</i></a> series.</p> <h4>Logging to the Journal</h4> <p>In a <a href="http://0pointer.de/blog/projects/journalctl.html">recent blog story</a> intended for administrators I shed some light on how to use the <a href="http://www.freedesktop.org/software/systemd/man/journalctl.html">journalctl(1)</a> tool to browse and search the systemd journal. 
In this blog story for developers I want to explain a little how to get log data into the <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> Journal in the first place.</p> <p>The good thing is that getting log data into the Journal is not particularly hard, since there's a good chance the Journal already collects it anyway and writes it to disk. The journal collects:</p> <ol> <li>All data logged via libc <tt>syslog()</tt></li> <li>The data from the kernel logged with <tt>printk()</tt></li> <li>Everything written to STDOUT/STDERR of any system service</li> </ol> <p>This covers pretty much all of the traditional log output of a Linux system, including messages from the kernel initialization phase, the initial RAM disk, the early boot logic, and the main system runtime.</p> <h4>syslog()</h4> <p>Let's have a quick look how <tt>syslog()</tt> is used again. Let's write a journal message using this call:</p> <pre>#include &lt;syslog.h&gt;

int main(int argc, char *argv[]) {
        syslog(LOG_NOTICE, "Hello World");
        return 0;
}</pre> <p>This is C code, of course. Many higher level languages provide APIs that allow writing local syslog messages. 
Regardless of which language you choose, all data written like this ends up in the Journal.</p> <p>Let's have a look at how this looks after it has been written into the journal (this is the <a href="http://www.freedesktop.org/wiki/Software/systemd/json">JSON output</a> <tt>journalctl -o json-pretty</tt> generates):</p> <pre>{
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "_TRANSPORT" : "syslog",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "SYSLOG_FACILITY" : "1",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "_PID" : "3068",
        "SYSLOG_IDENTIFIER" : "test-journal-submit",
        "MESSAGE" : "Hello World!",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351126905014938"
}</pre> <p>This nicely shows how the Journal implicitly augmented our little log message with various meta data fields which describe in more detail the context our message was generated from. For an explanation of the various fields, please refer to <a href="http://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html">systemd.journal-fields(7)</a>.</p> <h4>printf()</h4> <p>If you are writing code that is run as a systemd service, generating journal messages is even easier:</p> <pre>#include &lt;stdio.h&gt;

int main(int argc, char *argv[]) {
        printf("Hello World\n");
        return 0;
}</pre> <p>Yupp, that's easy, indeed.</p> <p>The printed string in this example is logged at a default log priority of LOG_INFO<sup>[1]</sup>. Sometimes it is useful to change the log priority for such a printed string. 
When systemd parses STDOUT/STDERR of a service it will look for priority values enclosed in &lt; &gt; at the beginning of each line<sup>[2]</sup>, following the scheme used by the kernel's <tt>printk()</tt> which in turn took inspiration from the BSD syslog network serialization of messages. We can make use of this systemd feature like this:</p> <pre>#include &lt;stdio.h&gt;

#define PREFIX_NOTICE "&lt;5&gt;"

int main(int argc, char *argv[]) {
        printf(PREFIX_NOTICE "Hello World\n");
        return 0;
}</pre> <p>Nice! Logging with nothing but <tt>printf()</tt> but we still get log priorities!</p> <p>This scheme works with any programming language, including, of course, shell:</p> <pre>#!/bin/bash

echo "&lt;5&gt;Hello world"</pre> <h4>Native Messages</h4> <p>Now, what I explained above is not particularly exciting: the take-away is pretty much only that things end up in the journal if they are output using the traditional message printing APIs. Yaaawn!</p> <p>Let's make this more interesting, let's look at what the Journal provides as native APIs for logging, and let's see what its benefits are. Let's translate our little example into the 1:1 counterpart using the Journal's logging API <a href="http://0pointer.de/public/systemd-man/sd_journal_print.html"><tt>sd_journal_print(3)</tt></a>:</p> <pre>#include &lt;systemd/sd-journal.h&gt;

int main(int argc, char *argv[]) {
        sd_journal_print(LOG_NOTICE, "Hello World");
        return 0;
}</pre> <p>This doesn't look much more interesting than the two examples above, right? 
After compiling this with <tt>`pkg-config --cflags --libs libsystemd-journal`</tt> appended to the compiler parameters, let's have a closer look at the JSON representation of the journal entry this generates:</p> <pre>{
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
<b>        "CODE_FUNC" : "main",</b>
        "_TRANSPORT" : "journal",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
<b>        "CODE_FILE" : "src/journal/test-journal-submit.c",</b>
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "MESSAGE" : "Hello World",
<b>        "CODE_LINE" : "4",</b>
        "_PID" : "3516",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351128226954170"
}</pre> <p>This looks pretty much the same, right? Almost! I highlighted three new lines compared to the earlier output. Yes, you guessed it, by using <tt>sd_journal_print()</tt> meta information about the generating source code location is implicitly appended to each message<sup>[3]</sup>, which is helpful for a developer to identify the source of a problem.</p> <p>The primary reason for using the Journal's native logging APIs is not just the source code location however: it is to allow passing additional structured log messages from the program into the journal. This additional log data may then be used to search the journal, is available for consumption by other programs, and might help the administrator to track down issues beyond what is expressed in the human readable message text. 
Here's an example of how to do that with <tt>sd_journal_send()</tt>:</p> <pre>#include &lt;systemd/sd-journal.h&gt;
#include &lt;unistd.h&gt;
#include &lt;stdlib.h&gt;

int main(int argc, char *argv[]) {
        sd_journal_send("MESSAGE=Hello World!",
                        "MESSAGE_ID=52fb62f99e2c49d89cfbf9d6de5e3555",
                        "PRIORITY=5",
                        "HOME=%s", getenv("HOME"),
                        "TERM=%s", getenv("TERM"),
                        "PAGE_SIZE=%li", sysconf(_SC_PAGESIZE),
                        "N_CPUS=%li", sysconf(_SC_NPROCESSORS_ONLN),
                        NULL);
        return 0;
}</pre> <p>This will write a log message to the journal much like the earlier examples. However, this time a few additional, structured fields are attached:</p> <pre>{
        "__CURSOR" : "s=ac9e9c423355411d87bf0ba1a9b424e8;i=5930;b=5335e9cf5d954633bb99aefc0ec38c25;m=16544f875b;t=4ccd863cdc4f0;x=896defe53cc1a96a",
        "__REALTIME_TIMESTAMP" : "1351129666274544",
        "__MONOTONIC_TIMESTAMP" : "95903778651",
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "CODE_FUNC" : "main",
        "_TRANSPORT" : "journal",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "CODE_FILE" : "src/journal/test-journal-submit.c",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "MESSAGE" : "Hello World!",
        "_PID" : "4049",
        "CODE_LINE" : "6",
<b>        "MESSAGE_ID" : "52fb62f99e2c49d89cfbf9d6de5e3555",</b>
<b>        "HOME" : "/home/lennart",</b>
<b>        "TERM" : "xterm-256color",</b>
<b>        "PAGE_SIZE" : "4096",</b>
<b>        "N_CPUS" : "4",</b>
        "_SOURCE_REALTIME_TIMESTAMP" : "1351129666241467"
}</pre> <p>Awesome! Our simple example worked! The five meta data fields we attached to our message appeared in the journal. 
We used <a href="http://0pointer.de/public/systemd-man/sd_journal_print.html"><tt>sd_journal_send()</tt></a> for this which works much like <tt>sd_journal_print()</tt> but takes a NULL terminated list of format strings each followed by its arguments. The format strings must include the field name and a '=' before the values.</p> <p>Our little structured message included seven fields. The first three we passed are well-known fields:</p> <ol> <li><tt>MESSAGE=</tt> is the actual human readable message part of the structured message.</li> <li><tt>PRIORITY=</tt> is the numeric message priority value as known from BSD syslog formatted as an integer string.</li> <li><tt>MESSAGE_ID=</tt> is a 128bit ID that identifies our specific message call, formatted as hexadecimal string. We randomly generated this string with <tt>journalctl --new-id128</tt>. This can be used by applications to track down all occurrences of this specific message. The 128bit ID can be a UUID, but this is neither required nor enforced.</li></ol> <p>Applications may relatively freely define additional fields as they see fit (we defined four pretty arbitrary ones in our example). A complete list of the currently well-known fields is available in <a href="http://0pointer.de/public/systemd-man/systemd.journal-fields.html">systemd.journal-fields(7)</a>.</p> <p>Let's see how the message ID helps us find this message and all its occurrences in the journal:</p> <pre>
$ journalctl MESSAGE_ID=52fb62f99e2c49d89cfbf9d6de5e3555
-- Logs begin at Thu, 2012-10-18 04:07:03 CEST, end at Thu, 2012-10-25 04:48:21 CEST. --
Oct 25 03:47:46 epsilon test-journal-se[4049]: Hello World!
Oct 25 04:40:36 epsilon test-journal-se[4480]: Hello World!
</pre> <p>Seems I already invoked this example tool twice!</p> <p>Many messages systemd itself generates <a href="http://cgit.freedesktop.org/systemd/systemd/plain/src/systemd/sd-messages.h">have message IDs</a>. 
This is useful, for example, to find all occurrences where a program dumped core (<tt>journalctl MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1</tt>), or when a user logged in (<tt>journalctl MESSAGE_ID=8d45620c1a4348dbb17410da57c60c66</tt>). If your application generates a message that might be interesting to recognize in the journal stream later on, we recommend attaching such a message ID to it. You can easily allocate a new one for your message with <tt>journalctl --new-id128</tt>.</p> <p>This example shows how we can use the Journal's native APIs to generate structured, recognizable messages. You can do much more than this with the C API. For example, you may store binary data in journal fields as well, which is useful to attach coredumps or hard disk SMART states to events where this applies. In order to make this blog story not longer than it already is we'll not go into detail about how to do this, and I ask you to check out <a href="http://0pointer.de/public/systemd-man/sd_journal_print.html"><tt>sd_journal_send(3)</tt></a> for further information on this.</p> <h4>Python</h4> <p>The examples above focus on C. Structured logging to the Journal is also available from other languages. Along with systemd itself we ship bindings for Python. Here's an example of how to use this:</p> <pre>from systemd import journal
journal.send('Hello world')
journal.send('Hello, again, world', FIELD2='Greetings!', FIELD3='Guten tag')</pre> <p>Other bindings exist for <a href="http://fourkitchens.com/blog/2012/09/25/nodejs-extension-systemd">Node.js</a>, <a href="https://github.com/systemd/php-systemd">PHP</a>, <a href="https://github.com/philips/luvit-systemd-journal">Lua</a>.</p> <h4>Portability</h4> <p>Generating structured data is a very useful feature for services to make their logs more accessible both for administrators and other programs. 
In addition to the <i>implicit</i> structure the Journal adds to all logged messages it is highly beneficial if the various components of our stack also provide <i>explicit</i> structure in their messages, coming from within the processes themselves.</p> <p>Porting an existing program to the Journal's logging APIs comes with one pitfall though: the Journal is Linux-only. If non-Linux portability matters for your project it's a good idea to provide an alternative log output, and make it selectable at compile-time.</p> <p>Regardless which way to log you choose, in all cases we'll forward the message to a classic syslog daemon running side-by-side with the Journal, if there is one. However, much of the structured meta data of the message is not forwarded since the classic syslog protocol simply has no generally accepted way to encode this and we shouldn't attempt to serialize meta data into classic syslog messages which might turn <tt>/var/log/messages</tt> into an unreadable dump of machine data. Anyway, to summarize this: regardless if you log with <tt>syslog()</tt>, <tt>printf()</tt>, <tt>sd_journal_print()</tt> or <tt>sd_journal_send()</tt>, the message will be stored and indexed by the journal and it will also be forwarded to classic syslog.</p> <p>And that's it for today. In a follow-up episode we'll focus on retrieving messages from the Journal using the C API, possibly filtering for a specific subset of messages. Later on, I hope to give a real-life example how to port an existing service to the Journal's logging APIs. Stay tuned!</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] This can be changed with the <tt>SyslogLevel=</tt> service setting. See <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a> for details.</small></p> <p><small>[2] Interpretation of the &lt; &gt; prefixes of logged lines may be disabled with the <tt>SyslogLevelPrefix=</tt> service setting. 
See <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a> for details.</small></p> <p><small>[3] Appending the code location to the log messages can be turned off at compile time by defining -DSD_JOURNAL_SUPPRESS_LOCATION.</small></p> Lennart PoetteringThu, 25 Oct 2012 04:29:00 +0200tag:0pointer.net,2012-10-25:/blog/projects/journal-submit.htmlprojectssystemd for Administrators, Part XVIIIhttps://0pointer.net/blog/projects/resources.html <p><a href="http://0pointer.de/blog/projects/journalctl.html">Hot on</a> <a href="http://0pointer.de/blog/projects/serial-console.html">the heels</a> <a href="http://0pointer.de/blog/projects/watchdog.html">of the </a> <a href="http://0pointer.de/blog/projects/self-documented-boot.html">previous story</a>, <a href="http://0pointer.de/blog/projects/systemctl-journal.html">here's</a> <a href="http://0pointer.de/blog/projects/security.html">now</a> <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">eighteenth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Managing Resources</h4> <p>An important facet of modern computing is resource management: if you run more than one program on a single machine you want to assign the available resources to them enforcing particular policies. 
This is particularly crucial on smaller, embedded or mobile systems where the scarce resources are the main constraint, but equally for large installations such as cloud setups, where resources are plenty, but the number of programs/services/containers on a single node is drastically higher.</p> <p>Traditionally, on Linux only one policy was really available: all processes got about the same CPU time, or IO bandwidth, modulated a bit via the process <i>nice</i> value. This approach is very simple and covered the various uses for Linux quite well for a long time. However, it has drawbacks: not all processes deserve an equal share, and services involving lots of processes (think: Apache with a lot of CGI workers) would this way get more resources than services with very few (think: syslog).</p> <p>When thinking about service management for systemd, we quickly realized that resource management must be core functionality of it. In a modern world -- regardless if server or embedded -- controlling CPU, Memory, and IO resources of the various services cannot be an afterthought, but must be built-in as first-class service settings. And it must be per-service and not per-process as the traditional nice values or <a href="http://linux.die.net/man/2/setrlimit">POSIX Resource Limits</a> were.</p> <p>In this story I want to shed some light on what you can do to enforce resource policies on systemd services. Resource Management in one way or another has been available in systemd for a while already, so it's really time we introduce this to a broader audience.</p> <p><a href="http://0pointer.de/blog/projects/cgroups-vs-cgroups.html">In an earlier blog post</a> I highlighted the difference between Linux Control Groups (cgroups) as a labelled, hierarchical grouping mechanism, and Linux cgroups as a resource controlling subsystem. While systemd requires the former, the latter is optional. And this optional latter part is now what we can make use of to manage per-service resources. 
(At this point, it's probably a good idea to read up on <a href="https://en.wikipedia.org/wiki/Cgroups">cgroups</a> before reading on, to get at least a basic idea what they are and what they accomplish. Even though the explanations below will be pretty high-level, it all makes a lot more sense if you grok the background a bit.)</p> <p>The main Linux cgroup controllers for resource management are <a href="http://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt">cpu</a>, <a href="http://www.kernel.org/doc/Documentation/cgroups/memory.txt">memory</a> and <a href="http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt">blkio</a>. To make use of these, they need to be enabled in the kernel, which many distributions (including Fedora) do. systemd exposes a couple of high-level service settings to make use of these controllers without requiring too much knowledge of the gory kernel details. </p> <h4>Managing CPU</h4> <p>As a nice default, if the <tt>cpu</tt> controller is enabled in the kernel, systemd will create a cgroup for each service when starting it. Without any further configuration this already has one nice effect: on a systemd system every system service will get an even amount of CPU, regardless of how many processes it consists of. Or in other words: on your web server MySQL will get roughly the same amount of CPU as Apache, even if the latter consists of 1000 CGI script processes, but the former only of a few worker tasks. (This behavior can be turned off, see <a href="http://0pointer.de/public/systemd-man/systemd.conf.html">DefaultControllers=</a> in <tt>/etc/systemd/system.conf</tt>.)</p> <p>On top of this default, it is possible to explicitly configure the CPU shares a service gets with the <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">CPUShares=</a> setting. 
The default value is 1024; if you increase this number you'll assign more CPU to a service than an unaltered one at 1024, and if you decrease it, less.</p> <p>Let's see in more detail how we can make use of this. Let's say we want to assign Apache 1500 CPU shares instead of the default of 1024. For that, let's create a new administrator service file for Apache in <tt>/etc/systemd/system/httpd.service</tt>, overriding the vendor supplied one in <tt>/usr/lib/systemd/system/httpd.service</tt>, but let's change the <tt>CPUShares=</tt> parameter:</p> <pre>.include /usr/lib/systemd/system/httpd.service

[Service]
CPUShares=1500</pre> <p>The first line will pull in the vendor service file. Now, let's reload systemd's configuration and restart Apache so that the new service file is taken into account:</p> <pre>systemctl daemon-reload
systemctl restart httpd.service</pre> <p>And yeah, that's already it, you are done!</p> <p>(Note that setting <tt>CPUShares=</tt> in a unit file will cause the specific service to get its own cgroup in the <tt>cpu</tt> hierarchy, even if <tt>cpu</tt> is not included in <tt>DefaultControllers=</tt>.)</p> <h4>Analyzing Resource usage</h4> <p>Of course, changing resource assignments without actually understanding the resource usage of the services in question is like flying blind. To help you understand the resource usage of all services, we created the tool <a href="http://www.freedesktop.org/software/systemd/man/systemd-cgtop.html">systemd-cgtop</a>, which enumerates all cgroups of the system, determines their resource usage (CPU, Memory, and IO) and presents them in a <a href="http://linux.die.net/man/1/top">top</a>-like fashion. Building on the fact that systemd services are managed in cgroups, this tool can hence present to you for services what top shows you for processes.</p> <p>Unfortunately, by default <tt>cgtop</tt> will only be able to chart CPU usage per-service for you, IO and Memory are only tracked as total for the entire machine. 
The reason for this is simply that by default there are no per-service cgroups in the <tt>blkio</tt> and <tt>memory</tt> controller hierarchies but that's what we need to determine the resource usage. The best way to get this data for all services is to simply add the <tt>memory</tt> and <tt>blkio</tt> controllers to the aforementioned <tt>DefaultControllers=</tt> setting in <tt>system.conf</tt>.</p> <h4>Managing Memory</h4> <p>To enforce limits on memory systemd provides the <tt>MemoryLimit=</tt> and <tt>MemorySoftLimit=</tt> settings for services, which apply to the summed-up memory of all processes of a service. These settings take a memory size in bytes that is the total memory limit for the service. They understand the usual K, M, G, T suffixes for Kilobyte, Megabyte, Gigabyte, Terabyte (to the base of 1024).</p> <pre>.include /usr/lib/systemd/system/httpd.service

[Service]
MemoryLimit=1G</pre> <p>(Analogous to <tt>CPUShares=</tt> above, setting this option will cause the service to get its own cgroup in the <tt>memory</tt> cgroup hierarchy.)</p> <h4>Managing Block IO</h4> <p>To control block IO multiple settings are available. First of all <tt>BlockIOWeight=</tt> may be used, which assigns an IO <i>weight</i> to a specific service. In behaviour the <i>weight</i> concept is not unlike the <i>shares</i> concept of CPU resource control (see above). 
However, the default weight is 1000, and the valid range is from 10 to 1000:</p> <pre>.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=500</pre> <p>Optionally, per-device weights can be specified:</p> <pre>.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/dev/disk/by-id/ata-SAMSUNG_MMCRE28G8MXP-0VBL1_DC06K01009SE009B5252 750</pre> <p>Instead of specifying an actual device node you can also specify any path in the file system:</p> <pre>.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/home/lennart 750</pre> <p>If the specified path does not refer to a device node systemd will determine the block device <tt>/home/lennart</tt> is on, and assign the bandwidth weight to it.</p> <p>You can even add per-device and normal lines at the same time, which will set the per-device weight for the device, and the other value as default for everything else.</p> <p>Alternatively one may control explicit bandwidth limits with the <tt>BlockIOReadBandwidth=</tt> and <tt>BlockIOWriteBandwidth=</tt> settings. These settings take a pair of device node and bandwidth rate (in bytes per second) or of a file path and bandwidth rate:</p> <pre>.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOReadBandwidth=/var/log 5M</pre> <p>This sets the maximum read bandwidth on the block device backing <tt>/var/log</tt> to 5 MB/s.</p> <p>(Analogous to <tt>CPUShares=</tt> and <tt>MemoryLimit=</tt>, using any of these three settings will result in the service getting its own cgroup in the <tt>blkio</tt> hierarchy.)</p> <h4>Managing Other Resource Parameters</h4> <p>The options described above cover only a small subset of the available controls the various Linux control group controllers expose. 
We picked these and added high-level options for them since we assumed that these are the most relevant for most folks, and that they really needed a nice interface that can handle units properly and resolve block device names.</p> <p>In many cases the options explained above might not be sufficient for your use case, but a low-level kernel cgroup setting might help. It is easy to make use of these low-level settings from systemd unit files, even when no high-level setting covers them. For example, sometimes it might be useful to set the <i>swappiness</i> of a service. The kernel makes this controllable via the <tt>memory.swappiness</tt> cgroup attribute, but systemd does not expose it as a high-level option. Here's how you use it nonetheless, using the low-level <tt>ControlGroupAttribute=</tt> setting:</p> <pre>.include /usr/lib/systemd/system/httpd.service

[Service]
ControlGroupAttribute=memory.swappiness 70</pre> <p>(Analogous to the other cases this too causes the service to be added to the memory hierarchy.)</p> <p>Later on we might add more high-level controls for the various cgroup attributes. In fact, please ping us if you frequently use one and believe it deserves more focus. We'll consider adding a high-level option for it then. (Even better: send us a patch!)</p> <p><i>Disclaimer:</i> note that making use of the various resource controllers does have a runtime impact on the system. Enforcing resource limits comes at a price. If you do use them, certain operations do get slower. Especially the <tt>memory</tt> controller has (used to have?) 
a bad reputation for coming at a performance cost.</p> <p>For more details on all of this, please have a look at the documentation of the <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">mentioned unit settings</a>, and of the <a href="http://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt">cpu</a>, <a href="http://www.kernel.org/doc/Documentation/cgroups/memory.txt">memory</a> and <a href="http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt">blkio</a> controllers.</p> <p>And that's it for now. Of course, this blog story only focussed on the per-<i>service</i> resource settings. On top of this, you can also set the more traditional, well-known per-<i>process</i> resource settings, which will then be inherited by the various subprocesses, but always only be enforced per-process. More specifically, these are <tt>IOSchedulingClass=</tt>, <tt>IOSchedulingPriority=</tt>, <tt>CPUSchedulingPolicy=</tt>, <tt>CPUSchedulingPriority=</tt>, <tt>CPUAffinity=</tt>, <tt>LimitCPU=</tt> and related. These do not make use of cgroup controllers and have a much lower performance cost. 
We might cover those in a later article in more detail.</p> Lennart PoetteringWed, 24 Oct 2012 04:11:00 +0200tag:0pointer.net,2012-10-24:/blog/projects/resources.htmlprojectssystemd for Administrators, Part XVIIhttps://0pointer.net/blog/projects/journalctl.html <p><a href="http://0pointer.de/blog/projects/serial-console.html">It's</a> <a href="http://0pointer.de/blog/projects/watchdog.html">that</a> <a href="http://0pointer.de/blog/projects/self-documented-boot.html">time again</a>, <a href="http://0pointer.de/blog/projects/systemctl-journal.html">here's</a> <a href="http://0pointer.de/blog/projects/security.html">now</a> <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">seventeenth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Using the Journal</h4> <p><a href="http://0pointer.de/blog/projects/systemctl-journal.html">A while back I already</a> posted a blog story introducing some functionality of the journal, and how it is exposed in <tt>systemctl</tt>. 
In this episode I want to explain a few more uses of the journal, and how you can make it work for you.</p> <p>If you are wondering what the journal is, here's an explanation in a few words to get you up to speed: the journal is a component of <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>, that captures Syslog messages, Kernel log messages, initial RAM disk and early boot messages as well as messages written to STDOUT/STDERR of all services, indexes them and makes this available to the user. It can be used in parallel, or in place of a traditional syslog daemon, such as rsyslog or syslog-ng. For more information, see <a href="http://0pointer.de/blog/projects/the-journal.html">the initial announcement</a>.</p> <p>The journal has been part of Fedora since F17. With Fedora 18 it now has grown into a reliable, powerful tool to handle your logs. Note however, that on F17 and F18 the journal is configured by default to store logs only in a small ring-buffer in <tt>/run/log/journal</tt>, i.e. not persistent. This of course limits its usefulness quite drastically but is sufficient to show a bit of recent log history in <tt>systemctl status</tt>. For Fedora 19, we plan to change this, and enable persistent logging by default. Then, journal files will be stored in <tt>/var/log/journal</tt> and can grow much larger, thus making the journal a lot more useful.</p> <h4>Enabling Persistency</h4> <p>In the meantime, on F17 or F18, you can enable journald's persistent storage manually:</p> <pre># mkdir -p /var/log/journal</pre> <p>After that, it's a good idea to reboot, to get some useful structured data into your journal to play with. Oh, and since you have the journal now, you don't need syslog anymore (unless having <tt>/var/log/messages</tt> as text file is a necessity for you.), so you can choose to deinstall rsyslog:</p> <pre># yum remove rsyslog</pre> <h4>Basics</h4> <p>Now we are ready to go. 
The following text shows a lot of features of systemd 195 as it will be included in Fedora 18<sup>[1]</sup>, so if your F17 can't do the tricks you see, please wait for F18. First, let's start with some basics. To access the logs of the journal use the <a href="http://www.freedesktop.org/software/systemd/man/journalctl.html">journalctl(1)</a> tool. To have a first look at the logs, just type in:</p> <pre># journalctl</pre> <p>If you run this as root you will see all logs generated on the system, from system components the same way as for logged in users. The output you will get looks like a pixel-perfect copy of the traditional <tt>/var/log/messages</tt> format, but actually has a couple of improvements over it:</p> <ul> <li>Lines of error priority (and higher) will be highlighted red.</li> <li>Lines of notice/warning priority will be highlighted bold.</li> <li>The timestamps are converted into your local time-zone.</li> <li>The output is auto-paged with your pager of choice (defaults to <tt>less</tt>).</li> <li>This will show <i>all</i> available data, including rotated logs.</li> <li>Between the output of each boot we'll add a line clarifying that a new boot begins now.</li> </ul> <p>Note that in this blog story I will not actually show you any of the output this generates, I cut that out for brevity -- and to give you a reason to try it out yourself with a current image for F18's development version with systemd 195. But I do hope you get the idea anyway.</p> <h4>Access Control</h4> <p>Browsing logs this way is already pretty nice. But requiring to be root sucks of course, even administrators tend to do most of their work as unprivileged users these days. By default, Journal users can only watch their own logs, unless they are root or in the <tt>adm</tt> group. 
To make watching system logs more fun, let's add ourselves to <tt>adm</tt>:</p> <pre># usermod -a -G adm lennart</pre> <p>After logging out and back in as <tt>lennart</tt> I now have access to the full journal of the system and all users:</p> <pre>$ journalctl</pre> <h4>Live View</h4> <p>If invoked without parameters journalctl will show me the current log database. Sometimes one needs to watch logs as they grow, where one previously used <tt>tail -f /var/log/messages</tt>:</p> <pre>$ journalctl -f</pre> <p>Yes, this does exactly what you expect it to do: it will show you the last ten log lines and then wait for changes and show them as they take place.</p> <h4>Basic Filtering</h4> <p>When invoking <tt>journalctl</tt> without parameters you'll see the whole set of logs, beginning with the oldest message stored. That, of course, can be a lot of data. Much more useful is just viewing the logs of the current boot:</p> <pre>$ journalctl -b</pre> <p>This will show you only the logs of the current boot, with all the aforementioned gimmicks. But sometimes even this is way too much data to process. So what about just listing all the real issues to care about: all messages of priority levels ERROR and worse, from the current boot:</p> <pre>$ journalctl -b -p err</pre> <p>If you only seldom reboot, <tt>-b</tt> makes little sense; filtering based on time is much more useful:</p> <pre>$ journalctl --since=yesterday</pre> <p>And there you go, all log messages from the day before at 00:00 in the morning until right now. Awesome! Of course, we can combine this with <tt>-p err</tt> or a similar match. But humm, we are looking for something that happened on the 15th of October, or was it the 16th?</p> <pre>$ journalctl --since=2012-10-15 --until="2012-10-16 23:59:59"</pre> <p>Yupp, there we go, we found what we were looking for. 
But humm, I noticed that some CGI script in Apache was acting up earlier today, let's see what Apache logged at that time:</p> <pre>$ journalctl -u httpd --since=00:00 --until=9:30</pre> <p>Oh, yeah, there we found it. But hey, wasn't there an issue with that disk <tt>/dev/sdc</tt>? Let's figure out what was going on there:</p> <pre>$ journalctl /dev/sdc</pre> <p>OMG, a disk error!<sup>[2]</sup> Hmm, let's quickly replace the disk before we lose data. Done! Next! -- Hmm, didn't I see that the vpnc binary made a booboo? Let's check for that:</p> <pre>$ journalctl /usr/sbin/vpnc</pre> <p>Hmm, I don't get this, this seems to be some weird interaction with <tt>dhclient</tt>, let's see both outputs, interleaved:</p> <pre>$ journalctl /usr/sbin/vpnc /usr/sbin/dhclient</pre> <p>That did it! Found it!</p> <h4>Advanced Filtering</h4> <p>Whew! That was awesome already, but let's turn this up a notch. Internally systemd stores each log entry with a set of <i>implicit</i> meta data. This meta data looks a lot like an environment block, but actually is a bit more powerful: values can take binary, large values (though this is the exception, and usually they just contain UTF-8), and fields can have multiple values assigned (an exception too, usually they only have one value). This implicit meta data is collected for each and every log message, without user intervention. The data will be there, and wait to be used by you. Let's see how this looks:</p> <pre>$ journalctl -o verbose -n [...] 
Tue, 2012-10-23 23:51:38 CEST [s=ac9e9c423355411d87bf0ba1a9b424e8;i=4301;b=5335e9cf5d954633bb99aefc0ec38c25;m=882ee28d2;t=4ccc0f98326e6;x=f21e8b1b0994d7ee]
        PRIORITY=6
        SYSLOG_FACILITY=3
        _MACHINE_ID=a91663387a90b89f185d4e860000001a
        _HOSTNAME=epsilon
        _TRANSPORT=syslog
        SYSLOG_IDENTIFIER=avahi-daemon
        _COMM=avahi-daemon
        _EXE=/usr/sbin/avahi-daemon
        _SYSTEMD_CGROUP=/system/avahi-daemon.service
        _SYSTEMD_UNIT=avahi-daemon.service
        _SELINUX_CONTEXT=system_u:system_r:avahi_t:s0
        _UID=70
        _GID=70
        _CMDLINE=avahi-daemon: registering [epsilon.local]
        MESSAGE=Joining mDNS multicast group on interface wlan0.IPv4 with address 172.31.0.53.
        _BOOT_ID=5335e9cf5d954633bb99aefc0ec38c25
        _PID=27937
        SYSLOG_PID=27937
        _SOURCE_REALTIME_TIMESTAMP=1351029098747042
</pre> <p>(I cut out a lot of noise here, I don't want to make this story overly long. <tt>-n</tt> without parameter shows you the last 10 log entries, but I cut out all but the last.)</p> <p>With the <tt>-o verbose</tt> switch we enabled verbose output. Instead of showing a pixel-perfect copy of classic <tt>/var/log/messages</tt> that only includes a minimal subset of what is available we now see all the gory details the journal has about each entry. But it's highly interesting: there is user credential information, SELinux bits, machine information and more. For a full list of common, well-known fields, see <a href="http://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html">the man page</a>.</p> <p>Now, as it turns out the journal database is indexed by <i>all</i> of these fields, out-of-the-box! Let's try this out:</p> <pre>$ journalctl _UID=70</pre> <p>And there you go, this will show all log messages logged from Linux user ID 70. As it turns out one can easily combine these matches:</p> <pre>$ journalctl _UID=70 _UID=71</pre> <p>Specifying two matches for the same field will result in a logical OR combination of the matches. All entries matching either will be shown, i.e. 
all messages from either UID 70 or 71.</p> <pre>$ journalctl _HOSTNAME=epsilon _COMM=avahi-daemon</pre> <p>You guessed it, if you specify two matches for different field names, they will be combined with a logical AND. Only entries matching both will be shown now, meaning all messages from processes named <tt>avahi-daemon</tt> <i>and</i> from host <tt>epsilon</tt>.</p> <p>But of course, that's not fancy enough for us. We are computer nerds after all, we live off logical expressions. We must go deeper!</p> <pre>$ journalctl _HOSTNAME=theta _UID=70 + _HOSTNAME=epsilon _COMM=avahi-daemon</pre> <p>The + is an explicit OR you can use in addition to the implied OR when you match the same field twice. The line above hence means: show me everything from host <tt>theta</tt> with UID 70, or of host <tt>epsilon</tt> with a process name of <tt>avahi-daemon</tt>.</p> <h4>And now, it becomes magic!</h4> <p>That was already pretty cool, right? Right! But heck, who can remember all those values a field can take in the journal, I mean, seriously, who has thaaaat kind of photographic memory? Well, the journal has:</p> <pre>$ journalctl -F _SYSTEMD_UNIT</pre> <p>This will show us all values the field _SYSTEMD_UNIT takes in the database, or in other words: the names of all systemd services which ever logged into the journal. This makes it super-easy to build nice matches. But wait, turns out this all is actually hooked up with shell completion on bash! This gets even more awesome: as you type your match expression you will get a list of well-known field names, and of the values they can take! Let's figure out how to filter for SELinux labels again. 
We remember the field name was something with SELINUX in it, let's try that:</p> <pre>$ journalctl _SE<b>&lt;TAB&gt;</b></pre> <p>And yupp, it's immediately completed:</p> <pre>$ journalctl _SELINUX_CONTEXT=</pre> <p>Cool, but what's the label again we wanted to match for?</p> <pre>$ journalctl _SELINUX_CONTEXT=<b>&lt;TAB&gt;&lt;TAB&gt;</b> kernel system_u:system_r:local_login_t:s0-s0:c0.c1023 system_u:system_r:udev_t:s0-s0:c0.c1023 system_u:system_r:accountsd_t:s0 system_u:system_r:lvm_t:s0 system_u:system_r:virtd_t:s0-s0:c0.c1023 system_u:system_r:avahi_t:s0 system_u:system_r:modemmanager_t:s0-s0:c0.c1023 system_u:system_r:vpnc_t:s0 system_u:system_r:bluetooth_t:s0 system_u:system_r:NetworkManager_t:s0 system_u:system_r:xdm_t:s0-s0:c0.c1023 system_u:system_r:chkpwd_t:s0-s0:c0.c1023 system_u:system_r:policykit_t:s0 unconfined_u:system_r:rpm_t:s0-s0:c0.c1023 system_u:system_r:chronyd_t:s0 system_u:system_r:rtkit_daemon_t:s0 unconfined_u:system_r:unconfined_t:s0-s0:c0.c1023 system_u:system_r:crond_t:s0-s0:c0.c1023 system_u:system_r:syslogd_t:s0 unconfined_u:system_r:useradd_t:s0-s0:c0.c1023 system_u:system_r:devicekit_disk_t:s0 system_u:system_r:system_cronjob_t:s0-s0:c0.c1023 unconfined_u:unconfined_r:unconfined_dbusd_t:s0-s0:c0.c1023 system_u:system_r:dhcpc_t:s0 system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 system_u:system_r:dnsmasq_t:s0-s0:c0.c1023 system_u:system_r:systemd_logind_t:s0 system_u:system_r:init_t:s0 system_u:system_r:systemd_tmpfiles_t:s0</pre> <p>Ah! Right! We wanted to see everything logged under PolicyKit's security label:</p> <pre>$ journalctl _SELINUX_CONTEXT=system_u:system_r:policykit_t:s0</pre> <p>Wow! That was easy! I didn't know anything related to SELinux could be thaaat easy! ;-) Of course this kind of completion works with any field, not just SELinux labels.</p> <p>So much for now. 
There's a lot more cool stuff in <a href="http://www.freedesktop.org/software/systemd/man/journalctl.html">journalctl(1)</a> than this. For example, it generates JSON output for you! You can match against kernel fields! You can get simple <tt>/var/log/messages</tt>-like output but with <i>relative</i> timestamps! And so much more!</p> <p>Anyway, in the next weeks I hope to post more stories about all the cool things the journal can do for you. This is just the beginning, stay tuned.</p> <p><small>Footnotes</small></p> <p><small>[1] systemd 195 is currently still in <a href="https://admin.fedoraproject.org/updates/FEDORA-2012-16709/systemd-195-1.fc18">Bodhi</a> but hopefully will get into F18 proper soon, and definitely before the release of Fedora 18.</small></p> <p><small>[2] OK, I cheated here, indexing by block device is not in the kernel yet, but on its way due to <a href="http://www.spinics.net/lists/linux-scsi/msg62499.html">Hannes' fantastic work</a>, and I hope it will make appearence in F18.</small></p> Lennart PoetteringWed, 24 Oct 2012 00:16:00 +0200tag:0pointer.net,2012-10-24:/blog/projects/journalctl.htmlprojectssystemd for Administrators, Part XVIhttps://0pointer.net/blog/projects/serial-console.html <p><a href="http://0pointer.de/blog/projects/watchdog.html">And,</a> <a href="http://0pointer.de/blog/projects/self-documented-boot.html">yes,</a> <a href="http://0pointer.de/blog/projects/systemctl-journal.html">here's</a> <a href="http://0pointer.de/blog/projects/security.html">now</a> <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">sixteenth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a 
href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Gettys on Serial Consoles (and Elsewhere)</h4> <p><i>TL;DR: To make use of a serial console, just use <tt>console=ttyS0</tt> on the kernel command line, and systemd will automatically start a getty on it for you.</i></p> <p>While physical <a href="https://en.wikipedia.org/wiki/RS-232">RS232</a> serial ports have become exotic in today's PCs they play an important role in modern servers and embedded hardware. They provide a relatively robust and minimalistic way to access the console of your device, that works even when the network is hosed, or the primary UI is unresponsive. VMs frequently emulate a serial port as well.</p> <p>Of course, Linux has always had good support for serial consoles, but with <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> we tried to make serial console support even simpler to use. In the following text I'll try to give an overview how serial console <a href="https://en.wikipedia.org/wiki/Getty_%28Unix%29">gettys</a> on systemd work, and how TTYs of any kind are handled.</p> <p>Let's start with the key take-away: in most cases, to get a login prompt on your serial prompt you don't need to do anything. systemd checks the kernel configuration for the selected kernel console and will simply spawn a serial getty on it. That way it is entirely sufficient to configure your kernel console properly (for example, by adding <tt>console=ttyS0</tt> to the kernel command line) and that's it. 
But let's have a look at the details:</p> <p>In systemd, two template units are responsible for bringing up a login prompt on text consoles:</p> <ol> <li><tt>getty@.service</tt> is responsible for <a href="https://en.wikipedia.org/wiki/Virtual_console">virtual terminal</a> (VT) login prompts, i.e. those on your VGA screen as exposed in <tt>/dev/tty1</tt> and similar devices.</li> <li><tt>serial-getty@.service</tt> is responsible for all other terminals, including serial ports such as <tt>/dev/ttyS0</tt>. It differs in a couple of ways from <tt>getty@.service</tt>: among other things the <tt>$TERM</tt> environment variable is set to <tt>vt102</tt> (hopefully a good default for most serial terminals) rather than <tt>linux</tt> (which is the right choice for VTs only), and special logic that clears the VT scrollback buffer (and only works on VTs) is skipped.</li> </ol> <h5>Virtual Terminals</h5> <p>Let's have a closer look at how <tt>getty@.service</tt> is started, i.e. how login prompts on the virtual terminals (i.e. non-serial TTYs) work. Traditionally, the init system on Linux machines was configured to spawn a fixed number of login prompts at boot. In most cases six instances of the getty program were spawned, on the first six VTs, <tt>tty1</tt> to <tt>tty6</tt>.</p> <p>In a systemd world we made this more dynamic: in order to make things more efficient login prompts are now started on demand only. As you switch to the VTs the getty service is instantiated to <tt>getty@tty2.service</tt>, <tt>getty@tty5.service</tt> and so on. Since we don't have to unconditionally start the getty processes anymore this allows us to save a bit of resources, and makes start-up a bit faster. This behaviour is mostly transparent to the user: if the user activates a VT the getty is started right away, so that the user will hardly notice that it wasn't running all the time. 
If he then logs in and types <tt>ps</tt> he'll notice however that getty instances are only running for the VTs he so far switched to.</p> <p>By default this automatic spawning is done for the VTs up to VT6 only (in order to be close to the traditional default configuration of Linux systems)<sup>[1]</sup>. Note that the auto-spawning of gettys is only attempted if no other subsystem took possession of the VTs yet. More specifically, if a user makes frequent use of <a href="https://en.wikipedia.org/wiki/Fast_user_switching">fast user switching</a> via GNOME he'll get his X sessions on the first six VTs, too, since the lowest available VT is allocated for each session.</p> <p>Two VTs are handled specially by the auto-spawning logic: firstly <tt>tty1</tt> gets special treatment: if we boot into graphical mode the display manager takes possession of this VT. If we boot into multi-user (text) mode a getty is started on it -- unconditionally, without any on-demand logic<sup>[2]</sup>.</p> <p>Secondly, <tt>tty6</tt> is especially reserved for auto-spawned gettys and unavailable to other subsystems such as X<sup>[3]</sup>. This is done in order to ensure that there's always a way to get a text login, even if due to fast user switching X took possession of more than 5 VTs.</p> <h5>Serial Terminals</h5> <p>Handling of login prompts on serial terminals (and all other kind of non-VT terminals) is different from that of VTs. By default systemd will instantiate one <tt>serial-getty@.service</tt> on the main kernel<sup>[4]</sup> console, if it is not a virtual terminal. The kernel console is where the kernel outputs its own log messages and is usually configured on the kernel command line in the boot loader via an argument such as <tt>console=ttyS0</tt><sup>[5]</sup>. This logic ensures that when the user asks the kernel to redirect its output onto a certain serial terminal, he will automatically also get a login prompt on it as the boot completes<sup>[6]</sup>. 
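</p> <p>To illustrate the rule, here is a small Python sketch that derives the serial getty instance from a kernel command line. Treat it as an illustration only, not as the code systemd runs: systemd consults the kernel's own list of active consoles (see footnote 4) rather than parsing the command line, and the function names <tt>main_console</tt>, <tt>is_virtual_terminal</tt> and <tt>serial_getty_unit</tt> are made up for this example.</p>

```python
import re


def main_console(cmdline):
    """Return the device of the last console= argument, or None if there is none."""
    consoles = [
        arg.split("=", 1)[1].split(",", 1)[0]  # drop options such as ",115200n8"
        for arg in cmdline.split()
        if arg.startswith("console=")
    ]
    return consoles[-1] if consoles else None


def is_virtual_terminal(tty):
    """VTs are named tty0, tty1, ...; serial ports look like ttyS0 instead."""
    return re.fullmatch(r"tty[0-9]+", tty) is not None


def serial_getty_unit(cmdline):
    """Name the serial-getty instance for the main console, or None for a VT."""
    tty = main_console(cmdline)
    if tty is None or is_virtual_terminal(tty):
        return None
    return "serial-getty@" + tty + ".service"
```

<p>With <tt>console=tty0 console=ttyS0,115200n8</tt> this sketch picks <tt>serial-getty@ttyS0.service</tt>. With the two arguments swapped the main console is a VT, and no serial getty is chosen.</p> <p>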
systemd will also spawn a login prompt on the first special VM console (that's <tt>/dev/hvc0</tt>, <tt>/dev/xvc0</tt>, <tt>/dev/hvsi0</tt>), if the system is run in a VM that provides these devices. This logic is implemented in a <a href="http://www.freedesktop.org/wiki/Software/systemd/Generators">generator</a> called <a href="http://www.freedesktop.org/software/systemd/man/systemd-getty-generator.html">systemd-getty-generator</a> that is run early at boot and pulls in the necessary services depending on the execution environment.</p> <p>In many cases, this automatic logic should already suffice to get you a login prompt when you need one, without any specific configuration of systemd. However, sometimes there's the need to manually configure a serial getty, for example, if more than one serial login prompt is needed or the kernel console should be redirected to a different terminal than the login prompt. To facilitate this it is sufficient to instantiate <tt>serial-getty@.service</tt> once for each serial port you want it to run on<sup>[7]</sup>:</p> <pre># systemctl enable serial-getty@ttyS2.service # systemctl start serial-getty@ttyS2.service</pre> <p>And that's it. This will make sure you get the login prompt on the chosen port on all subsequent boots, and starts it right-away too.</p> <p>Sometimes, there's the need to configure the login prompt in even more detail. For example, if the default baud rate configured by the kernel is not correct or other <tt>agetty</tt> parameters need to be changed. In such a case simply copy the default unit template to <tt>/etc/systemd/system</tt> and edit it there:</p> <pre># cp /usr/lib/systemd/system/serial-getty@.service /etc/systemd/system/serial-getty@ttyS2.service # vi /etc/systemd/system/serial-getty@ttyS2.service .... now make your changes to the agetty command line ... 
# ln -s /etc/systemd/system/serial-getty@ttyS2.service /etc/systemd/system/getty.target.wants/ # systemctl daemon-reload # systemctl start serial-getty@ttyS2.service</pre> <p>This creates a unit file that is specific to serial port <tt>ttyS2</tt>, so that you can make specific changes to this port and this port only.</p> <p>And this is pretty much all there is to say about serial ports, VTs and login prompts on them. I hope this was interesting, and please come back soon for the next installment of this series!</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] You can easily modify this by changing <tt>NAutoVTs=</tt> in <a href="http://www.freedesktop.org/software/systemd/man/logind.conf.html">logind.conf</a>.</small></p> <p><small>[2] Note that whether the getty on VT1 is started on-demand or not hardly makes a difference, since VT1 is the default active VT anyway, so the demand is there anyway at boot.</small></p> <p><small>[3] You can easily change this special reserved VT by modifying <tt>ReserveVT=</tt> in <a href="http://www.freedesktop.org/software/systemd/man/logind.conf.html">logind.conf</a>.</small></p> <p><small>[4] If multiple kernel consoles are used simultaneously, the <i>main</i> console is the one listed <i>last</i> in <tt>/sys/class/tty/console/active</tt>, which is also the <i>last</i> one listed on the kernel command line.</small></p> <p><small>[5] See <a href="https://www.kernel.org/doc/Documentation/kernel-parameters.txt">kernel-parameters.txt</a> for more information on this kernel command line option.</small></p> <p><small>[6] Note that <tt>agetty -s</tt> is used here so that the baud rate configured at the kernel command line is not altered and continues to be used by the login prompt.</small></p> <p><small>[7] Note that this <tt>systemctl enable</tt> syntax only works with systemd 188 and newer (i.e. F18). 
On older versions use <tt>ln -s /usr/lib/systemd/system/serial-getty@.service /etc/systemd/system/getty.target.wants/serial-getty@ttyS2.service ; systemctl daemon-reload</tt> instead.</small></p> Lennart PoetteringSat, 13 Oct 2012 02:56:00 +0200tag:0pointer.net,2012-10-13:/blog/projects/serial-console.htmlprojectsBerlin Open Source Meetuphttps://0pointer.net/blog/projects/berlin-open-source-meetup.html <p><a href="http://blixtra.org/blog/2012/08/06/berlin-open-source-meetup/"><img src="http://blixtra.org/blog/wp-content/uploads/2012/08/Prater.jpg" width="500" height="375" alt="Prater" /></a></p> <p>Chris K&uuml;hl and I are organizing a <a href="http://blixtra.org/blog/2012/08/06/berlin-open-source-meetup/">Berlin Open Source Meetup</a> on Aug 19th at the Prater Biergarten in Prenzlauer Berg. If you live in Berlin (or are passing by) and are involved in or interested in Open Source then you are invited!</p> <p><a href="https://plus.google.com/u/0/events/c9ffkptmk6kbjkgn7nb7bh5i1ek/107949128852701224835">There's also a Google+ event for the meetup.</a></p> <p>It's a public event, so everybody is welcome, and please feel free to invite others!</p> <p>See you at the Prater!</p> Lennart PoetteringMon, 06 Aug 2012 14:59:00 +0200tag:0pointer.net,2012-08-06:/blog/projects/berlin-open-source-meetup.htmlprojectsUpcoming Hackfests/Sprintshttps://0pointer.net/blog/hackfests.html <p>The <a href="http://www.linuxplumbersconf.org/2012/">Linux Plumbers Conference 2012</a> will take place August 29th to 31st in San Diego, California. We, the <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> developers, would like to invite you to two hackfests/sprints that will happen around LPC:</p> <h4>San Diego: libvirt/LXC/systemd/SELinux Integration Hackfest</h4> <p>On <b>28th of August</b> we'll have a hackfest on the topic of closer integration of libvirt, LXC, systemd and SELinux, colocated with LPC in San Diego, California. 
We'll have a number of key people from these projects participating, including Dan Walsh, Eric Paris, Daniel P. Berrange, Kay Sievers and myself.</p> <p>Topics we'll cover: making Fedora/Linux boot entirely cleanly in normal containers, teaching systemd's control tools minimal container-awareness (such as being able to list all services of all containers in one go, in addition to those running on the host system), unified journal logging across multiple containers, the <a href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface">systemd container interface</a>, auditing and containers, running multiple instances from the same <tt>/usr</tt> tree, and a lot more...</p> <p><b>Who should attend?</b> Everybody hacking on the mentioned projects who wants to help integrating them with the goal of turning them into a secure, reliable, powerful container solution for Linux.</p> <p><b>Who should not attend?</b> If you don't hack on any of these projects, or if you are not interested in closer integration of at least two of these projects.</p> <p><b>How to register?</b> Just show up. You get extra points however for letting us know in advance (just send us an email). Attendance is free.</p> <p>&#10149; See also: <a href="https://plus.google.com/u/0/events/cvs9oi2q802vh57o1vr9le7tsjc/115547683951727699051">Google+ Event</a></p> <h4>San Francisco: systemd Journal Sprint</h4> <p>On <b>September 3-7</b> we'll have a sprint on the topic of the systemd Journal. It's going to take place at the <a href="https://www.getpantheon.com/">Pantheon</a> headquarters in San Francisco, California. Among others, Kay Sievers, David Strauss and I will participate.</p> <p><b>Who should attend?</b> Everybody who wants to help improving the systemd Journal, regardless if in its core itself, in client software for it, hooking up other projects or writing library bindings for it. 
Also, if you are using or planning to use the journal for a project, we'd be very interested in high-bandwidth face-to-face feedback regarding what you are missing, what you don't like so much, and what you find awesome in the Journal.</p> <p><b>How to register?</b> Please sign up at <a href="http://systemd.eventbrite.com/">EventBrite</a>. Attendance is free. For more information see the <a href="http://lists.freedesktop.org/archives/systemd-devel/2012-July/005803.html">invitation mail</a>.</p> <p>&#10149; See also: <a href="https://plus.google.com/u/0/events/cee28a21tk5lfv0u224kj6pa930/115547683951727699051">Google+ Event</a></p> <p><i>See you in California!</i></p> Lennart PoetteringFri, 20 Jul 2012 02:52:00 +0200tag:0pointer.net,2012-07-20:/blog/hackfests.htmlmiscfoss.in 2012 CFP Ends in a Few Hourshttps://0pointer.net/blog/projects/fossin2012.html <p><a href="http://foss.in/">foss.in 2012 in Bangalore</a> takes place again after a hiatus of some years. It has always been a fantastic conference, and a great opportunity to visit Bangalore and India. 
I just submitted my talk proposals, so, hurry up, and <a href="http://foss.in/participate/call-for-participation">submit yours</a>!</p> Lennart PoetteringSun, 08 Jul 2012 15:47:00 +0200tag:0pointer.net,2012-07-08:/blog/projects/fossin2012.htmlprojectssystemd for Administrators, Part XVhttps://0pointer.net/blog/projects/watchdog.html <p><a href="http://0pointer.de/blog/projects/self-documented-boot.html">Quickly following the previous iteration</a>, <a href="http://0pointer.de/blog/projects/systemctl-journal.html">here's</a> <a href="http://0pointer.de/blog/projects/security.html">now</a> <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">fifteenth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Watchdogs</h4> <p>There are three big target audiences we try to cover with <a href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a>: the embedded/mobile folks, the desktop people and the server folks. While the systems used by embedded/mobile tend to be underpowered and have few resources available, desktops tend to be much more powerful machines -- but still much less resourceful than servers. 
Nonetheless there are surprisingly many features that matter to both extremes of this axis (embedded and servers), but not the center (desktops). One of them is support for <a href="https://en.wikipedia.org/wiki/Watchdog_timer">watchdogs</a> in hardware and software.</p> <p>Embedded devices frequently rely on watchdog hardware that resets them automatically if software stops responding (more specifically, stops signalling the hardware in fixed intervals that it is still alive). This is required to increase reliability and make sure that regardless of what happens the best is attempted to get the system working again. Functionality like this makes little sense on the desktop<sup>[1]</sup>. However, on high-availability servers watchdogs are frequently used, again.</p> <p>Starting with version 183 systemd provides full support for hardware watchdogs (as exposed in <tt>/dev/watchdog</tt> to userspace), as well as supervisor (software) watchdog support for individual system services. The basic idea is the following: if enabled, systemd will regularly ping the watchdog hardware. If systemd or the kernel hangs this ping will not happen anymore and the hardware will automatically reset the system. This way systemd and the kernel are protected from boundless hangs -- by the hardware. To make the chain complete, systemd then exposes a software watchdog interface for individual services so that they can also be restarted (or some other action taken) if they begin to hang. This software watchdog logic can be configured individually for each service in the ping frequency and the action to take. Putting both parts together (i.e. hardware watchdogs supervising systemd and the kernel, as well as systemd supervising all other services) we have a reliable way to watchdog every single component of the system.</p> <p>To make use of the hardware watchdog it is sufficient to set the <tt>RuntimeWatchdogSec=</tt> option in <tt>/etc/systemd/system.conf</tt>. It defaults to 0 (i.e. 
no hardware watchdog use). Set it to a value like 20s and the watchdog is enabled. After 20s of no keep-alive pings the hardware will reset itself. Note that systemd will send a ping to the hardware at half the specified interval, i.e. every 10s. And that's already all there is to it. By enabling this single, simple option you have turned on supervision by the hardware of systemd and the kernel beneath it.<sup>[2]</sup></p> <p>Note that the hardware watchdog device (<tt>/dev/watchdog</tt>) is single-user only. That means that you can either enable this functionality in systemd, or use a separate external watchdog daemon, such as the aptly named <a href="http://linux.die.net/man/8/watchdog">watchdog</a>.</p> <p><tt>ShutdownWatchdogSec=</tt> is another option that can be configured in <tt>/etc/systemd/system.conf</tt>. It controls the watchdog interval to use during reboots. It defaults to 10min, and adds extra reliability to the system reboot logic: if a clean reboot is not possible and shutdown hangs, we rely on the watchdog hardware to reset the system abruptly, as extra safety net.</p> <p>So much about the hardware watchdog logic. These two options are really everything that is necessary to make use of the hardware watchdogs. Now, let's have a look how to add watchdog logic to individual services.</p> <p>First of all, to make software watchdog-supervisable it needs to be patched to send out "I am alive" signals in regular intervals in its event loop. Patching this is relatively easy. First, a daemon needs to read the <tt>WATCHDOG_USEC=</tt> environment variable. If it is set, it will contain the watchdog interval in usec formatted as ASCII text string, as it is configured for the service. The daemon should then issue <tt><a href="http://www.freedesktop.org/software/systemd/man/sd_notify.html">sd_notify</a>("WATCHDOG=1")</tt> calls every half of that interval. 
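</p> <p>Here is a minimal sketch of such a patch, written in Python for brevity. It assumes the notification protocol described in sd_notify(3): the service manager passes the address of a datagram socket in the <tt>NOTIFY_SOCKET</tt> environment variable, and a leading <tt>@</tt> denotes an abstract socket. The function names <tt>ping_interval_seconds</tt> and <tt>notify</tt> are made up for this example. A real daemon written in C would simply call <tt>sd_notify()</tt>.</p>

```python
import os
import socket


def ping_interval_seconds(env=os.environ):
    """Return half the configured watchdog interval in seconds, or None if unset."""
    raw = env.get("WATCHDOG_USEC")
    # The variable holds the interval in microseconds as an ASCII decimal string.
    if not raw or not (raw.isascii() and raw.isdigit()):
        return None
    usec = int(raw)
    if usec == 0:
        return None
    return usec / 2 / 1_000_000


def notify(message, env=os.environ):
    """Send one notification datagram; return False if no manager is listening."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # Abstract socket namespace: the leading "@" stands for a NUL byte.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), path)
    return True
```

<p>In its event loop the daemon would then call <tt>notify("WATCHDOG=1")</tt> at least once every <tt>ping_interval_seconds()</tt> seconds, and skip the pings entirely if that function returns <tt>None</tt>.</p> <p>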
A daemon patched this way should transparently support watchdog functionality by checking whether the environment variable is set and honouring the value it is set to.</p> <p>To enable the software watchdog logic for a service (which has been patched to support the logic pointed out above) it is sufficient to set <tt>WatchdogSec=</tt> to the desired failure latency. See <a href="http://www.freedesktop.org/software/systemd/man/systemd.service.html">systemd.service(5)</a> for details on this setting. This causes <tt>WATCHDOG_USEC=</tt> to be set for the service's processes and will cause the service to enter a failure state as soon as no keep-alive ping is received within the configured interval.</p> <p>If a service enters a failure state as soon as the watchdog logic detects a hang, then this is hardly sufficient to build a reliable system. The next step is to configure whether the service shall be restarted and how often, and what to do if it then still fails. To enable automatic service restarts on failure set <tt>Restart=on-failure</tt> for the service. To configure how many restart attempts are made use the combination of <tt>StartLimitBurst=</tt> and <tt>StartLimitInterval=</tt> which allow you to configure how often a service may restart within a time interval. If that limit is reached, a special action can be taken. This action is configured with <tt>StartLimitAction=</tt>. The default is <tt>none</tt>, i.e. no further action is taken and the service simply remains in the failure state without any further attempted restarts. The other three possible values are <tt>reboot</tt>, <tt>reboot-force</tt> and <tt>reboot-immediate</tt>. <tt>reboot</tt> attempts a clean reboot, going through the usual, clean shutdown logic. 
<tt>reboot-force</tt> is more abrupt: it will not actually try to cleanly shut down any services, but immediately kills all remaining services and unmounts all file systems and then forcibly reboots (this way all file systems will be clean but reboot will still be very fast). Finally, <tt>reboot-immediate</tt> does not attempt to kill any process or unmount any file systems. Instead it just hard reboots the machine without delay. <tt>reboot-immediate</tt> hence comes closest to a reboot triggered by a hardware watchdog. All these settings are documented in <a href="http://www.freedesktop.org/software/systemd/man/systemd.service.html">systemd.service(5)</a>.</p> <p>Putting this all together we now have pretty flexible options to watchdog-supervise a specific service and configure automatic restarts of the service if it hangs, plus take ultimate action if that doesn't help.</p> <p>Here's an example unit file:</p> <pre>[Unit]
Description=My Little Daemon
Documentation=man:mylittled(8)

[Service]
ExecStart=/usr/bin/mylittled
WatchdogSec=30s
Restart=on-failure
StartLimitInterval=5min
StartLimitBurst=4
StartLimitAction=reboot-force
</pre> <p>This service will automatically be restarted if it hasn't pinged the system manager for longer than 30s or if it fails otherwise. If it is restarted this way more often than 4 times in 5min action is taken and the system is quickly rebooted, with all file systems being clean when it comes up again.</p> <p>And that's already all I wanted to tell you about! With hardware watchdog support right in PID 1, as well as supervisor watchdog support for individual services we should provide everything you need for most watchdog use cases. 
Regardless of whether you are building an embedded or mobile appliance, or working with high-availability servers, please give this a try!</p> <p>(Oh, and if you wonder why in heaven PID 1 needs to deal with <tt>/dev/watchdog</tt>, and why this shouldn't be kept in a separate daemon, then please read this again and try to understand that this is all about the supervisor chain we are building here, where the hardware watchdog supervises systemd, and systemd supervises the individual services. Also, we believe that a service not responding should be treated in a similar way as any other service error. Finally, pinging <tt>/dev/watchdog</tt> is one of the most trivial operations in the OS (basically little more than an ioctl() call), so the support for this is not more than a handful of lines of code. Maintaining this externally with complex IPC between PID 1 (and the daemons) and this watchdog daemon would be drastically more complex, error-prone and resource intensive.)</p> <p>Note that the built-in hardware watchdog support of systemd does not conflict with other watchdog software by default. systemd does not make use of <tt>/dev/watchdog</tt> by default, and you are welcome to use external watchdog daemons in conjunction with systemd, if this better suits your needs.</p> <p>And one last thing: if you wonder whether your hardware has a watchdog, then the answer is: almost definitely yes -- if it is anything more recent than a few years. If you want to verify this, try the <a href="http://karelzak.blogspot.de/2012/05/eject1-sulogin1-wdctl1.html">wdctl</a> tool from recent util-linux, which shows you everything you need to know about your watchdog hardware.</p> <p>I'd like to thank the great folks from <a href="http://www.pengutronix.de/">Pengutronix</a> for contributing most of the watchdog logic. 
Thank you!</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Though actually most desktops tend to include watchdog hardware these days too, as this is cheap to build and available in most modern PC chipsets.</small></p> <p><small>[2] So, here's a free tip for you if you hack on the core OS: don't enable this feature while you hack. Otherwise your system might suddenly reboot if you are in the middle of tracing through PID 1 with gdb and cause it to be stopped for a moment, so that no hardware ping can be done...</small></p> Lennart PoetteringThu, 28 Jun 2012 00:07:00 +0200tag:0pointer.net,2012-06-28:/blog/projects/watchdog.htmlprojectssystemd for Administrators, Part XIVhttps://0pointer.net/blog/projects/self-documented-boot.html <p><a href="http://0pointer.de/blog/projects/systemctl-journal.html">And</a> <a href="http://0pointer.de/blog/projects/security.html">here's</a> <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">fourteenth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>The Self-Explanatory Boot</h4> <p>One complaint we often hear about <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> is that its boot process was hard to understand, even incomprehensible. 
In general I can only disagree with this sentiment; I even believe quite the opposite: in comparison to what we had before -- where to even remotely understand what was going on you had to have a decent comprehension of the programming language that is Bourne Shell<sup>[1]</sup> -- understanding systemd's boot process is substantially easier. However, as with many complaints there is some truth in this frequently heard discomfort: for a seasoned Unix administrator there indeed is a bit of learning to do when the switch to <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> is made. And as systemd developers it is our duty to make the learning curve shallow, introduce as few surprises as we can, and provide good documentation where that is not possible.</p> <p>systemd has always had a huge body of documentation <a href="http://www.freedesktop.org/software/systemd/man/">as manual pages</a> (nearly 100 individual pages now!), in the <a href="http://www.freedesktop.org/wiki/Software/systemd">Wiki</a> and the various blog stories I posted. However, any amount of documentation alone is not enough to make software easily understood. In fact, thick manuals sometimes appear intimidating and make the reader wonder where to start reading, if all he was interested in was this one simple concept of the whole system.</p> <p>Acknowledging all this we have now added a new, neat, little feature to systemd: the self-explanatory boot process. What do we mean by that? Simply that each and every single component of our boot comes with documentation and that this documentation is closely linked to its component, so that it is easy to find.</p> <p>More specifically, all units in systemd (which are what encapsulate the components of the boot) now include references to their documentation, the documentation of their configuration files and further applicable manuals. 
A user who is trying to understand the purpose of a unit, how it fits into the boot process and how to configure it can now easily look up this documentation with the well-known <tt>systemctl status</tt> command. Here's an example how this looks for <tt>systemd-logind.service</tt>:</p> <pre> $ systemctl status systemd-logind.service systemd-logind.service - Login Service Loaded: loaded (/usr/lib/systemd/system/systemd-logind.service; static) Active: active (running) since Mon, 25 Jun 2012 22:39:24 +0200; 1 day and 18h ago Docs: <a href="http://www.freedesktop.org/software/systemd/man/systemd-logind.service.html">man:systemd-logind.service(7)</a> <a href="http://www.freedesktop.org/software/systemd/man/logind.conf.html">man:logind.conf(5)</a> <a href="http://www.freedesktop.org/wiki/Software/systemd/multiseat">http://www.freedesktop.org/wiki/Software/systemd/multiseat</a> Main PID: 562 (systemd-logind) CGroup: name=systemd:/system/systemd-logind.service └ 562 /usr/lib/systemd/systemd-logind Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event2 (Power Button) Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event6 (Video Bus) Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event0 (Lid Switch) Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event1 (Sleep Button) Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event7 (ThinkPad Extra Buttons) Jun 25 22:39:25 epsilon systemd-logind[562]: New session 1 of user gdm. Jun 25 22:39:25 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/42/X11-display. Jun 25 22:39:32 epsilon systemd-logind[562]: New session 2 of user lennart. Jun 25 22:39:32 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/500/X11-display. Jun 25 22:39:54 epsilon systemd-logind[562]: Removed session 1. 
</pre> <p>At first glance this output changed very little. If you look closer however you will find that it now includes one new field: <tt>Docs</tt> lists references to the documentation of this service. In this case there are two man page URIs and one web URL specified. The man pages describe the purpose and configuration of this service, the web URL includes an introduction to the basic concepts of this service.</p> <p>If the user uses a recent graphical terminal implementation it is sufficient to click on the URIs shown to get the respective documentation<sup>[2]</sup>. In other words: it has never been this easy to figure out what a specific component of our boot is about: just use <tt>systemctl status</tt> to get more information about it and click on the links shown to find the documentation.</p> <p>Over the past days I have written man pages and added these references for every single unit we ship with systemd. This means, with <tt>systemctl status</tt> you now have a very easy way to find out more about every single service of the core OS.</p> <p>If you are not using a graphical terminal (where you can just click on URIs), a man page URI in the middle of the output of <tt>systemctl status</tt> is not the most useful thing to have. To make reading the referenced man pages easier we have also added a new command:</p> <pre>systemctl help systemd-logind.service</pre> <p>Which will open the listed man pages right away, without the need to click anything or copy/paste a URI.</p> <p>The URIs are in the formats documented by the <a href="https://www.kernel.org/doc/man-pages/online/pages/man7/url.7.html">uri(7)</a> man page. 
Units may reference http and https URLs, as well as man and info pages.</p> <p>Of course all this doesn't make everything self-explanatory, simply because the user still has to find out about <tt>systemctl status</tt> (and even <tt>systemctl</tt> in the first place so that he even knows what units there are); however with this basic knowledge further help on specific units is in very easy reach.</p> <p>We hope that this kind of interlinking of runtime behaviour and the matching documentation is a big step forward to make our boot easier to understand.</p> <p>This functionality is partially already available in Fedora 17, and will show up in complete form in Fedora 18.</p> <p>That all said, credit where credit is due: this kind of references to documentation within the service descriptions is not new, Solaris' SMF had similar functionality for quite some time. However, we believe this new systemd feature is certainly a novelty on Linux, and with systemd we now offer you the best documented and best self-explaining init system.</p> <p>Of course, if you are writing unit files for your own packages, please consider also including references to the documentation of your services and its configuration. This is really easy to do, just list the URIs in the new <tt>Documentation=</tt> field in the <tt>[Unit]</tt> section of your unit files. For details see <a href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html">systemd.unit(5)</a>. The more comprehensively we include links to documentation in our OS services the easier the work of administrators becomes. 
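</p> <p>To show how little is needed to act on such references, here is a small Python sketch that splits a <tt>Documentation=</tt> value (a space-separated list of URIs) and turns each <tt>man:</tt> URI into a <tt>man</tt> command line. This is an illustration only and makes no claim about how <tt>systemctl help</tt> is implemented. The names <tt>documentation_uris</tt> and <tt>man_command</tt> are made up for this example.</p>

```python
import re

# Matches URIs such as "man:logind.conf(5)": page name, then section in parentheses.
MAN_URI = re.compile(r"man:([^()\s]+)\(([0-9][a-z0-9]*)\)")


def documentation_uris(value):
    """Split a Documentation= value into its space-separated URIs."""
    return value.split()


def man_command(uri):
    """Return the argv that opens a man: URI, or None for any other kind of URI."""
    match = MAN_URI.fullmatch(uri)
    if match is None:
        return None
    page, section = match.groups()
    return ["man", section, page]
```

<p>For <tt>man:systemd-logind.service(7)</tt> the sketch yields <tt>man 7 systemd-logind.service</tt>. Web URLs are left for a browser to handle.</p> <p>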
(To make sure Fedora makes comprehensive use of this functionality <a href="https://fedorahosted.org/fpc/ticket/192">I filed a bug on FPC</a>).</p> <p>Oh, and BTW: if you are looking for a rough overview of systemd's boot process <a href="http://www.freedesktop.org/software/systemd/man/bootup.html">here's another new man page we recently added</a>, which includes a pretty ASCII flow chart of the boot process and the units involved.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Which TBH is a pretty crufty, strange one on top.</small></p> <p><small>[2] Well, <a href="https://bugzilla.gnome.org/show_bug.cgi?id=676452">a terminal where this bug is fixed</a> (used together with <a href="https://bugzilla.gnome.org/show_bug.cgi?id=676482">a help browser where this one is fixed</a>).</small></p> Lennart PoetteringWed, 27 Jun 2012 17:45:00 +0200tag:0pointer.net,2012-06-27:/blog/projects/self-documented-boot.htmlprojectsPresentation in Warsawhttps://0pointer.net/blog/projects/warsaw.html <p>I recently had the chance to speak about <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> and other projects, as well as the politics behind them at a <a href="http://osec.pl/barcamp/lennart">Bar Camp in Warsaw</a>, organized by the fine people of <a href="http://osec.pl/">OSEC</a>. The presentation has been recorded, and has now been posted online. It's a very long recording (1:43h), but it's quite interesting (as I'd like to believe) and contains a bit of background on where we are coming from and where we are going. Anyway, please have a look. 
Enjoy!</p> <iframe width="560" height="315" src="http://www.youtube.com/embed/9UnEV9SPuw8" frameborder="0" allowfullscreen="1"></iframe> <p>I'd like to thank the organizers for this great event and for publishing the recording online.</p> Lennart PoetteringThu, 24 May 2012 22:06:00 +0200tag:0pointer.net,2012-05-24:/blog/projects/warsaw.htmlprojectssystemd for Administrators, Part XIIIhttps://0pointer.net/blog/projects/systemctl-journal.html <p><a href="http://0pointer.de/blog/projects/security.html">Here's</a> <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">thirteenth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Log and Service Status</h4> <p>This one is a short episode. One of the most commonly used commands on a <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> system is <tt>systemctl status</tt> which may be used to determine the status of a service (or other unit). 
It has always been a valuable tool to figure out the processes, runtime information and other metadata of a daemon running on the system.</p> <p>With Fedora 17 we introduced <a href="http://0pointer.de/blog/projects/the-journal.html">the journal</a>, our new logging scheme that provides structured, indexed and reliable logging on systemd systems, while providing a certain degree of compatibility with classic syslog implementations. The original reason we started to work on the journal was one specific feature idea, that to the outsider might appear simple but without the journal is difficult and inefficient to implement: along with the output of <tt>systemctl status</tt> we wanted to show the last 10 log messages of the daemon. Log data is some of the most essential bits of information we have on the status of a service. Hence it is an obvious choice to show it next to the general status of the service.</p> <p>And now to make it short: at the same time as we integrated the journal into <tt>systemd</tt> and Fedora we also hooked up <tt>systemctl</tt> with it. Here's an example output:</p> <pre>$ systemctl status avahi-daemon.service
avahi-daemon.service - Avahi mDNS/DNS-SD Stack
          Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled)
          Active: active (running) since Fri, 18 May 2012 12:27:37 +0200; 14s ago
        Main PID: 8216 (avahi-daemon)
          Status: "avahi-daemon 0.6.30 starting up."
          CGroup: name=systemd:/system/avahi-daemon.service
                  ├ 8216 avahi-daemon: running [omega.local]
                  └ 8217 avahi-daemon: chroot helper

May 18 12:27:37 omega avahi-daemon[8216]: Joining mDNS multicast group on interface eth1.IPv4 with address 172.31.0.52.
May 18 12:27:37 omega avahi-daemon[8216]: New relevant interface eth1.IPv4 for mDNS.
May 18 12:27:37 omega avahi-daemon[8216]: Network interface enumeration completed.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for fd00::e269:95ff:fe87:e282 on eth1.*.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 172.31.0.52 on eth1.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering HINFO record with values 'X86_64'/'LINUX'.
May 18 12:27:38 omega avahi-daemon[8216]: Server startup complete. Host name is omega.local. Local service cookie is 3555095952.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/ssh.service) successfully established.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/sftp-ssh.service) successfully established.</pre> <p>This, of course, shows the status of everybody's favourite mDNS/DNS-SD daemon with a list of its processes, along with -- as promised -- the 10 most recent log lines. Mission accomplished!</p> <p>There are a couple of switches available to alter the output slightly and adjust it to your needs. The two most interesting switches are <tt>-f</tt> to enable follow mode (as in <tt>tail -f</tt>) and <tt>-n</tt> to change the number of lines to show (you guessed it, as in <tt>tail -n</tt>).</p> <p>The log data shown comes from three sources: everything any of the daemon's processes logged with libc's <tt>syslog()</tt> call, everything submitted using the native Journal API, plus everything any of the daemon's processes logged to STDOUT or STDERR. In short: everything the daemon generates as log data is collected, properly interleaved and shown in the same format.</p> <p>And that's it already for today. It's a very simple feature, but an immensely useful one for every administrator. 
One of the kind "Why didn't we already do this 15 years ago?".</p> <p>Stay tuned for the next installment!</p> Lennart PoetteringFri, 18 May 2012 12:37:00 +0200tag:0pointer.net,2012-05-18:/blog/projects/systemctl-journal.htmlprojectsBoot & Base OS Miniconf at Linux Plumbers Conference 2012, San Diegohttps://0pointer.net/blog/projects/lpc2012.html <p style="text-align: center"><a href="http://www.linuxplumbersconf.org/2012/"><img src="http://www.linuxplumbersconf.org/2012/style/tagline.png" width="493" height="90" alt="Linux Plumbers Conference Logo" /></a></p> <p>We are working on putting together <a href="http://wiki.linuxplumbersconf.org/2012:boot_and_base_os">a miniconf on the topic of Boot &amp; Base OS</a> for the Linux Plumbers Conference 2012 in San Diego (Aug 29-31). And we need your submission!</p> <p>Are you working on some exciting project related to Boot and Base OS and would like to present your work? Then please submit something <a href="http://www.linuxplumbersconf.org/2012/2012-lpc-call-for-proposals-take-2/">following these guidelines</a>, but please CC Kay Sievers and Lennart Poettering.</p> <p>I hope that at this point the Linux Plumbers Conference needs little introduction, so I will spare any further prose on how great and useful and the best conference ever it is for everybody who works on the plumbing layer of Linux. However, there's one conference that will be co-located with LPC that is still little known, because it happens for the first time: <a href="http://www.cconf.org/">The C Conference</a>, organized by Brandon Philips and friends. It covers all things C, and they are still looking for more topics, in a <a href="http://www.cconf.org/pfc/">reverse CFP</a>. 
Please consider submitting a proposal and registering for the conference!</p> <p style="text-align: center"><a href="http://www.cconf.org/"><img src="http://www.cconf.org/assets/cconf.png" width="270" height="270" alt="C Conference Logo" /></a></p> Lennart PoetteringThu, 03 May 2012 20:42:00 +0200tag:0pointer.net,2012-05-03:/blog/projects/lpc2012.htmlprojectsThe Most Awesome, Least-Advertised Fedora 17 Featurehttps://0pointer.net/blog/projects/multi-seat.html <p>There's one feature in the upcoming Fedora 17 release that is immensely useful but very little known, since its <a href="https://fedoraproject.org/wiki/Features/ckremoval">feature page 'ckremoval'</a> does not explicitly refer to it in its name: true <i>automatic multi-seat</i> support for Linux.</p> <p>A multi-seat computer is a system that offers not only one local seat for a user, but multiple, at the same time. A seat refers to a combination of a screen, a set of input devices (such as mice and keyboards), and maybe an audio card or webcam, as an individual local workplace for a user. A multi-seat computer can drive an entire class room of seats with only a fraction of the cost in hardware, energy, administration and space: you only have one PC, which usually has more than enough CPU power to drive 10 or more workplaces. (In fact, even a Netbook is fast enough to drive a couple of seats!) <i>Automatic multi-seat</i> refers to an entirely automatically managed seat setup: whenever a new seat is plugged in a new login screen immediately appears -- without any manual configuration -- and when the seat is unplugged all user sessions on it are removed without delay.</p> <p>In Fedora 17 we added this functionality to the low-level user and device tracking of systemd, replacing the previous ConsoleKit logic that lacked support for automatic multi-seat. 
With all the ground work done in systemd, udev and the other components of our plumbing layer the last remaining bits were surprisingly easy to add.</p> <p>Currently, the automatic multi-seat logic works best with the USB multi-seat hardware from <a href="http://www.amazon.com/Plugable-Universal-DisplayLink-1920x1080-High-Speed/dp/B002PONXAI/ref=sr_1_3?ie=UTF8&amp;qid=1335904746&amp;sr=8-3">Plugable</a> you can buy cheaply on <a href="http://www.amazon.com/Plugable-DC-125-Docking-Station-Multiseat/dp/B004PXPPNA/ref=sr_1_10?ie=UTF8&amp;qid=1335904746&amp;sr=8-10">Amazon (US)</a>. These devices require exactly zero configuration with the new scheme implemented in Fedora 17: just plug them in at any time, login screens pop up on them, and you have your additional seats. Alternatively you can also assemble your seat manually with a few easy <a href="http://www.freedesktop.org/software/systemd/man/loginctl.html">loginctl attach</a> commands, from any kind of hardware you might have lying around. To get a full seat you need multiple graphics cards, keyboards and mice: one set for each seat. (Later on we'll probably have a graphical setup utility for additional seats, but that's not a pressing issue we believe, as the plug-n-play multi-seat support with the Plugable devices is so awesomely nice.)</p> <p>Plugable provided us for free with hardware for testing multi-seat. They are also involved with the upstream development of the USB DisplayLink driver for Linux. Due to their positive involvement with Linux we can only recommend to buy their hardware. They are good guys, and support Free Software the way all hardware vendors should! (And besides that, their hardware is also nicely put together. 
For example, in contrast to most similar vendors they actually assign proper vendor/product IDs to their USB hardware so that we can easily recognize their hardware when plugged in to set up automatic seats.)</p> <p>Currently, all this magic is only implemented in the GNOME stack with the biggest component getting updated being the GNOME Display Manager. On the Plugable USB hardware you get a full GNOME Shell session with all the usual graphical gimmicks, the same way as on any other hardware. (Yes, GNOME 3 works perfectly fine on simpler graphics cards such as these USB devices!) If you are hacking on a different desktop environment, or on a different display manager, please have a look at <a href="http://www.freedesktop.org/wiki/Software/systemd/multiseat">the multi-seat documentation</a> we put together, and particularly at our short piece about <a href="http://www.freedesktop.org/wiki/Software/systemd/writing-display-managers">writing display managers</a> which are multi-seat capable.</p> <p>If you work on a major desktop environment or display manager and would like to implement multi-seat support for it, but lack the aforementioned Plugable hardware, we might be able to provide you with the hardware for free. Please contact us directly, and we might be able to send you a device. Note that we don't have unlimited devices available, hence we'll probably not be able to pass hardware to everybody who asks, and we will pass the hardware preferably to people who work on well-known software or otherwise have contributed good code to the community already. Anyway, if in doubt, ping us, and explain to us why you should get the hardware, and we'll consider you! 
(Oh, and this applies not only to display managers: if you hack on some other software where multi-seat awareness would be truly useful, then don't hesitate and ping us!)</p> <p>Phoronix has <a href="http://www.phoronix.com/scan.php?page=article&amp;item=plugable_multiseat_kick">this story about this new multi-seat</a> support which is quite interesting and full of pictures. Please have a look.</p> <p>Plugable started a <a href="http://www.kickstarter.com/projects/1666707630/plugable-thin-client-the-50-computer">Pledge drive</a> to lower the price of the Plugable USB multi-seat terminals further. It's full of pictures (<a href="http://www.kickstarter.com/projects/1666707630/plugable-thin-client-the-50-computer/widget/video.html"><b>and a video showing all this in action!</b></a>), and uses the code we now make available in Fedora 17 as its base. Please consider pledging a few bucks.</p> <p>Recently David Zeuthen <a href="https://plus.google.com/110773474140772402317/posts/NqPUifsFUYH">added multi-seat support to udisks</a> as well. With this in place, a user logged in on a specific seat can only see the USB storage plugged into his individual seat, but does not see any USB storage plugged into any other local seat. With this in place we filled in the last missing bit of multi-seat support in our desktop stack.</p> <p>With this code in Fedora 17 we cover the big use cases of multi-seat already: internet cafes, class rooms and similar installations can provide PC workplaces cheaply and easily without any manual configuration. Later on we want to build on this and make this useful for different uses too: for example, the ability to get a login screen as easily as plugging in a USB connector makes this useful not only for saving money in setups for many people, but also in embedded environments (consider monitoring/debugging screens made available via this hotplug logic) or servers (get trivially quick local access to your otherwise head-less server). 
To be truly useful in these areas we need one more thing though: the ability to run a simple getty (i.e. text login) on the seat, without necessarily involving a graphical UI.</p> <p>The well-known X successor Wayland already comes out of the box with multi-seat support based on this logic.</p> <p>Oh, and BTW, as Ubuntu appears to be "<i>focussing</i>" on "<i>clarity</i>" in the "<i>cloud</i>" now ;-), and chose Upstart instead of systemd, this feature won't be available in Ubuntu any time soon. That's (one detail of) the price Ubuntu has to pay for choosing to maintain its own (largely legacy, such as ConsoleKit) plumbing stack.</p> <p>Multi-seat has a long history on Unix. Since the earliest days Unix systems could be accessed by multiple local terminals at the same time. Since then local terminal support (and hence multi-seat) gradually moved out of view in computing. Very few machines these days have more than one seat; the concept of terminals survived almost exclusively in the context of PTYs (i.e. fully virtualized API objects, disconnected from any real hardware seat) and VCs (i.e. a single virtualized local seat), but hardly in any other way (well, server setups still use serial terminals for emergency remote access, but they almost never have more than one serial terminal). Everything we do in systemd is based on the ideas originally brought forward in Unix; with systemd we now try to bring back a number of the good ideas of Unix that since the old times were lost by the roadside. For example, in true Unix style we already started to expose the concept of a service in the file system (in <tt>/sys/fs/cgroup/systemd/system/</tt>), something where on Linux the (often misunderstood) "<i>everything is a file</i>" mantra previously fell short. 
With automatic multi-seat support we bring back support for terminals, but updated with all the features of today's desktops: plug and play, zero configuration, full graphics, and not limited to input devices and screens, but extending to all kinds of devices, such as audio, webcams or USB memory sticks.</p> <p>Anyway, this is all for now; I'd like to thank everybody who was involved with making multi-seat work so nicely and natively on the Linux platform. You know who you are! Thanks a ton!</p> Lennart PoetteringTue, 01 May 2012 23:07:00 +0200tag:0pointer.net,2012-05-01:/blog/projects/multi-seat.htmlprojectssystemd Status Updatehttps://0pointer.net/blog/projects/systemd-update-3.html <p><a href="http://0pointer.de/blog/projects/systemd-update-2.html">It has been way too long since my last status update on systemd</a>. Here's another short, non-comprehensive status update on what we worked on for <a href="http://freedesktop.org/wiki/Software/systemd">systemd</a> since then.</p> <p>We have been working hard to turn systemd into the most viable set of components to build operating systems, appliances and devices from, and make it the best choice for servers, for desktops and for embedded environments alike. I think we have a really convincing set of features now, but we are actively working on making it even better.</p> <p>Here's a list of some more and some less interesting features, in no particular order:</p> <ol> <li>We added an automatic pager to <tt>systemctl</tt> (and related tools), similar to how <tt>git</tt> has it.</li> <li><tt>systemctl</tt> learnt a new switch <tt>--failed</tt>, to show only failed services.</li> <li>You may now start services immediately, overriding all dependency logic, by passing <tt>--ignore-dependencies</tt> to <tt>systemctl</tt>. 
This is mostly a debugging tool and nothing people should use in real life.</li> <li>Sending <tt>SIGKILL</tt> as the final part of the implicit shutdown logic of services is now optional and may be configured with the <tt>SendSIGKILL=</tt> option individually for each service.</li> <li>We split off the Vala/Gtk tools into their own project <tt>systemd-ui</tt>.</li> <li><tt>systemd-tmpfiles</tt> learnt file globbing and creating FIFO special files as well as character and block device nodes, and symlinks. It also is capable of relabelling certain directories at boot now (in the SELinux sense).</li> <li>Immediately before shutting down we will now invoke all binaries found in <tt>/lib/systemd/system-shutdown/</tt>, which is useful for debugging late shutdown.</li> <li>You may now globally control where STDOUT/STDERR of services goes (unless individual service configuration overrides it).</li> <li>There's a new <tt>ConditionVirtualization=</tt> option, that makes systemd skip a specific service if a certain virtualization technology is found or not found. Similarly, we now have a new option to detect whether a certain security technology (such as SELinux) is available, called <tt>ConditionSecurity=</tt>. There's also <tt>ConditionCapability=</tt> to check whether a certain process capability is in the capability bounding set of the system. 
There are also new <tt>ConditionFileIsExecutable=</tt>, <tt>ConditionPathIsMountPoint=</tt>, <tt>ConditionPathIsReadWrite=</tt> and <tt>ConditionPathIsSymbolicLink=</tt> options.</li> <li>The file system condition directives now support globbing.</li> <li>Service conditions may now be "triggering" or "mandatory", meaning that they can be a necessary requirement to hold for a service to start, or simply one trigger among many.</li> <li>At boot time we now print warnings if <a href="http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken"><tt>/usr</tt> is on a split-off partition but not already mounted by an initrd</a>; if <tt>/etc/mtab</tt> is not a symlink to <tt>/proc/mounts</tt>; or if <a href="http://0pointer.de/blog/projects/cgroups-vs-cgroups.html">CONFIG_CGROUPS is not enabled in the kernel</a>. We'll also expose this as a <i>tainted</i> flag on the bus.</li> <li>You may now boot the same OS image on a bare metal machine and in Linux namespace containers and will get a clean boot in both cases. This is more complicated than it sounds since device management with udev or write access to <tt>/sys</tt>, <tt>/proc/sys</tt> or things like <tt>/dev/kmsg</tt> is not available in a container. This makes systemd a first-class choice for managing thin container setups. This is all tested with systemd's own <tt>systemd-nspawn</tt> tool but should work fine in LXC setups, too. Basically this means that you do not have to adjust your OS manually to make it work in a container environment; it will just work out of the box. It also makes it easier to convert real systems into containers.</li> <li>We now automatically spawn gettys on HVC ttys when booting in VMs.</li> <li>We introduced <tt>/etc/machine-id</tt> as a generalization of D-Bus machine ID logic. See <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">this blog story for more information</a>. On stateless/read-only systems the machine ID is initialized randomly at boot. 
In virtualized environments it may be passed in from the machine manager (with qemu's <tt>-uuid</tt> switch, or via the <a href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface">container interface</a>).</li> <li>All of the systemd-specific <tt>/etc/fstab</tt> mount options are now in the <tt>x-systemd-<i>xyz</i></tt> format.</li> <li>To make it easy to find non-converted services we will now implicitly prefix all LSB and SysV init script descriptions with the strings "<tt>LSB:</tt>" resp. "<tt>SYSV:</tt>".</li> <li>We introduced <tt>/run</tt> and made it a hard dependency of systemd. This directory is now widely accepted and implemented on all relevant Linux distributions.</li> <li>systemctl can now execute all its operations remotely too (<tt>-H</tt> switch).</li> <li>We now ship <a href="http://0pointer.de/blog/projects/changing-roots.html">systemd-nspawn</a>, a really powerful tool that can be used to start containers for debugging, building and testing, much like chroot(1). It is useful to just get a shell inside a build tree, but is good enough to boot up a full system in it, too.</li> <li>If we query the user for a hard disk password at boot he may hit TAB to hide the asterisks we normally show for each key that is entered, for extra paranoia.</li> <li>We don't enable <tt>udev-settle.service</tt> anymore, which is only required for certain legacy software that still hasn't been updated to follow devices coming and going cleanly.</li> <li>We now include a tool that can plot boot speed graphs, similar to bootchartd, called <a href="http://0pointer.de/blog/projects/blame-game.html"><tt>systemd-analyze</tt></a>.</li> <li>At boot, we now initialize the kernel's <tt>binfmt_misc</tt> logic with the data from <tt>/etc/binfmt.d</tt>.</li> <li><tt>systemctl</tt> now recognizes if it is run in a <tt>chroot()</tt> environment and will work accordingly (i.e. apply changes to the tree it is run in, instead of talking to the actual PID 1 for this). 
It also has a new <tt>--root=</tt> switch to work on an OS tree from outside of it.</li> <li>There's a new unit dependency type <tt>OnFailureIsolate=</tt> that allows entering a different target whenever a certain unit fails. For example, this is interesting to enter emergency mode if file system checks of crucial file systems failed.</li> <li>Socket units may now listen on Netlink sockets, special files from <tt>/proc</tt> and POSIX message queues, too.</li> <li>There's a new <tt>IgnoreOnIsolate=</tt> flag which may be used to ensure certain units are left untouched by isolation requests. There's a new <tt>IgnoreOnSnapshot=</tt> flag which may be used to exclude certain units from snapshot units when they are created.</li> <li>There's now small mechanism services <a href="http://www.freedesktop.org/wiki/Software/systemd/hostnamed">for changing the local hostname and other host meta data</a>, <a href="http://www.freedesktop.org/wiki/Software/systemd/localed">changing the system locale and console settings</a> and the <a href="http://www.freedesktop.org/wiki/Software/systemd/timedated">system clock</a>.</li> <li>We now limit the capability bounding set for a number of our internal services by default.</li> <li>Plymouth may now be disabled globally with <tt>plymouth.enable=0</tt> on the kernel command line.</li> <li>We now disallocate VTs when a getty finished running (and optionally other tools run on VTs). 
This adds extra security since it clears the scrollback buffer so that subsequent users cannot get access to a user's session output.</li> <li>In socket units there are now options to control the <tt>IP_TRANSPARENT</tt>, <tt>SO_BROADCAST</tt>, <tt>SO_PASSCRED</tt>, <tt>SO_PASSSEC</tt> socket options.</li> <li>The receive and send buffers of socket units may now be set larger than the default system settings if needed by using SO_{RCV,SND}BUFFORCE.</li> <li>We now set the hardware timezone as one of the first things in PID 1, in order to avoid time jumps during normal userspace operation, and to guarantee sensible times in all generated logs. We also no longer save the system clock to the RTC on shutdown, assuming that this is done by the clock control tool when the user modifies the time, or automatically by the kernel if NTP is enabled.</li> <li>The SELinux directory got moved from <tt>/selinux</tt> to <tt>/sys/fs/selinux</tt>.</li> <li>We added a small service <tt>systemd-logind</tt> that keeps track of logged-in users and their sessions. It creates control groups for them, implements the <a href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html">XDG_RUNTIME_DIR specification</a> for them, maintains seats and device node ACLs and implements shutdown/idle inhibiting for clients. It auto-spawns gettys on all local VTs when the user switches to them (instead of starting six of them unconditionally), thus reducing the resource footprint by default. It has a D-Bus interface as well as <a href="http://www.freedesktop.org/software/systemd/man/sd-login.html">a simple synchronous library interface</a>. This mechanism obsoletes ConsoleKit which is now deprecated and should no longer be used.</li> <li>There's now full, automatic multi-seat support, and this is enabled in GNOME 3.4. 
Just by plugging in new seat hardware you get a new login screen on your seat's screen.</li> <li>There is now an option <tt>ControlGroupModify=</tt> to allow services to change the properties of their control groups dynamically, and one to make control groups persistent in the tree (<tt>ControlGroupPersistent=</tt>) so that they can be created and maintained by external tools.</li> <li>We now jump back into the <tt>initrd</tt> on shutdown, so that it can detach the root file system and the storage devices backing it. This allows us (for the first time!) to reliably undo complex storage setups on shutdown and leave them in a clean state.</li> <li><tt>systemctl</tt> now supports <i>presets</i>, a way for distributions and administrators to define their own policies on whether services should be enabled or disabled by default on package installation.</li> <li><tt>systemctl</tt> now has high-level verbs for masking/unmasking units. There's also a new command (<tt>systemctl list-unit-files</tt>) for determining the list of all installed unit files and whether they are enabled or not.</li> <li>We now apply <tt>sysctl</tt> variables to each new network device, as it appears. This makes <tt>/etc/sysctl.d</tt> compatible with hot-plug network devices.</li> <li>There's limited profiling for SELinux start-up performance built into PID 1.</li> <li>There's a new switch <a href="http://0pointer.de/blog/projects/security.html"><tt>PrivateNetwork=</tt></a> to turn off any network access for a specific service.</li> <li>Service units may now include configuration for control group parameters. A few (such as <tt>MemoryLimit=</tt>) are exposed with high-level options, and all others are available via the generic <tt>ControlGroupAttribute=</tt> setting.</li> <li>There's now the option to mount certain cgroup controllers jointly at boot. 
We do this now for <tt>cpu</tt> and <tt>cpuacct</tt> by default.</li> <li>We added <a href="https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTs">the journal</a> and turned it on by default.</li> <li>All service output is now written to the Journal by default, regardless of whether it is sent via syslog or simply written to stdout/stderr. Both message streams end up in the same location and are interleaved the way they should be. All log messages, even from the kernel and from early boot, end up in the journal. Now, no service output goes unnoticed; everything is saved and indexed at the same location.</li> <li><tt>systemctl status</tt> will now show the last 10 log lines for each service, directly from the journal.</li> <li>We now show the progress of <tt>fsck</tt> at boot on the console, again. We also show the much loved colorful <tt>[ OK ]</tt> status messages at boot again, as known from most SysV implementations.</li> <li>We merged udev into systemd.</li> <li>We implemented and documented interfaces to <a href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface">container managers</a> and <a href="http://www.freedesktop.org/wiki/Software/systemd/InitrdInterface">initrds</a> for passing execution data to systemd. We also implemented and documented <a href="http://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons">an interface for storage daemons that are required to back the root file system</a>.</li> <li>There are two new options in service files to propagate reload requests between several units.</li> <li><tt>systemd-cgls</tt> won't show kernel threads by default anymore, or show empty control groups.</li> <li>We added a new tool <tt>systemd-cgtop</tt> that shows resource usage of whole services in a top(1)-like fashion.</li> <li>systemd may now supervise services in watchdog style. 
If enabled for a service the daemon has to ping PID 1 at regular intervals or is otherwise considered failed (which might then result in restarting it, or even rebooting the machine, as configured). Also, PID 1 is capable of pinging a hardware watchdog. Putting this together, the hardware watches over PID 1, and PID 1 in turn watches over specific services. This is highly useful for high-availability servers as well as embedded machines. Since watchdog hardware is nowadays built into all modern chipsets (including desktop chipsets), this should hopefully help to make this a more widely used functionality.</li> <li>We added support for a new kernel command line option <tt>systemd.setenv=</tt> to set an environment variable system-wide.</li> <li>By default services which are started by systemd will have SIGPIPE set to ignored. The Unix SIGPIPE logic is used to reliably implement shell pipelines and when left enabled in services is usually just a source of bugs and problems.</li> <li>You may now configure the rate limiting that is applied to restarts of specific services. Previously the rate limiting parameters were hard-coded (similar to SysV).</li> <li>There's now support for loading the IMA integrity policy into the kernel early in PID 1, similar to how we already did it with the SELinux policy.</li> <li>There's now an official API to schedule and query scheduled shutdowns.</li> <li>We changed the license from GPL2+ to LGPL2.1+.</li> <li>We made <a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"><tt>systemd-detect-virt</tt></a> an official tool in the tool set. 
Since we already had code to detect certain VM and container environments we now added an official tool for administrators to make use of in shell scripts and suchlike.</li> <li>We documented <a href="http://www.freedesktop.org/wiki/Software/systemd/InterfacePortabilityAndStabilityChart">numerous interfaces</a> systemd introduced.</li> </ol> <p>Much of the stuff above is already available in Fedora 15 and 16, or will be made available in the upcoming Fedora 17.</p> <p>And that's it for now. There's a lot of other stuff in the git commits, but most of it is smaller and I will thus spare you the details.</p> <p>I'd like to thank everybody who contributed to systemd over the past years.</p> <p>Thanks for your interest!</p> Lennart PoetteringSat, 21 Apr 2012 00:17:00 +0200tag:0pointer.net,2012-04-21:/blog/projects/systemd-update-3.htmlprojectsControl Groups vs. Control Groupshttps://0pointer.net/blog/projects/cgroups-vs-cgroups.html <p><i>TL;DR: <a href="http://www.freedesktop.org/wiki/Software/systemd/">systemd</a> does not require the performance-sensitive bits of Linux control groups to be enabled in the kernel. However, it does require some non-performance-sensitive bits of the control group logic.</i></p> <p>In some areas of the community there's still some confusion about Linux control groups and their performance impact, and what precisely it is that systemd requires of them. In the hope of clearing this up a bit, I'd like to point out a few things:</p> <p>Control Groups are two things: <b>(A)</b> <i>a way to hierarchically group and label processes</i>, and <b>(B)</b> <i>a way to then apply resource limits</i> to these groups. systemd only requires the former (A), and not the latter (B). That means you can compile your kernel without any control group resource controllers (B) and systemd will work perfectly on it. 
However, if you in addition disable the grouping feature entirely (A) then systemd will loudly complain at boot and proceed only reluctantly with a big warning and in a limited functionality mode.</p> <p>At compile time, the grouping/labelling feature in the kernel is enabled by CONFIG_CGROUPS=y, the individual controllers by CONFIG_CGROUP_FREEZER=y, CONFIG_CGROUP_DEVICE=y, CONFIG_CGROUP_CPUACCT=y, CONFIG_CGROUP_MEM_RES_CTLR=y, CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y, CONFIG_CGROUP_MEM_RES_CTLR_KMEM=y, CONFIG_CGROUP_PERF=y, CONFIG_CGROUP_SCHED=y, CONFIG_BLK_CGROUP=y, CONFIG_NET_CLS_CGROUP=y, CONFIG_NETPRIO_CGROUP=y. And since (as mentioned) we only need the former (A), not the latter (B) you may disable all of the latter options while enabling CONFIG_CGROUPS=y, if you want to run systemd on your system.</p> <p>What about the performance impact of these options? Well, every bit of code comes at some price, so none of these options come entirely for free. However, the grouping feature (A) alters the general logic very little, it just sticks hierarchical labels on processes, and its impact is minimal since that is usually not in any hot path of the OS. This is different for the various controllers (B) which have a much bigger impact since they influence the resource management of the OS and are full of hot paths. This means that the kernel feature that systemd mandatorily requires (A) has a minimal effect on system performance, but the actually performance-sensitive features of control groups (B) are entirely optional.</p> <p>On boot, systemd will mount all controller hierarchies it finds enabled in the kernel to individual directories below <tt>/sys/fs/cgroup/</tt>. This is the official place where kernel controllers are mounted to these days. The <tt>/sys/fs/cgroup/</tt> mount point in the kernel was created precisely for this purpose. 
Since the control group controllers are a shared facility that might be used by a number of different subsystems <a href="http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups">a few projects have agreed on a set of rules to keep the various bits of code from stepping on each other's toes when using these directories</a>. </p> <p>systemd will also maintain its own, private, controller-less, named control group hierarchy which is mounted to <tt>/sys/fs/cgroup/systemd/</tt>. This hierarchy is private property of systemd, and other software should not try to interfere with it. This hierarchy is how systemd makes use of the naming and grouping feature of control groups (A) without actually requiring any kernel controller enabled for that.</p> <p>Now, you might notice that by default systemd does create per-service cgroups in the "cpu" controller if it finds it enabled in the kernel. This is entirely optional, however. We chose to make use of it by default to even out CPU usage between system services. Example: On a traditional web server machine Apache might end up having 100 CGI worker processes around, while MySQL only has 5 processes running. Without the use of the "cpu" controller this means that Apache altogether ends up having 20x more CPU available than MySQL since the kernel tries to provide every process with the same amount of CPU time. On the other hand, if we add these two services to the "cpu" controller in individual groups by default, Apache and MySQL get the same amount of CPU, which we think is a good default.</p> <p>Note that if the CPU controller is not enabled in the kernel systemd will not attempt to make use of the "cpu" hierarchy as described above. 
Also, even if it is enabled in the kernel it is trivial to tell systemd not to make use of it: Simply edit <tt>/etc/systemd/system.conf</tt> and set <tt>DefaultControllers=</tt> to the empty string.</p> <p>Let's discuss a few frequently heard complaints regarding systemd's use of control groups:</p> <ul> <li><b>systemd mounts all controllers to <tt>/sys/fs/cgroup/</tt> even though my software requires it at <tt>/dev/cgroup/</tt> (or some other place)!</b> The standardization of <tt>/sys/fs/cgroup/</tt> as mount point of the hierarchies is a relatively recent change in the kernel. Some software has not been updated yet for it. If you cannot change the software in question you are welcome to unmount the hierarchies from <tt>/sys/fs/cgroup/</tt> and mount them wherever you need them instead. However, make sure to leave <tt>/sys/fs/cgroup/systemd/</tt> untouched.</li> <li><b>systemd makes use of the "cpu" hierarchy, but it should leave its dirty fingers from it!</b> As mentioned above, just set the <tt>DefaultControllers=</tt> option of systemd to the empty string.</li> <li><b>I need my two controllers "foo" and "bar" mounted into one hierarchy, but systemd mounts them in two!</b> Use the <tt>JoinControllers=</tt> setting in <tt>/etc/systemd/system.conf</tt> to mount several controllers into a single hierarchy.</li> <li><b>Control groups are evil and they make everything slower!</b> Well, please read the text above and understand the difference between "control-groups-as-in-naming-and-grouping" (A) and "cgroups-as-in-controllers" (B). Then, please turn off all controllers in your kernel build (B) but leave CONFIG_CGROUPS=y (A) enabled.</li> <li><b>I have heard <i>some</i> kernel developers really hate control groups and think systemd is evil because it requires them!</b> Well, there are a couple of things behind the dislike of control groups by some folks. 
Primarily, this is probably because the hackers in question do not distinguish between the naming-and-grouping bits of the control group logic (A) and the controllers that are based on it (B). Mainly, their beef is with the latter (which systemd does not require, which is the key point I am trying to make in the text above), but there are other issues as well: for example, the code of the grouping logic is not the most beautiful bit of code ever written by man (which is thankfully likely to get better now, since the control groups subsystem now has an active maintainer again). And then for some developers it is important that they can compare the runtime behaviour of many historic kernel versions in order to find bugs (git bisect). Since systemd requires kernels with basic control group support enabled, and this is a relatively recent feature addition to the kernel, this makes it difficult for them to use a newer distribution with all these old kernels that predate cgroups. Anyway, the summary is probably that what matters to developers is different from what matters to users and administrators.</li> </ul> <p>I hope this explanation was useful for a reader or two! Thank you for your time!</p> Lennart PoetteringTue, 10 Apr 2012 19:09:00 +0200tag:0pointer.net,2012-04-10:/blog/projects/cgroups-vs-cgroups.htmlprojectsGUADEC 2012 CFP Ending Soon!https://0pointer.net/blog/projects/guadec-2012-cfp.html <p>In case you haven't submitted your talk proposal for GUADEC 2012 in A Coru&ntilde;a, Spain yet, hurry: the deadline is on April 14th, i.e. this Saturday! 
<a href="http://www.guadec.org/cfp">Read the Call for Participation!</a> <a href="https://www.gpul.org/indico/abstractSubmission.py?confId=0">Submit a proposal!</a></p> Lennart PoetteringTue, 10 Apr 2012 17:40:00 +0200tag:0pointer.net,2012-04-10:/blog/projects/guadec-2012-cfp.htmlprojects/tmp or not /tmp?https://0pointer.net/blog/projects/tmp.html <p>A number of Linux distributions have recently switched (or started switching) to <tt>/tmp</tt> on tmpfs by default (ArchLinux, Debian among others). Other distributions have plans/are discussing doing the same (Ubuntu, OpenSUSE). Since we believe this is a good idea and it's good to keep the delta between the distributions minimal <a href="https://fedoraproject.org/wiki/Features/tmp-on-tmpfs">we are proposing the same for Fedora 18, too</a>. On Solaris a similar change was already implemented in 1994 (and other Unixes have made a similar change long ago, too). Yet, not all of our software is written in a way that it works nicely together with <tt>/tmp</tt> on tmpfs.</p> <p>Another <a href="https://fedoraproject.org/wiki/Features/ServicesPrivateTmp">Fedora feature (for Fedora 17)</a> changed the semantics of <tt>/tmp</tt> for many system services to make them more secure, by isolating the /tmp namespaces of the various services. Handling of temporary files in <tt>/tmp</tt> has been security sensitive ever since it was introduced, because it traditionally has been a world writable, shared namespace, and unless all user code safely uses randomized file names it is vulnerable to DoS attacks and worse.</p> <p>In this blog story I'd like to shed some light on proper usage of <tt>/tmp</tt> and what your Linux application should use for what purpose. We'll not discuss why <tt>/tmp</tt> on tmpfs is a good idea, for that refer to the <a href="https://fedoraproject.org/wiki/Features/tmp-on-tmpfs">Fedora feature page</a>. 
Here we'll just discuss what <tt>/tmp</tt> should be used for and for what it shouldn't be, as well as what should be used instead. All that in order to make sure your application remains compatible with these new features introduced to many newer Linux distributions.</p> <p><tt>/tmp</tt> is (as the name suggests) an area where applications may place the temporary files they require during operation. Of course, temporary files differ very much in their properties:</p> <ul> <li>They can be large, or very small</li> <li>They might be used for sharing between users, or be private to users</li> <li>They might need to be persistent across boots, or very volatile</li> <li>They might need to be machine-local or shared on the network</li> </ul> <p>Traditionally, <tt>/tmp</tt> has not only been the place where actual temporary files are stored, but some software used to place (and often still continues to place) communication primitives such as sockets, FIFOs, shared memory there as well. Notably X11, but many others too. Usage of world-writable shared namespaces for communication purposes has always been problematic, since to establish communication you need stable names, but stable names open the doors for DoS attacks. This can be mitigated by establishing protected per-app directories for certain services during early boot (like we do for X11), but this fixes the problem only partially, since it works correctly only if every package installation is followed by a reboot.</p> <p>Besides <tt>/tmp</tt> there are various other places where temporary files (or other files that traditionally have been stored in <tt>/tmp</tt>) can be stored. Here's a quick overview of the candidates:</p> <ul> <li><tt>/tmp</tt>, POSIX suggests this is flushed at boot, FHS says that files do not need to be persistent between two runs of the application. Old files are often cleaned up automatically after a time ("aging"). 
Usually it is recommended to use $TMPDIR if it is set before falling back to <tt>/tmp</tt> directly. As mentioned, this is a tmpfs on many Linuxes/Unixes (and most likely will be for most soon), and hence should be used only for small files. It's generally a shared namespace, hence the only APIs for using it should be <a href="http://linux.die.net/man/3/mkstemp"><tt>mkstemp()</tt></a>, <a href="http://linux.die.net/man/3/mkdtemp"><tt>mkdtemp()</tt></a> (and friends) to be entirely safe.<sup>[1]</sup> Recently, improvements have been made to turn this shared namespace into a private namespace (see above), but that doesn't relieve developers from writing secure code that is also safe if <tt>/tmp</tt> is a shared namespace. Because <tt>/tmp</tt> is no longer necessarily a shared namespace it is generally unsuitable as a location for communication primitives. It is machine-private and local. It's usually fully featured (locking, ...). This directory is world writable and thus available for both privileged and unprivileged code.</li> <li><tt>/var/tmp</tt>, according to FHS "more persistent" than <tt>/tmp</tt>, and is less often cleaned up (it's persistent across reboots, for example). It's not on a tmpfs, but on a real disk, and hence can be used to store much larger files. The same namespace problems apply as with <tt>/tmp</tt>, hence also exclusively use <tt>mkstemp()</tt>/<tt>mkdtemp()</tt> for this directory. It is also automatically cleaned up by time. It is machine-private. It's not necessarily fully featured (no locking, ...). This directory is world writable and thus available for both privileged and unprivileged code. We suggest also checking <tt>$TMPDIR</tt> before falling back to <tt>/var/tmp</tt>. That way if <tt>$TMPDIR</tt> is set this overrides usage of both <tt>/tmp</tt> and <tt>/var/tmp</tt>.</li> <li><tt>/run</tt> (traditionally <tt>/var/run</tt>) where privileged daemons can store runtime data, such as communication primitives. 
This is where your daemon should place its sockets. It's guaranteed to be a shared namespace, but is only writable by privileged code and hence very safe to use. This file system is guaranteed to be a tmpfs and is hence automatically flushed at boot. No automatic clean-up is done beyond that. It is machine-private and local. It is fully-featured, and provides all functionality the local OS can provide (locking, sockets, ...).</li> <li><tt><a href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html">$XDG_RUNTIME_DIR</a></tt> where unprivileged user software can store runtime data, such as communication primitives. This is similar to <tt>/run</tt> but for user applications. It's a user private namespace, and hence very safe to use. It's cleaned up automatically at logout and also is cleaned up by time via "aging". It is machine-private and fully featured. In GLib applications use <tt>g_get_user_runtime_dir()</tt> to query the path of this directory.</li> <li><tt><a href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html">$XDG_CACHE_HOME</a></tt> where unprivileged user software can store non-essential data. It's a private namespace of the user. It might be shared between machines. It is not automatically cleaned up, and not fully featured (no locking, and so on, due to NFS). In GLib applications use <tt>g_get_user_cache_dir()</tt> to query this directory.</li> <li><tt><a href="http://freedesktop.org/wiki/Software/xdg-user-dirs">$XDG_DOWNLOAD_DIR</a></tt> where unprivileged user software can store downloads and downloads in progress. It should only be used for downloads, and is a private namespace of the user, but might be shared between machines. It is not automatically cleaned up and not fully featured. 
In GLib applications use <tt>g_get_user_special_dir()</tt> to query the path of this directory.</li> </ul> <p>Now that we have introduced the contestants, here's a rough guide to how we suggest you (a Linux application developer) pick the right directory to use:</p> <ol> <li>You need a place to put your socket (or other communication primitive) and your code runs privileged: use a subdirectory beneath <tt>/run</tt>. (Or beneath <tt>/var/run</tt> for extra compatibility.)</li> <li>You need a place to put your socket (or other communication primitive) and your code runs unprivileged: use a subdirectory beneath <tt>$XDG_RUNTIME_DIR</tt>.</li> <li>You need a place to put your larger downloads and downloads in progress and run unprivileged: use <tt>$XDG_DOWNLOAD_DIR</tt>.</li> <li>You need a place to put cache files which should be persistent and run unprivileged: use <tt>$XDG_CACHE_HOME</tt>.</li> <li>Nothing of the above applies and you need to place a small file that needs no persistency: use <tt>$TMPDIR</tt> with a fallback on <tt>/tmp</tt>. And use <tt>mkstemp()</tt>, and <tt>mkdtemp()</tt> and nothing homegrown.</li> <li>Otherwise use <tt>$TMPDIR</tt> with a fallback on <tt>/var/tmp</tt>. Also use <tt>mkstemp()</tt>/<tt>mkdtemp()</tt>.</li> </ol> <p>Note that these rules above are only suggested by us. These rules take into account everything we know about this topic and avoid problems with current and future distributions, as far as we can see them. Please consider updating your projects to follow these rules, and keep them in mind if you write new code.</p> <p><b>One thing we'd like to stress is that <tt>/tmp</tt> and <tt>/var/tmp</tt> more often than not are actually not the right choice for your use case. There are valid uses of these directories, but quite often another directory might actually be the better place. 
So, be careful, consider the other options, but if you do go for <tt>/tmp</tt> or <tt>/var/tmp</tt> then at least make sure to use <tt>mkstemp()</tt>/<tt>mkdtemp()</tt>.</b></p> <p>Thank you for your interest!</p> <p>Oh, and if you now complain that we don't understand Unix, and that we are morons and worse, then please read this again, and you might notice that this is just a best practice guide, not a specification we have written. Nothing that introduces anything new, just something that explains how things are.</p> <p>If you want to complain about the <tt>tmp-on-tmpfs</tt> or <tt>ServicesPrivateTmp</tt> feature, then this is not the right place either, because this blog post is not really about that. Please direct this to <tt>fedora-devel</tt> instead. Thank you very much.</p> <p><b><small>Footnotes</small></b></p> <p><small>[1] Well, or to turn this around: unless you have a PhD in advanced Unixology and are not using <tt>mkstemp()</tt>/<tt>mkdtemp()</tt> but use <tt>/tmp</tt> nonetheless it's very likely you are writing vulnerable code.</small></p> Lennart PoetteringWed, 28 Mar 2012 14:04:00 +0200tag:0pointer.net,2012-03-28:/blog/projects/tmp.htmlprojects/etc/os-releasehttps://0pointer.net/blog/projects/os-release.html <p><a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">One of the new configuration files systemd introduced is <tt>/etc/os-release</tt></a>. It replaces the multitude of per-distribution release files<sup>[1]</sup> with a single one. Yesterday we <a href="http://lists.freedesktop.org/archives/systemd-devel/2012-February/004475.html">decided to drop</a> support for systems lacking <a href="http://www.freedesktop.org/software/systemd/man/os-release.html"><tt>/etc/os-release</tt></a> in systemd since recently the majority of the big distributions adopted <tt>/etc/os-release</tt> and many small ones did, too<sup>[2]</sup>. 
It's our hope that by dropping support for non-compliant distributions we gently put some pressure on the remaining hold-outs to adopt this scheme as well.</p> <p>I'd like to take the opportunity to explain a bit what the new file offers, why application developers should care, and why the distributions should adopt it. Of course, this file is pretty much a triviality in many ways, but I guess it's still one that deserves explanation.</p> <p>So, you ask why this all?</p> <ul> <li>It relieves application developers who just want to know the distribution they are running on from having to check for a multitude of individual release files.</li> <li>It provides both a "pretty" name (i.e. one to show to the user), and machine parsable version/OS identifiers (i.e. for use in build systems).</li> <li>It is extensible, can easily learn new fields if needed. For example, since we want to print a welcome message in the color of your distribution at boot we make it possible to configure the ANSI color for that in the file.</li> </ul> <p><b>FAQs</b></p> <p><b>There's already the <tt>lsb_release</tt> tool for this, why don't you just use that?</b> Well, it's a very strange interface: a shell script you have to invoke (and hence spawn asynchronously from your C code), and it's not written to be extensible. It's an optional package in many distributions, and nothing we'd be happy to invoke as part of early boot in order to show a welcome message. (In times with sub-second userspace boot times we really don't want to invoke a huge shell script for a triviality like showing the welcome message). The <tt>lsb_release</tt> tool to us appears to be an attempt at abstracting distribution checks, where standardization of distribution checks is needed. It's simply a badly designed interface. 
In our opinion, it has its use as an interface to determine the LSB version itself, but not for checking the distribution or version.</p> <p><b>Why haven't you adopted one of the generic release files, such as Fedora's <tt>/etc/system-release</tt>?</b> Well, they are much nicer than <tt>lsb_release</tt>, so much is true. However, they are not extensible and are not really parsable, if the distribution needs to be identified programmatically or a specific version needs to be verified.</p> <p><b>Why didn't you call this file <tt>/etc/bikeshed</tt> instead? The name <tt>/etc/os-release</tt> sucks!</b> In a way, I think you kind of answered your own question there already.</p> <p><b>Does this mean my distribution can now drop our equivalent of <tt>/etc/fedora-release</tt>?</b> Unlikely, too much code exists that still checks for the individual release files, and you probably shouldn't break that. This new file makes things easy for applications, not for distributions: applications can now rely on a single file only, and use it in a nice way. Distributions will have to continue to ship the old files unless they are willing to break compatibility here.</p> <p><b>This is so useless! My application needs to be compatible with distros from 1998, so how could I ever make use of the new file? I will have to continue using the old ones!</b> True, if you need compatibility with really old distributions you do. But for new code this might not be an issue, and in general new APIs are new APIs. So if you decide to depend on it, you add a dependency on it. However, even if you need to stay compatible it might make sense to check <tt>/etc/os-release</tt> first and just fall back to the old files if it doesn't exist. The least it does for you is that you don't need 25+ <tt>open()</tt> attempts on modern distributions, but just one.</p> <p><b>You evil people are forcing my beloved distro $XYZ to adopt your awful systemd schemes. I hate you!</b> You hate too much, my friend. 
Also, I am pretty sure it's not difficult to see the benefit of this new file independently of systemd, and it's truly useful on systems without systemd, too.</p> <p><b>I hate what you people do, can I just ignore this?</b> Well, you really need to work on your constant feelings of hate, my friend. But, to a certain degree yes, you can ignore this for a while longer. But already, there are a number of applications making use of this file. You lose compatibility with those. Also, you are kinda working towards the further balkanization of the Linux landscape, but maybe that's your intention?</p> <p><b>You guys add a new file because you think there are already too many? You guys are so confused!</b> None of the existing files is generic and extensible enough to do what we want it to do. Hence we had to introduce a new one. We acknowledge the irony, however.</p> <p><b>The file is extensible? Awesome! I want a new field XYZ= in it!</b> Sure, it's extensible, and we are happy if distributions extend it. Please prefix your keys with your distribution's name however. Or even better: talk to us and we might be able to update the documentation and make your field standard, if you convince us that it makes sense.</p> <p>Anyway, to summarize all this: if you work on an application that needs to identify the OS it is being built on or is being run on, please consider making use of this new file, we created it for you. 
If you work on a distribution, and your distribution doesn't support this file yet, please consider adopting this file, too.</p> <p>If you are working on a small/embedded distribution, or a legacy-free distribution we encourage you to adopt only this file and not establish any other per-distro release file.</p> <p><a href="http://www.freedesktop.org/software/systemd/man/os-release.html">Read the documentation for <tt>/etc/os-release</tt>.</a></p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Yes, multitude, there's at least: <tt>/etc/redhat-release</tt>, <tt>/etc/SuSE-release</tt>, <tt>/etc/debian_version</tt>, <tt>/etc/arch-release</tt>, <tt>/etc/gentoo-release</tt>, <tt>/etc/slackware-version</tt>, <tt>/etc/frugalware-release</tt>, <tt>/etc/altlinux-release</tt>, <tt>/etc/mandriva-release</tt>, <tt>/etc/meego-release</tt>, <tt>/etc/angstrom-version</tt>, <tt>/etc/mageia-release</tt>. And some distributions even have multiple, for example Fedora has already four different files.</small></p> <p><small>[2] To our knowledge at least OpenSUSE, Fedora, ArchLinux, Angstrom, Frugalware have adopted this. (This list is not comprehensive, there are probably more.)</small></p> Lennart PoetteringMon, 13 Feb 2012 19:46:00 +0100tag:0pointer.net,2012-02-13:/blog/projects/os-release.htmlprojectsThe Case for the /usr Mergehttps://0pointer.net/blog/projects/the-usr-merge.html <p>One of the features of Fedora 17 is <a href="https://fedoraproject.org/wiki/Features/UsrMove">the /usr merge</a>, put forward by Harald Hoyer and Kay Sievers<sup>[1]</sup>. 
In the time since this feature has been proposed repetitive discussions took place all over the various Free Software communities, and usually the same questions were asked: what the reasons behind this feature were, and whether it makes sense to adopt the same scheme for distribution XYZ, too.</p> <p>Especially in the Non-Fedora world it appears to be socially unacceptable to actually have a look at the <a href="https://fedoraproject.org/wiki/Features/UsrMove">Fedora feature page</a> (where many of the questions are already brought up and answered) which is very unfortunate. To improve the situation I spent some time today to summarize the reasons for the /usr merge independently. I'd hence like to direct you to this new page I put up which tries to summarize the reasons for this, with an emphasis on the compatibility point of view:</p> <p><a href="http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge">The Case for the /usr Merge</a></p> <p>Note that even though this page is in the systemd wiki, what it covers is mostly orthogonal to systemd. systemd supports both systems with a merged /usr and with a split /usr, and the /usr merge should be interesting for non-systemd distributions as well.</p> <p>Primarily I put this together to have a nice place to point all those folks who continue to write me annoyed emails, even though I am actually not even working on all of this...</p> <p>Enjoy the read!</p> <p><b><small>Footnotes:</small></b></p> <p><small>[1] And not actually by me, I am just a supportive spectator and am not doing any work on it. Unfortunately some tech press folks created the false impression I was behind this. But credit where credit is due, this is all Harald's and Kay's work.</small></p> Lennart PoetteringThu, 26 Jan 2012 22:29:00 +0100tag:0pointer.net,2012-01-26:/blog/projects/the-usr-merge.htmlprojectsPlumbers Wishlist, The Third Edition, a.k.a. 
"The Thank You Edition"https://0pointer.net/blog/projects/plumbers-wishlist-3.html <p>Last October <a href="http://0pointer.de/blog/projects/plumbers-wishlist-2.html">we published a wishlist for plumbing related features</a> we'd like to see added to the Linux kernel. Three months later it's time to publish a short update, and explain what has been implemented in the kernel, what people have started working on, and what's still missing.</p> <p>The full, updated list is <a href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8">available on Google Docs</a>.</p> <p>In general, I must say that the list turned out to be a great success. It shows how awesome the Open Source community is: Just ask nicely and there's a good chance they'll fulfill your wishes! Thank you very much, Linux community!</p> <p>We'd like to thank everybody who worked on any of the features on that list: Lucas De Marchi, Andi Kleen, Dan Ballard, Li Zefan, Kirill A. Shutemov, Davidlohr Bueso, Cong Wang, Lennart Poettering, Kay Sievers.</p> <p>Of the items on the list 5 have been fully implemented and are already part of a released kernel, or already merged for inclusion for the next kernels being released.</p> <p>For 4 further items patches have been posted, and I am hoping they'll get merged eventually. Davidlohr, Wang, Zefan, Kirill, it would be great if you'd continue working on your patches, as we think they are following the right approach<sup>[1]</sup> even if there was some opposition to them on LKML. 
So, please keep pushing to solve the outstanding issues and thanks for your work so far!</p> <p><b><small>Footnotes</small></b></p> <p><small>[1] Yes, I still believe that tmpfs quota should be implemented via resource limits, as everything else wouldn't work, as we don't want to implement complex and fragile userspace infrastructure to racily upload complex quota data for all current and future UIDs ever used on the system into each tmpfs mount point at mount time.</small></p> Lennart PoetteringFri, 20 Jan 2012 21:26:00 +0100tag:0pointer.net,2012-01-20:/blog/projects/plumbers-wishlist-3.htmlprojectssystemd for Administrators, Part XIIhttps://0pointer.net/blog/projects/security.html <p>Here's <a href="http://0pointer.de/blog/projects/inetd.html">the</a> <a href="http://0pointer.de/blog/projects/instances.html">twelfth</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Securing Your Services</h4> <p>One of the core features of Unix systems is the idea of privilege separation between the different components of the OS. 
Many system services run under their own user IDs thus limiting what they can do, and hence the impact they may have on the OS in case they get exploited.</p> <p>This kind of privilege separation only provides very basic protection however, since in general system services run this way can still do at least as much as a normal local user, though not as much as root. For security purposes it is however very interesting to limit even further what services can do, and shut them off from a couple of things that normal users are allowed to do.</p> <p>A great way to limit the impact of services is by employing MAC technologies such as SELinux. If you are interested in locking down your server, running SELinux is a very good idea. systemd enables developers and administrators to apply additional restrictions to local services independently of a MAC. Thus, regardless of whether you are able to make use of SELinux you may still enforce certain security limits on your services.</p> <p>In this iteration of the series we want to focus on a couple of these security features of systemd and how to make use of them in your services. These features take advantage of a couple of Linux-specific technologies that have been available in the kernel for a long time, but never have been exposed in a widely usable fashion. These systemd features have been designed to be as easy to use as possible, in order to make them attractive to administrators and upstream developers:</p> <ul> <li>Isolating services from the network</li> <li>Service-private <tt>/tmp</tt></li> <li>Making directories appear read-only or inaccessible to services</li> <li>Taking away capabilities from services</li> <li>Disallowing forking, limiting file creation for services</li> <li>Controlling device node access of services</li> </ul> <p>All options described here are documented in systemd's man pages, notably <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a>. 
Please consult these man pages for further details.</p> <p>All these options are available on all systemd systems, regardless of whether SELinux or any other MAC is enabled.</p> <p>All these options are relatively cheap, so if in doubt use them. Even if you might think that your service doesn't write to <tt>/tmp</tt> and hence enabling <tt>PrivateTmp=yes</tt> (as described below) might not be necessary, due to today's complex software it's still beneficial to enable this feature, simply because libraries you link to (and plug-ins to those libraries) which you do not control might need temporary files after all. Example: you never know what kind of NSS module your local installation has enabled, and what that NSS module does with <tt>/tmp</tt>.</p> <p>These options are hopefully interesting both for administrators to secure their local systems, and for upstream developers to ship their services secure by default. We strongly encourage upstream developers to consider using these options by default in their upstream service units. They are very easy to make use of and have major benefits for security.</p> <h4>Isolating Services from the Network</h4> <p>A very simple but powerful configuration option you may use in systemd service definitions is <tt>PrivateNetwork=</tt>:</p> <pre>...
[Service]
ExecStart=...
PrivateNetwork=yes
...</pre> <p>With this simple switch a service and all the processes it consists of are entirely disconnected from any kind of networking. Network interfaces become unavailable to the processes, the only one they'll see is the loopback device "lo", but it is isolated from the real host loopback. This is a very powerful protection from network attacks.</p> <p><b>Caveat:</b> Some services require the network to be operational. Of course, nobody would consider using <tt>PrivateNetwork=yes</tt> on a network-facing service such as Apache. However, even for non-network-facing services network support might be necessary and not always obvious. 
Example: if the local system is configured for an LDAP-based user database, doing glibc name lookups with calls such as <tt>getpwnam()</tt> might end up resulting in network access. That said, even in those cases it is more often than not OK to use <tt>PrivateNetwork=yes</tt> since user IDs of system service users are required to be resolvable even without any network around. That means as long as the only user IDs your service needs to resolve are below the magic 1000 boundary, using <tt>PrivateNetwork=yes</tt> should be OK.</p> <p>Internally, this feature makes use of network namespaces of the kernel. If enabled, a new network namespace is opened and only the loopback device is configured in it.</p> <h4>Service-Private /tmp</h4> <p>Another very simple but powerful configuration switch is <tt>PrivateTmp=</tt>:</p> <pre>...
[Service]
ExecStart=...
PrivateTmp=yes
...</pre> <p>If enabled, this option will ensure that the <tt>/tmp</tt> directory the service will see is private and isolated from the host system's <tt>/tmp</tt>. <tt>/tmp</tt> traditionally has been a shared space for all local services and users. Over the years it has been a major source of security problems for a multitude of services. Symlink attacks and DoS vulnerabilities due to guessable <tt>/tmp</tt> temporary files are common. 
By isolating the service's <tt>/tmp</tt> from the rest of the host, such vulnerabilities become moot.</p> <p>For Fedora 17 a <a href="https://fedoraproject.org/wiki/Features/ServicesPrivateTmp">feature has been accepted</a> in order to enable this option across a large number of services.</p> <p><b>Caveat:</b> Some services actually misuse <tt>/tmp</tt> as a location for IPC sockets and other communication primitives, even though this is almost always a vulnerability (simply because if you use it for communication you need guessable names, and guessable names make your code vulnerable to DoS and symlink attacks) and <tt>/run</tt> is the much safer replacement for this, simply because it is not a location writable to unprivileged processes. For example, X11 places its communication sockets below <tt>/tmp</tt> (which is actually secure -- though still not ideal -- in this exceptional case, since it does so in a safe subdirectory which is created at early boot). Services which need to communicate via such communication primitives in <tt>/tmp</tt> are not candidates for <tt>PrivateTmp=</tt>. Thankfully these days only very few services misusing <tt>/tmp</tt> like this remain.</p> <p>Internally, this feature makes use of file system namespaces of the kernel. If enabled, a new file system namespace is opened, inheriting most of the host hierarchy with the exception of <tt>/tmp</tt>.</p> <h4>Making Directories Appear Read-Only or Inaccessible to Services</h4> <p>With the <tt>ReadOnlyDirectories=</tt> and <tt>InaccessibleDirectories=</tt> options it is possible to make the specified directories inaccessible to the service for writing, or for both reading and writing, respectively:</p> <pre>...
[Service]
ExecStart=...
InaccessibleDirectories=/home
ReadOnlyDirectories=/var
...
</pre> <p>With these two configuration lines the whole tree below <tt>/home</tt> becomes inaccessible to the service (i.e. 
the directory will appear empty and with 000 access mode), and the tree below <tt>/var</tt> becomes read-only.</p> <p><b>Caveat:</b> Note that <tt>ReadOnlyDirectories=</tt> currently is not recursively applied to submounts of the specified directories (i.e. mounts below <tt>/var</tt> in the example above stay writable). This is likely to get fixed soon.</p> <p>Internally, this is also implemented based on file system namespaces.</p> <h4>Taking Away Capabilities From Services</h4> <p>Another very powerful security option in systemd is <tt>CapabilityBoundingSet=</tt>, which allows you to limit in a relatively fine-grained fashion which kernel capabilities a started service retains:</p> <pre>...
[Service]
ExecStart=...
CapabilityBoundingSet=CAP_CHOWN CAP_KILL
...
</pre> <p>In the example above only the CAP_CHOWN and CAP_KILL capabilities are retained by the service, and the service and any processes it might create have no chance to ever acquire any other capabilities again, not even via setuid binaries. The list of currently defined capabilities is available in <a href="http://linux.die.net/man/7/capabilities">capabilities(7)</a>. Unfortunately some of the defined capabilities are overly generic (such as CAP_SYS_ADMIN), however they are still a very useful tool, in particular for services that otherwise run with full root privileges.</p> <p>To identify precisely which capabilities are necessary for a service to run cleanly is not always easy and requires a bit of testing. To simplify this process a bit, it is possible to blacklist certain capabilities that are definitely not needed instead of whitelisting all that might be needed. Example: CAP_SYS_PTRACE is a particularly powerful and security-relevant capability needed for the implementation of debuggers, since it allows introspecting and manipulating any local process on the system. 
A service like Apache obviously has no business being a debugger for other processes, hence it is safe to remove the capability from it:</p> <pre>...
[Service]
ExecStart=...
CapabilityBoundingSet=~CAP_SYS_PTRACE
...</pre> <p>The <tt>~</tt> character prefixed to the value here inverts the meaning of the option: instead of listing all capabilities the service will retain, you list the ones it will not retain.</p> <p><b>Caveat:</b> Some services might get confused if certain capabilities are made unavailable to them. Thus when determining the right set of capabilities to keep around you need to do this carefully, and it might be a good idea to talk to the upstream maintainers since they should know best which operations a service might need to run successfully.</p> <p><b>Caveat 2:</b> <a href="https://forums.grsecurity.net/viewtopic.php?f=7&amp;t=2522">Capabilities are not a magic wand.</a> You probably want to combine them and use them in conjunction with other security options in order to make them truly useful.</p> <p>To easily check which processes on your system retain which capabilities, use the <tt>pscap</tt> tool from the <tt>libcap-ng-utils</tt> package.</p> <p>Making use of systemd's <tt>CapabilityBoundingSet=</tt> option is often a simple, discoverable and cheap replacement for patching all system daemons individually to control the capability bounding set on their own.</p> <h4>Disallowing Forking, Limiting File Creation for Services</h4> <p>Resource limits may be used to apply certain security limits on services being run. Primarily, resource limits are useful for resource control (as the name suggests...), not so much access control. However, two of them can be useful to disable certain OS features: RLIMIT_NPROC and RLIMIT_FSIZE may be used to disable forking and disable writing of any files with a size > 0:</p> <pre>...
[Service]
ExecStart=...
LimitNPROC=1
LimitFSIZE=0
...</pre> <p>Note that this will work only if the service in question drops privileges and runs under a (non-root) user ID of its own or drops the CAP_SYS_RESOURCE capability, for example via <tt>CapabilityBoundingSet=</tt> as discussed above. Without that a process could simply increase the resource limit again, thus voiding any effect.</p> <p><b>Caveat:</b> <tt>LimitFSIZE=</tt> is pretty brutal. If the service attempts to write a file with a size > 0, it will immediately be killed with SIGXFSZ, which unless caught terminates the process. Also, creating files with size 0 is still allowed, even if this option is used.</p> <p>For more information on these and other resource limits, see <a href="http://linux.die.net/man/2/setrlimit">setrlimit(2)</a>.</p> <h4>Controlling Device Node Access of Services</h4> <p>Device nodes are an important interface to the kernel and its drivers. Since drivers tend to get much less testing and security checking than the core kernel they often are a major entry point for security hacks. systemd allows you to control access to devices individually for each service:</p> <pre>...
[Service]
ExecStart=...
DeviceAllow=/dev/null rw
...</pre> <p>This will limit access to <tt>/dev/null</tt> and only this device node, disallowing access to any other device nodes.</p> <p>The feature is implemented on top of the <tt>devices</tt> cgroup controller.</p> <h4>Other Options</h4> <p>Besides the easy to use options above there are a number of other security relevant options available. However they usually require a bit of preparation in the service itself and hence are probably primarily useful for upstream developers. These options are <tt>RootDirectory=</tt> (to set up <tt>chroot()</tt> environments for a service) as well as <tt>User=</tt> and <tt>Group=</tt> to drop privileges to the specified user and group. 
These options are particularly useful to greatly simplify writing daemons, where all the complexities of securely dropping privileges can be left to systemd, and kept out of the daemons themselves.</p> <p>If you are wondering why these options are not enabled by default: some of them simply break semantics of traditional Unix, and to maintain compatibility we cannot enable them by default. For example, since traditional Unix enforced that <tt>/tmp</tt> was a shared namespace, and processes could use it for IPC, we cannot just go and turn that off globally, just because <tt>/tmp</tt>'s role in IPC is now replaced by <tt>/run</tt>.</p> <p>And that's it for now. If you are working on unit files for upstream or in your distribution, please consider using one or more of the options listed above. If your service is secure by default by taking advantage of these options this will help not only your users but also make the Internet a safer place.</p> Lennart PoetteringFri, 20 Jan 2012 02:26:00 +0100tag:0pointer.net,2012-01-20:/blog/projects/security.htmlprojectsPulseAudio vs. AudioFlingerhttps://0pointer.net/blog/projects/aruns-numbers.html <p><a href="http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/">Arun put an awesome article up</a>, detailing how PulseAudio compares to Android's AudioFlinger in terms of power consumption and suchlike. Suffice to say, PulseAudio rocks, but go and read the whole thing, it's worth it.</p> <p>Apparently, AudioFlinger is a great choice if you want to shorten your battery life.</p> Lennart PoetteringMon, 16 Jan 2012 16:31:00 +0100tag:0pointer.net,2012-01-16:/blog/projects/aruns-numbers.htmlprojectsIntroducing the Journalhttps://0pointer.net/blog/projects/the-journal.html <p>In the past weeks we have been working on a major new addition to systemd that will hopefully positively change the Linux ecosystem in a number of ways. 
But see for yourself, check out the full explanation of what we have implemented in the <a href="https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTs">design document we put up on Google Docs</a>.</p> Lennart PoetteringFri, 18 Nov 2011 16:28:00 +0100tag:0pointer.net,2011-11-18:/blog/projects/the-journal.htmlprojectsKernel Hackers Panelhttps://0pointer.net/blog/projects/linuxcon-kernel-panel.html <p>At LinuxCon Europe/ELCE I had the chance to moderate the <a href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel">kernel hackers panel with Linus Torvalds, Alan Cox, Paul McKenney and Thomas Gleixner on stage</a>. I like to believe it went quite well, but check it out for yourself, as a video recording is now available online:</p> <video width="800" height="450" controls="1"> <source src="http://free-electrons.com/pub/video/2011/elce/elce-2011-torvalds-cox-gleixner-mackenney-kernel-developer-panel-450p.webm" /> </video> <p>For me personally, the most notable topic covered was Control Groups, and the clarification that they are something that is needed even though their implementation right now is in many ways less than perfect. But in the end there is no reasonable way around them, and much like SMP, they are technology that complicates things substantially but is ultimately unavoidable.</p> <p><a href="http://free-electrons.com/blog/elce-2011-videos/">Other videos from ELCE are online now, too.</a></p> Lennart PoetteringMon, 07 Nov 2011 16:53:00 +0100tag:0pointer.net,2011-11-07:/blog/projects/linuxcon-kernel-panel.htmlprojectslibabchttps://0pointer.net/blog/projects/libabc.html <p>At the Kernel Summit in Prague last week Kay Sievers and I led a session on developing shared userspace libraries, for kernel hackers. 
More and more userspace interfaces of the kernel (for example many which deal with storage, audio, resource management, security, file systems or a number of other subsystems) nowadays rely on a dedicated userspace component. As people who work primarily in the plumbing layer of the Linux OS we noticed over and over again that these libraries written by people who usually are at home on the kernel side of things make the same mistakes repeatedly, thus making life for the users of the libraries unnecessarily difficult. In our session we tried to point out a number of these things, and in particular places where the usual kernel hacking style translates badly into userspace shared library hacking. Our hope is that maybe a few kernel developers have a look at our list of recommendations and consider the points we are raising.</p> <p>To make things easy we have put together an example skeleton library we dubbed <tt>libabc</tt>, whose <a href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git;a=blob_plain;f=README">README</a> file includes all our points in terse form. It's available on kernel.org:</p> <p><a href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git">The git repository</a> and the <a href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git;a=blob_plain;f=README">README</a>.</p> <p>This list of recommendations draws inspiration from David Zeuthen's and Ulrich Drepper's well known papers on the topic of writing shared libraries. 
In the README linked above we try to distill this wealth of information into a terse list of recommendations, with a couple of additions and with a strict focus on a kernel hacker background.</p> <p>Please have a look, and even if you are not a kernel hacker there might be something useful to know in it, especially if you work on the lower layers of our stack.</p> <p>If you have any questions or additions, just ping us, or comment below!</p> Lennart PoetteringTue, 01 Nov 2011 01:46:00 +0100tag:0pointer.net,2011-11-01:/blog/projects/libabc.htmlprojectsPraguehttps://0pointer.net/blog/projects/linuxcon-europe.html <p>If you make it to Prague the coming week for the LinuxCon/ELCE/GStreamer/Kernel Summit/... superconference, make sure not to miss:</p> <ul> <li>The Linux Audio BoF with numerous Linux audio hackers, 5pm, on Sunday (23rd, i.e. today).</li> <li><a href="http://gstreamer.freedesktop.org/conference/speakers.html#raghavan">Latest developments in PulseAudio</a> by Arun Raghavan. 4pm, on Tuesday, GStreamer Summit</li> <li><a href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel">Linux Kernel Developer Panel</a>, a shared session of LinuxCon and ELCE. Panelists are Linus Torvalds, Alan Cox, Thomas Gleixner and Paul McKenney. Moderated by yours truly. 9:30am, on Wednesday</li> <li><a href="https://events.linuxfoundation.org/events/linuxcon-europe/poettering-sievers">systemd Administration in the Enterprise</a> by Kay Sievers and yours truly. 4:15pm, on Wednesday, LinuxCon</li> <li><a href="https://events.linuxfoundation.org/events/embedded-linux-conference-europe/kooi">Integrating systemd: Booting Userspace in Less Than 1 Second</a> by Koen Kooi. 11:15am, on Friday, ELCE</li> </ul> <p>All of that at the Clarion Hotel. 
See you in Prague!</p> Lennart PoetteringSun, 23 Oct 2011 01:31:00 +0200tag:0pointer.net,2011-10-23:/blog/projects/linuxcon-europe.htmlprojectsPlumbers Wishlist, The Second Editionhttps://0pointer.net/blog/projects/plumbers-wishlist-2.html <p>Two weeks ago we published a <a href="http://0pointer.de/blog/projects/plumbers-wishlist.html">Plumber's Wishlist for Linux</a>. So far, this has already created lively discussions in the community (as reported on LWN among others), and patches for a few of the items listed have already been posted (thanks a lot to those who worked on this, your contributions are much appreciated!).</p> <p><a href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8">We have now prepared a second version of the wish list.</a> It includes a number of additions (tmpfs quota! hostname change notifications! and more!) and updates to the previous items, including links to patches, and references to other interesting material.</p> <p>We hope to update this wishlist from time to time, so stay tuned!</p> <p><a href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8">And now, go and read the new wishlist!</a></p> Lennart PoetteringThu, 20 Oct 2011 20:41:00 +0200tag:0pointer.net,2011-10-20:/blog/projects/plumbers-wishlist-2.htmlprojectsGoogle doesn't like my namehttps://0pointer.net/blog/projects/google-doesnt-like-my-name.html <p>Nice one, Google suspended my Google+ account because I created it under, well, my name, which is "Lennart Poettering", and Google+ thinks that wasn't my name, even though it says so in my passport and almost every document I own, and I was never aware I had any other name. This is ridiculous. Google, give me my name back! 
This is a really uncool move.</p> Lennart PoetteringMon, 17 Oct 2011 18:50:00 +0200tag:0pointer.net,2011-10-17:/blog/projects/google-doesnt-like-my-name.htmlprojectsYour Questions for the Kernel Developer Panel at LinuxCon in Praguehttps://0pointer.net/blog/projects/kernel-hacker-panel.html <p><a href="https://plus.google.com/115547683951727699051/posts/SuTUvbcJ6p9">I am currently collecting</a> questions for the <a href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel">kernel developer panel at LinuxCon in Prague</a>. If there's something you'd like the panelists to respond to, please post it on <a href="https://plus.google.com/115547683951727699051/posts/SuTUvbcJ6p9">the thread</a>, and I'll see what I can do. Thank you!</p> Lennart PoetteringMon, 17 Oct 2011 15:38:00 +0200tag:0pointer.net,2011-10-17:/blog/projects/kernel-hacker-panel.htmlprojectsA Big Losshttps://0pointer.net/blog/projects/a-big-loss.html <p><a href="http://googleblog.blogspot.com/2011/10/fall-sweep.html">Google announced today that they'll be shutting down Google Code Search in January</a>. I am quite sure that this would be a massive loss for the Free Software community. The ability to learn from other people's code is a key idea of Free Software. There's simply no better way to do that than with a source code search engine. The day Google Code Search will be shut down will be a sad day for the Free Software community.</p> <p>Of course, there are a couple of alternatives around, but they all have one thing in common: they, uh, don't even remotely compare to the completeness, performance and simplicity of the Google Code Search interface, and have serious usability issues. 
(For example: koders.com is really, really slow, and splits up identifiers you search for at underscores, which kinda makes it useless for looking for almost any kind of code.)</p> <p>I think it must be of genuine interest to the Free Software community to have a capable replacement for Google Code Search, for the day it is turned off. In fact, it probably should be something the various foundations which promote Free Software should be looking into, like the FSF or the Linux Foundation. There are very few better ways to get Free Software into the heads and minds of engineers than by examples -- examples consisting of real life code they can find with a source code search engine. I believe a source code search engine is probably among the best vehicles to promote Free Software towards engineers. In particular if it itself was Free Software (in contrast to Google Code Search).</p> <p>Ideally, all software available on web sites like SourceForge, Freshmeat, or github should be indexed. But there's also a chance for distributions here: indexing the sources of all packages a distribution like Debian or Fedora includes would be a great tool for developers. In fact, a distribution offering this functionality might benefit from it, as it attracts developer interest in the distribution.</p> <p>It's sad that Google Code Search will be gone soon. But maybe there's something positive in the bad news here, and a chance to create something better, more comprehensive, that is free, and promotes our ideals better than Google ever could. 
Maybe there's a chance here for the Open Source foundations, for the distributions and for the communities to create a better replacement!</p> Lennart PoetteringFri, 14 Oct 2011 23:05:00 +0200tag:0pointer.net,2011-10-14:/blog/projects/a-big-loss.htmlprojectsDresden, California, Poznanhttps://0pointer.net/blog/photos/california.html <p><a href="http://0pointer.de/static/dresden.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/dresden-small.jpeg" width="1024" height="291" alt="Hofkirche, Dresden, Saxony, Germany" /></a></p> <p><i>Hofkirche, Dresden, Saxony, Germany</i></p> <p><a href="http://0pointer.de/static/bastei.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/bastei-small.jpeg" width="1024" height="260" alt="Bastei, Saxon Switzerland, Saxony, Germany" /></a></p> <p><i>Bastei, Saxon Switzerland, Saxony, Germany</i></p> <p><a href="http://0pointer.de/static/dresden2.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/dresden2-small.jpeg" width="1024" height="370" alt="Fürstenzug, Dresden, Saxony, Germany" /></a></p> <p><i>F&uuml;rstenzug, Dresden, Saxony, Germany</i></p> <p><a href="http://0pointer.de/static/california.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/california-small.jpeg" width="1024" height="120" alt="Near California State Route 46, California, USA" /></a></p> <p><i>Near California State Route 46, California, USA</i></p> <p><a href="http://0pointer.de/static/california2.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" 
src="http://0pointer.de/static/california2-small.jpeg" width="1024" height="122" alt="Near Generals Highway, California, USA" /></a></p> <p><i>Near Generals Highway, California, USA</i></p> <p><a href="http://0pointer.de/static/california3.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/california3-small.jpeg" width="1024" height="230" alt="Near Generals Highway, California, USA" /></a></p> <p><i>Near Generals Highway, California, USA</i>, a bit further down the road.</p> <p><a href="http://0pointer.de/static/poznan.html"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/poznan-small.jpeg" width="1024" height="183" alt="Parish Church in Poznan, Poland" /></a></p> <p><i>Parish Church in Poznan, Poland</i></p> Lennart PoetteringSun, 09 Oct 2011 21:32:00 +0200tag:0pointer.net,2011-10-09:/blog/photos/california.htmlphotosA Plumber's Wish List for Linuxhttps://0pointer.net/blog/projects/plumbers-wishlist.html <p>Here's a <a href="http://thread.gmane.org/gmane.linux.kernel/1200272">mail we just sent to LKML</a>, for your consideration. Enjoy:</p> <pre><b>Subject: A Plumber’s Wish List for Linux</b> We’d like to share our current wish list of plumbing layer features we are hoping to see implemented in the near future in the Linux kernel and associated tools. Some items we can implement on our own, others are not our area of expertise, and we will need help getting them implemented. Acknowledging that this wish list of ours only gets longer and not shorter, even though we have implemented a number of other features on our own in the previous years, we are posting this list here, in the hope to find some help. If you happen to be interested in working on something from this list or able to help out, we’d be delighted. 
Please ping us in case you need clarifications or more information on specific items.

Thanks,
Kay, Lennart, Harald, in the name of all the other plumbers

And here’s the wish list, in no particular order:

* (ioctl based?) interface to query and modify the label of a mounted FAT volume: A FAT label is implemented as a hidden directory entry in the file system which needs to be renamed when changing the file system label, this is impossible to do from userspace without unmounting. Hence we’d like to see a kernel interface that is available on the mounted file system mount point itself. Of course, bonus points if this new interface can be implemented for other file systems as well, and also covers fs UUIDs in addition to labels.

* CPU modaliases in /sys/devices/system/cpu/cpuX/modalias: useful to allow module auto-loading of e.g. cpufreq drivers and KVM modules. Andy Kleen has a patch to create the alias file itself. CPU ‘struct sysdev’ needs to be converted to ‘struct device’ and a ‘struct bus_type cpu’ needs to be introduced to allow proper CPU coldplug event replay at bootup. This is one of the last remaining places where automatic hardware-triggered module auto-loading is not available. And we’d like to see that fixed to make numerous ugly userspace work-arounds to achieve the same go away.

* expose CAP_LAST_CAP somehow in the running kernel at runtime: Userspace needs to know the highest valid capability of the running kernel, which right now cannot reliably be retrieved from header files only. The fact that this value cannot be detected properly right now creates various problems for libraries compiled on newer header files which are run on older kernels. They assume capabilities are available which actually aren’t. Specifically, libcap-ng claims that all running processes retain the higher capabilities in this case due to the “inverted” semantics of CapBnd in /proc/$PID/status.
* export ‘struct device_type fb/fbcon’ of ‘struct class graphics’
Userspace wants to easily distinguish ‘fb’ and ‘fbcon’ from each other without the need to match on the device name.

* allow changing argv[] of a process without mucking with environ[]: Something like setproctitle() or a prctl() would be ideal. Of course it is questionable if services like sendmail make use of this, but otoh for services which fork but do not immediately exec() another binary being able to rename these child processes in ps is of importance.

* module-init-tools: provide a proper libmodprobe.so from module-init-tools: Early boot tools, installers, driver install disks want to access information about available modules to optimize bootup handling.

* fork throttling mechanism as basic cgroup functionality that is available in all hierarchies independent of the controllers used: This is important to implement race-free killing of all members of a cgroup, so that cgroup member processes cannot fork faster than a cgroup supervisor process could kill them. This needs to be recursive, so that not only a cgroup but all its subgroups are covered as well.

* proper cgroup-is-empty notification interface: The current call_usermodehelper() interface is an inefficient and ugly hack. Tools would prefer anything more lightweight like a netlink, poll() or fanotify interface.

* allow user xattrs to be set on files in the cgroupfs (and maybe procfs?)

* simple, reliable and future-proof way to detect whether a specific pid is running in a CLONE_NEWPID container, i.e. not in the root PID namespace. Currently, a few ugly hacks are available to detect this (for example a process wanting to know whether it is running in a PID namespace could just look for a PID 2 being around and named kthreadd which is a kernel thread only visible in the root namespace), however all these solutions encode information and expectations that better shouldn’t be encoded in a namespace test like this. 
This functionality is needed in particular since the removal of the ns cgroup controller which provided the namespace membership information to user code.

* allow making use of the “cpu” cgroup controller by default without breaking RT. Right now creating a cgroup in the “cpu” hierarchy that shall be able to take advantage of RT is impossible for the generic case since it needs an RT budget configured which is from a limited resource pool. What we want is the ability to create cgroups in “cpu” whose processes get a non-RT weight applied, but for RT take advantage of the parent’s RT budget. We want the separation of RT and non-RT budget assignment in the “cpu” hierarchy, because right now, you lose RT functionality in it unless you assign an RT budget. This issue severely limits the usefulness of the “cpu” hierarchy on general purpose systems right now.

* Add a timerslack cgroup controller, to allow increasing the timer slack of user session cgroups when the machine is idle.

* An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or something like that), i.e. a way to attach sender cgroup membership to messages sent via AF_UNIX. This is useful in case services such as syslog shall be shared among various containers (or service cgroups), and the syslog implementation needs to be able to distinguish the sending cgroup in order to separate the logs on disk. Of course stm SCM_CREDENTIALS can be used to look up the PID of the sender followed by a check in /proc/$PID/cgroup, but that is necessarily racy, and actually a very real race in real life.

* SCM_COMM, with a similar use case as SCM_CGROUPS. 
This auxiliary control message should carry the process name as available in /proc/$PID/comm.</pre> Lennart PoetteringFri, 07 Oct 2011 01:22:00 +0200tag:0pointer.net,2011-10-07:/blog/projects/plumbers-wishlist.htmlprojectsWhat You Need to Know When Becoming a Free Software Hackerhttps://0pointer.net/blog/projects/hinter-den-kulissen.html <p>Earlier today I gave a presentation at the Technical University Berlin about things you need to know, things you should expect and things you shouldn't expect when you are aspiring to become a successful Free Software Hacker.</p> <p>I have put my slides up on Google Docs in case you are interested, either because you are the target audience (i.e. a university student) or because you need inspiration for a similar talk about the same topic.</p> <p>The first two slides are in German, so skip over them. The interesting bits are all in English. I hope it's quite comprehensive (though of course terse). Enjoy:</p> <iframe src="https://docs.google.com/present/embed?id=dd4d9j2z_1r8fjkqc7" frameborder="0" width="410" height="342"></iframe> <p>In case your feed reader/planet messes this up, <a href="https://docs.google.com/present/view?id=dd4d9j2z_1r8fjkqc7">here's the non-embedded version</a>.</p> <p>Oh, and thanks to everybody who <a href="https://plus.google.com/115547683951727699051/posts/UqNgFiV3qTx">reviewed and suggested additions to the slides on Google+</a>.</p> Lennart PoetteringThu, 06 Oct 2011 22:05:00 +0200tag:0pointer.net,2011-10-06:/blog/projects/hinter-den-kulissen.htmlprojectsPulseAudio 1.0https://0pointer.net/blog/projects/pa-one-dot-zero.html <p><a href="http://www.freedesktop.org/wiki/Software/PulseAudio/Notes/1.0">PulseAudio 1.0 is out now.</a> It's awesome. 
Get it while it is hot!</p> <p>I'd like to thank Colin Guthrie and Arun Raghavan (and all the others involved) for getting this release out of the door!</p> Lennart PoetteringTue, 27 Sep 2011 16:07:00 +0200tag:0pointer.net,2011-09-27:/blog/projects/pa-one-dot-zero.htmlprojectssystemd for Administrators, Part XIhttps://0pointer.net/blog/projects/inetd.html <p>Here's the <a href="http://0pointer.de/blog/projects/instances.html">eleventh</a> <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Converting inetd Services</h4> <p><a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">In a previous episode of this series</a> I covered how to convert a SysV init script to a systemd unit file. In this story I hope to explain how to convert inetd services into systemd units.</p> <p>Let's start with a bit of background. inetd has a long tradition as one of the classic Unix services. As a superserver it listens on an Internet socket on behalf of another service and then activates that service on an incoming connection, thus implementing an on-demand socket activation system. This allowed Unix machines with limited resources to provide a large variety of services, without the need to run processes and invest resources for all of them all of the time. 
Over the years a number of independent implementations of inetd have been shipped on Linux distributions. The most prominent are the ones based on BSD inetd and xinetd. While inetd used to be installed on most distributions by default, it nowadays is used only for very few selected services and the common services are all run unconditionally at boot, primarily for (perceived) performance reasons.</p> <p>One of the core features of systemd (and Apple's launchd for that matter) is socket activation, a scheme pioneered by inetd, however back then with a different focus. Systemd-style socket activation focusses on local sockets (AF_UNIX), not so much Internet sockets (AF_INET), even though both are supported. And even more importantly, socket activation in systemd is not primarily about the on-demand aspect that was key in inetd, but more about increasing parallelization (socket activation allows starting clients and servers of the socket at the same time), simplicity (since the need to configure explicit dependencies between services is removed) and robustness (since services can be restarted or may crash without loss of connectivity of the socket). However, systemd can also activate services on-demand when connections are incoming, if configured that way.</p> <p>Socket activation of any kind requires support in the services themselves. systemd provides a very simple interface that services may implement to provide socket activation, built around <a href="http://0pointer.de/public/systemd-man/sd_listen_fds.html">sd_listen_fds()</a>. <a href="http://0pointer.de/blog/projects/socket-activation.html">As such it is already a very minimal, simple scheme</a>. However, the traditional inetd interface is even simpler. It allows passing only a single socket to the activated service: the socket fd is simply duplicated to STDIN and STDOUT of the process spawned, and that's already it. 
In order to provide compatibility systemd optionally offers the same interface to processes, thus taking advantage of the many services that already support inetd-style socket activation, but not yet systemd's native activation.</p> <p>Before we continue with a concrete example, let's have a look at three different schemes to make use of socket activation:</p> <ol> <li><b>Socket activation for parallelization, simplicity, robustness:</b> sockets are bound during early boot and a singleton service instance to serve all client requests is immediately started at boot. This is useful for all services that are very likely used frequently and continuously, and hence starting them early and in parallel with the rest of the system is advisable. Examples: D-Bus, Syslog.</li> <li><b>On-demand socket activation for singleton services:</b> sockets are bound during early boot and a singleton service instance is executed on incoming traffic. This is useful for services that are seldom used, where it is advisable to save the resources and time at boot and delay activation until they are actually needed. Example: CUPS.</li> <li><b>On-demand socket activation for per-connection service instances:</b> sockets are bound during early boot and for each incoming connection a new service instance is instantiated and the connection socket (and not the listening one) is passed to it. This is useful for services that are seldom used, and where performance is not critical, i.e. where the cost of spawning a new service process for each incoming connection is limited. Example: SSH.</li> </ol> <p>The three schemes provide different performance characteristics. After the service finishes starting up the performance provided by the first two schemes is identical to a stand-alone service (i.e. 
one that is started without a super-server, without socket activation), since the listening socket is passed to the actual service, and code paths from then on are identical to those of a stand-alone service and all connections are processed exactly the same way as they are in a stand-alone service. On the other hand, performance of the third scheme is usually not as good: since for each connection a new service needs to be started the resource cost is much higher. However, it also has a number of advantages: for example client connections are better isolated and it is easier to develop services activated this way.</p> <p>For systemd primarily the first scheme is in focus, however the other two schemes are supported as well. (In fact, the blog story <a href="http://0pointer.de/blog/projects/socket-activation2.html">I covered the necessary code changes for systemd-style socket activation in</a> was about a service of the second type, i.e. CUPS). inetd primarily focusses on the third scheme, however the second scheme is supported too. (The first one isn't. Presumably due to the focus on the third scheme inetd got its -- a bit unfair -- reputation for being "slow".)</p> <p>So much about the background, let's cut to the beef now and show how an inetd service can be integrated into systemd's socket activation. We'll focus on SSH, a very common service that is widely installed and used but on the vast majority of machines probably not started more often than once per hour on average (and usually even much less). SSH has supported inetd-style activation for a long time, following the third scheme mentioned above. Since it is started only every now and then and only with a limited number of connections at the same time it is a very good candidate for this scheme as the extra resource cost is negligible: if made socket-activatable SSH is basically free as long as nobody uses it. 
And as soon as somebody logs in via SSH it will be started and the moment he or she disconnects all its resources are freed again. Let's find out how to make SSH socket-activatable in systemd taking advantage of the provided inetd compatibility!</p> <p>Here's the configuration line used to hook up SSH with classic inetd:</p> <pre>ssh stream tcp nowait root /usr/sbin/sshd sshd -i</pre> <p>And the same as xinetd configuration fragment:</p> <pre>service ssh {
        socket_type = stream
        protocol = tcp
        wait = no
        user = root
        server = /usr/sbin/sshd
        server_args = -i
}</pre> <p>Most of this should be fairly easy to understand, as these two fragments express very much the same information. The non-obvious parts: the port number (22) is not configured in inetd configuration, but indirectly via the service database in <tt>/etc/services</tt>: the service name is used as lookup key in that database and translated to a port number. This indirection via <tt>/etc/services</tt> has been part of Unix tradition, though it has been going more and more out of fashion, and the newer xinetd hence optionally allows configuration with explicit port numbers. The most interesting setting here is the not very intuitively named <tt>nowait</tt> (resp. <tt>wait=no</tt>) option. It configures whether a service is of the second (<tt>wait</tt>) resp. third (<tt>nowait</tt>) scheme mentioned above. Finally the <tt>-i</tt> switch is used to enable inetd mode in SSH.</p> <p>The systemd translation of these configuration fragments is the following two units. First: <tt>sshd.socket</tt> is a unit encapsulating information about a socket to listen on:</p> <pre>[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=22
Accept=yes

[Install]
WantedBy=sockets.target
</pre> <p>Most of this should be self-explanatory. A few notes: <tt>Accept=yes</tt> corresponds to <tt>nowait</tt>. 
It's hopefully better named, referring to the fact that for <tt>nowait</tt> the superserver calls <tt>accept()</tt> on the listening socket, whereas for <tt>wait</tt> this is the job of the executed service process. <tt>WantedBy=sockets.target</tt> is used to ensure that when enabled this unit is activated at boot at the right time.</p> <p>And here's the matching service file <tt>sshd@.service</tt>:</p> <pre>[Unit]
Description=SSH Per-Connection Server

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket
</pre> <p>This too should be mostly self-explanatory. Interesting is <tt>StandardInput=socket</tt>, the option that enables inetd compatibility for this service. <tt>StandardInput=</tt> may be used to configure what STDIN of the service should be connected to (see <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">the man page for details</a>). By setting it to <tt>socket</tt> we make sure to pass the connection socket here, as expected in the simple inetd interface. Note that we do not need to explicitly configure <tt>StandardOutput=</tt> here, since by default the setting from <tt>StandardInput=</tt> is inherited if nothing else is configured. Important is the "-" in front of the binary name. This ensures that the exit status of the per-connection sshd process is forgotten by systemd. Normally, systemd will store the exit status of all service instances that die abnormally. SSH will sometimes die abnormally with an exit code of 1 or similar, and we want to make sure that this doesn't cause systemd to keep around information for numerous previous connections that died this way (until this information is forgotten with <tt>systemctl reset-failed</tt>).</p> <p><tt>sshd@.service</tt> is an instantiated service, as described <a href="http://0pointer.de/blog/projects/instances.html">in the preceding installment of this series</a>. 
For each incoming connection systemd will instantiate a new instance of <tt>sshd@.service</tt>, with the instance identifier named after the connection credentials.</p> <p>You may wonder why in systemd configuration of an inetd service requires two unit files instead of one. The reason for this is that to simplify things we want to make sure that the relation between live units and unit files is obvious, while at the same time we can order the socket unit and the service units independently in the dependency graph and control the units as independently as possible. (Think: this allows you to shut down the socket independently from the instances, and each instance individually.)</p> <p>Now, let's see how this works in real life. If we drop these files into <tt>/etc/systemd/system</tt> we are ready to enable the socket and start it:</p> <pre># systemctl enable sshd.socket
ln -s '/etc/systemd/system/sshd.socket' '/etc/systemd/system/sockets.target.wants/sshd.socket'
# systemctl start sshd.socket
# systemctl status sshd.socket
sshd.socket - SSH Socket for Per-Connection Servers
          Loaded: loaded (/etc/systemd/system/sshd.socket; enabled)
          Active: active (listening) since Mon, 26 Sep 2011 20:24:31 +0200; 14s ago
        Accepted: 0; Connected: 0
          CGroup: name=systemd:/system/sshd.socket
</pre> <p>This shows that the socket is listening, and so far no connections have been made (<tt>Accepted:</tt> will show you how many connections have been made in total since the socket was started, <tt>Connected:</tt> how many connections are currently active.)</p> <p>Now, let's connect to this from two different hosts, and see which services are now active:</p> <pre>$ systemctl --full | grep ssh
sshd@172.31.0.52:22-172.31.0.4:47779.service  loaded active running       SSH Per-Connection Server
sshd@172.31.0.52:22-172.31.0.54:52985.service loaded active running       SSH Per-Connection Server
sshd.socket                                   loaded active listening     SSH Socket for Per-Connection Servers
</pre> <p>As expected, there are now two 
service instances running, for the two connections, and they are named after the source and destination address of the TCP connection as well as the port numbers. (For AF_UNIX sockets the instance identifier will carry the PID and UID of the connecting client.) This allows us to individually introspect or kill specific sshd instances, in case you want to terminate the session of a specific client:</p> <pre># systemctl kill sshd@172.31.0.52:22-172.31.0.4:47779.service</pre> <p>And that's probably already most of what you need to know for hooking up inetd services with systemd and how to use them afterwards.</p> <p>In the case of SSH it is probably a good idea for most distributions to default to this kind of inetd-style socket activation in order to save resources, but to provide a stand-alone unit file for sshd as well, which can be enabled optionally. I'll soon file a wishlist bug about this against our SSH package in Fedora.</p> <p>A few final notes on how xinetd and systemd compare feature-wise, and whether xinetd is fully obsoleted by systemd. The short answer here is that systemd does not provide the full xinetd feature set and that it does not fully obsolete xinetd. The longer answer is a bit more complex: if you look at the <a href="http://linux.die.net/man/5/xinetd.conf">multitude of options</a> xinetd provides you'll notice that systemd does not compare. For example, systemd does not come with built-in <tt>echo</tt>, <tt>time</tt>, <tt>daytime</tt> or <tt>discard</tt> servers, and never will include those. TCPMUX is not supported, and neither are RPC services. However, you will also find that most of these are either irrelevant on today's Internet or have otherwise gone out of fashion. The vast majority of inetd services do not directly take advantage of these additional features. In fact, none of the xinetd services shipped on Fedora make use of these options. 
That said, there are a couple of useful features that systemd does not support, for example IP ACL management. However, most administrators will probably agree that firewalls are the better solution for these kinds of problems and on top of that, systemd supports ACL management via tcpwrap for those who indulge in retro technologies like this. On the other hand systemd also provides numerous features <tt>xinetd</tt> does not provide, starting with the individual control of instances shown above, or the more expressive configurability of the <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">execution context for the instances</a>. I believe that what systemd provides is quite comprehensive, comes with little legacy cruft but should provide you with everything you need. And if there's something systemd does not cover, <tt>xinetd</tt> will always be there to fill the void as you can easily run it in conjunction with <tt>systemd</tt>. For the majority of uses systemd should cover what is necessary, and allows you to cut down on the components required to build your system. In a way, systemd brings back the functionality of classic Unix inetd and turns it again into a centerpiece of a Linux system.</p> <p>And that's all for now. Thanks for reading this long piece. And now, get going and convert your services over! 
Even better, do this work in the individual packages upstream or in your distribution!</p> Lennart PoetteringMon, 26 Sep 2011 20:46:00 +0200tag:0pointer.net,2011-09-26:/blog/projects/inetd.htmlprojectssystemd for Administrators, Part Xhttps://0pointer.net/blog/projects/instances.html <p>Here's the tenth <a href="http://0pointer.de/blog/projects/on-etc-sysinit.html">installment</a> <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>Instantiated Services</h4> <p>Most services on Linux/Unix are <i>singleton</i> services: there's usually only one instance of Syslog, Postfix, or Apache running on a specific system at the same time. On the other hand some select services may run in multiple instances on the same host. For example, an Internet service like the Dovecot IMAP service could run in multiple instances on different IP ports or different local IP addresses. A more common example that exists on all installations is <i>getty</i>, the mini service that runs once for each TTY and presents a login prompt on it. On most systems this service is instantiated once for each of the first six virtual consoles <tt>tty1</tt> to <tt>tty6</tt>. On some servers depending on administrator configuration or boot-time parameters an additional getty is instantiated for a serial or virtualizer console. 
Another common instantiated service in the systemd world is <i>fsck</i>, the file system checker that is instantiated once for each block device that needs to be checked. Finally, in systemd socket activated per-connection services (think classic inetd!) are also implemented via instantiated services: a new instance is created for each incoming connection. In this installment I hope to explain a bit how systemd implements instantiated services and how to take advantage of them as an administrator.</p> <p>If you followed the previous episodes of this series you are probably aware that services in systemd are named according to the pattern <tt><i>foobar</i>.service</tt>, where <i>foobar</i> is an identification string for the service, and <tt>.service</tt> simply a fixed suffix that is identical for all service units. The definition files for these services are searched for in <tt>/etc/systemd/system</tt> and <tt>/lib/systemd/system</tt> (and possibly other directories) under this name. For instantiated services this pattern is extended a bit: the service name becomes <tt><i>foobar</i>@<i>quux</i>.service</tt> where <i>foobar</i> is the common service identifier, and <i>quux</i> the instance identifier. Example: <tt>serial-getty@ttyS2.service</tt> is the serial getty service instantiated for <tt>ttyS2</tt>.</p> <p>Service instances can be created dynamically as needed. Without further configuration you may easily start a new getty on a serial port simply by invoking a <tt>systemctl start</tt> command for the new instance:</p> <pre># systemctl start serial-getty@ttyUSB0.service</pre> <p>If a command like the above is run systemd will first look for a unit configuration file by the exact name you requested. If this service file is not found (and usually it isn't if you use instantiated services like this) then the instance id is removed from the name and a unit configuration file by the resulting <i>template</i> name searched. 
In other words, in the above example, if the precise <tt>serial-getty@ttyUSB0.service</tt> unit file cannot be found, <tt>serial-getty@.service</tt> is loaded instead. This unit template file will hence be common for all instances of this service. For the serial getty we ship a template unit file in systemd (<tt>/lib/systemd/system/serial-getty@.service</tt>) that looks something like this:</p> <pre>[Unit]
Description=Serial Getty on %I
BindTo=dev-%i.device
After=dev-%i.device systemd-user-sessions.service

[Service]
ExecStart=-/sbin/agetty -s %I 115200,38400,9600
Restart=always
RestartSec=0
</pre> <p>(Note that the unit template file we actually ship along with systemd for the serial gettys is a bit longer. If you are interested, have a look at the <a href="http://cgit.freedesktop.org/systemd/plain/units/serial-getty@.service.m4">actual file</a> which includes additional directives for compatibility with SysV, to clear the screen and remove previous users from the TTY device. To keep things simple I have shortened the unit file to the relevant lines here.)</p> <p>This file looks mostly like any other unit file, with one distinction: the specifiers <tt>%I</tt> and <tt>%i</tt> are used at multiple locations. At unit load time <tt>%I</tt> and <tt>%i</tt> are replaced by systemd with the instance identifier of the service. In our example above, if a service is instantiated as <tt>serial-getty@ttyUSB0.service</tt> the specifiers <tt>%I</tt> and <tt>%i</tt> will be replaced by <tt>ttyUSB0</tt>. 
If you introspect the instantiated unit with <tt>systemctl status serial-getty@ttyUSB0.service</tt> you will see these replacements having taken place:</p> <pre>$ systemctl status serial-getty@ttyUSB0.service
serial-getty@ttyUSB0.service - Getty on ttyUSB0
          Loaded: loaded (/lib/systemd/system/serial-getty@.service; static)
          Active: active (running) since Mon, 26 Sep 2011 04:20:44 +0200; 2s ago
        Main PID: 5443 (agetty)
          CGroup: name=systemd:/system/getty@.service/ttyUSB0
                  └ 5443 /sbin/agetty -s ttyUSB0 115200,38400,9600
</pre> <p>And that is already the core idea of instantiated services in systemd. As you can see systemd provides a very simple templating system, which can be used to dynamically instantiate services as needed. To make effective use of this, a few more notes:</p> <p>You may instantiate these services <i>on-the-fly</i> in <tt>.wants/</tt> symbolic links in the file system. For example, to make sure the serial getty on <tt>ttyUSB0</tt> is started automatically at every boot, create a symlink like this:</p> <pre># ln -s /lib/systemd/system/serial-getty@.service /etc/systemd/system/getty.target.wants/serial-getty@<b>ttyUSB0</b>.service</pre> <p>systemd will instantiate the symlinked unit file with the instance name specified in the symlink name.</p> <p>You cannot instantiate a unit template without specifying an instance identifier. In other words <tt>systemctl start serial-getty@.service</tt> will necessarily fail since the instance name was left unspecified.</p> <p>Sometimes it is useful to <i>opt-out</i> of the generic template for one specific instance. 
For these cases make use of the fact that systemd always searches first for the full instance file name before falling back to the template file name: make sure to place a unit file under the fully instantiated name in <tt>/etc/systemd/system</tt> and it will override the generic templated version for this specific instance.</p> <p>The unit file shown above uses <tt>%i</tt> at some places and <tt>%I</tt> at others. You may wonder what the difference between these specifiers is. <tt>%i</tt> is replaced by the exact characters of the instance identifier. For <tt>%I</tt> on the other hand the instance identifier is first passed through a simple unescaping algorithm. In the case of a simple instance identifier like <tt>ttyUSB0</tt> there is no effective difference. However, if the device name includes one or more slashes ("<tt>/</tt>") this cannot be part of a unit name (or Unix file name). Before such a device name can be used as instance identifier it needs to be escaped so that "/" becomes "-" and most other special characters (including "-") are replaced by "\xAB" where AB is the ASCII code of the character in hexadecimal notation<sup>[1]</sup>. Example: to refer to a USB serial port by its bus path we want to use a port name like <tt>serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0</tt>. The escaped version of this name is <tt>serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0</tt>. <tt>%I</tt> will then refer to the former, <tt>%i</tt> to the latter. Effectively this means <tt>%i</tt> is useful wherever it is necessary to refer to other units, for example to express additional dependencies. On the other hand <tt>%I</tt> is useful for usage in command lines, or inclusion in pretty description strings. 
Let's check how this looks with the above unit file:</p> <pre># systemctl start 'serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service'
# systemctl status 'serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service'
serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service - Serial Getty on serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0
          Loaded: loaded (/lib/systemd/system/serial-getty@.service; static)
          Active: active (running) since Mon, 26 Sep 2011 05:08:52 +0200; 1s ago
        Main PID: 5788 (agetty)
          CGroup: name=systemd:/system/serial-getty@.service/serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0
                  └ 5788 /sbin/agetty -s serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0 115200 38400 9600
</pre> <p>As we can see, while the instance identifier is the escaped string, the command line and the description string actually use the unescaped version, as expected.</p> <p>(Side note: there are more specifiers available than just <tt>%i</tt> and <tt>%I</tt>, and many of them are actually available in all unit files, not just templates for service instances. For more details see the <a href="http://0pointer.de/public/systemd-man/systemd.unit.html">man page</a> which includes a full list and terse explanations.)</p> <p>And at this point this shall be all for now. Stay tuned for a follow-up article on how instantiated services are used for <tt>inetd</tt>-style socket activation.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Yupp, this escaping algorithm doesn't really result in particularly pretty escaped strings, but then again, most escaping algorithms don't help readability. The algorithm we used here is inspired by what udev does in a similar case, with one change. In the end, we had to pick something. 
If you plan to comment on the escaping algorithm please also mention where you live so that I can come around and paint your bike shed yellow with blue stripes. Thanks!</small></p> Lennart PoetteringMon, 26 Sep 2011 05:11:00 +0200tag:0pointer.net,2011-09-26:/blog/projects/instances.htmlprojectsBoot/Init LPC MC Summary at LWNhttps://0pointer.net/blog/projects/lwn-lpc-2011.html <p>Make sure to read the summary of the <a href="http://lwn.net/SubscriberLink/458789/3ae00c9827889929/">Boot &amp; Init Microconf at the Linux Plumbers Conference 2011 in Santa Rosa, CA</a>. It was a fantastic conference (at the social event we took buses from the appetizers to the mains...), and this summary should give you quite a good idea of what we discussed there. Highly recommended read.</p> Lennart PoetteringSat, 17 Sep 2011 17:56:00 +0200tag:0pointer.net,2011-09-17:/blog/projects/lwn-lpc-2011.htmlprojectssystemd US Tour Dateshttps://0pointer.net/blog/projects/us-tour-dates.html <p>Kay Sievers, Harald Hoyer and I will tour the US in the next weeks. If you have any questions on systemd, udev or dracut (or any of the related technologies), then please do get in touch with us on the following occasions:</p> <blockquote><p>Linux Plumbers Conference, Santa Rosa, CA, Sep 7-9th<br /> Google, Googleplex, Mountain View, CA, Sep 12th<br /> Red Hat, Westford, MA, Sep 13-14th</p></blockquote> <p>As usual LPC is going to rock, so make sure to be there!</p> Lennart PoetteringThu, 01 Sep 2011 15:37:00 +0200tag:0pointer.net,2011-09-01:/blog/projects/us-tour-dates.htmlprojectsHow to Write syslog Daemons Which Cooperate Nicely With systemdhttps://0pointer.net/blog/projects/syslog.html <p>I just finished putting together a text on the systemd wiki explaining what to do to write a syslog service that is nicely integrated with systemd, and does all the right things. 
It's supposed to be a checklist for all syslog hackers:</p> <p><a href="http://www.freedesktop.org/wiki/Software/systemd/syslog">Read it now</a>.</p> <p>rsyslog already implements everything on this list afaics, and that's pretty cool. If other implementations want to catch up, please consider following these recommendations, too.</p> <p>I put this together since I have changed systemd 35 to set <tt>StandardOutput=syslog</tt> as default, so that all stdout/stderr of all services automatically ends up in syslog. And since that change requires some (minimal) changes to all syslog implementations I decided to document this all properly (if you are curious: they need to set <tt>StandardOutput=null</tt> to opt out of this default in order to avoid logging loops).</p> <p>Anyway, please have a peek and comment if you spot a mistake or something I forgot. Or if you have questions, just ask.</p> Lennart PoetteringTue, 30 Aug 2011 23:14:00 +0200tag:0pointer.net,2011-08-30:/blog/projects/syslog.htmlprojectsHow to Behave Nicely in the cgroup Treeshttps://0pointer.net/blog/projects/pax-cgroups.html <p>The Linux cgroup hierarchies of the various kernel controllers are a shared resource. Recently many components of Linux userspace started making use of these hierarchies. To keep the various programs from stepping on each other's toes while manipulating this shared resource we have put together a list of recommendations. Programs following these guidelines should work together nicely without interfering with other users of the hierarchies.</p> <p><a href="http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups">These guidelines are available in the systemd wiki.</a> I'd be very interested in feedback, and would like to ask you to ping me in case we forgot something or left something too vague.</p> <p>And please, if you are writing software that interfaces with the cgroup tree consider following these recommendations. 
Thank you.</p> Lennart PoetteringFri, 19 Aug 2011 16:25:00 +0200tag:0pointer.net,2011-08-19:/blog/projects/pax-cgroups.htmlprojectsThe Desktop Summit Wiki Is Full Of Interesting Stuffhttps://0pointer.net/blog/projects/ds-wiki.html <p>Just wanted to draw your attention to the <a href="http://wiki.desktopsummit.org/Main_Page">Desktop Summit Wiki</a>. If you are attending the Desktop Summit in Berlin you might find some interesting information in the Wiki.</p> <ul> <li>If you are arriving by plane and want to share a ride (even S-Bahn trains/bus) from either of the two airports, consider adding your name to <a href="http://wiki.desktopsummit.org/Attendee_Arrival_Dates">this list.</a> It's still a bit empty (since I just set it up 3min ago) but that'll hopefully change quickly.</li> <li><a href="http://wiki.desktopsummit.org/Getting_around">Some information on getting around in Berlin</a> (i.e. which public transport tickets to buy)</li> <li><a href="http://wiki.desktopsummit.org/Pre-paid_SIM">Where to get a SIM card for your phone</a></li> <li><a href="http://wiki.desktopsummit.org/Sight-seeing">Some sights to see</a></li> <li><a href="http://wiki.desktopsummit.org/Going_out">Where to get wasted</a></li> <li><a href="http://wiki.desktopsummit.org/Food">Where to eat</a></li> </ul> <p><a href="http://wiki.desktopsummit.org/Main_Page">Go to the main page of the Wiki here.</a> You are welcome to edit and add additional information to the Wiki. 
To edit the Wiki, authenticate with the same credentials you used to sign up for the conference at the Desktop Summit web site.</p> <p>See you on Friday!</p> Lennart PoetteringTue, 02 Aug 2011 22:56:00 +0200tag:0pointer.net,2011-08-02:/blog/projects/ds-wiki.htmlprojectsDesktop Summit Announcements, Part IIhttps://0pointer.net/blog/projects/desktop-summit-announce2.html <p><a href="http://0pointer.de/blog/projects/desktop-summit-announce.html">Read the first part of the announcements.</a></p> <p>And now there are more exciting announcements:</p> <ul> <li><a href="https://desktopsummit.org/news/copyright-assignment-panel">The Panel on Copyright Assignment</a> has been announced, featuring SUSE's <b>Michael Meeks</b>, Canonical's <b>Mark Shuttleworth</b> and <b>Bradley Kuhn</b> from the Software Freedom Conservancy. This session will be moderated by GNOME's <b>Karen Sandler</b>.</li> <li><a href="https://desktopsummit.org/interviews/nick-richards">The fifth and final keynote Interview</a> has been published, with Nick Richards from Intel.</li> <li><a href="https://desktopsummit.org/news/conference-attendee-policy-published">The conference attendee policy</a> has been published.</li> </ul> <p>Only 5 days are now left until the beginning of the conference. The <a href="https://desktopsummit.org/program/pre-registration">first event</a> will already take place on <b>Friday August 5th</b>, at <b><a href="http://www.c-base.org/">c-base</a> at U/S Jannowitzbr&uuml;cke</b>, starting at 4pm. The conference programme itself will begin on <b>Saturday August 6th, 10am</b> (though do come earlier, for registration, if you didn't register at the c-base event already!). Note that the primary entrance to the Desktop Summit is in the <b>north-eastern corner</b> of the main building of Humboldt University. 
That's on Dorotheenstr./Hegelplatz, and <i>not</i> on Unter den Linden.</p> <p>See you on Friday at c-base!</p> Lennart PoetteringMon, 01 Aug 2011 18:08:00 +0200tag:0pointer.net,2011-08-01:/blog/projects/desktop-summit-announce2.htmlprojectsDesktop Summit Announcementshttps://0pointer.net/blog/projects/desktop-summit-announce.html <p>In case you missed them, there have been a couple of exciting announcements around the Desktop Summit in Berlin, Germany.</p> <ul> <li><a href="https://desktopsummit.org/program/keynotes">The three keynotes have been announced</a>.</li> <li>Interviews with the keynote speakers have been published: <a href="https://desktopsummit.org/interviews/thomas-thwaite">Thomas Twaite</a>, <a href="https://desktopsummit.org/interviews/claire-rowland">Claire Rowland</a>, <a href="https://desktopsummit.org/interviews/dirk-hohndel">Dirk Hohndel</a>, <a href="https://desktopsummit.org/%20interviews/stuart-jarvis">Stuart Jarvis</a>.</li> <li><a href="https://desktopsummit.org/news/t-shirt-design-chosen">The Desktop Summit T-Shirt design has been announced.</a></li> <li><a href="http://blixtra.org/blog/2011/07/21/desktop-summit-the-social-events/">The Desktop Summit social events have been announced.</a> One is on an island! In the river Spree! In summer! In Berlin! How awesome is that?</li> <li><a href="https://desktopsummit.org/program/workshops-bofs">The BoF and workshop schedule has been published.</a></li> </ul> <p>And there will be more exciting announcements coming!</p> <p>See you in 14 days! Oh, and if you still haven't registered, <a href="https://desktopsummit.org/register">do so now</a>. 
It's free, and if you don't register you might not get on the WLAN at the conference right-away.</p> Lennart PoetteringFri, 22 Jul 2011 21:15:00 +0200tag:0pointer.net,2011-07-22:/blog/projects/desktop-summit-announce.htmlprojectssystemd for Administrators, Part IXhttps://0pointer.net/blog/projects/on-etc-sysinit.html <p>Here's the ninth installment <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">of</a> <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>On /etc/sysconfig and /etc/default</h4> <p>So, here's a bit of an opinion piece on the <tt>/etc/sysconfig/</tt> and <tt>/etc/default</tt> directories that exist on the various distributions in one form or another, and why I believe their use should be faded out. Like everything I say on this blog what follows is just my personal opinion, and not the gospel and has nothing to do with the position of the Fedora project or my employer. The topic of <tt>/etc/sysconfig</tt> has been coming up in discussions over and over again. I hope with this blog story I can explain a bit what we as systemd upstream think about these files.</p> <p>A few lines about the historical context: I wasn't around when /etc/sysconfig was introduced -- suffice to say it has been around on Red Hat and SUSE distributions since a long long time. Eventually /etc/default was introduced on Debian with very similar semantics. 
Many other distributions have a directory with similar semantics too; most of them use one of these two names. In fact, even other Unix-OSes sported a directory like this. (Such as SCO. If you are interested in the details, I am sure a Unix greybeard of your trust can fill in what I am leaving vague here.) So, even though a directory like this has been known widely on Linuxes and Unixes, it never has been standardized, neither in POSIX nor in LSB/FHS. These directories very much are something where distributions distinguish themselves from each other.</p> <p>The semantics of <tt>/etc/default</tt> and <tt>/etc/sysconfig</tt> are only very loosely defined. What almost all files stored in these directories have in common though is that they are sourceable shell scripts which primarily consist of environment variable assignments. Most of the files in these directories are sourced by the SysV init scripts of the same name. The <a href="http://www.debian.org/doc/debian-policy/ch-opersys.html#s-sysvinit">Debian Policy Manual (9.3.2)</a> and the <a href="http://fedoraproject.org/wiki/Packaging:SysVInitScript">Fedora Packaging Guidelines</a> suggest this use of the directories, however both distributions also have files in them that do not follow this scheme, i.e. that do not have a matching SysV init script -- or are not even shell scripts at all.</p> <p>Why have these files been introduced? On SysV systems services are started via init scripts in <tt>/etc/rc.d/init.d</tt> (or a similar directory). <tt>/etc/</tt> is (these days) considered the place where system configuration is stored. Originally these init scripts were subject to customization by the administrator. But as they grew and became complex most distributions no longer considered them true configuration files, but more just a special kind of program. 
To make customization easy and guarantee a safe upgrade path the customizable bits hence have been moved to separate configuration files, which the init scripts then source.</p> <p>Let's have a quick look at what kind of configuration you can do with these files. Here's a short, non-exhaustive list of various things that can be configured via environment settings in these source files, which I found browsing through the directories on a Fedora and a Debian machine:</p> <ul> <li>Additional command line parameters for the daemon binaries</li> <li>Locale settings for a daemon</li> <li>Shutdown time-out for a daemon</li> <li>Shutdown mode for a daemon</li> <li>System configuration like system locale, time zone information, console keyboard</li> <li>Redundant system configuration, like whether the RTC is in local timezone</li> <li>Firewall configuration data, not in shell format (!)</li> <li>CPU affinity for a daemon</li> <li>Settings unrelated to boot, for example including information how to install a new kernel package, how to configure nspluginwrap or whether to do library prelinking</li> <li>Whether a specific service should be started or not</li> <li>Networking configuration</li> <li>Which kernel modules to statically load</li> <li>Whether to halt or power-off on shutdown</li> <li>Access modes for device nodes (!)</li> <li>A description string for the SysV service (!)</li> <li>The user/group ID, umask to run specific daemons as</li> <li>Resource limits to set for a specific daemon</li> <li>OOM adjustment to set for a specific daemon</li> </ul> <p>Now, let's go where the beef is: what's wrong with <tt>/etc/sysconfig</tt> (resp. <tt>/etc/default</tt>)? Why might it make sense to fade out use of these files in a systemd world?</p> <ul> <li>For the majority of these files the reason for having them simply does not exist anymore: systemd unit files are not programs like SysV init scripts were. 
Unit files are simple, declarative descriptions, that usually do not consist of more than 6 lines or so. They can easily be generated, parsed without a Bourne interpreter and understood by the reader. Also, they are very easy to modify: just copy them from <tt>/lib/systemd/system</tt> to <tt>/etc/systemd/system</tt> and edit them there, where they will not be modified by the package manager. The need to separate code and configuration that was the original reason to introduce these files does not exist anymore, as systemd unit files do not include code. These files hence now are a solution looking for a problem that no longer exists.</li> <li>They are inherently distribution-specific. With systemd we hope to encourage standardization between distributions. Part of this is that we want unit files to be supplied by upstream, and not just added by the packager, as has usually been done in the SysV world. Since the location of the directory and the available variables in the files are very different on each distribution, supporting <tt>/etc/sysconfig</tt> files in upstream unit files is not feasible. Configuration stored in these files works against de-balkanization of the Linux platform.</li> <li>Many settings are fully redundant in a systemd world. For example, various services support configuration of the process credentials like the user/group ID, resource limits, CPU affinity or the OOM adjustment settings. However, these settings are supported only by some SysV init scripts, and often have different names if supported in multiple of them. OTOH in systemd, all these settings are available equally and uniformly for all services, with the same configuration option in unit files.</li> <li>Unit files know a large number of easy-to-use process context settings, that are more comprehensive than what most <tt>/etc/sysconfig</tt> files offer.</li> <li>A number of these settings are entirely questionable. 
For example, the aforementioned configuration option for the user/group ID a service runs as is primarily something the distributor has to take care of. There is little for administrators to gain by changing these settings, and only the distributor has the broad overview to make sure that UID/GID and name collisions do not happen.</li> <li>The file format is not ideal. Since the files are usually sourced as shell scripts, parse errors are very hard to decipher and are not logged alongside the other configuration problems of the services. Generally, unknown variable assignments simply have no effect but this is not warned about. This makes these files harder to debug than necessary.</li> <li>Configuration files sourced from shell scripts are subject to the execution parameters of the interpreter, and it has many: settings like IFS or LANG tend to modify drastically how shell scripts are parsed and understood. This makes them fragile.</li> <li>Interpretation of these files is slow, since it requires spawning of a shell, which adds at least one process for each service to be spawned at boot.</li> <li>Often, files in <tt>/etc/sysconfig</tt> are used to "fake" configuration files for daemons which do not support configuration files natively. This is done by gluing together command line arguments from these variable assignments that are then passed to the daemon. In general, however, proper native configuration files in these daemons are the much prettier solution. Command line options like "-k", "-a" or "-f" are not self-explanatory and have a very cryptic syntax. Moreover the same switches in many daemons have (due to the limited vocabulary) often very much contradicting effects. (On one daemon <tt>-f</tt> might cause the daemon to daemonize, while on another one this option turns exactly this behaviour off.) 
Command lines generally cannot include sensible comments which most configuration files however can.</li> <li>A number of configuration settings in <tt>/etc/sysconfig</tt> are entirely redundant: for example, on many distributions it can be controlled via <tt>/etc/sysconfig</tt> files whether the RTC is in UTC or local time. Such an option already exists however in the 3rd line of <tt>/etc/adjtime</tt> (which is known on all distributions). Adding a second, redundant, distribution-specific option overriding this is hence needless and complicates things for no benefit.</li> <li>Many of the configuration settings in <tt>/etc/sysconfig</tt> allow disabling services. By this they basically become a second level of enabling/disabling over what the init system already offers: when a service is enabled with <tt>systemctl enable</tt> or <tt>chkconfig on</tt> these settings override this, and turn the daemon off even though the init system was configured to start it. This of course is very confusing to the user/administrator, and brings virtually no benefit.</li> <li>For options like the configuration of static kernel modules to load: there are nowadays usually much better ways to load kernel modules at boot. For example, most modules may now be autoloaded by udev when the right hardware is found. This goes very far, and even includes ACPI and other high-level technologies. One of the very few exceptions where we currently do not do kernel module autoloading is CPU feature and model based autoloading which however will be supported soon too. And even if your specific module cannot be auto-loaded there's usually a better way to statically load it, for example by sticking it in <tt>/etc/modules-load.d</tt> so that the administrator can check a standardized place for all statically loaded modules.</li> <li>Last but not least, /etc already is intended to be the place for system configuration ("Host-specific system configuration" according to FHS). 
A subdirectory beneath it called <tt>sysconfig</tt> to place system configuration in is hence entirely redundant, already on the language level.</li> </ul> <p>What to use instead? Here are a few recommendations of what to do with these files in the long run in a systemd world:</p> <ul> <li>Just drop them without replacement. If they are fully redundant (like the local/UTC RTC setting) this should be a relatively easy way out (well, ignoring the need for compatibility). If systemd natively supports an equivalent option in the unit files there is no need to duplicate these settings in <tt>sysconfig</tt> files. For a list of execution options you may set for a service check out the respective man pages: <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a> and <a href="http://0pointer.de/public/systemd-man/systemd.service.html">systemd.service(5)</a>. If your setting simply adds another layer where a service can be disabled, remove it to keep things simple. There's no need to have multiple ways to disable a service.</li> <li>Find a better place for them. For configuration of the system locale or system timezone we hope to gently push distributions into the right direction, for more details see the <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">previous episode of this series</a>.</li> <li>Turn these settings into native settings of the daemon. If necessary add support for reading native configuration files to the daemon. Thankfully, most of the stuff we run on Linux is Free Software, so this can relatively easily be done.</li> </ul> <p>Of course, there's one very good reason for supporting these files for a bit longer: compatibility for upgrades. But that is really the only one I could come up with. 
It's reason enough to keep compatibility for a while, but I think it is a good idea to phase out usage of these files at least in new packages.</p> <p>If compatibility is important, then systemd will still allow you to read these configuration files even if you otherwise use native systemd unit files. If your <tt>sysconfig</tt> file only knows simple options, <tt>EnvironmentFile=-/etc/sysconfig/foobar</tt> (<a href="http://0pointer.de/public/systemd-man/systemd.exec.html">see systemd.exec(5) for more information about this option</a>) may be used to import the settings into the environment and use them to put together command lines. If you need a programming language to make sense of these settings, then use a programming language like shell. For example, place a short shell script in <tt>/usr/lib/<i>&lt;your package&gt;</i>/</tt> which reads these files for compatibility, and then <tt>exec</tt>s the actual daemon binary. Then spawn this script instead of the actual daemon binary with <tt>ExecStart=</tt> in the unit file.</p> <p>And this is all for now. Thank you very much for your interest.</p> Lennart PoetteringMon, 18 Jul 2011 00:34:00 +0200tag:0pointer.net,2011-07-18:/blog/projects/on-etc-sysinit.htmlprojectsWake up!https://0pointer.net/blog/projects/ds-wake-up-call.html <p>If you plan to attend Desktop Summit in Berlin this year, then please <a href="https://desktopsummit.org/register">REGISTER NOW!</a></p> <p>If you do not register, then this means you will have to wait in the signup queue at the conference for substantially longer and might miss a talk or two. You will <b>not get onto the conference WLAN</b> right from the beginning of the conference (access is authenticated and personalized, only people who sign up will get access credentials). Your personal badge will not be ready right-away. If not enough people register we will also have to <b>cut down on the available catering and the parties</b>. 
We rely on the registration numbers to plan, and if you come but don't sign up beforehand you make it very hard for us to plan. Registration is free, so what are you waiting for?</p> <p>I am pretty sure you want to avoid all of this, right? For your own benefit and for the benefit of everybody else attending the conference, go and register for the conference <a href="https://desktopsummit.org/register">right-away</a>!</p> <p>Also, we are still looking for more volunteers for session chairs and runners at the conference. This is your chance to introduce your favourite Open Source hacker on stage! Please consider volunteering and <a href="https://desktopsummit.org/news/call-for-volunteers">read the Call for Volunteers</a>. Add yourself to <a href="http://wiki.desktopsummit.org/Volunteers">the list on the wiki page</a>, today. If you sign up you'll earn yourself the gratitude of the GNOME and KDE communities, and you'll receive the exclusive team T-shirts!</p> <p>Thank you!</p> Lennart PoetteringWed, 13 Jul 2011 00:49:00 +0200tag:0pointer.net,2011-07-13:/blog/projects/ds-wake-up-call.htmlprojectsYet another interviewhttps://0pointer.net/blog/projects/linuxfr.html <p><a href="http://linuxfr.org/news/un-entretien-avec-lennart-poettering">Here's yet another interview with yours truly.</a> It's on LinuxFR, so I hope you understand some fr_FR.</p> Lennart PoetteringTue, 05 Jul 2011 15:28:00 +0200tag:0pointer.net,2011-07-05:/blog/projects/linuxfr.htmlprojectssystemd for Developers IIhttps://0pointer.net/blog/projects/socket-activation2.html <p>It has been way too long since I posted the <a href="http://0pointer.de/blog/projects/socket-activation.html">first episode</a> of my <i>systemd for Developers</i> series. Here's finally the second part. 
Make sure you read the first episode of the series before you start with this part since I'll assume the reader grokked the wonders of socket activation.</p> <h4>Socket Activation, Part II</h4> <p>This time we'll focus on adding socket activation support to real-life software, more specifically the CUPS printing server. Most current Linux desktops run CUPS by default these days, since printing is so basic that it's a must-have, and must just work when the user needs it. However, most desktop CUPS installations probably don't actually see more than a handful of print jobs each month. Even if you are a busy office worker you're unlikely to generate more than a couple of print jobs an hour on your PC. Also, printing is not time critical. Whether a job takes 50ms or 100ms until it reaches the printer matters little. As long as it is less than a few seconds the user probably won't notice. Finally, printers are usually peripheral hardware: they aren't built into your laptop, and you don't always carry them around plugged in. That all together makes CUPS a perfect candidate for lazy activation: instead of starting it unconditionally at boot we just start it on-demand, when it is needed. That way we can save resources, at boot and at runtime. However, this kind of activation needs to take place transparently, so that the user doesn't notice that the print server was not actually running yet when he tried to access it. 
To achieve that we need to make sure that the print server is started as soon as at least one of three conditions holds:</p> <ol> <li>A local client tries to talk to the print server, for example because a GNOME application opened the printing dialog which tries to enumerate available printers.</li> <li>A printer is being plugged in locally, and it should be configured and enabled and then optionally the user be informed about it.</li> <li>At boot, when there's still an unprinted print job lurking in the queue.</li> </ol> <p>Of course, the desktop is not the only place where CUPS is used. CUPS can be run in small and big print servers, too. In that case the amount of print jobs is substantially higher, and CUPS should be started right away at boot. That means that (optionally) we still want to start CUPS unconditionally at boot and not delay its execution until when it is needed.</p> <p>Putting this all together we need four kinds of activation to make CUPS work well in all situations at minimal resource usage: socket based activation (to support condition 1 above), hardware based activation (to support condition 2), path based activation (for condition 3) and finally boot-time activation (for the optional server usecase). Let's focus on these kinds of activation in more detail, and in particular on socket-based activation.</p> <h5>Socket Activation in Detail</h5> <p>To implement socket-based activation in CUPS we need to make sure that when sockets are passed from systemd these are used to listen on instead of binding them directly in the CUPS source code. Fortunately this is relatively easy to do in the CUPS sources, since it already supports launchd-style socket activation, as it is used on MacOS X (note that CUPS is nowadays an Apple project). That means the code already has all the necessary hooks to add systemd-style socket activation with minimal work.</p> <p>To begin with our patching session we check out the CUPS sources. 
Unfortunately CUPS is still stuck in unhappy Subversion country and not using git yet. In order to simplify our patching work our first step is to use <tt>git-svn</tt> to check it out locally in a way we can access it with the usual git tools:</p> <pre>git svn clone http://svn.easysw.com/public/cups/trunk/ cups</pre> <p>This will take a while. After the command finished we use the wonderful <tt>git grep</tt> to look for all occurrences of the word "launchd", since that's probably where we need to add the systemd support too. This reveals <a href="http://svn.easysw.com/public/cups/trunk/scheduler/main.c">scheduler/main.c</a> as the main source file which implements launchd interaction.</p> <p>Browsing through this file we notice that two functions are primarily responsible for interfacing with launchd, the appropriately named <tt>launchd_checkin()</tt> and <tt>launchd_checkout()</tt> functions. The former acquires the sockets from launchd when the daemon starts up, the latter terminates communication with launchd and is called when the daemon shuts down. systemd's socket activation interfaces are much simpler than those of launchd. Due to that we only need an equivalent of the <tt>launchd_checkin()</tt> call, and do not need a checkout function. Our own function <tt>systemd_checkin()</tt> can be implemented very similarly to <tt>launchd_checkin()</tt>: we look at the sockets we got passed and try to map them to the ones configured in the CUPS configuration. If we got more sockets passed than configured in CUPS we implicitly add configuration for them. If the CUPS configuration includes definitions for more listening sockets those will be bound natively in CUPS. That way we'll very robustly listen on all ports that are listed in either systemd or CUPS configuration.</p> <p>Our function <tt>systemd_checkin()</tt> uses <tt>sd_listen_fds()</tt> from <tt>sd-daemon.c</tt> to acquire the file descriptors. 
Then, we use <tt>sd_is_socket()</tt> to map the sockets to the right listening configuration of CUPS, in a loop for each passed socket. The loop corresponds very closely to the loop from <tt>launchd_checkin()</tt> but is a lot simpler. <a href="http://0pointer.de/public/cups-patch-core.txt">Our patch so far looks like this.</a></p> <p>Before we can test our patch, we add <a href="http://cgit.freedesktop.org/systemd/plain/src/sd-daemon.c">sd-daemon.c</a> and <a href="http://cgit.freedesktop.org/systemd/plain/src/sd-daemon.h">sd-daemon.h</a> as drop-in files to the package, so that <tt>sd_listen_fds()</tt> and <tt>sd_is_socket()</tt> are available for use. After a few minimal changes to the <tt>Makefile</tt> we are almost ready to test our socket activated version of CUPS. The last missing step is creating two unit files for CUPS, one for the socket (<a href="http://0pointer.de/public/cups.socket">cups.socket</a>), the other for the service (<a href="http://0pointer.de/public/cups.service">cups.service</a>). To make things simple we just drop them in <tt>/etc/systemd/system</tt> and make sure systemd knows about them, with <tt>systemctl daemon-reload</tt>.</p> <p>Now we are ready to test our little patch: we start the socket with <tt>systemctl start cups.socket</tt>. This will bind the socket, but won't start CUPS yet. Next, we simply invoke <tt>lpq</tt> to test whether CUPS is transparently started, and yup, this is exactly what happens. We'll get the normal output from <tt>lpq</tt> as if we had started CUPS at boot already, and if we then check with <tt>systemctl status cups.service</tt> we see that CUPS was automatically spawned by our invocation of <tt>lpq</tt>. 
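</p> <p>For readers curious about what the descriptor hand-over looks like underneath <tt>sd_listen_fds()</tt>, here is a minimal sketch, assuming only the environment-variable convention of the socket activation protocol (<tt>LISTEN_PID</tt>, <tt>LISTEN_FDS</tt>, descriptors starting at 3). It is not the code from the CUPS patch linked above, and the helper names <tt>passed_fd_count()</tt> and <tt>report_passed_fds()</tt> are made up for illustration; a real daemon should simply call <tt>sd_listen_fds()</tt>.</p>

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* The first passed descriptor; the protocol starts counting at fd 3. */
#define LISTEN_FDS_START 3

/*
 * Hypothetical helper (not part of sd-daemon or CUPS): returns the number
 * of descriptors handed to this process, 0 if none were passed to us,
 * and -1 if the environment variables are malformed.
 */
int passed_fd_count(void) {
    const char *pid_str = getenv("LISTEN_PID");
    const char *fds_str = getenv("LISTEN_FDS");
    char *end = NULL;

    if (!pid_str || !fds_str)
        return 0;

    errno = 0;
    long pid = strtol(pid_str, &end, 10);
    if (errno != 0 || end == pid_str || *end != '\0')
        return -1;

    /* The variables may have been inherited; ignore them if not meant for us. */
    if ((pid_t) pid != getpid())
        return 0;

    errno = 0;
    long n = strtol(fds_str, &end, 10);
    if (errno != 0 || end == fds_str || *end != '\0' || n < 0 || n > 1024)
        return -1;

    return (int) n;
}

/* Hypothetical helper: list the descriptors a daemon would adopt as listeners. */
int report_passed_fds(void) {
    int n = passed_fd_count();

    for (int i = 0; i < n; i++)
        printf("would adopt passed fd %d\n", LISTEN_FDS_START + i);

    return n;
}
```

<p>A daemon would call something like <tt>report_passed_fds()</tt> early at startup and fall back to binding its sockets itself when the result is 0, which mirrors the behaviour described for <tt>systemd_checkin()</tt>.</p> <p>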
Our test succeeded, socket activation worked!</p> <h5>Hardware Activation in Detail</h5> <p>The next trigger is hardware activation: we want to make sure that CUPS is automatically started as soon as a local printer is found, regardless of whether that happens as <i>hotplug</i> during runtime or as <i>coldplug</i> during boot. Hardware activation in systemd is done via udev rules. Any udev device that is tagged with the <tt>systemd</tt> tag can pull in units as needed via the <tt>SYSTEMD_WANTS=</tt> environment variable. In the case of CUPS we don't even have to add our own udev rule to the mix, we can simply hook into what systemd already does out-of-the-box with rules shipped upstream. More specifically, it ships a udev rules file with the following lines:</p> <pre>SUBSYSTEM=="printer", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target" SUBSYSTEM=="usb", KERNEL=="lp*", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target" SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ENV{ID_USB_INTERFACES}=="*:0701??:*", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target"</pre> <p>This pulls in the target unit <tt>printer.target</tt> as soon as at least one printer is plugged in (supporting all kinds of printer ports). All we now have to do is make sure that our CUPS service is pulled in by <tt>printer.target</tt> and we are done. By placing a <tt>WantedBy=printer.target</tt> line in the <tt>[Install]</tt> section of the service file, a <tt>Wants</tt> dependency is created from <tt>printer.target</tt> to <tt>cups.service</tt> as soon as the latter is enabled with <tt>systemctl enable</tt>. 
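</p> <p>As a rough illustration of that <tt>[Install]</tt> line, a fragment of such a service file could look like the following. This is a sketch only, not the contents of the linked <tt>cups.service</tt>, and the comments are added here for explanation.</p>

```ini
# Fragment of a service unit (illustrative sketch, not the linked cups.service)
[Install]
# "systemctl enable" turns this into a symlink under printer.target.wants/,
# which is what gives printer.target its Wants dependency on the service.
WantedBy=printer.target
```

<p>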
The indirection via <tt>printer.target</tt> provides us with a simple way to use <tt>systemctl enable</tt> and <tt>systemctl disable</tt> to manage hardware activation of a service.</p> <h5>Path-based Activation in Detail</h5> <p>To ensure that CUPS is also started when there is a print job still queued in the printing spool, we write a simple <a href="http://0pointer.de/public/cups.path"><tt>cups.path</tt></a> that activates CUPS as soon as we find a file in <tt>/var/spool/cups</tt>.</p> <h5>Boot-based Activation in Detail</h5> <p>Well, starting services on boot is obviously the most boring and well-known way to spawn a service. This entire exercise was about making this unnecessary, but we still need to support it for explicit print server machines. Since those are probably the exception and not the common case we do not enable this kind of activation by default, but leave it to the administrator to add it in when he deems it necessary, with a simple command (<tt>ln -s /lib/systemd/system/cups.service /etc/systemd/system/multi-user.target.wants/</tt> to be precise).</p> <p>So, now we have covered all four kinds of activation. To finalize our patch we have a closer look at the <tt>[Install]</tt> section of <a href="http://0pointer.de/public/cups.service"><tt>cups.service</tt></a>, i.e. the part of the unit file that controls how <tt>systemctl enable cups.service</tt> and <tt>systemctl disable cups.service</tt> will hook the service into/unhook the service from the system. Since we don't want to start cups at boot we do not place <tt>WantedBy=multi-user.target</tt> in it like we would do for services that are started at boot. 
Instead we just place an <tt>Also=</tt> line that makes sure that <a href="http://0pointer.de/public/cups.path"><tt>cups.path</tt></a> and <a href="http://0pointer.de/public/cups.socket"><tt>cups.socket</tt></a> are automatically also enabled if the user asks to enable <tt>cups.service</tt> (they are enabled according to the <tt>[Install]</tt> sections in those unit files).</p> <p>As a last step we then integrate our work into the build system. In contrast to SysV init scripts, systemd unit files are supposed to be distribution independent. Hence it is a good idea to include them in the upstream tarball. Unfortunately CUPS doesn't use Automake, but Autoconf with a set of handwritten Makefiles. This requires a bit more work to get our additions integrated, but is not too difficult either. <a href="http://0pointer.de/public/cups-0001-systemd-add-systemd-socket-activation-and-unit-files.patch">And this is what our final patch looks like</a>, after we committed our work and ran <tt>git format-patch -1</tt> on it to generate a pretty git patch.</p> <p>The next step of course is to get this patch integrated into the upstream and Fedora packages (or whatever other distribution floats your boat). To make this easy I have prepared <a href="http://0pointer.de/public/cups-0001-Add-socket-activation-patch.patch">a patch for Tim that makes the necessary packaging changes for Fedora 16</a>, and includes the patch intended for upstream linked above. Of course, ideally the patch is merged upstream, however in the meantime we can already include it in the Fedora packages.</p> <p>Note that CUPS was particularly easy to patch since it already supported launchd-style activation; patching a service that doesn't support that yet is only marginally more difficult. (Oh, and we have no plans to offer the complex launchd API as compatibility kludge on Linux. It simply doesn't translate very well, so don't even ask... ;-))</p> <p>And that finishes our little blog story. 
I hope this quick walkthrough of how to add socket activation (and the other forms of activation) to a package was interesting to you, and will help you do the same for your own packages. If you have questions, our IRC channel <tt>#systemd</tt> on freenode and our <a href="http://lists.freedesktop.org/mailman/listinfo/systemd-devel">mailing list</a> are available, and we are always happy to help!</p> Lennart PoetteringTue, 05 Jul 2011 00:46:00 +0200tag:0pointer.net,2011-07-05:/blog/projects/socket-activation2.htmlprojectsAnother interviewhttps://0pointer.net/blog/projects/developerworks-brasil.html <p><a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/_entrevista_lennart_poettering">Here's another interview with yours truly.</a> It's on IBM developerWorks Brasil, so I hope you understand some pt_BR.</p> Lennart PoetteringSun, 03 Jul 2011 16:33:00 +0200tag:0pointer.net,2011-07-03:/blog/projects/developerworks-brasil.htmlprojectsReminder!https://0pointer.net/blog/projects/reminder.html <p>GNOMErs, the Desktop Summit in Berlin, Germany is approaching quickly!</p> <p><a href="https://desktopsummit.org/program/workshops-bofs">Submit your BoF for the Desktop Summit BoF days NOW!</a> Deadline is <b>July 3rd</b>, this Sunday!</p> <p><a href="https://desktopsummit.org/news/call-for-volunteers">Sign up as a volunteer for the Desktop Summit NOW!</a> Deadline is <b>July 18th</b>!</p> Lennart PoetteringWed, 29 Jun 2011 23:20:00 +0200tag:0pointer.net,2011-06-29:/blog/projects/reminder.htmlprojectsImpressions of Japan, Thailand and Indiahttps://0pointer.net/blog/photos/india-bangkok-japan.html <p>It has been a while since I blogged photos of my various travels, although I have visited quite a number of countries in the past 12 months, and travelled overland in a number of them. 
Here are a few selected shots from three: India (November/December), Thailand (January), Japan (June).</p> <p> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=955"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-955.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=1289"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-1289.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=258"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-258.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=203"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-203.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=630"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-630.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=707"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-707.jpg" alt="Japan" width="120" height="80" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=795"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-795.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=1038"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-1038.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=616"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-616.jpg" alt="Japan" width="120" height="80" /></a> <a 
href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=769"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-769.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=53"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-53.jpg" alt="Japan" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=681"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-681.jpg" alt="Japan" width="120" height="80" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=1268"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-1268.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=125"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-125.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=1198"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-1198.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=1132"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-1132.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=983"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-983.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=874"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-874.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=387"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-387.jpg" alt="Japan" width="80" 
height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=751"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-751.jpg" alt="Japan" width="80" height="120" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=721"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-721.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=1242"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-1242.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=692"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-692.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=632"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-632.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=377"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-377.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=815"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-815.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=163"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-163.jpg" alt="Japan" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Japan%202011-06&amp;photo=146"><img src="http://0pointer.de/photos/galleries/Japan%202011-06/thumbs/img-146.jpg" alt="Japan" width="80" height="120" /></a> </p> <p>These pictures are from Kyoto, Nara and Takayama in Honshu, Japan.</p> <p> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=821"><img 
src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-821.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=236"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-236.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=722"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-722.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=717"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-717.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=455"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-455.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=163"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-163.jpg" alt="Thailand" width="120" height="80" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=261"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-261.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=256"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-256.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=805"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-805.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=547"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-547.jpg" alt="Thailand" width="120" 
height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=669"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-669.jpg" alt="Thailand" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=402"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-402.jpg" alt="Thailand" width="120" height="80" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=794"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-794.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=785"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-785.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=771"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-771.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=763"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-763.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=753"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-753.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=776"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-776.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=726"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-726.jpg" alt="Thailand" width="80" height="120" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=708"><img 
src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-708.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=200"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-200.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=657"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-657.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=599"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-599.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=613"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-613.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=381"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-381.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=562"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-562.jpg" alt="Thailand" width="80" height="120" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=441"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-441.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=368"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-368.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=316"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-316.jpg" alt="Thailand" width="80" 
height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=687"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-687.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=208"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-208.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=90"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-90.jpg" alt="Thailand" width="80" height="120" /></a> </p> <p>All this is Bangkok, Thailand. Of particular interest are the gold-based patterns used widely to adorn Thai architecture:</p> <p> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=714"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-714.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=103"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-103.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=693"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-693.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=677"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-677.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=580"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-580.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=699"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-699.jpg" 
alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=350"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-350.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=325"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-325.jpg" alt="Thailand" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Thailand%202011-01&amp;photo=269"><img src="http://0pointer.de/photos/galleries/Thailand%202011-01/thumbs/img-269.jpg" alt="Thailand" width="80" height="120" /></a> </p> <p>And finally India (one picture NSFW!):</p> <p> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=108"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-108.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=206"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-206.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=245"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-245.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=274"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-274.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=487"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-487.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=335"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-335.jpg" alt="India" width="120" height="80" /></a> <br /> <a 
href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=428"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-428.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=491"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-491.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=244"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-244.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=689"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-689.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=655"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-655.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=938"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-938.jpg" alt="India" width="120" height="80" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3600"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3600.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1042"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1042.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1146"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1146.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1248"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1248.jpg" alt="India" width="120" 
height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1339"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1339.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1386"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1386.jpg" alt="India" width="120" height="80" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1380"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1380.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1509"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1509.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1799"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1799.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1871"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1871.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2336"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2336.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2415"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2415.jpg" alt="India" width="120" height="80" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3403"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3403.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2660"><img 
src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2660.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2675"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2675.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2715"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2715.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3197"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3197.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2986"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2986.jpg" alt="India" width="120" height="80" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3064"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3064.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3098"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3098.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3191"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3191.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3234"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3234.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3254"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3254.jpg" alt="India" width="120" height="80" /></a> <a 
href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2804"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2804.jpg" alt="India" width="120" height="80" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=977"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-977.jpg" alt="India" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2612"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2612.jpg" alt="India" width="120" height="80" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1406"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1406.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1411"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1411.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=167"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-167.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=181"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-181.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=419"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-419.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=198"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-198.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=192"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-192.jpg" alt="India" 
width="80" height="120" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=221"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-221.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=399"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-399.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3185"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3185.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=443"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-443.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3775"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3775.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=494"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-494.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=188"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-188.jpg" alt="India" width="80" height="120" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1485"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1485.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1544"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1544.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1743"><img 
src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1743.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3552"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3552.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1828"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1828.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2170"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2170.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2422"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2422.jpg" alt="India" width="80" height="120" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2440"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2440.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2488"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2488.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2502"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2502.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2623"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2623.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2721"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2721.jpg" alt="India" width="80" height="120" /></a> <a 
href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=2875"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-2875.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3000"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3000.jpg" alt="India" width="80" height="120" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3009"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3009.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3101"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3101.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3157"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3157.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=270"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-270.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3223"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3223.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3400"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3400.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3412"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3412.jpg" alt="India" width="80" height="120" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=1749"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-1749.jpg" alt="India" 
width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3576"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3576.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3716"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3716.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=823"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-823.jpg" alt="India" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=India%202010-11&amp;photo=3825"><img src="http://0pointer.de/photos/galleries/India%202010-11/thumbs/img-3825.jpg" alt="India" width="80" height="120" /></a> </p> <p>This is Mumbai, Ellora, Ajanta, Aurangabad (in Maharashtra); Mandu, Sanchi, Gwalior, Khajuraho (Madhya Pradesh); Orchha, Varanasi (Uttar Pradesh); Bangalore, Mysore (Karnataka).</p> Lennart PoetteringSat, 25 Jun 2011 19:09:00 +0200tag:0pointer.net,2011-06-25:/blog/photos/india-bangkok-japan.htmlphotosCall For Volunteershttps://0pointer.net/blog/projects/call-for-volunteers.html <p><i>The Desktop Summit 2011 in Berlin, Germany Needs Your Help!</i></p> <p>Are you attending the Desktop Summit? Are you interested in helping the GNOME and KDE communities organize this year's Summit? Do you want to work with other Free Software enthusiasts to make the Desktop Summit rock? 
Would you like to own one of the exclusive Desktop Summit Team T-Shirts?</p> <p>If so, please sign up as a volunteer for the Desktop Summit!</p> <p><a href="https://www.desktopsummit.org/news/call-for-volunteers">Read the full <b>Call For Volunteers</b>!</a></p> <p><a href="http://psconboard.blogspot.com/2011/06/desktop-summit-2011-call-for-volunteers.html">Read Patricia's original announcement.</a></p> <p><a href="http://wiki.desktopsummit.org/Volunteers">Or go directly and sign up as a volunteer.</a></p> Lennart PoetteringThu, 23 Jun 2011 17:32:00 +0200tag:0pointer.net,2011-06-23:/blog/projects/call-for-volunteers.htmlprojectsDesktop Summit Workshops and BoFs Call for Participationhttps://0pointer.net/blog/projects/dsbofcfp.html <p>The Desktop Summit <a href="https://desktopsummit.org/program">schedule for the talks and presentations</a> was published a couple of weeks ago. Now it is time to open the 2nd Call for Participation, this time for Workshops and BoFs.</p> <p>If you'd like to run a workshop, BoF, hack session or training/teaching session, then please <a href="https://desktopsummit.org/program/workshops-bofs">submit it here</a>. If you do, it will appear in the printed schedule and get a prominent time slot assigned. BoFs, workshops, hack sessions and training/teaching sessions can also be added after <b>the deadline of July 3rd</b>, and even be registered ad-hoc at the conference, but if you register your slot in advance we can make sure people will find it in the printed schedule, will know about it and can plan to attend it, and we can do everything to make sure a lot of people show up.</p> <p>Note that BoF/workshop proposals are unrestricted, i.e. there is no program committee that will accept or reject submissions: we have a lot of room and we'll accept liberally what is submitted.</p> <p>For GNOMErs: this part of the conference is supposed to be much like the Boston GNOME summit, but with a printed schedule. 
So please be welcome to submit your sessions like you'd want to have them take place at the GNOME summit as well.</p> <p>Also see <a href="http://www.jonnor.com/2011/06/registration-open-for-workshops-bofs-at-the-desktopsummit-2011/">Jonnor's original announcement</a>.</p> <p>So, hurry, <a href="https://desktopsummit.org/program/workshops-bofs">file your session request right-away</a> and before <b>July 3rd</b>!</p> Lennart PoetteringMon, 20 Jun 2011 09:59:00 +0200tag:0pointer.net,2011-06-20:/blog/projects/dsbofcfp.htmlprojectsTwo Articles In c'thttps://0pointer.net/blog/projects/ct.html <p>If you are into computers and live in Germany I am sure you know the c't computer magazine. <a href="http://www.heise.de/ct/inhalt/2011/13/172/">The current edition 13/2011 (p. 172) contains two articles contributed by Thorsten Leemhuis, Kay Sievers and yours truly on the topic of systemd</a>. Awesome read. Now, run to your local kiosk and grab a c't and study the two articles. Go now, quick!</p> Lennart PoetteringMon, 13 Jun 2011 11:45:00 +0200tag:0pointer.net,2011-06-13:/blog/projects/ct.htmlprojectsVideo Interviewhttps://0pointer.net/blog/projects/golem-video.html <p><a href="http://www.golem.de/1105/83785.html">Golem.de has an interview with yours truly.</a> When I watched I learned so much! If you understand the German language then you might too! 
(and only then because it is in Goethe's tongue).</p> <object width="480" height="270"> <param name="movie" value="http://video.golem.de/player/videoplayer.swf?id=4823&amp;autoPl=false" /> <param name="allowFullScreen" value="true" /> <param name="AllowScriptAccess" value="always" /> <embed src="http://video.golem.de/player/videoplayer.swf?id=4823&amp;autoPl=false" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="480" height="270" /> </object> <div style="width:480px; text-align:center; font-family:verdana,sans-serif; font-size:0.8em;"><a href="http://video.golem.de/oss/4823/interview-mit-lennart-poettering-entwickler-systemd.html">Video: Interview mit Lennart Poettering, Entwickler Systemd</a>&nbsp;(7:14)</div> Lennart PoetteringFri, 27 May 2011 22:02:00 +0200tag:0pointer.net,2011-05-27:/blog/projects/golem-video.htmlprojectssystemd Documentationhttps://0pointer.net/blog/projects/systemd-docs.html <p><a href="https://fedoraproject.org/get-fedora">Fedora 15 is out.</a> Get it while it is hot! It is probably the biggest distribution release of all time, being the first to ship both <a href="http://gnome3.org/">GNOME 3</a> and <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>.</p> <p>Since this is the first distribution release based on systemd, it might be interesting to read up on what it is all about. 
Here's a little compilation of the available documentation for systemd.</p> <h4>The Manual Pages</h4> <ul> <li><a href="http://0pointer.de/public/systemd-man/systemd.html">systemd(1)</a>, covering general concepts of systemd.</li> <li><a href="http://0pointer.de/public/systemd-man/systemctl.html">systemctl(1)</a>, covering the client control utility of systemd.</li> <li><a href="http://0pointer.de/public/systemd-man/systemd-cgls.html">systemd-cgls(1)</a>, on a tool to show the systemd cgroup tree.</li> <li><a href="http://0pointer.de/public/systemd-man/systemd.unit.html">systemd.unit(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.service.html">systemd.service(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.socket.html">systemd.socket(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.target.html">systemd.target(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.timer.html">systemd.timer(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.swap.html">systemd.swap(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.snapshot.html">systemd.snapshot(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.path.html">systemd.path(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.mount.html">systemd.mount(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.device.html">systemd.device(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.automount.html">systemd.automount(5)</a>, for writing systemd unit files.</li> <li><a href="http://0pointer.de/public/systemd-man/systemd-nspawn.html">systemd-nspawn(1)</a>, on a tool for running simple containers.</li> <li><a href="http://0pointer.de/public/systemd-man/systemd-notify.html">systemd-notify(1)</a>, on a tool for sending notifications to systemd.</li> <li><a 
href="http://0pointer.de/public/systemd-man/systemd.special.html">systemd.special(5)</a>, with a list of systemd's special units.</li> <li><a href="http://0pointer.de/public/systemd-man/daemon.html">daemon(7)</a>, on writing daemons.</li> <li><a href="http://0pointer.de/public/systemd-man/pam_systemd.html">pam_systemd(8)</a>, on configuring user session settings.</li> <li><a href="http://0pointer.de/public/systemd-man/sd-daemon.html">sd-daemon(7)</a>, <a href="http://0pointer.de/public/systemd-man/sd_listen_fds.html">sd_listen_fds(3)</a>, <a href="http://0pointer.de/public/systemd-man/sd_notify.html">sd_notify(3)</a>, <a href="http://0pointer.de/public/systemd-man/sd_is_fifo.html">sd_is_fifo(3)</a>, <a href="http://0pointer.de/public/systemd-man/sd_booted.html">sd_booted(3)</a>, <a href="http://0pointer.de/public/systemd-man/sd-readahead.html">sd-readahead(7)</a>, <a href="http://0pointer.de/public/systemd-man/sd_readahead.html">sd_readahead(3)</a>, covering the systemd API.</li> <li><a href="http://0pointer.de/public/systemd-man/systemd-ask-password.html">systemd-ask-password(1)</a>, describing a tool for querying system passwords.</li> <li><a href="http://0pointer.de/public/systemd-man/systemd-tmpfiles.html">systemd-tmpfiles(8)</a>, describing a tool for creating, deleting and cleaning up volatile and temporary files and directories.</li> <li><a href="http://0pointer.de/public/systemd-man/systemd.conf.html">systemd.conf(5)</a>, describing the systemd main configuration file.</li> <li><a href="http://0pointer.de/public/systemd-man/binfmt.d.html">binfmt.d(5)</a>, <a href="http://0pointer.de/public/systemd-man/hostname.html">hostname(5)</a>, <a href="http://0pointer.de/public/systemd-man/locale.conf.html">locale.conf(5)</a>, <a href="http://0pointer.de/public/systemd-man/machine-id.html">machine-id(5)</a>, <a href="http://0pointer.de/public/systemd-man/machine-info.html">machine-info(5)</a>, <a 
href="http://0pointer.de/public/systemd-man/modules-load.d.html">modules-load.d(5)</a>, <a href="http://0pointer.de/public/systemd-man/os-release.html">os-release(5)</a>, <a href="http://0pointer.de/public/systemd-man/sysctl.d.html">sysctl.d</a>, <a href="http://0pointer.de/public/systemd-man/tmpfiles.d.html">tmpfiles.d(5)</a>, <a href="http://0pointer.de/public/systemd-man/vconsole.conf.html">vconsole.conf(5)</a>, for the configuration files systemd standardizes.</li> <li><a href="http://0pointer.de/public/systemd-man/halt.html">halt(8)</a>, <a href="http://0pointer.de/public/systemd-man/runlevel.html">runlevel(8)</a>, <a href="http://0pointer.de/public/systemd-man/shutdown.html">shutdown(8)</a>, <a href="http://0pointer.de/public/systemd-man/telinit.html">telinit(8)</a>, covering the SysV compatibility tools.</li> </ul> <p><a href="http://0pointer.de/public/systemd-man/">Here's the full list of all man pages.</a></p> <h4>The Blog Stories</h4> <ul> <li><a href="http://0pointer.de/blog/projects/systemd.html">The original announcement blog story</a>, lining out the ideas of systemd in much detail.</li> <li><a href="http://0pointer.de/blog/projects/systemd-update.html">The two status updates</a> <a href="http://0pointer.de/blog/projects/systemd-update-2.html">since then</a>.</li> <li><a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">systemd for Administrators #1: Verifying Bootup</a></li> <li><a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">systemd for Administrators #2: Which Service Owns Which Processes?</a></li> <li><a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd for Administrators #3: How Do I Convert A SysV Init Script Into A systemd Service File?</a></li> <li><a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">systemd for Administrators #4: Killing Services</a></li> <li><a href="http://0pointer.de/blog/projects/three-levels-of-off">systemd for Administrators #5: The Three 
Levels of "Off"</a></li> <li><a href="http://0pointer.de/blog/projects/changing-roots.html">systemd for Administrators #6: Changing Roots</a></li> <li><a href="http://0pointer.de/blog/projects/blame-game.html">systemd for Administrators #7: The Blame Game</a></li> <li><a href="http://0pointer.de/blog/projects/the-new-configuration-files">systemd for Administrators #8: The New Configuration Files</a></li> <li><a href="http://0pointer.de/blog/projects/why.html">Why systemd?</a>, exploring why distributions should choose (and are choosing) systemd.</li> <li><a href="http://0pointer.de/blog/projects/socket-activation.html">systemd for Developers #1: Socket Activation</a></li> </ul> <p><a href="http://wiki.opennet.ru/Systemd_%D0%B4%D0%BB%D1%8F_%D0%B0%D0%B4%D0%BC%D0%B8%D0%BD%D0%B8%D1%81%D1%82%D1%80%D0%B0%D1%82%D0%BE%D1%80%D0%BE%D0%B2">Some of the systemd for Administrators blog posts are available in Russian language, too.</a></p> <h4>Other Documentation</h4> <ul> <li><a href="http://www.freedesktop.org/wiki/Software/systemd/TipsAndTricks">Tips &amp; Tricks</a></li> <li><a href="http://www.freedesktop.org/wiki/Software/systemd/FrequentlyAskedQuestions">Frequently Asked Questions</a></li> <li><a href="http://www.freedesktop.org/wiki/Software/systemd/InterfaceStabilityPromise">Interface Stability Promise</a>, covering what you need to know when developing against systemd interfaces.</li> <li><a href="http://www.freedesktop.org/wiki/Software/systemd/PasswordAgents">Writing Password Agents</a>, in case you want to add a systemd compatible password agent to the desktop of your preference.</li> <li><a href="http://www.freedesktop.org/wiki/Software/systemd/hostnamed">On hostnamed</a>, in case you want to add hostname changing UIs to your favourite desktop environment.</li> </ul> <h4>Fedora Documentation</h4> <ul> <li><a href="http://fedoraproject.org/wiki/Systemd">General Overview</a></li> <li><a href="http://fedoraproject.org/wiki/SysVinit_to_Systemd_Cheatsheet">SysVInit to 
systemd Cheatsheet</a></li> <li><a href="http://fedoraproject.org/wiki/How_to_debug_Systemd_problems">How to Debug systemd Problems</a></li> <li><a href="https://fedoraproject.org/wiki/Packaging:Guidelines:Systemd">systemd Packaging Guidelines</a></li> </ul> <h4>In The Press</h4> <ul> <li><a href="http://lwn.net/Articles/389149/">The original LWN article</a></li> <li><a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/systemd_parte_1?lang=pt_br">B&#xea;-&#xe1;-b&#xe1; do systemd</a></li> <li><a href="http://0pointer.de/blog/projects/systemd-in-the-news.html">Press articles after the original announcement</a></li> </ul> <h4>Other Distributions' Documentation</h4> <ul> <li><a href="http://en.opensuse.org/SDB:Systemd">OpenSUSE</a></li> <li><a href="http://wiki.debian.org/systemd">Debian</a></li> <li><a href="https://wiki.ubuntu.com/systemd">Ubuntu</a></li> <li><a href="http://en.gentoo-wiki.com/wiki/Systemd">Gentoo</a></li> <li><a href="https://wiki.archlinux.org/index.php/Systemd">Arch</a></li> </ul> <p>And, if you still have questions after all of this, <a href="http://lists.freedesktop.org/mailman/listinfo/systemd-devel">please join our mailing list</a>, or our IRC channel <tt>#systemd</tt> on <tt>irc.freenode.org</tt>. 
Alternatively, if you are looking for paid consulting services for systemd <a href="http://profusion.mobi/">contact our friends at ProFUSION</a>.</p> Lennart PoetteringTue, 24 May 2011 16:54:00 +0200tag:0pointer.net,2011-05-24:/blog/projects/systemd-docs.htmlprojectsDesktop Summit Programme Publishedhttps://0pointer.net/blog/projects/desktop-summit-schedule.html <p>The Paper Committee of the Desktop Summit 2011, in Berlin, Germany is happy to announce that the conference programme is now published.</p> <p><a href="https://www.desktopsummit.org/program">Go directly to the schedule.</a></p> <p><a href="https://www.desktopsummit.org/press/program-announcement">Read the full announcement.</a></p> <p>And yes, it is an absolutely rocking programme.</p> <p>See you in Berlin!</p> Lennart PoetteringFri, 20 May 2011 17:08:00 +0200tag:0pointer.net,2011-05-20:/blog/projects/desktop-summit-schedule.htmlprojectsPlumbers Conference 2011https://0pointer.net/blog/projects/lpc2011.html <p><a href="http://www.linuxplumbersconf.org/2011/">The Linux Plumbers Conference 2011 in Santa Rosa, CA, USA</a> is coming nearer (Sep. 7-9). Together with Kay Sievers I am running the Boot&amp;Init track, and together with Mark Brown the Audio track.</p> <p>For both tracks we still need proposals. So if you haven't submitted anything yet, please consider doing so. And that quickly. i.e. if you can arrange for it, last sunday would be best, since that was actually the final deadline. However, the submission form is still open, so if you submit something really, really quickly we'll ignore the absence of time travel and the calendar for a bit. So, go, submit something. Now.</p> <p>What are we looking for? Well, here's what I just posted on the <a href="https://tango.0pointer.de/pipermail/pulseaudio-discuss/2011-May/010191.html">audio related mailing lists</a>:</p> <pre> So, please consider submitting something if you haven't done so yet. 
We are looking for all kinds of technical talks covering everything audio plumbing related: audio drivers, audio APIs, sound servers, pro audio, consumer audio. If you can propose something audio related -- like talks on media controller routing, on audio for ASOC/Embedded, submit something! If you care for low-latency audio, submit something. If you care about the Linux audio stack in general, submit something. LPC is probably the most relevant technical conference on the general Linux platform, so be sure that if you want your project, your work, your ideas to be heard then this is the right forum for everything related to the Linux stack. And the Audio track covers everything in our Audio Stack, regardless whether it is pro or consumer audio. </pre> <p>And here's what I posted to the <a href="http://lists.freedesktop.org/archives/systemd-devel/2011-May/002428.html">init related lists</a>:</p> <pre>So, please consider submitting something if you haven't done so yet. We are looking for all kinds of technical talks covering everything from the BIOS (i.e. CoreBoot and friends), over boot loaders (i.e. GRUB and friends), to initramfs (i.e. Dracut and friends) and init systems (i.e. systemd and friends). If you have something smart to say about any of these areas or maybe about related tools (i.e. you wrote a fancy new tool to measure boot performance) or fancy boot schemes in your favourite Linux based OS (i.e. the new Meego zero second boot ;-)) then don't hesitate to submit something on the LPC web site, in the Boot&amp;Init track!</pre> <p>And now, quickly, go to <a href="http://www.linuxplumbersconf.org/2011/">the LPC website</a> and post your session proposal in the Audio or Boot&amp;Init track! Thank you!</p> Lennart PoetteringThu, 19 May 2011 23:30:00 +0200tag:0pointer.net,2011-05-19:/blog/projects/lpc2011.htmlprojectsThankshttps://0pointer.net/blog/projects/thanks.html <p>As some of you might know Fedora 15 went Gold a couple of days ago. 
The first big distribution based on systemd will be released 2011-05-24. Mark the date!</p> <p>In little over a year systemd went from nowhere to become a core piece of Fedora. This wouldn't have been possible without the numerous folks who worked with us on getting systemd right, supplied patches, chased bugs, tested releases, posted comments and generally made sure everything was in shape for the big release.</p> <p>At this point we'd like to thank everybody who contributed and a few folks in particular:</p> <p><i>A. Costa Adrian Spinu Alexey Shabalin Andreas Jaeger Andrew Edmunds Andrey Borzenkov Bill Nottingham Brandon Philips Brendan Jones Brett Witherspoon Chris E Ferron Christian Ruppert Conrad Meyer Daniel J Walsh Dave Reisner Eric Paris Fabian Henze Fabiano Fid&ecirc;ncio Florian Kriener Franz Dietrich Greg Kroah-Hartman Gustavo Sverzut Barbieri Harald Hoyer James Laska Jan Engelhardt Jeff Mahoney Jesse Zhang J&oacute;hann B. Gu&eth;mundsson Karel Zak Koen Kooi Lucas De Marchi Ludwig Nussel Luis Felipe Strano Moraes Maarten Lankhorst Malcolm Studd Marc-Antoine Perennou Martin Mikkelsen Matthew Miller Matthias Clasen Matthias Schiffer Michael Biebl Michael Olbrich Michael Tremer Micha&#322; Piotrowski Michal Schmidt Mike Kazantsev Mike Kelly Miklos Vajna Milan Broz Ozan &Ccedil;a&#287;layan Paul Menzel Pavol Rusnak Rahul Sundaram Rainer Gerhards Ran Benita Ray Strode Robert Gerus Sedat Dilek Tero Roponen Thierry Reding Tollef Fog Heen Tomasz Torcz Tom Callaway Tom Gundersen Toshio Kuratomi William Jon McCann Wulf C. 
Krueger Zbigniew J&#281;drzejewski-Szmek</i></p> <p>And everybody else who I (or git shortlog) forgot.</p> <p>Thank you!</p> <p>Lennart and Kay</p> <p>BTW, the <a href="http://www.freedesktop.org/wiki/Software/systemd/InterfaceStabilityPromise">interface stability promise</a> is valid now.</p> Lennart PoetteringThu, 19 May 2011 14:01:00 +0200tag:0pointer.net,2011-05-19:/blog/projects/thanks.htmlprojectssystemd for Developers Ihttps://0pointer.net/blog/projects/socket-activation.html <p><a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> not only brings improvements for administrators and users, it also brings a (small) number of new APIs with it. In this blog story (which might become the first of a series) I hope to shed some light on one of the most important new APIs in systemd:</p> <h4>Socket Activation</h4> <p>In the <a href="http://0pointer.de/blog/projects/systemd.html">original blog story about systemd</a> I tried to explain why socket activation is a wonderful technology to spawn services. Let's reiterate the background here a bit.</p> <p>The basic idea of socket activation is not new. The inetd superserver was a standard component of most Linux and Unix systems since time began: instead of spawning all local Internet services already at boot, the superserver would listen on behalf of the services and whenever a connection would come in an instance of the respective service would be spawned. This allowed relatively weak machines with few resources to offer a big variety of services at the same time. 
However it quickly got a reputation for being somewhat slow: since daemons would be spawned for each incoming connection a lot of time was spent on forking and initialization of the services -- once for each connection, instead of once for them all.</p> <p>Spawning one instance per connection was how inetd was primarily used, even though inetd actually understood another mode: on the first incoming connection it would notice this via <tt>poll()</tt> (or <tt>select()</tt>) and spawn a single instance for all future connections. (This was controllable with the <tt>wait</tt>/<tt>nowait</tt> options.) That way the first connection would be slow to set up, but subsequent ones would be as fast as with a standalone service. In this mode inetd would work in a true on-demand mode: a service would be made available lazily when it was required.</p> <p>inetd's focus was clearly on AF_INET (i.e. Internet) sockets. As time progressed and Linux/Unix left the server niche and became increasingly relevant on desktops, mobile and embedded environments inetd was somehow lost in the troubles of time. Its reputation for being slow, and the fact that Linux' focus shifted away from only Internet servers made a Linux machine running inetd (or one of its newer implementations, like xinetd) the exception, not the rule.</p> <p>When Apple engineers worked on optimizing the MacOS boot time they found a new way to make use of the idea of socket activation: they shifted the focus away from AF_INET sockets towards AF_UNIX sockets. And they noticed that on-demand socket activation was only part of the story: much more powerful is socket activation when used for <i>all</i> local services including those which need to be started anyway on boot. 
They implemented these ideas in <a href="http://launchd.macosforge.org/">launchd</a>, a central building block of modern MacOS X systems, and probably the main reason why MacOS boots up so fast.</p> <p>But, before we continue, let's have a closer look at what the benefits of socket activation are in detail for non-on-demand, non-Internet services. Consider the four services Syslog, D-Bus, Avahi and the Bluetooth daemon. D-Bus logs to Syslog, hence on traditional Linux systems it would get started after Syslog. Similarly, Avahi requires Syslog and D-Bus, hence would get started after both. Finally Bluetooth is similar to Avahi and also requires Syslog and D-Bus but does not interface at all with Avahi. Since on a traditional SysV-based system only one service can be in the process of getting started at a time, the following serialization of startup would take place: Syslog &#x2192; D-Bus &#x2192; Avahi &#x2192; Bluetooth. (Of course, Avahi and Bluetooth could be started in the opposite order too, but we have to pick one here, so let's simply go alphabetically.) To illustrate this, here's a plot showing the order of startup beginning with system startup (at the top).</p> <a href="http://0pointer.de/public/parallelization.png"><img src="http://0pointer.de/public/parallelization-small.png" width="400" height="257" alt="Parallelization plot" /></a> <p>Certain distributions tried to improve this strictly serialized start-up: since Avahi and Bluetooth are independent from each other, they can be started simultaneously. The parallelization is increased, the overall startup time slightly smaller. (This is visualized in the middle part of the plot.)</p> <p>Socket activation makes it possible to start all four services completely simultaneously, without any kind of ordering. Since the creation of the listening sockets is moved outside of the daemons themselves we can start them all at the same time, and they are able to connect to each other's sockets right away. I.e. 
in a single step the <tt>/dev/log</tt> and <tt>/run/dbus/system_bus_socket</tt> sockets are created, and in the next step all four services are spawned simultaneously. When D-Bus then wants to log to syslog, it just writes its messages to <tt>/dev/log</tt>. As long as the socket buffer does not run full it can go on immediately with what else it wants to do for initialization. As soon as the syslog service catches up it will process the queued messages. And if the socket buffer runs full then the client logging will temporarily block until the socket is writable again, and continue the moment it can write its log messages. That means the scheduling of our services is entirely done by the kernel: from the userspace perspective all services are run at the same time, and when one service cannot keep up the others needing it will temporarily block on their request but go on as soon as these requests are dispatched. All of this is completely automatic and invisible to userspace. Socket activation hence allows us to drastically parallelize start-up, enabling simultaneous start-up of services which previously were thought to strictly require serialization. Most Linux services use sockets as communication channel. Socket activation allows starting of clients and servers of these channels at the same time.</p> <p>But it's not just about parallelization. It offers a number of other benefits:</p> <ul> <li>We no longer need to configure dependencies explicitly. Since the sockets are initialized before all services they are simply available, and no userspace ordering of service start-up needs to take place anymore. Socket activation hence drastically simplifies configuration and development of services.</li> <li>If a service dies its listening socket stays around, not losing a single message. 
After a restart of the crashed service it can continue right where it left off.</li> <li>If a service is upgraded we can restart the service while keeping around its sockets, thus ensuring the service is continuously responsive. Not a single connection is lost during the upgrade.</li> <li>We can even replace a service during runtime in a way that is invisible to the client. For example, all systems running systemd start up with a tiny syslog daemon at boot which passes all log messages written to <tt>/dev/log</tt> on to the kernel message buffer. That way we provide reliable userspace logging starting from the first instant of boot-up. Then, when the actual rsyslog daemon is ready to start we terminate the mini daemon and replace it with the real daemon. And all that while keeping around the original logging socket and sharing it between the two daemons and not losing a single message. Since rsyslog flushes the kernel log buffer to disk after start-up all log messages from the kernel, from early-boot and from runtime end up on disk.</li> </ul> <p>For another explanation of this idea consult <a href="http://0pointer.de/blog/projects/systemd.html">the original blog story about systemd</a>.</p> <p>Socket activation has been available in systemd since its inception. On Fedora 15 a number of services have been modified to implement socket activation, including Avahi, D-Bus and rsyslog (to continue with the example above).</p> <p>systemd's socket activation is quite comprehensive. Not only classic sockets are supported, but related technologies as well:</p> <ul> <li>AF_UNIX sockets, in the flavours SOCK_DGRAM, SOCK_STREAM and SOCK_SEQPACKET; both in the filesystem and in the abstract namespace</li> <li>AF_INET sockets, i.e. TCP/IP and UDP/IP; both IPv4 and IPv6</li> <li>Unix named pipes/FIFOs in the filesystem</li> <li>AF_NETLINK sockets, to subscribe to certain kernel features. 
This is currently used by udev, but could be useful for other netlink-related services too, such as audit.</li> <li>Certain special files like <tt>/proc/kmsg</tt> or device nodes like <tt>/dev/input/*</tt>.</li> <li>POSIX Message Queues</li> </ul> <p>A service capable of socket activation must be able to receive its preinitialized sockets from systemd, instead of creating them internally. For most services this requires (minimal) patching. However, since systemd actually provides inetd compatibility a service working with inetd will also work with systemd -- which is quite useful for services like sshd for example.</p> <p>So much about the background of socket activation, let's now have a look at how to patch a service to make it socket activatable. Let's start with a theoretical service <tt>foobard</tt>. (In a later blog post we'll focus on a real-life example.)</p> <p>Our little (theoretical) service includes code like the following for creating sockets (most services include code like this in one way or another):</p>
<pre>
<b>/* Source Code Example #1: ORIGINAL, NOT SOCKET-ACTIVATABLE SERVICE */</b>
...
union {
        struct sockaddr sa;
        struct sockaddr_un un;
} sa;
int fd;

fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (fd &lt; 0) {
        fprintf(stderr, "socket(): %m\n");
        exit(1);
}

memset(&amp;sa, 0, sizeof(sa));
sa.un.sun_family = AF_UNIX;
strncpy(sa.un.sun_path, "/run/foobar.sk", sizeof(sa.un.sun_path));

if (bind(fd, &amp;sa.sa, sizeof(sa)) &lt; 0) {
        fprintf(stderr, "bind(): %m\n");
        exit(1);
}

if (listen(fd, SOMAXCONN) &lt; 0) {
        fprintf(stderr, "listen(): %m\n");
        exit(1);
}
...
</pre>
<p>A socket activatable service may use the following code instead:</p>
<pre>
<b>/* Source Code Example #2: UPDATED, SOCKET-ACTIVATABLE SERVICE */</b>
...
#include "sd-daemon.h"
...
int fd;

if (sd_listen_fds(0) != 1) {
        fprintf(stderr, "No or too many file descriptors received.\n");
        exit(1);
}

fd = SD_LISTEN_FDS_START + 0;
...
</pre>
<p>systemd might pass you more than one socket (based on configuration, see below). In this example we are interested in one only. <a href="http://0pointer.de/public/systemd-man/sd_listen_fds.html">sd_listen_fds()</a> returns how many file descriptors are passed. We simply compare that with 1, and fail if we got more or less. The file descriptors systemd passes to us are inherited one after the other beginning with fd #3. (SD_LISTEN_FDS_START is a macro defined to 3.) Our code hence just takes possession of fd #3.</p> <p>As you can see this code is actually much shorter than the original. This of course comes at the price that our little service with this change will no longer work in a non-socket-activation environment. With minimal changes we can adapt our example to work nicely both with and without socket activation:</p>
<pre>
<b>/* Source Code Example #3: UPDATED, SOCKET-ACTIVATABLE SERVICE WITH COMPATIBILITY */</b>
...
#include "sd-daemon.h"
...
int fd, n;

n = sd_listen_fds(0);
if (n &gt; 1) {
        fprintf(stderr, "Too many file descriptors received.\n");
        exit(1);
} else if (n == 1)
        fd = SD_LISTEN_FDS_START + 0;
else {
        union {
                struct sockaddr sa;
                struct sockaddr_un un;
        } sa;

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd &lt; 0) {
                fprintf(stderr, "socket(): %m\n");
                exit(1);
        }

        memset(&amp;sa, 0, sizeof(sa));
        sa.un.sun_family = AF_UNIX;
        strncpy(sa.un.sun_path, "/run/foobar.sk", sizeof(sa.un.sun_path));

        if (bind(fd, &amp;sa.sa, sizeof(sa)) &lt; 0) {
                fprintf(stderr, "bind(): %m\n");
                exit(1);
        }

        if (listen(fd, SOMAXCONN) &lt; 0) {
                fprintf(stderr, "listen(): %m\n");
                exit(1);
        }
}
...
</pre>
<p>With this simple change our service can now make use of socket activation but still works unmodified in classic environments. Now, let's see how we can enable this service in systemd. For this we have to write two systemd unit files: one describing the socket, the other describing the service. 
First, here's <tt>foobar.socket</tt>:</p> <pre>
[Socket]
ListenStream=/run/foobar.sk

[Install]
WantedBy=sockets.target
</pre> <p>And here's the matching service file <tt>foobar.service</tt>:</p> <pre>
[Service]
ExecStart=/usr/bin/foobard
</pre> <p>If we place these two files in <tt>/etc/systemd/system</tt> we can enable and start them:</p> <pre># systemctl enable foobar.socket
# systemctl start foobar.socket</pre> <p>Now our little socket is listening, but our service is not running yet. If we now connect to <tt>/run/foobar.sk</tt> the service will be spawned automatically, giving us on-demand service start-up. With a modification of <tt>foobar.service</tt> we can start our service at boot already, thus using socket activation only for parallelization purposes and no longer for on-demand auto-spawning:</p> <pre>
[Service]
ExecStart=/usr/bin/foobard

<b>[Install]
WantedBy=multi-user.target</b>
</pre> <p>And now let's enable this too:</p> <pre># systemctl enable foobar.service
# systemctl start foobar.service</pre> <p>Now our little daemon will be started at boot or on demand, whichever comes first. It can be started fully in parallel with its clients, and when it dies it will be automatically restarted the next time it is used.</p> <p>A single .socket file can include multiple ListenXXX stanzas, which is useful for services that listen on more than one socket. In this case all configured sockets will be passed to the service in the exact order they are configured in the socket unit file. <a href="http://0pointer.de/public/systemd-man/systemd.socket.html">Also, you may configure various socket settings in the .socket files.</a></p> <p>In real life it's a good idea to include description strings in these unit files; to keep things simple we'll leave them out of our example. Speaking of real life: our next installment will cover an actual real-life example.
We'll add socket activation to the CUPS printing server.</p> <p>The <tt>sd_listen_fds()</tt> function call is defined in <a href="http://cgit.freedesktop.org/systemd/plain/src/sd-daemon.h">sd-daemon.h</a> and <a href="http://cgit.freedesktop.org/systemd/plain/src/sd-daemon.c">sd-daemon.c</a>. These two files are currently drop-in sources which projects should simply copy into their source tree. Eventually we plan to turn this into a proper shared library. However, using the drop-in files allows you to compile your project in a way that is compatible with socket activation even without any compile-time dependencies on systemd. <tt>sd-daemon.c</tt> is liberally licensed and should compile fine even on the most exotic Unixes, and the algorithms are trivial enough to be reimplemented with very little code if the license should nonetheless be a problem for your project. <tt>sd-daemon.c</tt> contains a couple of other API functions besides <tt>sd_listen_fds()</tt> that are useful when implementing socket activation in a project. For example, there's <tt><a href="http://0pointer.de/public/systemd-man/sd_is_fifo.html">sd_is_socket()</a></tt> which can be used to distinguish and identify particular sockets when a service gets passed more than one.</p> <p>Let me point out that the interfaces used here are in no way bound directly to systemd. They are generic enough to be implemented in other systems as well. We deliberately designed them to be as simple and minimal as possible so that others can adopt similar schemes.</p> <p>Stay tuned for the next installment. As mentioned, it will cover a real-life example of turning an existing daemon into a socket-activatable one: the CUPS printing service. However, I hope this blog story might already be enough to get you started if you plan to convert an existing service into a socket activatable one. We invite everybody to convert upstream projects to this scheme.
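</p> <p>The remark above that the algorithms in <tt>sd-daemon.c</tt> are trivial enough to be reimplemented can be made concrete with a minimal sketch. It assumes the environment-variable protocol documented for <tt>sd_listen_fds()</tt>: <tt>$LISTEN_PID</tt> names the process the sockets are meant for, and <tt>$LISTEN_FDS</tt> carries the number of descriptors, which start at fd #3. The names <tt>my_listen_fds()</tt> and <tt>MY_LISTEN_FDS_START</tt> are made up for this illustration and are not part of <tt>sd-daemon.c</tt>. Unlike the real function, the sketch neither sets <tt>FD_CLOEXEC</tt> on the descriptors nor unsets the variables afterwards.</p>

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical name; mirrors the value of SD_LISTEN_FDS_START (3). */
#define MY_LISTEN_FDS_START 3

/*
 * Hypothetical stand-in for sd_listen_fds(0), for illustration only.
 * Returns the number of passed file descriptors, 0 if none were passed
 * to this process, or -1 if the environment variables are malformed.
 */
int my_listen_fds(void) {
        const char *e;
        char *end = NULL;
        unsigned long v;

        /* The sockets are only meant for the process named in $LISTEN_PID. */
        e = getenv("LISTEN_PID");
        if (!e)
                return 0;

        errno = 0;
        v = strtoul(e, &end, 10);
        if (errno != 0 || end == e || *end != '\0')
                return -1;
        if ((pid_t) v != getpid())
                return 0;

        /* $LISTEN_FDS holds the number of descriptors, starting at fd #3. */
        e = getenv("LISTEN_FDS");
        if (!e)
                return 0;

        errno = 0;
        v = strtoul(e, &end, 10);
        if (errno != 0 || end == e || *end != '\0')
                return -1;
        if (v > (unsigned long) (INT_MAX - MY_LISTEN_FDS_START))
                return -1;

        return (int) v;
}
```

<p>A service could call <tt>my_listen_fds()</tt> where Example #3 calls <tt>sd_listen_fds(0)</tt>, treating a negative return value as a configuration error and using fd <tt>MY_LISTEN_FDS_START + 0</tt> when exactly one descriptor was passed.</p> <p>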
If you have any questions, join us on <tt>#systemd</tt> on freenode.</p> Lennart PoetteringWed, 18 May 2011 22:00:00 +0200tag:0pointer.net,2011-05-18:/blog/projects/socket-activation.htmlprojectsBê-á-bá do systemdhttps://0pointer.net/blog/projects/be-a-ba-do-systemd.html <p><a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/systemd_parte_1">Pablo Hess</a> has been posting a series of articles on <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> on IBM DeveloperWorks Brasil. <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/systemd_parte_1">So, if you</a> <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/systemd_parte_2">speak Portuguese</a> <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/systemd_parte_3">head over</a> <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/systemd_parte_4">there and</a> <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/systemd_parte_5">have a</a> <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/752a690f-8e93-4948-b7a3-c060117e8665/entry/systemd_parte_6">look</a>!</p> Lennart PoetteringTue, 17 May 2011 15:12:00 +0200tag:0pointer.net,2011-05-17:/blog/projects/be-a-ba-do-systemd.htmlprojectsPulseAudio Saves Powerhttps://0pointer.net/blog/projects/pa-and-power.html <p><a href="http://linux-tipps.blogspot.com/2011/04/power-performance-of-pulseaudio-alsa.html">D. Jansen has put up a blog story</a> including some power saving results when running <a href="http://pulseaudio.org/">PulseAudio</a> on modern HDA drivers.
This shows off some work Pierre-Louis Bossart from Intel did on the HDA drivers, which now allows the timer-based scheduling code I added to PulseAudio quite some time ago to reach its full potential. You can save half a Watt and reduce wakeups while playing audio to 1 wakeup/s.</p> <p>Previously there was little public profiling data available about the benefits PA brings you for low-power devices. Thanks to Dennis' data there's now public data available that hopefully explains why PA is the best choice for low-power devices as well as desktops. Hopefully this clears up some misconceptions.</p> <p>Pierre-Louis, thanks for your work!</p> <p><b>Update:</b> <a href="http://arunraghavan.net/2011/05/more-pulseaudio-power-goodness/">Arun Raghavan has posted a follow-up to this.</a></p> Lennart PoetteringMon, 16 May 2011 23:58:00 +0200tag:0pointer.net,2011-05-16:/blog/projects/pa-and-power.htmlprojectssystemd for Administrators as PDFhttps://0pointer.net/blog/projects/systemd-pdf.html <p><a href="http://psankar.blogspot.com/">Sankarasivasubramanian Pasupathilingam</a> <a href="http://0pointer.de/public/systemd-ebook-psankar.pdf">has put together a PDF</a> of my ongoing <a href="http://0pointer.de/blog/projects/the-new-configuration-files.html">systemd for Administrators series</a>. This might be handy for reading on an ebook reader or similar.</p> <p>Enjoy!</p> Lennart PoetteringTue, 10 May 2011 17:31:00 +0200tag:0pointer.net,2011-05-10:/blog/projects/systemd-pdf.htmlprojectsWhy systemd?https://0pointer.net/blog/projects/why.html <p><a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> is still a young project, but it is not a baby anymore. I posted the <a href="http://0pointer.de/blog/projects/systemd.html">initial announcement</a> precisely a year ago. Since then most of the big distributions have decided to adopt it in one way or another, and many smaller distributions have already switched.
The first big distribution with systemd by default will be Fedora 15, due at the end of May. It is expected that the others will follow the lead a bit later (<a href="http://www.ubuntu.com/">with one exception</a>). Many embedded developers have already adopted it too, and there's even a <a href="http://profusion.mobi/">company specializing in engineering and consulting services for systemd</a>. In short: within one year systemd became a really successful project.</p> <p>However, there are still folks whom we haven't won over yet. If you fall into one of the following categories, then please have a look at the comparison of init systems below:</p> <ul> <li>You are working on an embedded project and are wondering whether it should be based on systemd.</li> <li>You are a user or administrator and wondering which distribution to pick, and are pondering whether it should be based on systemd or not.</li> <li>You are a user or administrator and wondering why your favourite distribution has switched to systemd, if everything already worked so well before.</li> <li>You are developing a distribution that hasn't switched yet, and you are wondering whether to invest the work and go systemd.</li> </ul> <p>And even if you don't fall into any of these categories, you might still find the comparison interesting.</p> <p>We'll be comparing the three most relevant init systems for Linux: sysvinit, Upstart and systemd. Of course there are other init systems in existence, but they play virtually no role in the big picture. Unless you run Android (which is a completely different beast anyway), you'll almost definitely run one of these three init systems on your Linux kernel. (OK, or busybox, but then you are basically not running any init system at all.) Unless you have a soft spot for exotic init systems there's little need to look further.
Also, I am kinda lazy, and don't want to spend the time on analyzing those other systems in enough detail to be completely fair to them.</p> <p>Speaking of fairness: I am of course one of the creators of systemd. I will try my best to be fair to the other two contenders, but in the end, take it with a grain of salt. I am sure though that, should I be grossly unfair or otherwise incorrect, somebody will point it out in the comments of this story, so consider having a look at those before you put too much trust in what I say.</p> <p>We'll look at the currently implemented features in a released version. Grand plans don't count.</p> <h4>General Features</h4> <table border="1"> <tr> <th></th> <th>sysvinit</th> <th>Upstart</th> <th>systemd</th> </tr> <tr> <td>Interfacing via D-Bus</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Shell-free bootup</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Modular C coded early boot services included</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Read-Ahead</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no<sup>[1]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Socket-based Activation</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no<sup>[2]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Socket-based Activation: inetd compatibility</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no<sup>[2]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Bus-based Activation</td> <td style="background-color: #ff7f7f">no</td>
<td style="background-color: #ff7f7f">no<sup>[3]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Device-based Activation</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no<sup>[4]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Configuration of device dependencies with udev rules</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Path-based Activation (inotify)</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Timer-based Activation</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Mount handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no<sup>[5]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>fsck handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no<sup>[5]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Quota handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Automount handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Swap handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Snapshotting of system state</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> 
<td>XDG_RUNTIME_DIR Support</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Optionally kills remaining processes of users logging out</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Linux Control Groups Integration</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Audit record generation for started services</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>SELinux integration</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>PAM integration</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Encrypted hard disk handling (LUKS)</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>SSL Certificate/LUKS Password handling, including Plymouth, Console, wall(1), TTY and GNOME agents</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Network Loopback device handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>binfmt_misc handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>System-wide locale 
handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Console and keyboard setup</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Infrastructure for creating, removing, cleaning up of temporary and volatile files</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Handling for <tt>/proc/sys</tt> sysctl</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Plymouth integration</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Save/restore random seed</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Static loading of kernel modules</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Automatic serial console handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Unique Machine ID handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Dynamic host name and machine meta data handling</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Reliable termination of services</td> <td 
style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Early boot /dev/log logging</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Minimal kmsg-based syslog daemon for embedded use</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Respawning on service crash without losing connectivity</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Gapless service upgrades</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Graphical UI</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Built-In Profiling and Tools</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Instantiated services</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>PolicyKit integration</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Remote access/Cluster support built into client tools</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Can list all processes of a service</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: 
#ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Can identify service of a process</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Automatic per-service CPU cgroups to even out CPU usage between them</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Automatic per-user cgroups</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>SysV compatibility</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>SysV services controllable like native services</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>SysV-compatible /dev/initctl</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Reexecution with full serialization of state</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Interactive boot-up</td> <td style="background-color: #ff7f7f">no<sup>[6]</sup></td> <td style="background-color: #ff7f7f">no<sup>[6]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Container support (as advanced chroot() replacement)</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Dependency-based bootup</td> <td style="background-color: #ff7f7f">no<sup>[7]</sup></td> <td 
style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Disabling of services without editing files</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Masking of services without editing files</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Robust system shutdown within PID 1</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Built-in kexec support</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Dynamic service generation</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Upstream support in various other OS components</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Service files compatible between distributions</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Signal delivery to services</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Reliable termination of user sessions before shutdown</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>utmp/wtmp support</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: 
#7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Easily writable, extensible and parseable service files, suitable for manipulation with enterprise management tools</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> </table> <p><small><sup>[1]</sup> Read-Ahead implementation for Upstart available in separate package ureadahead, requires non-standard kernel patch.</small></p> <p><small><sup>[2]</sup> Socket activation implementation for Upstart available as preview, lacks parallelization support hence entirely misses the point of socket activation.</small></p> <p><small><sup>[3]</sup> Bus activation implementation for Upstart posted as patch, not merged.</small></p> <p><small><sup>[4]</sup> udev device event bridge implementation for Upstart available as preview, forwards entire udev database into Upstart, not practical.</small></p> <p><small><sup>[5]</sup> Mount handling utility mountall for Upstart available in separate package, covers only boot-time mounts, very limited dependency system.</small></p> <p><small><sup>[6]</sup> Some distributions offer this implemented in shell.</small></p> <p><small><sup>[7]</sup> LSB init scripts support this, if they are used.</small></p> <h4>Available Native Service Settings</h4> <table border="1"> <tr> <th></th> <th>sysvinit</th> <th>Upstart</th> <th>systemd</th> </tr> <tr> <td>OOM Adjustment</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes<sup>[1]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Working Directory</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Root Directory (chroot())</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: 
#7fff7f">yes</td> </tr> <tr> <td>Environment Variables</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Environment Variables from external file</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Resource Limits</td> <td style="background-color: #ff7f7f">no</td> <td>some<sup>[2]</sup></td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>umask</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>User/Group/Supplementary Groups</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>IO Scheduling Class/Priority</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>CPU Scheduling Nice Value</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>CPU Scheduling Policy/Priority</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>CPU Scheduling Reset on fork() control</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>CPU affinity</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Timer Slack</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td 
style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Capabilities Control</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Secure Bits Control</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Control Group Control</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>High-level file system namespace control: making directories inaccessible</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>High-level file system namespace control: making directories read-only</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>High-level file system namespace control: private /tmp</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>High-level file system namespace control: mount inheritance</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Input on Console</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Output on Syslog</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Output on kmsg/dmesg</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color:
#7fff7f">yes</td> </tr> <tr> <td>Output on arbitrary TTY</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Kill signal control</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Conditional execution: by identified CPU virtualization/container</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Conditional execution: by file existence</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Conditional execution: by security framework</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>Conditional execution: by kernel command line</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> </table> <p><small><sup>[1]</sup> Upstart supports only the deprecated oom_adj mechanism, not the current oom_score_adj logic.</small></p> <p><small><sup>[2]</sup> Upstart lacks support for RLIMIT_RTTIME and RLIMIT_RTPRIO.</small></p> <p>Note that some of these options are relatively easily added to SysV init scripts, by editing the shell sources.
The table above focusses on easily accessible options that do not require source code editing.</p> <h4>Miscellaneous</h4> <table border="1"> <tr> <th></th> <th>sysvinit</th> <th>Upstart</th> <th>systemd</th> </tr> <tr> <td>Maturity</td> <td style="background-color: #7fff7f">&gt; 15 years</td> <td style="background-color: #7fff7f">6 years</td> <td style="background-color: #ff7f7f">1 year</td> </tr> <tr> <td>Specialized professional consulting and engineering services available</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> <tr> <td>SCM</td> <td style="background-color: #ff7f7f">Subversion</td> <td style="background-color: #ff7f7f">Bazaar</td> <td style="background-color: #7fff7f">git</td> </tr> <tr> <td>Copyright-assignment-free contributing</td> <td style="background-color: #7fff7f">yes</td> <td style="background-color: #ff7f7f">no</td> <td style="background-color: #7fff7f">yes</td> </tr> </table> <h4>Summary</h4> <p>As the tables above hopefully show in all clarity systemd has left behind both sysvinit and Upstart in almost every aspect. With the exception of the project's age/maturity systemd wins in every category. At this point in time it will be very hard for sysvinit and Upstart to catch up with the features systemd provides today. In one year we managed to push systemd forward <i>much</i> further than Upstart has been pushed in six.</p> <p>It is our intention to drive forward the development of the Linux platform with systemd. In the next release cycle we will focus more strongly on providing the same features and speed improvement we already offer for the system to the user login session. This will bring much closer integration with the other parts of the OS and applications, making the most of the features the service manager provides, and making it available to login sessions. 
Certain components such as ConsoleKit will be made redundant by these upgrades, and services relying on them will be updated. The burden of maintaining these then-obsolete components will be passed on to the vendors who plan to continue to rely on them.</p> <p>If you are wondering whether or not to adopt systemd, then systemd obviously wins when it comes to mere features. Of course that should not be the only aspect to keep in mind. In the long run, sticking with the existing infrastructure (such as ConsoleKit) comes at a price: porting work needs to take place, and additional maintenance work for bitrotting code needs to be done. Going it on your own means increased workload.</p> <p>That said, adopting systemd is also not free. Especially if you made investments in the other two solutions, adopting systemd means work. The basic work to adopt systemd is relatively minimal for porting over SysV systems (since compatibility is provided), but can mean substantial work when coming from Upstart. If you plan to go for a 100% systemd system without any SysV compatibility (recommended for embedded, long run goal for the big distributions) you need to be willing to invest some work to rewrite init scripts as simple systemd unit files.</p> <p>systemd is in the process of becoming a comprehensive, integrated and modular platform providing everything needed to bootstrap and maintain an operating system's userspace. It includes C rewrites of all basic early boot init scripts that are shipped with the various distributions. Especially for the embedded case, adopting systemd provides you in one step with almost everything you need, and you can pick the modules you want. The other two init systems are singular individual components, which to be useful need a great number of additional components with differing interfaces. The emphasis of systemd on providing a platform instead of just a component allows for closer integration and cleaner APIs. 
Sooner or later this will trickle up to the applications. Already, there are accepted XDG specifications (e.g. XDG basedir spec, more specifically XDG_RUNTIME_DIR) that are not supported on the other init systems.</p> <p>systemd is also a big opportunity for Linux standardization. Since it standardizes many interfaces of the system that previously have been differing on every distribution, on every implementation, adopting it helps to work against the balkanization of the Linux interfaces. Choosing systemd means redefining more closely what the Linux platform is about. This improves the lives of programmers, users and administrators alike.</p> <p>I believe that momentum is clearly with systemd. We invite you to join our community and be part of that momentum.</p> Lennart PoetteringThu, 28 Apr 2011 23:16:00 +0200tag:0pointer.net,2011-04-28:/blog/projects/why.htmlprojectssystemd for Administrators, Part VIIIhttps://0pointer.net/blog/projects/the-new-configuration-files.html <p>Another episode of <a href="http://0pointer.de/blog/projects/blame-game.html">my</a> <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>The New Configuration Files</h4> <p>One of the formidable new features of <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> is that it comes with a complete set of modular early-boot services that are written in simple, fast, parallelizable and robust C, replacing the shell "novels" the various distributions featured before. Our little <i>Project Zero Shell</i><sup>[1]</sup> has been a full success. 
We currently cover pretty much everything most desktop and embedded distributions should need, plus a big part of the server needs:</p> <ul> <li>Checking and mounting of all file systems</li> <li>Updating and enabling quota on all file systems</li> <li>Setting the host name</li> <li>Configuring the loopback network device</li> <li>Loading the SELinux policy and relabelling <tt>/run</tt> and <tt>/dev</tt> as necessary on boot</li> <li>Registering additional binary formats in the kernel, such as Java, Mono and WINE binaries</li> <li>Setting the system locale</li> <li>Setting up the console font and keyboard map</li> <li>Creating, removing and cleaning up of temporary and volatile files and directories</li> <li>Applying mount options from <tt>/etc/fstab</tt> to pre-mounted API VFS</li> <li>Applying sysctl kernel settings</li> <li>Collecting and replaying readahead information</li> <li>Updating <tt>utmp</tt> boot and shutdown records</li> <li>Loading and saving the random seed</li> <li>Statically loading specific kernel modules</li> <li>Setting up encrypted hard disks and partitions</li> <li>Spawning automatic gettys on serial kernel consoles</li> <li>Maintenance of Plymouth</li> <li>Machine ID maintenance</li> <li>Setting of the UTC distance for the system clock</li> </ul> <p>On a standard Fedora 15 install, only a few legacy and storage services still require shell scripts during early boot. If you don't need those, you can easily disable them and enjoy your shell-free boot (like I do every day). The shell-less boot systemd offers you is a unique feature on Linux.</p> <p>Many of these small components are configured via configuration files in <tt>/etc</tt>. Some of these are fairly standardized among distributions and hence supporting them in the C implementations was easy and obvious. Examples include: <tt>/etc/fstab</tt>, <tt>/etc/crypttab</tt> or <tt>/etc/sysctl.conf</tt>. 
However, for others no standardized file or directory existed, which forced us to add <tt>#ifdef</tt> orgies to our sources to deal with the different places where the distributions we want to support store these things. All these configuration files have in common that they are dead-simple and there is simply no good reason for distributions to distinguish themselves with them: they all do the very same thing, just a bit differently.</p> <p>To improve the situation and benefit from the unifying force that systemd is, we thus decided to read the per-distribution configuration files only as <i>fallbacks</i> -- and to introduce new configuration files as primary source of configuration wherever applicable. Of course, where possible these standardized configuration files should not be new inventions but rather just standardizations of the best distribution-specific configuration files previously used. Here's a little overview of these new common configuration files systemd supports on all distributions:</p> <ul> <li><tt><a href="http://0pointer.de/public/systemd-man/hostname.html">/etc/hostname</a></tt>: the host name for the system. One of the most basic and trivial system settings. Nonetheless previously all distributions used different files for this. Fedora used <tt>/etc/sysconfig/network</tt>, OpenSUSE <tt>/etc/HOSTNAME</tt>. 
We chose to standardize on the Debian configuration file <tt>/etc/hostname</tt>.</li> <li><tt><a href="http://0pointer.de/public/systemd-man/vconsole.conf.html">/etc/vconsole.conf</a></tt>: configuration of the default keyboard mapping and console font.</li> <li><tt><a href="http://0pointer.de/public/systemd-man/locale.conf.html">/etc/locale.conf</a></tt>: configuration of the system-wide locale.</li> <li><tt><a href="http://0pointer.de/public/systemd-man/modules-load.d.html">/etc/modules-load.d/*.conf</a></tt>: a drop-in directory for kernel modules to statically load at boot (for the very few that still need this).</li> <li><tt><a href="http://0pointer.de/public/systemd-man/sysctl.d.html">/etc/sysctl.d/*.conf</a></tt>: a drop-in directory for kernel sysctl parameters, extending what you can already do with <tt>/etc/sysctl.conf</tt>.</li> <li><tt><a href="http://0pointer.de/public/systemd-man/tmpfiles.d.html">/etc/tmpfiles.d/*.conf</a></tt>: a drop-in directory for configuration of runtime files that need to be removed/created/cleaned up at boot and during uptime.</li> <li><tt><a href="http://0pointer.de/public/systemd-man/binfmt.d.html">/etc/binfmt.d/*.conf</a></tt>: a drop-in directory for registration of additional binary formats for systems like Java, Mono and WINE.</li> <li><tt><a href="http://0pointer.de/public/systemd-man/os-release.html">/etc/os-release</a></tt>: a standardization of the various distribution ID files like <tt>/etc/fedora-release</tt> and similar. Really every distribution introduced their own file here; writing a simple tool that just prints out the name of the local distribution usually means including a database of release files to check. The LSB tried to standardize something like this with the <a href="http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/lsbrelease.html">lsb_release</a> tool, but quite frankly the idea of employing a shell script in this is not the best choice the LSB folks ever made. 
To rectify this we just decided to generalize this, so that everybody can use the same file here.</li> <li><tt><a href="http://0pointer.de/public/systemd-man/machine-id.html">/etc/machine-id</a></tt>: a machine ID file, superseding D-Bus' machine ID file. This file is guaranteed to be existing and valid on a systemd system, covering also stateless boots. By moving this out of the D-Bus logic it is hopefully interesting for a lot of additional uses as a unique and stable machine identifier.</li> <li><tt><a href="http://0pointer.de/public/systemd-man/machine-info.html">/etc/machine-info</a></tt>: a new information file encoding meta data about a host, like a pretty host name and an icon name, replacing stuff like <tt>/etc/favicon.png</tt> and suchlike. This is maintained by <a href="http://www.freedesktop.org/wiki/Software/systemd/hostnamed">systemd-hostnamed</a>.</li> </ul> <p>It is our definite intention to convince <i>you</i> to use these new configuration files in your configuration tools: if your configuration frontend writes these files instead of the old ones, it automatically becomes more portable between Linux distributions, and you are helping standardizing Linux. This makes things simpler to understand and more obvious for users and administrators. Of course, right now, only systemd-based distributions read these files, but that already covers all important distributions in one way or another, <a href="http://www.ubuntu.com/">except for one</a>. And it's a bit of a chicken-and-egg problem: a standard becomes a standard by being used. In order to gently push everybody to standardize on these files we also want to make clear that sooner or later we plan to drop the fallback support for the old configuration files from systemd. That means adoption of this new scheme can happen slowly and piece by piece. 
But the final goal of only having one set of configuration files must be clear.</p> <p>Many of these configuration files are relevant not only for configuration tools but also (and sometimes even primarily) in upstream projects. For example, we invite projects like Mono, Java, or WINE to install a drop-in file in <tt>/etc/binfmt.d/</tt> from their upstream build systems. Per-distribution downstream support for binary formats would then no longer be necessary and your platform would work the same on all distributions. Something similar applies to all software which need creation/cleaning of certain runtime files and directories at boot, for example beneath the <tt>/run</tt> hierarchy (i.e. <tt>/var/run</tt> as <a href="http://lwn.net/Articles/436012/">it used to be known</a>). These projects should just drop in configuration files in <tt>/etc/tmpfiles.d</tt>, also from the upstream build systems. This also helps speeding up the boot process, as separate per-project SysV shell scripts which implement trivial things like registering a binary format or removing/creating temporary/volatile files at boot are no longer necessary. Or another example, where upstream support would be fantastic: projects like X11 could probably benefit from reading the default keyboard mapping for its displays from <tt>/etc/vconsole.conf</tt>.</p> <p>Of course, I have no doubt that not everybody is happy with our choice of names (and formats) for these configuration files. In the end we had to pick something, and from all the choices these appeared to be the most convincing. The file formats are as simple as they can be, and usually easily written and read even from shell scripts. That said, <tt>/etc/bikeshed.conf</tt> could of course also have been a fantastic configuration file name!</p> <p><b>So, help us standardizing Linux! Use the new configuration files! 
Adopt them upstream, adopt them downstream, adopt them all across the distributions!</b></p> <p>Oh, and in case you are wondering: yes, all of these files were discussed in one way or another with various folks from the various distributions. And there has even been some push towards supporting some of these files even outside of systemd systems.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Our slogan: "<i>The only shell that should get started during boot is gnome-shell!</i>" -- Yes, the slogan needs a bit of work, but you get the idea.</small></p> Lennart PoetteringWed, 20 Apr 2011 22:57:00 +0200tag:0pointer.net,2011-04-20:/blog/projects/the-new-configuration-files.htmlprojectssystemd for Administrators, Part VIIhttps://0pointer.net/blog/projects/blame-game.html <p>Here's yet another installment of my <a href="http://0pointer.de/blog/projects/changing-roots">ongoing</a> <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">on </a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">Administrators</a>:</p> <h4>The Blame Game</h4> <p>Fedora 15<sup>[1]</sup> is the first Fedora release to sport <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>. Our primary goal for F15 was to get everything integrated and working well. One focus for Fedora 16 will be to further polish and speed up what we have in the distribution now. To prepare for this cycle we have implemented a few tools (which are already available in F15), which can help us pinpoint where exactly the biggest problems in our boot-up remain. With this blog story I hope to shed some light on how to figure out what to blame for your slow boot-up, and what to do about it. 
We want to allow you to put the blame where the blame belongs: on the system component responsible.</p> <p>The first utility is a very simple one: systemd will automatically write a log message with the time it needed to syslog/kmsg when it finished booting up.</p> <pre>systemd[1]: Startup finished in 2s 65ms 924us (kernel) + 2s 828ms 195us (initrd) + 11s 900ms 471us (userspace) = 16s 794ms 590us.</pre> <p>And here's how you read this: 2s have been spent for kernel initialization, until the time where the initial RAM disk (initrd, i.e. dracut) was started. A bit less than 3s have then been spent in the initrd. Finally, a bit less than 12s have been spent after the actual system init daemon (systemd) has been invoked by the initrd to bring up userspace. Summing this up the time that passed since the boot loader jumped into the kernel code until systemd was finished doing everything it needed to do at boot was a bit less than 17s. This number is nice and simple to understand -- and also easy to misunderstand: it does not include the time that is spent initializing your GNOME session, as that is outside of the scope of the init system. Also, in many cases this is just where systemd finished doing everything it needed to do. Very likely some daemons are still busy doing whatever <i>they</i> need to do to finish startup when this time is elapsed. Hence: while the time logged here is a good indication on the general boot speed, it is not the time the user might <i>feel</i> the boot actually takes.</p> <p>Also, it is a pretty superficial value: it gives no insight which system component systemd was waiting for all the time. 
To break this up, we introduced the tool <tt>systemd-analyze blame</tt>:</p> <pre>$ systemd-analyze blame
  6207ms udev-settle.service
  5228ms cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service
   735ms NetworkManager.service
   642ms avahi-daemon.service
   600ms abrtd.service
   517ms rtkit-daemon.service
   478ms fedora-storage-init.service
   396ms dbus.service
   390ms rpcidmapd.service
   346ms systemd-tmpfiles-setup.service
   322ms fedora-sysinit-unhack.service
   316ms cups.service
   310ms console-kit-log-system-start.service
   309ms libvirtd.service
   303ms rpcbind.service
   298ms ksmtuned.service
   288ms lvm2-monitor.service
   281ms rpcgssd.service
   277ms sshd.service
   276ms livesys.service
   267ms iscsid.service
   236ms mdmonitor.service
   234ms nfslock.service
   223ms ksm.service
   218ms mcelog.service
...</pre> <p>This tool lists which systemd unit needed how much time to finish initialization at boot, the worst offenders listed first. What we can see here is that on this boot two services required more than 1s of boot time: <tt>udev-settle.service</tt> and <tt>cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service</tt>. This tool's output is easily misunderstood as well: it does not shed any light on why the services in question actually need this much time, it just determines that they did. Also note that the times listed here might be spent "in parallel", i.e. two services might be initializing at the same time and thus the time spent to initialize them both is much less than the sum of both individual times combined.</p> <p>Let's have a closer look at the worst offender on this boot: a service by the name of <tt>udev-settle.service</tt>. So why does it take that much time to initialize, and what can we do about it? This service actually does very little: it just waits for the device probing being done by udev to finish and then exits. Device probing can be slow. 
In this instance for example, the reason for the device probing to take more than 6s is the 3G modem built into the machine, which when not having an inserted SIM card takes this long to respond to software probe requests. The software probing is part of the logic that makes ModemManager work and enables NetworkManager to offer easy 3G setup. An obvious reflex might now be to blame ModemManager for having such a slow prober. But that's actually ill-directed: hardware probing quite frequently is this slow, and in the case of ModemManager it's a simple fact that the 3G hardware takes this long. It is an essential requirement for a proper hardware probing solution that individual probers can take this much time to finish probing. The actual culprit is something else: the fact that we actually wait for the probing, in other words: that <tt>udev-settle.service</tt> is part of our boot process.</p> <p>So, why is <tt>udev-settle.service</tt> part of our boot process? Well, it actually doesn't need to be. It is pulled in by the storage setup logic of Fedora: to be precise, by the LVM, RAID and Multipath setup script. These storage services have not been implemented in the way hardware detection and probing work today: they expect to be initialized at a point in time where "all devices have been probed", so that they can simply iterate through the list of available disks and do their work on it. However, on modern machinery this is not how things actually work: hardware can come and hardware can go all the time, during boot and during runtime. For some technologies it is not even possible to know when the device enumeration is complete (example: USB, or iSCSI), thus waiting for all storage devices to show up and be probed must necessarily include a fixed delay when it is assumed that all devices that can show up have shown up, and got probed. 
In this case all this shows very negatively in the boot time: the storage scripts force us to delay bootup until all potential devices have shown up and all devices that did have been probed -- and all that even though we don't actually need most devices for anything. In particular since this machine actually does not make use of LVM, RAID or Multipath!<sup>[2]</sup></p> <p>Knowing what we know now we can go and disable <tt>udev-settle.service</tt> for the next boots: since neither LVM, RAID nor Multipath is used we can mask the services in question and thus speed up our boot a little:</p> <pre># ln -s /dev/null /etc/systemd/system/udev-settle.service
# ln -s /dev/null /etc/systemd/system/fedora-wait-storage.service
# ln -s /dev/null /etc/systemd/system/fedora-storage-init.service
# systemctl daemon-reload</pre> <p>After restarting we can measure that the boot is now about 1s faster. Why just 1s? Well, the second worst offender is cryptsetup here: the machine in question has an encrypted <tt>/home</tt> directory. For testing purposes I have stored the passphrase in a file on disk, so that the boot-up is not delayed because I as the user am a slow typist. The cryptsetup tool unfortunately still takes more than 5s to set up the encrypted partition. Being lazy instead of trying to fix cryptsetup<sup>[3]</sup> we'll just tape over it here <sup>[4]</sup>: systemd will normally wait for all file systems not marked with the <tt>noauto</tt> option in /etc/fstab to show up, to be fscked and to be mounted before proceeding with bootup and starting the usual system services. In the case of <tt>/home</tt> (unlike for example <tt>/var</tt>) we know that it is needed only very late (i.e. when the user actually logs in). An easy fix is hence to make the mount point available already during boot, but not actually wait until cryptsetup, fsck and mount have finished running for it. You ask how we can make a mount point available before actually mounting the file system behind it? 
Well, systemd possesses magic powers, in form of the <tt>comment=systemd.automount</tt> mount option in <tt>/etc/fstab</tt>. If you specify it, systemd will create an automount point at <tt>/home</tt>, and if at the time of the first access it still isn't backed by a proper file system, systemd will wait for the device, fsck and mount it.</p> <p>And here's the result with this change to <tt>/etc/fstab</tt> made:</p> <pre>systemd[1]: Startup finished in 2s 47ms 112us (kernel) + 2s 663ms 942us (initrd) + 5s 540ms 522us (userspace) = 10s 251ms 576us.</pre> <p>Nice! With a few fixes we took almost 7s off our boot-time. And these two changes are only fixes for the two most superficial problems. With a bit of love and detail work there's a lot of additional room for improvements. In fact, on a different machine, a more than two year old X300 laptop (which even back then wasn't the fastest machine on earth), with a bit of decrufting we have boot times of around 4s (total) now, with a reasonably complete GNOME system. And there's still a lot of room in it.</p> <p><tt>systemd-analyze blame</tt> is a nice and simple tool for tracking down slow services. However, it suffers from a big problem: it does not visualize how the parallel execution of the services actually diminishes the price one pays for slow starting services. For that we have prepared <tt>systemd-analyze plot</tt> for you. Use it like this:</p> <pre>$ systemd-analyze plot > plot.svg
$ eog plot.svg</pre> <p>It creates pretty graphs, showing the time services spent to start up in relation to the other services. 
It currently doesn't visualize explicitly which services wait for which ones, but with a bit of guess work this is easily seen nonetheless.</p> <p>To see the effect of our two little optimizations here are two graphs generated with <tt>systemd-analyze plot</tt>, the first before and the other after our change:</p> <p><a href="http://0pointer.de/public/blame.svg"><img src="http://0pointer.de/public/blame.png" width="128" height="308" alt="Before" /></a>&nbsp;<a href="http://0pointer.de/public/blame2.svg"><img src="http://0pointer.de/public/blame2.png" width="95" height="308" alt="After" /></a></p> <p>(For the sake of completeness, here are the two complete outputs of <tt>systemd-analyze blame</tt> for these two boots: <a href="http://0pointer.de/public/blame.txt">before</a> and <a href="http://0pointer.de/public/blame2.txt">after</a>.)</p> <p>The well-informed reader probably wonders how this relates to <a href="https://github.com/mmeeks/bootchart">Michael Meeks' bootchart</a>. This plot and bootchart do show similar graphs, that is true. Bootchart is by far the more powerful tool. It plots in all detail what is happening during the boot, how much CPU and IO is used. <tt>systemd-analyze plot</tt> shows more high-level data: which service took how much time to initialize, and what needed to wait for it. If you use them both together you'll have a wonderful toolset to figure out why your boot is not as fast as it could be.</p> <p>Now, before you now take these tools and start filing bugs against the worst boot-up time offenders on your system: think twice. These tools give you raw data, don't misread it. As my optimization example above hopefully shows, the blame for the slow bootup was not actually with <tt>udev-settle.service</tt>, and not with the ModemManager prober run by it either. It is with the subsystem that pulled this service in in the first place. And that's where the problem needs to be fixed. So, file the bugs at the right places. 
Put the blame where the blame belongs.</p> <p>As mentioned, these three utilities are available on your Fedora 15 system out-of-the-box.</p> <p>And here's what to take home from this little blog story:</p> <ul> <li><tt>systemd-analyze</tt> is a wonderful tool and systemd comes with profiling built in.</li> <li>Don't misread the data these tools generate!</li> <li>With two simple changes you might be able to speed up your system by 7s!</li> <li>Fix your software if it can't handle dynamic hardware properly!</li> <li>The Fedora default of installing the OS on an enterprise-level storage managing system might be something to rethink.</li> </ul> <p>And that's all for now. Thank you for your interest.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Also known as the greatest Free Software OS release ever.</small></p> <p><small>[2] The right fix here is to improve the services in question to actively listen to hotplug events via libudev or similar and act on the devices showing up as they show up, so that we can continue with the bootup the instant everything we really need to go on has shown up. To get a quick bootup we should wait for what we actually need to proceed, not for everything. Also note that the storage services are not the only services which do not cope well with modern dynamic hardware, and assume that the device list is static and stays unchanged. For example, in this example the reason the initrd is actually as slow as it is is mostly due to the fact that Plymouth expects to be executed when all video devices have shown up and have been probed. For an unknown reason (at least unknown to me) loading the video kernel modules for my Intel graphics cards takes multiple seconds, and hence the entire boot is delayed unnecessarily. (Here too I'd not put the blame on the probing but on the fact that we wait for it to complete before going on.)</small></p> <p><small>[3] Well, to be precise, I actually did try to get this fixed. 
Most of the delay of cryptsetup stems from the -- in my eyes -- unnecessarily high default values for <tt>--iter-time</tt> in cryptsetup. I tried to convince our cryptsetup maintainers that 100ms as a default here are not really less secure than 1s, but well, I failed.</small></p> <p><small>[4] Of course, it's usually not our style to just tape over problems instead of fixing them, but this is such a nice occasion to show off yet another cool systemd feature...</small></p> Lennart PoetteringTue, 12 Apr 2011 03:51:00 +0200tag:0pointer.net,2011-04-12:/blog/projects/blame-game.htmlprojectssystemd for Administrators, Part VIhttps://0pointer.net/blog/projects/changing-roots.html <p>Here's another installment <a href="http://0pointer.de/blog/projects/three-levels-of-off.html">of</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">my</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">ongoing</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">on </a> systemd for Administrators:</p> <h4>Changing Roots</h4> <p>As an administrator or developer, sooner or later you'll encounter <a href="http://linux.die.net/man/1/chroot"><tt>chroot()</tt> environments</a>. The <tt>chroot()</tt> system call simply shifts what a process and all its children consider the root directory /, thus limiting what the process can see of the file hierarchy to a subtree of it. 
Primarily <tt>chroot()</tt> environments have two uses:</p> <ol> <li>For security purposes: In this use a specific isolated daemon is chroot()ed into a private subdirectory, so that when exploited the attacker can see only the subdirectory instead of the full OS hierarchy: he is trapped inside the chroot() jail.</li> <li>To set up and control a debugging, testing, building, installation or recovery image of an OS: For this a whole guest operating system hierarchy is mounted or bootstrapped into a subdirectory of the host OS, and then a shell (or some other application) is started inside it, with this subdirectory turned into its /. To the shell it appears as if it was running inside a system that can differ greatly from the host OS. For example, it might run a different distribution or even a different architecture (Example: host x86_64, guest i386). It cannot see the full hierarchy of the host OS.</li> </ol> <p>On a classic System-V-based operating system it is relatively easy to use chroot() environments. For example, to start a specific daemon for test or other reasons inside a chroot()-based guest OS tree, mount <tt>/proc</tt>, <tt>/sys</tt> and a few other API file systems into the tree, then use <tt>chroot(1)</tt> to enter the chroot, and finally run the SysV init script via <tt>/sbin/service</tt> from inside the chroot.</p> <p>On a systemd-based OS things are not that easy anymore. One of the big advantages of systemd is that all daemons are guaranteed to be invoked in a completely clean and independent context which is in no way related to the context of the user asking for the service to be started. 
While in sysvinit-based systems a large part of the execution context (like resource limits, environment variables and suchlike) is inherited from the user shell invoking the init script, in systemd the user just notifies the init daemon, and the init daemon will then fork off the daemon in a sane, well-defined and pristine execution context and no inheritance of the user context parameters takes place. While this is a formidable feature it actually breaks traditional approaches to invoke a service inside a chroot() environment: since the actual daemon is always spawned off PID 1 and thus inherits the chroot() settings from it, it is irrelevant whether the client which asked for the daemon to start is chroot()ed or not. On top of that, since systemd actually places its local communications sockets in <tt>/run/systemd</tt> a process in a chroot() environment will not even be able to talk to the init system (which however is probably a good thing, and the daring can work around this of course by making use of bind mounts.)</p> <p>This of course opens the question how to use chroot()s properly in a systemd environment. And here's what we came up with for you, which hopefully answers this question thoroughly and comprehensively:</p> <p>Let's cover the first use case first: locking a daemon into a chroot() jail for security purposes. To begin with, chroot() as a security tool is actually quite dubious, since chroot() is not a one-way street. It is relatively easy to escape a chroot() environment, <a href="http://linux.die.net/man/2/chroot">as even the man page points out</a>. Only in combination with a few other techniques can it be made somewhat secure. Due to that it usually requires specific support in the applications to chroot() themselves in a tamper-proof way. 
On top of that it usually requires a deep understanding of the chroot()ed service to set up the chroot() environment properly, for example to know which directories to bind mount from the host tree, in order to make available all communication channels in the chroot() the service actually needs. Putting this together, chroot()ing software for security purposes is almost always done best in the C code of the daemon itself. The developer knows best (or at least <i>should</i> know best) how to properly secure down the chroot(), and what the minimal set of files, file systems and directories is the daemon will need inside the chroot(). These days a number of daemons are capable of doing this, unfortunately however of those running by default on a normal Fedora installation only two are doing this: <a href="http://avahi.org/">Avahi</a> and RealtimeKit. Both apparently written by the same really smart dude. Chapeau! ;-) (Verify this easily by running <tt>ls -l /proc/*/root</tt> on your system.)</p> <p>That all said, systemd of course does offer you a way to chroot() specific daemons and manage them like any other with the usual tools. This is supported via the <tt>RootDirectory=</tt> option in systemd service files. Here's an example:</p> <pre>[Unit] Description=A chroot()ed Service [Service] RootDirectory=/srv/chroot/foobar ExecStartPre=/usr/local/bin/setup-foobar-chroot.sh ExecStart=/usr/bin/foobard RootDirectoryStartOnly=yes</pre> <p>In this example, <tt>RootDirectory=</tt> configures where to chroot() to before invoking the daemon binary specified with <tt>ExecStart=</tt>. Note that the path specified in <tt>ExecStart=</tt> needs to refer to the binary inside the chroot(), it is not a path to the binary in the host tree (i.e. in this example the binary executed is seen as <tt>/srv/chroot/foobar/usr/bin/foobard</tt> from the host OS). 
Before the daemon is started a shell script <tt>setup-foobar-chroot.sh</tt> is invoked, whose purpose it is to set up the chroot environment as necessary, i.e. mount <tt>/proc</tt> and similar file systems into it, depending on what the service might need. With the <tt>RootDirectoryStartOnly=</tt> switch we ensure that only the daemon as specified in <tt>ExecStart=</tt> is chrooted, but not the <tt>ExecStartPre=</tt> script which needs to have access to the full OS hierarchy so that it can bind mount directories from there. (For more information on these switches see the respective <a href="http://0pointer.de/public/systemd-man/systemd.service.html">man</a> <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">pages</a>.) If you place a unit file like this in <tt>/etc/systemd/system/foobar.service</tt> you can start your chroot()ed service by typing <tt>systemctl start foobar.service</tt>. You may then introspect it with <tt>systemctl status foobar.service</tt>. It is accessible to the administrator like any other service, the fact that it is chroot()ed does -- unlike on SysV -- not alter how your monitoring and control tools interact with it.</p> <p>Newer Linux kernels support file system namespaces. These are similar to <tt>chroot()</tt> but a lot more powerful, and they do not suffer from the same security problems as <tt>chroot()</tt>. systemd exposes a subset of what you can do with file system namespaces right in the unit files themselves. Often these are a useful and simpler alternative to setting up a full chroot() environment in a subdirectory. With the switches <tt>ReadOnlyDirectories=</tt> and <tt>InaccessibleDirectories=</tt> you may set up a file system namespace jail for your service. Initially, it will be identical to your host OS' file system namespace. By listing directories in these directives you may then mark certain directories or mount points of the host OS as read-only or even completely inaccessible to the daemon. 
Example:</p> <pre>[Unit] Description=A Service With No Access to /home [Service] ExecStart=/usr/bin/foobard InaccessibleDirectories=/home</pre> <p>This service will have access to the entire file system tree of the host OS with one exception: /home will not be visible to it, thus protecting the user's data from potential exploiters. (<a href="http://0pointer.de/public/systemd-man/systemd.exec.html">See the man page for details on these options.</a>)</p> <p>File system namespaces are in fact a better replacement for <tt>chroot()</tt>s in many many ways. Eventually Avahi and RealtimeKit should probably be updated to make use of namespaces replacing <tt>chroot()</tt>s.</p> <p>So much about the security usecase. Now, let's look at the other use case: setting up and controlling OS images for debugging, testing, building, installing or recovering.</p> <p>chroot() environments are relatively simple things: they only virtualize the file system hierarchy. By chroot()ing into a subdirectory a process still has complete access to all system calls, can kill all processes and shares about everything else with the host it is running on. To run an OS (or a small part of an OS) inside a chroot() is hence a dangerous affair: the isolation between host and guest is limited to the file system, everything else can be freely accessed from inside the chroot(). For example, if you upgrade a distribution inside a chroot(), and the package scripts send a SIGTERM to PID 1 to trigger a reexecution of the init system, this will actually take place in the host OS! On top of that, SysV shared memory, abstract namespace sockets and other IPC primitives are shared between host and guest. 
While a completely secure isolation for testing, debugging, building, installing or recovering an OS is probably not necessary, a basic isolation to avoid <i>accidental</i> modifications of the host OS from inside the chroot() environment is desirable: you never know what code package scripts execute which might interfere with the host OS.</p> <p>To deal with chroot() setups for this use systemd offers you a couple of features:</p> <p>First of all, <tt>systemctl</tt> detects when it is run in a chroot. If so, most of its operations will become NOPs, with the exception of <tt>systemctl enable</tt> and <tt>systemctl disable</tt>. If a package installation script hence calls these two commands, services will be enabled in the guest OS. However, should a package installation script include a command like <tt>systemctl restart</tt> as part of the package upgrade process this will have no effect at all when run in a chroot() environment.</p> <p>More importantly however systemd comes out-of-the-box with the <a href="http://0pointer.de/public/systemd-man/systemd-nspawn.html">systemd-nspawn</a> tool which acts as chroot(1) on steroids: it makes use of file system and PID namespaces to boot a simple lightweight container on a file system tree. It can be used almost like chroot(1), except that the isolation from the host OS is much more complete, a lot more secure and even easier to use. In fact, <tt>systemd-nspawn</tt> is capable of booting a <i>complete</i> systemd or sysvinit OS in a container with a single command. Since it virtualizes PIDs, the init system in the container can act as PID 1 and thus do its job as normal. 
In contrast to chroot(1) this tool will implicitly mount <tt>/proc</tt>, <tt>/sys</tt> for you.</p> <p>Here's an example how in three commands you can boot a Debian OS on your Fedora machine inside an nspawn container:</p> <pre># yum install debootstrap # debootstrap --arch=amd64 unstable debian-tree/ # systemd-nspawn -D debian-tree/</pre> <p>This will bootstrap the OS directory tree and then simply invoke a shell in it. If you want to boot a full system in the container, use a command like this:</p> <pre># systemd-nspawn -D debian-tree/ /sbin/init</pre> <p>And after a quick bootup you should have a shell prompt, inside a complete OS, booted in your container. The container will not be able to see any of the processes outside of it. It will share the network configuration, but not be able to modify it. (Expect a couple of EPERMs during boot for that, which however should not be fatal). Directories like <tt>/sys</tt> and <tt>/proc/sys</tt> are available in the container, but mounted read-only in order to avoid that the container can modify kernel or hardware configuration. Note however that this protects the host OS only from <i>accidental</i> changes of its parameters. A process in the container can manually remount the file systems read-writeable and then change whatever it wants to change.</p> <p>So, what's so great about <tt>systemd-nspawn</tt> again?</p> <ol> <li>It's really easy to use. No need to manually mount <tt>/proc</tt> and <tt>/sys</tt> into your chroot() environment. The tool will do it for you and the kernel automatically cleans it up when the container terminates.</li> <li>The isolation is much more complete, protecting the host OS from accidental changes from inside the container.</li> <li>It's so good that you can actually boot a full OS in the container, not just a single lonesome shell.</li> <li>It's actually tiny and installed everywhere where systemd is installed. 
No complicated installation or setup.</li> </ol> <p>systemd itself has been modified to work very well in such a container. For example, when shutting down and detecting that it is run in a container, it just calls exit(), instead of reboot() as last step.</p> <p>Note that <tt>systemd-nspawn</tt> is not a full container solution. If you need that <a href="http://lxc.sourceforge.net/">LXC</a> is the better choice for you. It uses the same underlying kernel technology but offers a lot more, including network virtualization. If you so will, <tt>systemd-nspawn</tt> is the GNOME 3 of container solutions: slick and trivially easy to use -- but with few configuration options. LXC OTOH is more like KDE: more configuration options than lines of code. I wrote <tt>systemd-nspawn</tt> specifically to cover testing, debugging, building, installing, recovering. That's what you should use it for and what it is really good at, and where it is a much much nicer alternative to chroot(1).</p> <p>So, let's get this finished, this was already long enough. Here's what to take home from this little blog story:</p> <ol> <li>Secure chroot()s are best done natively in the C sources of your program.</li> <li><tt>ReadOnlyDirectories=</tt>, <tt>InaccessibleDirectories=</tt> might be suitable alternatives to a full chroot() environment.</li> <li><tt>RootDirectory=</tt> is your friend if you want to chroot() a specific service.</li> <li><tt>systemd-nspawn</tt> is made of awesome.</li> <li>chroot()s are lame, file system namespaces are totally l33t.</li> </ol> <p>All of this is readily available on your Fedora 15 system.</p> <p>And that's it for today. 
See you again for the next installment.</p> Lennart PoetteringFri, 08 Apr 2011 00:45:00 +0200tag:0pointer.net,2011-04-08:/blog/projects/changing-roots.htmlprojectsGNOME 3.0 Is Out!https://0pointer.net/blog/projects/gnome3.html <p><a href="http://gnome.org/">The next generation desktop has arrived.</a> I am running it as I type this, and so should you. <a href="http://gnome3.org/tryit.html">So, go, get it!</a></p> <p>If you are in <b>Berlin</b> on Friday you should also attend our <a href="https://live.gnome.org/ThreePointZero/LaunchParty/Germany/Berlin">GNOME 3.0 Release Party</a>. It's at the world famous <a href="http://www.c-base.org/">c-base</a>, in the remains of an alien spaceship that crashed into Berlin 4.5 billion years ago (no kidding!). We've got Ubuntu's <a href="http://daniel.holba.ch/blog/">Daniel Holbach</a> as DJ, and a few folks from the GNOME community will do a talk or two (including that annoying dude who created Avahi, PulseAudio and systemd). We even got Mirko Boehm from the KDE side to say a few things. And there are going to be GNOME 3 goodies! How awesome is that? <a href="https://live.gnome.org/ThreePointZero/LaunchParty/Germany/Berlin">See the wiki page for further details.</a></p> <p>And here's your homework until Friday: <a href="http://gnome3.org/tryit.html">Try out GNOME 3.0!</a></p> <p><a title="Help promote GNOME 3!" href="https://live.gnome.org/ThreePointZero/Promote"><img style="vertical-align: top; border: 0" alt="I am GNOME" src="http://www.gnome.org/wp-content/uploads/2011/04/iamgnome.png" width="200" height="200" /></a></p> Lennart PoetteringWed, 06 Apr 2011 23:02:00 +0200tag:0pointer.net,2011-04-06:/blog/projects/gnome3.htmlprojectsThe GNOME 3.0 Live CDhttps://0pointer.net/blog/projects/live-cd.html <p><a href="http://blogs.fedoraproject.org/wp/mclasen/2011/03/31/another-way-to-try-gnome-3/">The Fedora GNOME 3.0 Live CD</a> is made of awesome. 
Not just because it showcases the awesomeness that is GNOME 3, but also because it's built on an awesome <a href="http://freedesktop.org/wiki/Software/systemd">systemd</a>-based OS. Double awesome!</p> <p>So, get it, play with it. It's the future of computing: GNOME and systemd and Linux. Triple awesome!</p> <p>And did I mention that F15 is going to be the <i>awesomest</i> OS release ever?</p> <p>Nope, there's no April 1st joke in here. It's really honestly just ... <i>awesome</i>!</p> Lennart PoetteringFri, 01 Apr 2011 21:04:00 +0200tag:0pointer.net,2011-04-01:/blog/projects/live-cd.htmlprojectsFinal Reminderhttps://0pointer.net/blog/projects/final-reminder.html <p>Citizens! GNOMErs! Only two days are left and the <a href="http://www.desktopsummit.org/">GUADEC/Desktop Summit</a> CFP is over (end date is Friday). <a href="https://www.desktopsummit.org/submit">Submit your presentation proposal now</a>, or it is too late. <a href="https://www.desktopsummit.org/cfp">Read the CFP.</a></p> <p>Oh, and regarding the need for a KDE identity account: due to limited manpower we decided to reuse existing infrastructure instead of setting up a completely new one. We do acknowledge that this is not ideal and we'd like to ask for your understanding. (Creating a KDE identity account is unrestricted, and you can easily create one even if you never had anything to do with KDE in your life.)</p> <p>Note that we are looking for both <b>lightning talks</b> and full-length presentations. 
If you are interested in doing a lightning talk (and we can only encourage you to), please use the same form to make your submission.</p> Lennart PoetteringWed, 23 Mar 2011 18:42:00 +0100tag:0pointer.net,2011-03-23:/blog/projects/final-reminder.htmlprojectsDesktop Summit/GUADEC 2011 CFP ends in one Weekhttps://0pointer.net/blog/projects/cfp-ends-in-one-week.html <p>I'd like to remind everybody that only one week is left until the <a href="https://www.desktopsummit.org/">Desktop Summit (aka GUADEC 2011)</a> <a href="https://www.desktopsummit.org/cfp">Call for Participation</a> ends. We want your talk proposals, and that quickly, before it's too late!</p> <p>Berlin in summer is fantastic. You wouldn't want to miss that, would you?</p> <p>So, read the <a href="https://www.desktopsummit.org/cfp">CFP again</a>, and then <a href="https://www.desktopsummit.org/submit">submit something.</a></p> <p>The CFP ends next Friday. So hurry!</p> <p>Thank you,<br /> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Lennart</p> Lennart PoetteringThu, 17 Mar 2011 19:51:00 +0100tag:0pointer.net,2011-03-17:/blog/projects/cfp-ends-in-one-week.htmlprojectssystemd for Administrators, Part Vhttps://0pointer.net/blog/projects/three-levels-of-off.html <p>It has been a while since the <a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html">last installment</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html">of my systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">for Administrators</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">series</a>, but now with the release of Fedora 15 based on systemd looming, here's a new episode:</p> <h4>The Three Levels of "Off"</h4> <p>In <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>, there are three levels of turning off a service (or other unit). Let's have a look at what those are:</p> <ol> <li> <p>You can <b>stop</b> a service. 
That simply terminates the running instance of the service and does little else. If due to some form of activation (such as manual activation, socket activation, bus activation, activation by system boot or activation by hardware plug) the service is requested again afterwards it will be started. Stopping a service is hence a very simple, temporary and superficial operation. Here's an example how to do this for the NTP service:</p> <pre>$ systemctl stop ntpd.service</pre> <p>This is roughly equivalent to the following traditional command which is available on most SysV inspired systems:</p> <pre>$ service ntpd stop</pre> <p>In fact, on Fedora 15, if you execute the latter command it will be transparently converted to the former.</p> </li> <li> <p>You can <b>disable</b> a service. This unhooks a service from its activation triggers. That means, that depending on your service it will no longer be activated on boot, by socket or bus activation or by hardware plug (or any other trigger that applies to it). However, you can still start it manually if you wish. If there is already a started instance disabling a service will <i>not</i> have the effect of stopping it. 
Here's an example how to disable a service:</p> <pre>$ systemctl disable ntpd.service</pre> <p>On traditional Fedora systems, this is roughly equivalent to the following command:</p> <pre>$ chkconfig ntpd off</pre> <p>And here too, on Fedora 15, the latter command will be transparently converted to the former, if necessary.</p> <p>Often you want to combine stopping and disabling a service, to get rid of the current instance and make sure it is not started again (except when manually triggered):</p> <pre>$ systemctl disable ntpd.service $ systemctl stop ntpd.service</pre> <p>Commands like this are for example used during package deinstallation of systemd services on Fedora.</p> <p>Disabling a service is a permanent change; until you undo it it will be kept, even across reboots.</p> </li> <li> <p>You can <b>mask</b> a service. This is like disabling a service, but on steroids. It not only makes sure that the service is not started automatically anymore, but also ensures that it cannot even be started manually anymore. This is a bit of a hidden feature in systemd, since it is not commonly useful and might confuse the user. But here's how you do it:</p> <pre>$ ln -s /dev/null /etc/systemd/system/ntpd.service $ systemctl daemon-reload</pre> <p>By symlinking a service file to <tt>/dev/null</tt> you tell systemd to never start the service in question and completely block its execution. Unit files stored in <tt>/etc/systemd/system</tt> override those from <tt>/lib/systemd/system</tt> that carry the same name. The former directory is administrator territory, the latter territory of your package manager. By installing your symlink in <tt>/etc/systemd/system/ntpd.service</tt> you hence make sure that systemd will never read the upstream shipped service file <tt>/lib/systemd/system/ntpd.service</tt>.</p> <p>systemd will recognize units symlinked to <tt>/dev/null</tt> and show them as <i>masked</i>. 
If you try to start such a service manually (via <tt>systemctl start</tt> for example) this will fail with an error.</p> <p>A similar trick on SysV systems does not (officially) exist. However, there are a few unofficial hacks, such as editing the init script and placing an <tt>exit 0</tt> at the top, or removing its execution bit. However, these solutions have various drawbacks, for example they interfere with the package manager.</p> <p>Masking a service is a permanent change, much like disabling a service.</p> </li> </ol> <p>Now that we learned how to turn off services on three levels, there's only one question left: how do we turn them on again? Well, it's quite symmetric. Use <tt>systemctl start</tt> to undo <tt>systemctl stop</tt>. Use <tt>systemctl enable</tt> to undo <tt>systemctl disable</tt> and use <tt>rm</tt> to undo <tt>ln</tt>.</p> <p>And that's all for now. Thank you for your attention!</p> Lennart PoetteringWed, 02 Mar 2011 21:45:00 +0100tag:0pointer.net,2011-03-02:/blog/projects/three-levels-of-off.htmlprojectsDesktop Summit 2011 Call For Participationhttps://0pointer.net/blog/projects/guadec-cfp-2011.html <p>In case you haven't noticed yet: the <a href="https://www.desktopsummit.org/cfp">Call For Participation for the Desktop Summit 2011</a> (aka GUADEC 2011, aka Akademy 2011) in Berlin, Germany has been open since yesterday. 
Submissions will be accepted until March 25th, so make sure to <a href="https://www.desktopsummit.org/submit">submit your proposals</a> quickly.</p> Lennart PoetteringTue, 01 Mar 2011 23:12:00 +0100tag:0pointer.net,2011-03-01:/blog/projects/guadec-cfp-2011.htmlprojectsFOSDEM Talk on Videohttps://0pointer.net/blog/projects/fosdem2011-video.html <p>If you have already watched my <a href="http://0pointer.de/blog/projects/lca2011-video.html">presentation on systemd I gave at linux.conf.au 2011</a> then this video of my talk on the same topic which I gave at <a href="http://www.fosdem.org/">FOSDEM 2011</a> in Brussels, Belgium will probably not be all new to you, but the questions from the audience (and hopefully my responses) might answer a question or two you might still have. So do watch it:</p> <object width="640" height="390"> <param name="movie" value="http://www.youtube.com/v/TyMLi8QF6sw?fs=1&amp;hl=en_US" /> <param name="allowfullscreen" value="true" /> <param name="allowscriptaccess" value="always" /> <embed src="http://www.youtube.com/v/TyMLi8QF6sw?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="390" /> </object> <p><small><i>Hmm, seems p.g.o strips the video from the blog post. 
So either read the original blog story or <a href="http://www.youtube.com/watch?v=TyMLi8QF6sw">watch it directly on YouTube</a>.</i></small></p> <p>Oh, and FOSDEM rocked, like every year!</p> Lennart PoetteringFri, 18 Feb 2011 00:25:00 +0100tag:0pointer.net,2011-02-18:/blog/projects/fosdem2011-video.htmlprojectsLCA Talk on Videohttps://0pointer.net/blog/projects/lca2011-video.html <p>I won't spare you the video of my talk about <a href="http://freedesktop.org/wiki/Software/systemd">systemd</a> at <a href="http://lca2011.linux.org.au/">linux.conf.au 2011</a> in Brisbane, Australia last week:</p> <object width="480" height="390"> <param name="movie" value="http://blip.tv/play/AYKf5GsC" /> <param name="allowfullscreen" value="true" /> <embed src="http://blip.tv/play/AYKf5GsC" width="480" height="390" /> </object> <p><small><i>Hmm, seems p.g.o strips the video from the blog post. So either read the original blog story or <a href="http://linuxconfau.blip.tv/file/4696791/">watch it directly on blip.tv</a>.</i></small></p> <p>LCA was fantastic and especially impressive given the circumstances of the recent floodings in Queensland. Really good conference, and congratulations to the organizers!</p> Lennart PoetteringMon, 31 Jan 2011 19:10:00 +0100tag:0pointer.net,2011-01-31:/blog/projects/lca2011-video.htmlprojectsFOSDEM Interview with Yours Trulyhttps://0pointer.net/blog/projects/fosdem2011.html <p>The FOSDEM organizers just published <a href="http://fosdem.org/2011/interview/lennart-poettering">a brief interview with yours truly</a> regarding the presentation about <a href="http://freedesktop.org/wiki/Software/systemd">systemd</a> I will be giving there on <a href="http://fosdem.org/2011/schedule/event/systemd">Sat. Feb. 5th, 3pm</a>. If you come to Brussels make sure to drop by! 
And even if you don't, have a look at the interview!</p> <p>If you don't make it to Brussels, there are two more stops in my little <i>systemd World Tour</i> in the next weeks: today (<a href="https://conf.linux.org.au/programme/schedule/view_talk/150?day=wednesday">Wed. Jan. 26th, 2:30pm</a>) I will be speaking at linux.conf.au in Brisbane, Australia. And on <a href="http://fedoraproject.org/wiki/DeveloperConference2011">Fri. Feb. 11th, 1:20pm</a> I'll be speaking at the Red Hat Developer Conference in Brno, Czech Republic.</p> Lennart PoetteringTue, 25 Jan 2011 22:26:00 +0100tag:0pointer.net,2011-01-25:/blog/projects/fosdem2011.htmlprojectsChorinhttps://0pointer.net/blog/photos/chorin.html <p><a href="http://0pointer.de/static/chorin"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/chorin-gimped-small.jpeg" width="1024" height="291" alt="Chorin, Brandenburg, Germany" /></a></p> <p><i>Chorin Abbey Church, Brandenburg, Germany</i>. Yes, indeed, that's a crane.</p> Lennart PoetteringSun, 21 Nov 2010 00:19:00 +0100tag:0pointer.net,2010-11-21:/blog/photos/chorin.htmlphotossystemd for Administrators, Part IVhttps://0pointer.net/blog/projects/systemd-for-admins-4.html <p>Here's the fourth installment of my <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">ongoing series</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">about systemd</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"> for administrators</a>.</p> <h4>Killing Services</h4> <p>Killing a system daemon is easy, right? Or is it?</p> <p>Sure, as long as your daemon consists only of a single process this might actually be somewhat true. You type <tt>killall rsyslogd</tt> and the syslog daemon is gone. 
However it is a bit dirty to do it like that given that this will kill all processes which happen to be called like this, including those an unlucky user might have named that way by accident. A slightly more correct version would be to read the <tt>.pid</tt> file, i.e. <tt>kill `cat /var/run/syslogd.pid`</tt>. That already gets us much further, but still, is this really what we want?</p> <p>More often than not it actually isn't. Consider a service like Apache, or crond, or atd, which as part of their usual operation spawn child processes. Arbitrary, user configurable child processes, such as cron or at jobs, or CGI scripts, even full application servers. If you kill the main apache/crond/atd process this might or might not pull down the child processes too, and it's up to those processes whether they want to stay around or go down as well. Basically that means that terminating Apache might very well cause its CGI scripts to stay around, reassigned to be children of init, and difficult to track down.</p> <p><a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> to the rescue: With <tt>systemctl kill</tt> you can easily send a signal to all processes of a service. Example:</p> <pre># systemctl kill crond.service</pre> <p>This will ensure that SIGTERM is delivered to all processes of the crond service, not just the main process. Of course, you can also send a different signal if you wish. For example, if you are bad-ass you might want to go for SIGKILL right-away:</p> <pre># systemctl kill -s SIGKILL crond.service</pre> <p>And there you go, the service will be brutally slaughtered in its entirety, regardless how many times it forked, whether it tried to escape supervision by double forking or fork bombing.</p> <p>Sometimes all you need is to send a specific signal to the main process of a service, maybe because you want to trigger a reload via SIGHUP. 
Instead of going via the PID file, here's an easier way to do this:</p> <pre># systemctl kill -s HUP --kill-who=main crond.service</pre> <p>So again, what is so new and fancy about killing services in systemd? Well, for the first time on Linux we can actually properly do that. Previous solutions were always depending on the daemons to actually cooperate to bring down everything they spawned if they themselves terminate. However, usually if you want to use SIGTERM or SIGKILL you are doing that because they actually do not cooperate properly with you.</p> <p>How does this relate to <tt>systemctl stop</tt>? <tt>kill</tt> goes directly and sends a signal to every process in the group, however <tt>stop</tt> goes through the official configured way to shut down a service, i.e. invokes the stop command configured with <tt>ExecStop=</tt> in the service file. Usually <tt>stop</tt> should be sufficient. <tt>kill</tt> is the tougher version, for cases where you either don't want the official shutdown command of a service to run, or when the service is hosed and hung in other ways.</p> <p>(It's up to you BTW to specify signal names with or without the SIG prefix on the -s switch. Both work.)</p> <p>It's a bit surprising that we have come so far on Linux without even being able to properly kill services. systemd for the first time enables you to do this properly.</p> Lennart PoetteringFri, 19 Nov 2010 18:17:00 +0100tag:0pointer.net,2010-11-19:/blog/projects/systemd-for-admins-4.htmlprojectssystemd Status Updatehttps://0pointer.net/blog/projects/systemd-update-2.html <p><a href="http://0pointer.de/blog/projects/systemd-update.html">It has been a while since my last status update on systemd</a>. Here's another short, incomplete status update on what we worked on for <a href="http://freedesktop.org/wiki/Software/systemd">systemd</a> since then.</p> <ul> <li>Fedora F15 (Rawhide) now includes a split up <tt>/etc/init.d/rc.sysinit</tt> (Bill Nottingham). 
This allows us to keep only a minimal compatibility set of shell scripts around, and boot otherwise a system without any shell scripts at all. In fact, shell scripts during early boot are only used in exceptional cases, i.e. when you enabled autoswapping (bad idea anyway), when a full SELinux relabel is necessary, during the first boot after initialization, if you have static kernel modules to load (which are not configured via the systemd-native way to do that), if you boot from a read-only NFS server, or when you rely on LVM/RAID/Multipath. If none of this applies to you, you can easily disable these parts of early boot and save several seconds on boot. How to do this I will describe in a later blog story.</li> <li>We have a fully C coded shutdown logic that kills all remaining processes, unmounts all remaining file systems, detaches all loop devices and DM volumes and does that in the right way to ensure that all these things are properly torn down even if they depend on each other in arbitrary ways. This is not only considerably faster than the traditional shell hackery for this, but also a lot safer, since we try to unmount/remount the remaining file systems with a little bit of brains. This feature is available via <tt>systemctl --force poweroff</tt> to the administrator. The <tt>--force</tt> controls whether the usual shutdown of all services is run or whether this is skipped and we immediately shall enter this final C shutdown logic. Using <tt>--force</tt> hence is a much safer replacement for the old <tt>/sbin/reboot -f</tt> and does not leave dirty file systems behind. (Thanks to Fabiano Fidencio and his colleagues from ProFUSION for this).</li> <li>systemd now includes a minimalistic readahead implementation, based on fanotify(), fadvise() and mincore(). It supports btrfs defragmentation and both SSD and HDD disks. 
While the effect on boots that are anyway fast (such as most stuff involving SSD) is minimal, slower and older machines benefit from this more substantially.</li> <li>We now control fsck and quota during early boot with a C tool that ensures maximum parallelization but properly implements the necessary high-level administration logic.</li> <li>Every service, every user and every user session now gets its own cgroup in the 'cpu' hierarchy thus creating better fairness between the logged in users and their sessions.</li> <li>We now provide <tt>/dev/log</tt> logging from early boot to late shutdown. If no syslog daemon is running the output is passed on to kmsg. As soon as a proper syslog daemon starts up the kmsg buffer is flushed to syslog, and hence we will have complete log coverage in syslog even for early boot.</li> <li><tt>systemctl kill</tt> was introduced, an easy command to send a signal to all processes of a service. Expect a blog story with more details about this shortly.</li> <li>systemd gained the ability to load the SELinux policy if necessary, thus supporting non-initrd boots and initrd boots from the same binary with no duplicate work. This is in fact (and surprisingly) a first among Linux init systems.</li> <li>We now initialize and set the system locale inside PID 1 to be inherited by all services and users.</li> <li>systemd has native support for <tt>/etc/crypttab</tt> and can activate encrypted LUKS/dm-crypt disks both at boot-up and during runtime. A minimal password querying infrastructure is available, where multiple agents can be used to present the password to the user. During boot the password is queried either via Plymouth or directly on the console. If a system crypto disk is plugged in after boot you are queried for the password via a GNOME agent, or a wall(1) agent. Finally, while you run <tt>systemctl start</tt> (or a similar command) a minimal TTY password agent is available which asks you for passwords right-away if this is necessary. 
The password querying logic is very simple; additional agents can be implemented in a trivial amount of code (Yupp, KDE folks, you can add an agent for this, too). Note that the password querying logic in systemd is only for non-user passwords, i.e. passwords that have no relation to a specific user, but rather to specific hardware or system software. In the future we hope to extend this so that this can be used to query the password of SSL certificates when Apache or other servers start.</li> <li>We offer a minimal interface that external projects can use to extend the dependency graph systemd manages. In fact, the cryptsetup logic mentioned above is implemented via this 'plugin'-like system. Since we did not want to add code that deals with cryptographic disks into the systemd process itself we introduced this interface (after all cryptographic volumes are not an essential feature of a minimal OS, and unnecessary on most embedded uses; also the future might bring us STC which might make this at least partially obsolete). Simply by dropping a <i>generator</i> binary into <tt>/lib/systemd/system-generators</tt>, which should write out systemd unit files into a temporary directory, third-party packages may extend the systemd dependency tree dynamically. This could be useful for example to automatically create a systemd service for each KVM machine or LXC container. 
With that in place those containers/machines could be managed and supervised with the same tools as the usual system services.</li> <li>We integrated automatic clean-up of directories such as <tt>/tmp</tt> into the <tt>tmpfiles</tt> logic we already had in place that recreates files and directories on volatile file systems such as <tt>/var/run</tt>, <tt>/var/lock</tt> or <tt>/tmp</tt>.</li> <li>We now always measure the system startup time and write it to the log files, broken up into how much time was spent on the kernel, the initrd and the initialization of userspace.</li> <li>We now safely destroy all user sessions before going down. This is a feature long missing on Linux: since user processes were not killed until the very last moment the unhealthy situation that user code was running at a time where no other daemon was remaining was a normal part of shutdown.</li> <li>systemd now understands an 'extreme' form of disabling a service: if you symlink a service name in <tt>/etc/systemd/system</tt> to <tt>/dev/null</tt> then systemd will mark it as <i>masked</i> and completely refuse starting it, regardless of whether this is requested manually or automatically. Normally it should be sufficient to simply call <tt>systemctl disable</tt> to disable a service, which still allows manual activation but no automatic activation. Masking a service goes one step further.</li> <li>There's now a simple <i>condition</i> syntax in place which allows skipping or enabling units depending on the existence of a file, whether a directory is empty or whether a kernel command line option is set.</li> <li>In addition to normal shutdowns for reboot, halt or poweroff we now similarly support a kexec reboot, that reboots the machine without going through the BIOS code again.</li> <li>We have bash completion support for <tt>systemctl</tt>. 
(Ran Benita)</li> <li>Andrew Edmunds contributed basic support to boot Ubuntu with systemd.</li> <li>Michael Biebl and Tollef Fog Heen have worked on the systemd integration into Debian to a level that it is now possible to boot a system without having the old <tt>initscripts</tt> package installed. For more details <a href="http://wiki.debian.org/systemd">see the Debian Wiki</a>. Michael even tested this integration on an Ubuntu Natty system and as it turns out this works almost equally well on Ubuntu already. If you are interested in playing around with this, ping Michael.</li> </ul> <p>And that's it for now. There's a lot of other stuff in the git commits, but most of it is smaller and I will thus spare you the details.</p> <p>We have come quite far in the last year. systemd is about a year old now, and we are now able to boot a system without legacy shell scripts remaining, something that appeared to be a task for the distant future.</p> <p>All of this is available in systemd 13 and in F15/Rawhide as I type this. If you want to play around with this then consider installing Rawhide (it's fun!).</p> Lennart PoetteringFri, 19 Nov 2010 04:30:00 +0100tag:0pointer.net,2010-11-19:/blog/projects/systemd-update-2.htmlprojects27C3 Fudfesthttps://0pointer.net/blog/projects/ccc-nervt.html <p>I really wonder why on earth the 27C3 accepted a <a href="http://events.ccc.de/congress/2010/Fahrplan/events/4017.en.html">nonsensical paper like this</a> into their programme. So .. stupid. You read half the proposal and it's already kinda obvious that the presenter has no idea what he is talking about. Fundamental errors, obvious misinterpretations, outdated issues: this is just FUD.</p> <p>And apparently this talk is even anonymous? Such a coward! 
FUDing around anonymously is acceptable at the CCC?</p> Lennart PoetteringSat, 13 Nov 2010 21:44:00 +0100tag:0pointer.net,2010-11-13:/blog/projects/ccc-nervt.htmlprojectsLinux Plumbers Conference/Gnome Summit Recaphttps://0pointer.net/blog/projects/lpc2010-recap.html <p>Last week <a href="http://www.linuxplumbersconf.org/2010/">LPC</a> and <a href="http://live.gnome.org/Boston2010">GS</a> 2010 took place in Cambridge, MA. As in previous years, LPC showed again that -- at least for me -- it is one of the most relevant Linux conferences in existence, if not the single most relevant one.</p> <p>Here's a terse, incomplete report of the different discussions I took part in with various folks at the conference, in no particular order:</p> <p>The <a href="http://wiki.linuxplumbersconf.org/2010:early_boot_and_init_systems">Boot and Init</a> track led by Kay Sievers (Suse) was a great success. We had exciting talks which I think helped quite a bit in clearing a few things up, and which will hopefully help us consolidate the full Linux boot process among all the components involved. We had talks covering everything from the BIOS boot, to initrds, graphical boot splashes and <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>. Kay Sievers and I spoke about systemd, also covering the state of it in the Fedora and openSUSE distributions. Gustavo Barbieri (ProFUSION, Gentoo) and Michael Biebl (Debian) gave interesting talks about systemd adoption in their respective distributions. I was particularly interested in the various statistics Michael showed about SysV/LSB init script usage in Debian, because this gives an idea how much work we have in front of us in the long run. A longer discussion about the future of initrds and the logic necessary to find the root file system on boot was quite enlightening. 
I think this track was helpful to increase the unification and consolidation of the way Linux systems boot up and are maintained during runtime.</p> <p>Kay and I and some other folks sat down with Arjan van de Ven (Intel), to talk about the prospects of systemd in Meego. The discussions were very positive. In particular Arjan had some great suggestions regarding use of the <a href="http://www.microsoft.com/whdc/resources/respec/specs/simp_boot.mspx">Simple Boot Flag</a> in systemd (expect this in one of the next versions) and readahead. Before systemd can find adoption in Meego we'd have to add a small number of features to systemd first; most of them should be easy to add.</p> <p>Similarly, I sat down with Martin Pitt and James Hunt (both Canonical) and discussed systemd in relation to Ubuntu. I think we managed to clear a lot of things up, and have a good chance to improve cooperation between Ubuntu and systemd in relation to APIs and maybe even more.</p> <p>We talked to Thomas Gleixner regarding userspace notifications when the wallclock time jumps relative to the monotonic clock. This is important to systemd so that we can schedule calendar jobs similar to cron, but without having to wake up periodically to check whether the wallclock time changed relative to the monotonic clock so that we can recalculate the next point in time a calendar event is triggered. There has been previous work in this area in the kernel world, but nothing got merged. Thomas' suggestion how to add this facility should be much easier than anything proposed so far.</p> <p>I also tried to talk Andreas Gr&uuml;nbacher into supporting file system user extended attributes in various virtual file systems such as procfs, cgroupfs, sysfs and tmpfs. I hope I convinced him that this would be a good idea, since this would allow setting externally accessible attributes to all kinds of kernel objects, such as processes and devices. 
This would not only have uses in systemd (where we could easily store all meta information systemd needs to know about a service in the cgroupfs via xattrs, so that systemd could even crash or go away at any time and we still can read all runtime information necessary beyond mere cgrouping from the file system when systemd comes back to life again) but also in the desktop environments, so that we could for example attach the human readable application name, an icon or a desktop file to the processes currently running, in a simple way where the data we attach follows the lifecycle of the process itself.</p> <p>The <a href="http://wiki.linuxplumbersconf.org/2010:audio">Audio</a> track went really well, too. I was particularly excited about Pierre-Louis Bossart's (Intel) plans regarding AC3 (and other codecs) support in <a href="http://pulseaudio.org/">PulseAudio</a>, and the simplicity of his approach. Also great was hearing about Laurent Pinchart's project to expose audio and video device routing to userspace. Finally, I really enjoyed David Henningsson's and Luke Yelavich's (both Canonical) talk regarding tracking down audio bugs on Ubuntu. I was really impressed by the elaborate tools they created to test audio drivers on users' machines. Pretty cool stuff. Maybe this can be extended into a test suite for driver writers, because the current approach for driver writers (i.e. "If PulseAudio works correctly, your driver is correct") doesn't really scale (although I like the idea and take it as a compliment...). I also liked the timechart profiling results Pierre showed me that he generated for PulseAudio. Seems PulseAudio is behaving quite nicely these days.</p> <p>Together with Harald Hoyer I got a demo of David Zeuthen's disk assembly daemon (stc), which makes RAID/MD/LVM assembly more dynamic. 
Great stuff, and I think we convinced him to leave actual mounting of file systems to systemd instead of doing it himself.</p> <p>Harald and I also hashed out a few things to make integration between dracut and systemd nicer (i.e. passing along profiling information between the two, and information regarding the root fsck).</p> <p>I also hope I convinced Ray Strode to make Plymouth actively listen to udev for notifications about DRM devices, so that further synchronization between udev and plymouth won't be necessary, which both makes things more robust and a little bit faster.</p> <p>Kay and I talked to Greg Kroah-Hartman regarding the brokenness of VT_WAITEVENT in the kernel TTY layer, and discussed what to do about this. After returning from the US Kay now did the necessary hacking work to provide a minimal sysfs based solution that allows userspace to query which TTYs <tt>/dev/console</tt> and <tt>/dev/tty0</tt> currently point to, and get notifications when this changes. This should allow us to greatly simplify ConsoleKit and make it possible to add console-triggered activation to systemd (think: getty gets started the moment you switch to its virtual terminal, not already at boot).</p> <p>I also spent some time discussing the upcoming deadline scheduling kernel logic with Dario, Dhaval and Tommaso regarding its possible use in PulseAudio. I believe deadline scheduling is a useful tool to hand out real-time scheduling to applications securely. As an easy path to supporting deadline scheduling in PulseAudio I suggested patching RealtimeKit to optionally use deadline scheduling for its clients. This would magically teach PA (and other clients) to use deadline scheduling without further patching in the clients.</p> <p>At GNOME Summit I sat down with Ryan Lortie and Will Thompson to discuss the future of the D-Bus session bus and how we can move to a machine/user bus instead in a nice way. 
We managed to come to a nice agreement here, and this should enable us to introduce systemd for session management soonishly. Now we only need to convince the other folks having stakes in D-Bus that what we discussed is actually a good idea; expect more about this soon on dbus-devel. Ryan and I also hashed out our remaining differences regarding the exact semantics of XDG_RUNTIME_DIR, the result of which you can <a href="http://lists.freedesktop.org/archives/xdg/2010-November/011681.html">already see on the XDG mailing list</a>. Ryan already did the GLib work to introduce XDG_RUNTIME_DIR and systemd has already supported this unofficially for a few versions.</p> <p>I quite appreciate how Michael Meeks <a href="http://lwn.net/Articles/414051/">quoted me</a> in his final keynote. ;-)</p> <p>There was a lot of other stuff going on at the conference, and what I wrote above is in no way complete. And of course, besides all the technical stuff, it was great meeting all the good Linux folks again, especially my colleagues from Red Hat.</p> <p>I am still amazed how systemd is received so positively and with open arms all across the board. 
It's particularly amazing that systemd at this point in time has already been adopted by various companies in the automotive and aviation industry.</p> Lennart PoetteringTue, 09 Nov 2010 21:21:00 +0100tag:0pointer.net,2010-11-09:/blog/projects/lpc2010-recap.htmlprojectsOff to LPC 2010, Bostonhttps://0pointer.net/blog/projects/lpc2010.html <p>Later this week the <a href="http://www.linuxplumbersconf.org/2010/">Linux Plumbers Conference 2010</a> will take place at the Hyatt Regency in Cambridge.</p> <p>Together with Mark Brown I'll be running the conference track about <a href="http://www.linuxplumbersconf.org/2010/ocw/events/LPC2010MC/tracks/53">Audio</a>, and I believe we managed to put together quite a nice schedule with various interesting talks covering many areas of what Audio on Linux is about.</p> <p>I'll also be around at the <a href="http://wiki.linuxplumbersconf.org/2010:early_boot_and_init_systems">Boot and Init Systems</a> track which Kay Sievers is running. Together with Kay I'll do a session about <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>, everybody's favourite system and session manager. 
We also managed to convince a number of distribution maintainers of systemd to do short presentations about the state of systemd adoption in their respective distributions: Michael Biebl from Debian, Gustavo Barbieri from Gentoo, Kay for openSUSE and yours truly for Fedora.</p> <p>Because there never can be enough systemd coverage at a conference I'll do another talk about systemd, in Vincent Untz' <a href="http://www.linuxplumbersconf.org/2010/ocw/events/LPC2010MC/tracks/117">Desktop</a> track, this time focussing less on how to boot and maintain a system, but more on doing the same for desktop sessions, in particular GNOME.</p> <p>I'll also stick around for the first two days of the GNOME Boston Summit.</p> <p>See you in Cambridge!</p> Lennart PoetteringMon, 01 Nov 2010 02:18:00 +0100tag:0pointer.net,2010-11-01:/blog/projects/lpc2010.htmlprojectsFOSS.in CFP Deadline Approaching!https://0pointer.net/blog/projects/foss-in-2010.html <p>I just submitted my paper<sup>[1]</sup> for FOSS.in 2010 in Bangalore/India. Don't forget to submit yours! The CFP closes on 10th of October. That's this Sunday! Hurry up, before it is too late!</p> <p>FOSS.in is one of the most amazing Free Software conferences this world has to offer (hey, and I think I can say that because I have presented at quite a few). A dedicated audience, flawless organization, magic hospitality, and all this in incredible India! Both the technical programme and everything around it are impressive. Which other conference can offer you <a href="http://en.wikipedia.org/wiki/The_Raghu_Dixit_Project">a concert of one of India's greatest acts</a> as part of the schedule? Which other international conference host city can be such a positive attack on your senses as Bangalore (see that endless sea of flowers below)? 
And where else do they serve <a href="http://en.wikipedia.org/wiki/Vark">pure silver</a> as part of the conference catering?</p> <p><a href="http://0pointer.de/photos/?gallery=India%20Bangalore%202009-12&amp;photo=139"><img src="http://0pointer.de/photos/galleries/India%20Bangalore%202009-12/lq/img-139.jpg" width="640" height="427" alt="Bangalore Market" /></a></p> <p><a href="http://foss.in/news/foss-in2010-call-for-participation.html">Read the CFP!</a> Or, <a href="http://foss.in/register/speaker">go straight to submitting a paper.</a></p> <p><b><small>Footnotes</small></b></p> <p><sup>[1]</sup> About systemd.</p> Lennart PoetteringThu, 07 Oct 2010 01:45:00 +0200tag:0pointer.net,2010-10-07:/blog/projects/foss-in-2010.htmlprojectssystemd for Administrators, Part IIIhttps://0pointer.net/blog/projects/systemd-for-admins-3.html <p>Here's the third installment of my <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">ongoing series about</a> <a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html">systemd for administrators</a>.</p> <h4>How Do I Convert A SysV Init Script Into A systemd Service File?</h4> <p>Traditionally, Unix and Linux services (<i>daemons</i>) are started via SysV init scripts. These are Bourne Shell scripts, usually residing in a directory such as <tt>/etc/rc.d/init.d/</tt>, which, when called with one of a few standardized arguments (verbs) such as <tt>start</tt>, <tt>stop</tt> or <tt>restart</tt>, control (i.e. start, stop or restart) the service in question. For starts this usually involves invoking the daemon binary, which then forks a background process (more precisely <i>daemonizes</i>). Shell scripts tend to be slow, needlessly hard to read, very verbose and fragile. 
Although they are immensely flexible (after all, they are just code) some things are very hard to do properly with shell scripts, such as ordering parallelized execution, correctly supervising processes or just configuring execution contexts in all detail. systemd provides compatibility with these shell scripts, but due to the shortcomings pointed out it is recommended to install native systemd service files for all daemons installed. Also, in contrast to SysV init scripts which have to be adjusted to the distribution, systemd service files are compatible with any kind of distribution running systemd (of which there are more and more these days...). What follows is a terse guide how to take a SysV init script and translate it into a native systemd service file. Ideally, upstream projects should ship and install systemd service files in their tarballs. If you have successfully converted a SysV script according to the guidelines it might hence be a good idea to submit the file as a patch to upstream. How to prepare a patch like that will be discussed in a later installment; suffice it to say at this point that the <a href="http://0pointer.de/public/systemd-man/daemon.html">daemon(7)</a> manual page shipping with systemd contains a lot of useful information regarding this.</p> <p>So, let's jump right in. As an example we'll convert the init script of the ABRT daemon into a systemd service file. ABRT is a standard component of every Fedora install, and is an acronym for Automatic Bug Reporting Tool, which pretty much describes what it does, i.e. it is a service for collecting crash dumps. <a href="http://0pointer.de/public/abrtd">Its SysV script I have uploaded here.</a></p> <p>The first step when converting such a script is to read it (surprise surprise!) and distill the useful information from the usually pretty long script. 
In almost all cases the script consists of mostly boilerplate code that is identical or at least very similar in all init scripts, and usually copied and pasted from one to the other. So, let's extract the interesting information from the script linked above:</p> <ul> <li>A description string for the service is "<i>Daemon to detect crashing apps</i>". As it turns out, the header comments include a redundant number of description strings, some of them describing not so much the actual service as the init script to start it. systemd services include a description too, and it should describe the service and not the service file.</li> <li>The LSB header<sup>[1]</sup> contains dependency information. systemd due to its design around socket-based activation usually needs no (or very little) manually configured dependencies. (For details regarding socket activation <a href="http://0pointer.de/blog/projects/systemd.html">see the original announcement blog post.</a>) In this case the dependency on <tt>$syslog</tt> (which encodes that abrtd requires a syslog daemon), is the only valuable information. While the header lists another dependency (<tt>$local_fs</tt>) this one is redundant with systemd as normal system services are always started with all local file systems available.</li> <li>The LSB header suggests that this service should be started in runlevels 3 (multi-user) and 5 (graphical).</li> <li>The daemon binary is <tt>/usr/sbin/abrtd</tt></li> </ul> <p>And that's already it. The entire remaining content of this 115-line shell script is simply boilerplate or otherwise redundant code: code that deals with synchronizing and serializing startup (i.e. the code regarding lock files) or that outputs status messages (i.e. the code calling echo), or simply parsing of the verbs (i.e. 
the big case block).</p> <p>From the information extracted above we can now write our systemd service file:</p> <pre>[Unit]
Description=Daemon to detect crashing apps
After=syslog.target

[Service]
ExecStart=/usr/sbin/abrtd
Type=forking

[Install]
WantedBy=multi-user.target</pre> <p>A little explanation of the contents of this file: The <tt>[Unit]</tt> section contains generic information about the service. systemd not only manages system services, but also devices, mount points, timers, and other components of the system. The generic term for all these objects in systemd is a <i>unit</i>, and the <tt>[Unit]</tt> section encodes information about it that might be applicable not only to services but also to the other unit types systemd maintains. In this case we set the following unit settings: we set the description string and configure that the daemon shall be started after Syslog<sup>[2]</sup>, similar to what is encoded in the LSB header of the original init script. For this Syslog dependency we create a dependency of type <tt>After=</tt> on a systemd unit <tt>syslog.target</tt>. The latter is a special target unit in systemd and is the standardized name to pull in a syslog implementation. For more information about these standardized names see the <a href="http://0pointer.de/public/systemd-man/systemd.special.html">systemd.special(7)</a>. Note that a dependency of type <tt>After=</tt> only encodes the suggested ordering, but does not actually cause syslog to be started when abrtd is -- and this is exactly what we want, since abrtd actually works fine even without syslog being around. However, if both are started (and usually they are) then the order in which they are is controlled with this dependency.</p> <p>The next section is <tt>[Service]</tt> which encodes information about the service itself. It contains all those settings that apply only to services, and not the other kinds of units systemd maintains (mount points, devices, timers, ...). 
Two settings are used here: <tt>ExecStart=</tt> takes the path to the binary to execute when the service shall be started up. And with <tt>Type=</tt> we configure how the service notifies the init system that it finished starting up. Since traditional Unix daemons do this by returning to the parent process after having forked off and initialized the background daemon we set the type to <tt>forking</tt> here. That tells systemd to wait until the start-up binary returns and then consider the processes still running afterwards the daemon processes.</p> <p>The final section is <tt>[Install]</tt>. It encodes information about what the suggested installation should look like, i.e. under which circumstances and by which triggers the service shall be started. In this case we simply say that this service shall be started when the <tt>multi-user.target</tt> unit is activated. This is a special unit (see above) that basically takes the role of the classic SysV Runlevel 3<sup>[3]</sup>. The setting <tt>WantedBy=</tt> has little effect on the daemon during runtime. It is only read by the <tt>systemctl enable</tt> command, which is the recommended way to enable a service in systemd. This command will simply ensure that our little service gets automatically activated as soon as <tt>multi-user.target</tt> is requested, which it is on all normal boots<sup>[4]</sup>.</p> <p>And that's it. Now we already have a minimal working systemd service file. To test it we copy it to <tt>/etc/systemd/system/abrtd.service</tt> and invoke <tt>systemctl daemon-reload</tt>. This will make systemd take notice of it, and now we can start the service with it: <tt>systemctl start abrtd.service</tt>. We can verify the status via <tt>systemctl status abrtd.service</tt>. And we can stop it again via <tt>systemctl stop abrtd.service</tt>. 
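</p> <p>The test cycle just described can be sketched as a small shell script. This is only an illustration under stated assumptions: the <tt>UNIT_DIR</tt> variable is a hypothetical stand-in for <tt>/etc/systemd/system</tt> so that the file can be staged without root privileges, and the <tt>systemctl</tt> invocations named in the text are printed rather than executed.</p>

```shell
#!/bin/sh
# Sketch only: stage the unit file from the text in a scratch directory
# and list the test commands. UNIT_DIR is a hypothetical variable; the
# article itself installs to /etc/systemd/system.
set -eu

UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"

# Write the minimal service file exactly as given in the article.
cat > "$UNIT_DIR/abrtd.service" <<'EOF'
[Unit]
Description=Daemon to detect crashing apps
After=syslog.target

[Service]
ExecStart=/usr/sbin/abrtd
Type=forking

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $UNIT_DIR/abrtd.service"

# The commands from the text, in order. They are printed, not run,
# because they need root and a running systemd.
for verb in daemon-reload "start abrtd.service" \
            "status abrtd.service" "stop abrtd.service"; do
    echo "systemctl $verb"
done
```

<p>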
Finally, we can enable it, so that it is activated by default on future boots with <tt>systemctl enable abrtd.service</tt>.</p> <p>The service file above, while sufficient and basically a 1:1 translation (feature- and otherwise) of the SysV init script still has room for improvement. Here is a slightly updated version:</p> <pre>[Unit]
Description=ABRT Automated Bug Reporting Tool
After=syslog.target

[Service]
Type=dbus
BusName=com.redhat.abrt
ExecStart=/usr/sbin/abrtd -d -s

[Install]
WantedBy=multi-user.target</pre> <p>So, what did we change? Two things: we improved the description string a bit. More importantly however, we changed the type of the service to <tt>dbus</tt> and configured the D-Bus bus name of the service. Why did we do this? As mentioned, classic SysV services <i>daemonize</i> after startup, which usually involves double forking and detaching from any terminal. While this is useful and necessary when daemons are invoked via a script, this is unnecessary (and slow) as well as counterproductive when a proper process babysitter such as systemd is used. The reason for that is that the forked off daemon process usually has little relation to the original process started by systemd (after all the daemonizing scheme's whole idea is to remove this relation), and hence it is difficult for systemd to figure out after the fork is finished which process belonging to the service is actually the main process and which processes might just be auxiliary. But that information is crucial to implement advanced babysitting, i.e. supervising the process, automatic respawning on abnormal termination, collecting crash and exit code information and suchlike. In order to make it easier for systemd to figure out the main process of the daemon we changed the service type to <tt>dbus</tt>. The semantics of this service type are appropriate for all services that take a name on the D-Bus system bus as last step of their initialization<sup>[5]</sup>. ABRT is one of those. 
With this setting systemd will spawn the ABRT process, which will no longer fork (this is configured via the <tt>-d -s</tt> switches to the daemon), and systemd will consider the service fully started up as soon as <tt>com.redhat.abrt</tt> appears on the bus. This way the process spawned by systemd is the main process of the daemon, systemd has a reliable way to figure out when the daemon is fully started up and systemd can easily supervise it.</p> <p>And that's all there is to it. We have a simple systemd service file now that encodes in 10 lines more information than the original SysV init script encoded in 115. And even now there's a lot of room left for further improvement utilizing more features systemd offers. For example, we could set <tt>Restart=restart-always</tt> to tell systemd to automatically restart this service when it dies. Or, we could use <tt>OOMScoreAdjust=-500</tt> to ask the kernel to please leave this process around when the OOM killer wreaks havoc. Or, we could use <tt>CPUSchedulingPolicy=idle</tt> to ensure that abrtd processes crash dumps in background only, always allowing the kernel to give preference to whatever else might be running and needing CPU time.</p> <p>For more information about the configuration options mentioned here, see the respective man pages <a href="http://0pointer.de/public/systemd-man/systemd.unit.html">systemd.unit(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.service.html">systemd.service(5)</a>, <a href="http://0pointer.de/public/systemd-man/systemd.exec.html">systemd.exec(5)</a>. Or, browse <a href="http://0pointer.de/public/systemd-man/">all of systemd's man pages</a>.</p> <p>Of course, not all SysV scripts are as easy to convert as this one. 
But happily, as it turns out, the vast majority actually are.</p> <p>That's it for today, come back soon for the next installment in our series.</p> <p><b>Footnotes</b></p> <p><small>[1] The LSB header of init scripts is a convention of including meta data about the service in comment blocks at the top of SysV init scripts and <a href="http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/initscrcomconv.html">is defined by the Linux Standard Base</a>. This was intended to standardize init scripts between distributions. While most distributions have adopted this scheme, the handling of the headers varies greatly between the distributions, and in fact still makes it necessary to adjust init scripts for every distribution. As such the LSB spec never kept the promise it made.</small></p> <p><small>[2] Strictly speaking, this dependency does not even have to be encoded here, as it is redundant in a system where the Syslog daemon is socket activatable. Modern syslog systems (for example rsyslog v5) have been patched upstream to be socket-activatable. If such an init system is used, configuration of the <tt>After=syslog.target</tt> dependency is redundant and implicit. However, to maintain compatibility with syslog services that have not been updated we include this dependency here.</small></p> <p><small>[3] At least how it used to be defined on Fedora.</small></p> <p><small>[4] Note that in systemd the graphical bootup (<tt>graphical.target</tt>, taking the role of SysV runlevel 5) is an implicit superset of the console-only bootup (<tt>multi-user.target</tt>, i.e. like runlevel 3). 
That means hooking a service into the latter will also hook it into the former.</small></p> <p><small>[5] Actually the majority of services of the default Fedora install now take a name on the bus after startup.</small></p> Lennart PoetteringFri, 01 Oct 2010 04:42:00 +0200tag:0pointer.net,2010-10-01:/blog/projects/systemd-for-admins-3.htmlprojectsXiph Videohttps://0pointer.net/blog/projects/video.html <p><a href="http://www.xiph.org/video/">Don't miss Monty's awesome video.</a></p> Lennart PoetteringFri, 24 Sep 2010 10:58:00 +0200tag:0pointer.net,2010-09-24:/blog/projects/video.htmlprojectssystemd for Administrators, Part IIhttps://0pointer.net/blog/projects/systemd-for-admins-2.html <p>Here's the second installment of my <a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html">ongoing series about systemd for administrators</a>.</p> <h4>Which Service Owns Which Processes?</h4> <p>On most Linux systems the number of processes that are running by default is substantial. Knowing which process does what and where it belongs becomes increasingly difficult. Some services even maintain a couple of worker processes which clutter the "<tt>ps</tt>" output with many additional processes that are often not easy to recognize. This is further complicated if daemons spawn arbitrary 3rd-party processes, as Apache does with CGI processes, or cron does with user jobs.</p> <p>A slight remedy for this is often the process inheritance tree, as shown by "<tt>ps xaf</tt>". However this is usually not reliable, as processes whose parents die get reparented to PID 1, and hence all information about inheritance gets lost. If a process "double forks" it hence loses its relationships to the processes that started it. (This actually is supposed to be a feature and is relied on for the traditional Unix daemonizing logic.) Furthermore processes can freely change their names with <tt>PR_SET_NAME</tt> or by patching <tt>argv[0]</tt>, thus making it harder to recognize them. 
In fact they can play hide-and-seek with the administrator pretty nicely this way.</p> <p>In systemd we place every process that is spawned in a <i>control group</i> named after its service. Control groups (or <i>cgroups</i>) at their most basic are simply groups of processes that can be arranged in a hierarchy and labelled individually. When processes spawn other processes these children are automatically made members of the parent's cgroup. Leaving a cgroup is not possible for unprivileged processes. Thus, cgroups can be used as an effective way to label processes after the service they belong to and be sure that the service cannot escape from the label, regardless of how often it forks or renames itself. Furthermore this can be used to safely kill a service and all processes it created, again with no chance of escaping.</p> <p>In today's installment I want to introduce you to two commands you may use to relate systemd services and processes. The first one is the well-known <tt>ps</tt> command, which has been updated to show cgroup information alongside the other process details. And this is how it looks:</p> <pre>$ ps xawf -eo pid,user,cgroup,args PID USER CGROUP COMMAND 2 root - [kthreadd] 3 root - \_ [ksoftirqd/0] [...] 
4281 root - \_ [flush-8:0] 1 root name=systemd:/systemd-1 /sbin/init 455 root name=systemd:/systemd-1/sysinit.service /sbin/udevd -d 28188 root name=systemd:/systemd-1/sysinit.service \_ /sbin/udevd -d 28191 root name=systemd:/systemd-1/sysinit.service \_ /sbin/udevd -d 1096 dbus name=systemd:/systemd-1/dbus.service /bin/dbus-daemon --system --address=systemd: --nofork --systemd-activation 1131 root name=systemd:/systemd-1/auditd.service auditd 1133 root name=systemd:/systemd-1/auditd.service \_ /sbin/audispd 1135 root name=systemd:/systemd-1/auditd.service \_ /usr/sbin/sedispatch 1171 root name=systemd:/systemd-1/NetworkManager.service /usr/sbin/NetworkManager --no-daemon 4028 root name=systemd:/systemd-1/NetworkManager.service \_ /sbin/dhclient -d -4 -sf /usr/libexec/nm-dhcp-client.action -pf /var/run/dhclient-wlan0.pid -lf /var/lib/dhclient/dhclient-7d32a784-ede9-4cf6-9ee3-60edc0bce5ff-wlan0.lease - 1175 avahi name=systemd:/systemd-1/avahi-daemon.service avahi-daemon: running [epsilon.local] 1194 avahi name=systemd:/systemd-1/avahi-daemon.service \_ avahi-daemon: chroot helper 1193 root name=systemd:/systemd-1/rsyslog.service /sbin/rsyslogd -c 4 1195 root name=systemd:/systemd-1/cups.service cupsd -C /etc/cups/cupsd.conf 1207 root name=systemd:/systemd-1/mdmonitor.service mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid 1210 root name=systemd:/systemd-1/irqbalance.service irqbalance 1216 root name=systemd:/systemd-1/dbus.service /usr/sbin/modem-manager 1219 root name=systemd:/systemd-1/dbus.service /usr/libexec/polkit-1/polkitd 1242 root name=systemd:/systemd-1/dbus.service /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -B -u -f /var/log/wpa_supplicant.log -P /var/run/wpa_supplicant.pid 1249 68 name=systemd:/systemd-1/haldaemon.service hald 1250 root name=systemd:/systemd-1/haldaemon.service \_ hald-runner 1273 root name=systemd:/systemd-1/haldaemon.service \_ hald-addon-input: Listening on /dev/input/event3 /dev/input/event9 
/dev/input/event1 /dev/input/event7 /dev/input/event2 /dev/input/event0 /dev/input/event8 1275 root name=systemd:/systemd-1/haldaemon.service \_ /usr/libexec/hald-addon-rfkill-killswitch 1284 root name=systemd:/systemd-1/haldaemon.service \_ /usr/libexec/hald-addon-leds 1285 root name=systemd:/systemd-1/haldaemon.service \_ /usr/libexec/hald-addon-generic-backlight 1287 68 name=systemd:/systemd-1/haldaemon.service \_ /usr/libexec/hald-addon-acpi 1317 root name=systemd:/systemd-1/abrtd.service /usr/sbin/abrtd -d -s 1332 root name=systemd:/systemd-1/getty@.service/tty2 /sbin/mingetty tty2 1339 root name=systemd:/systemd-1/getty@.service/tty3 /sbin/mingetty tty3 1342 root name=systemd:/systemd-1/getty@.service/tty5 /sbin/mingetty tty5 1343 root name=systemd:/systemd-1/getty@.service/tty4 /sbin/mingetty tty4 1344 root name=systemd:/systemd-1/crond.service crond 1346 root name=systemd:/systemd-1/getty@.service/tty6 /sbin/mingetty tty6 1362 root name=systemd:/systemd-1/sshd.service /usr/sbin/sshd 1376 root name=systemd:/systemd-1/prefdm.service /usr/sbin/gdm-binary -nodaemon 1391 root name=systemd:/systemd-1/prefdm.service \_ /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1 --force-active-vt 1394 root name=systemd:/systemd-1/prefdm.service \_ /usr/bin/Xorg :0 -nr -verbose -auth /var/run/gdm/auth-for-gdm-f2KUOh/database -nolisten tcp vt1 1495 root name=systemd:/user/lennart/1 \_ pam: gdm-password 1521 lennart name=systemd:/user/lennart/1 \_ gnome-session 1621 lennart name=systemd:/user/lennart/1 \_ metacity 1635 lennart name=systemd:/user/lennart/1 \_ gnome-panel 1638 lennart name=systemd:/user/lennart/1 \_ nautilus 1640 lennart name=systemd:/user/lennart/1 \_ /usr/libexec/polkit-gnome-authentication-agent-1 1641 lennart name=systemd:/user/lennart/1 \_ /usr/bin/seapplet 1644 lennart name=systemd:/user/lennart/1 \_ gnome-volume-control-applet 1646 lennart name=systemd:/user/lennart/1 \_ /usr/sbin/restorecond -u 1652 lennart 
name=systemd:/user/lennart/1 \_ /usr/bin/devilspie 1662 lennart name=systemd:/user/lennart/1 \_ nm-applet --sm-disable 1664 lennart name=systemd:/user/lennart/1 \_ gnome-power-manager 1665 lennart name=systemd:/user/lennart/1 \_ /usr/libexec/gdu-notification-daemon 1670 lennart name=systemd:/user/lennart/1 \_ /usr/libexec/evolution/2.32/evolution-alarm-notify 1672 lennart name=systemd:/user/lennart/1 \_ /usr/bin/python /usr/share/system-config-printer/applet.py 1674 lennart name=systemd:/user/lennart/1 \_ /usr/lib64/deja-dup/deja-dup-monitor 1675 lennart name=systemd:/user/lennart/1 \_ abrt-applet 1677 lennart name=systemd:/user/lennart/1 \_ bluetooth-applet 1678 lennart name=systemd:/user/lennart/1 \_ gpk-update-icon 1408 root name=systemd:/systemd-1/console-kit-daemon.service /usr/sbin/console-kit-daemon --no-daemon 1419 gdm name=systemd:/systemd-1/prefdm.service /usr/bin/dbus-launch --exit-with-session 1453 root name=systemd:/systemd-1/dbus.service /usr/libexec/upowerd 1473 rtkit name=systemd:/systemd-1/rtkit-daemon.service /usr/libexec/rtkit-daemon 1496 root name=systemd:/systemd-1/accounts-daemon.service /usr/libexec/accounts-daemon 1499 root name=systemd:/systemd-1/systemd-logger.service /lib/systemd/systemd-logger 1511 lennart name=systemd:/systemd-1/prefdm.service /usr/bin/gnome-keyring-daemon --daemonize --login 1534 lennart name=systemd:/user/lennart/1 dbus-launch --sh-syntax --exit-with-session 1535 lennart name=systemd:/user/lennart/1 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session 1603 lennart name=systemd:/user/lennart/1 /usr/libexec/gconfd-2 1612 lennart name=systemd:/user/lennart/1 /usr/libexec/gnome-settings-daemon 1615 lennart name=systemd:/user/lennart/1 /usr/libexec/gvfsd 1626 lennart name=systemd:/user/lennart/1 /usr/libexec//gvfs-fuse-daemon /home/lennart/.gvfs 1634 lennart name=systemd:/user/lennart/1 /usr/bin/pulseaudio --start --log-target=syslog 1649 lennart name=systemd:/user/lennart/1 \_ /usr/libexec/pulse/gconf-helper 
1645 lennart name=systemd:/user/lennart/1 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=24 1668 lennart name=systemd:/user/lennart/1 /usr/libexec/im-settings-daemon 1701 lennart name=systemd:/user/lennart/1 /usr/libexec/gvfs-gdu-volume-monitor 1707 lennart name=systemd:/user/lennart/1 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22 1725 lennart name=systemd:/user/lennart/1 /usr/libexec/clock-applet 1727 lennart name=systemd:/user/lennart/1 /usr/libexec/wnck-applet 1729 lennart name=systemd:/user/lennart/1 /usr/libexec/notification-area-applet 1733 root name=systemd:/systemd-1/dbus.service /usr/libexec/udisks-daemon 1747 root name=systemd:/systemd-1/dbus.service \_ udisks-daemon: polling /dev/sr0 1759 lennart name=systemd:/user/lennart/1 gnome-screensaver 1780 lennart name=systemd:/user/lennart/1 /usr/libexec/gvfsd-trash --spawner :1.9 /org/gtk/gvfs/exec_spaw/0 1864 lennart name=systemd:/user/lennart/1 /usr/libexec/gvfs-afc-volume-monitor 1874 lennart name=systemd:/user/lennart/1 /usr/libexec/gconf-im-settings-daemon 1903 lennart name=systemd:/user/lennart/1 /usr/libexec/gvfsd-burn --spawner :1.9 /org/gtk/gvfs/exec_spaw/1 1909 lennart name=systemd:/user/lennart/1 gnome-terminal 1913 lennart name=systemd:/user/lennart/1 \_ gnome-pty-helper 1914 lennart name=systemd:/user/lennart/1 \_ bash 29231 lennart name=systemd:/user/lennart/1 | \_ ssh tango 2221 lennart name=systemd:/user/lennart/1 \_ bash 4193 lennart name=systemd:/user/lennart/1 | \_ ssh tango 2461 lennart name=systemd:/user/lennart/1 \_ bash 29219 lennart name=systemd:/user/lennart/1 | \_ emacs systemd-for-admins-1.txt 15113 lennart name=systemd:/user/lennart/1 \_ bash 27251 lennart name=systemd:/user/lennart/1 \_ empathy 29504 lennart name=systemd:/user/lennart/1 \_ ps xawf -eo pid,user,cgroup,args 1968 lennart name=systemd:/user/lennart/1 ssh-agent 1994 lennart name=systemd:/user/lennart/1 gpg-agent --daemon --write-env-file 18679 lennart 
name=systemd:/user/lennart/1 /bin/sh /usr/lib64/firefox-3.6/run-mozilla.sh /usr/lib64/firefox-3.6/firefox 18741 lennart name=systemd:/user/lennart/1 \_ /usr/lib64/firefox-3.6/firefox 28900 lennart name=systemd:/user/lennart/1 \_ /usr/lib64/nspluginwrapper/npviewer.bin --plugin /usr/lib64/mozilla/plugins/libflashplayer.so --connection /org/wrapper/NSPlugins/libflashplayer.so/18741-6 4016 root name=systemd:/systemd-1/sysinit.service /usr/sbin/bluetoothd --udev 4094 smmsp name=systemd:/systemd-1/sendmail.service sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue 4096 root name=systemd:/systemd-1/sendmail.service sendmail: accepting connections 4112 ntp name=systemd:/systemd-1/ntpd.service /usr/sbin/ntpd -n -u ntp:ntp -g 27262 lennart name=systemd:/user/lennart/1 /usr/libexec/mission-control-5 27265 lennart name=systemd:/user/lennart/1 /usr/libexec/telepathy-haze 27268 lennart name=systemd:/user/lennart/1 /usr/libexec/telepathy-logger 27270 lennart name=systemd:/user/lennart/1 /usr/libexec/dconf-service 27280 lennart name=systemd:/user/lennart/1 /usr/libexec/notification-daemon 27284 lennart name=systemd:/user/lennart/1 /usr/libexec/telepathy-gabble 27285 lennart name=systemd:/user/lennart/1 /usr/libexec/telepathy-salut 27297 lennart name=systemd:/user/lennart/1 /usr/libexec/geoclue-yahoo</pre> <p>(Note that this output is shortened, I have removed most of the kernel threads here, since they are not relevant in the context of this blog story)</p> <p>In the third column you see the cgroup systemd assigned to each process. 
You'll find that the <tt>udev</tt> processes are in the <tt>name=systemd:/systemd-1/sysinit.service</tt> cgroup, which is where systemd places all processes started by the <tt>sysinit.service</tt> service, which covers early boot.</p> <p>My personal recommendation is to set the shell alias <tt>psc</tt> to the ps command line shown above:</p> <pre>alias psc='ps xawf -eo pid,user,cgroup,args'</pre> <p>With this, service information for processes is just four keypresses away!</p> <p>A different way to present the same information is the <tt>systemd-cgls</tt> tool we ship with systemd. It shows the cgroup hierarchy in a pretty tree. Its output looks like this:</p> <pre>$ systemd-cgls + 2 [kthreadd] [...] + 4281 [flush-8:0] + user | \ lennart | \ 1 | + 1495 pam: gdm-password | + 1521 gnome-session | + 1534 dbus-launch --sh-syntax --exit-with-session | + 1535 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session | + 1603 /usr/libexec/gconfd-2 | + 1612 /usr/libexec/gnome-settings-daemon | + 1615 /usr/libexec/gvfsd | + 1621 metacity | + 1626 /usr/libexec//gvfs-fuse-daemon /home/lennart/.gvfs | + 1634 /usr/bin/pulseaudio --start --log-target=syslog | + 1635 gnome-panel | + 1638 nautilus | + 1640 /usr/libexec/polkit-gnome-authentication-agent-1 | + 1641 /usr/bin/seapplet | + 1644 gnome-volume-control-applet | + 1645 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=24 | + 1646 /usr/sbin/restorecond -u | + 1649 /usr/libexec/pulse/gconf-helper | + 1652 /usr/bin/devilspie | + 1662 nm-applet --sm-disable | + 1664 gnome-power-manager | + 1665 /usr/libexec/gdu-notification-daemon | + 1668 /usr/libexec/im-settings-daemon | + 1670 /usr/libexec/evolution/2.32/evolution-alarm-notify | + 1672 /usr/bin/python /usr/share/system-config-printer/applet.py | + 1674 /usr/lib64/deja-dup/deja-dup-monitor | + 1675 abrt-applet | + 1677 bluetooth-applet | + 1678 gpk-update-icon | + 1701 /usr/libexec/gvfs-gdu-volume-monitor | + 1707 /usr/bin/gnote --panel-applet 
--oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22 | + 1725 /usr/libexec/clock-applet | + 1727 /usr/libexec/wnck-applet | + 1729 /usr/libexec/notification-area-applet | + 1759 gnome-screensaver | + 1780 /usr/libexec/gvfsd-trash --spawner :1.9 /org/gtk/gvfs/exec_spaw/0 | + 1864 /usr/libexec/gvfs-afc-volume-monitor | + 1874 /usr/libexec/gconf-im-settings-daemon | + 1882 /usr/libexec/gvfs-gphoto2-volume-monitor | + 1903 /usr/libexec/gvfsd-burn --spawner :1.9 /org/gtk/gvfs/exec_spaw/1 | + 1909 gnome-terminal | + 1913 gnome-pty-helper | + 1914 bash | + 1968 ssh-agent | + 1994 gpg-agent --daemon --write-env-file | + 2221 bash | + 2461 bash | + 4193 ssh tango | + 15113 bash | + 18679 /bin/sh /usr/lib64/firefox-3.6/run-mozilla.sh /usr/lib64/firefox-3.6/firefox | + 18741 /usr/lib64/firefox-3.6/firefox | + 27251 empathy | + 27262 /usr/libexec/mission-control-5 | + 27265 /usr/libexec/telepathy-haze | + 27268 /usr/libexec/telepathy-logger | + 27270 /usr/libexec/dconf-service | + 27280 /usr/libexec/notification-daemon | + 27284 /usr/libexec/telepathy-gabble | + 27285 /usr/libexec/telepathy-salut | + 27297 /usr/libexec/geoclue-yahoo | + 28900 /usr/lib64/nspluginwrapper/npviewer.bin --plugin /usr/lib64/mozilla/plugins/libflashplayer.so --connection /org/wrapper/NSPlugins/libflashplayer.so/18741-6 | + 29219 emacs systemd-for-admins-1.txt | + 29231 ssh tango | \ 29519 systemd-cgls \ systemd-1 + 1 /sbin/init + ntpd.service | \ 4112 /usr/sbin/ntpd -n -u ntp:ntp -g + systemd-logger.service | \ 1499 /lib/systemd/systemd-logger + accounts-daemon.service | \ 1496 /usr/libexec/accounts-daemon + rtkit-daemon.service | \ 1473 /usr/libexec/rtkit-daemon + console-kit-daemon.service | \ 1408 /usr/sbin/console-kit-daemon --no-daemon + prefdm.service | + 1376 /usr/sbin/gdm-binary -nodaemon | + 1391 /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1 --force-active-vt | + 1394 /usr/bin/Xorg :0 -nr -verbose -auth /var/run/gdm/auth-for-gdm-f2KUOh/database 
-nolisten tcp vt1 | + 1419 /usr/bin/dbus-launch --exit-with-session | \ 1511 /usr/bin/gnome-keyring-daemon --daemonize --login + getty@.service | + tty6 | | \ 1346 /sbin/mingetty tty6 | + tty4 | | \ 1343 /sbin/mingetty tty4 | + tty5 | | \ 1342 /sbin/mingetty tty5 | + tty3 | | \ 1339 /sbin/mingetty tty3 | \ tty2 | \ 1332 /sbin/mingetty tty2 + abrtd.service | \ 1317 /usr/sbin/abrtd -d -s + crond.service | \ 1344 crond + sshd.service | \ 1362 /usr/sbin/sshd + sendmail.service | + 4094 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue | \ 4096 sendmail: accepting connections + haldaemon.service | + 1249 hald | + 1250 hald-runner | + 1273 hald-addon-input: Listening on /dev/input/event3 /dev/input/event9 /dev/input/event1 /dev/input/event7 /dev/input/event2 /dev/input/event0 /dev/input/event8 | + 1275 /usr/libexec/hald-addon-rfkill-killswitch | + 1284 /usr/libexec/hald-addon-leds | + 1285 /usr/libexec/hald-addon-generic-backlight | \ 1287 /usr/libexec/hald-addon-acpi + irqbalance.service | \ 1210 irqbalance + avahi-daemon.service | + 1175 avahi-daemon: running [epsilon.local] + NetworkManager.service | + 1171 /usr/sbin/NetworkManager --no-daemon | \ 4028 /sbin/dhclient -d -4 -sf /usr/libexec/nm-dhcp-client.action -pf /var/run/dhclient-wlan0.pid -lf /var/lib/dhclient/dhclient-7d32a784-ede9-4cf6-9ee3-60edc0bce5ff-wlan0.lease -cf /var/run/nm-dhclient-wlan0.conf wlan0 + rsyslog.service | \ 1193 /sbin/rsyslogd -c 4 + mdmonitor.service | \ 1207 mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid + cups.service | \ 1195 cupsd -C /etc/cups/cupsd.conf + auditd.service | + 1131 auditd | + 1133 /sbin/audispd | \ 1135 /usr/sbin/sedispatch + dbus.service | + 1096 /bin/dbus-daemon --system --address=systemd: --nofork --systemd-activation | + 1216 /usr/sbin/modem-manager | + 1219 /usr/libexec/polkit-1/polkitd | + 1242 /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -B -u -f /var/log/wpa_supplicant.log -P /var/run/wpa_supplicant.pid | + 1453 
/usr/libexec/upowerd | + 1733 /usr/libexec/udisks-daemon | + 1747 udisks-daemon: polling /dev/sr0 | \ 29509 /usr/libexec/packagekitd + dev-mqueue.mount + dev-hugepages.mount \ sysinit.service + 455 /sbin/udevd -d + 4016 /usr/sbin/bluetoothd --udev + 28188 /sbin/udevd -d \ 28191 /sbin/udevd -d</pre> <p>(This too is shortened, the same way)</p> <p>As you can see, this command shows the processes by their cgroup and hence service, as systemd labels the cgroups after the services. For example, you can easily see that the auditing service <tt>auditd.service</tt> spawns three individual processes, <tt>auditd</tt>, <tt>audispd</tt> and <tt>sedispatch</tt>.</p> <p>If you look closely you will notice that a number of processes have been assigned to the cgroup <tt>/user/lennart/1</tt>. At this point let's simply leave it at this: systemd not only maintains services in cgroups, but user session processes as well. In a later installment we'll discuss in more detail what this is about.</p> <p>So much for now, come back soon for the next installment!</p> Lennart PoetteringWed, 08 Sep 2010 00:52:00 +0200tag:0pointer.net,2010-09-08:/blog/projects/systemd-for-admins-2.htmlprojectsVideo Interview With Yours Trulyhttps://0pointer.net/blog/projects/google-video.html <p>Google just published a <a href="http://google-opensource.blogspot.com/2010/09/interviews-from-guadec-part-3.html">video interview with yours truly</a>. Watch it! Oh, and Vincent, I even put on a red shirt for you!</p> Lennart PoetteringSat, 04 Sep 2010 00:42:00 +0200tag:0pointer.net,2010-09-04:/blog/projects/google-video.htmlprojectssystemd Status Updatehttps://0pointer.net/blog/projects/systemd-update.html <p>It has been a while <a href="http://0pointer.de/blog/projects/systemd.html">since my original announcement of systemd</a>. Here's a little status update on what happened since then. 
For simplicity's sake I'll just list here what we worked on in a bulleted list, in no particular order and without trying to cover this comprehensively:</p> <ul> <li>systemd has been accepted as Feature for Fedora 14, and as it looks right now everything worked out nicely and we'll ship F14 with systemd as init system.</li> <li>We added a number of additional unit types: <tt>.timer</tt> for cron-style timer-based activation of services, <tt>.swap</tt> exposes swap files and partitions the same way we handle mount points, and <tt>.path</tt> can be used to activate units depending on the existence/creation of files or fill status of spool directories.</li> <li>We hooked systemd up to SELinux: systemd is now capable of properly labelling directories, sockets and FIFOs it creates according to the SELinux policy for the services we maintain.</li> <li>We hooked systemd up to the Linux auditing subsystem: as the first init system to do so, systemd now generates auditing records for all services it starts/stops, including their failure status.</li> <li>We hooked systemd up to TCP wrappers, for all socket connections it accepts.</li> <li>We hooked systemd up to PAM, so that optionally, when systemd runs a service as a different user it initializes the usual PAM session setup and teardown hooks.</li> <li>We hooked systemd up to D-Bus, so that D-Bus passes activation requests to systemd and systemd becomes the central point for all kinds of activation, thus greatly extending the control of the execution environment of bus activated services, and making them accessible through the same utilities as SysV services. 
Also, this enables us to do race-free parallelized start-up for D-Bus services and their clients, thus speeding up things even further.</li> <li>systemd is now able to handle various Debian and OpenSUSE-specific extensions to the classic SysV init script formats natively, on top of the Fedora extensions we already parse.</li> <li>The D-Bus coverage of the systemd interface is now complete, allowing both introspection of runtime data and of parsed configuration data. It's fun now to introspect systemd with <a href="http://davidz25.blogspot.com/2010/08/gdbus1-bash-completion.html"><tt>gdbus</tt></a> or <tt>d-feet</tt>.</li> <li>We added a <a href="http://0pointer.de/public/systemd-man/pam_systemd.html">systemd PAM module</a>, which assigns the processes of each user session to its own cgroup in the systemd cgroup tree. This also enables reliable killing of all processes associated with a session when the user logs out. This also manages a secure per-user <tt>/var/run</tt>-style directory which is supposed to be used for sockets and similar files that shall be cleaned up when the user logs out.</li> <li>There's a new tool <a href="http://0pointer.de/public/systemd-man/systemd-cgls.html"><tt>systemd-cgls</tt></a>, which plots a pretty process tree based on the systemd cgroup hierarchy. It's really pretty. Try it!</li> <li>We now have our own cgroup hierarchy beneath <tt>/cgroup/systemd</tt> (though it will move to <tt>/sys/fs/</tt> before the F14 release).</li> <li>We have pretty code that automatically spawns a getty on a serial port when the kernel console is redirected to a serial TTY.</li> <li><tt>systemctl</tt> got beefed up substantially (it can even draw dependency graphs now, via <tt>dot</tt>!), and the SysV compatibility tools were extended to more completely and correctly support what was historically provided by SysV. For example, we'll now warn the user when systemd service files have changed but systemd was not asked to reload its configuration. 
Also, you can now use systemd's native client tools to reboot or shut-down an Upstart or sysvinit system, to facilitate upgrades.</li> <li>We provide a <a href="http://cgit.freedesktop.org/systemd/plain/src/sd-daemon.h">reference implementation</a> for the socket activation and other APIs for nicer interaction with systemd.</li> <li>We have a pretty complete <a href="http://0pointer.de/public/systemd-man/">set of documentation</a> now, <a href="http://0pointer.de/public/systemd-man/daemon.html">some of it</a> even extending to areas not directly related to systemd itself.</li> <li>Quite a number of upstream packages now ship with systemd service files out-of-the-box that work across all distributions that have adopted systemd. It is our intention to unify the boot and service management between distributions with systemd, and this is bearing fruit already. Furthermore a number of upstream packages now ship our patches for socket-based activation.</li> <li>Even more options that control the process execution environment or the sockets we create are now supported.</li> <li>Earlier today I began my series of blog stories on <a href="http://0pointer.de/blog/projects/systemd-for-admins-1">systemd for administrators</a>.</li> <li>We reimplemented almost all boot-up and shutdown scripts of the standard Fedora install in much smaller, simpler and faster C utilities, or in systemd itself. Most of this will not be enabled in F14 however, even though it is shipped with systemd upstream. With this enabled the entire Linux system gains a completely new feeling as the number of shells we spawn approaches zero, and the PID of the first user terminal is way &lt; 500 now, and the early boot-up is fully parallelized. 
We looked at the boot scripts of Fedora, OpenSUSE and Debian and distilled from this a list of functionality that makes up the early boot process and reimplemented this in C, if possible following the behaviour of one of the existing implementations from these three distributions. This turned out to be much less effort than anticipated, and we are actually quite excited about this. Look forward to the fruits of this work in F15, when we might be able to present you a shell-less boot at least for standard desktop/laptop systems.</li> <li>We spent some time reinvestigating the current syslog logic, and came up with an elegant and simple scheme to provide <tt>/dev/log</tt> compatible logging right from the time systemd is first initialized right until the time the kernel halts the machine. Through the wonders of socket based activation we first connect the <tt>/dev/log</tt> socket with a minimal bridge to the kernel log buffer (<tt>kmsg</tt>) and then, as soon as the real syslog is started up as part of the later bootup phase, we dynamically replace this minimal bridge by the real syslog daemon -- without losing a single log message. Since one of the first things the real syslog daemon does is flushing the kernel log buffer into log files, all logged messages will sooner or later be stored on disk, regardless of whether they have been generated during early boot, late boot or system runtime. On top of that if the syslog daemon terminates or is shut down during runtime, the bridge becomes active again and log output is written to kmsg again. The same applies when the system goes down. This provides a simple and robust way to ensure that no logs will ever be lost again, and logging is available from the beginning of boot-up to the end of shut-down. 
Plymouth will most likely adopt a similar scheme for initrd logging, thus ensuring that everything ever logged on the system will properly end up in the log files, whether it comes from the kernel, from the initrd, from early-boot, from runtime or shutdown. And if syslogd is not around, <tt>dmesg</tt> will provide you with access to the log messages. While this bridge is part of systemd upstream, we'll most likely enable this bridge in Fedora only starting with F15. Also note that embedded systems that have no interest in shipping a full syslogd solution can simply use this syslog bridge during the entire runtime, thus making the kernel log buffer the centralized log storage, with all the advantages this offers: zero disk IO at runtime, access to serial and netconsole logging, and remote debug access to the kernel log buffer.</li> <li>We now install autofs units for many "API" kernel virtual file systems by default, such as <tt>binfmt_misc</tt> or <tt>hugetlbfs</tt>. That means that the file system access is readily available and client code no longer has to manually load the respective kernel modules, as they are autoloaded on first access of the file system. This has many advantages: it is not only faster to set up during boot, but also simpler for applications, as they can just assume the functionality is available. On top of that permission problems for the initialization go away, since manual module loading requires root privileges.</li> <li>Many smaller fixes and enhancements, all across the board, which if mentioned here would make this blog story another blog novel. Suffice to say, we did a lot of polishing to ready systemd for F14.</li> </ul> <p>All in all, systemd is progressing nicely, and the features we have been working on in the last months are without exception features not existing in any other of the init systems available on Linux and our feature set already was far ahead of what the older init implementations provide. 
And we have quite a bit planned for the future. So, stay tuned!</p> <p>Also note that I'll speak about systemd at <a href="http://www.linux-kongress.org/2010/program.html">LinuxKongress 2010</a> in Nuremberg, Germany. Later this year I'll also be speaking at the <a href="http://www.linuxplumbersconf.org/2010/ocw/proposals/873">Linux Plumbers Conference</a> in Boston, MA. Make sure to drop by if you want to learn about systemd or discuss exciting new ideas or features with us.</p> Lennart PoetteringMon, 23 Aug 2010 13:32:00 +0200tag:0pointer.net,2010-08-23:/blog/projects/systemd-update.htmlprojectssystemd for Administrators, Part 1https://0pointer.net/blog/projects/systemd-for-admins-1.html <p>As many of you know, <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a> is the new Fedora init system, starting with F14, and it is also on its way to being adopted in a number of other distributions as well (for example, <a href="http://en.opensuse.org/SDB:Systemd">OpenSUSE</a>). For administrators systemd provides a variety of new features and changes and enhances the administrative process substantially. This blog story is the first part of a series of articles I plan to post roughly every week for the next months. In every post I will try to explain one new feature of systemd. Many of these features are small and simple, so these stories should be interesting to a broader audience. However, from time to time we'll dive a little bit deeper into the great new features systemd provides you with.</p> <h4>Verifying Bootup</h4> <p>Traditionally, when booting up a Linux system, you see a lot of little messages passing by on your screen. As we work on speeding up and parallelizing the boot process these messages are visible for a shorter and shorter time and become less and less readable -- if they are shown at all, given we use graphical boot splash technology like Plymouth these days. 
Nonetheless the information of the boot screens was and still is very relevant, because it shows you for each service that is being started as part of bootup, whether it managed to start up successfully or failed (with those green or red <tt>[ OK ]</tt> or <tt>[ FAILED ]</tt> indicators). To improve the situation for machines that boot up fast and parallelized and to make this information more nicely available during runtime, we added a feature to systemd that tracks and remembers for each service whether it started up successfully, whether it exited with a non-zero exit code, whether it timed out, or whether it terminated abnormally (by segfaulting or similar), both during start-up and runtime. By simply typing <tt>systemctl</tt> in your shell you can query the state of all services, both systemd native and SysV/LSB services:</p> <pre>[root@lambda] ~# systemctl UNIT LOAD ACTIVE SUB JOB DESCRIPTION dev-hugepages.automount loaded active running Huge Pages File System Automount Point dev-mqueue.automount loaded active running POSIX Message Queue File System Automount Point proc-sys-fs-binfmt_misc.automount loaded active waiting Arbitrary Executable File Formats File System Automount Point sys-kernel-debug.automount loaded active waiting Debug File System Automount Point sys-kernel-security.automount loaded active waiting Security File System Automount Point sys-devices-pc...0000:02:00.0-net-eth0.device loaded active plugged 82573L Gigabit Ethernet Controller <i>[...]</i> sys-devices-virtual-tty-tty9.device loaded active plugged /sys/devices/virtual/tty/tty9 -.mount loaded active mounted / boot.mount loaded active mounted /boot dev-hugepages.mount loaded active mounted Huge Pages File System dev-mqueue.mount loaded active mounted POSIX Message Queue File System home.mount loaded active mounted /home proc-sys-fs-binfmt_misc.mount loaded active mounted Arbitrary Executable File Formats File System abrtd.service loaded active running ABRT Automated Bug Reporting Tool 
accounts-daemon.service loaded active running Accounts Service acpid.service loaded active running ACPI Event Daemon atd.service loaded active running Execution Queue Daemon auditd.service loaded active running Security Auditing Service avahi-daemon.service loaded active running Avahi mDNS/DNS-SD Stack bluetooth.service loaded active running Bluetooth Manager console-kit-daemon.service loaded active running Console Manager cpuspeed.service loaded active exited LSB: processor frequency scaling support crond.service loaded active running Command Scheduler cups.service loaded active running CUPS Printing Service dbus.service loaded active running D-Bus System Message Bus getty@tty2.service loaded active running Getty on tty2 getty@tty3.service loaded active running Getty on tty3 getty@tty4.service loaded active running Getty on tty4 getty@tty5.service loaded active running Getty on tty5 getty@tty6.service loaded active running Getty on tty6 haldaemon.service loaded active running Hardware Manager hdapsd@sda.service loaded active running sda shock protection daemon irqbalance.service loaded active running LSB: start and stop irqbalance daemon iscsi.service loaded active exited LSB: Starts and stops login and scanning of iSCSI devices. iscsid.service loaded active exited LSB: Starts and stops login iSCSI daemon. livesys-late.service loaded active exited LSB: Late init script for live image. livesys.service loaded active exited LSB: Init script for live image. lvm2-monitor.service loaded active exited LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling mdmonitor.service loaded active running LSB: Start and stop the MD software RAID monitor modem-manager.service loaded active running Modem Manager netfs.service loaded active exited LSB: Mount and unmount network filesystems. 
NetworkManager.service loaded active running Network Manager ntpd.service loaded <span style="color: red"><b>maintenance maintenance</b></span> Network Time Service polkitd.service loaded active running Policy Manager prefdm.service loaded active running Display Manager rc-local.service loaded active exited /etc/rc.local Compatibility rpcbind.service loaded active running RPC Portmapper Service rsyslog.service loaded active running System Logging Service rtkit-daemon.service loaded active running RealtimeKit Scheduling Policy Service sendmail.service loaded active running LSB: start and stop sendmail sshd@172.31.0.53:22-172.31.0.4:36368.service loaded active running SSH Per-Connection Server sysinit.service loaded active running System Initialization systemd-logger.service loaded active running systemd Logging Daemon udev-post.service loaded active exited LSB: Moves the generated persistent udev rules to /etc/udev/rules.d udisks.service loaded active running Disk Manager upowerd.service loaded active running Power Manager wpa_supplicant.service loaded active running Wi-Fi Security Service avahi-daemon.socket loaded active listening Avahi mDNS/DNS-SD Stack Activation Socket cups.socket loaded active listening CUPS Printing Service Sockets dbus.socket loaded active running dbus.socket rpcbind.socket loaded active listening RPC Portmapper Socket sshd.socket loaded active listening sshd.socket systemd-initctl.socket loaded active listening systemd /dev/initctl Compatibility Socket systemd-logger.socket loaded active running systemd Logging Socket systemd-shutdownd.socket loaded active listening systemd Delayed Shutdown Socket dev-disk-by\x1...x1db22a\x1d870f1adf2732.swap loaded active active /dev/disk/by-uuid/fd626ef7-34a4-4958-b22a-870f1adf2732 basic.target loaded active active Basic System bluetooth.target loaded active active Bluetooth dbus.target loaded active active D-Bus getty.target loaded active active Login Prompts graphical.target loaded active active 
Graphical Interface local-fs.target loaded active active Local File Systems multi-user.target loaded active active Multi-User network.target loaded active active Network remote-fs.target loaded active active Remote File Systems sockets.target loaded active active Sockets swap.target loaded active active Swap sysinit.target loaded active active System Initialization LOAD = Reflects whether the unit definition was properly loaded. ACTIVE = The high-level unit activation state, i.e. generalization of SUB. SUB = The low-level unit activation state, values depend on unit type. JOB = Pending job for the unit. 221 units listed. Pass --all to see inactive units, too. [root@lambda] ~#</pre> <p>(I have shortened the output above a little, and removed a few lines not relevant for this blog post.)</p> <p>Look at the ACTIVE column, which shows you the high-level state of a service (or in fact of any kind of unit systemd maintains, which can be more than just services, but we'll have a look on this in a later blog posting), whether it is <i>active</i> (i.e. running), <i>inactive</i> (i.e. not running) or in any other state. If you look closely you'll see one item in the list that is marked <i>maintenance</i> and highlighted in red. This informs you about a service that failed to run or otherwise encountered a problem. In this case this is ntpd. 
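</p> <p>As a rough illustration of reading the ACTIVE column programmatically, here is a minimal Python sketch that picks out the units whose high-level state is not <i>active</i>. The helper name <tt>units_not_active()</tt> is made up for this example, the sample text is a few lines taken from the listing above, and the parsing simply assumes whitespace-separated UNIT, LOAD and ACTIVE columns as shown there. It is not an official systemd interface.</p>

```python
# A few lines in the format of the listing above (not real command output).
SAMPLE = """\
UNIT                   LOAD   ACTIVE      SUB         JOB DESCRIPTION
NetworkManager.service loaded active      running         Network Manager
ntpd.service           loaded maintenance maintenance     Network Time Service
polkitd.service        loaded active      running         Policy Manager
"""


def units_not_active(listing):
    """Return (unit, active_state) pairs for loaded units that are not 'active'."""
    result = []
    for line in listing.splitlines():
        fields = line.split()
        # Pad short lines so that the unpacking below never fails.
        unit, load, active = (fields + ["", "", ""])[:3]
        # Skips the header, blank lines, the legend, and units that are fine.
        if load != "loaded" or active == "active":
            continue
        result.append((unit, active))
    return result


if __name__ == "__main__":
    print(units_not_active(SAMPLE))
```

<p>For the sample text this reports only <tt>ntpd.service</tt> with the state <i>maintenance</i>, the same unit that is highlighted in red in the listing.</p> <p>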
Now, let's find out what actually happened to ntpd, with the <tt>systemctl status</tt> command:</p> <pre>[root@lambda] ~# systemctl status ntpd.service ntpd.service - Network Time Service Loaded: loaded (/etc/systemd/system/ntpd.service) Active: <span style="color: red"><b>maintenance</b></span> Main: 953 (code=exited, status=255) CGroup: name=systemd:/systemd-1/ntpd.service [root@lambda] ~#</pre> <p>This shows us that NTP terminated during runtime (when it ran as PID 953), and tells us exactly the error condition: the process exited with an exit status of 255.</p> <p>In a later systemd version, we plan to hook this up to ABRT, <a href="https://bugzilla.redhat.com/show_bug.cgi?id=622773">as soon as this enhancement request is fixed</a>. Then, if <tt>systemctl status</tt> shows you information about a service that crashed it will direct you right-away to the appropriate crash dump in ABRT.</p> <p><b>Summary:</b> use <tt>systemctl</tt> and <tt>systemctl status</tt> as modern, more complete replacements for the traditional boot-up status messages of SysV services. <tt>systemctl status</tt> not only captures in more detail the error condition but also shows runtime errors in addition to start-up errors.</p> <p>That's it for this week, make sure to come back next week, for the next posting about systemd for administrators!</p> Lennart PoetteringMon, 23 Aug 2010 10:22:00 +0200tag:0pointer.net,2010-08-23:/blog/projects/systemd-for-admins-1.htmlprojectsDear Lazy Web,https://0pointer.net/blog/lenovo-laptop-codes.html <p>does anybody know how to decode those Lenovo ThinkPad model IDs? I am interested in the T410s. For example, there's the model NUK3AGE, and there's NUHFXGE, and there's NUHYXGE. Some web sites claim NUK3AGE has Nvidia graphics, others claim VGA is Intel-only. Some web sites claim it has a touch screen, others say the contrary. 
The Lenovo web site isn't helpful to figure out the differences between the models and what the feature set of the various models really is. I figured out the GE suffix indicates a German keyboard, but what about the remaining code? Does anybody know how to decipher those IDs or know of a reliable source explaining their feature set?</p> <p>Love,</p> <p>Lennart</p> Lennart PoetteringThu, 19 Aug 2010 13:27:00 +0200tag:0pointer.net,2010-08-19:/blog/lenovo-laptop-codes.htmlmiscMe too!https://0pointer.net/blog/projects/bad-lennart.html <p>I too forgot to mention that my accommodation at GUADEC was sponsored by the GNOME Foundation. Thanks guys!</p> <p><img alt="Sponsored" src="http://1.bp.blogspot.com/_2o81e3u4ZFU/TFfrnz00y6I/AAAAAAAAAW0/h8eVbnSRcc4/s400/sponsored-badge-simple.png" width="213" height="213" /></p> Lennart PoetteringTue, 03 Aug 2010 13:22:00 +0200tag:0pointer.net,2010-08-03:/blog/projects/bad-lennart.htmlprojectsDear Canonical,https://0pointer.net/blog/projects/sound-theme-canonical.html <p>Today I <a href="http://design.canonical.com/2010/08/ubuntu-needs-a-new-sound-theme/">came across this blog post of your design team</a>. In the context of the recent criticism you had to endure regarding upstream contributions I am disappointed that you have not bothered to ping anybody from the upstream freedesktop sound theme (for example yours truly) about this in advance. No, you went to cook your own soup. What really disappoints me is that we have asked multiple times for help and support and contributions for the sound theme, to only very little success, and I even asked some of the Canonical engineers about this topic and in particular regarding some clarifications of the licensing of the old Ubuntu sound theme. 
I am sorry, but if you had listened, or looked, or asked you would have been aware that we were looking for somebody to maintain this actively, upstream -- and because we didn't have the time to maintain this we only did the absolute minimum work necessary and we only maintain this ourselves because no one else wanted to.</p> <p>It should be upstream first, downstream second.</p> <p>I am sorry if I sound like an always complaining prick to you. But believe me, I am not saying this because I don't like you or anything like that. I am just saying this because I believe you could do things oh so much better.</p> <p>Please fix this. We want your contributions. Upstream.</p> Lennart PoetteringTue, 03 Aug 2010 02:24:00 +0200tag:0pointer.net,2010-08-03:/blog/projects/sound-theme-canonical.htmlprojectsBeating a Dead Horsehttps://0pointer.net/blog/projects/i-am-more-awesome-than-canonical.html <p>I guess it's a bit like beating a dead horse, but I had a good laugh today when <a href="http://www.neary-consulting.com/index.php/services/gnome-census/">I learned</a> that I alone contributed more to GNOME than the entirety of Canonical, and that only 800 additional commits separate me from being more awesome than Nokia.</p> <p><tt>/me is amused</tt></p> Lennart PoetteringTue, 03 Aug 2010 01:46:00 +0200tag:0pointer.net,2010-08-03:/blog/projects/i-am-more-awesome-than-canonical.htmlprojectsInterview With Yours Trulyhttps://0pointer.net/blog/projects/i-like-listening-to-myself.html <p><a href="http://linuxoutlaws.com/podcast/ogg/160">Here's a podcast interview with yours truly</a> where I speak a little about PulseAudio and systemd. Seek to 64:43 for my lovely impetuous voice. 
There's also an interview with Owen just before mine.</p> Lennart PoetteringMon, 02 Aug 2010 17:13:00 +0200tag:0pointer.net,2010-08-02:/blog/projects/i-like-listening-to-myself.htmlprojectsLinux Plumbers Conference 2010 CFP Ending Soon!https://0pointer.net/blog/projects/plumbersconf-2010.html <p>The <a href="http://www.linuxplumbersconf.org/2010/06/03/linux-plumbers-conference-call-for-papers">Call for Papers</a> for the <a href="http://www.linuxplumbersconf.org/">Linux Plumbers Conference (LPC)</a> in November in Cambridge, Massachusetts is ending soon, on <b>July 19th 2010 (That's the upcoming Monday!)</b>. It's a conference about the core infrastructure of Linux systems: the part of the system where userspace and the kernel interface. It's the only conference where the focus is specifically on getting together the kernel people who work on the userspace interfaces and the userspace people who have to deal with kernel interfaces. It's supposed to be a place where all the people doing infrastructure work sit down and talk, so that both parties understand better what the requirements and needs of the other are, and where we can work towards fixing the major problems we currently have with our lower-level infrastructure and APIs.</p> <p>The two previous LPCs were hugely successful (as reported on LWN on various occasions), and this time we hope to repeat that.</p> <p>Like in previous years, I will be running the Audio conference track of LPC, this time together with Mark Brown. Audio infrastructure on Linux has been steadily improving over the last years all over the place, but there's still a lot to do. Join us at the LPC to discuss the next steps and help improve Linux audio further! 
If you are doing <b>audio infrastructure work</b> on Linux, make sure to attend and <b>submit a paper!</b></p> <p><a href="http://www.linuxplumbersconf.org/2010/ocw/login">Sign up soon!</a> <a href="http://www.linuxplumbersconf.org/2010/ocw/events/LPC2010/proposals/new">Send in your paper quickly!</a> <b>Only three days left to the end of the CFP!</b></p> <p><a href="http://www.linuxplumbersconf.org"><img style="border: 0" src="http://linuxplumbersconf.org/2010/style/tagline-2010.png" alt="Plumbers Logo" width="493" height="60" /></a></p> <p>(I am also planning to do a presentation there about <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>, together with Kay. Make sure to attend if you are interested in that topic.)</p> <p>See you in Boston!</p> Lennart PoetteringFri, 16 Jul 2010 18:35:00 +0200tag:0pointer.net,2010-07-16:/blog/projects/plumbersconf-2010.htmlprojectsAddendum on the Brokenness of File Lockinghttps://0pointer.net/blog/projects/locking2.html <p>I forgot to mention another central problem in my <a href="http://0pointer.de/blog/projects/locking">blog story about file locking on Linux</a>:</p> <p>Different machines have access to different features of the same file system. Here's an example: let's say you have two machines in your home LAN. You want them to share their <tt>$HOME</tt> directory, so that you (or your family) can use either machine and have access to all your (or their) data. So you export <tt>/home</tt> on one machine via NFS and mount it from the other machine.</p> <p>So far so good. But what happens to file locking now? Programs on the first machine see a fully-featured ext3 or ext4 file system, where all kinds of locking works (even though the API might suck as mentioned in the earlier blog story). But what about the other machine? If you set up <tt>lockd</tt> properly then POSIX locking will work on both. If you didn't, one machine can use POSIX locking properly while the other cannot. 
And it gets even worse: as mentioned recent NFS implementations on Linux transparently convert client-side BSD locking into POSIX locking on the server side. Now, if the same application uses BSD locking on both the client and the server side from two instances they will end up with two orthogonal locks and although both sides think they have properly acquired a lock (and they actually did) they will overwrite each other's data, because those two locks are independent. (And one wonders why the NFS developers implemented this brokenness nonetheless...).</p> <p>This basically means that locking cannot be used unless it is verified that <i>everyone</i> accessing a file system can make use of the same file system feature set. If you use file locking on a file system you should do so only if you are sufficiently sure that nobody using a broken or weird NFS implementation might want to access and lock those files as well. And practically that is impossible. Even if <tt>fpathconf()</tt> was improved so that it could inform the caller whether it can successfully apply a file lock to a file, this would still not give any hint if the same is true for everybody else accessing the file. But that is essential when speaking of advisory (i.e. cooperative) file locking.</p> <p>And no, this isn't easy to fix. So again, the recommendation: forget about file locking on Linux, it's nothing more than a useless toy.</p> <p>Also read <a href="http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html">Jeremy Allison's</a> (Samba) take on POSIX file locking. It's an interesting read.</p> Lennart PoetteringMon, 28 Jun 2010 19:49:00 +0200tag:0pointer.net,2010-06-28:/blog/projects/locking2.htmlprojectsOn the Brokenness of File Lockinghttps://0pointer.net/blog/projects/locking.html <p>It's amazing how far Linux has come without providing for proper file locking that works and is usable from userspace. 
A little overview of why file locking is still in a very sad state:</p> <p>To begin with, there's a plethora of APIs, and all of them are awful:</p> <ul> <li>POSIX File locking as available with <tt>fcntl(F_SETLK)</tt>: the POSIX locking API is the most portable one and in theory works across NFS. It can do byte-range locking. So much on the good side. On the bad side there's a lot more however: locks are bound to processes, not file descriptors. That means that this logic cannot be used in threaded environments unless combined with a process-local mutex. This is hard to get right, especially in libraries that do not know the environment they are run in, i.e. whether they are used in threaded environments or not. The worst part however is that POSIX locks are automatically released if a process calls <tt>close()</tt> on <i>any</i> (!) of its open file descriptors for that file. That means that when one part of a program locks a file and another by coincidence accesses it too for a short time, the first part's lock will be broken and it won't be notified about that. Modern software tends to load big frameworks (such as Gtk+ or Qt) into memory as well as arbitrary modules via mechanisms such as NSS, PAM, gvfs, GTK_MODULES, Apache modules, GStreamer modules where one module seldom can control what another module in the same process does or accesses. The effect of this is that POSIX locks are unusable in any non-trivial program where it cannot be ensured that a file that is locked is <i>never</i> accessed by any other part of the process at the same time. Example: a user-managing daemon wants to write <tt>/etc/passwd</tt> and locks the file for that. At the same time in another thread (or from a stack frame further down) something calls <tt>getpwuid()</tt> which internally accesses <tt>/etc/passwd</tt> and causes the lock to be released, the first thread (or stack frame) not knowing that. 
Furthermore, should two threads use the locking fcntl()s on the same file they will interfere with each other's locks and reset the locking ranges and flags of each other. On top of that locking cannot be used on any file that is publicly accessible (i.e. has the R bit set for groups/others, i.e. more access bits on than 0600), because that would otherwise effectively give arbitrary users a way to indefinitely block execution of any process (regardless of the UID it is running under) that wants to access and lock the file. This is generally not an acceptable security risk. Finally, while POSIX file locks are supposedly NFS-safe they are not always so in practice, as there are still many NFS implementations around where locking is not properly implemented, and NFS tends to be used in heterogeneous networks. The biggest problem about this is that there is no way to properly detect whether file locking works on a specific NFS mount (or any mount) or not.</li> <li>The other API for POSIX file locks: <tt>lockf()</tt> is another API for the same mechanism and suffers from the same problems. One wonders why there are two APIs for the same messed up interface.</li> <li>BSD locking based on <tt>flock()</tt>. The semantics of this kind of locking are much nicer than for POSIX locking: locks are bound to file descriptors, not processes. This kind of locking can hence be used safely between threads and can even be inherited across <tt>fork()</tt> and <tt>exec()</tt>. Locks are only automatically broken on the <tt>close()</tt> call for the one file descriptor they were created with (or the last duplicate of it). On the other hand this kind of locking does not offer byte-range locking and suffers from the same security problems as POSIX locking, and works in even fewer cases on NFS than POSIX locking (i.e. on BSD and Linux &lt; 2.6.12 they were NOPs returning success). And since BSD locking is not as portable as POSIX locking this is sometimes an unsafe choice. 
Some OSes even find it funny to make <tt>flock()</tt> and <tt>fcntl(F_SETLK)</tt> control the same locks. Linux treats them independently -- except for the cases where it doesn't: on Linux NFS they are transparently converted to POSIX locks, too, now. What chaos!</li> <li>Mandatory locking is available too. It's based on the POSIX locking API but not portable in itself. It's dangerous business and should generally be avoided in cleanly written software.</li> <li>Traditional lock file based file locking. This is how things were done traditionally, based around known atomicity guarantees of certain basic file system operations. It's a cumbersome thing, and requires polling of the file system to get notifications when a lock is released. Also, on Linux NFS &lt; 2.6.5 it doesn't work properly, since O_EXCL isn't atomic there. And of course the client cannot really know what the server is running, so again this brokenness is not detectable.</li> </ul> <h4>The Disappointing Summary</h4> <p>File locking on Linux is just broken. The broken semantics of POSIX locking show that the designers of this API apparently never tried to actually use it in real software. It smells a lot like an interface that kernel people thought makes sense but in reality doesn't when you try to use it from userspace.</p> <p>Here's a list of places where you shouldn't use file locking due to the problems shown above: If you want to lock a file in $HOME, forget about it as $HOME might be NFS and locks generally are not reliable there. The same applies to every other file system that might be shared across the network. If the file you want to lock is accessible to more than your own user (i.e. an access mode &gt; 0700), forget about locking, it would allow others to block your application indefinitely. If your program is non-trivial or threaded or uses a framework such as Gtk+ or Qt or any of the module-based APIs such as NSS, PAM, ... forget about POSIX locking. 
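</p> <p>The <tt>close()</tt> behaviour of POSIX locks described in the list above is easy to reproduce. What follows is a minimal sketch in Python, assuming a local Linux file system: Python's <tt>fcntl.lockf()</tt> takes a POSIX (<tt>fcntl()</tt>-style) lock and <tt>fcntl.flock()</tt> a BSD lock. The helper names <tt>another_process_can_lock()</tt>, <tt>lock_held_before_and_after_close()</tt> and <tt>demo()</tt> are invented for this illustration, and a forked child process stands in for "some other program" that wants the same lock.</p>

```python
import fcntl
import os
import tempfile


def another_process_can_lock(path, lock):
    """Fork a child and report whether it manages to lock 'path' exclusively."""
    pid = os.fork()
    if pid == 0:
        status = 1  # pessimistic default: somebody else holds the lock
        try:
            fd = os.open(path, os.O_RDWR)
            lock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fails at once if locked
            status = 0
        except OSError:
            pass
        finally:
            os._exit(status)  # leave the child without running parent cleanup
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def lock_held_before_and_after_close(path, lock):
    """Lock 'path', then open and close the file once more in this process.

    Returns (held_before, held_after) as observed by another process.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        lock(fd, fcntl.LOCK_EX)
        held_before = not another_process_can_lock(path, lock)
        # Some unrelated part of the program briefly touches the same file.
        os.close(os.open(path, os.O_RDONLY))
        held_after = not another_process_can_lock(path, lock)
        return held_before, held_after
    finally:
        os.close(fd)


def demo():
    """Run the experiment with POSIX (lockf) and BSD (flock) locks."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return {
            "lockf": lock_held_before_and_after_close(path, fcntl.lockf),
            "flock": lock_held_before_and_after_close(path, fcntl.flock),
        }
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

<p>On a local file system this should report <tt>(True, False)</tt> for <tt>lockf</tt> and <tt>(True, True)</tt> for <tt>flock</tt>: the POSIX lock silently vanishes after the unrelated <tt>close()</tt>, while the BSD lock stays with the file descriptor it was taken on. On NFS the results may differ, for the reasons given above.</p> <p>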
If you care about portability, don't use file locking.</p> <p>Or to turn this around, the only case where it is kind of safe to use file locking is in trivial applications where portability is not key and by using BSD locking on a file system where you can rely on it being local and on files inaccessible to others. Of course, that doesn't leave much, except for private files in <tt>/tmp</tt> for trivial user applications.</p> <p>Or in one sentence: in its current state Linux file locking is unusable.</p> <p>And that is a shame.</p> <p><b>Update</b>: <a href="http://0pointer.de/blog/projects/locking2">Check out the follow-up story on this topic.</a></p> Lennart PoetteringSat, 26 Jun 2010 19:38:00 +0200tag:0pointer.net,2010-06-26:/blog/projects/locking.htmlprojectsOn IDshttps://0pointer.net/blog/projects/ids.html <p>When programming software that cooperates with software running on behalf of other users, other sessions or other computers it is often necessary to work with unique identifiers. These can be bound to various hardware and software objects as well as lifetimes. Often, when people look for such an ID to use they pick the wrong one because semantics and lifetime of the IDs are not clear. Here's a little non-comprehensive list of IDs accessible on Linux and how you should or should not use them.</p> <h4>Hardware IDs</h4> <ol> <li><tt>/sys/class/dmi/id/product_uuid</tt>: The main board product UUID, as set by the board manufacturer and encoded in the BIOS DMI information. It may be used to identify a mainboard and only the mainboard. It changes when the user replaces the main board. Also, often enough BIOS manufacturers write bogus serials into it. In addition, it is x86-specific. Access for unprivileged users is forbidden. Hence it is of little general use.</li> <li><tt>CPUID/EAX=3</tt> CPU serial number: A CPU UUID, as set by the CPU manufacturer and encoded on the CPU chip. It may be used to identify a CPU and only a CPU. 
It changes when the user replaces the CPU. Also, most modern CPUs don't implement this feature anymore, and older computers tend to disable this option by default, controllable via a BIOS Setup option. In addition, it is x86-specific. Hence this too is of little general use.</li> <li><tt>/sys/class/net/*/address</tt>: One or more network MAC addresses, as set by the network adapter manufacturer and encoded on some network card EEPROM. It changes when the user replaces the network card. Since network cards are optional and there may be more than one, the availability of this ID is not guaranteed and you might have more than one to choose from. On virtual machines the MAC addresses tend to be random. This too is hence of little general use.</li> <li><tt>/sys/bus/usb/devices/*/serial</tt>: Serial numbers of various USB devices, as encoded in the USB device EEPROM. Most devices don't have a serial number set, and if they have one it is often bogus. If the user replaces his USB hardware or plugs it into another machine these IDs may change or appear in other machines. This hence too is of little use.</li> </ol> <p>There are various other hardware IDs available, many of which you may discover via the ID_SERIAL udev property of various devices, such as hard disks and similar. They all have in common that they are bound to specific (replaceable) hardware, not universally available, often filled with bogus data and random in virtualized environments. Or in other words: don't use them, don't rely on them for identification, unless you really know what you are doing; in general they do not guarantee what you might hope they guarantee.</p> <h4>Software IDs</h4> <ol> <li><tt>/proc/sys/kernel/random/boot_id</tt>: A random ID that is regenerated on each boot. As such it can be used to identify the local machine's current boot. It's universally available on any recent Linux kernel. 
It's a good and safe choice if you need to identify a specific boot on a specific booted kernel.</li> <li><tt>gethostname()</tt>, <tt>/proc/sys/kernel/hostname</tt>: A non-random ID configured by the administrator to identify a machine in the network. Often this is not set at all or is set to some default value such as <tt>localhost</tt> and not even unique in the local network. In addition it might change during runtime, for example because it changes based on updated DHCP information. As such it is almost entirely useless for anything but presentation to the user. It has very weak semantics and relies on correct configuration by the administrator. Don't use this to identify machines in a distributed environment. It won't work unless centrally administered, which makes it useless in a globalized, mobile world. It has no place in automatically generated filenames that shall be bound to specific hosts. Just don't use it, please. It's really not what many people think it is. <tt>gethostname()</tt> is standardized in POSIX and hence portable to other Unixes.</li> <li>IP Addresses returned by SIOCGIFCONF or the respective Netlink APIs: These tend to be dynamically assigned and often enough only valid on local networks or even only the local links (i.e. 192.168.x.x style addresses, or even 169.254.x.x/IPv4LL). Unfortunately they hence have little use outside of networking.</li> <li><tt>gethostid()</tt>: Returns a supposedly unique 32-bit identifier for the current machine. The semantics of this is not clear. On most machines this simply returns a value based on a local IPv4 address. On others it is administrator controlled via the <tt>/etc/hostid</tt> file. Since the semantics of this ID are not clear and most often is just a value based on the IP address it is almost always the wrong choice to use. On top of that 32bit are not particularly a lot. On the other hand this is standardized in POSIX and hence portable to other Unixes. 
It's probably best to ignore this value and if people don't want to ignore it they should probably symlink <tt>/etc/hostid</tt> to <tt>/var/lib/dbus/machine-id</tt> or something similar.</li> <li><tt>/var/lib/dbus/machine-id</tt>: An ID identifying a specific Linux/Unix installation. It does not change if hardware is replaced. It is not unreliable in virtualized environments. This value has clear semantics and is considered part of the D-Bus API. It is supposedly globally unique and portable to all systems that have D-Bus. On Linux, it is universally available, given that almost all non-embedded and even a fair share of the embedded machines ship D-Bus now. This is the recommended way to identify a machine, possibly with a fallback to the host name to cover systems that still lack D-Bus. If your application links against <tt>libdbus</tt>, you may access this ID with <tt>dbus_get_local_machine_id()</tt>, if not you can read it directly from the file system.</li> <li><tt>/proc/self/sessionid</tt>: An ID identifying a specific Linux login session. This ID is maintained by the kernel and part of the auditing logic. It is uniquely assigned to each login session during a specific system boot, shared by each process of a session, even across su/sudo and cannot be changed by userspace. Unfortunately some distributions have so far failed to set things up properly for this to work (Hey, you, Ubuntu!), and this ID is always <tt>(uint32_t) -1</tt> for them. But there's hope they get this fixed eventually. Nonetheless it is a good choice for a unique session identifier on the local machine and for the current boot. To make this ID globally unique it is best combined with <tt>/proc/sys/kernel/random/boot_id</tt>.</li> <li><tt>getuid()</tt>: An ID identifying a specific Unix/Linux user. This ID is usually automatically assigned when a user is created. It is not unique across machines and may be reassigned to a different user if the original user was deleted. 
As such it should be used only locally and with the limited validity in time in mind. To make this ID globally unique it is not sufficient to combine it with <tt>/var/lib/dbus/machine-id</tt>, because the same ID might be used for a different user that is created later with the same UID. Nonetheless this combination is often good enough. It is available on all POSIX systems.</li> <li><tt>ID_FS_UUID</tt>: an ID that identifies a specific file system in the udev tree. It is not always clear how these serials are generated but this tends to be available on almost all modern disk file systems. It is not available for NFS mounts or virtual file systems. Nonetheless this is often a good way to identify a file system, and in the case of the root directory even an installation. However due to the weakly defined generation semantics the D-Bus machine ID is generally preferable.</li> </ol> <h4>Generating IDs</h4> <p>Linux offers a kernel interface to generate UUIDs on demand, by reading from <tt>/proc/sys/kernel/random/uuid</tt>. This is a very simple interface to generate UUIDs. That said, the logic behind UUIDs is unnecessarily complex and often it is a better choice to simply read 16 bytes or so from <tt>/dev/urandom</tt>.</p> <h4>Summary</h4> <p>And the gist of it all: <b>Use <tt>/var/lib/dbus/machine-id</tt>! Use <tt>/proc/self/sessionid</tt>! Use <tt>/proc/sys/kernel/random/boot_id</tt>! Use <tt>getuid()</tt>! Use <tt>/dev/urandom</tt>!</b> And forget about the rest, in particular the host name, or the hardware IDs such as DMI. 
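</p> <p>To make the recommended IDs a bit more concrete, here is a minimal Python sketch of how they might be read and combined. The file paths are the ones named above; the helper names (<tt>read_id()</tt>, <tt>combine()</tt>, <tt>session_id_for_this_boot()</tt>, <tt>user_id_on_this_machine()</tt>, <tt>random_id()</tt>) and the <tt>:</tt> separator are invented for this illustration. Whether each file exists depends on the system, so every reader returns <tt>None</tt> rather than failing. <tt>os.urandom()</tt> is used here as Python's way of asking the kernel for random bytes, standing in for a direct read of <tt>/dev/urandom</tt>.</p>

```python
import os

# Paths as named in the text; whether they exist depends on the system.
BOOT_ID = "/proc/sys/kernel/random/boot_id"
MACHINE_ID = "/var/lib/dbus/machine-id"
SESSION_ID = "/proc/self/sessionid"
UNSET_SESSION = str(2**32 - 1)  # (uint32_t) -1: no session ID was assigned


def read_id(path):
    """Return the stripped contents of an ID file, or None if unavailable."""
    try:
        with open(path, "r", encoding="ascii") as f:
            return f.read().strip() or None
    except (OSError, UnicodeDecodeError):
        return None


def combine(*parts):
    """Join several IDs into one string; None if any part is missing."""
    if any(part is None for part in parts):
        return None
    return ":".join(parts)


def session_id_for_this_boot():
    """Login session ID qualified by the boot ID, as suggested in the text."""
    session = read_id(SESSION_ID)
    if session == UNSET_SESSION:
        session = None
    return combine(read_id(BOOT_ID), session)


def user_id_on_this_machine():
    """UID qualified by the machine ID: often good enough, not reuse-proof."""
    return combine(read_id(MACHINE_ID), str(os.getuid()))


def random_id():
    """16 random bytes from the kernel, hex-encoded (32 characters)."""
    return os.urandom(16).hex()


if __name__ == "__main__":
    print("session:", session_id_for_this_boot())
    print("user:   ", user_id_on_this_machine())
    print("random: ", random_id())
```

<p>Returning <tt>None</tt> for a missing part is a deliberate choice in this sketch: a combined ID with a silently missing component would have weaker semantics than its name suggests.</p> <p>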
And keep in mind that you may combine the aforementioned IDs in various ways to get different semantics and validity constraints.</p> Lennart PoetteringSat, 26 Jun 2010 17:02:00 +0200tag:0pointer.net,2010-06-26:/blog/projects/ids.htmlprojectsSlides from LinuxTag 2010https://0pointer.net/blog/projects/linuxtag-2010-slides.html <p>On popular request, <a href="http://0pointer.de/public/systemd-presentation-linuxtag2010.pdf">here are my (terse) slides</a> from LinuxTag on <a href="http://www.freedesktop.org/wiki/Software/systemd">systemd</a>.</p> Lennart PoetteringTue, 15 Jun 2010 14:15:00 +0200tag:0pointer.net,2010-06-15:/blog/projects/linuxtag-2010-slides.htmlprojectsChange of Planshttps://0pointer.net/blog/projects/linuxtag2k10.html <p>The upcoming week I'll do two talks at LinuxTag 2010 at the Berlin Fair Grounds. One of them was only added to the schedule today, <a href="http://www.linuxtag.org/2010/en/program/free-conference/wednesday/details.html?talkid=387">about systemd</a>. Systemd has never been presented in a public talk before, so make sure to attend this historic moment... ;-). 
<a href="http://0pointer.de/blog/projects/systemd.html">Read about</a> <a href="http://www.freedesktop.org/wiki/Software/systemd">what has</a> <a href="http://lwn.net/Articles/389149/">been written</a> <a href="http://0pointer.de/blog/projects/systemd-in-the-news.html">about systemd so far</a>, so that you can ask the sharpest questions during my presentation.</p> <p>My second talk might be about stuff a little less reported in the press, but still very interesting, about <a href="http://www.linuxtag.org/2010/en/program/free-conference/all-speakers/details.html?talkid=425">Surround Sound in Gnome</a>.</p> <p>See you at LinuxTag!</p> Lennart PoetteringSat, 05 Jun 2010 19:25:00 +0200tag:0pointer.net,2010-06-05:/blog/projects/linuxtag2k10.htmlprojectsMango Lassi is Backhttps://0pointer.net/blog/projects/mango-lassi-is-back.html <p><img src="http://0pointer.de/public/mango-lassi-icon.png" width="48" height="48" alt="Mango Lassi's Icon" /></p> <p><a href="http://github.com/herzi/mango-lassi">Sven Herzberg</a> has recently been doing a lot of work on <a href="http://0pointer.de/blog/projects/mango-lassi.html">Mango Lassi</a>, a project deserving love but which I as its original author haven't touched in 3 years.</p> <p>His work is already bearing fruits:</p> <p><a href="http://www.lanedo.com/~herzi/mango-lassi-herzi.png"><img src="http://www.lanedo.com/~herzi/mango-lassi-herzi.png" width="1280" height="800" alt="Mango Lassi" /></a></p> <p>Distribution packagers, please go and package <a href="http://github.com/herzi/mango-lassi">his version</a>, Mango Lassi is an awesome, wonderful tool that needs distributor love.</p> <p>If you want to use Mango Lassi without waiting for the distribution packagers to catch up, Sven <a href="https://build.opensuse.org/package/show?package=mango-lassi&amp;project=home%3Aherzi">has built some packages for you in the OpenSUSE Build Service</a>.</p> <p>Sven, KUTGW!</p> Lennart PoetteringWed, 19 May 2010 17:39:00 
+0200tag:0pointer.net,2010-05-19:/blog/projects/mango-lassi-is-back.htmlprojectsName Your Threadshttps://0pointer.net/blog/projects/name-your-threads.html <p>Stefan Kost recently pointed out to me that the Linux system call <tt>prctl(PR_SET_NAME)</tt> does not in fact change the process name, but the task name (comm field) -- in contrast to what <a href="http://www.kernel.org/doc/man-pages/online/pages/man2/prctl.2.html">the man page</a> suggests.</p> <p>That makes it very useful for naming threads, since you can read back the name you set with PR_SET_NAME earlier from the <tt>/proc</tt> file system (<tt>/proc/$PID/task/$TID/comm</tt> on newer kernels, <tt>/proc/$PID/task/$TID/stat</tt>'s second field on older kernels), and hence distinguish which thread might be responsible for high CPU load or similar problems.</p> <p>So, now go, if you have a project which involves a lot of threads, name them all individually, and make it easier to debug them. What's missing now, of course, is that gdb learns this and shows the comm name when doing <tt>info threads</tt>.</p> <p>I have changed PulseAudio now to name all threads it creates.</p> <p>Of course, what would be even better than this is full file system extended attribute support in procfs, so that we could attach arbitrary information to processes and threads, including references to <tt>.desktop</tt> files and such.</p> Lennart PoetteringTue, 11 May 2010 01:22:00 +0200tag:0pointer.net,2010-05-11:/blog/projects/name-your-threads.htmlprojectssystemd Now Has a Web Sitehttps://0pointer.net/blog/projects/systemd-website.html <p>We now have a <a href="http://www.freedesktop.org/wiki/Software/systemd">web site</a>, a <a href="http://lists.freedesktop.org/mailman/listinfo/systemd-devel">mailing list</a>, a bugzilla component, and have moved our <a href="http://cgit.freedesktop.org/systemd/">git repositories to freedesktop.org</a>. 
Make sure to update your check-outs.</p> <p>For more details see <a href="http://www.freedesktop.org/wiki/Software/systemd">our new web site</a>.</p> Lennart PoetteringFri, 07 May 2010 22:29:00 +0200tag:0pointer.net,2010-05-07:/blog/projects/systemd-website.htmlprojectsLAC Video Streams Onlinehttps://0pointer.net/blog/projects/lac-video.html <p>The great people from the <a href="http://lac.linuxaudio.org/2010/">Linux Audio Conference</a> <a href="http://lists.linuxaudio.org/pipermail/linux-audio-dev/2010-May/027529.html">uploaded the video streams from the event</a>. <a href="http://www.linuxproaudio.org/lac2010/">Among them</a> you can find <a href="http://www.linuxproaudio.org/lac2010/day1_1400_Pro_Audio_is_Easy_Consumer_Audio_is_Hard.ogv">my own presentation</a>. Enjoy!</p> Lennart PoetteringWed, 05 May 2010 20:22:00 +0200tag:0pointer.net,2010-05-05:/blog/projects/lac-video.htmlprojectsPulseAudio and Jackhttps://0pointer.net/blog/projects/when-pa-and-when-not.html <p>One thing became very clear to me during my trip to the <a href="http://lac.linuxaudio.org/2010/">Linux Audio Conference 2010</a> in Utrecht: even many pro audio folks are not sure what <a href="http://jackaudio.org/">Jack</a> does that <a href="http://www.pulseaudio.org/">PulseAudio</a> doesn't do and what PulseAudio does that Jack doesn't do; why they are not competing, why you cannot replace one by the other, and why merging them (at least in the short term) might not make immediate sense. In other words, why millions of phones in this world run PulseAudio and not Jack, and why a music studio running PulseAudio is crack.</p> <p>To shed some light on this and for future reference I'll try to explain in the following text why there is this separation between the two systems and why this isn't necessarily bad. 
This is mostly a written up version of (parts of) <a href="http://lac.linuxaudio.org/2010/download/lennarts-talk-auf-der-lac-2010.pdf">my slides from LAC</a>, so if you attended that event you might find little new, but I hope it is interesting nonetheless.</p> <p>This is mostly written from my perspective as a hacker working on consumer audio stuff (more specifically having written most of PulseAudio), but I am sure most pro audio folks would agree with the points I raise here, and have more things to add. What I explain below is in no way comprehensive, just a list of a couple of points I think are the most important, as they touch the very core of both systems (and we ignore all the toppings here, i.e. sound effects, yadda, yadda).</p> <p>First of all let's clear up the background of the sound server use cases here:</p> <table border="1"> <tr><th>Consumer Audio (i.e. PulseAudio)</th> <th>Pro Audio (i.e. Jack)</th></tr> <tr><td>Reducing power usage is a defining requirement, most systems are battery powered (Laptops, Cell Phones).</td> <td>Power usage usually not an issue, power comes out of the wall.</td></tr> <tr><td>Must support latencies low enough for telephony and games. 
Also covers high latency uses, such as movie and music playback (2s of latency is a good choice).</td> <td>Minimal latencies are a defining requirement.</td></tr> <tr><td>System is highly dynamic, with applications starting/stopping, hardware added and removed all the time.</td> <td>System is usually static in its configuration during operation.</td></tr> <tr><td>User is usually not proficient in the technologies used.<small><sup>[1]</sup></small></td> <td>User is usually a professional and knows audio technology and computers well.</td></tr> <tr><td>User is not necessarily the administrator of his machine, might have limited access.</td> <td>User usually administrates his own machines, has root privileges.</td></tr> <tr><td>Audio is just one use of the system among many, and often just a background job.</td> <td>Audio is the primary purpose of the system.</td></tr> <tr><td>Hardware tends to have limited resources and be crappy and cheap.</td> <td>Hardware is powerful, expensive and high quality.</td></tr> </table> <p>Of course, things are often not as black and white as this; there are uses that fall in the middle of these two areas.</p> <p>From the table above a few conclusions may be drawn:</p> <ul> <li>A consumer sound system must support both low and high latency operation. Since low latencies mean high CPU load and hence high power consumption<small><sup>[2]</sup></small> (Heisenberg...), a system should always run with the highest latency possible, but the lowest latency necessary.</li> <li>Since the consumer system is highly dynamic in its use, latencies must be adjusted dynamically too. That makes a design such as PulseAudio's <a href="http://0pointer.de/blog/projects/pulse-glitch-free.html">timer-based scheduling</a> important.</li> <li>A pro audio system's primary optimization target is low latency. Low power usage, dynamically changeable configuration (i.e. 
a short drop-out while you change your pipeline is acceptable) and user-friendliness may be sacrificed for that.</li> <li>For large buffer sizes a zero-copy design suggests itself: since data blocks are large the cache pressure can be considerably reduced by zero-copy designs. Only for large buffers is the cost of passing pointers around considerably smaller than the cost of passing around the data itself (or the other way round: if your audio data has the same size as your pointers, then passing pointers around is useless extra work).</li> <li>On a resource constrained system the ideal audio pipeline does not touch and convert the data passed along it unnecessarily. That makes it important to support natively the sample types and interleaving modes of the audio source or destination.</li> <li>A consumer system needs to simplify the view on the hardware and hide its complexity: hide redundant mixer elements, or merge them while making use of the hardware capabilities, and extend them in software so that the same functionality is provided on all hardware. A production system should not hide or simplify the hardware functionality.</li> <li>A consumer system should not drop out when a client misbehaves or the configuration changes (OTOH if it happens in exceptional cases it is not disastrous either). A synchronous pipeline is hence not advisable, clients need to supply their data asynchronously.</li> <li>In a pro audio system a drop-out during reconfiguration is acceptable, during operation unacceptable.</li> <li>In consumer audio we need to make compromises on resource usage, which pro audio does not have to commit to. Example: a pro audio system can issue <tt>mlock()</tt> with few limitations since the hardware is powerful (i.e. a lot of RAM available) and audio is the primary purpose. A consumer audio system cannot do that because that call practically makes memory unavailable to other applications, increasing their swap pressure. 
And since audio is not the primary purpose of the system and resources are limited, we hence need to find a different way.</li> </ul> <p>Jack has been designed for low latencies, where synchronous operation is advisable, meaning that a misbehaving client can stall the entire pipeline. Changes of the pipeline or latencies usually result in drop-outs in one way or the other, since the entire pipeline is reconfigured, from the hardware to the various clients. Jack only supports FLOAT32 samples and non-interleaved audio channels (and that is a good thing). Jack does not employ reference-counted zero-copy buffers. It does not try to simplify the hardware mixer in any way.</p> <p>PulseAudio OTOH can deal with varying latencies, <a href="http://0pointer.de/blog/projects/pulse-glitch-free.html">dynamically adjusting to the lowest latencies any of the connected clients needs</a>. Client communication is fully asynchronous, a single client cannot stall the entire pipeline. PulseAudio supports a variety of PCM formats and channel setups. PulseAudio's design is heavily based on reference-counted zero-copy buffers that are passed around, even between processes, instead of the audio data itself. PulseAudio tries to simplify the hardware mixer as suggested above.</p> <p>Now, the two paragraphs above hopefully show how Jack is more suitable for the pro audio use case and PulseAudio more for the consumer audio use case. One question suggests itself though: can we marry the two approaches? Yes, we probably can, MacOS has a unified approach for both uses. However, it is not clear this would be a good idea. First of all, a system with the complexities introduced by sample format/channel mapping conversion, as well as dynamically changing latencies and pipelines, and asynchronous behaviour would certainly be much less attractive to pro audio developers. In fact, that Jack limits itself to synchronous, FLOAT32-only, non-interleaved-only audio streams is one of the big features of its design. 
Marrying the two approaches would corrupt that. A merged solution would probably not have good standing in the community.</p> <p>But it goes even further than this: what would the use case for this be? After all, most of the time, you don't want your event sounds, your Youtube, your VoIP and your Rhythmbox mixed into the new record you are producing. Hence a clear separation between the two worlds might even be handy?</p> <p>Also, let's not forget that we lack the manpower to even create such an audio chimera.</p> <p>So, where to from here? Well, I think we should put the focus on cooperation instead of amalgamation: teach PulseAudio to go out of the way as soon as Jack needs access to the device, and optionally make PulseAudio a normal JACK client while both are running. That way, the user has the option to use the PulseAudio supplied streams, but normally does not see them in his pipeline. The first part of this has already been implemented: Jack2 and PulseAudio do not fight for the audio device, a friendly handover takes place. Jack takes precedence, PulseAudio takes the back seat. The second part is still missing: you still have to manually hook up PulseAudio to Jack if you are interested in its streams. If both are implemented, starting Jack basically has the effect of replacing PulseAudio's core with the Jack core, while still providing full compatibility with PulseAudio clients.</p> <p>And that I guess is all I have to say on the entire Jack and PulseAudio story.</p> <p>Oh, one more thing, while we are clearing things up: some news sites claim that PulseAudio's not necessarily stellar reputation in some parts of the community comes from Ubuntu and other distributions having integrated it too early. Well, let me stress here explicitly, that while they might have made a mistake or two in packaging PulseAudio and I publicly pointed that out (and probably not in a too friendly way), I do believe that the point in time they adopted it was right. Why? 
Basically, it's a chicken and egg problem. If it is not used in the distributions it is not tested, and there is no pressure to get fixed what then turns out to be broken: in PulseAudio itself, and in both the layers on top and below of it. Don't forget that pushing a new layer into an existing stack will break a lot of assumptions that the neighboring layers made. Doing this <i>must</i> break things. Most Free Software projects could probably use more developers, and that is particularly true for Audio on Linux. And given that that is how it is, pushing the feature in at that point in time was the right thing to do. Or in other words, if the features are right, and things do work correctly as far as the limited test base the developers control shows, then one day you need to push into the distributions, even if this might break setups and software that previously has not been tested, unless you want to stay stuck in your development indefinitely. So yes, Ubuntu, I think you did well with adopting PulseAudio when you did.</p> <h5>Footnotes</h5> <p><small>[1] Side note: yes, consumers tend not to know what <a href="http://en.wikipedia.org/wiki/Decibel">dB</a> is, and expect volume settings in "percentages", a mostly meaningless unit in audio. This even spills into projects like VLC or Amarok which expose linear volume controls (which is a really bad idea).</small></p> <p><small>[2] In case you are wondering why that is the case: if the latency is low the buffers must be sized smaller. And if the buffers are sized smaller then the CPU will have to wake up more often to fill them up for the same playback time. This drives up the CPU load since less actual payload can be processed for the amount of housekeeping that the CPU has to do during each buffer iteration. Also, frequent wake-ups make it impossible for the CPU to go to deeper sleep states. 
Sleep states are the primary way for modern CPUs to save power.</small></p> Lennart PoetteringWed, 05 May 2010 02:44:00 +0200tag:0pointer.net,2010-05-05:/blog/projects/when-pa-and-when-not.htmlprojectssystemd In The Newshttps://0pointer.net/blog/projects/systemd-in-the-news.html <p>A few news sites brought articles (some shorter, others longer) about <a href="http://0pointer.de/blog/projects/systemd.html">last week's blog story on systemd</a>:</p> <ul> <li><a href="http://lwn.net/Articles/385536/">Linux Weekly News</a></li> <li><a href="http://www.h-online.com/open/news/item/Systemd-presented-as-SysV-Init-and-Upstart-alternative-991875.html">The H Open</a></li> <li><a href="http://www.osnews.com/story/23232/Rethinking_PID_1">OSNews</a></li> <li><a href="http://www.pro-linux.de/news/1/15621/systemd-alternative-zu-init-vorgeschlagen.html">Pro Linux (German)</a></li> <li><a href="http://www.golem.de/1005/74884.html">Golem (German)</a></li> <li><a href="http://www.heise.de/ct/meldung/SysVinit-und-Upstart-Alternative-Systemd-vorgestellt-991662.html">c't (German)</a></li> <li><a href="http://www.reddit.com/r/coding/comments/bym56/systemd_rethinking_pid_1/">Reddit #1</a>, <a href="http://www.reddit.com/r/linux/comments/bybxf/rethinking_pid_1/">Reddit #2</a></li> </ul> <p>Related to this, <a href="http://www.netsplit.com/2010/04/30/on-systemd/">Scott's cordial reply</a>.</p> <p><a href="http://brainstorm.ubuntu.com/idea/24701/">And this I find funny, make sure to vote for it... ;-)</a></p> <p>Many of the comments on those stories are quite interesting, though sometimes a little, uh..., misled... ;-)</p> <p>Generally the reception of the ideas seems to be very positive. 
And that's certainly good news and encouraging.</p> Lennart PoetteringWed, 05 May 2010 01:30:00 +0200tag:0pointer.net,2010-05-05:/blog/projects/systemd-in-the-news.htmlprojectsRethinking PID 1https://0pointer.net/blog/projects/systemd.html <p>If you are well connected or good at reading between the lines you might already know what this blog post is about. But even then you may find this story interesting. So grab a cup of coffee, sit down, and read what's coming.</p> <p>This blog story is long, so even though I can only recommend reading the long story, here's the one sentence summary: we are experimenting with a new init system and it is fun.</p> <p><a href="http://git.0pointer.de/?p=systemd.git">Here's the code.</a> And here's the story:</p> <h4>Process Identifier 1</h4> <p>On every Unix system there is one process with the special process identifier 1. It is started by the kernel before all other processes and is the parent process for all those other processes that have nobody else to be child of. Due to that it can do a lot of stuff that other processes cannot do. And it is also responsible for some things that other processes are not responsible for, such as bringing up and maintaining userspace during boot.</p> <p>Historically on Linux the software acting as PID 1 was the venerable sysvinit package, though it had been showing its age for quite a while. Many replacements have been suggested, only one of them really took off: <a href="http://upstart.ubuntu.com/">Upstart</a>, which has by now found its way into all major distributions.</p> <p>As mentioned, the central responsibility of an init system is to bring up userspace. And a good init system does that fast. Unfortunately, the traditional SysV init system was not particularly fast.</p> <p>For a fast and efficient boot-up two things are crucial:</p> <ul> <li>To start <b>less</b>.</li> <li>And to start <b>more</b> in <i>parallel</i>.</li> </ul> <p>What does that mean? 
Starting less means starting fewer services or deferring the starting of services until they are actually needed. There are some services where we know that they will be required sooner or later (syslog, D-Bus system bus, etc.), but for many others this isn't the case. For example, bluetoothd does not need to be running unless a bluetooth dongle is actually plugged in or an application wants to talk to its D-Bus interfaces. Same for a printing system: unless the machine physically is connected to a printer, or an application wants to print something, there is no need to run a printing daemon such as CUPS. Avahi: if the machine is not connected to a network, there is no need to run <a href="http://avahi.org">Avahi</a>, unless some application wants to use its APIs. And even SSH: as long as nobody wants to contact your machine there is no need to run it, as long as it is then started on the first connection. (And admit it, on most machines where sshd might be listening somebody connects to it only every other month or so.)</p> <p>Starting more in parallel means that if we have to run something, we should not serialize its start-up (as sysvinit does), but run it all at the same time, so that the available CPU and disk IO bandwidth is maxed out, and hence the overall start-up time minimized.</p> <h4>Hardware and Software Change Dynamically</h4> <p>Modern systems (especially general purpose OS) are highly dynamic in their configuration and use: they are mobile, different applications are started and stopped, different hardware added and removed again. An init system that is responsible for maintaining services needs to listen to hardware and software changes. 
It needs to dynamically start (and sometimes stop) services as they are needed to run a program or enable some hardware.</p> <p>Most current systems that try to parallelize boot-up still synchronize the start-up of the various daemons involved: since Avahi needs D-Bus, D-Bus is started first, and only when D-Bus signals that it is ready, Avahi is started too. Similarly for other services: libvirtd and X11 need HAL (well, I am considering the Fedora 13 services here, ignore that HAL is obsolete), hence HAL is started first, before libvirtd and X11 are started. And libvirtd also needs Avahi, so it waits for Avahi too. And all of them require syslog, so they all wait until syslog is fully started up and initialized. And so on.</p> <h4>Parallelizing Socket Services</h4> <p>This kind of start-up synchronization results in the serialization of a significant part of the boot process. Wouldn't it be great if we could get rid of the synchronization and serialization cost? Well, we can, actually. For that, we need to understand what exactly the daemons require from each other, and why their start-up is delayed. For traditional Unix daemons, there's one answer to it: they wait until the socket the other daemon offers its services on is ready for connections. Usually that is an AF_UNIX socket in the file-system, but it could be AF_INET[6], too. For example, clients of D-Bus wait until <tt>/var/run/dbus/system_bus_socket</tt> can be connected to, clients of syslog wait for <tt>/dev/log</tt>, clients of CUPS wait for <tt>/var/run/cups/cups.sock</tt> and NFS mounts wait for <tt>/var/run/rpcbind.sock</tt> and the portmapper IP port, and so on. And think about it, this is actually the only thing they wait for!</p> <p>Now, if that's all they are waiting for, if we manage to make those sockets available for connection earlier and only actually wait for that instead of the full daemon start-up, then we can speed up the entire boot and start more processes in parallel. 
So, how can we do that? Actually quite easily in Unix-like systems: we can create the listening sockets <b>before</b> we actually start the daemon, and then just pass the socket during <tt>exec()</tt> to it. That way, we can create <b>all</b> sockets for <b>all</b> daemons in one step in the init system, and then in a second step run all daemons at once. If a service needs another, and it is not fully started up, that's completely OK: what will happen is that the connection is queued in the providing service and the client will potentially block on that single request. But only that one client will block and only on that one request. Also, dependencies between services will no longer necessarily have to be configured to allow proper parallelized start-up: if we start all sockets at once and a service needs another it can be sure that it can connect to its socket.</p> <p>Because this is at the core of what is following, let me say this again, with different words and by example: if you start syslog and various syslog clients at the same time, what will happen in the scheme pointed out above is that the messages of the clients will be added to the <tt>/dev/log</tt> socket buffer. As long as that buffer doesn't run full, the clients will not have to wait in any way and can immediately proceed with their start-up. As soon as syslog itself has finished start-up, it will dequeue all messages and process them. Another example: we start D-Bus and several clients at the same time. If a synchronous bus request is sent and hence a reply expected, what will happen is that the client will have to block, however only that one client and only until D-Bus managed to catch up and process it.</p> <p>Basically, the kernel socket buffers help us to maximize parallelization, and the ordering and synchronization is done by the kernel, without any further management from userspace! 
And if all the sockets are available before the daemons actually start up, dependency management also becomes redundant (or at least secondary): if a daemon needs another daemon, it will just connect to it. If the other daemon is already started, this will immediately succeed. If it isn't started but in the process of being started, the first daemon will not even have to wait for it, unless it issues a synchronous request. And even if the other daemon is not running at all, it can be auto-spawned. From the first daemon's perspective there is no difference, hence dependency management becomes mostly unnecessary or at least secondary, and all of this in optimal parallelization and optionally with on-demand loading. On top of this, this is also more robust, because the sockets stay available regardless of whether the actual daemons might temporarily become unavailable (maybe due to crashing). In fact, you can easily write a daemon with this that can run, and exit (or crash), and run again and exit again (and so on), and all of that without the clients noticing or losing any request.</p> <p>It's a good time for a pause, go and refill your coffee mug, and be assured, there is more interesting stuff following.</p> <p>But first, let's clear a few things up: is this kind of logic new? No, it certainly is not. The most prominent system that works like this is Apple's launchd system: on MacOS the listening of the sockets is pulled out of all daemons and done by launchd. The services themselves hence can all start up in parallel and dependencies need not be configured for them. And that is actually a really ingenious design, and the primary reason why MacOS manages to provide the fantastic boot-up times it provides. I can highly recommend <a href="https://www.youtube.com/watch?v=SjrtySM9Dns">this video</a> where the launchd folks explain what they are doing. 
Unfortunately this idea never really caught on outside of the Apple camp.</p> <p>The idea is actually even older than launchd. Prior to launchd the venerable <tt>inetd</tt> worked much like this: sockets were centrally created in a daemon that would start the actual service daemons passing the socket file descriptors during <tt>exec()</tt>. However, the focus of <tt>inetd</tt> certainly wasn't local services, but Internet services (although later reimplementations supported AF_UNIX sockets, too). It also wasn't a tool to parallelize boot-up or even useful for getting implicit dependencies right.</p> <p>For TCP sockets <tt>inetd</tt> was primarily used in a way that for every incoming connection a new daemon instance was spawned. That meant that for each connection a new process was spawned and initialized, which is not a recipe for high-performance servers. However, right from the beginning <tt>inetd</tt> also supported another mode, where a single daemon was spawned on the first connection, and that single instance would then go on and also accept the follow-up connections (that's what the <tt>wait</tt>/<tt>nowait</tt> option in <tt>inetd.conf</tt> was for, a particularly badly documented option, unfortunately.) Per-connection daemon starts probably gave inetd its bad reputation for being slow. But that's not entirely fair.</p> <h4>Parallelizing Bus Services</h4> <p>Modern daemons on Linux tend to provide services via D-Bus instead of plain AF_UNIX sockets. Now, the question is, for those services, can we apply the same parallelizing boot logic as for traditional socket services? Yes, we can, D-Bus already has all the right hooks for it: using bus activation a service can be started the first time it is accessed. 
Bus activation also gives us the minimal per-request synchronisation we need for starting up the providers and the consumers of D-Bus services at the same time: if we want to start Avahi at the same time as CUPS (side note: CUPS uses Avahi to browse for mDNS/DNS-SD printers), then we can simply run them at the same time, and if CUPS is quicker than Avahi via the bus activation logic we can get D-Bus to queue the request until Avahi manages to establish its service name.</p> <p>So, in summary: the socket-based service activation and the bus-based service activation together enable us to start <b>all</b> daemons in parallel, without any further synchronization. Activation also allows us to do lazy-loading of services: if a service is rarely used, we can just load it the first time somebody accesses the socket or bus name, instead of starting it during boot.</p> <p>And if that's not great, then I don't <b>know</b> what is great!</p> <h4>Parallelizing File System Jobs</h4> <p>If you look <a href="http://picasaweb.google.com/betsubetsu43/Fedora#5179125455943690130">at the serialization graphs of the boot process</a> of current distributions, there are more synchronisation points than just daemon start-ups: most prominently there are file-system related jobs: mounting, fscking, quota. Right now, on boot-up a lot of time is spent idling to wait until all devices that are listed in <tt>/etc/fstab</tt> show up in the device tree and are then fsck'ed, mounted, quota checked (if enabled). Only after that is fully finished we go on and boot the actual services.</p> <p>Can we improve this? It turns out we can. Harald Hoyer came up with the idea of using the venerable autofs system for this:</p> <p>Just like a <tt>connect()</tt> call shows that a service is interested in another service, an <tt>open()</tt> (or a similar call) shows that a service is interested in a specific file or file-system. 
So, in order to improve how much we can parallelize we can make those apps wait only if a file-system they are looking for is not yet mounted and readily available: we set up an autofs mount point, and then when our file-system finished fsck and quota due to normal boot-up we replace it by the real mount. While the file-system is not ready yet, the access will be queued by the kernel and the accessing process will block, but only that one daemon and only that one access. And this way we can begin starting our daemons even before all file systems have been fully made available -- without them missing any files, and maximizing parallelization.</p> <p>Parallelizing file system jobs and service jobs does not make sense for <tt>/</tt>, after all that's where the service binaries are usually stored. However, for file-systems such as <tt>/home</tt>, that usually are bigger, even encrypted, possibly remote and seldom accessed by the usual boot-up daemons, this can improve boot time considerably. It is probably not necessary to mention this, but virtual file systems, such as procfs or sysfs should never be mounted via autofs.</p> <p>I wouldn't be surprised if some readers might find integrating autofs in an init system a bit fragile and even weird, and maybe more on the "crackish" side of things. However, having played around with this extensively I can tell you that this actually feels quite right. Using autofs here simply means that we can create a mount point without having to provide the backing file system right-away. In effect it hence only delays accesses. If an application tries to access an autofs file-system and we take very long to replace it with the real file-system, it will hang in an interruptible sleep, meaning that you can safely cancel it, for example via C-c. Also note that at any point, if the mount point should not be mountable in the end (maybe because fsck failed), we can just tell autofs to return a clean error code (like ENOENT). 
So, I guess what I want to say is that even though integrating autofs into an init system might appear adventurous at first, our experimental code has shown that this idea works surprisingly well in practice -- if it is done for the right reasons and the right way.</p> <p>Also note that these should be <i>direct</i> autofs mounts, meaning that from an application perspective there's little effective difference between a classic mount point and one based on autofs.</p> <h4>Keeping the First User PID Small</h4> <p>Another thing we can learn from the MacOS boot-up logic is that shell scripts are evil. Shell is fast and shell is slow. It is fast to hack, but slow in execution. The classic sysvinit boot logic is modelled around shell scripts. Whether it is <tt>/bin/bash</tt> or any other shell (that was written to make shell scripts faster), in the end the approach is doomed to be slow. On my system the scripts in <tt>/etc/init.d</tt> call <tt>grep</tt> at least 77 times. <tt>awk</tt> is called 92 times, <tt>cut</tt> 23 and <tt>sed</tt> 74. Every time those commands (and others) are called, a process is spawned, the libraries searched, some start-up stuff like i18n and so on set up and more. And then after seldom doing more than a trivial string operation the process is terminated again. Of course, that has to be incredibly slow. No other language but shell would do something like that. On top of that, shell scripts are also very fragile, and change their behaviour drastically based on environment variables and suchlike, stuff that is hard to oversee and control.</p> <p>So, let's get rid of shell scripts in the boot process! Before we can do that we need to figure out what they are currently actually used for: well, the big picture is that most of the time, what they do is actually quite boring. 
Most of the scripting is spent on trivial setup and tear-down of services, and should be rewritten in C, either in separate executables, or moved into the daemons themselves, or simply be done in the init system.</p> <p>It is not likely that we can get rid of shell scripts during system boot-up entirely anytime soon. Rewriting them in C takes time, in a few cases does not really make sense, and sometimes shell scripts are just too handy to do without. But we can certainly make them less prominent.</p> <p>A good metric for measuring shell script infestation of the boot process is the PID number of the first process you can start after the system is fully booted up. Boot up, log in, open a terminal, and type <tt>echo $$</tt>. Try that on your Linux system, and then compare the result with MacOS! (Hint, it's something like this: Linux PID 1823; MacOS PID 154, measured on test systems we own.)</p> <h4>Keeping Track of Processes</h4> <p>A central part of a system that starts up and maintains services should be process babysitting: it should watch services. Restart them if they shut down. If they crash it should collect information about them, and keep it around for the administrator, and cross-link that information with what is available from crash dump systems such as abrt, and in logging systems like syslog or the audit system.</p> <p>It should also be capable of shutting down a service completely. That might sound easy, but is harder than you think. Traditionally on Unix a process that does double-forking can escape the supervision of its parent, and the old parent will not learn about the relation of the new process to the one it actually started. An example: currently, a misbehaving CGI script that has double-forked is not terminated when you shut down Apache. 
Furthermore, you will not even be able to figure out its relation to Apache, unless you know it by name and purpose.</p> <p>So, how can we keep track of processes, so that they cannot escape the babysitter, and that we can control them as one unit even if they fork a gazillion times?</p> <p>Different people came up with different solutions for this. I am not going into much detail here, but let's at least say that approaches based on ptrace or the netlink connector (a kernel interface which allows you to get a netlink message each time any process on the system fork()s or exit()s) that some people have investigated and implemented, have been criticised as ugly and not very scalable.</p> <p>So what can we do about this? Well, for quite a while now the kernel has supported <a href="http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cgroups/cgroups.txt;hb=HEAD">Control Groups</a> (aka "cgroups"). Basically they allow the creation of a hierarchy of groups of processes. The hierarchy is directly exposed in a virtual file-system, and hence easily accessible. The group names are basically directory names in that file-system. If a process belonging to a specific cgroup fork()s, its child will become a member of the same group. Unless it is privileged and has access to the cgroup file system it cannot escape its group. Originally, cgroups were introduced into the kernel for the purpose of containers: certain kernel subsystems can enforce limits on resources of certain groups, such as limiting CPU or memory usage. Traditional resource limits (as implemented by <tt>setrlimit()</tt>) are (mostly) per-process. cgroups on the other hand let you enforce limits on entire groups of processes. cgroups are also useful to enforce limits outside of the immediate container use case. You can use them for example to limit the total amount of memory or CPU Apache and all its children may use. 
Then, a misbehaving CGI script can no longer escape your <tt>setrlimit()</tt> resource control by simply forking away.</p> <p>In addition to container and resource limit enforcement cgroups are very useful to keep track of daemons: cgroup membership is securely inherited by child processes, they cannot escape. There's a notification system available so that a supervisor process can be notified when a cgroup runs empty. You can find the cgroups of a process by reading <tt>/proc/$PID/cgroup</tt>. cgroups hence make a very good choice to keep track of processes for babysitting purposes.</p> <h4>Controlling the Process Execution Environment</h4> <p>A good babysitter should not only oversee and control when a daemon starts, ends or crashes, but also set up a good, minimal, and secure working environment for it.</p> <p>That means setting obvious process parameters such as the <tt>setrlimit()</tt> resource limits, user/group IDs or the environment block, but does not end there. The Linux kernel gives users and administrators a lot of control over processes (some of it is rarely used, currently). For each process you can set CPU and IO scheduler controls, the capability bounding set, CPU affinity or of course cgroup environments with additional limits, and more.</p> <p>As an example, <tt>ioprio_set()</tt> with <tt>IOPRIO_CLASS_IDLE</tt> is a great way to minimize the effect of <tt>locate</tt>'s <tt>updatedb</tt> on system interactivity.</p> <p>On top of that certain high-level controls can be very useful, such as setting up read-only file system overlays based on read-only bind mounts. That way one can run certain daemons so that all (or some) file systems appear read-only to them, so that EROFS is returned on every write request. 
As such this can be used to lock down what daemons can do similar in fashion to a poor man's SELinux policy system (but this certainly doesn't replace SELinux, don't get any bad ideas, please).</p> <p>Finally logging is an important part of executing services: ideally every bit of output a service generates should be logged away. An init system should hence provide logging to daemons it spawns right from the beginning, and connect stdout and stderr to syslog or in some cases even <tt>/dev/kmsg</tt> which in many cases makes a very useful replacement for syslog (embedded folks, listen up!), especially in times where the kernel log buffer is configured ridiculously large out-of-the-box.</p> <h4>On Upstart</h4> <p>To begin with, let me emphasize that I actually like the code of Upstart, it is very well commented and easy to follow. It's certainly something other projects should learn from (including my own).</p> <p>That being said, I can't say I agree with the general approach of Upstart. But first, a bit more about the project:</p> <p>Upstart does not share code with sysvinit, and its functionality is a super-set of it, and provides compatibility to some degree with the well known SysV init scripts. Its main feature is its event-based approach: starting and stopping of processes is bound to "events" happening in the system, where an "event" can be a lot of different things, such as: a network interface becomes available or some other software has been started.</p> <p>Upstart does service serialization via these events: if the <tt>syslog-started</tt> event is triggered this is used as an indication to start D-Bus since it can now make use of Syslog. 
And then, when <tt>dbus-started</tt> is triggered, <tt>NetworkManager</tt> is started, since it may now use <tt>D-Bus</tt>, and so on.</p> <p>One could say that this way the actual logical dependency tree that exists and is understood by the admin or developer is translated and encoded into event and action rules: every logical "a needs b" rule that the administrator/developer is aware of becomes a "start a when b is started" plus "stop a when b is stopped". In some way this certainly is a simplification: especially for the code in Upstart itself. However I would argue that this simplification is actually detrimental. First of all, the logical dependency system does not go away, the person who is writing Upstart files must now translate the dependencies manually into these event/action rules (actually, two rules for each dependency). So, instead of letting the computer figure out what to do based on the dependencies, the user has to manually translate the dependencies into simple event/action rules. Also, because the dependency information has never been encoded it is not available at runtime, effectively meaning that an administrator who tries to figure out <i>why</i> something happened, i.e. why a is started when b is started, has no chance of finding that out.</p> <p>Furthermore, the event logic turns around all dependencies, from the feet onto their head. Instead of <i>minimizing</i> the amount of work (which is something that a good init system should focus on, as pointed out in the beginning of this blog story), it actually <i>maximizes</i> the amount of work to do during operations. 
Or in other words, instead of having a clear goal and only doing the things it really needs to do to reach the goal, it does one step, and then after finishing it, it does <b>all</b> steps that possibly could follow it.</p> <p>Or to put it more simply: the fact that the user just started D-Bus is in no way an indication that NetworkManager should be started too (but this is what Upstart would do). It's right the other way round: when the user asks for NetworkManager, that is definitely an indication that D-Bus should be started too (which is certainly what most users would expect, right?).</p> <p>A good init system should start only what is needed, and that on-demand. Either lazily or parallelized and in advance. However it should not start more than necessary, particularly not everything installed that could use that service.</p> <p>Finally, I fail to see the actual usefulness of the event logic. It appears to me that most events that are exposed in Upstart actually are not punctual in nature, but have duration: a service starts, is running, and stops. A device is plugged in, is available, and is unplugged again. A mount point is in the process of being mounted, is fully mounted, or is being unmounted. A power plug is plugged in, the system runs on AC, and the power plug is pulled. Only a minority of the events an init system or process supervisor should handle are actually punctual, most of them are tuples of start, condition, and stop. This information is again not available in Upstart, because it focuses on singular events, and ignores durable dependencies.</p> <p>Now, I am aware that some of the issues I pointed out above are in some way mitigated by certain more recent changes in Upstart, particularly condition based syntaxes such as <tt>start on (local-filesystems and net-device-up IFACE=lo)</tt> in Upstart rule files. 
However, to me this appears mostly as an attempt to fix a system whose core design is flawed.</p> <p>Besides that Upstart does OK for babysitting daemons, even though some choices might be questionable (see above), and there are certainly a lot of missed opportunities (see above, too).</p> <p>There are other init systems besides sysvinit, Upstart and launchd. Most of them offer little of substance beyond Upstart or sysvinit. The most interesting other contender is Solaris SMF, which supports proper dependencies between services. However, in many ways it is overly complex and, let's say, a bit <i>academic</i> with its excessive use of XML and new terminology for known things. It is also closely bound to Solaris specific features such as the <i>contract</i> system.</p> <h4>Putting it All Together: systemd</h4> <p>Well, this is another good time for a little pause, because after I have hopefully explained above what I think a good PID 1 should be doing and what the currently most used system does, we'll now come to where the beef is. So, go and refill your coffee mug again. It's going to be worth it.</p> <p>You probably guessed it: what I suggested above as requirements and features for an ideal init system is actually available now, in a (still experimental) init system called <tt>systemd</tt>, and which I hereby want to announce. <a href="http://git.0pointer.de/?p=systemd.git">Again, here's the code.</a> And here's a quick rundown of its features, and the rationale behind them:</p> <p>systemd starts up and supervises the entire system (hence the name...). It implements all of the features pointed out above and a few more. It is based around the notion of <i>units</i>. Units have a name and a type. Since their configuration is usually loaded directly from the file system, these unit names are actually file names. Example: a unit <tt>avahi.service</tt> is read from a configuration file by the same name, and of course could be a unit encapsulating the Avahi daemon. 
There are several kinds of units:</p> <ol> <li><tt>service</tt>: these are the most obvious kind of unit: daemons that can be started, stopped, restarted, reloaded. For compatibility with SysV we not only support our own configuration files for services, but also are able to read classic SysV init scripts, in particular we parse the LSB header, if it exists. <tt>/etc/init.d</tt> is hence not much more than just another source of configuration.</li> <li><tt>socket</tt>: this unit encapsulates a socket in the file-system or on the Internet. We currently support AF_INET, AF_INET6, AF_UNIX sockets of the types stream, datagram, and sequential packet. We also support classic FIFOs as transport. Each <tt>socket</tt> unit has a matching <tt>service</tt> unit, that is started when the first connection comes in on the socket or FIFO. Example: <tt>nscd.socket</tt> starts <tt>nscd.service</tt> on an incoming connection.</li> <li><tt>device</tt>: this unit encapsulates a device in the Linux device tree. If a device is marked for this via udev rules, it will be exposed as a <tt>device</tt> unit in systemd. Properties set with <tt>udev</tt> can be used as configuration source to set dependencies for device units.</li> <li><tt>mount</tt>: this unit encapsulates a mount point in the file system hierarchy. systemd monitors all mount points as they come and go, and can also be used to mount or unmount mount-points. <tt>/etc/fstab</tt> is used here as an additional configuration source for these mount points, similar to how SysV init scripts can be used as additional configuration source for <tt>service</tt> units.</li> <li><tt>automount</tt>: this unit type encapsulates an automount point in the file system hierarchy. Each <tt>automount</tt> unit has a matching <tt>mount</tt> unit, which is started (i.e. 
mounted) as soon as the automount directory is accessed.</li> <li><tt>target</tt>: this unit type is used for logical grouping of units: instead of actually doing anything by itself it simply references other units, which thereby can be controlled together. Examples of this are: <tt>multi-user.target</tt>, which is a target that basically plays the role of run-level 5 on a classic SysV system, or <tt>bluetooth.target</tt> which is requested as soon as a bluetooth dongle becomes available and which simply pulls in bluetooth related services that otherwise would not need to be started: <tt>bluetoothd</tt> and <tt>obexd</tt> and suchlike.</li> <li><tt>snapshot</tt>: similar to <tt>target</tt> units snapshots do not actually do anything themselves and their only purpose is to reference other units. Snapshots can be used to save/rollback the state of all services and units of the init system. Primarily it has two intended use cases: to allow the user to temporarily enter a specific state such as "Emergency Shell", terminating current services, and provide an easy way to return to the state before, pulling up all services again that got temporarily pulled down. And to ease support for system suspending: still many services cannot correctly deal with system suspend, and it is often a better idea to shut them down before suspend, and restore them afterwards.</li> </ol> <p>All these units can have dependencies between each other (both positive and negative, i.e. 'Requires' and 'Conflicts'): a device can have a dependency on a service, meaning that as soon as a device becomes available a certain service is started. Mounts get an implicit dependency on the device they are mounted from. Mounts also get implicit dependencies on mounts that are their prefixes (i.e. a mount <tt>/home/lennart</tt> implicitly gets a dependency added to the mount for <tt>/home</tt>) and so on. 
</p> <p>A short list of other features:</p> <ol> <li>For each process that is spawned, you may control: the environment, resource limits, working and root directory, umask, OOM killer adjustment, nice level, IO class and priority, CPU policy and priority, CPU affinity, timer slack, user id, group id, supplementary group ids, readable/writable/inaccessible directories, shared/private/slave mount flags, capabilities/bounding set, secure bits, CPU scheduler reset on fork, private <tt>/tmp</tt> name-space, cgroup control for various subsystems. Also, you can easily connect stdin/stdout/stderr of services to syslog, <tt>/dev/kmsg</tt>, arbitrary TTYs. If connected to a TTY for input systemd will make sure a process gets exclusive access, optionally waiting or enforcing it.</li> <li>Every executed process gets its own cgroup (currently by default in the debug subsystem, since that subsystem is not otherwise used and does not do much more than the most basic process grouping), and it is very easy to configure systemd to place services in cgroups that have been configured externally, for example via the libcgroups utilities.</li> <li>The native configuration files use a syntax that closely follows the well-known <tt>.desktop</tt> files. It is a simple syntax for which parsers exist already in many software frameworks. Also, this allows us to rely on existing tools for i18n for service descriptions, and similar. Administrators and developers don't need to learn a new syntax.</li> <li>As mentioned, we provide compatibility with SysV init scripts. We take advantage of LSB and Red Hat chkconfig headers if they are available. If they aren't we try to make the best of the otherwise available information, such as the start priorities in <tt>/etc/rc.d</tt>. These init scripts are simply considered a different source of configuration, hence an easy upgrade path to proper systemd services is available. 
Optionally we can read classic PID files for services to identify the main pid of a daemon. Note that we make use of the dependency information from the LSB init script headers, and translate those into native systemd dependencies. Side note: Upstart is unable to harvest and make use of that information. Boot-up on a plain Upstart system with mostly LSB SysV init scripts will hence not be parallelized, a similar system running systemd however will. In fact, for Upstart all SysV scripts together make one job that is executed, they are not treated individually, again in contrast to systemd where SysV init scripts are just another source of configuration and are all treated and controlled individually, much like any other native systemd service.</li> <li>Similarly, we read the existing <tt>/etc/fstab</tt> configuration file, and consider it just another source of configuration. Using the <tt>comment=</tt> fstab option you can even mark <tt>/etc/fstab</tt> entries to become <tt>systemd</tt> controlled automount points.</li> <li>If the same unit is configured in multiple configuration sources (e.g. <tt>/etc/systemd/system/avahi.service</tt> exists, and <tt>/etc/init.d/avahi</tt> too), then the native configuration will always take precedence, the legacy format is ignored, allowing an easy upgrade path and packages to carry both a SysV init script and a systemd service file for a while.</li> <li>We support a simple templating/instance mechanism. Example: instead of having six configuration files for six gettys, we only have one <tt>getty@.service</tt> file which gets instantiated to <tt>getty@tty2.service</tt> and suchlike. The interface part can even be inherited by dependency expressions, i.e. 
it is easy to encode that a service <tt>dhcpcd@eth0.service</tt> pulls in <tt>avahi-autoipd@eth0.service</tt>, while leaving the <tt>eth0</tt> string wild-carded.</li> <li>For socket activation we support full compatibility with the traditional inetd modes, as well as a very simple mode that tries to mimic launchd socket activation and is recommended for new services. The inetd mode only allows passing one socket to the started daemon, while the native mode supports passing arbitrary numbers of file descriptors. We also support one instance per connection, as well as one instance for all connections modes. In the former mode we name the cgroup the daemon will be started in after the connection parameters, and utilize the templating logic mentioned above for this. Example: <tt>sshd.socket</tt> might spawn services <tt>sshd@192.168.0.1-4711-192.168.0.2-22.service</tt> with a cgroup of <tt>sshd@.service/192.168.0.1-4711-192.168.0.2-22</tt> (i.e. the IP address and port numbers are used in the instance names. For AF_UNIX sockets we use PID and user id of the connecting client). This provides a nice way for the administrator to identify the various instances of a daemon and control their runtime individually. The native socket passing mode is very easily implementable in applications: if <tt>$LISTEN_FDS</tt> is set it contains the number of sockets passed and the daemon will find them sorted as listed in the <tt>.service</tt> file, starting from file descriptor 3 (a nicely written daemon could also use <tt>fstat()</tt> and <tt>getsockname()</tt> to identify the sockets in case it receives more than one). In addition we set <tt>$LISTEN_PID</tt> to the PID of the daemon that shall receive the fds, because environment variables are normally inherited by sub-processes and hence could confuse processes further down the chain. 
Even though this socket passing logic is very simple to implement in daemons, we will provide a BSD-licensed reference implementation that shows how to do this. We have ported a couple of existing daemons to this new scheme.</li> <li>We provide compatibility with <tt>/dev/initctl</tt> to a certain extent. This compatibility is in fact implemented with a FIFO-activated service, which simply translates these legacy requests to D-Bus requests. Effectively this means the old <tt>shutdown</tt>, <tt>poweroff</tt> and similar commands from Upstart and <tt>sysvinit</tt> continue to work with systemd.</li> <li>We also provide compatibility with <tt>utmp</tt> and <tt>wtmp</tt>. Possibly even to an extent that is far more than healthy, given how crufty <tt>utmp</tt> and <tt>wtmp</tt> are.</li> <li>systemd supports several kinds of dependencies between units. <tt>After</tt>/<tt>Before</tt> can be used to fix the ordering how units are activated. It is completely orthogonal to <tt>Requires</tt> and <tt>Wants</tt>, which express a positive requirement dependency, either mandatory, or optional. Then, there is <tt>Conflicts</tt> which expresses a negative requirement dependency. Finally, there are three further, less used dependency types.</li> <li>systemd has a minimal transaction system. Meaning: if a unit is requested to start up or shut down we will add it and all its dependencies to a temporary <i>transaction</i>. Then, we will verify if the transaction is consistent (i.e. whether the ordering via <tt>After</tt>/<tt>Before</tt> of all units is cycle-free). If it is not, systemd will try to fix it up, and remove non-essential jobs from the transaction that might remove the loop. Also, systemd tries to suppress non-essential jobs in the transaction that would stop a running service. Non-essential jobs are those which the original request did not directly include but which were pulled in by <tt>Wants</tt> type of dependencies. 
Finally we check whether the jobs of the transaction contradict jobs that have already been queued, and optionally the transaction is aborted then. If all worked out and the transaction is consistent and minimized in its impact it is merged with all already outstanding jobs and added to the run queue. Effectively this means that before executing a requested operation, we will verify that it makes sense, fixing it if possible, and only failing if it really cannot work.</li> <li>We record start/exit time as well as the PID and exit status of every process we spawn and supervise. This data can be used to cross-link daemons with their data in abrtd, auditd and syslog. Think of a UI that will highlight crashed daemons for you, and allow you to easily navigate to the respective UIs for syslog, abrt, and auditd that will show the data generated from and for this daemon on a specific run.</li> <li>We support reexecution of the init process itself at any time. The daemon state is serialized before the reexecution and deserialized afterwards. That way we provide a simple way to facilitate init system upgrades as well as handover from an initrd daemon to the final daemon. Open sockets and autofs mounts are properly serialized away, so that they stay connectible all the time, in a way that clients will not even notice that the init system reexecuted itself. Also, the fact that a big part of the service state is encoded anyway in the cgroup virtual file system would even allow us to resume execution without access to the serialization data. The reexecution code paths are actually mostly the same as the init system configuration reloading code paths, which guarantees that reexecution (which is probably more seldom triggered) gets similar testing as reloading (which is probably more common).</li> <li>Starting the work of removing shell scripts from the boot process we have recoded part of the basic system setup in C and moved it directly into systemd. 
Among that is mounting of the API file systems (i.e. virtual file systems such as <tt>/proc</tt>, <tt>/sys</tt> and <tt>/dev</tt>.) and setting of the host-name.</li> <li>Server state is introspectable and controllable via D-Bus. This is not complete yet but quite extensive.</li> <li>While we want to emphasize socket-based and bus-name-based activation, and we hence support dependencies between sockets and services, we also support traditional inter-service dependencies. We support multiple ways in which such a service can signal its readiness: by forking and having the start process exit (i.e. traditional <tt>daemonize()</tt> behaviour), as well as by watching the bus until a configured service name appears.</li> <li>There's an interactive mode which asks for confirmation each time a process is spawned by systemd. You may enable it by passing <tt>systemd.confirm_spawn=1</tt> on the kernel command line.</li> <li>With the <tt>systemd.default=</tt> kernel command line parameter you can specify which unit systemd should start on boot-up. Normally you'd specify something like <tt>multi-user.target</tt> here, but another choice could even be a single service instead of a target, for example out-of-the-box we ship a service <tt>emergency.service</tt> that is similar in its usefulness to <tt>init=/bin/bash</tt>, however has the advantage of actually running the init system, hence offering the option to boot up the full system from the emergency shell.</li> <li>There's a minimal UI that allows you to start/stop/introspect services. It's far from complete but useful as a debugging tool. It's written in Vala (yay!) and goes by the name of <tt>systemadm</tt>.</li> </ol> <p>It should be noted that systemd uses many Linux-specific features, and does not limit itself to POSIX. That unlocks a lot of functionality a system that is designed for portability to other operating systems cannot provide.</p> <h4>Status</h4> <p>All the features listed above are already implemented. 
Right now systemd can already be used as a drop-in replacement for Upstart and sysvinit (at least as long as there aren't too many native Upstart services yet. Thankfully most distributions don't carry too many native Upstart services yet.)</p> <p>However, testing has been minimal, our version number is currently at an impressive 0. Expect breakage if you run this in its current state. That said, overall it should be quite stable and some of us already boot their normal development systems with systemd (in contrast to VMs only). YMMV, especially if you try this on distributions we developers don't use.</p> <h4>Where is This Going?</h4> <p>The feature set described above is certainly already comprehensive. However, we have a few more things on our plate. I don't really like speaking too much about big plans but here's a short overview in which direction we will be pushing this:</p> <p>We want to add at least two more unit types: <tt>swap</tt> shall be used to control swap devices the same way we already control mounts, i.e. with automatic dependencies on the device tree devices they are activated from, and suchlike. <tt>timer</tt> shall provide functionality similar to <tt>cron</tt>, i.e. starts services based on time events, the focus being both monotonic clock and wall-clock/calendar events. (i.e. "start this 5h after it last ran" as well as "start this every Monday 5 am")</p> <p>More importantly however, it is also our plan to experiment with systemd not only for optimizing boot times, but also to make it the ideal session manager, to replace (or possibly just augment) <tt>gnome-session</tt>, <tt>kdeinit</tt> and similar daemons. The problem set of a session manager and an init system are very similar: quick start-up is essential and babysitting processes the focus. Using the same code for both uses hence suggests itself. Apple recognized that and does just that with launchd. 
And so should we: socket and bus based activation and parallelization is something session services and system services can benefit from equally.</p> <p>I should probably note that all three of these features are already partially available in the current code base, but not complete yet. For example, already, you can run systemd just fine as a normal user, and it will detect that it is run that way; support for this mode has been available since the very beginning, and is in the very core. (It is also exceptionally useful for debugging! This works fine even without having the system otherwise converted to systemd for booting.)</p> <p>However, there are some things we probably should fix in the kernel and elsewhere before finishing work on this: we need swap status change notifications from the kernel similar to how we can already subscribe to mount changes; we want a notification when CLOCK_REALTIME jumps relative to CLOCK_MONOTONIC; we want to allow <a href="http://lkml.org/lkml/2010/2/2/165">normal processes to get some init-like powers</a>; we need a <a href="http://lists.freedesktop.org/archives/xdg/2010-April/011446.html">well-defined place where we can put user sockets</a>. None of these issues are really essential for systemd, but they'd certainly improve things.</p> <h4>You Want to See This in Action?</h4> <p>Currently, there are no tarball releases, but it should be straightforward to check out the code <a href="http://git.0pointer.de/?p=systemd.git">from our repository</a>. In addition, to have something to start with, <a href="http://0pointer.de/public/etc-systemd-system.tar.gz">here's a tarball with unit configuration files</a> that allows an otherwise unmodified Fedora 13 system to work with systemd. We have no RPMs to offer you for now.</p> <p>An easier way is to download <a href="http://surfsite.org/f13-systemd-livecd.torrent">this Fedora 13 qemu image</a>, which has been prepared for systemd. 
In the grub menu you can select whether you want to boot the system with Upstart or systemd. Note that this system is minimally modified only. Service information is read exclusively from the existing SysV init scripts. Hence it will not take advantage of the full socket and bus-based parallelization pointed out above, however it will interpret the parallelization hints from the LSB headers, and hence boots faster than the Upstart system, which in Fedora does not employ any parallelization at the moment. The image is configured to output debug information on the serial console, as well as writing it to the kernel log buffer (which you may access with <tt>dmesg</tt>). You might want to run <tt>qemu</tt> configured with a virtual serial terminal. All passwords are set to <tt>systemd</tt>.</p> <p>Even simpler than downloading and booting the qemu image is looking at pretty screen-shots. Since an init system usually is well hidden beneath the user interface, some shots of <tt>systemadm</tt> and <tt>ps</tt> must do:</p> <p><img src="http://0pointer.de/public/systemadm.png" width="1057" height="881" alt="systemadm" /></p> <p>That's systemadm showing all loaded units, with more detailed information on one of the getty instances.</p> <p><img src="http://0pointer.de/public/pscgroups.png" width="1057" height="881" alt="ps" /></p> <p>That's an excerpt of the output of <tt>ps xaf -eo pid,user,args,cgroup</tt> showing how neatly the processes are sorted into the cgroup of their service. (The fourth column is the cgroup, the <tt>debug:</tt> prefix is shown because we use the debug cgroup controller for systemd, as mentioned earlier. This is only temporary.)</p> <p>Note that both of these screenshots show an only minimally modified Fedora 13 Live CD installation, where services are exclusively loaded from the existing SysV init scripts. 
Hence, this does not use socket or bus activation for any existing service.</p> <p>Sorry, no bootcharts or hard data on start-up times for the moment. We'll publish that as soon as we have fully parallelized all services from the default Fedora install. Then, we'll welcome you to benchmark the systemd approach, and provide our own benchmark data as well.</p> <p>Well, presumably everybody will keep bugging me about this, so here are two numbers I'll tell you. However, they are completely unscientific as they are measured for a VM (single CPU) and by using the stop timer on my watch. Fedora 13 booting up with Upstart takes 27s, with systemd we reach 24s (from grub to gdm, same system, same settings, shorter value of two bootups, one immediately following the other). Note however that this shows nothing more than the speedup effect reached by using the LSB dependency information parsed from the init script headers for parallelization. Socket or bus based activation was not utilized for this, and hence these numbers are unsuitable to assess the ideas pointed out above. Also, systemd was set to debug verbosity levels on a serial console. So again, this benchmark data has barely any value.</p> <h4>Writing Daemons</h4> <p>An ideal daemon for use with systemd does a few things differently than they were traditionally done. Later on, we will publish a longer guide explaining and suggesting how to write a daemon for use with systemd. Basically, things get simpler for daemon developers:</p> <ul> <li>We ask daemon writers not to fork or even double fork in their processes, but run their event loop from the initial process systemd starts for you. Also, don't call <tt>setsid()</tt>.</li> <li>Don't drop user privileges in the daemon itself; leave this to systemd and configure it in systemd service configuration files. (There are exceptions here.
For example, for some daemons there are good reasons to drop privileges inside the daemon code, after an initialization phase that requires elevated privileges.)</li> <li>Don't write PID files.</li> <li>Grab a name on the bus.</li> <li>You may rely on systemd for logging; you are welcome to log whatever you need to log to stderr.</li> <li>Let systemd create and watch sockets for you, so that socket activation works. Hence, interpret <tt>$LISTEN_FDS</tt> and <tt>$LISTEN_PID</tt> as described above.</li> <li>Use SIGTERM for requesting shutdowns from your daemon.</li> </ul> <p>The list above is very similar to what <a href="http://developer.apple.com/mac/library/documentation/MacOSX/Conceptual/BPSystemStartup/Articles/LaunchOnDemandDaemons.html">Apple recommends for daemons compatible with launchd</a>. It should be easy to extend daemons that already support launchd activation to support systemd activation as well.</p> <p>Note that systemd supports daemons not written in this style perfectly well too, if only for compatibility reasons (launchd has only limited support for that). As mentioned, this even extends to existing inetd capable daemons which can be used unmodified for socket activation by systemd.</p> <p>So, yes, should systemd prove itself in our experiments and get adopted by the distributions it would make sense to port at least those services that are started by default to use socket or bus-based activation. <a href="http://people.freedesktop.org/~kay/LISTEN_FDS/">We have written proof-of-concept patches</a>, and the porting turned out to be very easy. Also, we can leverage the work that has already been done for launchd, to a certain extent. Moreover, adding support for socket-based activation does not make the service incompatible with non-systemd systems.</p> <h4 id="faqs">FAQs</h4> <dl> <dt>Who's behind this?</dt> <dd>Well, the current code-base is mostly my work, Lennart Poettering (Red Hat).
However, the design in all its details is the result of close cooperation between Kay Sievers (Novell) and me. Other people involved are Harald Hoyer (Red Hat), Dhaval Giani (formerly IBM), and a few others from various companies such as Intel, SUSE and Nokia.</dd> <dt>Is this a Red Hat project?</dt> <dd>No, this is my personal side project. Also, let me emphasize this: <i>the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.</i></dd> <dt>Will this come to Fedora?</dt> <dd>If our experiments prove that this approach works out, and discussions in the Fedora community show support for this, then yes, we'll certainly try to get this into Fedora.</dd> <dt>Will this come to OpenSUSE?</dt> <dd>Kay's pursuing that, so much the same as for Fedora applies here, too.</dd> <dt>Will this come to Debian/Gentoo/Mandriva/MeeGo/Ubuntu/[insert your favourite distro here]?</dt> <dd>That's up to them. We'd certainly welcome their interest, and help with the integration.</dd> <dt>Why didn't you just add this to Upstart, why did you invent something new?</dt> <dd>Well, the point of the part about Upstart above was to show that the core design of Upstart is flawed, in our opinion. Starting completely from scratch suggests itself if the existing solution appears flawed in its core. However, note that we took a lot of inspiration from Upstart's code-base otherwise.</dd> <dt>If you love Apple launchd so much, why not adopt that?</dt> <dd>launchd is a great invention, but I am not convinced that it would fit well into Linux, nor that it is suitable for a system like Linux with its immense scalability and flexibility to numerous purposes and uses.</dd> <dt>Is this an <a href="http://en.wikipedia.org/wiki/Not_Invented_Here">NIH</a> project?</dt> <dd>Well, I hope that I managed to explain in the text above why we came up with something new, instead of building on Upstart or launchd.
We came up with systemd due to technical reasons, not political reasons.</dd> <dd>Don't forget that it is Upstart that includes <a href="https://launchpad.net/libnih">a library called NIH</a> (which is kind of a reimplementation of glib) -- not systemd!</dd> <dt>Will this run on [insert non-Linux OS here]?</dt> <dd>Unlikely. As pointed out, systemd uses many Linux specific APIs (such as epoll, signalfd, libudev, cgroups, and numerous more), so a port to other operating systems does not appear to us to make a lot of sense. Also, we, the people involved, are unlikely to be interested in merging possible ports to other platforms and working with the constraints this introduces. That said, git supports branches and rebasing quite well, in case people really want to do a port.</dd> <dd>Actually portability is even more limited than just to other OSes: we require a very recent Linux kernel, glibc, libcgroup and libudev. No support for less-than-current Linux systems, sorry.</dd> <dd>If folks want to implement something similar for other operating systems, the preferred mode of cooperation is probably that we help you identify which interfaces can be shared with your system, to make it easier for daemon writers to support both systemd and your systemd counterpart. Probably, the focus should be to share interfaces, not code.</dd> <dt>I hear [fill one in here: the Gentoo boot system, initng, Solaris SMF, runit, uxlaunch, ...] is an awesome init system and also does parallel boot-up, so why not adopt that?</dt> <dd>Well, before we started this we actually had a very close look at the various systems, and none of them did what we had in mind for systemd (with the exception of launchd, of course). If you cannot see that, then please read again what I wrote above.</dd> <!-- <dt>First you <a href="http://pulseaudio.org/">break my audio</a>, and now you want to corrupt my boot?</dt> <dd>Yes.
And don't forget that I am also responsible for <a href="http://avahi.org/">crucifying your network</a>. I am coming after you! Muhahahaha!</dd>--> </dl> <h4 id="contributions">Contributions</h4> <p>We are very interested in patches and help. It should be common sense that every Free Software project can only benefit from the widest possible external contributions. That is particularly true for a core part of the OS, such as an init system. We value your contributions and hence do not require copyright assignment (<a href="http://www.ebb.org/bkuhn/blog/2010/02/01/copyright-not-all-equal.html">Very much unlike Canonical/Upstart</a>!). And also, we use git, everybody's favourite VCS, yay!</p> <p>We are particularly interested in help getting systemd to work on other distributions, besides Fedora and OpenSUSE. (Hey, anybody from Debian, Gentoo, Mandriva, MeeGo looking for something to do?) But even beyond that we are keen to attract contributors on every level: we welcome C hackers, packagers, as well as folks who are interested in writing documentation or contributing a logo.</p> <h4 id="community">Community</h4> <p>At this time we only have a <a href="http://git.0pointer.de/?p=systemd.git">source code repository</a> and an IRC channel (<tt>#systemd</tt> on Freenode). There's no mailing list, web site or bug tracking system. We'll probably set something up on freedesktop.org soon.
If you have any questions or want to contact us otherwise, we invite you to join us on IRC!</p> <p><b>Update: <a href="http://0pointer.de/blog/projects/systemd-website.html">our GIT repository has moved.</a></b></p> Lennart PoetteringFri, 30 Apr 2010 10:46:00 +0200tag:0pointer.net,2010-04-30:/blog/projects/systemd.htmlprojectsA Few Notes on Bloom Filtershttps://0pointer.net/blog/projects/bloom.html <p>For future reference (mostly for myself), here's a little summary of how to use <a href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</a> in real world applications.</p> <p>Most references are terse and vague on how to pick the hash functions for bloom filters, so here's some detail about that: For small filters, just use a boring and fast hash function like the <a href="http://www.google.com/codesearch?q=djb+hash+function&amp;hl=en">djb hash function</a> and split up the 32-bit result into smaller independent chunks for each of the k hash indexes you'll need. Often those 32 bits already provide enough hash bits to get enough independent bloom filter indexes. And if they don't you basically have three options:</p> <ul> <li>Use multiple different hash functions. For that, <a href="http://sites.google.com/site/murmurhash/">MurmurHash</a> seems to be a very good choice. It's simple, readily usable code (even in C, though the reference implementation claims to be C++), and properly licensed. It is a hash function that takes a seed parameter which can be used to create as many independent hash functions as needed.</li> <li>Use a cryptographic hash function. Most of them can be implemented really fast on modern CPUs and are already available in some library you use anyway. SHA512 for example outputs plenty of bits you can split into k chunks as you need them for your k bloom filter indexes.
(Of course, if you are afraid of US export regulations this might be a choice you want to avoid.)</li> <li>Use two independent hash functions and combine <a href="http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf">them linearly</a>.</li> </ul> <p>The size of the bloom filter and the number of hash functions you should be using depending on your application can be calculated using the formulas on the Wikipedia page:</p> <p><tt>m = -n*ln(p)/(ln(2)^2)</tt></p> <p>This will tell you the number of bits m to use for your filter, given the number n of elements in your filter and the false positive probability p you want to achieve. All that for the ideal number of hash functions k which you can calculate like this:</p> <p><tt>k = 0.7*m/n</tt></p> <p>And that's already everything you need to know to build good bloom filters. If you know the p and n for your use case the above will tell you the m and k, and how to choose the k hash functions.</p> <p>Bloom filters are a really really useful tool, and given their simplicity something every developer should be aware of.</p> <p>(And in case you were wondering what this all is about, Kay Sievers and I were discussing using bloom filters in the libudev netlink BSD socket filters, to allow monitoring a certain subset of devices that is orthogonal to the usual subsystem hierarchy, and all that in a way where the number of wakeups in listening clients is minimized)</p> Lennart PoetteringTue, 20 Apr 2010 22:01:00 +0200tag:0pointer.net,2010-04-20:/blog/projects/bloom.htmlprojectsDown the Amazon IIhttps://0pointer.net/blog/photos/amazon2.html <p>As a followup to <a href="http://0pointer.de/blog/photos/amazon.html">this blog story</a> here are a couple of non-panorama shots from the trip:</p> <p> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=206"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-206.jpg" alt="Image 206" width="120" height="80" /></a> <a 
href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=144"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-144.jpg" alt="Image 144" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=142"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-142.jpg" alt="Image 142" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=113"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-113.jpg" alt="Image 113" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=96"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-96.jpg" alt="Image 96" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=897"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-897.jpg" alt="Image 897" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=91"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-91.jpg" alt="Image 91" width="120" height="80" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=751"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-751.jpg" alt="Image 751" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=112"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-112.jpg" alt="Image 112" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=72"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-72.jpg" alt="Image 72" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=131"><img 
src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-131.jpg" alt="Image 131" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=233"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-233.jpg" alt="Image 233" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=488"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-488.jpg" alt="Image 488" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=249"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-249.jpg" alt="Image 249" width="120" height="80" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=272"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-272.jpg" alt="Image 272" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=356"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-356.jpg" alt="Image 356" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=393"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-393.jpg" alt="Image 393" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=234"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-234.jpg" alt="Image 234" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=435"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-435.jpg" alt="Image 435" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=450"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-450.jpg" alt="Image 450" width="120" height="80" /></a> <a 
href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=485"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-485.jpg" alt="Image 485" width="120" height="80" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=60"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-60.jpg" alt="Image 60" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=502"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-502.jpg" alt="Image 502" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=753"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-753.jpg" alt="Image 753" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=822"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-822.jpg" alt="Image 822" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=951"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-951.jpg" alt="Image 951" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=960"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-960.jpg" alt="Image 960" width="120" height="80" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=85"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-85.jpg" alt="Image 85" width="120" height="80" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=199"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-199.jpg" alt="Image 199" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=653"><img 
src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-653.jpg" alt="Image 653" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=194"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-194.jpg" alt="Image 194" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=164"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-164.jpg" alt="Image 164" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=89"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-89.jpg" alt="Image 89" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=231"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-231.jpg" alt="Image 231" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=240"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-240.jpg" alt="Image 240" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=263"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-263.jpg" alt="Image 263" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=685"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-685.jpg" alt="Image 685" width="80" height="120" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=331"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-331.jpg" alt="Image 331" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=334"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-334.jpg" alt="Image 334" width="80" height="120" /></a> <a 
href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=337"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-337.jpg" alt="Image 337" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=389"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-389.jpg" alt="Image 389" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=537"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-537.jpg" alt="Image 537" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=570"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-570.jpg" alt="Image 570" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=582"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-582.jpg" alt="Image 582" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=197"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-197.jpg" alt="Image 197" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=655"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-655.jpg" alt="Image 655" width="80" height="120" /></a> </p> <p> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=660"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-660.jpg" alt="Image 660" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=108"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-108.jpg" alt="Image 108" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=697"><img 
src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-697.jpg" alt="Image 697" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=710"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-710.jpg" alt="Image 710" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=747"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-747.jpg" alt="Image 747" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=705"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-705.jpg" alt="Image 705" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=776"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-776.jpg" alt="Image 776" width="80" height="120" /></a> <a href="http://0pointer.de/photos/?gallery=Amazon%202010-03&amp;photo=832"><img src="http://0pointer.de/photos/galleries/Amazon%202010-03/thumbs/img-832.jpg" alt="Image 832" width="80" height="120" /></a> </p> Lennart PoetteringSun, 04 Apr 2010 00:59:00 +0200tag:0pointer.net,2010-04-04:/blog/photos/amazon2.htmlphotosDown the Amazonhttps://0pointer.net/blog/photos/amazon.html <p>After <a href="http://0pointer.de/blog/projects/bossa2010.html">BOSSA in Manaus/Brazil</a> we took a very enjoyable boat trip down the Amazon, to Santar&eacute;m and particularly Alter do Ch&atilde;o, a ridiculously amazing island paradise with glaring white sand in the middle of the jungle:</p> <p><a href="http://0pointer.de/static/tapajos2"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/tapajos2-gimped-small.jpeg" width="1024" height="188" alt="Tapajos 2" /></a></p> <p>The town is located on the Tapaj&oacute;s River:</p> <p><a 
href="http://0pointer.de/static/tapajos1"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/tapajos1-gimped-small.jpeg" width="1024" height="155" alt="Tapajos 1" /></a></p> <p><a href="http://0pointer.de/static/tapajos3"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/tapajos3-gimped-small.jpeg" width="1024" height="184" alt="Tapajos 3" /></a></p> <p>Up the river you find the Tapaj&oacute;s National Forest:</p> <p><a href="http://0pointer.de/static/tapajos4"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/tapajos4-gimped-small.jpeg" width="1024" height="231" alt="Tapajos 4" /></a></p> <p>From there we went on to S&atilde;o Lu&iacute;s, a beautiful old colonial town:</p> <p><a href="http://0pointer.de/static/saoluis1"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/saoluis1-gimped-small.jpeg" width="1024" height="168" alt="Sao Luis 1" /></a></p> <p><a href="http://0pointer.de/static/saoluis3"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/saoluis3-gimped-small.jpeg" width="1024" height="255" alt="Sao Luis 3" /></a></p> <p><a href="http://0pointer.de/static/saoluis4"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/saoluis4-gimped-small.jpeg" width="1024" height="172" alt="Sao Luis 4" /></a></p> <p><a href="http://0pointer.de/static/saoluis2"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" 
src="http://0pointer.de/static/saoluis2-gimped-small.jpeg" width="1024" height="288" alt="Sao Luis 2" /></a></p> <p>A windy and wet sailing catamaran ride away from S&atilde;o Lu&iacute;s you find Alc&acirc;ntara, another old colonial town, now partly in ruins and deserted:</p> <p><a href="http://0pointer.de/static/alcantara1"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/alcantara1-gimped-small.jpeg" width="1024" height="159" alt="Alcantara 1" /></a></p> <p><a href="http://0pointer.de/static/alcantara2"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/alcantara2-gimped-small.jpeg" width="1024" height="186" alt="Alcantara 2" /></a></p> <p><a href="http://0pointer.de/static/alcantara3"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/alcantara3-gimped-small.jpeg" width="1024" height="151" alt="Alcantara 3" /></a></p> Lennart PoetteringSat, 03 Apr 2010 22:25:00 +0200tag:0pointer.net,2010-04-03:/blog/photos/amazon.htmlphotosPublic Service Announcement: Beware of rsvg_term()!https://0pointer.net/blog/projects/beware-of-rsvg-term.html <p>As a short followup on <a href="http://0pointer.de/blog/projects/beware-of-xmlCleanupParser">an older blog posting of mine</a>:</p> <p>So you are using librsvg's <tt>rsvg_term()</tt> in your code? If so then you are probably misusing it and triggering crashes in PulseAudio related code. The same way everybody should stop using libxml2's <tt>xmlCleanupParser()</tt> call, stop using <tt>rsvg_term()</tt>! It's really hard to use it correctly, and unneeded anyway.
<a href="https://bugzilla.gnome.org/show_bug.cgi?id=592100">Also see this bug report.</a></p> Lennart PoetteringTue, 23 Mar 2010 21:29:00 +0100tag:0pointer.net,2010-03-23:/blog/projects/beware-of-rsvg-term.htmlprojectsBossa 2010/Manaus Slideshttps://0pointer.net/blog/projects/bossa2010.html <p>The slides for my talk about the audio infrastructure of Linux mobile devices at <a href="http://bossaconference.indt.org/">BOSSA 2010</a> in Manaus/Brazil <a href="http://0pointer.de/public/pulse-bossa2010.pdf">are now available online</a>. They are terse (as usual), and the most interesting stuff is probably in what I said, and not so much in what I wrote in those slides. But nonetheless I believe this might still be quite interesting for attendees as well as non-attendees.</p> <p>The talk focuses on the audio architecture of the Nokia N900 and the Palm Pre, and of course particularly their use of <a href="http://pulseaudio.org">PulseAudio</a> for all things audio. I analyzed and compared their patch sets to figure out what their priorities are, what we should move into PulseAudio mainline, and what should better be left in their private patch sets.</p> Lennart PoetteringTue, 09 Mar 2010 19:04:00 +0100tag:0pointer.net,2010-03-09:/blog/projects/bossa2010.htmlprojectsMeasure Your Sound Card!https://0pointer.net/blog/projects/decibel-data.html <p>In recent versions <a href="http://pulseaudio.org/">PulseAudio</a> integrates the <a href="http://people.redhat.com/alexl/files/why-alsa-sucks.png">numerous mixer elements ALSA exposes</a> into one single powerful slider which tries to make the best of the granularity and range of the hardware and extends that in software so that we can provide an equally powerful slider on all systems.
That means if your hardware only supports a limited volume range (many integrated USB speakers for example cannot be completely muted with the hardware volume slider), limited granularity (some hardware sliders only have 8 steps or so), or no per-channel volumes (many sound cards have a single slider that covers all channels), then PulseAudio tries its best to make use of the next hardware volume slider in the pipeline to compensate for that, and so on, finally falling back to software for everything that cannot be done in hardware. <a href="http://pulseaudio.org/wiki/PulseAudioStoleMyVolumes">This is explained in more detail here.</a></p> <p>Now this algorithm depends on us knowing the actual attenuation factors (factors like that are usually written in units of dB which is why I will call this the "dB data" from now on) of the hardware volume controls. Thankfully ALSA includes that information in its driver interfaces. However, for some hardware this data is not reliable. For example, one of my own cards (a Terratec Aureon 5.1 MkII USB) contains invalid dB data in its USB descriptor and ALSA passes that on to PulseAudio. The effect of that is that the PulseAudio volume control behaves very weirdly for this card: the volume "jumps" and changes in unexpected ways (or doesn't change at all in some ranges!) when you slowly move the slider, or the volume is completely muted over large ranges of the slider where it should not be. Also this breaks the <i>flat volume</i> logic in PulseAudio, with the result that playing one stream (let's say a music stream) and then adding a second one (let's say an event sound) might incorrectly attenuate the first one (i.e. whenever you play an event sound the music changes in volume).</p> <p>Incorrect dB data is not a new problem. However, PulseAudio is the first application that actually depends on the correctness of this data.
Previously the dB info was shown as auxiliary information in some volume controls, and only noticed and understood by very few, technical people. It was not used for further calculations.</p> <p>Now, the reasons I am writing this blog posting are firstly to inform you about this type of bug and the results it has on the logic PulseAudio implements, and secondly (and more importantly) to point you to <a href="http://pulseaudio.org/wiki/BadDecibel">this little Wiki page</a> I wrote that explains how to verify if this is indeed a problem on your card (in case you are experiencing any of the symptoms mentioned above), what to do to improve the situation, and how to get correct dB data that can be included as a quirk in your driver.</p> <p>Thank you for your attention.</p> Lennart PoetteringWed, 24 Feb 2010 01:49:00 +0100tag:0pointer.net,2010-02-24:/blog/projects/decibel-data.htmlprojectsHorizontal Panoramas Are So 2009!https://0pointer.net/blog/photos/brussels-cathedral.html <p>Horizontal panoramas are so 2009 -- which is why I now give you the <i>vertical panorama</i>:</p> <p><a href="http://0pointer.de/static/cathedral"><img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/cathedral-gimped-small.jpeg" width="1024" height="266" alt="Brussels Cathedral" /></a></p> <p>Now if I wasn't too stupid to hold my camera steady shooting upwards, this could actually have been a really good picture.</p> Lennart PoetteringSun, 21 Feb 2010 02:31:00 +0100tag:0pointer.net,2010-02-21:/blog/photos/brussels-cathedral.htmlphotosSpeaker Setuphttps://0pointer.net/blog/projects/speaker-setup.html <p>While tracking down some surround sound related bugs I was missing a speaker setup and testing utility. 
So I decided to do something about it and I present you <a href="http://git.0pointer.de/?p=gnome-speaker-setup.git">gnome-speaker-setup</a>:</p> <img src="http://0pointer.de/public/gnome-speaker-setup.png" width="729" height="736" alt="gnome-speaker-setup" /> <p>The tool should be very robust and even deal with the weirdest channel mappings. OTOH the artwork is not really good and appropriate. But I hope it still shows some resemblance to <a href="http://people.fedoraproject.org/~hadess/gnome-volume-control/multi-speaker/drivers1.gif">other</a> <a href="http://people.fedoraproject.org/~hadess/gnome-volume-control/multi-speaker/3.jpg">UIs</a> of this type. If you are an artist and want to contribute better artwork make sure to go through the <a href="http://live.gnome.org/GnomeArt/ArtRequests/">Gnome Art Requests</a> page, and more specifically <a href="http://live.gnome.org/GnomeArt/ArtRequests/issue22">this particular request</a>.</p> <p>This (or something like it) will hopefully and eventually end up in some way or another in gnome-media. 
Until that day comes I'll maintain this tool independently.</p> <p>To compile this you need a recent <a href="http://live.gnome.org/Vala">Vala</a> and <a href="http://0pointer.de/lennart/projects/libcanberra/">libcanberra 0.23</a>.</p> Lennart PoetteringSun, 21 Feb 2010 00:58:00 +0100tag:0pointer.net,2010-02-21:/blog/projects/speaker-setup.htmlprojectsIndia, 360 Degrees at a Time, Part Sevenhttps://0pointer.net/blog/photos/india-360-at-a-time-7.html <p>Here's the seventh and <a href="http://0pointer.de/blog/photos/india-360-at-a-time-1.html">final</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-2.html">part</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-3.html">of</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-4.html">my</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-5.html">ongoing</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-6.html">series</a>.</p> <p>One of the grandest sights in Delhi is <a href="http://en.wikipedia.org/wiki/Humayun%27s_Tomb">Humayun's tomb</a>, a predecessor of the greatest mausoleum of them all, the <a href="http://0pointer.de/static/tajmahal2.html">Taj Mahal</a>:</p> <p><a href="http://0pointer.de/static/delhi3"><img alt="Humayun's Tomb" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/delhi3-gimped-small.jpeg" width="1024" height="174" /></a></p> <p>A little bit further down a view on the garden:</p> <p><a href="http://0pointer.de/static/delhi4"><img alt="Humayun's Tomb" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/delhi4-gimped-small.jpeg" width="1024" height="177" /></a></p> <p>From a different corner:</p> <p><a href="http://0pointer.de/static/delhi2"><img alt="Humayun's Tomb" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; 
-moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/delhi2-gimped-small.jpeg" width="1024" height="159" /></a></p> <p>We'll finish with our last panorama that shows the courtyard of the <a href="http://en.wikipedia.org/wiki/Jama_Masjid,_Delhi">Jama Masjid</a> of Old Delhi:</p> <p><a href="http://0pointer.de/static/delhi5"><img alt="Jama Masjid" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/delhi5-gimped-small.jpeg" width="1024" height="183" /></a></p> <p>That's all the panoramas from this trip. Thanks for your interest.</p> Lennart PoetteringTue, 19 Jan 2010 21:43:00 +0100tag:0pointer.net,2010-01-19:/blog/photos/india-360-at-a-time-7.htmlphotosIndia, 360 Degrees at a Time, Part Sixhttps://0pointer.net/blog/photos/india-360-at-a-time-6.html <p>Here's the sixth <a href="http://0pointer.de/blog/photos/india-360-at-a-time-1.html">part</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-2.html">of</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-3.html">my</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-4.html">ongoing</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-5.html">series</a>.</p> <p>Leaving Jodhpur we continued our journey to <a href="http://en.wikipedia.org/wiki/Jaisalmer">Jaisalmer</a>, a sand castle of a town in the <a href="http://en.wikipedia.org/wiki/Thar_Desert">Thar desert</a>:</p> <p><a href="http://0pointer.de/static/jaisalmer2"><img alt="Jaisalmer" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/jaisalmer2-gimped-small.jpeg" width="1024" height="213" /></a></p> <p>In the vicinity of Jaisalmer you'll find cliche sand dunes like you'd expect from a grown-up desert:</p> <p><a href="http://0pointer.de/static/jaisalmer1"><img alt="Jaisalmer" style="border: 10px solid #232729; 
background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/jaisalmer1-gimped-small.jpeg" width="1024" height="194" /></a></p> <p>Our next station after a long, cold and dusty train ride was <a href="http://en.wikipedia.org/wiki/Delhi">Delhi</a>. The principal mosque of Old Delhi is the <a href="http://en.wikipedia.org/wiki/Jama_Masjid,_Delhi">Jama Masjid</a>:</p> <p><a href="http://0pointer.de/static/delhi1"><img alt="Jama Masjid" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/delhi1-gimped-small.jpeg" width="1024" height="183" /></a></p> <p>That's all for now, tomorrow I'll post the rest of my panoramas from this trip, all from Delhi.</p> Lennart PoetteringMon, 18 Jan 2010 22:14:00 +0100tag:0pointer.net,2010-01-18:/blog/photos/india-360-at-a-time-6.htmlphotosIndia, 360 Degrees at a Time, Part Fivehttps://0pointer.net/blog/photos/india-360-at-a-time-5.html <p>Here's the fifth part <a href="http://0pointer.de/blog/photos/india-360-at-a-time-1.html">of</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-2.html">my</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-3.html">ongoing</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-4.html">series</a>.</p> <p>After Udaipur the next stop on our trip was <a href="http://en.wikipedia.org/wiki/Jodhpur">Jodhpur</a>, the blue city. 
Which is called that way because of the blue colour of many of its houses:</p> <p><a href="http://0pointer.de/static/jodhpur2"><img alt="Jodhpur" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/jodhpur2-gimped-small.jpeg" width="1024" height="159" /></a></p> <p>On a hill next to <a href="http://en.wikipedia.org/wiki/Mehrangarh_Fort">Mehrangarh Fort</a>, one of the biggest Forts in India (the big sand castle on the hill in the panorama above), you find the <a href="http://en.wikipedia.org/wiki/Jaswant_Thada">Jaswant Thada</a>, a memorial of the Maharajas of Jodhpur:</p> <p><a href="http://0pointer.de/static/jodhpur1"><img alt="Jodhpur" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/jodhpur1-gimped-small.jpeg" width="1024" height="235" /></a></p> <p>Inside the fort you'll find highly decorated courtyards:</p> <p><a href="http://0pointer.de/static/jodhpur3"><img alt="Jodhpur" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/jodhpur3-gimped-small.jpeg" width="1024" height="247" /></a></p> <p>That's all for Jodhpur, tomorrow I'll post more panoramas, from other stops of our trip.</p> Lennart PoetteringSun, 17 Jan 2010 18:43:00 +0100tag:0pointer.net,2010-01-17:/blog/photos/india-360-at-a-time-5.htmlphotosIndia, 360 Degrees at a Time, Part Fourhttps://0pointer.net/blog/photos/india-360-at-a-time-4.html <p>Here's the fourth part of <a href="http://0pointer.de/blog/photos/india-360-at-a-time-1.html">my</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-2.html">ongoing</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-3.html">series</a>.</p> <p>After Hampi we went to Bangalore to attend <a href="http://foss.in/">foss.in</a>. (Fantastic conference, btw. 
The concerts at the venue are unparalleled.) From there we flew up to <a href="http://en.wikipedia.org/wiki/Udaipur">Udaipur</a>, in Rajasthan. Udaipur is (among other things) famous for being the place where the central scenes of <a href="http://en.wikipedia.org/wiki/Octopussy">Octopussy</a> were filmed. Octopussy's famous white palace is on Jagniwas Island in Lake Pichola:</p> <p><a href="http://0pointer.de/static/udaipur1"><img alt="Udaipur" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/udaipur1-gimped-small.jpeg" width="1024" height="70" /></a></p> <p>This panorama was taken from another island in the lake, Jagmandir Island, which is visible in the following shot on the left:</p> <p><a href="http://0pointer.de/static/udaipur2"><img alt="Udaipur" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/udaipur2-gimped-small.jpeg" width="1024" height="150" /></a></p> <p>Udaipur's scenery, seen from the Maharaja's City Palace down onto Pichola Lake:</p> <p><a href="http://0pointer.de/static/udaipur3"><img alt="Udaipur" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/udaipur3-gimped-small.jpeg" width="1024" height="175" /></a></p> <p>That's all for Udaipur, tomorrow I'll post more panoramas, from other stops of our trip.</p> Lennart PoetteringSat, 16 Jan 2010 03:10:00 +0100tag:0pointer.net,2010-01-16:/blog/photos/india-360-at-a-time-4.htmlphotosAnnouncing udev-browsehttps://0pointer.net/blog/projects/udev-browse.html <p>It's easy to get lost in <tt>/sys</tt> and not much fun typing long <tt>udevadm info</tt> command lines all the time. Today, when I had enough of that I sat down and spent an hour to write a little UI for exploring the udev/sysfs tree: <tt>udev-browse</tt>. 
I wrote it for my own use, but I am quite sure I am not the only one who wants a little bit simpler access to the device tree. <a href="http://git.0pointer.de/?p=udev-browse.git">So here you go.</a></p> <p>And since everybody loves screenshots here you go:</p> <p><a href="http://0pointer.de/public/udev-browse"><img src="http://0pointer.de/public/udev-browse" width="931" height="728" alt="udev-browse" style="border: 0px" /></a></p> <p>Two usability hints: if you run <tt>udev-browse</tt> from a directory in <tt>/sys</tt> <tt>udev-browse</tt> will automatically present the device of that path on startup. And if you know the name of a device you can just type it into the device listbox (which is focussed by default). The usual Gtk+ live search will then find you the right entry right away. It's pretty nifty.</p> <p>It's written in Vala with minimal dependencies.</p> <p>I want to keep the maintainership burden for this minimal. So no tarballs, no releases, and I won't reply to your emails regarding this tool, unless they include a good, clean, git formatted patch. Thank you for your understanding.</p> <p>Anyone want to package this for Fedora? 
I'd be very thankful if someone would pick it up.</p> <p>Have fun.</p> Lennart PoetteringSat, 16 Jan 2010 02:19:00 +0100tag:0pointer.net,2010-01-16:/blog/projects/udev-browse.htmlprojectsIndia, 360 Degrees at a Time, Part Threehttps://0pointer.net/blog/photos/india-360-at-a-time-3.html <p>Here's the third part of my <a href="http://0pointer.de/blog/photos/india-360-at-a-time-1.html">ongoing</a> <a href="http://0pointer.de/blog/photos/india-360-at-a-time-2.html">series</a>.</p> <p>Still in Hampi here's another 360 from the Hills in Hampi down to the Achyutaraya Temple:</p> <p><a href="http://0pointer.de/static/hampi5"><img alt="Matanga Hill" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/hampi5-gimped-small.jpeg" width="1024" height="234" /></a></p> <p>A little further down, before dawn, here's a shot from the rocky path leading up the hill:</p> <p><a href="http://0pointer.de/static/hampi6"><img alt="Matanga Hill" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/hampi6-gimped-small.jpeg" width="1024" height="250" /></a></p> <p>Our last picture for today is a view down from Hemakuta Hill which is covered with old temples and other structures. In the middle you'll see the large <a href="http://en.wikipedia.org/wiki/Virupaksha_Temple">Virupaksha Temple</a> which is still in full use. In that temple you'll find an amazing <a href="http://en.wikipedia.org/wiki/Camera_obscura">camera obscura</a>, a physics teacher's dream that projects the temple tower onto a wall (<a href="http://0pointer.de/photos/?gallery=India%20Karnataka%202009-11&amp;photo=871">projection</a>, <a href="http://0pointer.de/photos/?gallery=India%20Karnataka%202009-11&amp;photo=865">subject</a>, more interesting in reality. 
Really.)</p> <p><a href="http://0pointer.de/static/hampi8"><img alt="Hemakuta Hill" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/hampi8-gimped-small.jpeg" width="1024" height="155" /></a></p> <p>That's all for Hampi, tomorrow I'll post more panoramas, from other stops of our trip.</p> Lennart PoetteringThu, 14 Jan 2010 23:47:00 +0100tag:0pointer.net,2010-01-14:/blog/photos/india-360-at-a-time-3.htmlphotosPublic Service Announcement: Beware of xmlCleanupParser()!https://0pointer.net/blog/projects/beware-of-xmlCleanupParser.html <p>Everyone and his dog seem to call libxml2's xmlCleanupParser() at inappropriate places. For example <a href="https://bugzilla.redhat.com/show_bug.cgi?id=532307">Empathy</a> does it, and Abiword does it too. <a href="http://www.google.com/codesearch?q=xmlCleanupParser">Google Code Search</a> seems to reveal at least Inkscape and Dia do it as well.</p> <p>So, please, if your project links against libxml2 verify that it calls xmlCleanupParser() only once, and right before exiting! 
And if it calls it more often or somewhere else, then please fix that!</p> <p>For more information <a href="http://lists.fedoraproject.org/pipermail/devel/2010-January/129117.html">see my post on fedora-devel</a>.</p> <p>Thanks for your time.</p> Lennart PoetteringWed, 13 Jan 2010 00:29:00 +0100tag:0pointer.net,2010-01-13:/blog/projects/beware-of-xmlCleanupParser.htmlprojectsIndia, 360 Degrees at a Time, Part Twohttps://0pointer.net/blog/photos/india-360-at-a-time-2.html <p>Here's the second part of my <a href="http://0pointer.de/blog/photos/india-360-at-a-time-1.html">ongoing series</a>.</p> <p>Climbing down the hills, on the banks of the Tungabhadra river you find people washing laundry and bathing, and <a href="http://0pointer.de/photos/?gallery=India%20Karnataka%202009-11&amp;photo=1434">coracles</a> waiting to be used for a trip through the river.</p> <p><a href="http://0pointer.de/static/hampi2"><img alt="Tungabhadra River" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/hampi2-gimped-small.jpeg" width="1024" height="146" /></a></p> <p>The greatest of the ancient temples in Hampi is the <a href="http://en.wikipedia.org/wiki/Vijayanagara#Vittala_Temple">Vitthala Temple</a>:</p> <p><a href="http://0pointer.de/static/hampi3"><img alt="Vitthala Temple" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/hampi3-gimped-small.jpeg" width="1024" height="162" /></a></p> <p>Set in lush green scenery you find the Achyutaraya Temple, which you already might have seen, from above, in <a href="http://0pointer.de/static/hampi7">yesterday's series</a>:</p> <p><a href="http://0pointer.de/static/hampi4"><img alt="Achyutaraya Temple" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" 
src="http://0pointer.de/static/hampi4-gimped-small.jpeg" width="1024" height="164" /></a></p> <p>That's it for today, tomorrow I'll post more panoramas, both from Hampi and other stops of our trip.</p> Lennart PoetteringTue, 12 Jan 2010 19:05:00 +0100tag:0pointer.net,2010-01-12:/blog/photos/india-360-at-a-time-2.htmlphotosIndia, 360 Degrees at a Time, Part Onehttps://0pointer.net/blog/photos/india-360-at-a-time-1.html <p>Yes, I won't spare you my panorama shots from my recent trip to India. After arriving in Goa <a href="http://en.wikipedia.org/wiki/Badami">Badami</a> was our next stop. It's a very pretty little town in northern Karnataka, and here's a panorama shot from the entrance of the town's famous caves:</p> <p><a href="http://0pointer.de/static/badami1"><img alt="Badami" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/badami1-gimped-small.jpeg" width="1024" height="324" /></a></p> <p>Next step was one of the most amazing places on earth, <a href="http://en.wikipedia.org/wiki/Hampi">Hampi</a> in central Karnataka. It is definitely one of the greatest sights I have ever seen, and I guess I can say I have seen quite a few in my life. A vast landscape of hills covered in boulders, lush mango and banana plantations, rice fields, dotted with age-old temples and impressive ruins. Locals crossing the river in <a href="http://0pointer.de/photos/?gallery=India%20Karnataka%202009-11&amp;photo=1434">coracles</a> that look like they belong in a time centuries ago. Women washing colourful laundry in the river, pilgrims wading across the river in their black clothes. An India that delivers every bit of that promise it makes to its visitors. 
The ruins rival the grand sites in Greece and the landscape sometimes looks like a Crysis in-game scene.</p> <p>Taken from one of the hills in Hampi this is the sunset:</p> <p><a href="http://0pointer.de/static/hampi1"><img alt="Hampi Sunset" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/hampi1-gimped-small.jpeg" width="1024" height="122" /></a></p> <p>And then, the next day at dawn make your way up the hills again and you can get an even greater view on the whole scenery:</p> <p><a href="http://0pointer.de/static/hampi7"><img alt="Hampi Dawn" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/hampi7-gimped-small.jpeg" width="1024" height="200" /></a></p> <p>That's it for today, tomorrow I'll post more panoramas, both from Hampi and other stops of our trip.</p> <p>Also, if you haven't seen them yet, don't miss <a href="http://0pointer.de/blog/photos/india-again.html">my panoramas from my India trip the year before</a>.</p> Lennart PoetteringMon, 11 Jan 2010 20:56:00 +0100tag:0pointer.net,2010-01-11:/blog/photos/india-360-at-a-time-1.htmlphotosJodhpur After Darkhttps://0pointer.net/blog/photos/jodhpur.html <div> <a href="http://0pointer.de/photos/?gallery=India%20Rajasthan%202009-12&amp;photo=1536"><img src="http://0pointer.de/photos/galleries/India%20Rajasthan%202009-12/lq/img-1536.jpg" width="320" height="480" alt="Jodhpur" /></a>&nbsp;<a href="http://0pointer.de/photos/?gallery=India%20Rajasthan%202009-12&amp;photo=1505"><img src="http://0pointer.de/photos/galleries/India%20Rajasthan%202009-12/lq/img-1505.jpg" width="320" height="480" alt="Jodhpur" /></a>&nbsp;<a href="http://0pointer.de/photos/?gallery=India%20Rajasthan%202009-12&amp;photo=1526"><img src="http://0pointer.de/photos/galleries/India%20Rajasthan%202009-12/lq/img-1526.jpg" width="320" height="480" 
alt="Jodhpur" /></a> </div> <p>India is a weird and beautiful country. And I am too lazy to retouch my photos.</p> Lennart PoetteringThu, 31 Dec 2009 16:33:00 +0100tag:0pointer.net,2009-12-31:/blog/photos/jodhpur.htmlphotosOn OOMhttps://0pointer.net/blog/projects/on-oom.html <p>Building on what <a href="http://log.ometer.com/2008-02.html#4.2">Havoc wrote two years ago about the fallacies of OOM safety (Out Of Memory) in user code</a> I'd like to point you to <a href="http://article.gmane.org/gmane.comp.audio.jackit/19998">this little mail I just posted to jack-devel</a> which tries to give you the bigger picture. Should be interesting for non-audio folks, too.</p> <p>Say <b>NO</b> to OOM safety!</p> Lennart PoetteringFri, 13 Nov 2009 02:25:00 +0100tag:0pointer.net,2009-11-13:/blog/projects/on-oom.htmlprojectsPublic Service Announcementhttps://0pointer.net/blog/projects/no-more-dmidecode.html <p>Folks! For quite some time now the kernel has exported the DMI machine information below <tt>/sys/class/dmi/id/</tt>. You may now stop parsing the output of <tt>dmidecode</tt>, and thus stop depending on external tools and privileged code.</p> <p>For example, to read your BIOS vendor string all you need to do is this:</p> <pre>$ read bv &lt; /sys/class/dmi/id/bios_vendor
$ echo $bv</pre> <p>Which is of course much simpler, and cleaner, and safer than anything involving <tt>dmidecode</tt>.</p> <p>Thank you for your time!</p> Lennart PoetteringFri, 06 Nov 2009 11:14:00 +0100tag:0pointer.net,2009-11-06:/blog/projects/no-more-dmidecode.htmlprojectsUbuntu doesn't get ithttps://0pointer.net/blog/projects/pa-in-ubuntu.html #nocomments yes <p>&lt;rant&gt;</p> <p>So in the past Ubuntu packaged PA in a way that, let's say, was not exactly optimal. I thought they'd gotten around to fixing things since then. Turns out they didn't. 
Seems in their upcoming release they again did some <a href="https://bugs.launchpad.net/ubuntu/+source/pulseaudio/+bug/452458">genius thing to make PA on Ubuntu perform worse than it could</a>. The Ubuntu kernel contains all kinds of closed-source and other crap to no limits, but backporting a tiny patch that is blessed and merged upstream and in Fedora for ages, that they won't do. Gah.</p> <p>And it doesn't stop there. <a href="http://bazaar.launchpad.net/~ubuntu-core-dev/pulseaudio/ubuntu/annotate/head%3A/debian/patches/0053-fix-sigsegv-module-bluetooth-device.patch">This patch is an outright insult</a>. <a href="http://bazaar.launchpad.net/~ubuntu-core-dev/pulseaudio/ubuntu/annotate/head%3A/debian/patches/0090-disable-flat-volumes.patch">This is disappointing</a>.</p> <p>Madness. Not good, Ubuntu, really not good! And I'll get all the complaints for this f**up again. Thanks!</p> <p>/me is disappointed. Ubuntu, you really can do better than this.</p> <p>&lt;/rant&gt;</p> Lennart PoetteringMon, 19 Oct 2009 03:13:00 +0200tag:0pointer.net,2009-10-19:/blog/projects/pa-in-ubuntu.htmlprojectsThe Times They Are A-Changin'https://0pointer.net/blog/projects/win7-plays-catchup.html #nocomments y <p>Kinda fun <a href="http://channel9.msdn.com/shows/Going+Deep/Elliot-H-Omiya-Larry-Osterman-and-Frank-Yerrace-Inside-Windows-7-Audio-Stack/">watching this video</a>. As it seems the big new features of the Windows 7 audio stack are the ability to move streams while they are live, to do role-based policy routing, and to pause streams during phone calls. Hah! That's so yesterday! 
<a href="http://pulseaudio.org/">A certain sound server I happen to know very well</a> has been supporting this for a longer time already, and you can even buy that logic in <a href="http://maemo.nokia.com/n900/">various consumer products</a>.</p> <p>Nice to know that in some areas of the audio stack it's not us who need to play catch-up with them, but they are the ones who need to play catch-up with us.</p> Lennart PoetteringSun, 18 Oct 2009 19:33:00 +0200tag:0pointer.net,2009-10-18:/blog/projects/win7-plays-catchup.htmlprojectsIn The Press IIhttps://0pointer.net/blog/projects/cio-lpc-2k9.html <p><a href="http://www.cio.com.au/article/320807/open_source_identity_pulseaudio_creator_lennart_poettering">CIO has an interview with me.</a></p> Lennart PoetteringSat, 10 Oct 2009 16:38:00 +0200tag:0pointer.net,2009-10-10:/blog/projects/cio-lpc-2k9.htmlprojectsIn The Presshttps://0pointer.net/blog/projects/lwn-lpc-2k9.html <p><a href="http://lwn.net/Articles/355542/">LWN covers Paul's and my talk at the Audio MC at LPC, Portland.</a> (Subscribers only for now)</p> <p><b>Update:</b> <a href="http://lwn.net/SubscriberLink/355542/e354c2205dade9e4/">Here's a free subscriber link.</a></p> Lennart PoetteringWed, 07 Oct 2009 20:29:00 +0200tag:0pointer.net,2009-10-07:/blog/projects/lwn-lpc-2k9.htmlprojectsLPC Audio BoF Noteshttps://0pointer.net/blog/projects/audio-bof-notes.html <p>Here are some very short notes from the Audio BoF at the <a href="http://linuxplumbersconf.org/2009/">Linux Plumbers Conference</a> in Portland two weeks ago. Sorry for the delay!</p> <p>Biggest issue discussed was audio routing. On embedded devices this gets more complex each day, and there are a lot of open questions on the desktop, too. Different DSP scenarios; how do mixer controls match up with PCM streams and jack sensing? How do we determine which volume control sliders that are in the pipeline we are currently interested in? How does that relate to policy decisions? 
Format to store audio routing in?</p> <p>The <a href="http://www.slimlogic.co.uk/?p=40">ALSA scenario subsystem</a> currently being worked on by Liam Girdwood and the folks at SlimLogic and currently on its way to being integrated into ALSA proper hopefully helps us, so that we can strip a lot of complexity related to the routing logic from PulseAudio and move it into a lower level which naturally knows more about the hardware's internal routing.</p> <p>Does it make sense for some apps to bypass the ALSA userspace layer and to talk to the kernel drivers via ioctl()s directly (i.e. thus not depending on ALSA's LISP interpreter, and a lot of other complexities)? Probably yes, but certainly not in the short term future. Salsa? libsydney?</p> <p>Should the timing deviation estimation/interpolation be moved from PulseAudio into the kernel? Might be a good idea. Particularly interesting when we try to monitor not only the system and audio clocks, but the video output and particularly the video input (i.e. video4linux) clocks, too. A unified kernel-based timing system has advantages in accuracy, allows better handling of (pseudo-) atomic timing snapshots, and would centralize timing handling not only between different applications (PA and JACK) but also between different subsystems. Problem: current timing stuff in PulseAudio might be a bit too homegrown for moving it 1:1 into the kernel. Also, depends on FP. Needs someone to push this. Apple does the clock handling in the kernel. How does this relate to ALSA's timer API?</p> <p>Seems Ubuntu is going to kill OSS pretty soon too, following Fedora's lead. Yay!</p> <p>And that's all I have. Should be the biggest points raised. 
Ping me if I forgot something.</p> Lennart PoetteringWed, 07 Oct 2009 01:36:00 +0200tag:0pointer.net,2009-10-07:/blog/projects/audio-bof-notes.htmlprojectsLatency Controlhttps://0pointer.net/blog/projects/latency-control.html #nocomments yes <p>An often asked question is how to properly talk to <a href="http://pulseaudio.org/">PulseAudio</a> from within applications where latency matters. To answer that question once and for all I've <a href="http://pulseaudio.org/wiki/LatencyControl">written this guide in our Wiki</a> that should light things up a little. If you are interested in audio latency in PA, want to know how to minimize CPU usage and power consumption or how to maximize drop-out safety make sure to read this!</p> Lennart PoetteringTue, 06 Oct 2009 20:49:00 +0200tag:0pointer.net,2009-10-06:/blog/projects/latency-control.htmlprojectsCanonical,https://0pointer.net/blog/projects/canonical-contributions.html #nocomments y <p>one small note: requiring <a href="http://www.canonical.com/contributors">copyright assignment</a> from contributors, and putting your code in exotic VCSes that only a minority of potential contributors know or are willing to use is not helpful for attracting contributions -- quite the contrary, it scares them away. Please fix that!</p> Lennart PoetteringMon, 05 Oct 2009 21:17:00 +0200tag:0pointer.net,2009-10-05:/blog/projects/canonical-contributions.htmlprojectsConferenceshttps://0pointer.net/blog/projects/lpc-bluez-maemo-2009.html <p>Last week I was at the <a href="http://linuxplumbersconf.org/2009/">Linux Plumbers Conference</a> in Portland. Like last year it kicked ass and proved again to be one of the most relevant Linux developer conferences (if not <i>the</i> most relevant one). I ran the Audio MC at the conference which was very well attended. The slides for our <a href="http://linuxplumbersconf.org/2009/program/">four talks in the track are available online</a>. 
(My own slides are probably a bit too terse for most readers, the interesting stuff was in the talking, not the reading...) Personally, for me the most interesting part was to see to which degree Nokia actually adopted <a href="http://pulseaudio.org/">PulseAudio</a> in the N900. While I was aware that Nokia was using it, I wasn't aware that their use is as comprehensive as it turned out to be. And the industry support from other companies is really impressive too. After the main track we had a BoF session, the notes of which I'll post a bit later. Many thanks to Paul, Jyri, Pierre for their great talks. Unfortunately, Palm, the only manufacturer who is actually already shipping a phone with PulseAudio, didn't send anyone to the conference who wanted to talk about that. Let's hope they'll eventually learn that just throwing code over the wall is not how Open Source works. Maybe they'll send someone to next year's LPC in Boston, where I hope to be able to do the Audio MC again.</p> <p>Right now I am at the BlueZ Summit in Stuttgart. Among other things we have been discussing how to improve Bluetooth Audio support in PulseAudio. I guess one could say that the Bluetooth support in PulseAudio is already one of its highlights, in fact working better than the support on other OSes (yay, that's an area where Linux Audio really shines!). So up next is better support for allowing PA to receive A2DP audio, i.e. making PA act as if it was a Headset or your hifi. Use case: send music from your mobile to your desktop's hifi speakers. (Actually this is already supported in current BlueZ/PA versions, but not easily accessible). Also Bluetooth headsets tend to support AC3 or MP3 decoding natively these days so we should support that in PA too. 
Codec handling has been on the TODO list for PA for quite some time, for the SPDIF or HDMI cases, and Bluetooth Audio is another reason why we really should have that.</p> <p>Next week I'll be at the <a href="http://wiki.maemo.org/Maemo_Summit_2009">Maemo Summit</a> in Amsterdam. Nokia kindly invited me. Unfortunately I was a bit too late to get a proper talk accepted. That said, I am sure if enough folks are interested we could do a little ad-hoc BoF and find some place at the venue for it. If you have any questions regarding PA just talk to me. The N900 uses PulseAudio for all things audio so I am quite sure we'll have a lot to talk about.</p> <p>See you in Amsterdam!</p> <p>One last thing: Check out <a href="http://colin.guthr.ie/2009/10/kde-plus-pulseaudio-does-not-equal-sucks/">Colin's work to improve integration of PulseAudio and KDE</a>!</p> Lennart PoetteringFri, 02 Oct 2009 16:57:00 +0200tag:0pointer.net,2009-10-02:/blog/projects/lpc-bluez-maemo-2009.htmlprojectsPlumbers 2009 Audio Bof Thu, 10:00 amhttps://0pointer.net/blog/projects/plumbers-audio-bof.html <p>Tomorrow, Thu 24th 10 am, there's going to be an Audio BoF at LPC Portland, Salon E. Don't miss it.</p> Lennart PoetteringThu, 24 Sep 2009 01:10:00 +0200tag:0pointer.net,2009-09-24:/blog/projects/plumbers-audio-bof.htmlprojectsSkypehttps://0pointer.net/blog/projects/skype.html <p>A quick update on Skype: the next Skype version will include native PulseAudio support. And not only that but they even <a href="http://0pointer.de/blog/projects/tagging-audio.html">tag their audio streams properly</a>. This enables PulseAudio to do fancy stuff like automatically pausing your audio playback when you have a phone call. 
Good job!</p> <p>In some ways they are now doing a better job with integration into the modern audio landscape than some Free Software telephony applications!</p> <p>Unfortunately they didn't fix the biggest bug though: it's still not Free Software!</p> Lennart PoetteringTue, 22 Sep 2009 19:51:00 +0200tag:0pointer.net,2009-09-22:/blog/projects/skype.htmlprojectsMore Mutracehttps://0pointer.net/blog/projects/mutrace2.html <p>Here's a list of quick updates on my <a href="http://git.0pointer.de/?p=mutrace.git"><tt>mutrace</tt></a> mutex profiler since <a href="http://0pointer.de/blog/projects/mutrace.html">my initial announcement two weeks ago</a>:</p> <p>I added some special support for tracking down use of mutexes in realtime threads. It's a very simple extension that -- if enabled with <tt>--track-rt</tt> -- checks on each mutex operation whether it is executed by a realtime thread or not. You can find the output of a test run in <a href="http://lalists.stanford.edu/lad/2009/09/0116.html">this announcement on LAD</a>. Particularly interesting is that you can use this to track down which mutexes are good candidates for priority inheritance.</p> <p>The mutrace tarball now also includes a companion tool <tt>matrace</tt> that can be used to track down memory allocation operations in realtime threads. 
See the same lad announcement as above for example output of this tool.</p> <p>With help from Boudewijn Rempt I added some compatibility code for profiling C++/Qt apps with mutrace, which he already used <a href="http://mail.kde.org/pipermail/kimageshop/2009-September/007471.html">for some</a> <a href="http://mail.kde.org/pipermail/kimageshop/2009-September/007470.html">interesting profiling results</a> on krita.</p> <p>Finally, after my comments on the locking hotspots in glib's type system, Wim Taymans and Edward Hervey worked on turning the mutex-emulated rwlocks into OS native ones with quite positive results, <a href="https://bugzilla.gnome.org/show_bug.cgi?id=585375">for more information see this bug</a>.</p> <p>As soon as <a href="https://bugzilla.redhat.com/show_bug.cgi?id=523553">my review request</a> is fully processed mutrace will be available in rawhide.</p> <p>A snapshot tarball of <tt>mutrace</tt> <a href="http://0pointer.de/public/mutrace-0.1.tar.gz">you may find here</a> (despite the name of the tarball that's just a snapshot, not the real release 0.1), for all those folks who are afraid of git, or don't have a current autoconf/automake/libtool installed.</p> <p><a href="http://lwn.net/Articles/352828/">Oh, and they named a unit after me.</a></p> Lennart PoetteringTue, 22 Sep 2009 19:38:00 +0200tag:0pointer.net,2009-09-22:/blog/projects/mutrace2.htmlprojectsMeasuring Lock Contentionhttps://0pointer.net/blog/projects/mutrace.html <p>When naively profiling multi-threaded applications the time spent waiting for mutexes is not necessarily visible in the generated output. However lock contention can have a big impact on the runtime behaviour of applications. On Linux <a href="http://valgrind.org/docs/manual/drd-manual.html">valgrind's drd</a> can be used to track down mutex contention. 
Unfortunately running applications under valgrind/drd slows them down massively, often having the effect of itself generating many of the contentions one is trying to track down. Also, due to its slowness it is very time consuming work.</p> <p>To improve the situation I have now written <a href="http://git.0pointer.de/?p=mutrace.git">a mutex profiler called <tt>mutrace</tt></a>. In contrast to valgrind/drd it does not virtualize the CPU instruction set, making it a lot faster. In fact, the hooks <tt>mutrace</tt> relies on to profile mutex operations should only minimally influence application runtime. <tt>mutrace</tt> is not useful for finding synchronization bugs, it is solely useful for profiling locks.</p> <p>Now, enough of this introductory blabla. Let's have a look at the data <tt>mutrace</tt> can generate for you. As an example we'll look at <tt>gedit</tt> as a bit of a prototypical Gnome application. Gtk+ and the other Gnome libraries are not really known for their heavy use of multi-threading, and the APIs are generally not thread-safe (for a good reason). However, internally subsystems such as <tt>gio</tt> do use threading quite extensively. And as it turns out there are a few hotspots that can be discovered with <tt>mutrace</tt>:</p> <pre>
$ LD_PRELOAD=/home/lennart/projects/mutrace/libmutrace.so gedit
mutrace: 0.1 sucessfully initialized.
</pre> <p>gedit is now running and its mutex use is being profiled. For this example I have now opened a file with it, typed a few letters and then quit the program again without saving. As soon as gedit exits <tt>mutrace</tt> will print the profiling data it gathered to stderr. <a href="http://0pointer.de/public/mutrace.txt">The full output you can see here.</a> The most interesting part is at the end of the generated output, a breakdown of the most contended mutexes:</p> <pre>
mutrace: 10 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]       Type
      35   368268      407      275      120,822        0,000        0,894     normal
       5   234645      100       21       86,855        0,000        0,494     normal
      26   177324       47        4       98,610        0,001        0,150     normal
      19    55758       53        2       23,931        0,000        0,092     normal
      53      106       73        1        0,769        0,007        0,160     normal
      25    15156       70        1        6,633        0,000        0,019     normal
       4      973       10        1        4,376        0,004        0,174     normal
      75       68       62        0        0,038        0,001        0,004     normal
       9     1663       52        0        1,068        0,001        0,412     normal
       3   136553       41        0       61,408        0,000        0,281     normal
     ...      ...      ...      ...          ...          ...          ...        ...

mutrace: Total runtime 9678,142 ms.
</pre> <p>(Sorry, LC_NUMERIC was set to de_DE.UTF-8, so if you can't make sense of all the commas, think <tt>s/,/./g</tt>!)</p> <p>For each mutex a line is printed. The 'Locked' column tells how often the mutex was locked during the entire runtime of about 10s. The 'Changed' column tells us how often the owning thread of the mutex changed. The 'Cont.' column tells us how often the lock was already taken when we tried to take it and we had to wait. The fifth column tells us for how long during the entire runtime the lock was locked, the sixth tells us the average lock time, and the seventh column tells us the longest time the lock was held. Finally, the last column tells us what kind of mutex this is (recursive, normal or otherwise).</p> <p>The most contended lock in the example above is #35. 275 times during the runtime a thread had to wait until another thread released this mutex. 
All in all more than 120ms of the entire runtime (about 10s) were spent with this lock taken!</p> <p>In the full output we can now look up which mutex #35 actually is:</p> <pre>
Mutex #35 (0x0x7f48c7057d28) first referenced by:
        /home/lennart/projects/mutrace/libmutrace.so(pthread_mutex_lock+0x70) [0x7f48c97dc900]
        /lib64/libglib-2.0.so.0(g_static_rw_lock_writer_lock+0x6a) [0x7f48c674a03a]
        /lib64/libgobject-2.0.so.0(g_type_init_with_debug_flags+0x4b) [0x7f48c6e38ddb]
        /usr/lib64/libgdk-x11-2.0.so.0(gdk_pre_parse_libgtk_only+0x8c) [0x7f48c853171c]
        /usr/lib64/libgtk-x11-2.0.so.0(+0x14b31f) [0x7f48c891831f]
        /lib64/libglib-2.0.so.0(g_option_context_parse+0x90) [0x7f48c67308e0]
        /usr/lib64/libgtk-x11-2.0.so.0(gtk_parse_args+0xa1) [0x7f48c8918021]
        /usr/lib64/libgtk-x11-2.0.so.0(gtk_init_check+0x9) [0x7f48c8918079]
        /usr/lib64/libgtk-x11-2.0.so.0(gtk_init+0x9) [0x7f48c89180a9]
        /usr/bin/gedit(main+0x166) [0x427fc6]
        /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f48c5b42b4d]
        /usr/bin/gedit() [0x4276c9]
</pre> <p>As it appears, in this Gtk+ program the rwlock <tt>type_rw_lock</tt> (defined in glib's <tt>gobject/gtype.c</tt>) is a hotspot. GLib's rwlocks are implemented on top of mutexes, so an obvious way to improve this could be to actually make them use the operating system's rwlock primitives.</p> <p>If a mutex is used often but only ever by the same thread it cannot starve other threads. The 'Changed' column lists how often a specific mutex changed the owning thread. If the number is high this means the risk of contention is also high. The 'Cont.' column tells you about contention that actually took place.</p> <p>Due to the way <tt>mutrace</tt> works we cannot profile mutexes that are used internally in glibc, such as those used for synchronizing <tt>stdio</tt> and suchlike.</p> <p><tt>mutrace</tt> is implemented entirely in userspace. 
It uses all kinds of exotic GCC, glibc and kernel features, so you might have a hard time compiling and running it on anything but a very recent Linux distribution. I have tested it on Rawhide but it should work on slightly older distributions, too.</p> <p>Make sure to build your application with <tt>-rdynamic</tt> to make the backtraces <tt>mutrace</tt> generates useful.</p> <p>As of now, <tt>mutrace</tt> only profiles mutexes. Support for rwlocks should be easy to add though. Patches welcome.</p> <p>The output <tt>mutrace</tt> generates can be influenced by various <tt>MUTRACE_xxx</tt> environment variables. See the sources for more information.</p> <p>And now, please take <tt>mutrace</tt> and profile and speed up your application!</p> <p><a href="http://git.0pointer.de/?p=mutrace.git">You may find the sources in my git repository.</a></p> Lennart PoetteringTue, 15 Sep 2009 00:07:00 +0200tag:0pointer.net,2009-09-15:/blog/projects/mutrace.htmlprojectspthread_key_create() is dangeroushttps://0pointer.net/blog/projects/pthread-key-create.html <p>If you use <tt>pthread_key_create()</tt> with a non-NULL <tt>destructor</tt> parameter (or an equivalent TLS construct) in a library/shared object then you <i>MUST</i> link your library with <tt>-z nodelete</tt> (or an equivalent construct).</p> <p>If you don't, then you'll have a lot of fun (like I just had) debugging segfaults in the TLS destruction logic where functions are called that might not even exist anymore in memory.</p> <p>Now don't tell me I hadn't told you.</p> <p>(Oh, and I hope I don't need to mention that all GObject-based libraries should link with <tt>-z nodelete</tt> anyway, for making sure the type system doesn't break.)</p> Lennart PoetteringMon, 10 Aug 2009 22:39:00 +0200tag:0pointer.net,2009-08-10:/blog/projects/pthread-key-create.htmlprojectsThe Highest Man in Spainhttps://0pointer.net/blog/photos/canaries-360.html <p>Ever wanted to know what's the view like being the highest person in all of 
Spain? -- No? Hmm, can't help you then. -- Otherwise:</p> <p><a href="http://0pointer.de/static/teide2"><img alt="Pico del Teide" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/teide2-small.jpeg" width="1024" height="230" /></a></p> <p>That's on the summit of <a href="http://en.wikipedia.org/wiki/Pico_del_Teide">Pico del Teide</a> at 3718m, on <a href="http://en.wikipedia.org/wiki/Tenerife">Tenerife island</a>. Unless you leave solid ground this is as high as you can get in Spain. 163m lower it's a bit more obvious that the Teide is a volcano:</p> <p><a href="http://0pointer.de/static/teide"><img alt="Pico del Teide" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/teide-small.jpeg" width="1024" height="184" /></a></p> <p>And coming down to the surrounding caldera it's even more obvious:</p> <p><a href="http://0pointer.de/static/teide3"><img alt="Pico del Teide" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/teide3-small.jpeg" width="1024" height="260" /></a></p> <p><a href="http://0pointer.de/static/teide4"><img alt="Pico del Teide" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/teide4-small.jpeg" width="1024" height="296" /></a></p> <p><a href="http://0pointer.de/static/teide5"><img alt="Pico del Teide" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/teide5-small.jpeg" width="1024" height="229" /></a></p> <p>On a ridge next to the caldera you find the <a href="http://en.wikipedia.org/wiki/Teide_Observatory">Teide Observatory</a>:</p> <p><a 
href="http://0pointer.de/static/observatory"><img alt="Teide Observatory" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/observatory.jpeg" width="1024" height="98" /></a></p> <p>The caldera is covered in old lava flows:</p> <p><a href="http://0pointer.de/static/caldera"><img alt="Caldera" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/caldera-small.jpeg" width="1024" height="240" /></a></p> <p><a href="http://0pointer.de/static/caldera2"><img alt="Caldera" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/caldera2-small.jpeg" width="1024" height="307" /></a></p> <p>Vulcanism has created various interesting rock formations in the caldera:</p> <p><a href="http://0pointer.de/static/roques"><img alt="Roques" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/roques-small.jpeg" width="1024" height="194" /></a></p> <p><a href="http://0pointer.de/static/roques2"><img alt="Roques" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/roques2-small.jpeg" width="1024" height="176" /></a></p> <p>Tenerife is not just about the Teide and its dusty caldera. 
In the north of the island you find the <a href="http://en.wikipedia.org/wiki/Macizo_de_Anaga">Anaga mountain range</a>:</p> <p><a href="http://0pointer.de/static/tenerife-north"><img alt="Tenerife North" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/tenerife-north-small.jpeg" width="1024" height="175" /></a></p> <p>Neighboring <a href="http://en.wikipedia.org/wiki/Gran_Canaria">Gran Canaria</a> was where our little trip started and ended, right after the <a href="http://www.grancanariadesktopsummit.org/">Gran Canaria Desktop Summit</a>. Gran Canaria has no Teide but a very impressive landscape nonetheless:</p> <p><a href="http://0pointer.de/static/nublo"><img alt="Roque Nublo" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/nublo-small.jpeg" width="1024" height="210" /></a></p> <p>That's the view from the <a href="http://en.wikipedia.org/wiki/Roque_Nublo">Roque Nublo</a>, the island's most famous landmark. The rock itself is visible here (on the left):</p> <p><a href="http://0pointer.de/static/nublo2"><img alt="Roque Nublo" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/nublo2-small.jpeg" width="1024" height="176" /></a></p> Lennart PoetteringSun, 09 Aug 2009 22:22:00 +0200tag:0pointer.net,2009-08-09:/blog/photos/canaries-360.htmlphotosOh Nine Sixteenhttps://0pointer.net/blog/projects/oh-nine-sixteen.html <p>As a followup to <a href="http://0pointer.de/blog/projects/oh-nine-fifteen.html">Oh Nine Fifteen</a> here's a little overview of the changes coming with <a href="http://pulseaudio.org/">PulseAudio 0.9.16</a> which will be part of Fedora 12 (already in Rawhide; I think Ubuntu Karmic (?) 
will have it too).</p> <h3>A New Mixer Logic</h3> <p>We now try to control more than just a single ALSA mixer element for volume control. This increases the hardware volume range and granularity exposed and should also help minimize problems caused by incomplete or incorrect default mixer initialization on the lower levels.</p> <p>This also adds support for allowing selection of input/output ports for sound cards. This is used to expose changing between Mic vs. Line-In for input source selection and Headphones vs. Speaker for output selection (of course the list of available ports is strictly dependent on what your hardware supports). The list of available ports is deliberately kept minimal.</p> <p>Thanks to Bastien the newest GNOME Volume Control now exposes profile/port switching quite nicely, <a href="http://www.hadess.net/2009/07/bad-at-updates-easy-51.html">which he blogged about.</a> <a href="http://0pointer.de/public/g-v-c-ports">This screenshot shows how the port (here called 'Connector') can be selected in the new dialog.</a></p> <p>The mixer rework also allows us to handle semi-pro/pro sound cards a bit more flexibly. For example, which profiles/ports are exposed in PulseAudio or how specific mixer elements are handled can now be controlled by editing .ini-style configuration files in <tt>/usr/share/pulseaudio/alsa-mixer/</tt>. <a href="https://tango.0pointer.de/pipermail/pulseaudio-discuss/2009-June/004229.html">Read this mail for more information about this.</a></p> <h3>UPnP MediaServer Support</h3> <p>PulseAudio now integrates with Zeeshan's fabulous <a href="http://live.gnome.org/Rygel">Rygel UPnP/DLNA MediaServer</a>. If enabled, Rygel will automatically expose all local audio devices which are managed by PulseAudio as UPnP/DLNA MediaServer items which your UPnP/DLNA MediaRenderers can now tune into. (Meaning: you can now stream audio from your PC directly to your UPnP DMP (Digital Media Player) device, such as the PS3.) 
Communication between Rygel and PulseAudio follows our little <a href="http://live.gnome.org/Rygel/MediaServerSpec">Media Server Spec on the GNOME Wiki</a>. This nicely complements the RAOP (Apple Airport) support we introduced in PulseAudio 0.9.15. In one of the next versions of PulseAudio/Rygel we hope to add support for PulseAudio becoming a MediaRenderer as well. This will then not only allow you to stream from your PC to your DMP device, but also allow PulseAudio to act as a "networked speaker", which can be used by any UPnP/AV/DLNA control point, such as Windows' Media Player.</p> <h3>Hotplug Support Improved</h3> <p>If you select a particular device as the default for a specific application or class of streams, then when unplugging the device PulseAudio moves the stream automatically to another audio device if one exists. New in PulseAudio 0.9.16 is that if you replug the audio device the stream will instantly be moved back, requiring no further user intervention.</p> <p>Also, PulseAudio now includes some implicit rules for doing the 'right thing' when finding an audio device for an application. For example, unless configured otherwise it will now route telephony applications automatically to Bluetooth headsets if one is connected, in favour of the internal sound card of the computer.</p> <h3>Surround Sound Support for Event Sounds</h3> <p>This is more a new feature of <a href="http://0pointer.de/lennart/projects/libcanberra/">libcanberra</a> than of PulseAudio, but nonetheless: we now support surround for event sounds. This allows us to play full 5.1 login sounds for example, in best THX cinema fashion. We'd love to ship a 5.1 sound for login by default in <a href="http://cgit.freedesktop.org/~mccann/sound-theme-freedesktop/">sound-theme-freedesktop</a>. We'd be very thankful if <i>you</i> would be willing to contribute a sound here, or two! 
A sound a bit less bombastic than the famous cinema THX effect would probably be a good idea though.</p> <p>And then there's of course the usual batch of fixes and small improvements. A substantial number of non-user visible changes have been made as well. For example, as HAL is now obsolete PulseAudio has now moved to udev for its device discovery needs. We replaced our gdbm support by support for tdb. Also, we stripped all security sensitive code from PulseAudio, and ported it to use <a href="http://0pointer.de/blog/projects/rtkit.html">RealtimeKit</a> instead. For the upcoming distributions that means that PulseAudio will run as a real-time process by default, improving drop-out safety.</p> <p>And for some extra PA eye-candy, have a look at <a href="http://www.gnome-look.org/content/show.php/Impulse+-+PulseAudio+visualizer?content=99383">Impulse</a>!</p> Lennart PoetteringWed, 05 Aug 2009 02:30:00 +0200tag:0pointer.net,2009-08-05:/blog/projects/oh-nine-sixteen.htmlprojectsWorld Domination Accomplishedhttps://0pointer.net/blog/projects/avahi-world-domination.html <p>I hereby officially declare that I have reached my goal of world domination. <a href="http://emacs-fu.blogspot.com/2009/07/emacs-23-is-very-near.html">Emacs 23 (apparently due today) ships with Avahi support out of the box.</a> Obviously, one of the most natural combinations of software thinkable.</p> <p>After Emacs, there's not much else I could win, or is there?</p> Lennart PoetteringWed, 29 Jul 2009 21:03:00 +0200tag:0pointer.net,2009-07-29:/blog/projects/avahi-world-domination.htmlprojectsYet Another Kithttps://0pointer.net/blog/projects/rtkit.html <p><a href="http://0pointer.de/blog/projects/cgroups-and-rtwatch">A while back</a> I was celebrating the arrival of <i>secure</i> realtime scheduling for the desktop. 
As it appears this was a bit premature then, since (mis-)using cgroups for this turned out to be more problematic and messy than I anticipated.</p> <p>As a followup I'd now like to point you to <a href="http://lalists.stanford.edu/lad/2009/06/0191.html">this announcement I posted to LAD yesterday</a>, introducing <a href="http://git.0pointer.de/?p=rtkit.git">RealtimeKit</a> which should fix the problem for good. It has now entered Rawhide becoming part of the default install (by means of being a dependency of PulseAudio), and I assume the other distros are going to adopt it pretty soon, too.</p> <p><a href="http://lalists.stanford.edu/lad/2009/06/0191.html">Read the full announcement.</a></p> Lennart PoetteringSat, 20 Jun 2009 21:29:00 +0200tag:0pointer.net,2009-06-20:/blog/projects/rtkit.htmlprojectsLinux Plumbers Conference 2009 CFP Ending Soon!https://0pointer.net/blog/projects/plumbersconf-2009.html <p>The <a href="http://linuxplumbersconf.org/2009/submit/">Call for Papers</a> for the <a href="http://www.linuxplumbersconf.org/">Linux Plumbers Conference (LPC)</a> in September in Portland, Oregon <a href="http://lwn.net/Articles/336707/">is ending soon</a>, on <b>June 15th 2009</b>. It's a conference about the core infrastructure of Linux systems: the part of the system where userspace and the kernel interface. It's the first conference where the focus is specifically on getting together the kernel people who work on the userspace interfaces and the userspace people who have to deal with kernel interfaces. It's supposed to be a place where all the people doing infrastructure work sit down and talk, so that each other understands better what the requirements and needs of the other are, and where we can work towards fixing the major problems we currently have with our lower-level APIs.</p> <p>Last year's conference was hugely successful. 
If you want to read up on what happened then, LWN <a href="http://lwn.net/Articles/297958/">has</a> <a href="http://lwn.net/Articles/299088/">good</a> <a href="http://lwn.net/Articles/300324/">coverage</a>.</p> <p>Like last year, I will be running the Audio conference track of LPC. Audio infrastructure on Linux is still heavily fragmented. Pro, desktop and embedded worlds are very separate. While we have quite good driver support the user experience is far from perfect, mostly because our infrastructure is so balkanized. Join us at the LPC and help to fix this! If you are doing <b>audio infrastructure work</b> on Linux, make sure to attend and <b>submit a paper!</b></p> <p><a href="http://linuxplumbersconf.org/2009/register/">Sign up soon!</a> <a href="http://linuxplumbersconf.org/2009/submit/">Send in your paper quickly!</a></p> <p><a href="http://www.linuxplumbersconf.org"><img style="border: 0" src="http://linuxplumbersconf.org/2009/style/tagline.png" alt="Plumbers Logo" width="493" height="90" /></a></p> <p>See you in Portland!</p> Lennart PoetteringFri, 12 Jun 2009 16:18:00 +0200tag:0pointer.net,2009-06-12:/blog/projects/plumbersconf-2009.htmlprojectsLiving in Berlin? You are a GNOMEr?https://0pointer.net/blog/projects/berlin-gnomers.html <p>If you live in Berlin and are a GNOMEr of some kind then please feel invited to drop by tomorrow (Fri 29) at 4 pm at the <a href="http://www.pratergarten.de/d/biergarten.php4">Prater Biergarten</a> (Weather permitting outside, otherwise inside). We'll have a little GNOME get-together. 
For now, we know that at least the Openismus Berlin folks will be there, as will I and presumably one special guest from Finland, and whoever else wants to attend.</p> <p>Hope to see you tomorrow!</p> Lennart PoetteringThu, 28 May 2009 23:50:00 +0200tag:0pointer.net,2009-05-28:/blog/projects/berlin-gnomers.htmlprojectsThe Sound of Fedora 11https://0pointer.net/blog/projects/shameless-self-promotion.html <p><a href="http://jaboutboul.blogspot.com/2009/05/sound-of-fedora-11.html">I learned so much when I read this interview.</a> And so will you!</p> Lennart PoetteringThu, 21 May 2009 17:03:00 +0200tag:0pointer.net,2009-05-21:/blog/projects/shameless-self-promotion.htmlprojectsAll About Fragmentshttps://0pointer.net/blog/projects/all-about-periods.html <p>In my on-going series <i><a href="http://0pointer.de/blog/projects/guide-to-sound-apis">Writing Better Audio Applications</a></i> for Linux, here's another installment: a little explanation how fragments/periods and buffer sizes should be chosen when doing audio playback with traditional audio APIs such as ALSA and OSS. This originates from <a href="http://bugzilla.gnome.org/show_bug.cgi?id=572953#c5">some emails I exchanged with the Ekiga folks</a>. In the last weeks I kept copying this explanation to various other folks. I guess it would make sense to post this on my blog here too to reach a wider audience. So here it is, mostly unedited:</p> <pre> Yes. You shouldn't misuse the fragments logic of sound devices. It's like this: The latency is defined by the buffer size. The wakeup interval is defined by the fragment size. The buffer fill level will oscillate between 'full buffer' and 'full buffer minus 1x fragment size minus OS scheduling latency'. Setting smaller fragment sizes will increase the CPU load and decrease battery time since you force the CPU to wake up more often. OTOH it increases drop out safety, since you fill up playback buffer earlier. 
Choosing the fragment size is hence something which you should do balancing out your needs between power consumption and drop-out safety. With modern processors and a good OS scheduler like the Linux one setting the fragment size to anything other than half the buffer size does not make much sense. Your <i>[Ekiga's ptlib driver that is]</i> ALSA output is configured to set the fragment size to the size of your codec audio frames. And that's a bad idea. Because the codec frame size has not been chosen based on power consumption or drop-out safety reasoning. It has been chosen by the codec designers based on different reasoning, such as latency. You probably configured your backend this way because the ALSA library docs say that it is recommended to write to the sound card in multiples of the fragment size. However deducing from this that you hence should configure the fragment size to the codec frame size is wrong! The best way to implement playback these days for ALSA is to write as much as snd_pcm_avail() tells you to each time you wake up due to POLLOUT on the sound card. If that is not a multiple of your codec frame size then you need to buffer the remainder of the decoded data yourself in system memory. The ALSA fragment size you should normally set as large as possible given your latency constraints but that you have at least two fragments in your buffer size. I hope this explains a bit how frag_size/buffer_size should be chosen. If you have questions, just ask. (Oh, ALSA uses the term 'period' for what I call 'fragment' above. 
It's synonymous) </pre> Lennart PoetteringSun, 19 Apr 2009 01:34:00 +0200tag:0pointer.net,2009-04-19:/blog/projects/all-about-periods.htmlprojectsGNOME now esound-freehttps://0pointer.net/blog/projects/esound-free.html <p><a href="http://blogs.gnome.org/aklapper/">Andre Klapper</a> just informed me that GNOME is now officially <a href="http://en.wikipedia.org/wiki/Esound">esound</a>-free: all modules have been ported over to <a href="http://0pointer.de/lennart/projects/libcanberra/">libcanberra</a> for event sounds or GStreamer/PulseAudio for everything else. It's time to celebrate!</p> <p>It's the end of an era. The oldest version of esound in GNOME CVS is 0.2.1, committed on May 11th 1998. It has been shipped with every GNOME release since 1.0 back in 1999. (esound outside of GNOME dates even further back, probably some time in the year 1997 or so). After almost 11 years in GNOME it's all over now. Oh, those were the good times.</p> <p>If you maintain a module that is not part of GNOME that still uses esound, hurry and update yours as well!</p> Lennart PoetteringSun, 05 Apr 2009 20:23:00 +0200tag:0pointer.net,2009-04-05:/blog/projects/esound-free.htmlprojectsWhat YOU need to know about Practical Real-Time Programminghttps://0pointer.net/blog/projects/realtime-bossa.html <p><a href="http://etrunko.blogspot.com/">Eduardo Lima</a> just added a couple more videos from <a href="http://www.bossaconference.indt.org/">one of the best conferences in existence</a> to <a href="http://openbossa.blip.tv/">the OpenBOSSA</a> channel at blip.tv. 
Humble as I am, I'd like to ask everyone who is interested in real-time and/or audio/video/animation programming <a href="http://blip.tv/file/1953900"><b>to have a peek at this particular one</b></a>.</p> <p>That's all.</p> Lennart PoetteringSat, 04 Apr 2009 01:38:00 +0200tag:0pointer.net,2009-04-04:/blog/projects/realtime-bossa.htmlprojectsDevice Reservation Spechttps://0pointer.net/blog/projects/device-reservation.html <p>The <a href="http://jackaudio.org/">JACK</a> folks and I have agreed on a little specification for <a href="http://git.0pointer.de/?p=reserve.git;a=blob_plain;f=reserve.txt">device reservation</a> that allows clean hand-over of audio device access from <a href="http://pulseaudio.org">PulseAudio</a> to JACK and back. The specification is generic enough to allow locking/hand-over of other device types as well, not just audio cards. So, in case someone needs to implement a similar kind of locking/handover for any kind of resource here's some prior art you can base your work on. Given that HAL is supposed to go away pretty soon this might be an option for a replacement for HAL's current device locking. The logic is as simple as it can get. Whoever owns a certain service name on the D-Bus session bus owns the device access. For further details, <a href="http://git.0pointer.de/?p=reserve.git;a=blob_plain;f=reserve.txt">read the spec</a>.</p> <p>There's even a <a href="http://git.0pointer.de/?p=reserve.git">reference implementation</a> available, which both JACK2 and PulseAudio have now integrated.</p> <p>Also known as PAX SOUND SERVERIS.</p> Lennart PoetteringThu, 26 Feb 2009 18:55:00 +0100tag:0pointer.net,2009-02-26:/blog/projects/device-reservation.htmlprojectsHaving fun with bzrhttps://0pointer.net/blog/projects/bizarre-fun.html <p>So I wanted to hack proper channel mapping query support into <a href="http://www.mega-nerd.com/libsndfile">libsndfile</a>, something I have had on my TODO list for years. 
The first step was to find the <a href="http://www.mega-nerd.com/libsndfile/development.html">source code repository for it</a>. That was easy. Alas the VCS used is bzr. There are some very vocal folks on the Internet who claim that the bzr user interface is stupendously easy to use in contrast to git which apparently is the very definition of complexity. And if it is stated on the Internet it must be true. I think I mastered git quite well, so yeah, checking out the sources with bzr can't be that difficult for my limited brain capacity.</p> <p>So let's do what Erik suggests for checking out the sources:</p> <pre>$ bzr get http://www.mega-nerd.com/Bzr/libsndfile-pub/</pre> <p>Calling this I get a nice percentage counter that starts at 0% and ends at, ... uh, 0%. That gives me a real feeling of progress. It takes a while, and then I get an error:</p> <pre>bzr: ERROR: Not a branch: "http://www.mega-nerd.com/Bzr/libsndfile-pub/".</pre> <p>Now that's a useful error message. They even include an all-caps word! I guess that error message is right -- it's not a branch, it is a repository. Or is it not?</p> <p>So what do we do about this? Maybe <tt>get</tt> is not actually the right verb. Let's try to play around a bit. Let's use the verb I use to get sources with in git:</p> <pre>$ bzr clone http://www.mega-nerd.com/Bzr/libsndfile-pub/</pre> <p>Hmm, this results in exactly the same 0% to 0% progress counter, and the same useless error message.</p> <p>Now I remember that bzr is actually more inspired by Subversion's UI than by git's, so let's try it the SVN way.</p> <pre>$ bzr checkout http://www.mega-nerd.com/Bzr/libsndfile-pub/</pre> <p>Hmm, and of course, I get exactly the same results again. A counter that counts from 0% to 0% and the same useless error message.</p> <p>Ok, maybe that error is bzr's standard reply? Let's check this out:</p> <pre>$ bzr waldo http://www.mega-nerd.com/Bzr/libsndfile-pub/ bzr: ERROR: unknown command "waldo"</pre> <p>Apparently not. 
bzr actually knows more than one error message.</p> <p>Ok, I admit doing this by trial-and-error is a rather lame approach. RTFM! So let's try this.</p> <pre>$ man bzr-get No manual entry for bzr-get</pre> <p>Ouch. No man page? How awesome. Ah, wait, maybe they have only a single unreadable mega man page for everything. Let's try this:</p> <pre>$ man bzr</pre> <p>Wow, this actually worked. Seems to list all commands. Now let's look for the help on <tt>bzr get</tt>:</p> <pre>/bzr get Pattern not found (press RETURN)</pre> <p>Hmm, no documentation for their most important command? That's weird! Ok, let's try it again with our git vocabulary:</p> <pre>/bzr clone Pattern not found (press RETURN)</pre> <p>Ok, this is not funny anymore. Apparently the verbs are listed in alphabetical order. So let's browse to the letter <i>g</i> as in <tt>get</tt>. However it doesn't exist. There's <tt>bzr export</tt>, and then the next entry is <tt>bzr help</tt> (Oh, irony!) -- but no <tt>get</tt> in-between.</p> <p>Ok, enough of this shit. Maybe the message wants to tell us that the repo actually doesn't exist (even though it confusingly calls it a "branch"). Let's go back to the original page at Erik's site and read things again. Aha, the "<i>main archive archive can be found at (yes, the directory looks empty, but it isn't): <a href="http://www.mega-nerd.com/Bzr/libsndfile-pub/">http://www.mega-nerd.com/Bzr/libsndfile-pub/</a>".</i> Hmm, indeed -- that URL looks very empty when it is accessed. How weird though that in bzr a repo is an empty directory!</p> <p>And at this point I gave up and downloaded the tarball to make my patches against. I have still not managed to check out the sources from the repo. Somehow I get the feeling the actual repo really isn't available anymore under that address.</p> <p>So why am I blogging about this? Not so much to start another flamefest, to nourish the fanboys, nor because it is so much fun to bash other people's work or simply to piss people off. 
It's more for two reasons:</p> <p>Firstly, simply to make the point that folks can claim a thousand times that git's UI sucks and bzr's UI is awesome. It's simply not true. From what I experienced it is not the tiniest bit better. The error messages are useless, the documentation incomplete, the interfaces surprising and exactly as redundant as git's. The only effective difference I noticed is that it takes a bit longer to show those error messages with bzr -- the Python tax. To summarize this more positively: git excels as much as bzr does. Both tools' documentation, error messages and user interfaces are the best in their class. And they have all the best chances for future improvement.</p> <p>And the second reason of course is that I'd still like to know what the correct way to get the sources is. But for that I should probably ask Erik himself.</p> Lennart PoetteringWed, 25 Feb 2009 10:39:00 +0100tag:0pointer.net,2009-02-25:/blog/projects/bizarre-fun.htmlprojectsThis is funnyhttps://0pointer.net/blog/got-style.html <p><a href="http://userstyles.org/styles/6371">Uh?</a></p> <p>Some folks apparently don't have much respect for my web design skills -- and I always considered myself the Malevich of web design! 
Pfft!</p> Lennart PoetteringMon, 23 Feb 2009 16:52:00 +0100tag:0pointer.net,2009-02-23:/blog/got-style.htmlmiscGenerating Copyright Headers from git Historyhttps://0pointer.net/blog/projects/copyright.html <p><a href="http://0pointer.de/public/copyright.py">Here's a little tool I wrote</a> that automatically generates copyright headers for source files in a git repository based on the git history.</p> <p>Run it like this:</p> <pre>~/projects/pulseaudio$ copyright.py src/pulsecore/sink.c src/pulsecore/core-util.c</pre> <p>And it will give you this:</p> <pre> File: src/pulsecore/sink.c Copyright 2004, 2006-2009 Lennart Poettering Copyright 2006-2007 Pierre Ossman Copyright 2008-2009 Marc-Andre Lureau File: src/pulsecore/core-util.c Copyright 2004, 2006-2009 Lennart Poettering Copyright 2006-2007 Pierre Ossman Copyright 2008 Stelian Ionescu Copyright 2009 Jared D. McNeill Copyright 2009 Marc-Andre Lureau </pre> <p>This little script could use love from a friendly soul to make it crawl entire source trees and patch in appropriate copyright headers. Anyone up for it?</p> Lennart PoetteringSat, 21 Feb 2009 00:16:00 +0100tag:0pointer.net,2009-02-21:/blog/projects/copyright.htmlprojectsTagging Audio Streamshttps://0pointer.net/blog/projects/tagging-audio.html <p>So you are hacking an audio application and the audio data you are generating might eventually end up in <a href="http://pulseaudio.org/">PulseAudio</a> before it is played. <b><a href="http://pulseaudio.org/wiki/ApplicationProperties">If that's the case then please make sure to read this!</a></b></p> <p>Here's the quick summary for Gtk+ developers:</p> <p>PulseAudio can enforce all kinds of policy on sounds. For example, starting in 0.9.15, we will automatically pause your media player while a phone call is going on. To implement this, however, we need to know what the stream you are sending to PulseAudio should be categorized as: is it music? Is it a movie? Is it game sounds? 
Is it a phone call stream?</p> <p>Also, PulseAudio would like to show a nice icon and an application name next to each stream in the volume control. That requires it to be able to deduce this data from the stream.</p> <p>And here's where you come into the game: please add three lines like the following near the beginning of the <tt>main()</tt> function of your Gtk+ application:</p> <pre> ... <a href="http://library.gnome.org/devel/glib/unstable/glib-Miscellaneous-Utility-Functions.html#g-set-application-name">g_set_application_name</a>(_("Totem Movie Player")); <a href="http://library.gnome.org/devel/gtk/unstable/GtkWindow.html#gtk-window-set-default-icon-name">gtk_window_set_default_icon_name</a>("totem"); <a href="http://library.gnome.org/devel/glib/unstable/glib-Miscellaneous-Utility-Functions.html#g-setenv">g_setenv</a>("PULSE_PROP_media.role", "video", TRUE); ... </pre> <p>If you do this then the PulseAudio client libraries will be able to figure out the rest for you.</p> <p>There is more meta information (aka "properties") you can set for your application or for your streams that is useful to PulseAudio. In case you want to know more about them or you are looking for equivalent code to the above example for non-Gtk+ applications, <a href="http://pulseaudio.org/wiki/ApplicationProperties">make sure to read the mentioned page</a>.</p> <p>Thank you!</p> <p>Oh, and even if your app doesn't do audio, calling <tt>g_set_application_name()</tt> and <tt>gtk_window_set_default_icon_name()</tt> is always a good idea!</p> Lennart PoetteringFri, 20 Feb 2009 22:02:00 +0100tag:0pointer.net,2009-02-20:/blog/projects/tagging-audio.htmlprojectsHow to Version D-Bus Interfaces Properly and Why Using / as Service Entry Point Suckshttps://0pointer.net/blog/projects/versioning-dbus.html <p>So you are designing a D-Bus interface and want to make it future-proof. Of course, you thought about versioning your stuff. But you wonder how to do that best. 
Here are a few things I learned about versioning D-Bus APIs which might be of general interest:</p> <p><b>Version your interfaces!</b> This one is pretty obvious. No explanation needed. Simply include the interface version in the interface name as suffix. i.e. the initial release should use <tt>org.foobar.AwesomeStuff1</tt>, and if you make changes you should introduce <tt>org.foobar.AwesomeStuff2</tt>, and so on, possibly dropping the old interface.</p> <p>When should you bump the interface version? Generally, I'd recommend only bumping when making incompatible changes, such as function call signature changes. This of course requires clients to handle the <tt>org.freedesktop.DBus.Error.UnknownMethod</tt> error properly for each function you add to an existing interface. That said, in a few cases it might make sense to bump the interface version even without breaking compatibility of the calls. (e.g. in case you add something to an interface that is not directly visible in the introspection data)</p> <p><b>Version your services!</b> This one is almost as obvious. When you completely rework your D-Bus API introducing a new service name might be a good idea. Best way to do this is by simply bumping the service name. Hence, call your service <tt>org.foobar.AwesomeService1</tt> right from the beginning and then bump the version if you reinvent the wheel. And don't forget that you can acquire more than one well-known service name on the bus, so even if you rework everything you can keep compatibility. (Example: BlueZ 3 to BlueZ 4 switch)</p> <p><b>Version your 'entry point' object paths!</b> This one is far from obvious. The reasons why object paths should be versioned are purely technical, not philosophical: for signals sent from a service D-Bus overwrites the originating service name with the unique name (e.g. <tt>:1.42</tt>) even if you fill in a well-known name (e.g. <tt>org.foobar.AwesomeService1</tt>). 
Now, let's say your application registers two well-known service names, let's say two versions of the same service, versioned as mentioned above. And you have two objects -- one on each of the two service names -- that implement a generic interface and share the same object path: for the client there will be no way to figure out to which service name the signals sent from this object path belong. And that's why you should make sure to use versioned and hence different paths for both objects. i.e. start with <tt>/org/foobar/AwesomeStuff1</tt> and then bump to <tt>/org/foobar/AwesomeStuff2</tt> and so on. (<a href="http://cgit.freedesktop.org/~david/eggdbus/tree/src/eggdbus/eggdbusconnection.c?id=670144c1d962a3d79584a7e944dabc191d635c76#n357">Also see David's comments about this.</a>)</p> <p>When should you bump the object path version? Probably only when you bump the service name it belongs to. What matters is versioning the 'entry point' object path. Objects below that don't need explicit versioning.</p> <p>In summary: For good D-Bus API design <b>you should version all three: D-Bus interfaces, service names <i>and</i> 'entry point' object paths.</b></p> <p>And don't forget: nobody gets API design right the first time. So even if you think your D-Bus API is perfect: version things right from the beginning because later on it might turn out you were not quite as bright as you thought you were.</p> <p>A corollary of the reasoning behind versioning object paths as described above is that using <tt>/</tt> as entry point object path for your service is a bad idea. It makes it very hard to implement more than one service or service version on a single D-Bus connection. Again: <b>Don't use <tt>/</tt> as entry point object path. 
Use something like <tt>/org/foobar/AwesomeStuff</tt>!</b></p> Lennart PoetteringWed, 11 Feb 2009 19:03:00 +0100tag:0pointer.net,2009-02-11:/blog/projects/versioning-dbus.htmlprojectsWriting Volume Control UIs is Hardhttps://0pointer.net/blog/projects/writing-volume-control-uis.html <p>Writing modern volume control UIs (i.e. 'mixer tools') is much harder to get right than it might appear at first. Because that is the way it is I've put together a <a href="http://pulseaudio.org/wiki/WritingVolumeControlUIs">rough guide on what to keep in mind when writing them for PulseAudio</a>. Originally just intended to be a bit of help for the gnome-volume-control guys I believe this could be an interesting read for other people as well.</p> <p>It touches a lot of topics: volumes in general, how to present them, what to present, base volumes, flat volumes, what to do about multichannel volumes, controlling clients, controlling cards, handling default devices, saving/restoring volumes/devices, sound event sliders, how to monitor PCM and more.</p> <p>So make sure to give it at least a quick peek! If you plan to write a volume control for ncurses or KDE (hint, hint!) even more so, it's a must read.</p> <p>Maybe this might also help illustrate why I think that abstracting volume control interfaces inside of abstraction layers such as Phonon or GStreamer is doomed to fail, and just not even worth the try.</p> <p><a href="http://pulseaudio.org/wiki/WritingVolumeControlUIs">And now, without further ado I give you 'Writing Volume Control UIs'</a>.</p> Lennart PoetteringTue, 10 Feb 2009 21:03:00 +0100tag:0pointer.net,2009-02-10:/blog/projects/writing-volume-control-uis.htmlprojectsOh Nine Fifteenhttps://0pointer.net/blog/projects/oh-nine-fifteen.html <p>Last week I released <a href="https://tango.0pointer.de/pipermail/pulseaudio-discuss/2009-February/003068.html">a test version</a> for the upcoming 0.9.15 release of <a href="http://pulseaudio.org/">PulseAudio</a>. 
It's going to be a major one, so here's a little overview of what's new from the user's perspective.</p> <h3>Flat Volumes</h3> <p>Based on code originally contributed by Marc-Andr&eacute; Lureau we now support <i>Flat Volumes</i>. The idea behind flat volumes has been inspired by how Windows Vista handles volume control: instead of maintaining one volume control per application stream plus one device volume we instead fix the device volume automatically to the "loudest" application stream volume. Sounds confusing? Actually it's quite the contrary, it feels pretty natural and easy to use and brings us a big step forward in reducing the number of volume sliders in the entire audio pipeline from the application to what you hear.</p> <p>The flat volumes logic only applies to devices where we know the actual multiplication factor of the hardware volume slider. That's most devices supported by the ALSA kernel drivers except for a few older devices and some cheap USB hardware that exports invalid dB information.</p> <h3>On-the-fly Reconfiguration of Devices (aka "S/PDIF Support")</h3> <p>PulseAudio will now automatically probe all possible combinations of configurations of how to use your sound card for playback and capturing and then allow on-the-fly switching of the configuration. What does that mean? Basically you may now switch between "Analog Stereo", "Digital S/PDIF Stereo", "Analog Surround 5.1" (... and so on) on-the-fly without having to reconfigure PA on the configuration file level or even having to stop your streams. This fixes a couple of issues PA had previously, including proper SPDIF support, and per-device configuration of the channel map of devices.</p> <p>Unfortunately there is no UI for this yet, and hence you need to use <tt>pactl</tt>/<tt>pacmd</tt> on the command line to switch between the profiles. Typing <tt>list-cards</tt> in <tt>pacmd</tt> will tell you which profiles your card supports. 
</p> <p>In a later PA version this functionality will be extended to also allow input connector switching (i.e. microphone vs. line-in) and output connector switching (i.e. internal speakers vs. line-out) on-the-fly.</p> <h3>Native support for 24bit samples</h3> <p>PA now supports 24bit packed samples as well as 24bit stored in the LSBs of 32bit integers natively. Previously these formats were always converted into 32bit MSB samples.</p> <h3>Airport Express Support</h3> <p>Colin Guthrie contributed native Airport Express support. This will make the <a href="http://en.wikipedia.org/wiki/Remote_Audio_Output_Protocol">RAOP</a> audio output of ApEx routers appear like local sound devices (unfortunately sound devices with a very long latency), i.e. any application connecting to PulseAudio can output audio to ApEx devices in a similar way to how iTunes can do it on MacOSX.</p> <p>Before you ask: it is unlikely that we will ever make PulseAudio be able to act as an ApEx compatible device that takes connections from iTunes (i.e. becoming a RAOP server instead of just a RAOP client). Apple has an unfriendly attitude of dongling their devices to their applications: normally iTunes has to cryptographically authenticate itself to the device and the device to iTunes. iTunes' key has been recovered by the infamous <a href="http://nanocr.eu/2004/08/11/reversing-airtunes/">Jon Lech Johansen</a>, but the device key is still unknown. Without that key it is not realistically possible to disguise PA as an ApEx.</p> <h3>Other stuff</h3> <p>There have been some extensive changes to natively support Bluetooth audio devices well by directly accessing BlueZ. This code was originally contributed by the GSoC student Jo&atilde;o Paulo Rechi Vita. Initially, 0.9.15 was intended to become the version where BT audio just works. 
Unfortunately the kernel is not really up to that yet, and I am not sure everything will be in place so that 0.9.15 will ship with well working BT support.</p> <p>There have been a lot of internal changes and API additions. Most of these however are not visible to the user.</p> Lennart PoetteringTue, 10 Feb 2009 20:11:00 +0100tag:0pointer.net,2009-02-10:/blog/projects/oh-nine-fifteen.htmlprojectsPascal,https://0pointer.net/blog/projects/pascal-terjan.html <p>replacing integral parts of a system is always a bit of a dilemma. If we replace it only after all the other software/drivers that interface with it are known to work well with it then nobody will bother doing all that compatibility work since they can say "Nobody uses it yet, so why should I bother?" -- and hence the change can never take place.</p> <p>If we replace it before everything works perfectly well with it, then folks will complain: "Oh my god, it doesn't work with my software/drivers, you suck!" -- <a href="http://fasmz.org/~pterjan/blog/?date=20090127">like you just did (though in more polite words)</a>.</p> <p>Hence regardless which way we do it we will do it the wrong way. Biting the bullet and doing the change is however still the better, indeed the only, path to improvement. With the limited amount of manpower we have pushing things out knowing that there are some software/drivers that don't work well with it is our only option -- especially if the software in question is unfixable by us since it is closed source.</p> <p>Hence, if we did as you wish and did not make the distributions adopt PulseAudio right now we can forget about fixing audio on Linux entirely and it will stagnate forever.</p> <p>As mentioned <a href="http://www.j5live.com/2009/01/26/ah-the-memories/">by J5</a> this was the same story with D-Bus, HAL, with udev, and other stuff.</p> <p>And again, folks may claim that PulseAudio is very buggy. 
While it certainly has bugs, like every piece of software has, most of the issues reported are not things we can or should fix/work-around in PulseAudio, but that are in other layers of the system. In ALSA, in the drivers, in the client applications. However only PA makes them become visible since it depends on a lot more functionality to work properly than any other program before. And quite frankly we use a lot of stuff exactly nobody has used before and that of course was broken due to that (in ALSA as one example).</p> <p>Having said all this, just pointing to other folks to blame doesn't really solve the problem. I did a lot of testing on different sound chips, making sure PulseAudio works fine on them. Of course it's a limited testing set (six cards right now to be exact, a seventh model currently being sent to me by my employer, Red Hat). The cards that are currently known to be problematic are <a href="http://pulseaudio.org/wiki/BrokenSoundDrivers">listed in our Wiki</a>.</p> <p>I am not saying that the points you make are rubbish. However, please see the big picture before getting vocal about it.</p> Lennart PoetteringTue, 27 Jan 2009 19:17:00 +0100tag:0pointer.net,2009-01-27:/blog/projects/pascal-terjan.htmlprojectsIndia, Againhttps://0pointer.net/blog/photos/india-again.html <p>Right after my <a href="http://0pointer.de/blog/photos/brazil">trip to Brazil in November</a> I flew to Bangalore for <a href="http://foss.in/">FOSS.in 2008</a>. It was one amazing conference! After the <a href="http://foss.in/news/fossin2008-the-omelette-post.html">bold changes</a> they had announced I feared they might be a bit too ... bold. But they were not. FOSS.in worked out very well, it was a great success, and it was good to see a lot of familiar faces again. (Which reminds me: Hey, the four of you from the <a href="http://workouts.foss.in/2008/index.php/Implementing_volume-follows-focus_in_PulseAudio">PulseAudio Workout</a>, could you please drop me a line? 
I forgot to put down your email addresses.)</p> <p>After FOSS.in I flew up to Rajasthan for a much too short trip through this marvelous state:</p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=409"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-409.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=423"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-423.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=438"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-438.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=436"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-436.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=458"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-458.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=501"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-501.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=877"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-877.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1062"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1062.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1070"><img width="120" height="80" alt="India" 
src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1070.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1178"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1178.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=111"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-111.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1253"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1253.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1340"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1340.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1826"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1826.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1901"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1901.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1957"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1957.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2391"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2391.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2479"><img width="120" height="80" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2479.jpg" /></a> &nbsp; </p> <p> <a 
href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=24"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-24.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2341"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2341.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=105"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-105.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=116"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-116.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=126"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-126.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=284"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-284.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=25"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-25.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=320"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-320.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=68"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-68.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=73"><img width="80" height="120" alt="India" 
src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-73.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=79"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-79.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=94"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-94.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=367"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-367.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=394"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-394.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1240"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1240.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=410"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-410.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=413"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-413.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=420"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-420.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=426"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-426.jpg" /></a> &nbsp; <a 
href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=425"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-425.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=427"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-427.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=424"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-424.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1861"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1861.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1337"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1337.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=470"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-470.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=568"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-568.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=608"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-608.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=663"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-663.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=669"><img width="80" height="120" alt="India" 
src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-669.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=719"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-719.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=805"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-805.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1068"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1068.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1246"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1246.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1134"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1134.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1208"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1208.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1350"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1350.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1469"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1469.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1740"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1740.jpg" /></a> &nbsp; <a 
href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1881"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1881.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2526"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2526.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=387"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-387.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1976"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1976.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2036"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2036.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2093"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2093.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2436"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2436.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2480"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2480.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2502"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2502.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1089"><img width="80" height="120" 
alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1089.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1251"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1251.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1589"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1589.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=2278"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-2278.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1321"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1321.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202008-11&amp;photo=1965"><img width="80" height="120" alt="India" src="http://0pointer.de/photos/galleries/India%202008-11/thumbs/img-1965.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/static/pushkar"><img src="http://0pointer.de/static/pushkar-small.jpeg" alt="Panorama" width="1024" height="188" /></a> </p> <p> <a href="http://0pointer.de/static/pushkar2"><img src="http://0pointer.de/static/pushkar2-small.jpeg" alt="Panorama" width="1024" height="135" /></a> </p> <p> <a href="http://0pointer.de/static/jaipur1"><img src="http://0pointer.de/static/jaipur1-small.jpeg" alt="Panorama" width="1024" height="172" /></a> </p> <p> <a href="http://0pointer.de/static/fatehpur"><img src="http://0pointer.de/static/fatehpur-small.jpeg" alt="Panorama" width="1024" height="137" /></a> </p> <p> <a href="http://0pointer.de/static/tajmahal1"><img src="http://0pointer.de/static/tajmahal1-small.jpeg" alt="Panorama" width="1024" height="190" /></a> </p> <p> <a 
href="http://0pointer.de/static/tajmahal2"><img src="http://0pointer.de/static/tajmahal2-small.jpeg" alt="Panorama" width="1024" height="180" /></a> </p> <p>That's Pushkar, Jaipur, Fatehpur Sikri and the Taj Mahal (the real one, not the Hotel they bombed).</p> Lennart PoetteringSat, 17 Jan 2009 21:37:00 +0100tag:0pointer.net,2009-01-17:/blog/photos/india-again.htmlphotosBrazilhttps://0pointer.net/blog/photos/brazil.html <p>In November I spent three weeks in Brazil, the country where I grew up two decades ago. Surprisingly little had changed since then. Except maybe that this time I had a DSLR:</p> <p> <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=273"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-273.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=514"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-514.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1030"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1030.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=210"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-210.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1070"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1070.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1627"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1627.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1340"><img width="120" height="80" 
alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1340.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1236"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1236.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1885"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1885.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=2291"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-2291.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=2423"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-2423.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1680"><img width="120" height="80" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1680.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=758"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-758.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=863"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-863.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1507"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1507.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1267"><img width="80" height="120" alt="Brazil" 
src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1267.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1464"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1464.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1525"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1525.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=2027"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-2027.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=2102"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-2102.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=2223"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-2223.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=2491"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-2491.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=2583"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-2583.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1626"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1626.jpg" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1887"><img width="80" height="120" alt="Brazil" 
src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1887.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=2615"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-2615.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=123"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-123.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1231"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1231.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=1294"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-1294.jpg" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Brazil%202008-11&amp;photo=982"><img width="80" height="120" alt="Brazil" src="http://0pointer.de/photos/galleries/Brazil%202008-11/thumbs/img-982.jpg" /></a> &nbsp; </p> <p>That's Rio de Janeiro and the old colonial towns of Ouro Preto, Mariana, S&atilde;o Jo&atilde;o del Rey, Tiradentes, Congonhas do Campo, Paraty in Minas Gerais and Rio State.</p> <p> <a href="http://0pointer.de/static/ouro-preto"><img src="http://0pointer.de/static/ouro-preto-small.jpeg" alt="Panorama" width="1024" height="167" /></a> </p> <p> <a href="http://0pointer.de/static/copacabana"><img src="http://0pointer.de/static/copacabana-small.jpeg" alt="Panorama" width="1024" height="212" /></a> </p> <p>Once again Ouro Preto, and Copacabana Beach at night.</p> Lennart PoetteringSat, 17 Jan 2009 15:12:00 +0100tag:0pointer.net,2009-01-17:/blog/photos/brazil.htmlphotosAutomatic Backtrace Generationhttps://0pointer.net/blog/projects/automatic-backtrace.html <p><a href="https://wiki.ubuntu.com/Apport">Ubuntu has Apport</a>. 
<a href="http://fedoraproject.org/wiki/Features/CrashHandling">Fedora has nothing.</a> That sucks big time.</p> <p>Here's the result of a few minutes of hacking up something similar to Apport based on the awesome (and much underused) <a href="http://sourceware.org/frysk/">Frysk</a> debugging tool kit. It doesn't post any backtraces on any Internet servers and has no fancy UI -- but it automatically dumps a stacktrace of every crashing process on the system to syslog and stores all kinds of data in <tt>/tmp/core.*/</tt> for later inspection.</p> <pre>
#!/bin/bash

# Arguments, as passed via core_pattern:
# $1=pid $2=timestamp $3=uid $4=gid $5=signal $6=hostname

set -e
export PATH=/sbin:/bin:/usr/sbin:/usr/bin

DIR="/tmp/core.$1.$2"
umask 077
mkdir "$DIR"

# The kernel pipes the core dump to our stdin
cat &gt; "$DIR/core"

# From here on, send all output to a log file
exec &amp;&gt; "$DIR/dump.log"
set +e

echo "$1" &gt; "$DIR/pid"
echo "$2" &gt; "$DIR/timestamp"
echo "$3" &gt; "$DIR/uid"
echo "$4" &gt; "$DIR/gid"
echo "$5" &gt; "$DIR/signal"
echo "$6" &gt; "$DIR/hostname"

set -x

# Extract auxiliary vector, executable path and memory maps with Frysk
fauxv "$DIR/core" &gt; "$DIR/auxv"
fexe "$DIR/core" &gt; "$DIR/exe"
fmaps "$DIR/core" &gt; "$DIR/maps"

# Look up the packages owning the files reported by fdebuginfo
# and install their debuginfo
PKGS=`/usr/bin/fdebuginfo "$DIR/core" | grep "\-\-\-" | cut -d ' ' -f 1 | sort | uniq | grep '^/'| xargs rpm -qf | sort | uniq`
[ "x$PKGS" != x ] &amp;&amp; debuginfo-install -y $PKGS

# Generate the stack trace
fstack -rich "$DIR/core" &gt; "$DIR/fstack"

set +x

# Post a summary to syslog
(
    echo "Application `cat "$DIR/exe"` (pid=$1,uid=$3,gid=$4) crashed with signal $5."
    echo "Stack trace follows:"
    cat "$DIR/fstack"
    echo "Auxiliary vector:"
    cat "$DIR/auxv"
    echo "Maps:"
    cat "$DIR/maps"
    echo "For details check $DIR"
) | logger -p local6.info -t "frysk-core-dump-$1"
</pre> <p>Copy that into a file <tt>$SOMEWHERE/frysk-core-dump</tt>. Then do a <tt>chmod +x $SOMEWHERE/frysk-core-dump</tt> and a <tt>chown root:root $SOMEWHERE/frysk-core-dump</tt>. Now, tell the kernel that core dumps should be handed to this script:</p> <pre>
# echo "|$SOMEWHERE/frysk-core-dump %p %t %u %g %s %h" > /proc/sys/kernel/core_pattern
</pre> <p>Finally, increase RLIMIT_CORE to actually enable core dumps. <tt>ulimit -c unlimited</tt> is a good idea. 
This will enable them only for your shell and everything it spawns. In <tt>/etc/security/limits.conf</tt> you can enable them for all users. I haven't found out yet how to enable them globally in Fedora though, i.e. for every single process that is started after boot including system daemons.</p> <p>You can test this by running <tt>sleep 4711</tt> and then dumping core with C-\. The stacktrace should appear right away in <tt>/var/log/messages</tt>.</p> <p>This script will automatically try to install the debugging symbols for the crashing application via yum. In some cases it hence might take a while until the backtrace appears in syslog.</p> <p>Don't forget to install Frysk before trying this script!</p> <p>You can't believe how useful this script is. Something crashed and the backtrace is already waiting for you! It's a bugfixer's wet dream.</p> <p>I am a bit surprised though that no one else came up with this before me. Or maybe I am just too dumb to use Google properly?</p> Lennart PoetteringWed, 29 Oct 2008 23:05:00 +0100tag:0pointer.net,2008-10-29:/blog/projects/automatic-backtrace.htmlprojectsPeople of the Free World [1]!https://0pointer.net/blog/projects/free-sound-themes.html <p>GNOME 2.24 supports <a href="http://www.freedesktop.org/wiki/Specifications/sound-theme-spec">XDG sound themes</a>. Unfortunately, however, right now there is only a single sound theme in existence: the <a href="http://cgit.freedesktop.org/~mccann/sound-theme-freedesktop/">sound-theme-freedesktop</a> -- which is pretty basic.</p> <p>Help us change this! There are many web sites like <a href="http://art.gnome.org/">art.gnome.org</a> which provide a large selection of graphical themes for Gtk+, Metacity, icon sets and so on. We want to see a similarly large selection of sound themes available! And we'd like you to contribute to this!</p> <p>How do you prepare sound themes? 
Read the <a href="http://0pointer.de/public/sound-theme-spec.html">XDG Sound Theming</a> and the <a href="http://0pointer.de/public/sound-naming-spec.html">XDG Sound Naming</a> specifications. Start by basing your work on the aforementioned <a href="http://people.freedesktop.org/~mccann/dist/sound-theme-freedesktop-0.2.tar.bz2">sound-theme-freedesktop</a>. And then just go ahead!</p> <p>Please note that only a subset of the sounds listed in the Sound Naming Specification is currently hooked up properly -- i.e. generated when "input feedback" is enabled or triggered by applications. Nonetheless it makes sense to include them in your theme, because eventually they will be hooked up.</p> <p>When you put a theme together, make sure that you only select sounds that have a sensible Free Software license -- or if you have produced them yourself you pick a good license yourself. GPLv2+, LGPLv2+, CC-BY-SA 3.0 and CC-BY 3.0 are good choices.</p> <p>Not everyone is as lucky as <a href="http://blogs.gnome.org/hughsie/">Richard Hughes</a> and has a mom who is practically an endless source of special effect sounds. If your mom sucks then don't despair! The OLPC team has compiled <a href="http://wiki.laptop.org/go/Sound_samples">a huge set of Free sounds</a> that is waiting to be made an XDG sound theme. I am eagerly looking forward to your sound themes that make use of <a href="http://www.archive.org/details/Berklee44v13">"The Berklee Sampling Archive - Volume 13 - synthesizer - fx (126 samples) spaceships, lasers, explosions, machineguns, glisses"</a> to start a war in space each time you click a button on your screen!<sup>[2]</sup></p> <p><small><b>Footnotes</b></small></p> <p><small>[1] <i>Free</i> as in <i>free desktops</i> that is.</small></p> <p><small>[2] OK, to be honest I am not actually that eagerly looking forward to that. 
Spacewar-at-your-fingertips is pretty lame in comparison to a theme called "Richard's Mom"<sup>[3]</sup>.</small></p> <p><small>[3] You have no idea what all those Hughsie's-Mom-jokes are about? Then listen to the sound files that are shipped with gnome-power-manager!</small></p> Lennart PoetteringWed, 22 Oct 2008 21:21:00 +0200tag:0pointer.net,2008-10-22:/blog/projects/free-sound-themes.htmlprojectsBerliners!https://0pointer.net/blog/berliners.html <p><a href="http://www.vorratsdatenspeicherung.de/static/demo_en.html">Berliners, you might want to attend this rally!</a> It's tomorrow (hmm, or actually today considering it's already past midnight), October 11th 2 pm, Alexanderplatz.</p> Lennart PoetteringSat, 11 Oct 2008 02:25:00 +0200tag:0pointer.net,2008-10-11:/blog/berliners.htmlmiscResponses to my Audio API Guidehttps://0pointer.net/blog/projects/guide-to-sound-apis-followup.html <p>My <a href="http://0pointer.de/blog/projects/guide-to-sound-apis.html">Audio API guide</a> got quite a few responses.</p> <h3>The Good</h3> <p><a href="http://mailman.alsa-project.org/pipermail/alsa-devel/2008-September/010862.html">Takashi likes it.</a> <a href="http://www.schleef.org/blog/2008/09/24/clear-cutting-the-jungle/">And so does David.</a> Which is great because both are key people in the Linux multimedia community.</p> <p><a href="http://lwn.net/Articles/300423/">It made it to LWN.</a> I sincerely and humbly hope this is not going to stay the only news site picking this up. ;-)</p> <p>The <i>safe ALSA</i> part of the recommendations will most likely be added to the ALSA documentation soon. The GNOME-relevant part I will be adding to the GNOME platform overview.</p> <h3>The Bad</h3> <p><a href="http://aseigo.blogspot.com/2008/09/linux-audio-layers.html">Aaron basically likes it</a>, although he appears disappointed that KDE's and Qt's Phonon wasn't mentioned more positively. Aaron is very fair in his criticism. Nonetheless I don't think it is valid. 
My guide is not a list of alternatives. It's a list of recommendations. My recommendations. I do believe that my recommendations very much match the mainstream of the opinions of the key people in Linux multimedia and desktop audio. Of course I don't nearly know every one of the key hackers in Linux multimedia. But I do know most of those who are actively interested in collaboration, whose projects have a lot of mindshare and who attend the conferences that matter for Linux desktop audio.</p> <p><a href="http://blogs.gnome.org/uraeus/2008/09/25/in-the-land-of-silly-arguments/">Also see Christian's comments on Aaron's post.</a></p> <h3>The Ugly</h3> <p>It wasn't my intention to start another GNOME-vs.-KDE flamefest. Unfortunately a lot of people took this as a great opportunity to troll at the various blog comment forums. I guess it is inevitable that some of those whose favourite software is not listed on a recommendation guide like this start to clamour about that. It's a pity not everyone who thinks I am treating KDE unfairly criticises that as fairly and reasonably as Aaron. Anyway, I humbly take this as a sign that people do consider this guide to be relevant and much needed. ;-)</p> Lennart PoetteringFri, 26 Sep 2008 00:05:00 +0200tag:0pointer.net,2008-09-26:/blog/projects/guide-to-sound-apis-followup.htmlprojectsEverybody Loves Pretty Graphicshttps://0pointer.net/blog/projects/the-linux-audio-stack.html <p>As kind of a followup to my <a href="http://0pointer.de/blog/projects/guide-to-sound-apis.html">Guide to Linux Sound APIs</a> here're some pretty graphics I just drew. (At least "pretty" to the degree of my limited drawing abilities). It's a block diagram depicting the Linux audio stack. A lot of people already drew something similar, and often enough the result was horribly complicated and -- in its conclusion -- disappointing. 
So, here's my try:</p> <p><img style="border:0" alt="Linux Audio Stack" src="http://0pointer.de/public/linux-audio-stack.png" width="601" height="245" /></p> <p>The components interface with each other across the horizontal lines. The vertical lines separate unrelated components. The drawing only includes modern, supported APIs and systems as described in the aforementioned blog article. It (hopefully) shows that things in the Linux audio world are not all that bad and we have workable answers for most questions without too much complexity, although they might not entirely make everyone overly happy.</p> <p>In an outburst of bias I completely omitted KDE-specific technologies from this drawing. I guess even if I had included them it'd be called biased anyway, so why bother? Also, they would have distracted the reader and complicated the drawing considerably due to KDE's affection for pluggable backends. So: if you care about KDE, please ignore this diagram.</p> Lennart PoetteringThu, 25 Sep 2008 01:44:00 +0200tag:0pointer.net,2008-09-25:/blog/projects/the-linux-audio-stack.htmlprojectsA Guide Through The Linux Sound API Junglehttps://0pointer.net/blog/projects/guide-to-sound-apis.html <p>At the <a href="http://lwn.net/Articles/299211/">Audio MC</a> at the <a href="http://linuxplumbersconf.org/">Linux Plumbers Conference</a> one thing became very clear: it is very difficult for programmers to figure out which audio API to use for which purpose and which API not to use when doing audio programming on Linux. So here's my try to guide you through this jungle:</p> <h3>What do you want to do?</h3> <dl> <dt style="padding-top:8pt"><i>I want to write a media-player-like application!</i></dt> <dd>Use GStreamer! 
(Unless your focus is only KDE in which case Phonon might be an alternative.)</dd> <dt style="padding-top:8pt"><i>I want to add event sounds to my application!</i></dt> <dd>Use libcanberra, install your sound files according to the XDG Sound Theming/Naming Specifications! (Unless your focus is only KDE in which case KNotify might be an alternative although it has a different focus.)</dd> <dt style="padding-top:8pt"><i>I want to do professional audio programming, hard-disk recording, music synthesizing, MIDI interfacing!</i></dt> <dd>Use JACK and/or the full ALSA interface.</dd> <dt style="padding-top:8pt"><i>I want to do basic PCM audio playback/capturing!</i></dt> <dd>Use the <i>safe</i> ALSA subset.</dd> <dt style="padding-top:8pt"><i>I want to add sound to my game!</i></dt> <dd>Use the audio API of SDL for full-screen games, libcanberra for simple games with standard UIs such as Gtk+.</dd> <dt style="padding-top:8pt"><i>I want to write a mixer application!</i></dt> <dd>Use the layer you want to support directly: if you want to support enhanced desktop software mixers, use the PulseAudio volume control APIs. If you want to support hardware mixers, use the ALSA mixer APIs.</dd> <dt style="padding-top:8pt"><i>I want to write audio software for the plumbing layer!</i></dt> <dd>Use the full ALSA stack.</dd> <dt style="padding-top:8pt"><i>I want to write audio software for embedded applications!</i></dt> <dd>For technical appliances usually the <i>safe</i> ALSA subset is a good choice; this, however, depends highly on your use case.</dd> </dl> <h3>You want to know more about the different sound APIs?</h3> <dl> <dt style="padding-top:8pt"><i>GStreamer</i></dt> <dd><a href="http://www.gstreamer.net/">GStreamer</a> is the de-facto standard media streaming system for Linux desktops. It supports decoding and encoding of audio and video streams. You can use it for a wide range of purposes from simple audio file playback to elaborate network streaming setups. 
GStreamer supports a wide range of CODECs and audio backends. GStreamer is not particularly suited for basic PCM playback or low-latency/realtime applications. GStreamer is portable and not limited in its use to Linux. Among the supported backends are ALSA, OSS, PulseAudio. [<a href="http://gstreamer.freedesktop.org/documentation/">Programming Manuals and References</a>]</dd> <dt style="padding-top:8pt"><i>libcanberra</i></dt> <dd><a href="http://0pointer.de/lennart/projects/libcanberra/">libcanberra</a> is an abstract event sound API. It implements the <a href="http://www.freedesktop.org/wiki/Specifications/sound-theme-spec">XDG Sound Theme and Naming Specifications</a>. libcanberra is a blessed GNOME dependency, but itself has no dependency on GNOME/Gtk/GLib and can be used with other desktop environments as well. In addition to an easy interface for playing sound files, libcanberra provides caching (which is very useful for networked thin clients) and allows passing of various meta data to the underlying audio system which then can be used to enhance user experience (such as positional event sounds) and for improving accessibility. libcanberra supports multiple backends and is portable beyond Linux. Among the supported backends are ALSA, OSS, PulseAudio, GStreamer. [<a href="http://0pointer.de/lennart/projects/libcanberra/gtkdoc/">API Reference</a>]</dd> <dt style="padding-top:8pt"><i>JACK</i></dt> <dd><a href="http://jackaudio.org/">JACK</a> is a sound system for connecting professional audio production applications and hardware output. Its focus is low-latency and application interconnection. It is not useful for normal desktop or embedded use. It is not an API that is particularly useful if all you want to do is simple PCM playback. JACK supports multiple backends, although ALSA is best supported. JACK is portable beyond Linux. Among the supported backends are ALSA, OSS. 
[<a href="http://jackaudio.org/files/docs/html/index.html">API Reference</a>]</dd> <dt style="padding-top:8pt"><i>Full ALSA</i></dt> <dd><a href="http://www.alsa-project.org/">ALSA</a> is the Linux API for doing PCM playback and recording. ALSA is very focused on hardware devices, although other backends are supported as well (to a limited degree, see below). ALSA as a name is used both for the Linux audio kernel drivers and a user-space library that wraps these. ALSA -- the library -- is comprehensive, and portable (to a limited degree). The full ALSA API can appear very complex and is large. However it supports almost everything modern sound hardware can provide. Some of the functionality of the ALSA API is limited in its use to actual hardware devices supported by the Linux kernel (in contrast to software sound servers and sound drivers implemented in user-space such as those for Bluetooth and FireWire audio -- among others) and Linux specific drivers. [<a href="http://www.alsa-project.org/alsa-doc/alsa-lib/">API Reference</a>]</dd> <dt style="padding-top:8pt"><i>Safe ALSA</i></dt> <dd>Only a subset of the full ALSA API works on all backends ALSA supports. It is highly recommended to stick to this <i>safe</i> subset if you do ALSA programming to keep programs portable, future-proof and compatible with sound servers, Bluetooth audio and FireWire audio. See below for more details about which functions of ALSA are considered safe. The <i>safe</i> ALSA API is a suitable abstraction for basic, portable PCM playback and recording -- not just for ALSA kernel driver supported devices. Among the supported backends are ALSA kernel driver devices, OSS, PulseAudio, JACK.</dd> <dt style="padding-top:8pt"><i>Phonon</i> and <i>KNotify</i></dt> <dd><a href="http://phonon.kde.org/">Phonon</a> is a high-level abstraction for media streaming systems such as GStreamer, but goes a bit further than that. It supports multiple backends. 
KNotify is a system for "notifications", which goes beyond mere event sounds. However it does not support the XDG Sound Theming/Naming Specifications at this point, and also doesn't support caching or passing of event meta-data to an underlying sound system. KNotify supports multiple backends for audio playback via Phonon. Both APIs are KDE/Qt specific and should not be used outside of KDE/Qt applications. [<a href="http://api.kde.org/4.0-api/kdelibs-apidocs/phonon/html/index.html">Phonon API Reference</a>] [<a href="http://api.kde.org/4.x-api/kdebase-runtime-apidocs/knotify/html/index.html">KNotify API Reference</a>] </dd> <dt style="padding-top:8pt"><i>SDL</i></dt> <dd><a href="http://www.libsdl.org/">SDL</a> is a portable API primarily used for full-screen game development. Among other stuff it includes a portable audio interface. Among others SDL supports OSS, PulseAudio, ALSA as backends. [<a href="http://www.libsdl.org/cgi/docwiki.cgi">API Reference</a>]</dd> <dt style="padding-top:8pt"><i>PulseAudio</i></dt> <dd><a href="http://pulseaudio.org/">PulseAudio</a> is a sound system for Linux desktops and embedded environments that runs in user-space and (usually) on top of ALSA. PulseAudio supports network transparency, per-application volumes, spatial event sounds, allows switching of sound streams between devices on-the-fly, policy decisions, and many other high-level operations. PulseAudio adds a <a href="http://0pointer.de/blog/projects/pulse-glitch-free.html">glitch-free</a> audio playback model to the Linux audio stack. PulseAudio is not useful in professional audio production environments. PulseAudio is portable beyond Linux. PulseAudio has a native API and also supports the <i>safe</i> subset of ALSA, in addition to limited, LD_PRELOAD-based OSS compatibility. Among others PulseAudio supports OSS and ALSA as backends and provides connectivity to JACK. 
[<a href="http://0pointer.de/lennart/projects/pulseaudio/doxygen/">API Reference</a>]</dd> <dt style="padding-top:8pt"><i>OSS</i></dt> <dd>The <a href="http://www.opensound.com/">Open Sound System</a> is a low-level PCM API supported by a variety of Unixes including Linux. It started out as the standard Linux audio system and is supported on current Linux kernels in the API version 3 as OSS3. OSS3 is considered obsolete and has been fully replaced by ALSA. A successor to OSS3 called OSS4 is available but plays virtually no role on Linux and is not supported in standard kernels or by any of the relevant distributions. The OSS API is very low-level, based around direct kernel interfacing using ioctl()s. It is hence awkward to use and can practically not be virtualized for usage on non-kernel audio systems like sound servers (such as PulseAudio) or user-space sound drivers (such as Bluetooth or FireWire audio). OSS3's timing model cannot properly be mapped to software sound servers at all, and is also problematic on non-PCI hardware such as USB audio. Also, OSS does not do sample type conversion, remapping or resampling if necessary. This means that clients that properly want to support OSS need to include a complete set of converters/remappers/resamplers for the case when the hardware does not natively support the requested sampling parameters. With modern sound cards it is very common to support only S32LE samples at 48 kHz and nothing else. If an OSS client assumes it can always play back S16LE samples at 44.1 kHz it will thus fail. OSS3 is portable to other Unix-like systems, although various differences apply. OSS also doesn't support surround sound and other functionality of modern sound systems properly. <b>OSS should be considered obsolete and not be used in new applications.</b> ALSA and PulseAudio have limited LD_PRELOAD-based compatibility with OSS.
[<a href="http://www.opensound.com/pguide/oss.pdf">Programming Guide</a>]</dd> </dl> <p>All sound systems and APIs listed above are supported in all relevant current distributions. For libcanberra support the newest development release of your distribution might be necessary.</p> <p>All sound systems and APIs listed above are suitable for development for commercial (read: closed source) applications, since they are licensed under LGPL or more liberal licenses or no client library is involved.</p> <h3>You want to know why and when you should use a specific sound API?</h3> <dl> <dt style="padding-top:8pt"><i>GStreamer</i></dt> <dd>GStreamer is best used for very high-level needs: i.e. you want to play an audio file or video stream and do not care about all the tiny details down to the PCM or codec level.</dd> <dt style="padding-top:8pt"><i>libcanberra</i></dt> <dd>libcanberra is best used when adding sound feedback to user input in UIs. It can also be used to play simple sound files for notification purposes.</dd> <dt style="padding-top:8pt"><i>JACK</i></dt> <dd>JACK is best used in professional audio production and where interconnecting applications is required.</dd> <dt style="padding-top:8pt"><i>Full ALSA</i></dt> <dd>The full ALSA interface is best used for software on the "plumbing layer" or when you want to make use of very specific hardware features, which might be needed for audio production purposes.</dd> <dt style="padding-top:8pt"><i>Safe ALSA</i></dt> <dd>The <i>safe</i> ALSA interface is best used for software that wants to output/record basic PCM data from hardware devices or software sound systems.</dd> <dt style="padding-top:8pt"><i>Phonon</i> and <i>KNotify</i></dt> <dd>Phonon and KNotify should only be used in KDE/Qt applications and only for high-level media playback, resp.
simple audio notifications.</dd> <dt style="padding-top:8pt"><i>SDL</i></dt> <dd>SDL is best used in full-screen games.</dd> <dt style="padding-top:8pt"><i>PulseAudio</i></dt> <dd>For now, the PulseAudio API should be used only for applications that want to expose sound-server-specific functionality (such as mixers) or when a PCM output abstraction layer is already available in your application and it thus makes sense to add an additional backend to it for PulseAudio to keep the stack of audio layers minimal.</dd> <dt style="padding-top:8pt"><i>OSS</i></dt> <dd>OSS should not be used for new programs.</dd> </dl> <h3>You want to know more about the <i>safe</i> ALSA subset?</h3> <p>Here's a list of DOS and DONTS in the ALSA API if you care that your application stays future-proof and works fine with non-hardware backends or backends for user-space sound drivers such as Bluetooth and FireWire audio. Some of these recommendations apply for people using the full ALSA API as well, since some functionality should be considered obsolete for all cases.</p> <p>If your application's code does not follow these rules, you must have a very good reason for that. Otherwise your code should simply be considered <b>broken</b>!</p> <p>DONTS:</p> <ul> <li>Do <b>not</b> use "async handlers", e.g. via <tt>snd_async_add_pcm_handler()</tt> and friends. Asynchronous handlers are implemented using POSIX signals, which is a very questionable use of them, especially from libraries and plugins. Even when you don't want to limit yourself to the <i>safe</i> ALSA subset it is highly recommended not to use this functionality. <a href="http://mailman.alsa-project.org/pipermail/alsa-devel/2008-May/008030.html">Read this for a longer explanation why signals for audio IO are evil.</a></li> <li>Do <b>not</b> parse the ALSA configuration file yourself or with any of the ALSA functions such as <tt>snd_config_xxx()</tt>.
If you need to enumerate audio devices use <tt>snd_device_name_hint()</tt> (and related functions). That is the only API that also supports enumerating non-hardware audio devices and audio devices with drivers implemented in userspace.</li> <li>Do <b>not</b> parse any of the files from <tt>/proc/asound/</tt>. Those files only include information about kernel sound drivers -- user-space plugins are not listed there. Also, the set of kernel devices might differ from the way they are presented in user-space (i.e. sub-devices are mapped in different ways to actual user-space devices such as <tt>surround51</tt> and suchlike).</li> <li>Do <b>not</b> rely on stable device indexes from ALSA. Nowadays they depend on the initialization order of the drivers during boot-up time and are thus not stable.</li> <li>Do <b>not</b> use the <tt>snd_card_xxx()</tt> APIs. For enumerating use <tt>snd_device_name_hint()</tt> (and related functions). <tt>snd_card_xxx()</tt> is obsolete. It will only list kernel hardware devices. User-space devices such as sound servers, Bluetooth audio are not included. <tt>snd_card_load()</tt> is completely obsolete these days.</li> <li>Do <b>not</b> hard-code device strings, especially not <tt>hw:0</tt> or <tt>plughw:0</tt> or even <tt>dmix</tt> -- these devices define no channel mapping and are mapped to raw kernel devices. It is highly recommended to use exclusively <tt>default</tt> as device string. If specific channel mappings are required the correct device strings should be <tt>front</tt> for stereo, <tt>surround40</tt> for Surround 4.0, <tt>surround41</tt>, <tt>surround51</tt>, and so on. Unfortunately at this point ALSA does not define standard device names with channel mappings for non-kernel devices. This means <tt>default</tt> may only be used safely for mono and stereo streams.
You should probably prefix your device string with <tt>plug:</tt> to make sure ALSA transparently reformats/remaps/resamples your PCM stream for you if the hardware/backend does not support your sampling parameters natively.</li> <li>Do <b>not</b> assume that any particular sample type is supported except the following ones: U8, S16_LE, S16_BE, S32_LE, S32_BE, FLOAT_LE, FLOAT_BE, MU_LAW, A_LAW.</li> <li>Do <b>not</b> use <tt>snd_pcm_avail_update()</tt> for synchronization purposes. It should be used exclusively to query the number of frames that may be written/read right now. Do <b>not</b> use <tt>snd_pcm_delay()</tt> to query the fill level of your playback buffer. It should be used exclusively for synchronization purposes. Make sure you fully understand the difference, and note that the two functions return values that are not necessarily directly connected!</li> <li>Do <b>not</b> assume that the mixer controls always know dB information.</li> <li>Do <b>not</b> assume that all devices support MMAP style buffer access.</li> <li>Do <b>not</b> assume that the hardware pointer inside the (possibly mmaped) playback buffer is the actual position of the sample in the DAC. There might be an extra latency involved.</li> <li>Do <b>not</b> try to recover with your own code from ALSA error conditions such as buffer under-runs. Use <tt>snd_pcm_recover()</tt> instead.</li> <li>Do <b>not</b> touch buffering/period metrics unless you have specific latency needs. Develop defensively, handling correctly the case when the backend cannot fulfill your buffering metrics requests. Be aware that the buffering metrics of the playback buffer only indirectly influence the overall latency in many cases, i.e.
setting the buffer size to a fixed value might actually result in practical latencies that are much higher.</li> <li>Do <b>not</b> assume that <tt>snd_pcm_rewind()</tt> is available and works and to which degree.</li> <li>Do <b>not</b> assume that the time when a PCM stream can receive new data is strictly dependent on the sampling and buffering parameters and the resulting average throughput. Always make sure to supply new audio data to the device when it asks for it by signalling "writability" on the fd. (And similarly for capturing)</li> <li>Do <b>not</b> use the "simple" interface <tt>snd_spcm_xxx()</tt>.</li> <li>Do <b>not</b> use any of the functions marked as "obsolete".</li> <li>Do <b>not</b> use the timer, midi, rawmidi, hwdep subsystems.</li> </ul> <p>DOS:</p> <ul> <li>Use <tt>snd_device_name_hint()</tt> for enumerating audio devices.</li> <li>Use <tt>snd_smixer_xx()</tt> instead of raw <tt>snd_ctl_xxx()</tt>.</li> <li>For synchronization purposes use <tt>snd_pcm_delay()</tt>.</li> <li>For checking buffer playback/capture fill level use <tt>snd_pcm_avail_update()</tt>.</li> <li>Use <tt>snd_pcm_recover()</tt> to recover from errors returned by any of the ALSA functions.</li> <li>If possible use the largest buffer sizes the device supports to maximize power saving and drop-out safety. Use <tt>snd_pcm_rewind()</tt> if you need to react to user input quickly.</li> </ul> <h3>FAQ</h3> <dl> <dt style="padding-top:8pt">What about ESD and NAS?</dt> <dd>ESD and NAS are obsolete, both as API and as sound daemon. Do not develop for them any further.</dd> <dt style="padding-top:8pt">ALSA isn't portable!</dt> <dd>That's not true! Actually the user-space library is relatively portable, it even includes a backend for OSS sound devices. There is no real reason that would disallow using the ALSA libraries on other Unixes as well.</dd> <dt style="padding-top:8pt">Portability is key to me! What can I do?</dt> <dd>Unfortunately no truly portable (i.e.
to Win32) PCM API is available right now that I could truly recommend. The systems shown above are more or less portable at least to Unix-like operating systems. That does not mean however that there are suitable backends for all of them available. If you care about portability to Win32 and MacOS you probably have to find a solution outside of the recommendations above, or contribute the necessary backends/portability fixes. None of the systems (with the exception of OSS) is truly bound to Linux or Unix-like kernels.</dd> <dt style="padding-top:8pt">What about PortAudio?</dt> <dd>I don't think that PortAudio is a very good API for Unix-like operating systems. I cannot recommend it, but it's your choice.</dd> <dt style="padding-top:8pt">Oh, why do you hate OSS4 so much?</dt> <dd>I don't hate anything or anyone. I just don't think OSS4 is a serious option, especially not on Linux. On Linux, it is also completely redundant due to ALSA.</dd> <dt style="padding-top:8pt">You idiot, you have no clue!</dt> <dd>You are right, I totally don't. But that doesn't hinder me from recommending things. Ha!</dd> <dt style="padding-top:8pt">Hey I wrote/know this tiny new project which is an awesome abstraction layer for audio/media!</dt> <dd>Sorry, that's not sufficient. I only list software here that is known to be sufficiently relevant and sufficiently well maintained.</dd> </dl> <h3>Final Words</h3> <p>Of course these recommendations are very basic and are only intended to lead into the right direction. For each use-case different necessities apply and hence options that I did not consider here might become viable. It's up to you to decide how much of what I wrote here actually applies to your application.</p> <p>This summary only includes software systems that are considered stable and universally available at the time of writing. In the future I hope to introduce a more suitable and portable replacement for the <i>safe</i> ALSA subset of functions.
I plan to update this text from time to time to keep things up-to-date.</p> <p>If you feel that I forgot a use case or an important API, then please contact me or leave a comment. However, I think the summary above is sufficiently comprehensive and if an entry is missing I most likely deliberately left it out.</p> <p>(Also note that I am upstream for both PulseAudio and libcanberra and made some minor contributions to ALSA, GStreamer and some of the other systems listed above. Yes, I am biased.)</p> <p>Oh, and please syndicate this, digg it. I'd like this guide to be well known all around the Linux community. Thank you!</p> Lennart PoetteringWed, 24 Sep 2008 21:52:00 +0200tag:0pointer.net,2008-09-24:/blog/projects/guide-to-sound-apis.htmlprojectsMy take on the Plumbers Conferencehttps://0pointer.net/blog/projects/lpc-summary.html <p>I just came back from the <a href="http://linuxplumbersconf.org/">Linux Plumbers Conference</a>. As some of you might know I was doing an MC about Audio there. Don Marti attended the track and wrote up an <a href="http://lwn.net/Articles/299211/">interesting article</a> over at LWN. It's a recommended read, including the immense number of comments it already resulted in. (I will try to reply to all comments coming up, in case you have questions -- just post them over at LWN)</p> <p>I must really say that calling that article "It's a mess" and highlighting my critical comments on the situation this way makes me feel slightly uncomfortable, though. Sure, we have some issues to fix and those are the words I chose at the conference -- but it's only part of the story. Things are not really all that bad, and we have enough good stuff to focus on.</p> <p>I enjoyed LPC, and especially the audio MC a lot. The discussions during the MC were lively, focussed and very enlightening.
Much better than at other conferences I have been to, the information flow was two-way: instead of just having a speaker who talks about stuff and attendees that listen to them, here all talks were very interactive -- a lot of people in the audience had something to say, and the others did benefit from it.</p> <p>LPC organization was flawless, Portland is awesome. The food was good, too. To summarize: I am happy, very happy! I look forward to another iteration next year and hope we'll be able to have an audio MC then, too.</p> <p>LPC organizers: rock on! Takashi, Jonathan: thank you very much for your participation in the Audio MC!</p> <p>(If you are not subscribed to LWN but want to read the article linked above, ping me, I can hand out a few free links. Alternatively, wait for Thursday and it will be available for free.)</p> Lennart PoetteringMon, 22 Sep 2008 02:51:00 +0200tag:0pointer.net,2008-09-22:/blog/projects/lpc-summary.htmlprojectsAudio BoFhttps://0pointer.net/blog/projects/audio-bod-lpc.html <p>To whom it may concern: there'll be an Audio BoF tomorrow (Thu) at the <a href="http://linuxplumbersconf.org/">Linux Plumbers Conference</a>, starting at 4 pm. Don't miss it.</p> Lennart PoetteringThu, 18 Sep 2008 00:03:00 +0200tag:0pointer.net,2008-09-18:/blog/projects/audio-bod-lpc.htmlprojectsNew libcanberra backendshttps://0pointer.net/blog/projects/canberra-oh-eight.html <p>I released <a href="http://0pointer.de/lennart/projects/libcanberra/">libcanberra 0.8</a> a few hours ago. Biggest changes are some portability fixes for Solaris/FreeBSD, inclusion of an OSS backend (contributed by Joe Marcus Clarke) and a GStreamer backend (contributed by Marc-Andr&eacute; Lureau).
This will hopefully make <a href="http://mail.gnome.org/archives/desktop-devel-list/2008-August/msg00044.html">certain doubts</a> regarding libcanberra void.</p> <p>Oh, and <a href="http://0pointer.de/lennart/projects/libcanberra/">libcanberra now has a homepage</a>.</p> Lennart PoetteringThu, 28 Aug 2008 22:01:00 +0200tag:0pointer.net,2008-08-28:/blog/projects/canberra-oh-eight.htmlprojectsPulseAudio on Transifexhttps://0pointer.net/blog/projects/pa-on-tx.html <p>Thanks to <a href="http://dimitris.glezos.com/weblog">Dimitris Glezos</a> <a href="http://pulseaudio.org/">PulseAudio</a> and its auxiliary tools are now available on <a href="https://translate.fedoraproject.org/submit/">Fedora's Transifex</a> for translation. If you want to contribute translations, please submit them via Transifex, which will then result in direct commits to our upstream source code repositories -- without further delay or workload on my side. Submission via other ways (bug report, mail ...) will no longer be accepted.</p> <p>Submit your translations <i>now</i> <a href="https://translate.fedoraproject.org/submit/module/pulseaudio/">for PulseAudio</a>, <a href="https://translate.fedoraproject.org/submit/module/pavucontrol">for the volume control</a>, and <a href="https://translate.fedoraproject.org/submit/module/paprefs">for the preferences dialog</a>. And while we are at it, <a href="https://translate.fedoraproject.org/submit/module/avahi">Avahi's waiting for your translations, too</a>.</p> Lennart PoetteringThu, 28 Aug 2008 21:35:00 +0200tag:0pointer.net,2008-08-28:/blog/projects/pa-on-tx.htmlprojectsScott,https://0pointer.net/blog/projects/apple-development-platform.html <p><a href="http://www.netsplit.com/2008/08/11/development-platform/">in contrast to what you say</a> the Apple audio stack (CoreAudio) is far less streamlined than it might appear on first sight. The different APIs that make up the Apple audio stack are far more redundant than you might think.
Also, they are different in programming style, and you can list at least as many separate components for different areas of audio with different API/naming styles as you just did for the Linux audio stack.</p> <p>Listing two components of the Linux audio stack that are considered obsolete these days, and listing one item twice doesn't really help making your post unassailable.</p> <p>Having said that, yes, our Linux audio stack is still chaotic, redundant, badly documented and incomplete. You are very welcome to help fix this. But just doing a bit of PR and sticking a single name on the sum of it all doesn't even touch the real problems we have with the audio APIs on Linux.</p> <p>Free software development is in its very essence distributed. The fact that our APIs sometimes appear a bit higgledy-piggledy is probably just an inevitable consequence of this.</p> Lennart PoetteringTue, 12 Aug 2008 20:02:00 +0200tag:0pointer.net,2008-08-12:/blog/projects/apple-development-platform.htmlprojectsString Poolshttps://0pointer.net/blog/projects/string-pools.html <p>In part 2.4.3 of Ulrich Drepper's excellent <i><a href="http://people.redhat.com/drepper/dsohowto.pdf">How To Write Shared Libraries</a></i> (which unfortunately is a bit out-of-date these days) Ulrich suggests replacing arrays of constant strings by a single concatenated string plus an index lookup table, to avoid unnecessary relocations during startup of ELF programs. Maintaining this <i>string pool</i> is however troublesome, it is hard to read and difficult to edit. In appendix B Ulrich lists an example C excerpt which contains some code for simplifying the maintenance of such string pools, after an idea from Bruno Haible. In my opinion however that suggestion is not that much simpler, and requires splitting off the actual strings into a separate source file. Ugly!</p> <p>Some Free Software uses string pools to speed up relocation, e.g.
<a href="http://svn.gnome.org/viewvc/gtk%2B/trunk/gdk/x11/gdksettings.c?view=markup">GTK+</a>. Some development tools like <a href="http://www.gnu.org/software/gperf/manual/html_node/Gperf-Declarations.html">gperf</a> contain support for string pools.</p> <p>All solutions for string pool maintaining I could find on the Internet were not exactly beautiful. Either they were completely manual, manual plus a validity checking tool, or very very cumbersome. Googling around I was unable to find a satisfactory tool for this purpose<sup>[1]</sup>.</p> <p>After <a href="http://blog.flameeyes.eu/">Diego Petteno</a> complained about my heavy use of arrays of constant strings in <a href="http://git.0pointer.de/?p=libatasmart.git">libatasmart</a> I sat down to change the situation, and wrote <tt><a href="http://git.0pointer.de/?p=libatasmart.git;a=blob;f=strpool.c;hb=master">strpool.c</a></tt>, a simple parser for a very, very minimal subset of C, written in plain ANSI C. It looks for two special comment markers <tt>/* %STRINGPOOLSTART% */</tt> and <tt>/* %STRINGPOOLSTOP% */</tt>, moves all immediate strings between those markers into a common string pool and rewrites the input with the strings replaced by indexes. Code accessing those strings must use the special <tt>_P()</tt> macro. With these minimal changes to a source file, passing it through <tt>strpool.c</tt> will automatically rewrite it to a string-poolized version. The nice thing about this is that the necessary changes in the source are minimal, and the code stays compilable with and without passing it through the <tt>strpool.c</tt> preprocessor.</p> <p>Here's an example. 
First the original non-string-poolized version:</p> <pre>
#include &lt;stdio.h&gt;

static const char* const table[] = {
        "waldo",
        "uxknurz",
        "foobar",
        "fubar"
};

int main(int argc, char* argv[]) {
        printf("%s\n", table[2]);
        return 0;
}
</pre> <p>For later use with <tt>strpool.c</tt> we change this like this:</p> <pre>
#include &lt;stdio.h&gt;

<b>#ifndef STRPOOL
#define _P(x) x
#endif</b>

<b>/* %STRINGPOOLSTART% */</b>
static const char* const table[] = {
        "waldo",
        "uxknurz",
        "foobar",
        "fubar"
};
<b>/* %STRINGPOOLSTOP% */</b>

int main(int argc, char* argv[]) {
        printf("%s\n", <b>_P</b>(table[2]));
        return 0;
}
</pre> <p>When passed through <tt>strpool.c</tt> this will be rewritten as:</p> <pre>
<b>/* Saved 3 relocations, saved 0 strings (0 b) due to suffix compression. */
static const char _strpool_[] =
        "waldo\0"
        "uxknurz\0"
        "foobar\0"
        "fubar\0";
#ifndef STRPOOL
#define STRPOOL
#endif
#ifndef _P
#define _P(x) (_strpool_ + ((x) - (const char*) 1))
#endif</b>

#include &lt;stdio.h&gt;

#ifndef STRPOOL
#define _P(x) x
#endif

/* %STRINGPOOLSTART% */
static const char* const table[] = {
        <b>((const char*) 1)</b>,
        <b>((const char*) 7)</b>,
        <b>((const char*) 15)</b>,
        <b>((const char*) 22)</b>
};
/* %STRINGPOOLSTOP% */

int main(int argc, char* argv[]) {
        printf("%s\n", _P(table[2]));
        return 0;
}
</pre> <p>All three versions can be compiled directly with gcc. However, the version that was passed through <tt>strpool.c</tt> reduces the number of relocations for the table array from 4 to 1. That isn't much of a difference, but the larger your tables are the more relevant the difference in the number of necessary relocations gets.</p> <p>A more realistic example is <a href="http://git.0pointer.de/?p=libatasmart.git;a=blob;f=atasmart.c;hb=master">atasmart.c</a> which after being preprocessed with <tt>strpool.c</tt> looks like <a href="http://0pointer.de/public/atasmart.strpool.c">this</a>.
In this specific example the number of necessary startup relocations goes down from > 100 to 9.</p> <p>I am not sure if the parser is 100% correct, but it works fine with all sources I tried. It even does suffix compression like gcc does for normal strings too.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Or maybe I just suck at googling? Anyone has a suggestion for such a tool?</small></p> Lennart PoetteringFri, 25 Jul 2008 23:32:00 +0200tag:0pointer.net,2008-07-25:/blog/projects/string-pools.htmlprojectsPulseAudio 0.9.11 releasedhttps://0pointer.net/blog/projects/pulseaudio-0.9.11.html <p><a href="https://tango.0pointer.de/pipermail/pulseaudio-discuss/2008-July/002083.html">I just released PulseAudio 0.9.11</a>.</p> <p>It's an awesome release. To learn more about why, read the linked email, and <a href="http://0pointer.de/blog/projects/pulse-glitch-free.html">this</a> and <a href="http://0pointer.de/blog/projects/jeffrey-stedfast.html">maybe this blog story</a>.</p> <p><a href="http://pulseaudio.org/"><img src="http://pulseaudio.org/chrome/site/patitle.png" width="345" height="70" style="border: 0" alt="PulseAudio logo" /></a></p> Lennart PoetteringThu, 24 Jul 2008 15:32:00 +0200tag:0pointer.net,2008-07-24:/blog/projects/pulseaudio-0.9.11.htmlprojectsLinux Plumbers Conference CFP Extended!https://0pointer.net/blog/projects/plumbersconf-2.html <p>The <a href="http://www.linuxplumbersconf.org/cfp/">Call for Papers</a> for the <a href="http://www.linuxplumbersconf.org/">Linux Plumbers Conference</a> in September in Portland, Oregon <a href="http://lwn.net/Articles/291189/">has been extended</a> until <b>July 31st 2008</b>. It's a conference about the core infrastructure of Linux systems: the part of the system where userspace and the kernel interface. It's the first conference where the focus is specifically on getting together the kernel people who work on the userspace interfaces and the userspace people who have to deal with kernel interfaces.
It's supposed to be a place where all the people doing infrastructure work sit down and talk, so that each side understands better what the requirements and needs of the other are, and where we can work towards fixing the major problems we currently have with our lower-level APIs.</p> <p>I am running the Audio microconf of the Plumbers Conference. Audio infrastructure on Linux is still heavily fragmented. Pro, desktop and embedded worlds are almost completely separate worlds. While we have quite good driver support the user experience is far from perfect, mostly because our infrastructure is so balkanized. Join us at the Plumbers Conference and help to fix this! If you are doing <b>audio infrastructure work</b> on Linux, make sure to attend and <b>submit a paper!</b></p> <p><a href="http://linuxplumbersconf.org/register/">Sign up soon!</a> <a href="http://linuxplumbersconf.org/cfp/">Send in your paper early!</a> The conference is expected to sell out pretty quickly!</p> <p><a href="http://www.linuxplumbersconf.org"><img style="border: 0" src="http://www.linuxplumbersconf.org/images/banner.png" alt="Plumbers Logo" width="475" height="88" /></a></p> <p>See you in Portland!</p> Lennart PoetteringWed, 23 Jul 2008 13:39:00 +0200tag:0pointer.net,2008-07-23:/blog/projects/plumbersconf-2.htmlprojectsOh, Solomon!
He has outdone you!https://0pointer.net/blog/photos/istanbul-domes.html <p><a href="http://0pointer.de/static/istanbul-domes"><img src="http://0pointer.de/static/domes-many" width="800" height="219" alt="Domes" /></a></p> <p>Wide-angle lenses are a great invention.</p> Lennart PoetteringSun, 20 Jul 2008 23:29:00 +0200tag:0pointer.net,2008-07-20:/blog/photos/istanbul-domes.htmlphotosPulseAudio FUDhttps://0pointer.net/blog/projects/jeffrey-stedfast.html <p><b>Jeffrey Stedfast</b></p> <p>Jeffrey Stedfast seems to have made it his new <a href="http://jeffreystedfast.blogspot.com/2008/06/pulseaudio-solution-in-search-of.html">hobby</a> <a href="http://jeffreystedfast.blogspot.com/2008/07/more-pulseaudio-problems.html">to</a> <a href="http://jeffreystedfast.blogspot.com/2008/07/pulseaudio-again.html">bash</a> <a href="http://jeffreystedfast.blogspot.com/2008/07/pulseaudio-i-told-you-so.html">PulseAudio</a>. In a series of very negative blog postings he flamed my software and hence me in best NotZed-like fashion. Particularly interesting in this case is the fact that he apologized to me privately on IRC for this behaviour shortly after his first posting when he was criticized on <tt>#gnome-hackers</tt> -- only to continue flaming and bashing in more blog posts shortly after. Flaming is very much part of the Free Software community I guess. A lot of people do it from time to time (including me). But maybe there are better places for this than Planet Gnome. And maybe doing it for days is not particularly nice.
And maybe flaming sucks in the first place anyway.</p> <p>Regardless of what I think about Jeffrey and his behaviour on Planet Gnome, let's have a look at his trophies, the five "bugs" he posted:</p> <ol> <li><a href="http://bugzilla.gnome.org/show_bug.cgi?id=542296">Not directly related to PulseAudio itself.</a> Also, finding errors in code that is related to esd is not exactly the most difficult thing in the world.</li> <li><a href="http://bugzilla.gnome.org/show_bug.cgi?id=542391">The same theme</a>.</li> <li><a href="http://www.pulseaudio.org/ticket/320">Fixed 3 months ago</a>. It is certainly not my fault that this isn't available in Jeffrey's distro.</li> <li><a href="http://www.pulseaudio.org/ticket/322">A real, valid bug report</a>. Fixed in git a while back, but not available in any released version. May only be triggered under heavy load or with a bad high-latency scheduler.</li> <li><a href="http://www.pulseaudio.org/ticket/323">A valid bug, but not really in PulseAudio</a>. Mostly caused because the ALSA API and PA API don't really match 100%.</li> </ol> <p>OK, Jeffrey found a real bug, but I wouldn't say this is really enough to make all the fuss about. Or is it?</p> <p><b>Why PulseAudio?</b></p> <p>Jeffrey wrote something about '<i>solution looking for a problem</i>' when speaking of PulseAudio. While that was certainly not a nice thing to say, it tells me one thing: I apparently didn't manage to communicate well enough why I am doing PulseAudio in the first place. So, why am I doing it then?</p> <ul> <li>There's so much more a good audio system needs to provide than just the most basic mixing functionality. Per-application volumes, moving streams between devices during playback, positional event sounds (i.e.
click on the left side of the screen, have the sound event come out through the left speakers), secure session-switching support, monitoring of sound playback levels, rescuing playback streams to other audio devices on hot unplug, automatic hotplug configuration, automatic up/downmixing stereo/surround, high-quality resampling, network transparency, sound effects, simultaneous output to multiple sound devices are all features PA provides right now, and what you don't get without it. It also provides the infrastructure for upcoming features like volume-follows-focus, automatic attenuation of music on signal on VoIP stream, UPnP media renderer support, Apple RAOP support, mixing/volume adjustments with dynamic range compression, adaptive volume of event sounds based on the volume of music streams, jack sensing, switching between stereo/surround/spdif during runtime, ...</li> <li>And even for the most basic mixing functionality plain ALSA/dmix is not really everlasting happiness. Due to the way it works all clients are forced to use the same buffering metrics all the time, that means all clients are limited in their wakeup/latency settings. You will burn more CPU than necessary this way, keep the risk of drop-outs unnecessarily high and still not be able to make clients with low-latency requirements happy. <a href="http://0pointer.de/blog/projects/pulse-glitch-free.html">'Glitch-Free' PulseAudio</a> fixes all this. Quite frankly I believe that 'glitch-free' PulseAudio is the single most important killer feature that should be enough to convince everyone why PulseAudio is the right thing to do. Maybe people actually don't know that they want this. But they absolutely do, especially the embedded people -- if used properly it is a must for power-saving during audio playback. 
It's a pity that you cannot directly see from the user interface how awesome this feature is.<sup>[1]</sup></li> <li>PulseAudio provides compatibility with a lot of sound systems/APIs that bare ALSA or bare OSS don't provide.</li> <li>And last but not least, I love breaking Jeffrey's audio. It's just soo much fun, you really have to try it! ;-)</li> </ul> <p>If you want to know more about why I think that PulseAudio is an important part of the modern Linux desktop audio stack, please <a href="http://0pointer.de/public/foss.in-pulse.pdf">read my slides from FOSS.in 2007</a>.</p> <p><b>Misconceptions</b></p> <p>Many people (like Jeffrey) wonder why have software mixing at all if you have hardware mixing? The thing is, hardware mixing is a thing of the past, modern sound cards don't do it anymore. SIMD CPU extensions like SSE have been invented precisely for doing things like mixing in software. Modern sound cards these days are kind of "dumbed down", high-quality DACs. They don't do mixing anymore, many modern chips don't even do volume control anymore. Remember the days when having a Wavetable chip was a killer feature of a sound card? Those days are gone, today wavetable synthesis is done almost exclusively in software -- and that's exactly what happened to hardware mixing too. And it is good that way. In software it is much easier to do fancier stuff like DRC, which will increase the quality of mixing. And modern CPUs provide all the necessary SIMD instruction sets to implement this efficiently.</p> <p>Other people believe that JACK would be a better solution for the problem. This is nonsense. JACK has been designed for a very different purpose. It is optimized for low latency inter-application communication. It requires floating point samples, it knows nothing about channel mappings, it depends on every client to behave correctly. And so on, and so on. It is a sound server for audio production. For desktop applications it is however not well suited. 
For a desktop saving power is very important, one application misbehaving shouldn't have an effect on other applications' playback; converting from/to FP all the time is not going to help battery life either. Please understand that for the purpose of pro audio you can make completely different compromises than you can do on the desktop. For example, while having 'glitch-free' is great for embedded and desktop use, it makes no sense at all for pro audio, and would only hurt performance. So, please stop bringing up JACK again and again. It's just not the right tool for desktop audio, and this opinion is shared by the JACK developers themselves. </p> <p>Jeffrey thinks that audio mixing is nothing for userspace. Which is basically what OSS4 tries to do: mixing in kernel space. However, the future of PCM audio is floating point. Mixing floating point samples in kernel space is problematic because (at least on Linux) FP in kernel space is a no-no. Also, the kernel people made clear more than once that maths/decoding/encoding like this should happen in userspace. Quite honestly, doing the mixing in kernel space is probably one of the primary reasons why I think that OSS4 is a bad idea. The fancier your mixing gets (i.e. including resampling, upmixing, downmixing, DRC, ...) the more difficulties you will have moving such complex, time-intensive code into the kernel.</p> <p>Not every time your audio breaks is it PulseAudio's fault alone. For example, the original flame of Jeffrey's was about the low volume that he experienced when running PA. This is mostly due to the suckish way we initialize the default volumes of ALSA sound cards. Most distributions have simple scripts that initialize ALSA sound card volumes to fixed values like 75% of the available range, without understanding what the range or the controls actually mean. This is actually a very bad thing to do. Integrated USB speakers for example tend to export the full amplification range via the mixer controls. 
75% for them is incredibly loud. For other hardware (like apparently Jeffrey's) it is too low in volume. How to fix this has been discussed on the ALSA mailing list, but no final solution has been presented yet. Nonetheless, the fact that the volume was too low is completely unrelated to PulseAudio.</p> <p>PulseAudio interfaces with lower-level technologies like ALSA on one hand, and with high-level applications on the other hand. Those systems are not perfect. Especially closed-source applications tend to do very evil things with the audio APIs (Flash!) that are very hard to support on virtualized sound systems such as PulseAudio [2]. However, things are getting better. <a href="http://pulseaudio.org/wiki/AlsaIssues">My list of issues I found in ALSA</a> is getting shorter. Many applications have already been fixed.</p> <p>The reflex "my audio is broken it must be PulseAudio's fault" is certainly easy to come up with, but it certainly is not always right.</p> <p>Also note that -- like many areas in Free Software -- development of the desktop audio stack on Linux is a bit understaffed. AFAIK there are only two people working on ALSA full-time and only me working on PulseAudio and other userspace audio infrastructure, assisted by a few others who supply code and patches from time to time, some more and some less.</p> <p><b>More Breakage to Come</b></p> <p>I have now tried to explain why the audio experience on systems with PulseAudio might not be as good as some people hoped, but what about the future? To be frank: the next version of PulseAudio (0.9.11) will break even more things. The 'glitch-free' stuff mentioned above uses quite a few features of the underlying ALSA infrastructure that apparently no one has been using before -- and which just don't work properly yet on all drivers. And there are quite a few drivers around, and I only have a very limited set of hardware to test with. 
Already I know that some of the most popular drivers (USB and HDA) do not work entirely correctly with 'glitch-free'.</p> <p>So you ask why I plan to release this code knowing that it will break things? Well, it works on some hardware/drivers properly, and for the others I know work-arounds to get things to work. And 0.9.11 has been delayed for too long already. Also I need testing from a bigger audience. And it is not so much 0.9.11 that is buggy, it is the code it is based on. 'Glitch-free' PA 0.9.11 is going to be part of Fedora 10. Fedora has always been more bleeding edge than other distributions. Picking 0.9.11 just like that for an 'LTS' release might however not be a good idea.</p> <p>So, please bear with me when I release 0.9.11. Snapshots have already been available in Rawhide for a while, and hell didn't freeze over.</p> <p><b>The Distributions' Role in the Game</b></p> <p>Some distributions did a better job adopting PulseAudio than others. On the good side I certainly have to list Mandriva, Debian<sup>[3]</sup>, and Fedora<sup>[4]</sup>. OTOH Ubuntu didn't exactly do a stellar job. They didn't do their homework. Adopting PA in a distribution is a fair amount of work, given that it interfaces with so many different things at so many different places. The integration with other systems is crucial. The information was all out there, communicated on the wiki, the mailing lists and on the PA IRC channel. But if you join and hang around on none of them, then you won't get the memo. To my surprise when Ubuntu adopted PulseAudio they moved it into one of their 'LTS' releases right away <sup>[5]</sup>. Which I guess can be called gutsy -- given the background that I work for Red Hat and PulseAudio is not part of RHEL at this time. I get a lot of flak from Ubuntu users, and I am pretty sure the vast majority of it is undeserved and not my fault.</p> <p>Why Jeffrey's distro of choice (SUSE?) 
didn't package <tt>pavucontrol</tt> 0.9.6 although it was released months ago I don't know. But there's <a href="http://www.pulseaudio.org/ticket/320">certainly no reason to whine about that to me</a> and bash me for it.</p> <p>Having said all this -- it's easy to point to other software's faults or other people's failures. So, admitting this, PulseAudio is certainly not bug-free, far from that. It's a relatively complex piece of software (threading, real-time, lock-free, sensitive to timing, ...), and every piece of software has its bugs. In some workloads they might be easier to find than in others. And I am working on fixing those which are found. I won't forget any bug report, but the order and priority I work on them is still mostly up to me I guess, right? There's still a lot of work to do in desktop audio, it will take some time to get things completely right and complete. </p> <p>Calls for "audio should just work (tm)" are often heard. But if you don't want to stick with a sound system that was state of the art in the 90's for all time, then I fear things *will have* to break from time to time. And Jeffrey, I have no idea what you are actually hacking on. Some people mentioned something with Evolution. If that's true, then quite honestly, <i>"email should just work"</i>, too, shouldn't it? Evolution is not exactly famous for its legendary bug-freeness and stability, or did I miss something? Maybe <i>you</i> should be the one to start with making things "just work", especially since Evolution has been around for much longer already.</p> <p><b>Back to Work</b></p> <p>Now that I have responded to Jeffrey's FUD I think we all can go back to work and end this flamefest! I wish people would actually try to understand things before writing an insulting rant -- without the slightest clue -- but with words like "clusterfuck". 
I'd like to thank all the people who commented on Jeffrey's blog and basically already said what I wrote here now.</p> <p>So, and now I am off hacking on PulseAudio a bit more -- or should I say in Jeffrey's words: on my <i>clusterfuck</i> that is an <i>epic fail</i> and that <i>no desktop user needs</i>?</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] BTW 'glitch-free' is nothing I invented, other OSes have been doing something like this for quite a while (Vista, Mac OS). On Linux however, PulseAudio is the first and only implementation (at least to my knowledge).</small></p> <p><small>[2] In fact, Flash 9 cannot be made to fully work on PulseAudio. This is because the way Flash destructs its driver backends is racy. Unfixably racy, from external code. Jeffrey complained about Flash instability in his second post. This is unfair to PulseAudio, because I cannot fix this. This is like complaining that X crashes when you use binary-only <tt>fglrx</tt>.</small></p> <p><small>[3] To Debian's standards at least. Since development of Debian is very distributed the integration of such a system as PulseAudio is much more difficult since it touches so many different packages in the system that are kind of private property of a lot of different maintainers with different views on things.</small></p> <p><small>[4] I maintain the Fedora stuff myself, so I might be a bit biased on this one... ;-)</small></p> <p><small>[5] I guess Ubuntu sees that this was a bit too much too early, too. At least that's how I understood my invitation to UDS in Prague. 
Since that summit I haven't heard anything from them anymore, though.</small></p> Lennart PoetteringFri, 18 Jul 2008 17:02:00 +0200tag:0pointer.net,2008-07-18:/blog/projects/jeffrey-stedfast.htmlprojectsTopkapi Tileshttps://0pointer.net/blog/photos/topkapi-tiles.html <p><a href="http://0pointer.de/public/tiles-gimped.jpg"><img src="http://0pointer.de/public/tiles-preview.jpg" width="450" height="450" alt="Topkapi Tiles" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" /></a></p> <p>Tiles in the Topkap&#x131; Saray&#x131; in &#x130;stanbul, Turkey. This time the symmetry is perfect. Thanks to Gimp.</p> Lennart PoetteringThu, 17 Jul 2008 02:45:00 +0200tag:0pointer.net,2008-07-17:/blog/photos/topkapi-tiles.htmlphotosTopkapihttps://0pointer.net/blog/photos/topkapi.html <p><a href="http://0pointer.de/public/topkapi-reduced.jpg"><img src="http://0pointer.de/public/topkapi-preview.jpg" width="450" height="450" alt="Topkapi Cupola" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" /></a></p> <p>One of the cupolas in the Topkap&#x131; Saray&#x131; in &#x130;stanbul, Turkey. In one of the inner rings is a certain asymmetry. I wonder why?</p> Lennart PoetteringThu, 17 Jul 2008 01:05:00 +0200tag:0pointer.net,2008-07-17:/blog/photos/topkapi.htmlphotosConferences and Laptop Bagshttps://0pointer.net/blog/projects/schwag-bags.html <p>Conference organizers! Be attentive to the signs of the times! There's a trend towards smaller laptops. Don't hand out laptop bags where those newer laptops (such as the X60) fit in two or more times! 
Lighter is better!</p> Lennart PoetteringTue, 15 Jul 2008 15:02:00 +0200tag:0pointer.net,2008-07-15:/blog/projects/schwag-bags.htmlprojectsThe Thing with Planet Fedorahttps://0pointer.net/blog/fedora-people.html <p><a href="http://0pointer.de/blog/projects/fedora-planet.html">A while ago</a> I posted a story on my blog which then appeared on Fedora Planet. In it I expressed my doubts on the usefulness of the planet, due to its low signal-to-noise ratio, due to the babel-like mix of languages. As a response to this posting I got a lot of really dumb comments, both directly on the blog story and by email. I was called "intolerant", a "Nazi", "stupid", that I should "revise my geography", that I should go "fuck myself", that I apparently thought that the "world was USA property" <sup>[1]</sup>. Back then I thought that there were just a few morons on the periphery of the community. But now, since <a href="http://nicubunu.blogspot.com/2008/07/mixed-stuff-fonts-photos-games.html">this incident happened</a> I started to wonder if we might actually have a bigger problem in the community.</p> <p>I guess this is a good opportunity to pimp David Airlie's alternative <a href="http://www.kernelplanet.org/fedora/">Fedora aggregator</a> which I find a very useful replacement for Fedora Planet.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] I am wondering though why people think that I am a monoglot American? I am not. Neither monoglot, nor American. And if suggesting that I was one was intended as an insult, then I can only say that it insulted me far less than the insulter might have thought...</small></p> Lennart PoetteringMon, 14 Jul 2008 23:08:00 +0200tag:0pointer.net,2008-07-14:/blog/fedora-people.htmlmiscBeing Smarthttps://0pointer.net/blog/projects/being-smart.html <p>Last weekend I set myself the task to write an <a href="http://en.wikipedia.org/wiki/S.M.A.R.T.">ATA S.M.A.R.T.</a> (i.e. hard disk health monitoring) reader and parser. 
After spending some time reading all kinds of <a href="http://www.t13.org/">T13</a> and <a href="http://www.t10.org/">T10</a> docs and a bit of hacking I now present you the following new software:</p> <ul> <li><b><tt>libatasmart</tt>:</b> a lean, small and clean implementation of an ATA S.M.A.R.T. reading and parsing library. It's fairly comprehensive, however I only support a subset of the full S.M.A.R.T. set of functions: those parts which made sense to me, not the esoteric stuff. <a href="http://git.0pointer.de/?p=libatasmart.git;a=blob;f=atasmart.h">Here's the API</a> and <a href="http://git.0pointer.de/?p=libatasmart.git;a=blob;f=README">here's the README</a>.</li> <li><b>skdump:</b> a little tool that produces a similar output to <tt>smartctl -a</tt>, but uses <tt>libatasmart</tt>.</li> <li><b>sktest:</b> a little tool for starting/aborting S.M.A.R.T. self-tests, based on <tt>libatasmart</tt>.</li> <li><b>gnome-disk-health-service:</b> a little wrapper around <tt>libatasmart</tt> that exports its entire functionality via D-Bus, so that unprivileged processes can introspect a drive's health records, including temperature, number of bad sectors and suchlike. This is written in Vala, which BTW is awesome for doing D-Bus services. Actually after having done this once now I really hope I will never have to write a D-Bus server without Vala again. I also wrote a Vala <tt>.vapi</tt> file for <tt>libatasmart</tt> which is shipped in its tarball.</li> <li><b>gnome-disk-health:</b> a little tool that reads the S.M.A.R.T. data from g-d-h-s and presents it in a pretty dialog. Includes support for viewing attributes and starting self-tests and stuff. Also written with Vala.</li> </ul> <p><b>Why?</b> You might ask what the point of all this stuff is where <a href="http://smartmontools.sourceforge.net/">smartmontools</a> already exists. 
What I'd like to see on future GNOME desktops is that as soon as a disk starts to fail a notification bubble pops up warning the user about this fact, and suggesting that he makes backups and replaces the disk. For a tight integration into the desktop, a S.M.A.R.T. implementation that is small, and not C++, and a library (i.e. embeddable into other software with a sane interface) is highly preferable. Also, stuff like distribution installers should link against <tt>libatasmart</tt> to warn the user about old and defective disks before he even starts the installation on them. (Hey, anaconda developers! That means you! It's a tiny library, and all you need to do is a single call: <tt>int sk_disk_smart_status(SkDisk *d, SkBool *good);</tt>)</p> <p>Please note that I certainly don't plan to replace <tt>smartmontools</tt>. <tt>libatasmart</tt> will always implement only a subset of S.M.A.R.T. If you want the full set of functionality then please refer to <tt>smartmontools</tt>.</p> <p><b>Where's this going?</b> I plan to fully maintain <tt>libatasmart</tt> (including <tt>skdump</tt> and <tt>sktest</tt>) for the future. However <tt>g-d-h</tt> and <tt>g-d-h-s</tt> will probably just bitrot in my repository -- unless someone else wants to pick this up and maintain it. The reason my further interest in those tools is rather limited is that in the long run we will hopefully see davidz's <a href="http://hal.freedesktop.org/docs/DeviceKit-disks/">DeviceKit-disks</a> (<a href="http://people.freedesktop.org/~david/gdu-smart-and-failing.png">screenshot</a>) changed to use this library for health monitoring. Then DK-d will export the S.M.A.R.T. info on the bus, and a separate daemon would not be necessary anymore. DK-d provides a single interface for all kinds of health parameters for storage, including RAID health and suchlike. I thus think this is the way forward and not g-d-h-s. 
(That should, of course, not hinder anyone to step up and take up maintainership of g-d-h/g-d-h-s if he wants to. There might be good reasons for doing so. Maybe because you need something to do, or because you want a S.M.A.R.T. solution for the desktop now, and not wait until DeviceKit gets pushed into all the distros).</p> <p>So, here's where you can get this stuff:</p> <blockquote> <p><tt>git://git.0pointer.de/libatasmart.git</tt></p> <p><tt>git://git.0pointer.de/gnome-disk-health.git</tt></p> </blockquote> <p><a href="http://git.0pointer.de/">Browse the GIT repos.</a></p> <p>I will roll a 0.1 tarball of <tt>libatasmart</tt> soon. I'd be thankful if people could run <tt>skdump</tt> on their disks and check if its output is basically the same as <tt>smartctl -a</tt>'s. Especially people with BE machines.</p> <p>Of course the most important part of a software announcement is always the screenshot:</p> <p><a href="http://0pointer.de/public/g-d-h"><img src="http://0pointer.de/public/g-d-h-small" width="500" height="401" alt="Smart-Ass!" /></a></p> <p><tt>return -ETOOMANYDOTS;</tt></p> Lennart PoetteringTue, 01 Jul 2008 23:43:00 +0200tag:0pointer.net,2008-07-01:/blog/projects/being-smart.htmlprojectsOn Version Control Systemshttps://0pointer.net/blog/projects/on-version-control-systems.html <p>Here's what I have to say about today's state of version control systems in Free Software:</p> <p>We shouldn't forget that a VC system is just a development tool. Preferring one over the other is nothing that has any direct influence on code quality, it doesn't make your algorithms perform any better, or your applications look prettier. It's just a tool. As such it should just do its job and get out of the way. 
A programmer should have religious arguments about code quality, about algorithms or about UIs, but what he certainly should not have is religious arguments over the feature set of specific VCSes<sup>[1]</sup>.</p> <p>Does this mean it doesn't matter at all which VCS to choose? No, of course it does matter a lot. The step from traditional VCSes to DVCS is a major one, an important one. Starting a fresh new Free Software project today and choosing CVS or SVN is anachronistic at best.</p> <p>Which leaves of course the question, which DVCS to pick. If you take the "get out of the way" requirement seriously then there can only be one answer to the question: GIT. Why? It certainly (still) has a steep learning curve, and a steeper one than most other VC systems. But what is even harder to learn than GIT is learning all of GIT, Mercurial, Monotone, Bizarre^H^H^H^H^H^H^HBazaar, Darcs, Arch, SVK at the same time. If every project picked a different VCS, and you'd want to contribute to more than just a single project, then you'd have to learn them all. And learning them all means learning them all not very well. And needing to learn them all means scaring people away who don't want to learn yet another VCS just to check out your code. Fragmentation in use of VCSes for Free Software projects hinders development. </p> <p>Which brings me to the main point I want to raise with this blog story:</p> <blockquote><p><i>It is much more important to make contributing to Free Software projects easy by choosing a VCS everyone knows well -- than it is to make it easy by choosing a VCS that everyone could learn easily.</i></p></blockquote> <p>So, and which VCS is it that has a chance of qualifying as "everyone knows well" and is a DVCS? I would say there is only one answer to the question: GIT. 
Sure, there are some high-profile projects using HG (Mozilla, Java, Solaris), but my impression is that the vast majority of projects that are central to free desktops do use GIT.</p> <p>Certainly, some DVCSes might be nicer than others, there might be areas where GIT is lacking in comparison to others, but those differences are tiny. What matters more is not scaring contributors away by making it hard for them to contribute by requiring them to learn yet another VCS.</p> <p>Yes, with CVS, SVN and GIT I think I have learned enough VC systems for now. My hunger for learning further ones is exactly zero. Let me just code, and don't make it hard for me by asking me to learn your favourite one, please.</p> <p>Or in other, frank words, if you start a new Open Source project today, and you don't choose GIT as VCS then you basically ask potential contributors to go away.</p> <p>ALSA recently switched from Mercurial to GIT. That was a good move.</p> <p>So, please stop discussing which DVCS is the best one. It doesn't matter. Picking one that everyone knows is far more important.</p> <p>That's all I have to say.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Of course, unless he himself develops a VC system.</small></p> Lennart PoetteringSat, 21 Jun 2008 20:20:00 +0200tag:0pointer.net,2008-06-21:/blog/projects/on-version-control-systems.htmlprojectsFOMS 2009 CFPhttps://0pointer.net/blog/projects/foms-2009.html <p>And here's <a href="http://www.foms-workshop.org/foms2009/pmwiki.php/Main/CFP">another conference CFP</a>, this time for <a href="http://www.foms-workshop.org/foms2009/">Foundations of Open Media Software 2009</a> (FOMS). It's simply the best conference about multimedia on free systems. Period.</p> <p>It's the third iteration now, and the first two were plain awesome, so don't miss this one. 
It happens in Hobart, Tasmania, next to linux.conf.au 2009.</p> <p><a href="http://www.foms-workshop.org/foms2009/"><img src="http://www.foms-workshop.org/foms2009/pub/skins/foms.png" width="170" height="170" alt="FOMS Logo" /></a></p> <p>Send in your paper! Attend! Spread the word!</p> Lennart PoetteringThu, 19 Jun 2008 21:25:00 +0200tag:0pointer.net,2008-06-19:/blog/projects/foms-2009.htmlprojectsLinux Plumbers Conference CFPhttps://0pointer.net/blog/projects/plumbersconf.html <p>The <a href="http://www.linuxplumbersconf.org/cfp/">Call for Papers</a> for the <a href="http://www.linuxplumbersconf.org/">Linux Plumbers Conference</a> in September in Portland is out now. It's a conference about the core infrastructure of Linux systems: the part of the system where userspace and the kernel interface. It's the first conference where the focus is specifically on getting together the kernel people who work on the userspace interfaces and the userspace people who have to deal with kernel interfaces. It's supposed to be a place where all the people doing infrastructure work sit down and talk, so that each side understands better what the requirements and needs of the other are, and where we can work towards fixing the major problems we currently have with our lower-level APIs.</p> <p>I am running the Audio microconf of the Plumbers Conference. Audio infrastructure on Linux is still heavily fragmented. Pro, desktop and embedded worlds are almost completely separate worlds. While we have quite good driver support the user experience is far from perfect, mostly because our infrastructure is so balkanized. Join us at the Plumbers Conference and help to fix this! 
<b>If you are doing audio infrastructure work on Linux, make sure to attend or -- even better -- submit a paper!</b></p> <p><a href="http://linuxplumbersconf.org/register/">Sign up soon!</a> <a href="http://linuxplumbersconf.org/cfp/">Send in your paper early!</a> The conference is expected to sell out pretty quickly!</p> <p><a href="http://www.linuxplumbersconf.org"><img src="http://www.linuxplumbersconf.org/images/banner.png" alt="Plumbers Logo" width="950" height="177" /></a></p> <p>See you in Portland!</p> Lennart PoetteringThu, 19 Jun 2008 16:53:00 +0200tag:0pointer.net,2008-06-19:/blog/projects/plumbersconf.htmlprojectsHow to convert a GIT SVN mirror into GIT upstreamhttps://0pointer.net/blog/projects/git-mirror-to-upstream.html <p>Yesterday <a href="https://tango.0pointer.de/pipermail/pulseaudio-discuss/2008-June/001952.html">I did the final steps</a> to convert all my SVN repositories to <a href="http://git.0pointer.de/">GIT</a> (including Avahi and PulseAudio). I had been running hot GIT mirrors of the SVN repositories for quite a while now. The last step was the switch to make them canonical upstream, and to disable the SVN repos.</p> <p>For future Google reference, here are the steps that are necessary to make an SVN GIT mirror into a proper GIT repo:</p> <pre>
# On the client:
$ git clone ssh://..../git/foobar foobar
$ cd foobar
$ git checkout trunk
$ git branch -m master
$ git push origin master
# This is a good time to edit the HEAD file on the server and replace its contents "ref: refs/heads/trunk" by "ref: refs/heads/master"
$ git push origin :trunk
</pre> <p>This will basically replace '<tt>trunk</tt>' by '<tt>master</tt>', and make it the default when clients clone the repository. This will however not rename tags from the <tt>git-svn</tt> style to the GIT style. 
(Which I personally think would be a bad idea anyway, BTW)</p> <p>Removing the <tt>origin</tt> from the server's config file is a good idea, too, since the repo is now canonical upstream.</p> <p>Of course, afterwards you still need to create proper <tt>.gitignore</tt> files for the repositories. Just taking the value of the old <tt>svn:ignore</tt> property is a bad idea BTW, because <tt>.gitignore</tt> lists patterns that are used for the directory it is placed in <i>and everything beneath</i>, while <tt>svn:ignore</tt> is not applied recursively.</p> <p>And finally you need to remove all those <tt>$Id$</tt> lines and suchlike from all source files since they are kind of pointless on GIT. It is left as an exercise to the user to craft a good sed or perl script to do this automatically and recursively for an entire tree.</p> <p><b>Lazyweb</b>, do you have a good idea how to integrate <tt>mutt</tt> and <tt>git-am</tt> best? I want a key in mutt I can press which will ask me for a GIT directory and then call <tt>git-am --interactive</tt> for the currently selected email. Anyone got a good idea? Right now I am piping the mail from <tt>mutt</tt> to <tt>git-am</tt>. But that sucks, because <tt>--interactive</tt> refuses to work called like that and because I cannot specify the git repo to apply this to.</p> Lennart PoetteringWed, 18 Jun 2008 17:29:00 +0200tag:0pointer.net,2008-06-18:/blog/projects/git-mirror-to-upstream.htmlprojectsK lovers and event soundshttps://0pointer.net/blog/projects/k-lovers-and-event-sounds.html <p>OK, before more people complain that I didn't keep KDE in the loop about <a href="http://0pointer.de/blog/projects/sixfold-announcement.html">all that fancy event sound infrastructure work</a>. The complaint is only partially valid: stuff like the sound specs have been seen before by the KDE guys. 
And for the rest it's just better to have something concrete to discuss first instead of just starting an unfocussed discussion about all the grand plans we might have without ever having looked into actually implementing them.</p> <p>Shortly after I posted that last blog story of mine <a href="http://article.gmane.org/gmane.comp.kde.devel.general/53475">I informed the KDE guys</a> about this, and asked for their comments and suggestions. And <a href="http://article.gmane.org/gmane.comp.kde.devel.general/53571">this is my summary of those discussions</a>.</p> Lennart PoetteringThu, 12 Jun 2008 18:29:00 +0200tag:0pointer.net,2008-06-12:/blog/projects/k-lovers-and-event-sounds.htmlprojectsA Sixfold Announcementhttps://0pointer.net/blog/projects/sixfold-announcement.html <p>Let's have a small poll here: what is the most annoying feature of a modern GNOME desktop? You got three options to choose from:</p> <ol> <li>Event sounds, if they are enabled</li> <li>Event sounds, if they are enabled</li> <li>Event sounds, if they are enabled</li> </ol> <p>Difficult choice, right?</p> <p>In my pursuit to make this choice a little bit less difficult, I'd like to draw your attention to the following six announcements:</p> <p><b>Announcement Number One: The XDG Sound Theming Specification</b></p> <p>Following closely the mechanisms of the <a href="http://standards.freedesktop.org/icon-theme-spec/icon-theme-spec-latest.html">XDG Icon Theme Specification</a> I may now announce the <a href="http://0pointer.de/public/sound-theme-spec.html">XDG Sound Theme Specification</a> which will hopefully be established as the future standard for better event sound theming for free desktops. 
This project was started by Patryk Zawadzki and is now maintained by Marc-Andr&#233; Lureau.</p> <p><b>Announcement Number Two: The XDG Sound Naming Specification</b></p> <p>If we have a Sound Theming Specification, then we also need an <a href="http://0pointer.de/public/sound-naming-spec.html">XDG Sound Naming Specification</a>, again drawing heavily from the original <a href="http://standards.freedesktop.org/icon-naming-spec/icon-naming-spec-latest.html">XDG Icon Naming Specification</a>. It's based on some older <i>Bango</i> work (which seems to be a defunct project these days), and is also maintained by Monsieur Lureau. The list of defined sounds is hopefully much more complete than any previous work in this area for free desktops.</p> <p><b>Announcement Number Three: The freedesktop Sound Theme</b></p> <p>Of course, what would the mentioned two standards be worth if there wasn't a single implementation of them? So here I may now announce the first (rubbish) version of the <a href="http://0pointer.de/public/sound-theme-freedesktop.tar.gz">XDG freedesktop Sound Theme</a>. It's basically just a tarball with a number of symlinks linking to the old <tt>gnome-audio</tt> event sounds. It's only a very small subset of the entire list of XDG sound names. My hope is that this initial release will spark community contributions for a better, higher quality default sound theme for free desktops. If you are some kind of musician or audio technician I am happy to take your submissions!</p> <p><b>Announcement Number Four: The libcanberra Event Sound API</b></p> <p>Ok, we now have those two specs, and an example theme, what else is missing to make this stuff a success? Absolutely right, an actual implementation of the sound theming logic! And this is what <a href="http://git.0pointer.de/?p=libcanberra.git;a=summary">libcanberra</a> is. It is a very small and lean implementation of the specification. 
However, it is also very powerful, and can be used in a much more elaborate way than previous APIs. It's all about the central function called <tt>ca_context_play()</tt> which takes a NULL-terminated list of string properties for the sound you want to generate. What does this look like?</p> <pre>
{
    ca_context *c = NULL;

    /* Create a context for the event sounds for your application */
    ca_context_create(&amp;c);

    /* Set a few application-global properties */
    ca_context_change_props(c,
                            CA_PROP_APPLICATION_NAME, "An example",
                            CA_PROP_APPLICATION_ID, "org.freedesktop.libcanberra.Test",
                            CA_PROP_APPLICATION_ICON_NAME, "libcanberra-test",
                            NULL);

    /* ... */

    /* Trigger an event sound */
    ca_context_play(c, 0,
                    CA_PROP_EVENT_ID, "button-pressed", /* The XDG sound name */
                    CA_PROP_MEDIA_NAME, "The user pressed the button foobar",
                    CA_PROP_EVENT_MOUSE_X, "555",
                    CA_PROP_EVENT_MOUSE_Y, "666",
                    CA_PROP_WINDOW_NAME, "Foobar Dialog",
                    CA_PROP_WINDOW_ICON_NAME, "libcanberra-test-foobar-dialog",
                    CA_PROP_WINDOW_X11_DISPLAY, ":0",
                    CA_PROP_WINDOW_X11_XID, "4711",
                    NULL);

    /* ... */

    /* Free the context again; this takes the context itself, not its address */
    ca_context_destroy(c);
}
</pre> <p>So, the idea is pretty simple, it's all built around those sound event properties. A few you initialize globally for your application, and some you pass each time you actually want to trigger a sound. The properties listed above are only a subset of the default ones that are defined. They can be extended at any time. Why is it good to attach all this information to those event sounds? First, for a11y reasons, where visual feedback in addition to audible feedback might be advisable. And then, if the underlying sound system knows which window triggered the event it can take per-window volumes or other settings into account. If we know that the sound event was triggered by a mouse event, then the sound system could position the sound in space: i.e. 
if you click a button on the left side of the screen, the event sound will come more out of your left speaker, and if you click on the right, it will be positioned nearer to the right speaker. The more information the underlying audio system has about the event sound the fancier 'earcandy' it can do to enhance your user experience with all kinds of audio effects.</p> <p>The library is thread-safe and has no dependencies besides Ogg Vorbis (and of course a libc) and whatever the backend in use requires. The library can support multiple different backends. Either you can compile a single one directly into the <tt>libcanberra.so</tt> library, or you can bind them at runtime via shared objects. Right now, libcanberra supports ALSA, <a href="http://pulseaudio.org/">PulseAudio</a> and a null backend. The library is designed to be portable, but it only supports Linux right now. The idea is to translate the XDG sound names into the sounds that are native to the local platform (i.e. to whatever API Windows or MacOS use natively for sound events).</p> <p>Besides all that fancy property stuff it can also do implicit on-demand caching of samples in the sound server, cancel currently playing sounds, notify an application when a sound has finished playing, and more.</p> <p>My hope is that this piece of core desktop technology can be shared by both GNOME and the KDE world.</p> <p><a href="http://0pointer.de/public/libcanberra-html/libcanberra-canberra.html">Check out the (complete!) 
documentation!</a></p> <p><a href="http://0pointer.de/public/libcanberra-0.1.tar.gz">Download libcanberra 0.1 now!</a></p> <p><a href="http://git.0pointer.de/?p=libcanberra.git;a=blob;f=README;h=0e4c850be8761b77041f72b475a2b21ff78e30fb;hb=master">Read the README now!</a></p> <p><b>Announcement Number Five: The libcanberra-gtk Sound Event Binding for Gtk+</b></p> <p>If you compile libcanberra with Gtk+ support (optional), then you'll get an additional library <tt>libcanberra-gtk</tt> which provides a couple of functions to simplify event sound generation from Gtk+ programs. It will maintain a global libcanberra context, and provides a few functions that will automatically fill in quite a few properties for you, so that you don't have to fill them in manually. What does that look like? Dead simple:</p> <pre>
{
    /* Trigger an event sound from a GtkWidget, will automatically fill in CA_PROP_WINDOW_xxx */
    ca_gtk_play_for_widget(GTK_WIDGET(w), 0,
                           CA_PROP_EVENT_ID, "foobar-event",
                           CA_PROP_EVENT_DESCRIPTION, "foobar event happened",
                           NULL);

    /* Alternatively, trigger an event sound from a GdkEvent, will also fill in CA_PROP_EVENT_MOUSE_xxx */
    ca_gtk_play_for_event(gtk_get_current_event(), 0,
                          CA_PROP_EVENT_ID, "waldo-event",
                          CA_PROP_EVENT_DESCRIPTION, "waldo event happened",
                          NULL);
}
</pre> <p>Simple? Yes, dead simple.</p> <p><a href="http://0pointer.de/public/libcanberra-html/libcanberra-canberra-gtk.html">Check out the (complete!) documentation!</a></p> <p><b>Announcement Number Five: the libcanberra-gtk-module Gtk+ Module</b></p> <p>Okay, the example code for libcanberra-gtk is already very simple. Can we make it even shorter? Yes!</p> <p>If you compile libcanberra with Gtk+ support, then you will also get a new GtkModule which will automatically hook into all kinds of events inside a Gtk+ program and generate sound events from them. You can have sounds when you press a button, when you pop up a menu or window, or when you select an item from a list box. 
It's all done automatically, no further change in the program is necessary. It works very similarly to the old sound event code in libgnomeui, but is far less ugly, much more complete, and most importantly, works for all Gtk+ programs, not just those which link against libgnomeui. To activate this feature, $GTK_MODULES=libcanberra-gtk-module must be set. So, just for completeness' sake, here's what the example code for using this feature in your program looks like:</p> <pre> { } </pre> <p>Yes, indeed. No code changes necessary. You get all those fancy UI sounds for free. Awesome? Awesome!</p> <p>Of course, if you use custom widgets, or need more than just the simplest audio feedback for input, you should link against libcanberra-gtk yourself, and add <tt>ca_gtk_play_for_widget()</tt> and <tt>ca_gtk_play_for_event()</tt> calls to your code in the right places.</p> <p><b>Announcement Number Six: My GUADEC talk</b></p> <p>You want to know more about all this fancy new sound event world order? Then make sure to attend <a href="http://guadec.expectnation.com/guadec08/public/schedule/detail/40">my talk at GUADEC 2008 in Istanbul</a>!</p> <p>Ok, that's enough announcements for now. If you want to discuss or contribute to the two specs, then please join the <a href="http://lists.freedesktop.org/mailman/listinfo/xdg">XDG mailing list</a>. If you want to contribute to libcanberra, you are invited to join the <a href="https://tango.0pointer.de/mailman/listinfo/libcanberra-discuss">libcanberra mailing list</a>.</p> <p>Of course these six announcements won't add a happy end to the GNOME sound event story just like that. We still need better sounds, and better integration into applications. But just think of how high quality the sound events on e.g. MacOS X are, and you can see (or hear) what I hope to get for the free desktops as well. 
Also my hope is that since we now have a decent localization infrastructure for our sounds in place, we can make speech sound events more popular, and thus sound events much more useful, i.e. have a nice girl's voice telling you "Your disc finished burning!" instead of some annoying nobody-knows-what-it-means bing sound. I am one of those who usually have their event sounds disabled all the time. My hope is that in a few months' time I won't have any reason to do so anymore.</p> Lennart PoetteringTue, 10 Jun 2008 18:46:00 +0200tag:0pointer.net,2008-06-10:/blog/projects/sixfold-announcement.htmlprojectsDopplrhttps://0pointer.net/blog/dopplr.html <p>Until yesterday Ohloh was the only social network site I was a member of. That has changed now. I joined <a href="http://dopplr.com/">DOPPLR</a>. It's pretty nice. Very Web 2.0, but in the good way. Free Software people, join now!</p> <p>No, nobody paid me for writing this, I just think it is indeed useful.</p> Lennart PoetteringMon, 02 Jun 2008 10:46:00 +0200tag:0pointer.net,2008-06-02:/blog/dopplr.htmlmiscSingapore, Australia, Hong Kong and Recifehttps://0pointer.net/blog/photos/sg-au-hk.html <p>In January/February around <a href="http://www.foms-workshop.org/foms2008/">FOMS 2008</a> and <a href="http://linux.conf.au/">linux.conf.au</a> I traveled to Singapore, Hong Kong and Australia, together with two fellow hackers, Kay and David. It took a while until I found the time to go through and sort all the photos I took on this trip. 
But finally I am done, and I am not going to spare you a few shots.</p> <p> <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=149"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-149.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=294"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-294.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=375"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-375.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=383"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-383.jpg" width="120" height="80" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=469"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-469.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=46"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-46.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=59"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-59.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=82"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-82.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=89"><img alt="Singapore" 
src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-89.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=510"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-510.jpg" width="80" height="120" /></a> &nbsp; <br /> <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=527"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-527.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=474"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-474.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=525"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-525.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=73"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-73.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=449"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-449.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Singapore%202008-01&amp;photo=166"><img alt="Singapore" src="http://0pointer.de/photos/galleries/Singapore%202008-01/thumbs/img-166.jpg" width="80" height="120" /></a> &nbsp; </p> <p>That was <a href="http://en.wikipedia.org/wiki/Singapore">Singapore</a>. 
The next destination on the trip was Australia, more specifically <a href="http://en.wikipedia.org/wiki/Great_Ocean_Road">Great Ocean Road</a> and the <a href="http://en.wikipedia.org/wiki/Northern_Territory">Northern Territory</a>.</p> <p> <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=228"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-228.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=129"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-129.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=274"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-274.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=497"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-497.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=381"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-381.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=470"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-470.jpg" width="120" height="80" /></a> &nbsp; <br /> <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=590"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-590.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=814"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-814.jpg" width="120" height="80" /></a> &nbsp; 
<a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=885"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-885.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=389"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-389.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=883"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-883.jpg" width="120" height="80" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=394"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-394.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=125"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-125.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=165"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-165.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=136"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-136.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=193"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-193.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=237"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-237.jpg" width="80" height="120" /></a> &nbsp; <a 
href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=840"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-840.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=325"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-325.jpg" width="80" height="120" /></a> &nbsp; <br /> <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=393"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-393.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=94"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-94.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=195"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-195.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=453"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-453.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=622"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-622.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=731"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-731.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Australia%202008-02&amp;photo=755"><img alt="Australia" src="http://0pointer.de/photos/galleries/Australia%202008-02/thumbs/img-755.jpg" width="80" height="120" /></a> &nbsp; </p> <p>And on we went, 
for <a href="http://en.wikipedia.org/wiki/Hong_Kong">Hong Kong</a>.</p> <p> <a href="http://0pointer.de/photos/?gallery=Hong Kong 2008-02&amp;photo=1"><img alt="Hong Kong" src="http://0pointer.de/photos/galleries/Hong%20Kong%202008-02/thumbs/img-1.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Hong Kong 2008-02&amp;photo=17"><img alt="Hong Kong" src="http://0pointer.de/photos/galleries/Hong%20Kong%202008-02/thumbs/img-17.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Hong Kong 2008-02&amp;photo=58"><img alt="Hong Kong" src="http://0pointer.de/photos/galleries/Hong%20Kong%202008-02/thumbs/img-58.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Hong Kong 2008-02&amp;photo=176"><img alt="Hong Kong" src="http://0pointer.de/photos/galleries/Hong%20Kong%202008-02/thumbs/img-176.jpg" width="120" height="80" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Hong Kong 2008-02&amp;photo=25"><img alt="Hong Kong" src="http://0pointer.de/photos/galleries/Hong%20Kong%202008-02/thumbs/img-25.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Hong Kong 2008-02&amp;photo=42"><img alt="Hong Kong" src="http://0pointer.de/photos/galleries/Hong%20Kong%202008-02/thumbs/img-42.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Hong Kong 2008-02&amp;photo=14"><img alt="Hong Kong" src="http://0pointer.de/photos/galleries/Hong%20Kong%202008-02/thumbs/img-14.jpg" width="80" height="120" /></a> &nbsp; </p> <p>In March I attended the <a href="http://www.bossaconference.indt.org/">BOSSA Conference</a> in Brazil and visited <a href="http://en.wikipedia.org/wiki/Recife">Recife</a> and <a href="http://en.wikipedia.org/wiki/Olinda">Olinda</a>.</p> <p> <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=56"><img alt="Brazil" 
src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-56.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=138"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-138.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=77"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-77.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=141"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-141.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=303"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-303.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=275"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-275.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=164"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-164.jpg" width="120" height="80" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=244"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-244.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=218"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-218.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=189"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-189.jpg" width="80" height="120" /></a> &nbsp; <a 
href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=178"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-178.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=165"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-165.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=143"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-143.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=54"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-54.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Recife 2008-03&amp;photo=45"><img alt="Brazil" src="http://0pointer.de/photos/galleries/Recife%202008-03/thumbs/img-45.jpg" width="80" height="120" /></a> &nbsp; </p> <p>That's all for now.</p> Lennart PoetteringMon, 05 May 2008 00:51:00 +0200tag:0pointer.net,2008-05-05:/blog/photos/sg-au-hk.htmlphotos360° of Recifehttps://0pointer.net/blog/photos/patio-de-sao-pedro.html <p><a href="http://0pointer.de/static/patio-de-sao-pedro"><img alt="Patio de São Pedro" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/patio-de-sao-pedro-small.jpeg" width="1024" height="186" /></a></p> <p>That's the colonial P&aacute;tio de S&atilde;o Pedro in Recife's Santo Ant&ocirc;nio quarter.</p> Lennart PoetteringSat, 03 May 2008 18:47:00 +0200tag:0pointer.net,2008-05-03:/blog/photos/patio-de-sao-pedro.htmlphotosHong Kong from Victoria Peakhttps://0pointer.net/blog/photos/hongkong.html <p><a href="http://0pointer.de/static/hongkong"><img alt="Hong Kong" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; 
-moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/hongkong-small.jpeg" width="1024" height="199" /></a></p> <p>Yepp, pretty well known view.</p> Lennart PoetteringSat, 03 May 2008 18:39:00 +0200tag:0pointer.net,2008-05-03:/blog/photos/hongkong.htmlphotos360° of BOSSAhttps://0pointer.net/blog/photos/summerville.html <p><a href="http://0pointer.de/static/summerville"><img alt="Summerville Beach" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/summerville-small.jpeg" width="1024" height="183" /></a></p> <p>That's the beach of the Summerville Resort near Porto de Galinhas, Brazil, where the best Free Software conference in existence took place in 2008: <a href="http://www.bossaconference.indt.org/">INDT's BOSSA Conference</a>. Oh boy, if you don't believe how good it was, <a href="http://youtube.com/watch?v=sdo-bY6TBzA">just watch their video</a>.</p> Lennart PoetteringSat, 03 May 2008 18:32:00 +0200tag:0pointer.net,2008-05-03:/blog/photos/summerville.htmlphotos360° of Grand Place, Brusselshttps://0pointer.net/blog/photos/grand-place.html <p><a href="http://0pointer.de/static/grand-place"><img alt="Grand Place, Brussels" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/grand-place-small.jpeg" width="1024" height="238" /></a></p> Lennart PoetteringSat, 03 May 2008 18:28:00 +0200tag:0pointer.net,2008-05-03:/blog/photos/grand-place.htmlphotosGSoC 2008https://0pointer.net/blog/projects/gsoc-2008.html <p>I am happy that two <a href="http://code.google.com/soc/2008/">GSoC</a> projects got accepted that are related to projects I maintain:</p> <ul> <li><a href="http://code.google.com/soc/2008/gnome/appinfo.html?csaid=92FFD4A9E4DC91F9">LLMNR Protocol Integration in Avahi</a>, by Sunil Kumar Ghai, mentored by <a href="http://www.lathiat.net/blog">Trent Lloyd</a>. 
The <a href="http://www.gnome.org/">GNOME project</a> generously allowed this application to happen under its umbrella. <a href="http://en.wikipedia.org/wiki/LLMNR">LLMNR</a> support is a big improvement for <a href="http://avahi.org/">Avahi</a>. We will then integrate into newer Windows networks as seamlessly as we already integrate into MacOS X networks.</li> <li><a href="http://code.google.com/soc/2008/bluez/appinfo.html?csaid=2218999748B418AE">Integration of the Bluetooth Audio service with PulseAudio</a>, by Jo&atilde;o Paulo Rechi Vita, mentored by <a href="http://vudentz.blogspot.com/">Luiz Augusto von Dentz</a>. Made possible through the <a href="http://www.bluez.org/">BlueZ project</a>.</li> </ul> <p>I'd like to thank the GNOME and BlueZ projects for making these GSoC applications a reality.</p> Lennart PoetteringTue, 22 Apr 2008 17:08:00 +0200tag:0pointer.net,2008-04-22:/blog/projects/gsoc-2008.htmlprojectsFinally, Secure Real-Time on the Desktophttps://0pointer.net/blog/projects/cgroups-and-rtwatch.html <p>Finally, secure real-time scheduling on the Linux desktop can become a reality. Linux 2.6.25 gained <a href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/scheduler/sched-rt-group.txt;h=1c6332f4543c350889eae9ba2b4c766270c1b65e;hb=HEAD">Real-Time Group Scheduling</a>, a feature which allows limiting the amount of CPU time real-time processes and threads may consume.</p> <p>Traditionally on Linux real-time scheduling was limited to privileged processes, because RT processes can lock up the machine if they enter a busy loop. Scheduling is effectively disabled for them -- they can do whatever they want and are (almost) never preempted by the kernel in what they are doing. In 2.6.12 RLIMIT_RTPRIO was introduced. It's a <a href="http://linux.die.net/man/2/setrlimit">resource limit</a> which opened up real-time scheduling for normal user processes. 
However, the ability of RT processes to lock up the machine was not touched by this. When using <tt>/etc/security/limits.conf</tt> to raise this limit for specific users they'd gain the ability to lock up your machine.</p> <p>Because of this, raising this limit is a task that is left to the administrator on all current distros. Shipping a distro with the limit raised by default is shipping a distro where local users can easily freeze their machines.</p> <p>It was always possible to write "watchdog" tools that could supervise RT processes by running at a higher RT priority and checking the CPU load imposed by the process on the system. However, to this point it was not possible in any way that would actually be secure (so that processes cannot escape the watchdog by forking), that wouldn't require lots of work in the watchdog (which is a bad idea since it runs at a very high RT priority, thus while it is doing its stuff it will block the important RT processes from running), or that wouldn't be totally ugly.</p> <p>Real-Time Group Scheduling solves the problem. It is now possible to create a <a href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cgroups.txt;h=31d12e21ff8a9a780cbdeef130421d419338e310;hb=HEAD">cgroup</a> for the processes to supervise. The processes cannot escape the cgroup by forking. Then, by manipulating the <tt>cpu.rt_runtime_us</tt> property of the cgroup a certain amount of RT CPU time can be assigned to the cgroup -- processes in the group cannot spend more time than this limit per period of time. 
(The period length can be controlled globally via <tt>/proc/sys/kernel/sched_rt_period_us</tt>).</p> <p>To demonstrate this I wrote a tool, <a href="http://git.0pointer.de/?p=rtwatch.git;a=blob;f=rtwatch.c;h=594979f9059e803bd6ef13b985563974d9197d65;hb=master">rtwatch</a>, which implements this technique: it is a watchdog that is SUID root, creates a cgroup, and forks off a user-defined process inside it, with raised RLIMIT_RTPRIO but normal user privileges. The child process can then acquire RT scheduling but never consume more CPU than allowed by the cgroup, with no option to lock up the machine anymore.</p> <p>How to use this?</p> <p><tt>$ rtwatch 5 rtcpuhogger</tt></p> <p>This will start the process <tt>rtcpuhogger</tt> and grant it 5% of the available CPU time. To make sure that this is not misused by the user, rtwatch will refuse to assign more than 50% CPU time to a single child. Since RT scheduling is all about determinism it is not possible to assign more than 100% CPU time (globally in sum) to all RT processes this way. Also, rtwatch will always make sure that 5% will be left for other tasks.</p> <p>To work, rtwatch needs to run on Linux 2.6.25 with CONFIG_RT_GROUP_SCHED enabled. Unfortunately the Fedora kernel <a href="https://bugzilla.redhat.com/show_bug.cgi?id=442959">is not compiled this way, yet</a>.</p> <p>Why is all this so great? Those who attended my talk <a href="http://mirror.linux.org.au/pub/linux.conf.au/2008/slides/266-realtime.pdf">Practical Real-Time Programming in Userspace</a> at Linux.conf.au 2008 (or watched <a href="http://mirror.linux.org.au/pub/linux.conf.au/2008/Fri/mel8-266.ogg">the video</a>) will know that besides the fact that I'd love to enable RT support for PulseAudio in Fedora in coming releases <i>by default</i> I'd also love to see RT programming more often used in desktop applications. 
Wherever the CPU time spent on a specific process should not depend on the overall system load, but solely on the time constraints of the job itself and what the process needs, RT scheduling should be enabled. Examples for this are music or movie playback (the movie player should have enough time to decode one frame every 25th of a second, regardless of what else is running on the system), fancy animations, quick reactions to user actions (e.g. updating the mouse cursor). All this for a machine that is snappier and more responsive with shorter latencies, regardless of what else happens on the machine.</p> <p>The day before yesterday, when Linux 2.6.25 was released, we came a big step closer to this goal.</p> Lennart PoetteringFri, 18 Apr 2008 17:33:00 +0200tag:0pointer.net,2008-04-18:/blog/projects/cgroups-and-rtwatch.htmlprojectsRespect $LC_MESSAGES!https://0pointer.net/blog/projects/LC_MESSAGES.html <p><tt>&lt;rant&gt;</tt></p> <p>I really dislike it when software ignores my setting of $LC_MESSAGES=C and shows me its UI in German, just because I set $LANG=de_DE. I hate that. I don't want any UI strings in German, the translations are mediocre. I want everything else in German (paper sizes, ...), but no strings please. That's why I configured my locale settings this way. I don't want those settings ignored.</p> <p>Please, developers, read through locale(7) and related man pages before you hack up i18n support. 
Thank you.</p> <p>The offenders that pissed me off right now are <a href="https://bugzilla.redhat.com/show_bug.cgi?id=441973">Firefox</a> and <a href="https://bugzilla.redhat.com/show_bug.cgi?id=439314">Fedora's man</a>.</p> <p><tt>&lt;/rant&gt;</tt></p> Lennart PoetteringFri, 11 Apr 2008 00:24:00 +0200tag:0pointer.net,2008-04-11:/blog/projects/LC_MESSAGES.htmlprojectsWhat's Cooking in PulseAudio's glitch-free Branchhttps://0pointer.net/blog/projects/pulse-glitch-free.html <p>A while ago I started development of a special branch of <a href="http://pulseaudio.org/">PulseAudio</a> which is called <tt>glitch-free</tt>. In a few days I will merge it back to PulseAudio trunk, and eventually release it as 0.9.11. I think it's time to explain a little what all this "glitch-freeness" is about, what made it so tricky to implement, and why this is totally awesome technology. So, here we go:</p> <p><b>Traditional Playback Model</b></p> <p>Traditionally on most operating systems audio is scheduled via sound card <a href="http://en.wikipedia.org/wiki/Interrupt_request">interrupts (IRQs)</a>. When an application opens a sound card for playback it configures it for a fixed size playback buffer. Then it fills this buffer with digital <a href="http://en.wikipedia.org/wiki/Pulse-code_modulation">PCM</a> sample data. And after that it tells the hardware to start playback. Then, the hardware reads the samples from the buffer, one at a time, and passes them on to the <a href="http://en.wikipedia.org/wiki/Digital-to-analog_converter">DAC</a> so that eventually they reach the speakers.</p> <p>After a certain number of samples have been played the sound hardware generates an interrupt. This interrupt is forwarded to the application. On Linux/Unix this is done via <tt>poll()/select()</tt>, which the application uses to sleep on the sound card file descriptor. 
When the application is notified via this interrupt it overwrites the samples that were just played by the hardware with new data and goes to sleep again. When the next interrupt arrives the next block of samples is overwritten, and so on and so on. When the hardware reaches the end of the hardware buffer it starts from its beginning again, in a true <a href="http://en.wikipedia.org/wiki/Ring_buffer">ring buffer</a> fashion. This goes on and on and on.</p> <p>The number of samples after which an interrupt is generated is usually called a <i>fragment</i> (ALSA likes to call the same thing a <i>period</i> for some reason). The number of fragments the entire playback buffer is split into is usually integral and usually a power of two, 2 and 4 being the most frequently used values.</p> <p><a href="http://0pointer.de/public/fragments.png"><img src="http://0pointer.de/public/fragments.png" width="1023" height="447" alt="Schematic overview" /></a><br /> <small><b>Image 1:</b> <i>Schematic overview of the playback buffer in the traditional playback model, in the best way the author can visualize this with his limited drawing abilities.</i></small></p> <p>If the application is not quick enough to fill up the hardware buffer again after an interrupt we get a buffer <i>underrun</i> ("drop-out"). An underrun is clearly hearable by the user as a discontinuity in audio which is something we clearly don't want. We thus have to carefully make sure that the buffer and fragment sizes are chosen in a way that the software has enough time to calculate the data that needs to be played, and the OS has enough time to forward the interrupt from the hardware to the userspace software and the write request back to the hardware.</p> <p>Depending on the requirements of the application the size of the playback buffer is chosen. 
It can be as small as 4ms for low-latency applications (such as music synthesizers), or as long as 2s for applications where latency doesn't matter (such as music players). The hardware buffer size directly translates to the latency that the playback adds to the system. The smaller the fragment sizes the application configures, the more time the application has to fill up the playback buffer again.</p> <p>Let's formalize this a bit: Let BUF_SIZE be the size of the hardware playback buffer in samples, FRAG_SIZE the size of one fragment in samples, and NFRAGS the number of fragments the buffer is split into (equivalent to BUF_SIZE divided by FRAG_SIZE), RATE the sampling rate in samples per second. Then, the overall latency is identical to BUF_SIZE/RATE. An interrupt is generated every FRAG_SIZE/RATE. Every time one of those interrupts is generated the application should fill up one fragment again, if it missed one interrupt this might become more than one. If it doesn't miss any interrupt it has (NFRAGS-1)*FRAG_SIZE/RATE time to fulfill the request. If it needs more time than this we'll get an underrun. The fill level of the playback buffer should thus usually oscillate between BUF_SIZE and (NFRAGS-1)*FRAG_SIZE. In case of missed interrupts it might however fall considerably lower, in the worst case to 0 which is, again, an underrun.</p> <p>It is difficult to choose the buffer and fragment sizes in an optimal way for an application:</p> <ul> <li>The buffer size should be as large as possible to minimize the risk of drop-outs.</li> <li>The buffer size should be as small as possible to guarantee minimal latencies.</li> <li>The fragment size should be as large as possible to minimize the number of interrupts, and thus the required CPU time used, to maximize the time the CPU can sleep for between interrupts and thus the battery lifetime (i.e. 
the fewer interrupts are generated the lower your audio app will show up in powertop, and that's what it's all about, right?)</li> <li>The fragment size should be as small as possible to give the application as much time as possible to fill up the playback buffer, to minimize drop-outs.</li> </ul> <p>As you can easily see it is impossible to choose buffering metrics in a way that they are optimal on all four requirements.</p> <p>This traditional model has major drawbacks:</p> <ul> <li>The buffering metrics are highly dependent on what the sound hardware can provide. Portable software needs to be able to deal with hardware that can only provide a very limited set of buffer and fragment sizes.</li> <li>The buffer metrics are configured only once, when the device is opened; they usually cannot be reconfigured during playback without major discontinuities in audio. This is problematic if more than one application wants to output audio at the same time via a sound server (or <tt>dmix</tt>) and they have different requirements on latency. For these sound servers/dmix the fragment metrics are configured statically in a configuration file, and are the same during the whole lifetime. If a client connects that needs lower latencies, it has basically lost. If a client connects that doesn't need as low latencies, we will continuously burn more CPU/battery than necessary. </li> <li>It is practically impossible to choose the buffer metrics optimally for your application -- there are too many variables in the equation: you can't know anything about the IRQ/scheduling latencies of the OS/machine your software will be running on; you cannot know how much time it will actually take to produce the audio data that shall be pushed to the audio device (unless you start counting cycles, which is a good way to make your code unportable); the scheduling latencies are hugely dependent on the system load on most current OSes (unless you have an RT system, which we generally do not have). 
As said, for sound servers/dmix it is impossible to know in advance what the requirements on latency are that the applications that might eventually connect will have.</li> <li>Since the number of fragments is integral and <i>at least 2</i> on almost all existing hardware we will generate at least two interrupts on each buffer iteration. If we fix the buffer size to 2s then we will generate an interrupt at least every 1s. We'd then have 1s to fill up the buffer again -- on all modern systems this is far more than we'd ever need. It would be much better if we could fix the fragment size to 1.9s, which still gives us 100ms to fill up the playback buffer again, still more than necessary on most systems.</li> </ul> <p>Due to the limitations of this model most current (Linux/Unix) software uses buffer metrics that turned out to "work most of the time". Very often they are chosen without much thinking, by copying other people's code, or totally at random.</p> <p>PulseAudio &lt;= 0.9.10 uses a fragment size of 25ms by default, with four fragments. That means that right now, unless you reconfigure your PulseAudio manually, clients will not get latencies lower than 100ms whatever you try, and as long as music is playing you will get 40 interrupts/s. (The relevant configuration options for PulseAudio are <tt>default-fragments=</tt> and <tt>default-fragment-size-msec=</tt> in <tt>daemon.conf</tt>)</p> <p>dmix uses 16 fragments by default with a size of 21 ms each (on my system at least -- this varies, depending on your hardware). You can't get less than 47 interrupts/s. (You can change the parameters in <tt>.asoundrc</tt>)</p> <p>So much for the traditional model and its limitations. Now, we'll have a peek at how the new <tt>glitch-free</tt> branch of PulseAudio does its things. The technology is not really new. It's inspired by what Vista does these days and what Apple CoreAudio has already been doing for quite a while. 
However, on Linux this technology is new; we have been lagging behind quite a bit. Also I claim that what PA does now goes beyond what Vista/MacOS does in many ways, though of course, they provide much more than we provide in many other ways. The name <i>glitch-free</i> is inspired by the term Microsoft uses for this model, however I must admit that I am not sure that my definition of this term and theirs actually is the same.</p> <p><b>Glitch-Free Playback Model</b></p> <p>The first basic idea of the <i>glitch-free</i> playback model (a better, less marketingy name is probably <i>timer-based audio scheduling</i> which is the term I internally use in the PA codebase) is to no longer depend on sound card interrupts to schedule audio but use system timers instead. System timers are far more flexible than the fragment-based sound card timers. They can be reconfigured at any time, and have a granularity that is independent from any buffer metrics of the sound card. The second basic idea is to use playback buffers that are as large as possible, up to a limit of 2s or 5s. The third basic idea is to allow rewriting of the hardware buffer at any time. This allows instant reaction to user input (e.g. pause/seek requests in your music player, or instant event sounds) although the huge latency imposed by the hardware playback buffer would suggest otherwise.</p> <p>PA configures the audio hardware to the largest playback buffer size possible, up to 2s. The sound card interrupts are disabled as far as possible (most of the time this means to simply lower NFRAGS to the minimal value supported by the hardware. It would be great if ALSA would allow us to disable sound card interrupts entirely). Then, PA constantly determines what the minimal latency requirement of all connected clients is. If no client specified any requirements we fill up the whole buffer all the time, i.e. have an actual latency of 2s. 
However, if some applications specified requirements, we take the lowest one and only use as much of the configured hardware buffer as this value allows us. In practice, this means we only partially fill the buffer each time we wake up. Then, we configure a system timer to wake us up 10ms before the buffer would run empty and fill it up again then. If the overall latency is configured to less than 10ms we wake up after half the latency requested.</p> <p>If the sleep time turns out to be too long (i.e. it took more than 10ms to fill up the hardware buffer) we will get an underrun. If this happens we can double the time we wake up before the buffer would run empty, to 20ms, and so on. If we notice that we only used much less than the time we estimated, we can halve this value again. This adaptive scheme makes sure that in the unlikely event of a buffer underrun it will happen most likely only once and never again.</p> <p>When a new client connects or an existing client disconnects, or when a client wants to rewrite what it already wrote, or the user wants to change the volume of one of the streams, then PA will resample the data passed by the client, convert it to the proper hardware sample type, and remix it with the data of the other clients. This of course makes it necessary to keep a "history" of data of all clients around so that if one client requests a rewrite we have the necessary data around to remix what already was mixed before.</p> <p>The benefits of this model are manifold:</p> <ul> <li>We minimize the overall number of interrupts, down to what the latency requirements of the connected clients allow us. i.e. we save power, don't show up in powertop anymore for normal music playback.</li> <li>We maximize drop-out safety, because we buffer up to 2s in the usual cases. Only with operating systems which have scheduling latencies &gt; 2s we can still get drop-outs. 
Thankfully no operating system is that bad.</li> <li>In the event of an underrun we don't get stuck in it, but instead are able to recover quickly and can make sure it doesn't happen again.</li> <li>We provide "zero-latency". Each client can rewrite its playback buffer at any time, and this is forwarded to the hardware, even if this means that the sample currently being played needs to be rewritten. This means much quicker reaction to user input, a more responsive user experience.</li> <li>We become much less dependent on what the sound hardware provides us with. We can configure wakeup times that are independent from the fragment settings that the hardware actually supports.</li> <li>We can provide almost any latency a client might request, dynamically without reconfiguration, without discontinuities in audio.</li> </ul> <p>Of course, this scheme also comes with major complications:</p> <ul> <li>System timers and sound card timers deviate. On many sound cards by quite a bit. Also, not all sound cards allow the user to query the playback frame index at any time, but only shortly after each IRQ. To compensate for this deviation PA contains a non-trivial algorithm which tries to estimate and follow the deviation over time. If this doesn't work properly it might happen that an underrun happens much earlier than we expected.</li> <li>System timers on Unix are not very high precision. On traditional Linux with HZ=100 sleep times for timers are rounded up to multiples of 10ms. Only very recent Linux kernels with <tt>hrtimers</tt> can provide something better, but only on x86 and x86-64 until now. This makes the whole scheme unusable for low latency setups unless you run the very latest Linux. Also, <tt>hrtimers</tt> are not (yet) exposed in <tt>poll()/select()</tt>. 
It requires major jumping through hoops to work around this limitation.</li> <li>We need to keep a history of sample data for each stream around, thus increasing the memory footprint and potentially increasing cache pressure. PA tries to work against the increased memory footprint and cache pressure this might cause by doing zero-copy memory management.</li> <li>We're still dependent on the maximum playback buffer size the sound hardware supports. Many sound cards don't even support 2s, but only 300ms or suchlike.</li> <li>The rewriting of the client buffers causing rewriting of the hardware buffer complicates the resampling/converting step immensely. In general the code to implement this model is more complex than for the traditional model. Also, ALSA has not really been designed with this design in mind, which makes some things very hard to get right and suboptimal.</li> <li>Generally, this works reliably only on newest ALSA, newest kernel, newest everything. It has pretty steep requirements on software and sometimes even on hardware. To stay compatible with systems that don't fulfill these requirements we need to carry around code for the traditional playback model as well, increasing the code base by far.</li> </ul> <p>The advantages of the scheme clearly outweigh the complexities it causes. Especially the power-saving features of glitch-free PA should be enough reason for the embedded Linux people to adopt it quickly. Make PA disappear from powertop even if you play music!</p> <p>The code in the <tt>glitch-free</tt> branch is still rough and sometimes incomplete. I will merge it shortly into <tt>trunk</tt> and then upload a snapshot to Rawhide.</p> <p>I hope this text also explains to the few remaining PA haters a little better why PA is a good thing, and why everyone should have it on his Linux desktop. 
Of course these changes are not visible on the surface, my hope with this blog story is to explain a bit better why infrastructure matters, and counter misconceptions about what PA actually is and what it gives you on top of ALSA.</p> Lennart PoetteringTue, 08 Apr 2008 19:54:00 +0200tag:0pointer.net,2008-04-08:/blog/projects/pulse-glitch-free.htmlprojectsUpdated PulseAudio Plugin for SDLhttps://0pointer.net/blog/projects/pa-plugin-for-sdl.html <p>Quick update for all game kiddies: apply <a href="http://0pointer.de/public/sdl-pulse-rework.patch">this patch</a> to SDL and enjoy PulseAudio in your favourite SDL based game without buffering issues. It basically just fixes the bogus buffer metrics of Stephan's original patch.</p> Lennart PoetteringMon, 31 Mar 2008 20:34:00 +0200tag:0pointer.net,2008-03-31:/blog/projects/pa-plugin-for-sdl.htmlprojectsUpdated PulseAudio Plugin for Xinehttps://0pointer.net/blog/projects/pa-plugin-for-xine.html <p>Quick update for all K-lovers: apply <a href="http://0pointer.de/public/xine-pulse-rework.patch">this patch</a> to xine-lib and enjoy PulseAudio in Amarok and other KDE apps without stability issues. It's a race-free rework of Diego's original patch.</p> Lennart PoetteringMon, 31 Mar 2008 04:21:00 +0200tag:0pointer.net,2008-03-31:/blog/projects/pa-plugin-for-xine.htmlprojectsBOSSA 2008https://0pointer.net/blog/projects/bossa-2008.html <p>Just three words: awesome awesome awesome.</p> <p>And for those asking for it, <a href="http://0pointer.de/public/pulseaudio-bossa-2008.pdf">here are my slides</a>, in which I try to explain the new "glitch-free" audio scheduling core of PulseAudio that I recently committed to the <tt>glitch-free</tt> branch in PA SVN. I also try to make clear why this functionality is practically a *MUST* for all people who want to have low-latency audio, minimal power consumption and maximum drop-out safety for their audio playback. 
And thus, why all those fancy embedded Linux devices should adopt it better sooner than later. The slides might appear a bit terse if you don't have that awesome guy they usually come with presenting them to you.</p> Lennart PoetteringThu, 20 Mar 2008 22:38:00 +0100tag:0pointer.net,2008-03-20:/blog/projects/bossa-2008.htmlprojects360° of Petrified Duneshttps://0pointer.net/blog/photos/petrified-dunes.html <p><a href="http://0pointer.de/static/kings-canyon-dunes.html"><img alt="Petrified Dunes near Kings Canyon" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/kings-canyon-dunes-gimped-small.jpeg" width="1024" height="161" /></a></p> Lennart PoetteringSun, 17 Feb 2008 14:07:00 +0100tag:0pointer.net,2008-02-17:/blog/photos/petrified-dunes.htmlphotosA Whole Lot of Nothinghttps://0pointer.net/blog/photos/a-whole-lot-of-nothing.html <p><a href="http://0pointer.de/static/saltlake.html"><img alt="The Outback near Mount Conner" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/saltlake-small.jpeg" width="1024" height="126" /></a></p> Lennart PoetteringSat, 16 Feb 2008 18:50:00 +0100tag:0pointer.net,2008-02-16:/blog/photos/a-whole-lot-of-nothing.htmlphotosKata Tjuta in the Heat of the Dayhttps://0pointer.net/blog/photos/kata-tjuta.html <p><a href="http://0pointer.de/static/kata-tjuta.html"><img alt="Kata Tjuta" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/kata-tjuta-gimped-small.jpeg" width="1024" height="216" /></a></p> Lennart PoetteringFri, 15 Feb 2008 01:58:00 +0100tag:0pointer.net,2008-02-15:/blog/photos/kata-tjuta.htmlphotosBack from LCAhttps://0pointer.net/blog/projects/lca2008.html <p>After coming back from my somewhat extended linux.conf.au trip I spent the whole day grepping 
through email. Only 263 unprocessed emails left in my inbox. Yay.</p> <p><b>PRTPILU</b></p> <p>Thanks to the LCA guys, video footage is now available of all talks, including my <a href="http://linux.conf.au/programme/detail?TalkID=266">talk <i>Practical Real-Time Programming in Linux Userspace</i></a> (<a href="http://mirror.linux.org.au/pub/linux.conf.au/2008/Fri/mel8-266.ogg">Theora</a>, <a href="http://mirror.linux.org.au/pub/linux.conf.au/2008/slides/266-realtime.pdf">Slides</a>). In my endless modesty I have to recommend: go, watch it, it contains some really good stuff (including me not being able to divide 1 by 1000). Right now, the real-time features of the Linux kernel are seldom used on the desktop due to a couple of reasons, among them the general difficulty and unsafety of using them, but predominantly it's probably just unawareness. There are a couple of situations however, where scheduling desktop processes as RT makes a lot of sense (think of video playback, mouse cursor feedback, etc.), to decouple the execution (scheduling) latency from the system load. This talk focussed mostly on non-trivial technical stuff and all the limitations RT on Linux still has. To fully grok what's going on you thus need some insight into concurrent programming and stuff. </p> <p>My plan is to submit a related talk to GUADEC which will focus more on actually building RT apps for the desktop, in the hope we will eventually be able to ship a desktop with audio and video that never skips, and where user feedback is still snappy and quick even if we do the most complicated IO intensive processing in lots of different processes in the background on slow hardware.</p> <p>I didn't have time to go through all my slides (which I intended that way and is perfectly OK), so you might want to browse through my slides even if you saw the whole clip. 
The slides, however, are not particularly verbose.</p> <p><b>Rumors</b></p> <p>Regarding <a href="http://q-funk.blogspot.com/2008/02/rumor-has-it.html">all</a> <a href="http://planet.gentoo.org/developers/leio/2008/02/05/spreading_the_rumors">those</a> <a href="http://blogs.gnome.org/snark/2008/02/06/spreading-the-rumors/">rumors</a> that have been spread while I -- the maintainer of PulseAudio -- was in the middle of the Australian outback, fist-fighting with kangaroos near Uluru: I am not really asking anyone to port their apps to the native <a href="http://pulseaudio.org/">PulseAudio</a> API right now. While I do think the API is quite powerful and not redundant, I also acknowledge that it is very difficult to use properly (and very easy to misuse), (mostly) due to its fully asynchronous nature. The mysterious <i>libsydney</i> project is supposed to fix this and a lot more. libsydney is mostly the Duke Nukem Forever of audio APIs right now, but in contrast to DNF I didn't really announce it publicly yet, so it doesn't really count. <tt>;-)</tt> Suffice to say, the current situation of audio APIs is a big big mess. We are working on cleaning it up. For now: stick to the well established and least-broken APIs, which boils down to ALSA. Stop using the OSS API <i>now</i>! Don't program against the ESD API (except for event sounds). But, most importantly: please stop misusing the existing APIs. I am doing my best to allow all current APIs to run without hassles on top of PA, but due to the sometimes blatant misuse, or even brutal violations of those APIs it is very hard to get that working for all applications (yes, that means you, Adobe, and you, Skype). Don't expect that mmap is available on all audio devices -- it's not, and especially not on PA. Don't use <tt>/proc/asound/pcm</tt> as an API for enumerating audio devices. It's totally unsuitable for that. Don't hard code device strings. Use <tt>default</tt> as device string. 
Don't make assumptions that are not and cannot be true for non-hardware devices. Don't fiddle around with period settings unless you fully grok them and know what you are doing. In short: be a better citizen, write code you don't need to be ashamed of. ALSA has its limitations and my compatibility code certainly as well, but this is not an excuse for working around them by writing code that makes little children cry. If you have a good ALSA backend for your program then this will not only fix your issues with PA, but also with Bluetooth, you will have less code to maintain and also code that is much easier to maintain.</p> <p>Or even shorter: <b>Fix. Your. Broken. ALSA. Client. Code.</b> Thank you.</p> <p>Oh, if you have questions regarding PA, just ping me on IRC (if I am around) or write me an email, like everyone else. Mysterious, blogged pseudo invitations to rumored meetings are not the best way to contact me.</p> Lennart PoetteringTue, 12 Feb 2008 20:26:00 +0100tag:0pointer.net,2008-02-12:/blog/projects/lca2008.htmlprojectsIch bin ein Berlinerhttps://0pointer.net/blog/ich-bin-ein-berliner.html <p>To whom it may concern: I finally moved from Hamburg (Wohldorf) into my new flat in Berlin (Friedrichshain).</p> Lennart PoetteringThu, 17 Jan 2008 13:00:00 +0100tag:0pointer.net,2008-01-17:/blog/ich-bin-ein-berliner.htmlmiscGIT Mirrors of my SVN Repositorieshttps://0pointer.net/blog/projects/svn-git.html <p>To whom it may concern: as a first step to move away from SVN and towards GIT for all my code, I have now configured <a href="http://git.0pointer.de/">live GIT mirrors</a> for all my <a href="http://svn.0pointer.de/">SVN repositories</a>. The plan is to move fully to GIT, but not as long as the GIT integration into Trac is as painful as it is right now. 
The scripts I used to initialize and update the mirrors are <a href="http://0pointer.de/public/svn-live-init">svn-live-init</a> and <a href="http://0pointer.de/public/svn-live-update">svn-live-update</a>, for those interested. They are based on scripts CJ van den Berg supplied me with.</p> <p>It would be great to have the mirror work both ways. Lazyweb, do you know how to do that?</p> Lennart PoetteringThu, 03 Jan 2008 15:12:00 +0100tag:0pointer.net,2008-01-03:/blog/projects/svn-git.htmlprojectsAvahi/Zeroconf patch for distcc updatedhttps://0pointer.net/blog/projects/avahi-distcc.html <p>I finally found the time to sit down and update my venerable <a href="http://avahi.org">Avahi</a>/<a href=" http://en.wikipedia.org/wiki/Zeroconf">Zeroconf</a> <a href="http://0pointer.de/public/distcc-avahi.patch">patch for distcc</a>. A patched <a href="http://distcc.samba.org/">distcc</a> automatically discovers suitable compiler servers on the local network, without the need to manually configure them. (<a href="http://lists.samba.org/archive/distcc/2007q4/003593.html">Announcement</a>).</p> <p>Here's a quick HOWTO for using a patched distcc like this:</p> <ul> <li>Make sure to start <tt>distccd</tt> (the server) with the new <tt>--zeroconf</tt> switch. This will make it announce its services on the network.</li> <li>Edit your <tt>$HOME/.distcc/hosts</tt> and add <tt>+zeroconf</tt>. This magic string will enable Zeroconf support in the client, i.e. will be expanded to the list of available suitable distcc servers on your LAN.</li> <li>Now set <tt>$CC</tt> to <tt>distcc gcc</tt> globally for your login sessions. This will tell all well-behaving build systems to use <tt>distcc</tt> for compilation (this doesn't work for the kernel, as one notable exception). Even better than setting <tt>$CC</tt> to <tt>distcc gcc</tt> is setting it to <tt>ccache distcc gcc</tt> which will enable <a href="http://ccache.samba.org/">ccache</a> in addition to distcc. i.e. 
stick something like this in your <tt>~/.bash_profile</tt>: <tt>export CC="ccache distcc gcc"</tt></li> <li>And finally use <tt>make -j `distcc -j`</tt> instead of plain <tt>make</tt> to enable parallel building with the right number of concurrent processes. Setting <tt>$MAKEFLAGS</tt> properly is an alternative option, but is suboptimal if the evaluation is only done once at login time. </li> </ul> <p>If this doesn't work for you then it is a good idea to run <tt>distcc --show-hosts</tt> to get a list of discovered distcc servers. If this list isn't complete then this is most likely due to mismatching GCC versions or architectures. To check if that's the case use <tt>avahi-browse -r _distcc._tcp</tt> and compare the values of the <tt>cc_machine</tt> and <tt>cc_version</tt> fields. Please note that different Linux distributions use different GCC machine strings. Which is expected since GCC is usually patched quite a bit on the different distributions. This means that a Fedora <tt>distcc</tt> (the client) will not find a Debian <tt>distccd</tt> (the server) and vice versa. But again: that's a feature, not a bug.</p> <p>The new <tt>-j</tt> and <tt>--show-hosts</tt> options for <tt>distcc</tt> are useful for non-zeroconf setups, too.</p> <p>The patch will automatically discover the number of CPUs on remote machines and make use of that information to better distribute jobs.</p> <p>In short: Zeroconf support in <tt>distcc</tt> is totally hot, everyone should have it!</p> <p>For more information have a look at <a href="http://lists.samba.org/archive/distcc/2004q4/002774.html">the announcement of my original patch from 2004</a> (at that time for the historic HOWL Zeroconf daemon), or read the new announcement linked above.</p> <p>Distribution packagers! Please merge this new patch into your packages! It would be a pity to withhold Zeroconf support in distcc from your users any longer!</p> <p>Unfortunately, Fedora doesn't include any distcc packages. 
Someone should be changing that (who's not me ;-)).</p> <p>You like this patch? Then <a href="http://www.ohloh.net/accounts/7661">give me a kudo on ohloh.net</a>. Now that I earned a golden 10 (after kicking Larry Ewing from position 64. Ha, take that Mr. Ewing!), I need to make sure I don't fall into silver oblivion again. ;-)</p> Lennart PoetteringSun, 30 Dec 2007 17:51:00 +0100tag:0pointer.net,2007-12-30:/blog/projects/avahi-distcc.htmlprojectsAvahi 0.6.22https://0pointer.net/blog/projects/avahi-0.6.22.html <p>A couple of minutes ago I released <a href="http://avahi.org/">Avahi 0.6.22</a> into the wild, the newest iteration of everyone's favourite zero configuration networking suite.</p> <p><a href="http://avahi.org/"><img style="border: 0" src="http://avahi.org/chrome/site/avahi-trac.png" width="200" height="96" alt="Avahi Logo" /></a></p> <p>You ask why this is something to blog about?</p> <p>Firstly, new in this version is Sjoerd Simons' <tt>avahi-gobject</tt> library, a GObject wrapper around the Avahi API. It allows full GObject-style object oriented programming of Zeroconf applications, with signals and everything. To all you GNOME/Gtk+ hackers out there: now it is even more fun to hack your own Zeroconf applications for GNOME/Gtk+!</p> <p>Secondly, this is the first release to ship i18n support. For those who prefer to run their systems with non-English locales<sup>[1]</sup> this should be good news. I've always been a little afraid of adding i18n support, since this either meant that I would have constantly had to commit i18n patches, or that I would have needed to move my code to GNOME SVN. However, we now have <a href="https://publictest5.fedora.redhat.com/submit/">Fedora's Transifex</a>, which allows me to open up my SVN for translators without much organizational work on my side. Translations are handled centrally, and committed back to my repository when needed. 
It's a bit like Canonical's Rosetta, but with a focus on committing i18n changes upstream, and without being closed-source crap.</p> <p>You like this release? Then <a href="http://www.ohloh.net/accounts/7661">give me a kudo on ohloh.net</a>. My ego still thirsts for gold, and I am still (or again) 25 positions away from that. ;-)</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] Personally, I run my desktop with <tt>$LC_MESSAGES=C</tt>, but <tt>$LANG=de_DE</tt>, which are the settings I can recommend to everyone who is from Germany and wants to stay sane. Unfortunately it is a PITA to configure this on GNOME, though.</small></p> Lennart PoetteringMon, 17 Dec 2007 17:43:00 +0100tag:0pointer.net,2007-12-17:/blog/projects/avahi-0.6.22.htmlprojectsBack from Indiahttps://0pointer.net/blog/photos/india.html <p><a href="http://foss.in/">FOSS.in</a> was one of the best conferences I have ever been to, and a lot of fun. The organization was flawless and I can only heartily recommend that everyone send in a presentation proposal for next year's iteration. I certainly hope the committee is going to accept my proposals next year again. Especially the food was gorgeous.</p> <p>I will spare you the usual conference photos, you can find a lot of those on <a href="http://flickr.com/search/?q=fossin2007&amp;m=tags">flickr</a>. 
However, what I will not spare you are a couple of photos I shot in <a href="http://en.wikipedia.org/wiki/Bangalore">Bangalore</a>, <a href="http://en.wikipedia.org/wiki/Srirangapatna">Srirangapatna</a> and <a href="http://en.wikipedia.org/wiki/Mysore">Mysore</a>.</p> <p> <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=146"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-146.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=510"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-510.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=462"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-462.jpg" width="120" height="80" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=149"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-149.jpg" width="120" height="80" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=24"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-24.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=378"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-378.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=88"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-88.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=138"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-138.jpg" width="80" height="120" /></a> &nbsp; <a 
href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=251"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-251.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=260"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-260.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=306"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-306.jpg" width="80" height="120" /></a> &nbsp; </p> <p> <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=339"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-339.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=477"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-477.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=342"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-342.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=400"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-400.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=456"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-456.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=486"><img alt="India" src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-486.jpg" width="80" height="120" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=India%202007-12&amp;photo=524"><img alt="India" 
src="http://0pointer.de/photos/galleries/India%202007-12/thumbs/img-524.jpg" width="80" height="120" /></a> </p> <p> <a href="http://0pointer.de/static/mysore"><img src="http://0pointer.de/static/mysore-small.jpeg" alt="Panorama" width="1024" height="187" /></a> </p> Lennart PoetteringSun, 16 Dec 2007 11:42:00 +0100tag:0pointer.net,2007-12-16:/blog/photos/india.htmlphotosOh, Felix America!https://0pointer.net/blog/felix-america.html <p><a href="http://amazon.de/gp/product/B00065GZL2/ref=s9_asin_image_1?pf_rd_m=A3JWKAKR8XB7XF&amp;pf_rd_s=center-2&amp;pf_rd">These</a> <a href="http://www.amazon.de/Canon-EF-50mm-1-Objektiv/dp/B00066HJPC/ref=pd_ecc_rvi_2">two</a> lenses cost EUR 679 and EUR 386.45, respectively, at amazon.de.</p> <p>On amazon.com you can <a href="http://www.amazon.com/Canon-EF-S-10-22mm-3-5-4-5-Digital/dp/B0002Y5WXE/ref=pd_bbs_sr_1?ie=UTF8&amp;s=electro">get</a> <a href="http://www.amazon.com/Canon-50mm-Medium-Telephoto-Cameras/dp/B00009XVCZ/ref=pd_bbs_sr_1?ie=UTF8&amp;s=electr">them</a> for US$ 639 and US$ 289.95, respectively.</p> <p>At <a href="http://www.google.com/search?ie=UTF-8&amp;oe=UTF-8&amp;q=639+us+dollars+in+euro">today's</a> <a href="http://www.google.com/search?hl=en&amp;safe=off&amp;q=289.95+us+dollars+in+euro&amp;btnG=Search">exchange rates</a> that's 430 EUR and 195 EUR, respectively.</p> <p>Americans pay 63% and 50%, respectively, of what Germans have to pay for these lenses. How unfair! :-(</p> Lennart PoetteringWed, 28 Nov 2007 20:39:00 +0100tag:0pointer.net,2007-11-28:/blog/felix-america.htmlmiscLazyweb: POSIX Process Groups and Sessionshttps://0pointer.net/blog/projects/pgrp-vs-session.html <p><i>Dear Lazyweb,</i></p> <p>I have trouble understanding what exactly POSIX process groups and sessions are good for. The <a href="http://www.opengroup.org/onlinepubs/009695399/functions/setpgid.html">POSIX</a> <a href="http://www.opengroup.org/onlinepubs/009695399/functions/setsid.html">docs</a> are very vague on this. 
What exactly is the effect of being in a process group with some other process, and what does being in the same session with it add on top? And what is the benefit of being a group/session <b>leader</b> as opposed to just being a normal random process in the group/session?</p> <p>The only thing I understood is that <tt>kill(2)</tt> with a negative first parameter can be used to "multicast" signals to entire process groups, and that SIGINT on C-c is delivered that way. But, is that all? The POSIX docs say "<i>... for the purpose of signaling, placement in foreground or background, and other job control actions</i>", which is very vague. What are those "<i>other job control actions</i>"? What does job control consist of besides multicasting signals? And what is "<i>placement in foreground or background</i>" other than delivering signals?</p> <p>And I totally don't get POSIX sessions and how they differ from POSIX process groups. Please enlighten me!</p> <p>Puzzled,<br /> &nbsp;&nbsp;&nbsp;&nbsp;<i>Lennart</i></p> Lennart PoetteringMon, 26 Nov 2007 03:22:00 +0100tag:0pointer.net,2007-11-26:/blog/projects/pgrp-vs-session.htmlprojectsEmulated atomic operations and real-time schedulinghttps://0pointer.net/blog/projects/atomic-rt.html <p>Unfortunately not all CPU architectures have native support for <a href="http://en.wikipedia.org/wiki/Atomic_operations">atomic operations</a>, or only support a very limited subset. Most prominently <a href="http://en.wikipedia.org/wiki/ARM_architecture">ARM</a>v5 (and older) has no support beyond the most basic atomic swap operation<sup>[1]</sup>. Now, more and more free code is starting to use atomic operations and <a href="http://en.wikipedia.org/wiki/Lock-free">lock-free algorithms</a>, one being my own project, <a href="http://pulseaudio.org/">PulseAudio</a>. 
If you have ever done <a href="http://en.wikipedia.org/wiki/Real-time">real-time programming</a> you probably know that you cannot really do it without support for atomic operations. One question remains however: what to do on CPUs which support only the most basic atomic operations natively?</p> <p>On the kernel side atomic ops are very easy to emulate: just disable interrupts temporarily, then do your operation non-atomically, and afterwards enable them again. That's relatively cheap and works fine (unless you are on SMP -- which fortunately you usually are not for those CPUs). The Linux kernel does it this way and it is good. But what to do in user-space, where you cannot just go and disable interrupts?</p> <p>Let's see how the different userspace libraries/frameworks do it for ARMv5, a very relevant architecture that only knows an atomic swap (exchange) but no <a href="http://en.wikipedia.org/wiki/Compare-and-swap">CAS</a> or even atomic arithmetic. Let's start with an excerpt from <a href="http://sourceware.org/cgi-bin/cvsweb.cgi/ports/sysdeps/arm/bits/atomic.h?rev=1.1&amp;content-type=text/x-cvsweb-markup&amp;cvsroot=glibc">glibc's atomic operations implementation for ARM</a>:</p>
<pre>/* Atomic compare and exchange. These sequences are not actually atomic;
   there is a race if *MEM != OLDVAL and we are preempted between the two
   swaps. However, they are very close to atomic, and are the best that a
   pre-ARMv6 implementation can do without operating system support.
   LinuxThreads has been using these sequences for many years. */</pre>
<p>This comment says it all. Not good. The more you make use of atomic operations the more likely you're going to hit this race. Let's hope glibc is not a heavy user of atomic operations. 
<a href="http://pulseaudio.org/">PulseAudio</a> however is, and PulseAudio happens to be my focus.</p> <p>Let's have a look at how <a href="http://google.com/codesearch?hl=en&amp;q=+show:R2PL3lqx9sY:caDbKUtFdpk:0EHcDMTKVeo&amp;sa=N&amp;cd=2&amp;ct=rc&amp;cs_p=http://www.studio-to-go.co.uk/source-packages/2.x/qt4-x11-4.2.3.tar.bz2&amp;cs_f=qt4-x11-4.2.3/src/corelib/arch/qatomic_arm.h">Qt4</a> does it:</p>
<pre>extern Q_CORE_EXPORT char q_atomic_lock;

inline char q_atomic_swp(volatile char *ptr, char newval)
{
    register int ret;
    asm volatile("swpb %0,%1,[%2]"
                 : "=&amp;r"(ret)
                 : "r"(newval), "r"(ptr)
                 : "cc", "memory");
    return ret;
}

inline int q_atomic_test_and_set_int(volatile int *ptr, int expected, int newval)
{
    int ret = 0;
    while (q_atomic_swp(&amp;q_atomic_lock, ~0) != 0);
    if (*ptr == expected) {
        *ptr = newval;
        ret = 1;
    }
    q_atomic_swp(&amp;q_atomic_lock, 0);
    return ret;
}</pre>
<p>So, what do we have here? A slightly better version. In standard situations it actually works. But it sucks big time, too. Why? It contains a spin lock: the variable <tt>q_atomic_lock</tt> is used for locking the atomic operation. The code tries to set it to non-zero, and if that fails it tries again, until it succeeds, in the hope that the other thread -- which currently holds the lock -- gives it up. The big problem here is: it might take a while until that happens, up to 1/HZ time on Linux. Usually you want to use atomic operations to minimize the need for mutexes and thus speed things up. Now, here you've got a lock, and it's the worst kind: the spinning lock. Not good. Also, if used from a real-time thread the machine simply locks up when we enter the loop in contended state, because an RT thread is never preempted by lower-priority threads and thus the loop will spin forever. Evil. And then, there's another problem: it's a big bottleneck, because all atomic operations are synchronized via a single variable which is <tt>q_atomic_lock</tt>. Not good either. 
And let's not forget that only code that has access to <tt>q_atomic_lock</tt> actually can execute this code safely. If you want to use it for lock-free IPC via shared memory this is going to break. It is also unusable from signal handlers (which probably doesn't matter much, though). So, in summary: this code sucks, too.</p> <p>Next try, let's have a look at how <a href="http://svn.gnome.org/viewvc/glib/trunk/glib/gatomic.c?revision=5748&amp;view=markup">glib</a> does it:</p>
<pre>static volatile int atomic_spin = 0;

static int atomic_spin_trylock (void)
{
  int result;

  asm volatile (
    "swp %0, %1, [%2]\n"
    : "=&amp;r,&amp;r" (result)
    : "r,0" (1), "r,r" (&amp;atomic_spin)
    : "memory");
  if (result == 0)
    return 0;
  else
    return -1;
}

static void atomic_spin_lock (void)
{
  while (atomic_spin_trylock())
    sched_yield();
}

static void atomic_spin_unlock (void)
{
  atomic_spin = 0;
}

gint
g_atomic_int_exchange_and_add (volatile gint *atomic, gint val)
{
  gint result;

  atomic_spin_lock();
  result = *atomic;
  *atomic += val;
  atomic_spin_unlock();

  return result;
}</pre>
<p>Once again, a spin loop. However, this implementation makes use of <tt>sched_yield()</tt> for asking the OS to reschedule. It's a bit better than the Qt version, since it doesn't spin just burning CPU, but instead tells the kernel to execute something else, increasing the chance that the thread currently holding the lock is scheduled. It's a bit friendlier, but it's not great either because this might still delay execution quite a bit. And it is probably one of the very few legitimate occasions where using <tt>sched_yield()</tt> is OK. It still doesn't work for RT -- because <tt>sched_yield()</tt> in most cases is a NOP for RT threads, so you still get a machine lockup. And it still has the one-lock-to-rule-them-all bottleneck. 
And it still is not compatible with shared memory.</p> <p>Then, there's <a href="http://google.com/codesearch?hl=en&amp;q=+show:qG2_4wZ91VE:2IKv4KO6qLk:JVTLjbfbCrs&amp;sa=N&amp;cd=1&amp;ct=rc&amp;cs_p=http://freshmeat.net/redir/gc/15903/url_tgz/gc6.8.tar.gz&amp;cs_f=gc-7.0/libatomic_ops-1.2/src/atomic_ops.c">libatomic_ops</a>. It's the most complex code, so I'll spare you a paste of it here. Basically it uses the same spin loop. With three differences however:</p> <ol> <li>16 lock variables instead of a single one are used. The variable that is used is picked via simple hashing of the pointer to the atomic variable that shall be modified. This removes the one-lock-to-rule-them-all bottleneck.</li> <li>Instead of <tt>pthread_yield()</tt> it uses <tt>select()</tt> with a small timeval parameter to give the current holder of the lock some time to give it up. To make sure that the <tt>select()</tt> is not optimized away by the kernel and the thread thus never is preempted, the sleep time is increased on every loop iteration.</li> <li>It explicitly disables signals before doing the atomic operation.</li> </ol> <p>It's certainly the best implementation of the ones discussed here: It doesn't suffer from the one-lock-to-rule-them-all bottleneck. It's (supposedly) signal handler safe (which however comes at the cost of doing two syscalls on every atomic operation -- probably a very high price). It actually works on RT, due to sleeping for an explicit time. However it still doesn't deal with <a href="http://en.wikipedia.org/wiki/Priority_inversion">priority inversion</a> problems -- which is a big issue for real-time programming. Also, the time slept in the <tt>select()</tt> call might be relatively long, since at least on Linux the time passed to <tt>select()</tt> is rounded up to 1/HZ -- not good for RT either. And then, it still doesn't work for shared memory IPC.</p> <p>So, what do we learn from this? 
At least one thing: better not do real-time programming with ARMv5<sup>[2]</sup>. But more practically, what could a good emulation of atomic ops, based solely on atomic swap, look like? Here are a few ideas: </p> <ul> <li>Use an implementation inspired by <tt>libatomic_ops</tt>. Right now it's the best available. It's probably a good idea, though, to replace <tt>select()</tt> by a <tt>nanosleep()</tt>, since on recent kernels the latter doesn't round up to 1/HZ anymore, at least when you have high-resolution timers<sup>[3]</sup>. Then, if you can live without signal handler safety, drop the signal mask changing.</li> <li>If you use something based on <tt>libatomic_ops</tt> and want to use it for shared memory IPC, then you have the option to move the lock variables into shared memory too. Note however, that this allows evil applications to lock up your process by taking the locks and never giving them up. (Which however is always a problem if not all atomic operations you need are available in hardware.) So if you do this, make sure that only trusted processes can attach to your memory segment.</li> <li>Alternatively, spend some time and investigate if it is possible to use futexes to sleep on the lock variables. This is not trivial though, since futexes right now expect the availability of an atomic increment operation. But it might be possible to emulate this well enough with the swap operation. There's now even a FUTEX_LOCK_PI operation which would allow <a href="http://en.wikipedia.org/wiki/Priority_inheritance">priority inheritance</a>.</li> <li>Alternatively, find a way to allow user space to disable interrupts cheaply (requires kernel patching). 
Since enabling RT scheduling is a privileged operation already (since you may easily lock up your machine with it), it might not be too problematic to extend the ability to disable interrupts to user space: it's just yet another way to lock up your machine.</li> <li>For the <tt>libatomic_ops</tt> based algorithm: if you're lucky and defined a struct type for your atomic integer types, like the kernel does, or like I do in PulseAudio with <tt>pa_atomic_t</tt>, then you can stick the lock variable directly into your structure. This makes shared memory support transparent, and removes the one-lock-to-rule-them-all bottleneck completely. Of course, OTOH it increases the memory consumption a bit and increases cache pressure (though I'd assume that this is negligible).</li> <li>For the <tt>libatomic_ops</tt> based algorithm: start sleeping for the time returned by <a href="http://www.opengroup.org/onlinepubs/000095399/functions/clock_getres.html">clock_getres()</a> (cache the result!). You cannot sleep shorter than that anyway.</li> </ul> <p>Yepp, that's as good as it gets. Unfortunately I cannot serve you the optimal solution on a silver platter. I never actually did development for ARMv5, this blog story just sums up my thoughts on all the code I saw which emulates atomic ops on ARMv5. But maybe someone who actually cares about atomic operations on ARM finds this interesting and maybe invests some time to prepare patches for Qt, glib, glibc -- and <a href="http://pulseaudio.org/">PulseAudio</a>.</p> <p><b>Update:</b> I added two more ideas to the list above.</p> <p><b>Update 2:</b> Andrew Haley just posted something like <a href="http://0pointer.de/blog/projects/atomic-rt#1194273540.23">the optimal solution</a> for the problem. It would be great if people would start using this.</p> <p><small><b>Footnotes</b></small></p> <p><small>[1] The Nokia 770 has an ARMv5 chip, N800 has ARMv6. 
The OpenMoko phone apparently uses ARMv5.</small></p> <p><small>[2] And let's not even think about CPUs which don't even have an atomic swap!</small></p> <p><small>[3] Which however you probably won't, given that they're only available on x86 on stable Linux kernels for now -- but still, it's cleaner.</small></p> Lennart PoetteringMon, 05 Nov 2007 00:17:00 +0100tag:0pointer.net,2007-11-05:/blog/projects/atomic-rt.htmlprojectsRain in Montrealhttps://0pointer.net/blog/photos/montreal-rain.html <p>Sometimes, rain can be quite beautiful.</p> <p><a href="http://0pointer.de/photos/?gallery=Montreal%202007-07&amp;photo=8&amp;exif_style=&amp;show_thumbs="><img src="http://0pointer.de/static/montreal-rain-1.jpeg" alt="Montreal 1" width="156" height="234" /></a>&nbsp;<a href="http://0pointer.de/photos/?gallery=Montreal%202007-07&amp;photo=10"><img src="http://0pointer.de/static/montreal-rain-2.jpeg" alt="Montreal 2" width="350" height="234" /></a>&nbsp;<a href="http://0pointer.de/photos/?gallery=Montreal%202007-07&amp;photo=9"><img src="http://0pointer.de/static/montreal-rain-3.jpeg" alt="Montreal 3" width="156" height="234" /></a></p> <p>I took these during my stay at Montreal after <a href="http://www.linuxsymposium.org/2007/index_2007.php">OLS 2007</a>. Which reminds me: don't miss my talks at <a href="http://foss.in/2007">foss.in 2007</a>, <a href="http://linux.conf.au/">linux.conf.au 2008</a> and <a href="http://www.annodex.org/events/foms2008/">FOMS 2008</a>. 
I'll be speaking about <a href="http://avahi.org/">Avahi</a>, <a href="http://pulseaudio.org/">PulseAudio</a> and practical real-time programming in userspace.</p> Lennart PoetteringSat, 03 Nov 2007 00:40:00 +0100tag:0pointer.net,2007-11-03:/blog/photos/montreal-rain.htmlphotosFedora Interviewhttps://0pointer.net/blog/projects/fedora-interview.html <p>Don't miss <a href="http://fedoraproject.org/wiki/Interviews/LennartPoettering">this interesting Fedora interview with yours truly</a>, where I go into a bit of detail about what's coming next for PulseAudio.</p> Lennart PoetteringTue, 30 Oct 2007 23:48:00 +0100tag:0pointer.net,2007-10-30:/blog/projects/fedora-interview.htmlprojectsThe next stephttps://0pointer.net/blog/projects/pa-097.html <p>A few minutes ago, I finally released <a href="http://pulseaudio.org/">PulseAudio</a> <a href="http://pulseaudio.org/milestone/0.9.7">0.9.7</a>. Changes are numerous, especially internally where the core is now threaded and mostly lock-free. Check the rough list on the <a href="http://pulseaudio.org/milestone/0.9.7">milestone page</a> or in the <a href="https://tango.0pointer.de/pipermail/pulseaudio-discuss/2007-October/000824.html">announcement email</a>. As many of you know we are shipping a pre-release of 0.9.7 in Fedora 8, enabled by default. The final release offers quite a few additions over that prerelease. To show off a couple of nice features, here's a screencast, showing hotplug, simultaneous playback (what Apple calls aggregation) and zeroconfish network support:</p> <p><a href="http://dev.gentooexperimental.org/~flameeyes/mezcalero-pulse-demo.ogm"><img src="http://0pointer.de/public/demo-pulse-small.jpeg" width="300" height="262" alt="screencast" /></a></p> <p>Please excuse the typos. Yes, I still use XMMS, don't ask <sup><small>[1]</small></sup>. Yes, you need a bit of imagination to fully appreciate a screencast that lacks an audio track -- but demos audio software.</p> <p>So, what's coming next? 
Earcandy, timer-based scheduling/"glitch-free" audio, scriptability through Lua, the todo list is huge. My <a href="http://0pointer.de/public/todo">unofficial, scratchy, partly German TODO list for PulseAudio</a> is available online.</p> <p>It appears that all relevant distros will now move to PA by default. So, hopefully, PA is coming to a desktop near you pretty soon. -- Oh, you are one of those who still don't see the benefit of a desktop sound server? Then, please reread this <a href="http://mail.gnome.org/archives/desktop-devel-list/2007-October/msg00136.html">too long email of mine</a>, or maybe <a href="http://arstechnica.com/journals/linux.ars/2007/10/17/pulseaudio-to-bring-earcandy-to-linux">this Ars Technica article</a>.</p> <p>OTOH, if you happen to like this release, then consider giving me a kudo on <a href="http://www.ohloh.net/accounts/7661">ohloh.net</a>, my ego wants a golden 10. ;-)</p> <p><img src="http://pulseaudio.org/chrome/site/patitle.png" width="345" height="70" alt="logo" /></p> <p><b>Footnotes:</b></p> <p><small>[1] Those music players which categorize audio by ID3 tags just don't work for me, because most of my music files are very badly named. However, my directory structure is very well organized, but all those newer players don't care about directory structures, it seems. 
XMMS doesn't really either, but <tt>xmms .</tt> does the job from the terminal.</small></p> <p><small><a href="http://farragut.flameeyes.is-a-geek.org/">Flameeyes</a>, thanks for hosting this clip.</small></p> Lennart PoetteringTue, 30 Oct 2007 23:10:00 +0100tag:0pointer.net,2007-10-30:/blog/projects/pa-097.htmlprojectsGlowing Sunhttps://0pointer.net/blog/glowing-sun.html <p><a href="http://0pointer.de/photos/?gallery=Red%20IQ%202007-10"><img src="http://0pointer.de/static/glowing-sun.jpeg" width="640" height="602" alt="Danish design in Red" /></a></p> <p><a href="http://0pointer.de/blog/iq-light-mania.html">Danish-Mexican design</a>, this time built from red ibico PolyOpaque; for your stylish and very personal red light district.</p> Lennart PoetteringSun, 21 Oct 2007 21:40:00 +0200tag:0pointer.net,2007-10-21:/blog/glowing-sun.htmlmiscYummy Mango Yummy Lassi Yummyhttps://0pointer.net/blog/projects/lassi-lassi-popassi.html <p><a href="http://zee-nix.blogspot.com/2007/10/mango-lassi-icon.html">Zeeshan</a>, Mango Lassi tastes a lot different than a milk shake, believe me! Also, even if Mango Lassi was actually a western thing, did you know that just recently I witnessed <a href="http://www.ohloh.net/accounts/4078">Sjoerd</a><sup><small>[1]</small></sup> ordering a Vindaloo Pizza (or was it Korma?) at a Boston restaurant -- Italian pizza with Indian-style curry on top. Now, that's what some people might be calling "ignorant of Indian cuisine". But actually I think that, like in music, mixing different styles, combining things from different origins is a good thing, and is what makes culture live.</p> <div><b>Footnotes</b></div> <div>[1] Who doesn't have a blog. 
Can you believe it?</div> Lennart PoetteringWed, 17 Oct 2007 01:53:00 +0200tag:0pointer.net,2007-10-17:/blog/projects/lassi-lassi-popassi.htmlprojectsAn Icon for Mango Lassihttps://0pointer.net/blog/projects/mango-lassi-icon.html <p>Thanks to <a href="http://vdepizzol.wordpress.com">Vinicius Depizzol</a>'s great work, <a href="http://0pointer.de/blog/projects/mango-lassi.html">Mango Lassi</a> now has an icon:</p> <p><img src="http://0pointer.de/public/mango-lassi-icon.png" width="48" height="48" alt="Mango Lassi's new icon" /></p> <p>Muito obrigado!</p> <p>I'd also like to thank everyone else who sent in an icon suggestion. Thank you very much!</p> Lennart PoetteringFri, 12 Oct 2007 21:59:00 +0200tag:0pointer.net,2007-10-12:/blog/projects/mango-lassi-icon.htmlprojectsFedora Planet Noisehttps://0pointer.net/blog/projects/fedora-planet.html <p>Am I the only one who thinks that the usefulness of <a href="http://planet.fedoraproject.org/">Fedora Planet</a> is severely limited because of the low signal-to-noise ratio that is due to far too many non-English (i.e. 
German, French) language posts?</p> Lennart PoetteringFri, 12 Oct 2007 21:52:00 +0200tag:0pointer.net,2007-10-12:/blog/projects/fedora-planet.htmlprojectsPulseAudio FUDhttps://0pointer.net/blog/projects/pulseaudio-fud.html <p>If you want to know more about <a href="http://pulseaudio.org/">PulseAudio</a>'s relation to GNOME (especially if you think PA is evil) then please read through <a href="http://mail.gnome.org/archives/desktop-devel-list/2007-October/thread.html#00055">this thread</a> on desktop-devel, and especially <a href="http://mail.gnome.org/archives/desktop-devel-list/2007-October/msg00136.html">this long email I posted as a reply a few minutes ago</a>, where I try to debunk all the <a href="http://en.wikipedia.org/wiki/Fear%2C_uncertainty_and_doubt">FUD</a> that has been spread.</p> Lennart PoetteringFri, 12 Oct 2007 21:42:00 +0200tag:0pointer.net,2007-10-12:/blog/projects/pulseaudio-fud.htmlprojectsMango Lassihttps://0pointer.net/blog/projects/mango-lassi.html <p>Yesterday, at the GNOME Summit in Boston I did a quick presentation of my new desktop input sharing hotness thingy, called "Mango Lassi" (Alternatively known as "GNOME Input Sharing"). Something like a <a href="http://synergy2.sourceforge.net/">Synergy</a> done right, or an <a href="http://x2x.dottedmag.net/">x2x</a> that doesn't suck.</p> <p>So, for those who couldn't attend, here's a screenshot, which doesn't really tell how great it is, and which might also be a bit confusing:</p> <p><a href="http://0pointer.de/public/mango-lassi.png"><img src="http://0pointer.de/public/mango-lassi-small.png" width="256" height="192" alt="Mango Lassi Screenshot" /></a></p> <p>And here's a list of random features already available:</p> <ul> <li>Discover desktops to share mouse and keyboards with automatically via <a href="http://avahi.org/">Avahi</a>.</li> <li>Fully peer-to-peer. All Mango Lassi instances are both client and server at the same time. 
Other hosts may enter or leave a running session at any time.</li> <li>No need to open X11 up for the network.</li> <li>You have a 50% chance that for your setup you don't need any configuration at all. In the case of the other 50% you might need to swap the order of your screens manually in a simple dialog, because Mango Lassi didn't guess correctly which screen is left and which screen is right.</li> <li>libnotify integration so that it tells you whenever a desktop joins or leaves your session.</li> <li>Shows a nice OSD on your screen when your screen's input is currently being redirected to another screen.</li> <li>Uses all those nifty GNOME APIs, like D-Bus-over-TCP, Avahi, libnotify, Gtk, ...</li> <li>Supports both the X11 clipboard and the selection, supporting all content types, and not just simple text -- i.e. you can copy and paste image data between GIMP instances on your screens.</li> <li>Lots of bugs and useless debug output, since this is basically the work of just three weekends.</li> <li>Tray icon</li> </ul> <p>And here's a list of missing features:</p> <ul> <li>Drag'n'drop between screens. (I figured out how this could work, it's just a matter of actually implementing this, which is probably considerable work, because this would require some UI work, to show a download dialog and suchlike.)</li> <li>Integration with Matthias' GTK+ window migration patches, which would allow dragging GTK+ windows between screens. The migration code for GTK+ basically works. It's just a matter of getting them merged in GTK+ proper, and hooking them up properly with Mango Lassi, which probably needs some kind of special support in Metacity so that we get notified when a window drag is happening and the pointer comes near the edges of the screens.</li> <li>Encryption, authentication: Best solution would probably be that D-Bus would get native TLS support which we could then make use of.</li> <li>Support for legacy operating systems like Windows/MacOS. 
I personally don't care much about this. However, Zeroconf implementations and D-Bus are available on Windows/MacOS too, and the exposed D-Bus interfaces are not too X11-centric, so this should be doable without too much work.</li> <li>UI Love, actually hooking up the desktop order changing buttons, save and restore the order automatically.</li> <li>MPX support (this would *rock*)</li> </ul> <p>And finally, here's where you can get it:</p> <pre>git clone http://git.0pointer.de/repos/mango-lassi.git/</pre> <p><a href="http://git.0pointer.de/?p=mango-lassi.git;a=summary">gitweb</a></p> <p>Oh, and I don't take feature wishlist requests for this project. If you need a feature, implement it yourself. It's Free Software after all! I'd be happy if someone would be willing to work on Mango Lassi in a way that it can become a really good GNOME citizen and maybe even a proper part of it. But personally I'll probably only work on it to a level where it does all I need to work with my Laptop and my Desktop PC on my desk in a sane way. I am almost 100% busy with <a href="http://pulseaudio.org/">PulseAudio</a> these days, and thus unable to give Mango Lassi the love it could use. So, stand up now, if you want to take over maintainership!</p> <p>Hmm, Mango Lassi could use some good artwork, starting with an icon. I am quite sure that someone with better graphic skills than me could easily create a delicious icon perhaps featuring a glass of fresh, juicy <a href="http://images.google.com/images?ie=UTF-8&amp;oe=UTF-8&amp;q=mango+lassi&amp;um=1&amp;sa=N&amp;tab=wi">Mango Lassi</a>. 
I'd be very thankful for every icon submission!</p> Lennart PoetteringTue, 09 Oct 2007 18:35:00 +0200tag:0pointer.net,2007-10-09:/blog/projects/mango-lassi.htmlprojectsTwenty-Firsthttps://0pointer.net/blog/projects/twenty-first.html <p><a href="http://www.linuxworld.com/community/?q=node/1447">Hahaha.</a></p> <p>Yours truly, Lennart (C list blogger).</p> Lennart PoetteringTue, 09 Oct 2007 06:21:00 +0200tag:0pointer.net,2007-10-09:/blog/projects/twenty-first.htmlprojectsThis is a good movehttps://0pointer.net/blog/projects/kmod.html <p><a href="https://www.redhat.com/archives/fedora-devel-list/2007-September/msg01949.html">I hope other distributions will follow.</a></p> Lennart PoetteringSun, 23 Sep 2007 17:02:00 +0200tag:0pointer.net,2007-09-23:/blog/projects/kmod.htmlprojectsEnforcing a Whitespace Regimehttps://0pointer.net/blog/projects/whitespace-regime.html <p>So, you want to be as tough as the kernel guys and enforce a strict whitespace regime on your project? But you lack the whitespace fascists with too much free time lurking on your mailing list who might do all the bitching about badly formatted patches for you? 
Salvation is here:</p> <p>Stick <a href="http://0pointer.de/public/pre-commit.txt">this pre-commit file</a> in your SVN repository as <tt>hooks/pre-commit</tt> and give it a <tt>chmod +x</tt> and your SVN server will do all the bitching for you -- for free:</p> <pre>#!/bin/bash -e

REPOS="$1"
TXN="$2"

SVNLOOK=/usr/bin/svnlook

# Require some text in the log
$SVNLOOK log -t "$TXN" "$REPOS" | grep -q '[a-zA-Z0-9]' || exit 1

# Block commits with tabs or trailing whitespace
$SVNLOOK diff -t "$TXN" "$REPOS" | python /dev/fd/3 3&lt;&lt;'EOF'
import sys

ignore = True
SUFFIXES = [ ".c", ".h", ".cc", ".C", ".cpp", ".hh", ".H", ".hpp", ".java" ]
filename = None

for ln in sys.stdin:

    # A "+++ " header starts the diff of the next file; only source files are checked
    if ignore and ln.startswith("+++ "):
        filename = ln[4:ln.find("\t")].strip()
        ignore = not any(filename.endswith(x) for x in SUFFIXES)

    elif not ignore:
        # Only lines added by this transaction are inspected
        if ln.startswith("+"):

            if ln.count("\t") &gt; 0:
                sys.stderr.write("\n*** Transaction blocked, %s contains tab character:\n\n%s" % (filename, ln))
                sys.exit(1)

            if ln.endswith(" \n"):
                sys.stderr.write("\n*** Transaction blocked, %s contains lines with trailing whitespace:\n\n%s&lt;EOL&gt;\n" % (filename, ln.rstrip("\n")))
                sys.exit(1)

        # Anything that is not a hunk line ends the current file's diff
        if not (ln.startswith("@") or \
                ln.startswith("-") or \
                ln.startswith("+") or \
                ln.startswith(" ")):
            ignore = True

sys.exit(0)
EOF

exit "$?"</pre> <p>This will cause all commits to be blocked that don't follow my personal taste of whitespace rules.</p> <p>Of course, it is up to you to adjust this script to your personal taste of fascism. If you hate tabs like I do, and fear trailing whitespace like I do, then you can use this script without any changes. Otherwise, learn Python and do some trivial patching.</p> <p>Hmm, so you wonder why anyone would enforce a whitespace regime like this? First of all, it's a chance to be part of a regime -- where you are the dictator! Secondly, if people use tabs source files look like <i>Kraut und R&uuml;ben</i>, different in every editor<sup>[1]</sup>. 
Thirdly, trailing whitespace makes clean diffs difficult<sup>[2]</sup>. And think of the hard disk space savings!</p> <p>I wonder how this might translate into GIT. I have a couple of GIT repositories where I'd like to enforce a similar regime as in my SVN repositories. Suggestions welcome!</p> <p>Oh, and to make it bearable to live under such a regime, configure your <tt>$EDITOR</tt> properly, for example by hooking <tt>nuke-trailing-whitespace.el</tt> to <tt>'write-file-hooks</tt> in Emacs.</p> <small><b>Footnotes</b></small> <p><small>[1] Yes, some people think this is a feature. I don't. But talk to <tt>/dev/null</tt> if you want to discuss this with me.</small></p> <p><small>[2] Yes, there is <tt>diff -b</tt>, but it is still a PITA.</small></p> Lennart PoetteringThu, 20 Sep 2007 23:01:00 +0200tag:0pointer.net,2007-09-20:/blog/projects/whitespace-regime.htmlprojectsiLock-in: Apple locks Free Software out, but where's the news?https://0pointer.net/blog/projects/apple-sucks.html <p><a href="http://www.figuiere.net/hub/blog/?2007/09/15/559-free-software-lock-out">So, Apple now blocks third-party software from accessing iPods.</a> But is behaviour like that news? No, unfortunately not at all.</p> <p>Let's have a look at two technologies that are closely related to the iPod and Apple-style media playback: <a href="http://en.wikipedia.org/wiki/Digital_Audio_Access_Protocol">DAAP (Digital Audio Access Protocol)</a> and <a href="http://en.wikipedia.org/wiki/RAOP">RAOP (Remote Audio Output Protocol).</a> RAOP is the protocol that is spoken when you want to output audio from iTunes over the network on your AirPort base station. DAAP is the popular protocol which you can use to swap music between multiple iTunes instances on a LAN. 
Both technologies use cryptographic hashes to block interoperable alternative implementations.</p> <p>Now, the RAOP client crypto key has been extracted from iTunes, hence it's now possible to implement alternative software that takes the role of iTunes and streams audio to an AirPort. However, no one managed to extract the RAOP server key yet, hence no one is able to implement software that exposes itself as an AirPort-compatible audio sink on the network, so that iTunes could stream data to it.</p> <p>With DAAP it's a similar situation: iTunes uses cryptographic hashes to make sure that only real iTunes instances can swap audio with each other. This key has been broken multiple times, hence there are now a couple of alternative DAAP implementations, which can swap audio with iTunes (<a href="http://www.gnome.org/projects/rhythmbox/">Rhythmbox</a> being one example). However, with iTunes 7 Apple changed the cryptographic key once again, and until now nobody managed to break it.</p> <p>So basically, Apple now dongles AirPorts to iTunes, iTunes to iTunes and iTunes to iPods. The whole Apple eco-system of media devices and software is dongled together. And none of the current iterations of the underlying technologies have been fully broken yet.</p> <p>While the audio files you can buy at the iTunes shop may now be DRM-free, you're still locked into the Apple eco-system if you do that. They replaced DRM with <a href="http://en.wikipedia.org/wiki/Vendor_lockin">vendor lock-in</a>.</p> <p>This lock-in behaviour is childish at best. DAAP once was the de-facto standard for swapping media files in LANs. Swapping files in LANs is perfectly legitimate and legal. Then, Microsoft/Intel started to include a similar technology in UPnP, the <a href="http://www.upnp.org/standardizeddcps/mediaserver.asp">UPnP MediaServer</a>. An open technology that has now been included in endless media server devices. 
Several Free Software implementations exist (most notably <a href="http://www.gupnp.org/">gUPnP</a>). These days, UPnP MediaServer is ubiquitous, DAAP is no more. Apple had the much better starting position, but they blew it, because of their childish locking-out of alternative implementations.</p> <p>I believe that DAAP is the superior protocol in comparison to UPnP MediaServer. (Not really surprising, since I wrote most of <a href="http://avahi.org/">Avahi</a>, which is a free implementation of mDNS/DNS-SD ("Zeroconf"), the (open) Apple technology that is the basis for DAAP.) However, due to the closedness of DAAP I would recommend everyone to favour UPnP MediaServer over DAAP. It's a pity.</p> <p>Both DAAP and UPnP MediaServer are transfer protocols, nothing that is ever directly exposed to the user. Right now, Free Software media players support DAAP much better than UPnP MediaServer. Hopefully, they will start to abstract the differences away, and allow swapping music the same way over DAAP and over UPnP. And hopefully, DAAP will eventually die or Apple will open it. They have shown that they are able to change for the good, they became much more open with WebKit, and they changed the license of Bonjour to a real Free Software license. Let's hope they will eventually notice that locking users in makes their own technology irrelevant in the long term.</p> <p>Oh, and let's hope that <a href="http://nanocrew.net/">Jon</a> finds the time to break all remaining Apple crypto keys! Jon, DAAP 7.0 and the RAOP server key are waiting for you! 
I'd love to make <a href="http://pulseaudio.org/">PulseAudio</a> RAOP-compatible, both as client and as server.</p> <p><b>Update:</b> <a href="http://arstechnica.com/news.ars/post/20070916-gtkpod-coders-crack-apples-new-ipod-checksum.html">Ars Technica has an update on this.</a></p> Lennart PoetteringSat, 15 Sep 2007 18:52:00 +0200tag:0pointer.net,2007-09-15:/blog/projects/apple-sucks.htmlprojectsYou talkin' to me?https://0pointer.net/blog/projects/lugradio.html <p>Woah, <a href="http://www.lugradio.org/episodes/83">I am interviewed on LugRadio</a>. (@ 71:09)</p> Lennart PoetteringFri, 14 Sep 2007 00:12:00 +0200tag:0pointer.net,2007-09-14:/blog/projects/lugradio.htmlprojectsccachehttps://0pointer.net/blog/projects/ccache.html <pre>
$ ccache -s | egrep "(cache hit|cache miss)"
cache hit 3518652
cache miss 168484
$ echo $((168484*1000/3518652))
47
$
</pre> <p>Less than 5% of the compiler invocations on my development machine since 2004 actually processed new and unseen code.</p> <p>I'm still unsure, though, what this is telling me?</p> Lennart PoetteringSun, 02 Sep 2007 23:54:00 +0200tag:0pointer.net,2007-09-02:/blog/projects/ccache.htmlprojectsThree days lefthttps://0pointer.net/blog/projects/foms-2008.html <p>Only <b>three days</b> left for sending in your paper for <a href="http://www.annodex.org/events/foms2008/">FOMS 2008, the best Free Software multimedia conference/workshop around</a>. 
The best chance to meet all the important people from the major multimedia projects!</p> Lennart PoetteringWed, 22 Aug 2007 16:34:00 +0200tag:0pointer.net,2007-08-22:/blog/projects/foms-2008.htmlprojectsAn era ends, a new one beginshttps://0pointer.net/blog/projects/pulseaudio-fedora.html <p><a href="https://www.redhat.com/archives/fedora-devel-list/2007-August/msg01196.html">Earlier today</a> I switched Fedora over to install <a href="http://pulseaudio.org/">PulseAudio</a> instead of the venerable <a href="http://www.tux.org/~ricdude/overview.html">EsounD</a> by default.</p> Lennart PoetteringThu, 16 Aug 2007 22:30:00 +0200tag:0pointer.net,2007-08-16:/blog/projects/pulseaudio-fedora.htmlprojectsGUADEC 2007 Slideshttps://0pointer.net/blog/projects/guadec-2007-slides.html <p><a href="http://0pointer.de/public/pulseaudio-presentation-guadec2007.pdf">My GUADEC 2007 slides.</a></p> Lennart PoetteringFri, 03 Aug 2007 23:55:00 +0200tag:0pointer.net,2007-08-03:/blog/projects/guadec-2007-slides.htmlprojectsI wonder ...https://0pointer.net/blog/projects/send-file.html <p>... whether the guys behind <a href="http://code.google.com/p/giver/">this</a> know about <a href="http://techn.ocracy.org/telekinesis/">this</a>?</p> <p>It's a pleasure to see as many projects as possible making use of <a href="http://avahi.org">Avahi</a>. OTOH I believe that all solutions should speak the same protocol. 
Using Apple's somewhat standardized link-local iChat/XMPP protocol (which is what Telekinesis does) seems to be the best option to me: because you get MacOSX interoperability for free and many IM clients (including many on Windows) already contain support for this as well.</p> Lennart PoetteringMon, 23 Jul 2007 19:10:00 +0200tag:0pointer.net,2007-07-23:/blog/projects/send-file.htmlprojectsCUPS 1.3b1 gained Zeroconf supporthttps://0pointer.net/blog/projects/cups-bonjour.html <p>Seems CUPS now comes with Zeroconf/Bonjour network printer browsing support included in the <a href="http://www.cups.org/articles.php?L479">upstream tarball</a>. I haven't tried this myself, but presumably CUPS should work on <a href="http://avahi.org/">Avahi</a> as well, since we ship a -- these days nearly perfect -- Bonjour compatibility library. </p> <p>In Fedora Rawhide <a href="http://cvs.fedora.redhat.com/viewcvs/devel/cups/cups-avahi.patch?rev=1.1&amp;view=auto">this functionality</a> seems to be enabled already. 
Other distributions, please follow!</p> <p>Seems at least one good thing came from the recent <a href="http://lwn.net/Articles/242020/">Apple buyout of CUPS/Easy Software Products</a>: I can now remove one item from my TODO list which has been there for a long time already.</p> Lennart PoetteringMon, 23 Jul 2007 18:32:00 +0200tag:0pointer.net,2007-07-23:/blog/projects/cups-bonjour.htmlprojectsSlides for LRL and OLShttps://0pointer.net/blog/projects/ols-lrl-slides.html <p>For those interested: here're my slides for my presentations at LRL and OLS:</p> <ul> <li><a href="http://0pointer.de/public/pulseaudio-presentation-ols2007.pdf">Ottawa Linux Symposium 2007: Cleaning up the Linux Desktop Audio Mess</a> (Not too much new stuff here if you already read my LCA slides)</li> <li><a href="http://0pointer.de/public/avahi-presentation-lrl2007.pdf">LugRadio Live 2007: Six Use Cases for Avahi</a></li> </ul> <p>LWN linked <a href="http://excess.org/article/2007/07/ottawa-linux-symposium-2007-day-4/">a short summary of my OLS talk</a>.</p> Lennart PoetteringWed, 11 Jul 2007 00:25:00 +0200tag:0pointer.net,2007-07-11:/blog/projects/ols-lrl-slides.htmlprojectsIm Zentrum der Machthttps://0pointer.net/blog/photos/im-zentrum-der-macht.html <p>The Government District in Berlin, with the <i>Reichstag</i> and the offices of the members of the <i>Bundestag</i>:</p> <p><a href="http://0pointer.de/static/regierungsviertel.html"><img src="http://0pointer.de/static/regierungsviertel-small.jpeg" width="1024" height="175" alt="Im Zentrum der Macht" /></a></p> <p>The Diana Temple in the <i>Hofgarten</i> in Munich:</p> <p><a href="http://0pointer.de/static/hofgarten.html"><img src="http://0pointer.de/static/hofgarten-small.jpeg" width="1024" height="167" alt="Hofgarten" /></a></p> <p>The <i>K&ouml;nigsplatz</i> in Munich:</p> <p><a href="http://0pointer.de/static/koenigsplatz.html"><img src="http://0pointer.de/static/koenigsplatz-small.jpeg" width="1024" height="122" alt="Königsplatz" /></a></p> 
<p>The <i>Residenz</i> in Munich:</p> <p><a href="http://0pointer.de/static/residenz.html"><img src="http://0pointer.de/static/residenz-small.jpeg" width="1024" height="166" alt="Residenz" /></a></p> <p>View from the tower of Old <i>St. Peter</i> in Munich:</p> <p><a href="http://0pointer.de/static/stpeter.html"><img src="http://0pointer.de/static/stpeter-small.jpeg" width="1024" height="162" alt="St. Peter" /></a></p> <p>Green pastures of Hamburg-Wohldorf:</p> <p><a href="http://0pointer.de/static/wohldorfer-feld.html"><img src="http://0pointer.de/static/wohldorfer-feld-small.jpeg" width="1024" height="146" alt="Wohldorfer Feld" /></a></p> <p><a href="http://0pointer.de/static/panos.cgi">All my panoramic photos.</a> (Warning! Page contains a lot of oversized, badly scaled images.)</p> Lennart PoetteringSat, 23 Jun 2007 22:26:00 +0200tag:0pointer.net,2007-06-23:/blog/photos/im-zentrum-der-macht.htmlphotosRe: Avahi - what happened. on Solaris..?https://0pointer.net/blog/projects/project-indiana-part2.html <p><a href="http://dar-k.blogspot.com/2007/06/avahi-what-happened-on-solaris.html">In response to Darren Kenny</a>:</p> <ul><li>On Linux (and FreeBSD) <a href="http://0pointer.de/lennart/projects/nss-mdns/">nss-mdns</a> has been providing <i>decent low-level integration of mDNS at the nsswitch level</i> for ages. In fact it even predates Avahi by a few months. Porting it to Solaris would have been almost trivial. And, Sun engineers even asked about nss-mdns, so I am quite sure that Sun knew about this.</li> <li>You claim that our C API was <i>internal</i>? I wonder who told you that. I definitely did not. The API has been available on the Avahi web site for ages and is relatively well documented <sup>[1]</sup>, I wonder how anyone could ever come to the opinion that it was "internal". Regarding API stability: yes, I said that we make no guarantees about API stability -- but I also said it was a top-priority for us to keep the API compatible. 
I think that is the best you can get from <i>any</i> project of the Free Software community. If there is something in an API that we later learn is irrecoverably broken or stupid by design, then we take the freedom to replace that or remove it entirely. Oh, and even Sun does things like that in Java. Just think of the Java 1.x <tt>java.lang.Thread.stop()</tt> API.</li> <li><tt>nss-mdns</tt> does not make any use of D-Bus. It never did, it never will.</li> <li>GNOME never formally made the decision <i>to go Avahi</i> AFAIK. It's just what everyone uses because it is available on all distributions. Also, a lot of GNOME software can also be compiled against HOWL/Bonjour.</li> <li>Implementing the Avahi API on top of the Bonjour API is just crack. For a crude comparison: this is like implementing a POSIX compatibility layer on top of the DOS API. Crack. Just crack. There is a lot of functionality you can *never* emulate in any reasonable way on top of the current Bonjour API: properly integrated IPv4+IPv6 support, <tt>AVAHI_BROWSER_ALL_FOR_NOW</tt>, the fact that the Avahi API is transaction-based, all the different flag definitions, and a lot more. From a technical perspective emulating Avahi on top of Bonjour is not feasible, while the other way round is perfectly feasible.</li> </ul> <p>Let's also not forget that Avahi comes with a Bonjour compatibility layer, which gets almost any Bonjour app working on top of Avahi. And in contrast to your Avahi-on-top-of-Bonjour stuff it is not inherently borked. Yes, our Bonjour compatibility layer is not perfect, but should be very easy to fix if there should still be an incompatibility left. And the API of that layer is of course as much set in stone as the upstream Bonjour API. Oh, and you wouldn't have to run two daemons instead of just one. And you would only need to ship and maintain a single mDNS package. 
Oh, and the compatibility layer would only be needed for the few remaining applications that still use Bonjour exclusively, and not by the majority of applications.</p> <p>So, in effect you chose Bonjour because of its API and added some Avahi'sh API on top and this all is totally crackish. If you'd have done it the other way round you would have gotten both APIs as well, but the overall solution would not have been totally crackish. And let's not forget that Avahi is much more complete than Bonjour. (Maybe except wide-area support, Federico!).</p> <p>Anyway, my original rant was not about the way Sun makes its decision but just about the fact that your Avahi-to-Bonjour-bridge is ... crack! And that it remains.</p> <p>Wow, six times <i>crack</i> in a single article.</p> <p><b>Footnotes:</b></p> <p><small>[1] For a Free Software API at least.</small></p> Lennart PoetteringWed, 13 Jun 2007 21:14:00 +0200tag:0pointer.net,2007-06-13:/blog/projects/project-indiana-part2.htmlprojectsProject Indianahttps://0pointer.net/blog/projects/project-indiana.html <p>Dear Sun Microsystems,</p> <p>I wonder if the mythical "Project Indiana" consists of <a href="http://src.opensolaris.org/source/xref/jds/spec-files/trunk/patches/">patches like these</a> which among other strange things make the <a href="http://avahi.org/">Avahi</a> daemon just a frontend to the <a href="http://www.apple.com/macosx/features/bonjour/">Apple Bonjour</a> daemon. Given that Avahi is a superset of Bonjour in both functionality and API this is just so ridiculous -- I haven't seen such a monstrous crack in quite a while.</p> <p>Sun, you don't get it, do you? 
That way you will only reach the crappiness, bugginess and brokenness of Windows, not the power and usability of Linux.</p> <p>Oh, and please rename that "fork" of Avahi to something completely different -- because it actually is exactly that: something completely different than Avahi.</p> <p>Love,</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Lennart</p> Lennart PoetteringWed, 13 Jun 2007 14:07:00 +0200tag:0pointer.net,2007-06-13:/blog/projects/project-indiana.htmlprojectsFreikarten!https://0pointer.net/blog/projects/linuxtag-freikarten.html <p>Anyone looking for free tickets for <a href="http://linuxtag.org/">LinuxTag 2007 (Berlin)</a>? Just ping me, and I'll send you one. (Be quick, they are limited!)</p> Lennart PoetteringTue, 29 May 2007 15:05:00 +0200tag:0pointer.net,2007-05-29:/blog/projects/linuxtag-freikarten.htmlprojectsConferenceshttps://0pointer.net/blog/projects/conferences-2007.html <p>I will be speaking at the following conferences in the next three months:</p> <p><a href="http://www.linuxtag.org/2007/en/conf/events/vp-freitag/vortragsdetails.html?talkid=88">LinuxTag, Berlin</a></p> <p><a href="http://www.linuxsymposium.org/2007/view_abstract.php?content_key=158">Ottawa Linux Symposium</a></p> <p><a href="http://www.lugradio.org/live/2007/schedule.html">LugRadio Live, Wolverhampton</a></p> <p><a href="http://guadec.org/node/565">GUADEC, Birmingham</a></p> <p><img src="http://farm1.static.flickr.com/230/480023477_8a690b9c72_m.jpg" width="240" height="100" alt="LugRadio Live Speaker" /></p> Lennart PoetteringMon, 28 May 2007 16:51:00 +0200tag:0pointer.net,2007-05-28:/blog/projects/conferences-2007.htmlprojectsCuddle opening function braces, anyone?https://0pointer.net/blog/projects/cuddle-function.html <p>Dear Lazyweb!</p> <p>Does anyone know how I can teach GNU indent to cuddle opening function braces and the closing ')' of the argument list? 
i.e.</p> <pre>
int main(int argc, char* argv[]) {
}
</pre> <p>instead of:</p> <pre>
int main(int argc, char* argv[])
{
}
</pre> <p>Any help appreciated!</p> Lennart PoetteringThu, 10 May 2007 16:22:00 +0200tag:0pointer.net,2007-05-10:/blog/projects/cuddle-function.htmlprojectsDMI-based Autoloading of Linux Kernel Moduleshttps://0pointer.net/blog/projects/dmi-based-module-autoloading.html <p>So, you've always been annoyed by the fact that you have to load all those laptop, i2c, hwmon, hdaps Linux kernel modules manually without having spiffy udev doing that work for you automagically? No more! I just sent <a href="http://0pointer.de/public/dmi-id.patch">a patch</a> <a href="http://lkml.org/lkml/2007/5/8/428">to LKML</a> which adds <a href="http://en.wikipedia.org/wiki/Desktop_Management_Interface">DMI/SMBIOS</a>-based module autoloading to the Linux kernel.</p> <p>Hopefully this patch will be integrated into Linus' kernel shortly. As soon as that happens udev will automatically recognize your laptop/mainboard model and load the relevant modules.</p> <p>Module maintainers, please add <tt>MODULE_ALIAS</tt> lines to your kernel modules to make sure that they are autoloaded using this new mechanism, as soon as it gets committed to Linus' kernel.</p> <p>For a fully automatically configured system only ACPI-DSDT-based module autoloading is missing. I.e. 
load the "battery" module only when an ACPI battery is actually around.</p> Lennart PoetteringTue, 08 May 2007 22:16:00 +0200tag:0pointer.net,2007-05-08:/blog/projects/dmi-based-module-autoloading.htmlprojectsSuomenlinnahttps://0pointer.net/blog/photos/suomenlinna.html <p><a href="http://0pointer.de/static/suomenlinna.html"><img src="http://0pointer.de/static/suomenlinna-small.jpeg" width="1024" height="181" alt="Suomenlinna" /></a></p> Lennart PoetteringWed, 02 May 2007 01:48:00 +0200tag:0pointer.net,2007-05-02:/blog/photos/suomenlinna.htmlphotosOn Using Huginhttps://0pointer.net/blog/photos/hugin.html <p>On popular request, here are a few suggestions how to make best use of Hugin for stitching your panoramas. You probably should have read some of the tutorials at <a href="http://hugin.sourceforge.net/tutorials/index.shtml">Hugin's web site</a> before reading these suggestions.</p> <ul> <li>Use manual exposure settings in your camera. On Canon cameras this means you should be using the "M" mode. Make sure to choose good exposure times and aperture so that the entire range you plan to take photos of is well exposed. If you don't know how to use the "M" mode of your camera you probably should be reading an introduction into photography now. The reason for setting exposure values manually is that you want the same exposure on all photos of the series.</li> <li>Disable automatic white balance mode. You probably should have done that anyway. "Semi-automatic" white balance mode is probably OK (i.e. selecting the white balance from one of the pre-defined profiles, such as "Daylight", "Cloudy", ...)</li> <li>Also manually set the ISO level. You probably should be doing that anyway.</li> <li>Using autofocus is probably OK.</li> <li>Try not to move around too much while taking the photo series. Hugin doesn't like that too much. 
It's OK to move a little, but you should do all the shots for your panorama from a single point, and not while moving on a circle, line, or even Bezier-line.</li> <li>When doing 360&deg; panoramas it is almost guaranteed (at least in northern countries) that you have the sun as back light. That will overexpose the panorama in that direction and lower the contrast in the area. To work against this, you might want to choose to do your panorama shots at noon in summer when the sun is at its zenith. Gray-scaling the shot and doing some other kind of post-processing might be a way to ease this problem.</li> <li>To work against <a href="http://en.wikipedia.org/wiki/Chromatic_aberration">chromatic aberration</a> it is a good idea to use large overlap areas, and doing your shots in "landscape" rather than "portrait" (so that only the center of each image is used in the final image)</li> <li>Running hugin/enblend on an encrypted <tt>$HOME</tt> (like I do) won't make you particularly happy.</li> <li>Pass <tt>-m 256</tt> to enblend. At least on my machine (with limited RAM and dm-crypt) things are a lot faster that way.</li> <li>Sometimes moving things (e.g. people) show up twice (or even more times) in the resulting panorama. Sometimes that is funny, sometimes it is not. To remove them, open the separate <tt>tif</tt> files in Gimp before feeding them into enblend, and cut away the things you want to remove from all but one of these images. Then pass that on to enblend.</li> <li>If regardless how many control points you set in Hugin the images just don't fit together, you should probably run "Optimize Everything" instead of just "Optimize Positions".</li> <li>When doing your shots, make sure to hold the camera all the time at the same height, to avoid having to cut too much of the image away in the final post-processing. 
This is sometimes quite difficult, especially if you have images with no clear horizon.</li> <li>Remember that you can set horizontal and vertical lines as control points in Hugin! Good for straightening things out and making sure that vertical things are actually vertical in the resulting panorama.</li> </ul> Lennart PoetteringWed, 02 May 2007 00:45:00 +0200tag:0pointer.net,2007-05-02:/blog/photos/hugin.htmlphotosHelsingin Tuomiokirkkohttps://0pointer.net/blog/photos/helsingin-tuomiokirkko.html <p>Following an invitation of the Nokia 770/N800 multimedia team I visited the Nokia research center in Helsinki last week. A good opportunity to get some more material for <a href="http://hugin.sourceforge.net/">Hugin</a>:</p> <p><a href="http://0pointer.de/static/helsingin-tuomiokirkko.html"><img src="http://0pointer.de/static/helsingin-tuomiokirkko-small.jpeg" width="1024" height="291" alt="Helsingin Tuomiokirkko" /></a></p> Lennart PoetteringWed, 02 May 2007 00:42:00 +0200tag:0pointer.net,2007-05-02:/blog/photos/helsingin-tuomiokirkko.htmlphotosTag me!https://0pointer.net/blog/projects/tag.html <p>Jeff now started to use <tt>lennartpoettering</tt> for <a href="http://perkypants.org/blog/2007/05/02/links-for-2007-05-01/">tagging his blog stories</a>... <b>AWESOME</b>!</p> Lennart PoetteringTue, 01 May 2007 20:41:00 +0200tag:0pointer.net,2007-05-01:/blog/projects/tag.htmlprojectsMy thoughts on the future of Gnome-VFShttps://0pointer.net/blog/projects/gnomevfs-future.html <p>One of the major construction sites in GNOME and all the other free desktop environments is the VFS abstraction. Recently, there has been some discussion about developing a <a href="http://freedesktop.org/wiki/Software/dvfs">DVFS</a> as a replacement for the venerable Gnome-VFS system. 
Here are my 5 euro cents on this issue (Yepp, I am not fully up-to-date on the whole DVFS discussion, but during my flight from HEL to HAM I wrote this up, without being necessarily too well informed, lacking an Internet connection. Hence, if you find that I am an uninformed idiot, you're of course welcome to flame me!):</p> <p>First of all, we have to acknowledge that Gnome-VFS never achieved any major adoption besides some core (not even all) GNOME applications. The reasons are many, among them: the API wasn't all fun, using Gnome-VFS added another dependency to applications, KDE uses a different abstraction (KIO), and many others. Adoption was suboptimal, and due to that user experience was suboptimal, too (to say the least). </p> <p>One of the basic problems of Gnome-VFS is that it is a (somewhat) redundant abstraction layer over yet another abstraction layer. Gnome-VFS makes available an API that offers more or less the same functionality as the (most of the time) underlying POSIX API. The POSIX API is relatively easy-to-use, portable and very well accepted. The same is not true for Gnome-VFS. Semantics of the translation between Gnome-VFS and POSIX are not always that clear. Paths understood by Gnome-VFS (URLs) follow a different model than those of the Linux kernel. Applications which understand Gnome-VFS can deal with FTP and HTTP resources, while the majority of the applications, which do not link against Gnome-VFS, do not understand them. Integration of Gnome-VFS-speaking and POSIX-speaking applications is difficult and most of the time only partially implementable.</p> <p>So, in short: On one side we have that POSIX API which is a file system abstraction API. And a (kernel-based) virtual file system behind it. And on the other side we have the Gnome-VFS API which is also a file system abstraction API and a virtual file system behind it. 
Hence, why did we decide to standardize on Gnome-VFS, and not just on POSIX?</p> <p>The major reason of course is that until recently accessing FTP, HTTP and other protocol shares through the POSIX API was not doable without special kernel patches. However, a while ago the FUSE system has been merged into the Linux kernel and has been made available for other operating systems as well, among them FreeBSD and MacOS X. This allows implementing file system drivers in userspace. Currently there are all kinds of these FUSE based file systems around, FTP and SSHFS are only two of them. My very own <a href="http://0pointer.de/lennart/projects/fusedav/">fusedav</a> tool implements WebDAV for FUSE.</p> <p>Another (*the* other?) major problem of the POSIX file system API is its synchronous design. While that is usually not a problem for local file systems and for high-speed network file systems such as NFS it becomes a problem for slow network FSs such as HTTP or FTP. Having the GUI block for various seconds while an application saves its documents is certainly not user friendly. But, can this be fixed? Yes, definitely, it can! Firstly, there already is the POSIX AIO interface -- which however is quite unfriendly to use (one reason is its use of Unix signals for notification of completed IO operations). Secondly, the (Linux) kernel people are working on a better asynchronous IO API (see the syslets/fibrils discussion). Unfortunately it will take a while before that new API will finally be available in upstream kernels. However, there's always the third solution: add an asynchronous API entirely in userspace. This is doable in a clean (and glib-ified) fashion: have a couple of worker threads which (synchronously) execute the various POSIX file system functions and add a nice, asynchronous API that can start and stop these threads, feed them operations to execute, and so on.</p> <p>So, what's the grand solution I propose for the desktop VFS mess? 
First, kick Gnome-VFS entirely and don't replace it. Instead write a small D-Bus-accessible daemon that organizes a special directory <tt>~/net/</tt>. Populate that directory with subdirectories for all WebDAV, FTP, NFS and SMB shares that can be found on the local network using both Avahi-based browsing and native SMB browsing. Now use the Linux automounting interface on top of that directory and automount the respective share every time someone wants to access it. For shares that are not announced via Avahi/Samba, add some D-Bus API (and a nice UI) for adding arbitrary shares. NFS and CIFS/SMB shares are mounted with the fast, optimized kernel filesystem implementation; WebDAV and FTP on the other hand are accessed via userspace FUSE-based file systems. The latter should also integrate with D-BUS in some way, to query the user nicely for access credentials and suchlike, with gnome-keyring support and everything.</p> <p><tt>~/net/</tt> itself can -- but probably doesn't need to -- be a FUSE filesystem itself.</p> <p>A shared library should be made available that will implement a few remaining things, that are not available in the POSIX file system API directly:</p> <ul> <li>As mentioned, some nice Glib-ish asynchronous POSIX file system API wrapper</li> <li>High-level file system operations such as copying, moving, deleting (trash!) which show a nice GUI when they are long-running operations.</li> <li>An API to translate and setup URL <tt>&lt;-&gt;</tt> filesystem mappings, i.e. something that translates <tt>ftp://test.local/a/certain/path/</tt> to <tt>~/net/ftp:test.local/a/certain/path</tt> and vice versa. (and probably also to a more user-friendly notation, maybe like "<tt>FTP Share on test.local</tt>" or similar). (Needs to communicate with the <tt>~/net/</tt> handling daemon to setup mappings if required)</li> <li>Meta data extraction. 
It makes sense to integrate that with extended attribute support (EA) in the kernel file system layer, which should be used more often anyway.</li> <li>Explicit mount operations (in contrast to implicit mounts, that are done through automounting) (this also needs to communicate with the <tt>~/net/</tt> daemon in some way) </li> </ul> <p>Et voil&#224;! Without a lot of new code you get a nice, asynchronous, modern, well integrated file system, that doesn't suck. (or at least, it doesn't suck as much as other solutions).</p> <p>Also, this way we can escape the "abstraction trap". Let KDE play the abstraction game, maybe they'll grow up eventually and learn that abstracting abstracted abstraction layers is child's play.</p> <p>Yeah, sure, this proposed solution also has a few drawbacks, but so be it. Here's a short, incomplete list:</p> <ul> <li>The POSIX file system API sucks for file systems that don't have "inodes" or that are attached to a specific user session. -- Yes, sure, but both problems have been overcome by the FUSE project, at least partially.</li> <li>Not that portable -- Yes, but FUSE is now available for many systems besides Linux. The automount project is the bigger problem. But all you lose if you run this proposed system on those (let's say "legacy") systems that don't have FUSE or automounting is access to FTP and WebDAV shares. So what? Local files can still be accessed.</li> <li>Translating between URLs and <tt>$HOME/net/</tt> based paths sucks -- yepp, it does. But much less than being able to access FTP/WebDAV shares from some apps but not from others, as we have it right now.</li> <li>Bah, you suck -- Yes, I do. On a straw, taking a nip from my caipirinha, right at the moment.</li> </ul> <p>I guess I don't have to list all the advantages of this solution, do I?</p> <p>BTW, pumping massive amounts of data through D-Bus sucks anyway.</p> <p>And no, I am not going to hack on this. 
Too busy with other stuff.</p> <p>The plane is now landing in HAM, that shall conclude our small rant.</p> <p><b>Update:</b> No, I didn't get a Caipirinha during my flight. That line I added in before publishing the blog story, which was when I was drinking my Caipirinha. In contrast to other people from the Free Software community I don't own my own private jet yet, with two stewardesses that might fix me a Caipirinha.</p> Lennart PoetteringTue, 01 May 2007 19:30:00 +0200tag:0pointer.net,2007-05-01:/blog/projects/gnomevfs-future.htmlprojectsTo whom it may concernhttps://0pointer.net/blog/redhat.html <p>In case anyone wants to know: starting today I am a Red Hat employee.</p> <p>It's nice if the first day on the new job is a public holiday.</p> Lennart PoetteringTue, 01 May 2007 14:40:00 +0200tag:0pointer.net,2007-05-01:/blog/redhat.htmlmiscPanoramic Hamburghttps://0pointer.net/blog/photos/hamburg-panoramas.html <p>Did I mention I love <a href="http://hugin.sourceforge.net/">Hugin</a>? I do, I really do:</p> <p><a href="http://0pointer.de/static/rathausmarkt.html"><img src="http://0pointer.de/static/rathausmarkt-small.jpg" width="1024" height="165" alt="Hamburg Rathausmarkt" /></a></p> <p><a href="http://0pointer.de/static/hbf.html"><img src="http://0pointer.de/static/hbf-gimped-small.jpg" width="1024" height="260" alt="Hamburg Central Station" /></a></p> <p><a href="http://0pointer.de/static/alsterarkaden.html"><img src="http://0pointer.de/static/alsterarkaden-gimped-small.jpeg" width="1024" height="189" alt="Hamburg Alsterarkaden" /></a></p> Lennart PoetteringTue, 24 Apr 2007 23:56:00 +0200tag:0pointer.net,2007-04-24:/blog/photos/hamburg-panoramas.htmlphotosThree Sistershttps://0pointer.net/blog/photos/three-sisters.html <p>Finally I found the time to sort <a href="http://0pointer.de/photos/?gallery=Australia">my photos from Australia</a>, when I visited the country after linux.conf.au, in January this year. Some photos are quite good, many are not. 
However one panoramic view of the Three Sisters in the Blue Mountains NP is particularly beautiful:</p> <a href="http://0pointer.de/static/three-sisters.html"><img src="http://0pointer.de/static/three-sisters-small.jpeg" width="1024" height="124" alt="Three Sisters" /></a> <p>Just perfect as a desktop background on your Xinerama setup!</p> Lennart PoetteringSat, 21 Apr 2007 14:03:00 +0200tag:0pointer.net,2007-04-21:/blog/photos/three-sisters.htmlphotosWhat I miss in GNOMEhttps://0pointer.net/blog/projects/what-i-miss-in-gnome.html <p>A while back there has been a lot of noise about the GNOME "platform" and what GNOME 3.0 should be. Personally -- while I certainly like the progress GNOME makes as a "platform" -- I must say that the platform is already quite good. In my opinion, what is lacking right now are more the tools and utilities that are shipped *with* the GNOME platform than the platform itself. More specifically there are a set of (rather small) tools I am really missing in the standard set of GNOME tools. So, here's my wishlist, in case anybody is interested to know:</p> <p><tt>&lt;wishlist&gt;</tt></p> <ul> <li>A simple, usable VNC/RFB client as counterpart to the VNC server <tt>vino</tt> that has been shipped since early GNOME 2.0 times. Isn't it kind of awkward that we have been shipping a VNC server since ages, but no VNC client? What I want is a client (maybe called <tt>vinagre</tt> as a pun on <tt>vino</tt>) that is more than just a simple frontend to <tt>xvncviewer</tt>, but not necessarily too fancy. Something that integrates well into GNOME, i.e. uses D-Bus, gnome-keyring, <a href="http://0pointer.de/blog/projects/avahify-your-app.html">avahi-ui</a>. <a href="http://libvncserver.sourceforge.net/">There seems to be a libvncclient library</a> that might make the implementation of this tool easy. 
</li> <li>I am one of the (apparently not so few) people who run their GNOME session with <tt>LANG=de_DE</tt> and <tt>LC_MESSAGES=C</tt>, which enables German dates and everything else, but uses English messages. Right now it's a PITA to configure GNOME that way. It's not really documented how to do that, AFAIK. The best way to do this I found is to edit <tt>~/.gnomerc</tt> and set the variables in there. A simple capplet which allows setting these environment variables from <tt>gnome-session</tt> would be a much better way to configure this. Nothing too fancy again. Just two drop down lists, to choose <tt>LANG</tt> and <tt>LC_MESSAGES</tt> and maybe a subset of the other i18n variables, and possibly <tt>G_FILENAME_ENCODING</tt> (although I might be the only one who still hasn't switched his <tt>$HOME</tt> to UTF-8)</li> <li>There's no world clock in GNOME. Sure, there are <a href="http://www.timeanddate.com/worldclock/">online tools</a> for this, but I am not always online with my laptop.</li> <li>There is no simple tool to take photo snapshots or record short videos from webcams. I want to see something like <a href="http://camorama.fixedgear.org/">camorama</a> in <tt>gnome-media</tt>. Nothing too fancy again. No filters, no TV functionality. Just a small but useful GStreamer frontend.</li> <li>I'd like to see a simple BitTorrent client shipped with GNOME, which is integrated well into the rest of GNOME/Epiphany, so that downloading files from FTP or HTTP looks exactly like downloading them from BitTorrent.</li> </ul> <p><tt>&lt;/wishlist&gt;</tt></p> Lennart PoetteringFri, 20 Apr 2007 19:06:00 +0200tag:0pointer.net,2007-04-20:/blog/projects/what-i-miss-in-gnome.htmlprojectsAvahi on your N800https://0pointer.net/blog/projects/avahi-n800.html <p>I'd love to see proper Avahi support in the Nokia N800 (just think of proper file manager integration of announced WebDAV shares!), but until now Nokia doesn't ship Avahi in Maemo. 
However, there's now a simple way to install at least basic Avahi support on the N800. The INdT includes Avahi in their <a href="http://openbossa.indt.org.br/canola/">Canola</a> builds. Hence: just install Canola and your N800 will register itself via mDNS on your network.</p> <p>In related news: I am happy to see that Avahi has apparently been included in the just announced <a href="http://www.gnome.org/mobile/">GNOME Embedded Platform</a>.</p> Lennart PoetteringThu, 19 Apr 2007 19:29:00 +0200tag:0pointer.net,2007-04-19:/blog/projects/avahi-n800.htmlprojectsReleases, Releases, Releases ...https://0pointer.net/blog/projects/releases-releases-releases.html <p>I have just released new versions of a few of my packages:</p> <div><img src="http://avahi.org/chrome/site/avahi-trac.png" width="200" height="96" style="float:right; border: 0px; margin: 10px" alt="Avahi Logo" /></div> <ol> <li><a href="http://avahi.org/">Avahi 0.6.18</a>: The most interesting change is probably the addition of <tt>avahi-ui</tt>, our new GTK library which implements a standard dialog for browsing for Avahi services. A quick (albeit slightly out-of-date) introduction into <tt>avahi-ui</tt> (including screenshots) may be found in <a href="http://0pointer.de/blog/projects/avahify-your-app.html">this old blog story</a> of mine. If you are a developer of a GNOME application that acts as network client in some way, please consider adding support for <tt>avahi-ui</tt> to your project. 
Examples where adding support for <tt>avahi-ui</tt> makes sense are: <ul> <li>Mail applications such as Evolution may use it to browse for POP3, POP3S, IMAP, IMAPS and SMTP servers.</li> <li>VNC applications may use it to browse for VNC/RFB servers</li> <li>Database clients such as Glom may use it to browse for PostgreSQL servers</li> <li>FTP clients may use it to browse for FTP servers</li> <li>RSS readers may use it to browse for local RSS feeds</li> <li>And lots of others</li> </ul> There are lots of other small and not so small changes in Avahi 0.6.18.</li> <li><a href="http://0pointer.de/lennart/projects/mod_dnssd/">mod_dnssd 0.5</a>: Mostly an update for Apache 2.2</li> <li><a href="http://0pointer.de/lennart/projects/mod_mime_xattr/">mod_mime_xattr 0.4</a>: ditto</li> </ol> Lennart PoetteringWed, 18 Apr 2007 23:23:00 +0200tag:0pointer.net,2007-04-18:/blog/projects/releases-releases-releases.htmlprojectsI Am Free Again!https://0pointer.net/blog/thesis.html <p><i>Es ist vollbracht!</i> Today, at 10:54 am -- 66 min before deadline -- I handed in my diploma thesis <sup>[1]</sup>. In a few weeks' time you may call me <i>Diplom-Informatiker</i>... <i>Herr Diplom-Informatiker.</i></p> <p>That's all.</p> <p><b>Footnotes:</b></p> <p><small>[1] Thesis title is <i>Diensteverwaltung in Ad-Hoc-Netzwerken</i> (which roughly translates to <i>Service Discovery in Ad-Hoc Networks</i>). Basically, the thesis is about "Mesh-DNS", a protocol akin to <a href="http://www.multicastdns.org/">Multicast DNS</a> (mDNS), which scales better, fixes a few things and takes Mesh network architectures into account. It is intended to be integrated into <a href="http://avahi.org/">Avahi</a> and to be used as service discovery protocol in OLPC. It is compatible with <a href="http://www.dns-sd.org/">DNS-SD</a>, but replaces mDNS. Due to that all existing software linking against Avahi can make use of it without any major changes. 
It adds a zone <tt>.mesh</tt> which is organized by Mesh-DNS side-by-side to the mDNS-maintained zone <tt>.local</tt>. You will be able to enable support for Mesh-DNS at Avahi compile time. Most likely most distros won't enable it in their default builds, although it offers quite a few features even outside OLPC, such as automatic, idiot-proof router transparency.</small></p> Lennart PoetteringWed, 11 Apr 2007 23:28:00 +0200tag:0pointer.net,2007-04-11:/blog/thesis.htmlmiscWhat's going on with LinuxTag?https://0pointer.net/blog/projects/linuxtag.html <p>Does anybody know what's going on with <a href="https://www.linuxtag.org/2007/">LinuxTag</a>? I submitted a presentation proposal a few months ago. I haven't yet received an email whether my talk has been accepted or not. According to <a href="http://www.linuxtag.org/2007/de/conf/cfp/vp-deadlines.html">their website</a> notification emails should have been sent out on March 13th. Which is nearly a month ago now. When I log in to the "virtual conference center" I see that my paper is still "In Review". They didn't respond to my emails (twice).</p> <p>Does anyone have an idea what is going on?</p> Lennart PoetteringTue, 10 Apr 2007 20:34:00 +0200tag:0pointer.net,2007-04-10:/blog/projects/linuxtag.htmlprojectsDear Lazyweb!https://0pointer.net/blog/projects/gnumeric-psfrag.html <p>Does anyone know what I can do to get <a href="http://www.ctan.org/tex-archive/help/Catalogue/entries/psfrag.html">psfrag</a> to work with PostScript files generated from <a href="http://www.gnome.org/projects/gnumeric/">Gnumeric</a> (i.e. 
Cairo) charts?</p> <p>Oh, and why does Gnumeric insist on using <tt>.ps</tt> as suffix for exported charts, although the files written are perfectly valid <tt>.eps</tt> files and presumably everyone uses them as such?</p> <p>Thanks!</p> Lennart PoetteringMon, 09 Apr 2007 13:22:00 +0200tag:0pointer.net,2007-04-09:/blog/projects/gnumeric-psfrag.htmlprojectsPiles of Paperhttps://0pointer.net/blog/pile-of-paper.html <p>As an experiment to test how much people are willing to pay for a big, big pile of old paper, I am selling <a href="http://cgi.ebay.de/ws/eBayISAPI.dll?ViewItem&amp;item=250097060215">my collection of old editions of the German computer magazine c't</a> on ebay.de. Do yourself something good and buy yourself a piece of (German) computer history. It's a unique chance because if no one wants it I am going to send it to recycling, or maybe make a big, big bonfire.</p> <p>I heard that you can attract girls by reading old German computer magazines from the late nineties. Not that this would have worked for me, but maybe it works for you? 
There is nothing more attractive to a girl than old computer magazines, especially if they are in a foreign language you don't understand.</p> <p>No, I am not kidding!</p> <p><a href="http://cgi.ebay.de/ws/eBayISAPI.dll?ViewItem&amp;item=250097060215"><img src="http://0pointer.de/public/ctcatchy.jpeg" width="600" height="374" alt="Big Pile of Paper" /></a></p> Lennart PoetteringTue, 27 Mar 2007 17:33:00 +0200tag:0pointer.net,2007-03-27:/blog/pile-of-paper.htmlmiscNo GSoC for Avahihttps://0pointer.net/blog/projects/gsoc-avahi.html <p>As it seems, the <a href="http://avahi.org/">Avahi project</a> has not been accepted as a <a href="http://code.google.com/soc/">Google Summer of Code</a> organization, <a href="http://blogs.gnome.org/view/uraeus/2007/03/15/0">much like the GStreamer project</a>.</p> <p>Grr, I cannot say I really understand why three wiki engines <sup>[1]</sup> got accepted, or a UI frontend for <tt>nmap</tt> - but not important infrastructure projects like GStreamer or Avahi. Mhmm, maybe I am just envious, and considering these two projects <i>important</i> is just hubris...</p> <p>Anyway, we had already prepared <a href="http://avahi.org/wiki/GoogleSummerOfCode">a list of exciting</a> <sup>[2]</sup> GSoC project ideas for Avahi. If anyone is interested to work on one of these there might be a small chance to get this done under <a href="http://live.gnome.org/SummerOfCode2007/Ideas">the GNOME umbrella</a>. Feel free to contact either me or Trent if you are interested!</p> <p><b>Footnotes:</b></p> <p><small>[1] If there is something we already have enough of in Free Software - then it is Wiki engines. Just check the output of <tt>apt-cache search wiki | wc -l</tt> on a recent Debian system.</small></p> <p><small>[2] In our definition of <i>exciting</i>, of course - which doesn't seem to be the same as Google's. 
Grrrh!</small></p> Lennart PoetteringThu, 15 Mar 2007 16:52:00 +0100tag:0pointer.net,2007-03-15:/blog/projects/gsoc-avahi.htmlprojectsSelling my Nokia 770https://0pointer.net/blog/ebay-770.html <p>I was one of the lucky ones to get a Nokia N800 developer discount code, and am now a proud owner of one of these toys. Thus I decided <a href="http://cgi.ebay.de/ws/eBayISAPI.dll?ViewItem&amp;item=250087003779">to sell off my old Nokia 770 at Ebay.de</a>. This is your one-time chance to buy a 770 previously owned by one of the Avahi and PulseAudio developers! Wooow! Don't miss this chance to add this exclusive device to your memorabilia collection!</p> Lennart PoetteringThu, 22 Feb 2007 19:32:00 +0100tag:0pointer.net,2007-02-22:/blog/ebay-770.htmlmiscFOMS/LCA Recaphttps://0pointer.net/blog/projects/foms-lca-recap.html <p>Finally, here's my <a href="http://lca2007.linux.org.au/">linux.conf.au 2007</a> and <a href="http://www.annodex.org/events/foms2007/">FOMS 2007</a> recap. Maybe a little bit late, but better late than never.</p> <p>FOMS was a very well organized conference with a packed schedule and a lot of high-profile attendees. To my surprise <a href="http://pulseaudio.org/">PulseAudio</a> has been accepted by the attendees without any opposition (at least none was expressed aloud). After a few "discussions" on a few mailing lists (including GNOME MLs) and some personal emails I got, I had thought that more people were opposed to the idea of having a userspace sound daemon for the desktop. Apparently, I was overly pessimistic. Good news, that!</p> <p>During the FOMS conference we discussed the problems audio on Linux currently has. One of the major issues still is that we're lacking a cross-platform PCM audio API everyone agrees on. ALSA is Linux-specific and complicated to use. The only real contender is PortAudio. However, PortAudio has its share of problems and hasn't reached wide adoption yet. 
Right now most larger software projects implement an audio abstraction layer of some kind, and mostly in a very dirty, simplistic and limited fashion. MPlayer does it, Xine does it, Flash does it. Everyone does it, and it sucks. (Note: this is only a very short overview why audio on Linux sucks right now. For a longer one, please have a look at the first 15 minutes of my PulseAudio talk at LCA, linked below.)</p> <p>Several people were asking why not to make the PulseAudio API the new "standard" PCM API for Linux. Due to several reasons that would be a bad idea. First of all, the PulseAudio API cannot be used on anything else but PulseAudio. While PulseAudio has been ported to Win32, Vista already has a userspace desktop sound server, hence running PulseAudio on top of that doesn't make much sense. Thus the API is not exactly cross-platform. Secondly, I - as the guy who designed it - am not happy with the current PulseAudio API. While it is very powerful it is also very difficult to use and easy to misuse, mostly due to its fully asynchronous nature. In addition it is also not exactly the smallest API around.</p> <p>So, what could be done about this? We agreed on a - maybe - controversial solution: defining yet another abstracted PCM audio API. Yes, fixing the problem that we have too many conflicting, competing sound systems by defining yet another API sounds like a paradox, but I do believe this is the right path to follow. Why? Because none of the currently available solutions is suitable for all application areas we have on Linux. Either the current APIs are not portable, or they are horribly difficult to use properly, or have a strange license, or are too simple in their functionality. MacOSX managed to establish a single audio API (CoreAudio) that makes almost everyone happy on that system - and we should be able to do the same for Linux. Secondly, none of the current APIs has been designed with network sound servers in mind. 
However, proper networking support reflects back into the API, and in a non-trivial way. An API which works fine in a networked environment needs to eliminate roundtrips where possible, be open for time interpolation and have flexible buffering (besides other minor things). Thirdly none of the current APIs offers enough functionality to properly support all the needs of modern desktop sound systems, such as per-stream volumes, stream names and notifications about external state changes.</p> <p>During FOMS and LCA, Mikko Leppanen (from Nokia), Jean-Marc Valin (from Xiph) and I sat down and designed a draft API for the functionality we would like to see in this API. For the time being we dubbed it <tt>libsydney</tt>, after the city where we started this project. I plan to make this the only supported audio API for PulseAudio, eventually. Thus, if you will code against PulseAudio you will get cross-platform support for free. In addition, because PulseAudio is now being integrated into the major distributions (at least Ubuntu and Fedora), this library will be made available on most systems through the backdoor.</p> <p>So, what will this new API offer? Firstly, the buffering model is much more powerful than that of any current sound API. The buffering model mostly follows PulseAudio's internal buffering model which (theoretically) can offer zero-latency streaming and has been pioneered by Jim Gettys' AF sound server. It allows you to seek around in the playback buffer very flexibly. This is very useful to allow very fast reaction to the user's playback control commands while still allowing large buffers, which are good to deal with high network lag. In addition it is very handy for the programmer, such as when implementing streaming clients where packets may arrive out-of-order. The API will emulate this buffering model on top of traditional audio devices, and when used on top of PulseAudio it will use its native implementation. 
The API will also clearly define which sound formats are guaranteed to be available, thus making it a lot easier to code without thinking of different hardware supporting different formats all the time. Of course, the API will be easier to use than PulseAudio's current API. It will be very portable, scaling from FPU-less architectures to pro-audio machines with a massive number of synchronised channels. There are several modes available to deal with XRUNs semi-automatically, one of them guaranteeing that the time axis stays linear and monotonic in all cases.</p> <p>The list of features of this new API is much longer, however, enough of these grand plans! We didn't write any real code for this yet. To make sure that this project is not another one of those which are announced grandiosely without ever producing any code I will stop listing features here now. We will eventually publish a first draft of our C API for public discussion. Stay tuned.</p> <p>Side-by-side with <tt>libsydney</tt> I discussed an abstract API for desktop event sounds with Mikko (i.e. those annoying "bing" sounds when you click a button and the like). Dubbed <tt>libcanberra</tt> (named after the city which one of the developers visited after Sydney), this will hopefully be for the PulseAudio sample cache API what <tt>libsydney</tt> is for the PulseAudio streaming API: a total replacement.</p> <p>As a by-product of the <tt>libsydney</tt> discussion Jean-Marc coded <a href="http://svn.xiph.org/trunk/speex/libspeex/resample.h">a fast C resampling library</a> supporting both floating point and fixed point and being licensed under BSD. (In contrast to <tt>libsamplerate</tt> which is GPL and floating-point-only, but which probably has better quality). PulseAudio will make use of this new library, as will <tt>libsydney</tt>. 
And I sincerely hope that ALSA, GStreamer and other projects replace their crappy home-grown resamplers with this one!</p> <p>For PulseAudio I was looking for a CODEC which we could use to encode audio if we have to transfer it over the network. Such a CODEC would need to have low CPU requirements and allow low-latency operation, while providing hifi audio. Compression ratio is not such a high requirement. Unfortunately, as it seems no such CODEC exists, especially not a "Free" one. However, the Xiph people recommended to hack up a special version of FLAC for this task. FLAC is fast, has (obviously) good quality and if hacked up could provide low-latency encoding. However, FLAC doesn't compress that well. Current PulseAudio thin-client installations require 170kB network bandwidth for each client if hifi audio is used. Encoding this in FLAC could cut this in half. Not perfect, but better than nothing.</p> <p>So, that was FOMS! FOMS is definitely a highly recommended conference. If you have the chance to attend next year, don't miss it! I've never been to a more productive, packed conference in my life!</p> <p>At LCA I met fellow Avahi coder Trent Lloyd for the first time. Our talk about Avahi went very well. During my flights to and back from <tt>.au</tt> I hacked up <a href="http://0pointer.de/blog/projects/avahify-your-app.html">avahi-ui</a> which I also announced during that talk. Also, in related news, <tt>tedp</tt> started to work on an implementation of <a href="http://files.dns-sd.org/draft-cheshire-nat-pmp.txt">NAT-PMP</a> (aka "reverse firewall piercing"; both client and server) for inclusion in Avahi. This will hopefully make the upcoming Wide-Area DNS support in Avahi much more useful.</p> <p><tt>linux.conf.au</tt> was a very exciting conference. As a speaker you're treated like a rock star, with stuff like the speakers dinner, the speakers adventure (climbing on top of Sydney's AMP tower) and the penguin dinner. 
Heck, the organizers even picked me up at the airport, something I really didn't expect when I landed in Sydney, which however is quite nice after a 27h flight.</p> <p>Two talks I particularly enjoyed at LCA:</p> <ul> <li><a href="http://lca2007.linux.org.au/talk/154">nouveau - reverse engineered nvidia drivers</a> (<a href="http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/154.ogg">Ogg Theora</a>)</li> <li><a href="http://lca2007.linux.org.au/talk/221">burning cpu and battery on the gnome desktop</a> (<a href="http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/221.ogg">Ogg Theora</a>)</li> </ul> <p>And just for the sake of completeness, here are the links to my presentations:</p> <ul> <li><a href="http://lca2007.linux.org.au/talk/211">The PulseAudio Sound Server</a> (<a href="http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/211.ogg">Ogg Theora</a>; <a href="http://0pointer.de/public/pulseaudio-presentation-lca2007.pdf">Slides</a>)</li> <li>Using Avahi the "Right Way" (<a href="http://mirror.linux.org.au/linux.conf.au/2007/video/monday/monday_1150_GNOME.ogg">Ogg Theora</a>; <a href="http://0pointer.de/public/avahi-presentation-lca2007.pdf">Slides</a>)</li> </ul> <p>Ok, that's it for now. Thanks go to Silvia Pfeiffer, the rest of the FOMS team and the Seven Team for organizing these two amazing conferences!</p> Lennart PoetteringThu, 08 Feb 2007 21:51:00 +0100tag:0pointer.net,2007-02-08:/blog/projects/foms-lca-recap.htmlprojectsAvahify Your Application!https://0pointer.net/blog/projects/avahify-your-app.html <p>It has never been easier to add Zeroconf service discovery support to your GTK application!</p> <p>The upcoming <a href="http://avahi.org/">Avahi 0.6.18</a> will ship with a new library <tt>libavahi-ui</tt> which contains a GTK UI dialog <tt>AuiServiceDialog</tt>, a simple and easy-to-use dialog for selecting Zeroconf services, similar in style to <tt>GtkFileChooserDialog</tt> and friends. 
This dialog should be used whenever there is an IP server to enter in a GTK GUI. For example:</p> <ul> <li>Mail applications such as Evolution may use it to browse for POP3, POP3S, IMAP, IMAPS and SMTP servers.</li> <li>VNC applications may use it to browse for VNC/RFB servers</li> <li>Database clients such as Glom may use it to browse for PostgreSQL servers</li> <li>FTP clients may use it to browse for FTP servers</li> <li>RSS readers may use it to browse for local RSS feeds</li> </ul> <p>So, what does it look like? Here's a screenshot of a service dialog browsing for FTP, SFTP and WebDAV shares simultaneously:</p> <p><img src="http://0pointer.de/public/service-dialog.png" width="484" height="441" alt="Service Dialog" /></p> <p>The dialog properly supports browsing in remote domains, browsing for multiple service types at the same time (e.g. POP3 and POP3S) and supports multi-homed services. It will also resolve the services if requested. Avahi will ship a (very useful!) example tool <a href="http://avahi.org/browser/trunk/avahi-ui/zssh.c"><tt>zssh.c</tt></a> which if started from the command line allows you to quickly browse for local SSH servers and connect to one of those available. 
(<a href="http://0pointer.de/public/zssh-screencast.ogm">Short Theora screencast of <tt>zssh</tt></a> - Please excuse the strange cursor, seems to be a bug in Istanbul 0.2.1, which BTW is totally broken on multi-headed setups):</p> <p>A simple application making use of this dialog might look like this:</p> <pre>
#include &lt;gtk/gtk.h&gt;
#include &lt;avahi-ui/avahi-ui.h&gt;

int main(int argc, char*argv[]) {
    GtkWidget *d;

    gtk_init(&amp;argc, &amp;argv);

    d = aui_service_dialog_new("Choose Web Service");
    aui_service_dialog_set_browse_service_types(AUI_SERVICE_DIALOG(d), "_http._tcp", "_https._tcp", NULL);

    if (gtk_dialog_run(GTK_DIALOG(d)) == GTK_RESPONSE_OK)
        g_message("Selected service name: %s; service type: %s; host name: %s; port: %u",
                  aui_service_dialog_get_service_name(AUI_SERVICE_DIALOG(d)),
                  aui_service_dialog_get_service_type(AUI_SERVICE_DIALOG(d)),
                  aui_service_dialog_get_host_name(AUI_SERVICE_DIALOG(d)),
                  aui_service_dialog_get_port(AUI_SERVICE_DIALOG(d)));
    else
        g_message("Canceled.");

    gtk_widget_destroy(d);

    return 0;
}
</pre> <p>A more elaborate example is <a href="http://avahi.org/browser/trunk/avahi-ui/zssh.c"><tt>zssh.c</tt></a>. You may browse <a href="http://avahi.org/browser/trunk/avahi-ui/avahi-ui.h">the full API online</a>.</p> <p><tt>AuiServiceDialog</tt> is not perfect yet. It still lacks i18n and a11y support. In addition it follows the HIG only very roughly. Patches welcome! I am also very interested in feedback from more experienced GTK programmers, since my experience with implementing GTK controls is rather limited. This is my first GTK library which should really feel like a GTK API. So please, read through <a href="http://avahi.org/browser/trunk/avahi-ui/avahi-ui.h">the API</a> and <a href="http://avahi.org/browser/trunk/avahi-ui/avahi-ui.c">the implementation</a> and send me your comments! 
Thank you!</p> <p>If you want to integrate <tt>AuiServiceDialog</tt> into your application and don't want to wait for Avahi 0.6.18, just copy <a href="http://avahi.org/browser/trunk/avahi-ui/avahi-ui.h?format=txt"><tt>avahi-ui.h</tt></a> and <a href="http://avahi.org/browser/trunk/avahi-ui/avahi-ui.c?format=txt"><tt>avahi-ui.c</tt></a> into your sources and make sure to add <tt>avahi-client</tt>, <tt>avahi-glib</tt>, <tt>gtk+-2.0</tt> to your <tt>pkg-config</tt> dependencies.</p> Lennart PoetteringWed, 07 Feb 2007 13:56:00 +0100tag:0pointer.net,2007-02-07:/blog/projects/avahify-your-app.htmlprojectsIQ Light Maniahttps://0pointer.net/blog/iq-light-mania.html <p><a href="http://0pointer.de/blog/iq-in-the-movies.html">As promised</a> <a href="http://0pointer.de/photos/?gallery=Mexican%20IQ">here's a gallery</a> of better quality photos of a mobile made from Mexican-style IQ lights.</p> <p><a href="http://0pointer.de/photos/?gallery=Mexican%20IQ"><img src="http://0pointer.de/photos/galleries/Mexican%20IQ/lq/img-7.jpg" width="640" height="427" alt="IQ Light Mobile" /></a></p> <p>All these lights have been fabricated using <a href="http://0pointer.de/blog/iqlamp-stencil.html">this stencil</a> and <a href="http://0pointer.de/blog/iq-light-final.html">this material</a>.</p> <p>I hope this gallery shows a little of how fascinating these lamps are and explains why I am so obsessed with them that I cannot stop blogging about them.</p> Lennart PoetteringSun, 04 Feb 2007 17:55:00 +0100tag:0pointer.net,2007-02-04:/blog/iq-light-mania.htmlmiscIQ in the Movieshttps://0pointer.net/blog/iq-in-the-movies.html <p>The (original) <a href="http://0pointer.de/blog/iq-light-final.html">IQ Light</a> is featured in the stylish and funny Hollywood movie <a href="http://imdb.com/title/tt0425210/">Lucky Number Slevin</a>:</p> <p><a href="http://imdb.com/title/tt0425210/"><img src="http://0pointer.de/public/lns2.jpeg" width="608" height="256" alt="Lucky Number Slevin Still" /></a></p> <p>Related 
to this, don't miss <a href="http://point-at-infinity.org/iq/">this small but beautiful gallery of a mobile</a> built entirely from (Mexican-style) IQ lights of various sizes. I hope to post better quality pictures of the same mobile shortly:</p> <p><a href="http://point-at-infinity.org/iq/"><img src="http://0pointer.de/public/iq-gal.jpeg" width="570" height="280" alt="IQ Gallery" /></a></p> <p>Oh, and I am finally back in <tt>.de</tt> after my trip to <tt>.au</tt> and <a href="http://lca2007.linux.org.au/">linux.conf.au 2007</a>/<a href="http://www.annodex.org/events/foms2007/">FOMS 2007</a>. I hope to post a recap of the conferences and their outcome for <a href="http://pulseaudio.org/">PulseAudio</a> and <a href="http://avahi.org/">Avahi</a> shortly.</p> <p>Thanks to the impressive work of Silvia Pfeiffer and the LCA video team there's now a video of my PulseAudio presentation at LCA available online. (<a href="http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/211.ogg">Ogg Theora</a>, <a href="http://lca2007.linux.org.au/talk/211">Java Cortado</a>). Don't miss it!</p> Lennart PoetteringFri, 02 Feb 2007 16:39:00 +0100tag:0pointer.net,2007-02-02:/blog/iq-in-the-movies.htmlmiscGood Morning, Freedom Lovers!https://0pointer.net/blog/projects/freedom-lovers.html <p>By popular request, the slides of my PulseAudio talk at <a href="http://lca2007.linux.org.au">linux.conf.au 2007</a> are now <a href="http://0pointer.de/public/pulseaudio-presentation-lca2007.pdf">available for download</a>. And <a href="http://0pointer.de/public/avahi-presentation-lca2007.pdf">here are the slides of the Avahi talk</a> Trent and I did at GNOME.conf.au 2007. Videos will hopefully be available shortly from the LCA web site.</p> <p><i>... Horses? 
Did anyone say "Horses"?</i></p> Lennart PoetteringFri, 19 Jan 2007 01:01:00 +0100tag:0pointer.net,2007-01-19:/blog/projects/freedom-lovers.htmlprojectsFOMS 2007/Linux.conf.au 2007https://0pointer.net/blog/projects/foms-lca-2007.html <p>On Wed, January 17th, I will be speaking at <a href="http://lca2007.linux.org.au/talk/211">linux.conf.au 2007</a> about the <a href="http://pulseaudio.org/">PulseAudio sound server</a>. Before that, on Mon January 15th, I will do a presentation about <a href="http://avahi.org/">Avahi</a>, together with Trent Lloyd, at GNOME.conf.au 2007. And even before that, I will attend <a href="http://www.annodex.org/events/foms2007/Main/SubjectEntries">FOMS 2007</a>, and probably say a word or two about <a href="http://pulseaudio.org/">PulseAudio</a>, again.</p> <p>Can't wait for those 25h+ of flying from <tt>.de</tt> to <tt>.au</tt>!</p> Lennart PoetteringSat, 06 Jan 2007 19:48:00 +0100tag:0pointer.net,2007-01-06:/blog/projects/foms-lca-2007.htmlprojectsOne last followuphttps://0pointer.net/blog/iq-light-final.html <p>A small, final followup on the <a href="http://0pointer.de/blog/mexico-lamp.html">blog stories</a> <a href="http://0pointer.de/blog/chasing-light.html">about the mexican style</a> <a href="http://0pointer.de/blog/iqlamp-stencil.html">IQ Light</a>:</p> <p>After some unsuccessful experimenting with materials like Polystyrene (cracks too easily), I settled on 0.3mm white Polypropylene which is both easy to work with and easy to find. The light becomes a little bit blue-greyish cold. I bought <a href="http://www.google.com/search?q=ibico%20polyopaque">Ibico PolyOpaque report covers</a> for this purpose, which you can get at German Staples stores. You can get it in 25, 50 or 100 DIN-A4 packs. Because only two full-size pieces can be cut from a single A4 sheet and you need 30 pieces you need at least 15 sheets for a single full-size lamp. I built 10 lamps in various sizes from this material and it seems to work pretty well. 
</p> <p>Have fun!</p> Lennart PoetteringFri, 29 Dec 2006 20:35:00 +0100tag:0pointer.net,2006-12-29:/blog/iq-light-final.htmlmiscDas Leben der Anderenhttps://0pointer.net/blog/leben-der-anderen.html <p>German movies are usually not my thing - I don't like the topics, I don't like the scripting, I don't like the acting, I don't like the actors, I don't like the drama and I don't like the humor. (Ok, they usually lack humor entirely, so there's not much not to like about the humor.)</p> <p>However, there's now a notable exception: <a href="http://imdb.com/title/tt0405094/">Das Leben der Anderen</a> (<i>The Lives of Others</i>) is a very good film, one that I really like. It's an absorbing drama, the scripting is good and the acting is fine. There's a good reason that it has won the European Movie Award (Best Film) and is one of the top contenders for next year's Oscar (at least the foreign language one).</p> <p>If you get the chance to see this movie, do it! It's worth it.</p> Lennart PoetteringThu, 14 Dec 2006 02:52:00 +0100tag:0pointer.net,2006-12-14:/blog/leben-der-anderen.htmlmiscSan Franciscohttps://0pointer.net/blog/photos/san-fran.html <p>As a followup to my <a href="http://0pointer.de/static/windows">Windows of Barcelona</a> series I prepared <a href="http://0pointer.de/static/windows-sf">Windows of San Francisco</a>:</p> <p><a href="http://0pointer.de/static/windows-sf"><img src="http://0pointer.de/static/windows-sf-small.jpeg" width="225" height="222" alt="Windows of San Francisco" /></a></p> <p>A few other series:</p> <ul> <li><a href="http://0pointer.de/static/adjazenz">Adjazenz Nummero 47</a>, aka <i>Triptych to the City of San Francisco</i> (shown below)</li> <li><a href="http://0pointer.de/static/kongruenz">Kongruenz Nummero 22</a></li> <li><a href="http://0pointer.de/static/korrespondenz">Korrespondenz Nummero 105</a></li> <li><a href="http://0pointer.de/static/resonanz">Resonanz Nummero 62</a></li> </ul> <p><a 
href="http://0pointer.de/static/adjazenz"><img src="http://0pointer.de/static/adjazenz-small.jpeg" width="400" height="190" alt="Adjazenz!" /></a></p> <p>No, the German names and numbers of the series don't have any special meaning, their sole purpose is to sound "artsy", in the spirit of the famous work "Fluktuation 8" by a certain polish action artist.</p> <p>The remaining photos I made during my visit in San Francisco after the Ubuntu Developers' Summit in Mountain View in November <a href="http://0pointer.de/photos/?gallery=San%20Francisco">are now online</a>, as well.</p> Lennart PoetteringWed, 13 Dec 2006 19:25:00 +0100tag:0pointer.net,2006-12-13:/blog/photos/san-fran.htmlphotosUnique Eyebrowshttps://0pointer.net/blog/photos/unique-eyebrows.html <p>Dear American People,</p> <p>I guess you'll find businesses selling <i>unique eyebrow designs</i> only in god's own country:</p> <p><a href="http://0pointer.de/static/unique-eyebrows"><img src="http://0pointer.de/static/unique-eyebrows.jpeg" width="467" height="700" alt="Unique Eyebrows" /></a></p> <p>And what does "unique" mean? Do their customers get two different designs for their two eyebrows? - What a bargain!</p> <p>Groucho Marx' greasepaint eyebrows are unique, in a way. Maybe that's what they are selling?</p> <p>Confused,<br /> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Lennart (a worried European)</p> Lennart PoetteringWed, 13 Dec 2006 01:03:00 +0100tag:0pointer.net,2006-12-13:/blog/photos/unique-eyebrows.htmlphotosInterlocking Quadrilateralshttps://0pointer.net/blog/iqlamp-stencil.html <p><a href="http://0pointer.de/blog/chasing-light.html">As promised</a>, here's a stencil drawing of the Mexican-style IQ Lamp: <a href="http://0pointer.de/public/iqlamp.ps">.ps</a>, <a href="http://0pointer.de/public/iqlamp.svg">.svg</a>, <a href="http://0pointer.de/public/iqlamp.pdf">.pdf</a>. 
(1:1, DIN A4/ISO 216 paper size)</p> <p><img src="http://0pointer.de/public/iqlamp.png" width="150" height="119" alt="Fake IQ Light from Mexico - Stencil" /></p> <p>30 of these are needed to assemble one Mexican-style lamp, as depicted below. The material to cut these patterns from needs to be a thin (less than .5 mm thick) plastic (or maybe cardboard) which needs to be flexible - but not too flexible, and not glossy. It might be advisable to use energy-saving light bulbs for this lamp. They are entirely hidden inside the lamp and might be good to avoid overheating of the plastic. <a href="http://www.sadiethepilot.com/iqweb/iqhowto.htm">Assembly instructions</a>, <a href="http://www.bald-bang.com/IQlight/IQ%20video.html">Video</a>, <a href="http://www.instructables.com/id/E9TA9AH137ET2JYI75/">Instructable</a>. Please note that assembling the Mexican-style IQ light takes quite a bit of manual force because all pieces are bent a little, in contrast to the original Danish design which appears to be assembled without any force (at least the video clip suggests that). For mounting a cable/lamp socket you might need to cut a small hole in one of the plastic sheets, to put the cable through.</p> <p>Once again the photo:</p> <p><img src="http://0pointer.de/public/iq-lamp-mexico.jpeg" width="369" height="359" alt="Fake IQ Light from Mexico" /></p> <p>Have fun!</p> Lennart PoetteringMon, 04 Dec 2006 23:26:00 +0100tag:0pointer.net,2006-12-04:/blog/iqlamp-stencil.htmlmiscEs ist vollbracht!https://0pointer.net/blog/projects/2.6.19.html <p>Yes, finally <a href="http://www.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0215ffb08ce99e2bb59eca114a99499a4d06e704">Linux 2.6.19</a> has been released. So you wonder why this is something to blog about? 
-- Because it is the first Linux version that contains my super-cool <a href="http://www.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;h=fdb7153f4426c42b35be5e1206424c984f4de5ea;hb=0215ffb08ce99e2bb59eca114a99499a4d06e704;f=drivers/misc/msi-laptop.c">MSI Laptop driver</a>, one of the most impressive attainments of mankind, only excelled perhaps by <a href="http://kryptochef.net/index2e.htm">KRYPTOCHEF</a>, the only tool in existence which does <i>fullbit</i> encryption.</p> Lennart PoetteringThu, 30 Nov 2006 02:22:00 +0100tag:0pointer.net,2006-11-30:/blog/projects/2.6.19.htmlprojectsChasing A Lighthttps://0pointer.net/blog/chasing-light.html <p>Last Friday I posted a <a href="http://0pointer.de/blog/mexico-lamp.html">little Lazyweb experiment</a>, a hunt for information about a certain kind of lamp sold by a street dealer in Mexico City. A quick followup on the results:</p> <p>Surprisingly many people responded, mostly by email, and partly by <a href="http://0pointer.de/blog/mexico-lamp.html#1164411299.8">blog comment</a>. As it appears I am not the only one who's looking for this specific type of lamp. Furthermore, a non-trivial set of Planet Gnome readers actually already owns one of these devices. Apparently counterfeit versions of this lamp are sold all around the world by street dealers and on markets.</p> <p>The lamp seems to be a modified version of the "IQ Light", a <i>self assembly lighting system made up of interlocking quadrilaterals</i>. It is a Scandinavian design, by Holger Str&oslash;m, 1973. It is nowadays exclusively distributed by <a href="http://www.bald-bang.com/">Bald &amp; Bang</a>, Denmark. The lighting system has a very interesting <a href="http://www.iqlight.com/">web site of its own</a>, which even includes an <a href="http://www.sadiethepilot.com/iqweb/iqhowto.htm">HOWTO</a> for assembling these lamps. 
The <a href="http://www.bald-bang.com/IQlight/IQ%20video.html">Bald &amp; Bang web site</a> has a very stylish video which also shows how to assemble an IQ lamp.</p> <p><img src="http://0pointer.de/public/iq-lamp-mexico.jpeg" width="369" height="359" alt="Fake IQ Light from Mexico" /></p> <p>While my Mexican specimen and the official design are very similar, they differ: the Mexican design looks - in a way - "tighter" and ... better (at least in my humble opinion). For comparison, please have a look at the photo I took of the Mexican version, shown above, and at the many photos returned by <a href="http://images.google.com/images?hl=en&amp;q=iq%20light&amp;btnG=Google+Search&amp;ie=UTF-8&amp;oe=UTF-8&amp;sa=N&amp;tab=wi">Google Images</a>, or the one from the <a href="http://www.sadiethepilot.com/iqweb/iqsorts.htm">IQ Light homepage</a>. It appears as if the basic geometrical form used by the Mexican design is somehow narrower than the official Danish one.</p> <p>So, where can one buy one of those lamps? Fake and real ones are sold <a href="http://search.ebay.de/iq-lamp_W0QQcatrefZC6QQcoactionZcompareQQcoentrypageZsearchQQcopagenumZ1QQfposZ22397QQfromZR10QQfsooZ1QQfsopZ1QQftrtZ1QQftrvZ1QQga10244Z10425QQsacatZQ2d1QQsadisZ200QQsargnZQ2d1QQsaslcZ3QQsbrftogZ1QQsofocusZbs">on eBay</a>, <a href="http://search.ebay.com/iq-lamp_W0QQfkrZ1QQfromZR8QQsubmitsearchZSearch">every now and then</a>. The <a href="http://www.momastore.org/museum/moma/ProductDisplay_IQ%20Light%20Shade_10451_10001_16912_-1_11461_11463_null__">Museum Store of the New York MoMA</a> sells the original version for super-cheap $160. If you search with Google you'll find many more offers like this one, but none of them is exactly cheap - for a bunch of thin plastic sheets. 
All these shops sell the Danish version of the design; no one was able to point me to a shop where the modified, "Mexican" version is sold.</p> <p>Given the hefty price tag and the fact that the fake, Mexican version looks better than the original one, I will now build my own lamps, based on the Mexican design. For that I will disassemble my specimen (at least partially) and create a paper stencil of the basic plastic pattern. I hope to put this up for download as a <tt>.ps</tt> file some time next week, since many people asked for instructions for building these lamps. Presumably the original design is protected by copyright, hence I will not publish a step-by-step guide how to build your own fake version. But thankfully this is not even necessary, since the vendor already published a HOWTO and a video for this, online.</p> <p>Thank you very much for your numerous responses!</p> Lennart PoetteringMon, 27 Nov 2006 21:37:00 +0100tag:0pointer.net,2006-11-27:/blog/chasing-light.htmlmiscUbuntu vs. Free Softwarehttps://0pointer.net/blog/projects/ubuntu-vs-free-software.html <p>Everybody should read <a href="http://kennke.org/blog/?p=31">Roman Kennke's take on Mark Shuttleworth's OpenSUSE spam mail</a>. 
It's constructive and sensible.</p> <p>I hope the Ubuntu people find the strength to resist the short-term bliss of desktop bling for long-term software freedom!</p> <p>Please learn the lesson Java teaches us: resist the temptation of closed source software and develop alternatives as free software!</p> Lennart PoetteringMon, 27 Nov 2006 20:39:00 +0100tag:0pointer.net,2006-11-27:/blog/projects/ubuntu-vs-free-software.htmlprojectsDear Lazyweb!https://0pointer.net/blog/mexico-lamp.html <p>Let's see how well Lazyweb works for me!</p> <p>One of the nicest types of lamps I know is depicted on this photo:</p> <p><img src="http://0pointer.de/photos/galleries/Various/lq/img-2.jpg" width="640" height="427" alt="mexico lamp" /></p> <p>This lamp is built from a number (16 or so, it's so difficult to count) of identical shapes which are put together (a mano) in a very simple, mathematical fashion. No glue or anything else is needed to make it a very robust object. The lamp looks a little bit like certain Julia fractals, its geometrical structure is just beautiful. Every mathematical mind will enjoy it.</p> <p>This particular specimen has been bought from a street dealer in Mexico City, and has been made of thin plastic sheets. I saw the same model made from paper on a market near Barcelona this summer (during GUADEC). Unfortunately I didn't seize the chance to buy any back then, and now I am regretting it!</p> <p>I've been trying to find this model in German and US shops for the last few months (Christmas is approaching fast!) but couldn't find a single specimen. I wonder who designed this ingenious lamp and who produces it. It looks like a Scandinavian design to me, but that's just an uneducated guess.</p> <p>If you have any information about this specific lamp model, or could even provide me with a pointer where to buy or how to order these lamps in/from Germany, please leave a comment on this blog story, or write me an email to <tt>mzynzcr (at) 0pointer (dot) de</tt>! 
Thank you very much!</p> Lennart PoetteringSat, 25 Nov 2006 00:15:00 +0100tag:0pointer.net,2006-11-25:/blog/mexico-lamp.htmlmiscuds-mtv -> San Franciscohttps://0pointer.net/blog/projects/uds-mtv.html <p>Is anyone who's attending the Ubuntu Developers Summit in Mountain View right now heading for San Francisco tomorrow? I plan to stay a few days in the city to do sight seeing and stuff. Please catch me at the conference today if you are interested to join me visiting San Francisco!</p> Lennart PoetteringFri, 10 Nov 2006 19:01:00 +0100tag:0pointer.net,2006-11-10:/blog/projects/uds-mtv.htmlprojectsCui Bono?https://0pointer.net/blog/projects/cui-bono.html <p>So, you thought that only Linux users (and other alternative OS zealots) would benefit from <a href="http://0pointer.de/blog/projects/s270ctrl">reverse engineered Windows drivers</a>? Ha! Far from the truth, it's the <a href="http://www.msi-forum.de/thread.php?threadid=24813&amp;gthreadview=0&amp;hilight=&amp;hilightuser=0&amp;page=1#post199381">Windows users themselves</a> who are benefitting. (Sorry, that link is in German)</p> <p>Too bad that this specific Windows port actually infringes my copyrights since it links my GPL'ed code against the non-free <a href="http://www.logix4u.net/inpout32.htm">inpout32.dll</a>. And the guy who did that port doesn't even think it's necessary to put his email address anywhere.</p> Lennart PoetteringWed, 25 Oct 2006 18:12:00 +0200tag:0pointer.net,2006-10-25:/blog/projects/cui-bono.htmlprojectsMSI Laptop Owners!https://0pointer.net/blog/projects/megawiki.html <p>MSI Laptop Owners! Join us and extend the <a href="http://megawiki.org/">MegaWiki</a>, the new Wiki for all kinds of information on Linux on MSI MegaBooks! (and all MSI built laptops sold under other brands)</p> <p>The MegaWiki is still rather empty but we hope that it will soon grow as large as our inspiration, the <a href="http://thinkwiki.org/">ThinkWiki</a> which collects information about IBM ThinkPads. 
For that we need your help!</p> <p>This site will be the new home of the MSI laptop drivers (backlight control, rfkill) and provide modified ACPI DSDTs to fix a few BIOS errors. And more!</p> Lennart PoetteringWed, 25 Oct 2006 13:58:00 +0200tag:0pointer.net,2006-10-25:/blog/projects/megawiki.htmlprojectsConferences: UDS, FOMS and LCAhttps://0pointer.net/blog/projects/conferences.html <p>To my surprise I have been invited to the Ubuntu Developers Summit in Mountain View early next month (as a "ROCKSTAR", to quote Mark), to promote <a href="http://pulseaudio.org/">PulseAudio</a>. And that although I am not an Ubuntu developer, nor even much of an Ubuntu user. I'll be available for discussing everything Multimedia/<a href="http://pulseaudio.org/">PulseAudio</a> related. While I've not been invited because of my involvement in Avahi/Zeroconf I will, of course, also be available for discussion of these topics. As it appears, Canonical is <a href="http://0pointer.de/blog/projects/launchpad-stole-my-name.html">not resentful</a>, or maybe it's just their way to bribe me into registering with Launchpad? ;-)</p> <p>After UDS I plan to stay a few more days in San Francisco to visit the city. Can anyone point me to cheap accommodation in SF, or perhaps even lives in SF and has a room where I could sleep?</p> <p>In addition my PulseAudio presentation has been accepted at <a href="http://lca2007.linux.org.au/">linux.conf.au 2007</a>. At <a href="http://live.gnome.org/Sydney2007">GNOME.conf.au</a> I hope to give another presentation, together with Trent Lloyd, about <a href="http://avahi.org/">Avahi</a>, everyone's favourite Zeroconf implementation. And finally I plan to give yet another presentation, again about PulseAudio, at <a href="http://www.annodex.org/events/foms2007">FOMS 2007</a>, the <i>Foundations of Open Media Software</i> conference, which happens shortly before linux.conf.au, also in Sydney. 
<b>FOMS is still looking for more people to speak at the conference, so, please go to <a href="http://www.annodex.org/events/foms2007/Main/CFP">their CFP page</a> and send in your proposal if you have something to talk about!</b></p> Lennart PoetteringTue, 24 Oct 2006 18:43:00 +0200tag:0pointer.net,2006-10-24:/blog/projects/conferences.htmlprojectsOne fring to rule them all...https://0pointer.net/blog/projects/fring2.html <p><a href="http://0pointer.de/blog/projects/fring.html">A while ago</a> I played around with Cairo and created a Python tool <tt>fring</tt>, similar to KDE's <a href="http://www.methylblue.com/filelight/">Filelight</a>, however not interactive and very simple. Fr&#233;d&#233;ric Back took my code and gave it a little GUI love, and this is the result:</p> <p><a href="http://0pointer.de/public/fring-large.png"><img src="http://0pointer.de/public/fring-small.png" width="300" height="234" alt="fring screenshot" /></a></p> <p>Fr&#233;d&#233;ric added a nice interactive GTK GUI and a fully asynchronous directory walker based on Gnome-VFS which runs in a background thread and thus doesn't block the UI. This makes the user interface snappier than Filelight's ever was. It's a lot of fun to navigate your directories like this!</p> <p>I would have liked to post a screencast of the new <tt>fring</tt> in action here, to show how snappy it is. But unfortunately both <a href="http://people.freedesktop.org/~company/byzanz/">Byzanz</a> and <a href="http://live.gnome.org/Istanbul">Istanbul</a> failed horribly on my 16bpp display.</p> <p>The current version of <tt>fring</tt> is not yet polished for a public release. In the meantime, you can get the sources from the SVN:</p> <pre>svn checkout svn://svn.0pointer.de/fring/trunk fring</pre> <p>Yes, I am aware that a future version of Baobab will offer a similar view of the filesystem. 
However, it just was so much fun to hack on <tt>fring</tt>, and due to the power of Python it was so easy and quick to develop this tool, that we just couldn't resist doing it.</p> Lennart PoetteringThu, 19 Oct 2006 00:43:00 +0200tag:0pointer.net,2006-10-19:/blog/projects/fring2.htmlprojectsUpdateshttps://0pointer.net/blog/projects/stuff.html <p>Various, unrelated news:</p> <p>Thanks to <a href="http://www.der-marv.de/">Marvin Stark</a> my project <a href="http://packages.debian.org/unstable/utils/syrep">syrep</a> is now available in Debian. As you might know all the cool kids have written their own distributed revision control systems. This is my contribution to this topic. Although I started to work on it four years ago <a href="http://0pointer.de/lennart/projects/syrep/">syrep</a> is still unrivaled and unbeaten in its specific feature set. (Which is admittedly very different from the feature set of most other software in this area.)</p> <p>Thanks to CJ van den Berg and Sjoerd Simons (and a few others from <tt>#pulseaudio</tt>) <a href="http://pulseaudio.org/">PulseAudio</a> is now <a href="http://packages.debian.org/unstable/sound/pulseaudio">available in Debian</a>, although the auxiliary GUI tools like <a href="http://0pointer.de/lennart/projects/pavucontrol/">pavucontrol</a> still seem to be missing. Nonetheless: it's now easier than ever to try PulseAudio:</p> <pre>
sudo aptitude install pulseaudio \
    pulseaudio-module-hal \
    pulseaudio-esound-compat \
    pulseaudio-utils \
    libgstreamer-plugins-pulse0.10-0 \
    pulseaudio-module-gconf \
    pulseaudio-module-x11 \
    pulseaudio-module-zeroconf
</pre> <p>For the next months I will focus on my <i>Diplomarbeit</i> (German equivalent of a master's thesis). Due to this I passed maintainership of <a href="http://avahi.org/">Avahi</a> to <a href="http://lathiat.livejournal.com/">Trent Lloyd</a> and of <a href="http://pulseaudio.org/">PulseAudio</a> to <a href="http://drzeus.cx/">Pierre Ossman</a>. 
I hope to resume maintainership of both projects in January.</p> <p>My first <a href="http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8c4c731a89ea6458001f48033f8988447736fb38">non-trivial kernel patch</a> has been merged into Linus' kernel, although the 2.6.19 merge window was already closed. I take this as a birthday present from Linus.</p> <p>If you have a laptop (such as the <a href="http://0pointer.de/lennart/tchibo.html">MSI S270</a>) with a Ricoh SD/MMC interface (not one of the new controllers which are SDHCI compatible, but the old ones where the SD/MMC is a virtual PCMCIA slot identifying itself as <tt>Bay1Controller</tt>), then please support me in writing a Linux driver for it and request the necessary documentation and datasheets from Ricoh. For more information on this issue see <a href="https://tango.0pointer.de/pipermail/s270-linux/2006-October/thread.html">this posting on the s270-linux mailing list</a>, and <a href="https://tango.0pointer.de/pipermail/s270-linux/2006-October/000045.html">this followup</a>.</p> <p>That's all for now.</p> Lennart PoetteringWed, 18 Oct 2006 19:46:00 +0200tag:0pointer.net,2006-10-18:/blog/projects/stuff.htmlprojectsavahi-autoipd Released and 'State of the Lemur'https://0pointer.net/blog/projects/avahi-0.6.14.html <p>A few minutes ago I released <a href="http://avahi.org/">Avahi</a> 0.6.14 which besides other, minor fixes and cleanups includes a new component <a href="http://avahi.org/download/avahi-autoipd.8.xml"><tt>avahi-autoipd</tt></a>. This new daemon is an implementation of <a href="http://files.zeroconf.org/rfc3927.txt">IPv4LL</a> (aka RFC3927, aka APIPA), a method for acquiring link-local IP addresses (those from the range 169.254/16) without a central server, such as DHCP.</p> <p>Yes, there are already plenty of Free implementations of this protocol available. However, this one tries to do it right and integrates well with the rest of Avahi. 
For a longer rationale for adding this tool to our distribution instead of relying on external tools, please read <a href="http://lists.freedesktop.org/archives/avahi/2006-September/000863.html">this mailing list thread</a>.</p> <p>It is my hope that this tool is quickly adopted by the popular distributions, which will allow Linux to finally catch up with technology that has been available in Windows systems since Win98 times. If you're a distributor please follow <a href="http://avahi.org/wiki/AvahiAutoipd">these notes</a> which describe how best to integrate this new tool into your distribution.</p> <p>Because <tt>avahi-autoipd</tt> acts as a <tt>dhclient</tt> plug-in by default, and only activates itself as a last resort for acquiring an IP address, I hope that it will get much less in the way of the user than previous implementations of this technology for Linux.</p> <h4>State of the Lemur</h4> <p>Almost 22 months after my first SVN commit to the flexmdns (which was the name I chose for my mDNS implementation when I first started to work on it) source code repository, 18 months after Trent and I decided to join our two projects under the name "Avahi" and 12 months after the release of Avahi 0.1, it's time for a little "State of the Lemur" post.</p> <p>To make it short: Avahi is ubiquitous in the Free Software world. <tt>;-)</tt></p> <p>All major (Debian, Ubuntu, Fedora, Gentoo, Mandriva, OpenSUSE) and many minor distributions have it. A quick Google-based poll I did a few weeks ago shows that it is part of at least <a href="http://avahi.org/wiki/AboutAvahi#Distributions">19 different distributions</a>, including a range of embedded ones. The list of <a href="http://avahi.org/wiki/Avah4users#SoftwareMakinguseofAvahi">applications making native use</a> of the Avahi client API is growing, currently bearing 31 items. 
That list does not include the legacy HOWL applications and the applications that use our Bonjour compatibility API which can run on top of Avahi, hence the real number of applications that can make use of Avahi is slightly higher. The first commercial hardware appliances which include Avahi are slowly appearing on the market. I know of at least three such products, one being <a href="http://www.excito.com/products.html">Bubba</a>.</p> <p><i>If you package Avahi for a distribution, add Avahi support to an application, or build a hardware appliance with Avahi, please make sure to add an item to the respective lists linked above, it's a Wiki. Thank you! (Anonymous registration without Mail address required, though) </i></p> Lennart PoetteringThu, 14 Sep 2006 00:40:00 +0200tag:0pointer.net,2006-09-14:/blog/projects/avahi-0.6.14.htmlprojectsPlaying with Cairohttps://0pointer.net/blog/projects/fring.html <p>Play around with <a href="http://cairographics.org/">Cairo</a>: <b>Check!</b></p> <p>One thing that has been sitting on my TODO list for a very long time was playing around with Cairo. No longer! Yesterday I spent a little time on hacking a Cairo based equivalent of <a href="http://www.methylblue.com/filelight/">KDE's Filelight</a> (Which BTW is one of the two programs that KDE has but GNOME really lacks, the other being <a href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi">KCacheGrind</a>). The result after two hours is this:</p> <p><a href="http://0pointer.de/public/fring.png"><img src="http://0pointer.de/public/fring-small.png" width="300" height="225" alt="Fring Screenshot" /></a></p> <p>This screenshot shows the development tree of my <a href="http://0pointer.de/lennart/projects/syrep/">Syrep tool</a>.</p> <p>This tool has definitely nicer anti-aliased graphics than Filelight, doesn't it? The source code is here: <a href="http://0pointer.de/public/fring.py"><tt>fring.py</tt></a>. 
Anyone interested in turning this into a proper GNOME application?</p> Lennart PoetteringWed, 13 Sep 2006 16:43:00 +0200tag:0pointer.net,2006-09-13:/blog/projects/fring.htmlprojectsA few updates on PulseAudiohttps://0pointer.net/blog/projects/pulse-news.html <p>Thanks to Marc-Andre Lureau there's now a <a href="http://bugzilla.gnome.org/show_bug.cgi?id=348572">jhbuild file for PulseAudio</a>. And there is <a href="http://live.gnome.org/PulseAudio">this (little bit chaotic) Wiki page</a> in GNOME Live! about the relation of PulseAudio and GNOME.</p> <p>A few weeks ago I wrote a new page for our Wiki where I tried to describe the steps necessary to get the most out of PulseAudio. It's called the <a href="http://pulseaudio.org/wiki/PerfectSetup">Perfect Setup</a>.</p> <p>A few minutes ago I released <a href="http://pulseaudio.org/">PulseAudio 0.9.5</a> and new versions of the auxiliary tools. The changelog:</p> <ul> <li>Add module-hal-detect, a module that detects all local sound hardware using <a href="http://freedesktop.org/wiki/Software_2fhal">HAL</a> and loads the necessary modules. Handles hot-plug and hot-removal of audio devices. (Contributed by Shahms E. 
King)</li> <li>Add shared memory transfer method for local clients</li> <li>Update module-volume-restore to automatically restore the output device last used by an application in addition to the volume it last used</li> <li>Add a new module module-rescue-streams for automatically moving streams to another sink/source if the sink/source they are connected to dies</li> <li>Add support for moving streams "hot" between sinks/sources</li> <li>Reduce memory consumption and CPU load as a result of Valgrind/Massif profiling</li> <li>Add new module module-gconf for reading additional configuration statements from GConf</li> <li>Fix module-tunnel to work with the latest protocol</li> <li>Miscellaneous fixes</li> </ul> <p>One of the nicest new features of PulseAudio 0.9.5 is HAL integration (which was contributed by Shahms King). PulseAudio will now automatically detect all available sound devices and will make use of them. It supports both hot-plug and hot-remove.</p> <p>Another nice feature is the GConf integration which allowed us to add another nice application to the PulseAudio toolset: the <i>PulseAudio Preferences</i> utility:</p> <p><a href="http://0pointer.de/lennart/projects/paprefs//screenshot.png"><img src="http://0pointer.de/public/paprefs-small.png" width="172" height="200" alt="paprefs screenshot" /></a></p> <p>The idea is to have a simple, nice configuration dialog that allows configuration of the more exotic features of PulseAudio which we do not enable by default due to security considerations or to avoid confusing the user. Right now a lot of features are hidden behind non-trivial configuration file statements. 
This preferences tool is meant to make them available to users who are not so keen on editing configuration files.</p> <p>Playing around with <a href="http://valgrind.org/">Valgrind</a>'s Massif tool and <a href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi">KCachegrind</a> I did a little bit of memory and performance profiling of the PulseAudio daemon. The 0.9.5 release contains a lot of optimizations which are a result of this work.</p> <p>Before:</p> <a href="http://0pointer.de/public/massif-pulseaudio.png"><img src="http://0pointer.de/public/massif-pulseaudio-small.png" width="261" height="200" alt="Massif before" /></a> <p>After:</p> <a href="http://0pointer.de/public/massif-pulseaudio2.png"><img src="http://0pointer.de/public/massif-pulseaudio2-small.png" width="274" height="200" alt="Massif after" /></a> <p>These plots show the memory consumption over time, from starting the server, to playing a stream, to stopping the stream and shutting down the server again. The major improvement was actually an update to <a href="http://www.mega-nerd.com/SRC/index.html">libsamplerate</a> done by its maintainer to improve the memory handling of that library. (He hasn't yet released an updated version of his library containing the changes shown in the plots.)</p> <p>PulseAudio has had the nice feature of remembering the playback volume of every application for quite a while. Starting with 0.9.5, PulseAudio also remembers the output device for every application. 
Together with an updated Volume Control tool which now allows moving streams between sinks while they are playing, this can be used to configure a ruleset like "Ekiga always on the USB headset, Rhythmbox always on the external speakers" very intuitively and easily:</p> <p><a href="http://0pointer.de/public/pavucontrol-move.png"><img src="http://0pointer.de/public/pavucontrol-move-small.png" width="193" height="200" alt="pavucontrol screenshot" /></a></p> <p>And here's a final screenshot showing all the tools we currently have for PulseAudio 0.9.5.</p> <p><a href="http://0pointer.de/public/pulse-screenshot.png"><img src="http://0pointer.de/public/pulse-screenshot-small.png" width="500" height="200" alt="PA Screenshot" /></a></p> Lennart PoetteringSun, 27 Aug 2006 02:19:00 +0200tag:0pointer.net,2006-08-27:/blog/projects/pulse-news.htmlprojectsLaunchpad is Evilhttps://0pointer.net/blog/projects/launchpad-stole-my-name.html <p>I always think twice before entering my name in any web form or posting to a mailing list. Is the web site/list respectable? Do the owners of the web site have any commercial interest in my name (spam, marketing, ...)? Would I ever regret that my name can be found with Google in context with this web site/mailing list? If I enter my name, is it used for collecting data about me? Is there any reasonable privacy policy?</p> <p>Often enough I refrain from entering my name after deciding that the answers to these questions are unsatisfactory. I like to be in control of my name. If I am not confident that I remain in control, I don't enter my name into any service.</p> <p>Recently it came to my attention that Canonical decided to create <a href="https://launchpad.net/people/mzqrovna">an account (!) for me</a> in their commercial, proprietary bug tracker called "Launchpad". I never asked for one! I never even considered having one, because their service clearly is nothing that would pass the tests mentioned above. 
They are a commercial service, my account data is apparently "content" for them, and they don't seem to have any privacy policy. (At least I couldn't find any, the navigation is pretty crappy.)</p> <p>Canonical's nimbus of being "the good guys" doesn't stop them from incorporating data from free sources (apparently they got my data from the Debian BTS) and making a commercial service of it, without even asking the original contributors if that would be OK with them, or if it is OK to incorporate their name or personal profile in the service. Apparently Canonical is not much better than a common spam harvester: generating personal profiles for business, without consent of the "victim".</p> <p><b>If anyone from Canonical reads this</b>: It is not OK with me for you to use my name as "content" for your commercial, proprietary service. Please remove any reference to my name from your "account" database. I don't want to have a Launchpad account. I don't plan to use Launchpad. Let me decide if I ever want to join! Thank you very much.</p> <p>Update: I especially dislike the fact that they created an account for me in a service where Hitler apparently already has six (!) accounts. I am very sure that I don't want to be part of that community.</p> Lennart PoetteringSat, 26 Aug 2006 22:27:00 +0200tag:0pointer.net,2006-08-26:/blog/projects/launchpad-stole-my-name.htmlprojectsAvahi porter for Win32 needed!https://0pointer.net/blog/projects/avahi-win32.html <p>Are you a Win32 hacker and looking for something worthwhile to do in Free Software? We're eagerly looking for someone to port <a href="http://avahi.org/">Avahi</a> to that platform. 
Now that <a href="http://svn.sourceforge.net/viewvc/windbus/trunk/">D-Bus is available on Win32</a>, the last major stumbling block for this feat is no more.</p> Lennart PoetteringFri, 25 Aug 2006 23:18:00 +0200tag:0pointer.net,2006-08-25:/blog/projects/avahi-win32.htmlprojectsAvahi 0.6.13 releasedhttps://0pointer.net/blog/projects/avahi-0.6.13.html <div><img src="http://avahi.org/chrome/site/avahi-trac.png" width="200" height="96" style="float:right; border: 0px; margin: 10px" alt="Avahi Logo" /></div> <p>I am happy to bring you yet another release of <a href="http://avahi.org/">Avahi</a>, everyone's favourite Zeroconf stack.</p> <ul> <li> Add a new D-Bus method for changing the mDNS host name at runtime. This functionality is only available to members of the UNIX group "netdev", which is the same access group that is enforced by GNOME's NetworkManager daemon. Since NM will probably be the most prominent user of this new method, we decided to limit access to the same group. The access group can be set by passing --with-avahi-priv-access-group= to "configure". If you need more sophisticated access control you can freely edit /etc/dbus/system.d/avahi-dbus.conf.</li> <li> Add a new utility "avahi-set-host-name" which is a command line wrapper around the aforementioned SetHostName() method.</li> <li> Bonjour API compatibility library: <ul> <li> Implement DNSServiceUpdateRecord()</li> <li> Allow passing NULL as callback function for DNSServiceRegister()</li> <li> Implement subtype registration in DNSServiceRegister() in a way that is compatible with Bonjour.</li> <li> Update to newer copy of dns_sd.h</li> </ul></li> <li> If the host name changes, update the names of static services which contain wildcards.</li> <li> Don't build documentation about embedding the Avahi mDNS stack into other programs by default. This is a feature used only by embedded developers. 
Pass --enable-core-docs to "configure" to enable building these docs, like in Avahi &lt;= 0.6.12.</li> <li> Build Qt documentation only when Qt support is enabled in the configuration. Same for GLib.</li> <li> Change algorithm used to find a new host name on conflict. In Avahi &lt;= 0.6.12 a conflicting host name of "foobar" would be changed to the new name "foobar2". With 0.6.13 "foobar-2" will be picked instead. This follows Bonjour's behaviour and has the advantage of not confusing people with regular host names ending in digits.</li> <li> Don't disable all static services when SIGHUP is received.</li> <li> Fix build when Avahi is configured without Gtk+ but with Python support</li> <li>Fix build on MacOS X</li> <li>Support using Solaris DBM instead of gdbm for the service type database. The latter is still recommended</li> <li>Minor other fixes and documentation updates</li> </ul> <p>The relevant NetworkManager bug about <tt>SetHostName()</tt> is <a href="http://bugzilla.gnome.org/show_bug.cgi?id=352828">#352828</a>.</p> <p>And <a href="http://avahi.org/report/1">our bug tracker</a> is back to only <i>two</i> open bugs for Avahi. That's a good feeling, I can tell you!</p> Lennart PoetteringFri, 25 Aug 2006 21:59:00 +0200tag:0pointer.net,2006-08-25:/blog/projects/avahi-0.6.13.htmlprojectsMSI S270 Laptop Linux Kernel Driverhttps://0pointer.net/blog/projects/s270-kernel.html <p>Earlier this year I worked on <a href="http://0pointer.de/blog/projects/s270ctrl">reverse engineering</a> the brightness control of my <a href="http://0pointer.de/lennart/tchibo.html">MSI S270 laptop</a>. Turning this work into a proper kernel driver was still left to be done. Until yesterday... 
The result of yesterday's work are <a href="http://lkml.org/lkml/2006/8/9/432">two</a> <a href="http://lkml.org/lkml/2006/8/9/431">kernel</a> patches I already posted for upstream inclusion.</p> <p>If you want to test these drivers, download the latest kernel patches:</p> <ol> <li><a href="http://0pointer.de/public/acpi-ec-transaction.patch"><tt>acpi-ec-transaction.patch</tt></a></li> <li><a href="http://0pointer.de/public/acpi-s270.patch"><tt>acpi-s270.patch</tt></a></li> </ol> <p>The two patches apply to kernel 2.6.17. After patching, activate "MSI S270 Laptop Extras" under "Device Drivers"/"Misc devices", then recompile and install. After loading the <tt>s270</tt> module, you now have a backlight class driver exposing its innards in <tt>/sys/class/backlight/s270bl/</tt>. To change the screen brightness, issue as <tt>root</tt>:</p> <pre>echo 8 > /sys/class/backlight/s270bl/brightness</pre> <p>This will set the screen brightness to maximum. The integer range is 0..8.</p> <p>In addition to this backlight class driver we export a platform driver which allows reading the current state of the WLAN/Bluetooth subsystem. The platform driver also allows toggling the <i>automatic brightness control</i> feature: </p> <pre>cat /sys/devices/platform/s270pf/wlan                    # Show WLAN status
cat /sys/devices/platform/s270pf/bluetooth               # Show Bluetooth status
echo 1 > /sys/devices/platform/s270pf/auto_brightness    # Enable automatic brightness control</pre> <p>If the driver refuses to load (returning ENODEV) and you are sure you have an MSI S270, the machine is probably not recognized correctly by its <a href="http://en.wikipedia.org/wiki/Desktop_Management_Interface">DMI</a> data. In that case you can pass <tt>force=1</tt> to the driver, which will force the driver to load even when the DMI data doesn't match. YMMV. 
If everything works correctly please make sure to send me the output of <tt>dmidecode</tt>, so that I can add the DMI data to the list of known laptops in the driver.</p> <p>There might even be a chance that this driver works on other MSI laptop models, too (such as S260). YMMV. But don't come running when the driver causes your machine to explode! MSI laptops such as the S270 or S260 are often sold as OEM hardware under different brands (such as Cytron/TCM/Medion/Tchibo MD96100 or "SAM2000"), so if your laptop looks remotely like <a href="http://0pointer.de/lennart/tchibo.html">this one</a>, <tt>dmidecode | grep MICRO-STAR</tt> yields at least a single line, and you are adventurous, then you might want to test this driver on it. And don't forget to send me your <tt>dmidecode</tt> output if it works for you!</p> <p>Unfortunately HAL (at least in my version 0.5.7) doesn't support the generic backlight device class yet, which means no <tt>gnome-power-manager</tt> support for now.</p> <p>Although this driver is based on reverse engineered data it should be legally safe even in the US. After I did my initial work on the S270 controls MSI supplied me with a register table of their <i>ACPI Embedded Controller</i> (which is what this driver interfaces with) and one of their engineers even tested my work.</p> <p>Last but not least I created a <a href="https://tango.0pointer.de/mailman/listinfo/s270-linux">mailing list for discussion of Linux on the MSI S270</a>. Please join if you run Linux on one of these machines! 
I will announce future driver work for the S270 there.</p> Lennart PoetteringThu, 10 Aug 2006 19:34:00 +0200tag:0pointer.net,2006-08-10:/blog/projects/s270-kernel.htmlprojectsApple Bonjour adopts the Apache License 2.0https://0pointer.net/blog/projects/bonjour-apache-license.html <p>Yesterday <a href="http://bonjour.macosforge.org/">Apple Bonjour</a> <a href="http://lists.apple.com/archives/Darwin-dev/2006/Aug/msg00067.html">was</a> <a href="http://apple.slashdot.org/article.pl?sid=06/08/07/2359256">released</a> under the Apache License 2.0, replacing the old, much-criticized (because non-free) APSL licensing.</p> <p>What does this mean for <a href="http://avahi.org/">Avahi</a>? First of all, although the <a href="http://en.wikipedia.org/wiki/Apache_license">Apache License</a> is much better than the <a href="http://www.gnu.org/philosophy/apsl.html">APSL</a>, it still isn't GPL compatible (at least in the eyes of the FSF), which effectively means that Bonjour still cannot be used by more than <a href="http://freshmeat.net/stats/#license">66% of the Free Software</a> projects available. Secondly, Avahi is more powerful in most areas than Bonjour ever was. (In fact, there is only a single feature where Bonjour surpasses us: writable "Wide Area DNS-SD"). Avahi uses all the "hot" Free technologies like <a href="http://www.freedesktop.org/wiki/Software/dbus">D-Bus</a> and has much better integration in the Linux networking subsystem. Avahi is more secure (<tt>chroot()</tt>...). Avahi is compatible API- and ABI-wise with Bonjour, but not the other way round. Avahi is now part of every major Linux distribution.</p> <p>Avahi is actively developed. The aforementioned Wide Area DNS-SD is currently being worked on by Federico Lucifredi. Since I will write my master's thesis about mDNS scalability a lot of additional development will be done for Avahi in the next month.</p> <p>In short: Avahi is here to stay. 
Apple's move to the Apache license is too little, too late.</p> <p><b>Update:</b> the Bonjour client libraries are BSD licensed, so the 66% argument doesn't hold.</p> Lennart PoetteringTue, 08 Aug 2006 12:38:00 +0200tag:0pointer.net,2006-08-08:/blog/projects/bonjour-apache-license.htmlprojectsZeroConf in Ubuntuhttps://0pointer.net/blog/projects/zeroconf-ubuntu.html <p><i>(Disclaimer: I am not an Ubuntu user myself. But I happen to be the lead developer of <a href="http://avahi.org">Avahi</a>.)</i></p> <p>It came to my attention that Ubuntu <a href="https://lists.ubuntu.com/archives/ubuntu-devel/2006-July/thread.html#19137">is</a> <a href="https://lists.ubuntu.com/archives/ubuntu-devel/2006-July/thread.html#19391">discussing</a> <a href="https://lists.ubuntu.com/archives/ubuntu-devel/2006-July/thread.html#19088">whether</a> <a href="https://lists.ubuntu.com/archives/ubuntu-devel/2006-July/thread.html#19071">to</a> enable Zeroconf/Avahi in default installations. I would like to point out a few things:</p> <p><b>The "No Open Ports" policy:</b> This policy (or at least the way many people interpret it) seems to have been thought out by someone who doesn't have much experience with TCP/IP networking. While it might make sense to enforce this for application-level protocols like HTTP or FTP it doesn't make sense to apply it to transport-level protocols such as DHCP, DNS or in this case <a href="http://multicastdns.org/">mDNS</a> (the underlying protocol of Zeroconf/Avahi/Bonjour):</p> <ul> <li>Even the simplest DNS lookup requires the opening of a UDP port for a short period of time to be able to receive the response. This is usually not visible to the administrator, because the time is too short to show up in <tt>netstat -uln</tt>, but nonetheless it is an open port. 
(UDP is not session-based (like TCP is) so incoming packets are accepted regardless of where they come from.)</li> <li>DHCP clients listen on UDP port 68 during their entire lifetime (which in most cases is the same as the uptime of the machine). DHCP may be misused for much worse things than mDNS. Evildoers can forge DHCP packets to change IP addresses and routing of machines. This is definitely something that cannot be done with mDNS. </li> </ul> <p>All three protocols, DNS, DHCP and mDNS, require a little bit of trust in the local LAN. They (usually) don't come with any sort of authentication and they all are very easy to forge. The impact of forged mDNS packets is clearly less dangerous than forged DHCP or DNS packets. Why? Because mDNS doesn't allow you to change the IP address or routing setup (which forged DHCP allows) and because it cannot be used to spoof host names outside the <tt>.local</tt> domain (which forged DNS allows).</p> <p>Enforcing the "No Open ports" policy everywhere in Ubuntu would require that both DNS and DHCP are disabled by default. However, as everybody probably agrees, this would be ridiculous because a standard Ubuntu installation couldn't even be used for the most basic things like web browsing.</p> <p>Oh, and BTW: DNS lookups are usually done by an NSS plugin which is loaded by the libc into every process which uses <tt>gethostbyname()</tt> (the function for doing host name resolutions). So, in effect every single process that uses this function has an open port for a short time. And the DNS client code runs with user privileges, so an exploit really hurts. <tt>dhclient</tt> (the DHCP client) runs as <tt>root</tt> during its entire runtime, so an exploit of it hurts even more. Avahi in contrast <a href="http://avahi.org/wiki/SecurityConsiderations">runs as its own user and <tt>chroot()</tt>s</a>.</p> <p>It is not my intention to force anyone to use <a href="http://avahi.org/">my software</a>. 
However, enforcing the "No Open Ports" policy unconditionally is not a good idea. Currently Ubuntu makes exceptions for DHCP/DNS and so it should for mDNS.</p> <p>I do agree that publishing all kinds of local services with Avahi in a default install is indeed problematic. However, if the "No Open Ports" policy is enforced on all other application-level software, there shouldn't be any application that would want to register a service with Avahi.</p> <p>Starting Avahi "on-demand" is not an option either, because it offers useful services even when no local application is accessing it. Most notably this is host name resolution for the local host name. (Hey, yeah, Zeroconf is more than just <i>stealing music</i>.)</p> <p><b>Remember:</b> <a href="http://zeroconf.org/">Zeroconf</a> is about <i>Zero Configuration</i>. Requiring the user to toggle some obscure configuration option before he can use Zeroconf would make it a paradox. Zeroconf was designed to make things "just work". If it isn't enabled by default it is impossible to reach that goal.</p> <p>Oh, and I enabled commenting in my blog, if anyone wants to flame me on this...</p> Lennart PoetteringWed, 26 Jul 2006 20:59:00 +0200tag:0pointer.net,2006-07-26:/blog/projects/zeroconf-ubuntu.htmlprojectsAnnouncing SECCUREhttps://0pointer.net/blog/projects/seccure.html <p>Yesterday my brother released his second Free Software package, the <a href="http://point-at-infinity.org/seccure/">SECCURE Elliptic Curve Crypto Utility for Reliable Encryption</a>. (<a href="http://en.wikipedia.org/wiki/Recursive_acronym">Recursive acronyms</a>, yay!)</p> <blockquote><p><i>The seccure toolset implements a selection of asymmetric algorithms based on elliptic curve cryptography (ECC). In particular, it offers public key encryption / decryption and signature generation / verification. ECC schemes offer a much better key size to security ratio than classical systems (RSA, DSA). 
Keys are short enough to make direct specification of keys on the command line possible (sometimes this is more convenient than the management of PGP-like key rings). seccure builds on this feature and therefore is the tool of choice whenever lightweight asymmetric cryptography -- independent of key servers, revocation certificates, the Web of Trust, or even configuration files -- is required.</i></p></blockquote> <p>Anyone willing to work on the <a href="http://bugs.debian.org/378987">Debian RFP</a>?</p> <p>(His first Free Software package is <a href="http://point-at-infinity.org/ssss/">ssss</a>, an implementation of <a href="http://en.wikipedia.org/wiki/Secret_sharing">Shamir's secret sharing scheme</a>)</p> Lennart PoetteringThu, 20 Jul 2006 14:00:00 +0200tag:0pointer.net,2006-07-20:/blog/projects/seccure.htmlprojectsGUADEC Sound BOF Slideshttps://0pointer.net/blog/projects/pulse-slides.html <p>Marc-Andre was kind enough to upload the <a href="http://etudiant.epita.fr/~lureau_m/GUADEC06-Audio-BOF/">improvised mini-slides</a> we had prepared for GUADEC's sound BOF. Unfortunately there is no recording of the BOF, so this is all we can offer for those who are interested but were not able to attend GUADEC.</p> <p>In related news: Thanks to <tt>jat</tt> there is now a native <a href="http://pulseaudio.org/">PulseAudio</a> driver for <a href="http://www.musicpd.org/">MPD</a> (in SVN), and I updated the <a href="http://0pointer.de/public/mplayer-pulse.patch">MPlayer patch</a>, which adds a native PulseAudio driver to MPlayer.</p> Lennart PoetteringSun, 16 Jul 2006 13:53:00 +0200tag:0pointer.net,2006-07-16:/blog/projects/pulse-slides.htmlprojectsPulseAudio Zeroconf support ported to Avahihttps://0pointer.net/blog/projects/pulse-howl-avahi.html <p><a href="http://farragut.flameeyes.is-a-geek.org/articles/2006/07/11/black-out">Diego</a> and others who complained: PulseAudio in SVN now uses Avahi natively for ZeroConf. The old HOWL-based code has been removed. 
</p> Lennart PoetteringFri, 14 Jul 2006 00:06:00 +0200tag:0pointer.net,2006-07-14:/blog/projects/pulse-howl-avahi.htmlprojectsRe: PulseAudio and GNOMEhttps://0pointer.net/blog/projects/pulse-davidz-reply.html <p><a href="http://blog.fubar.dk/?p=71">davidz</a>: Shahms King is currently working on HAL support in <a href="http://pulseaudio.org/">PulseAudio</a>. He's planning to extend our <a href="http://pulseaudio.org/wiki/Modules#modulecombine">module-combine</a> to automatically combine all available hardware sound cards found with HAL into a single virtual sound sink. That way, if the user plugs in a USB loudspeaker set it will automatically output the same audio as the internal speakers did before. I believe this is the behaviour most non-technical users would expect from a well-designed system.</p> <p>Right now PulseAudio sink names cannot be used to identify the underlying hardware devices, since they are generic names like <tt>alsa_output</tt> or <tt>oss_output2</tt>. However, it might be a good idea to use the ALSA device name (e.g. <tt>alsa_output_hw_0_0</tt>) or even the HAL identifier if it is available. If <a href="http://bugzilla.gnome.org/attachment.cgi?id=58344&amp;action=view">this dialog</a> uses the normal GStreamer <tt>PropertyProbe</tt> API to query the available devices (and does not use HAL directly), we should be able to support this easily in <a href="http://0pointer.de/lennart/projects/gst-pulse/">gst-pulse</a> (right now we support this interface in GstPulseMixer, but not yet in GstPulseSink).</p> <p>Marc-Andre, I wonder how the differentiation between "Sound events", "Music and Movies" and "Audio/Video Conferencing" touches the "role"/"class" model of <a href="http://live.gnome.org/GSmartMix">GSmartMix</a>?</p> <p>Regarding power saving and PulseAudio: First of all, PulseAudio right now is intended to be run per-session, just like <tt>esd</tt> was. 
However, there is some incomplete support for running it as a system-wide instance.</p> <p>I think instead of integrating PulseAudio with <tt>gnome-power-manager</tt> the way you described, it is probably a better idea to close the sound device when it is idle regardless of whether we are in power saving mode or not, and hope that the driver authors fix their stuff to not produce any click or pop sounds when the device is opened or closed. To be honest, all driver/sound card combinations I have access to work properly in this area.</p> <p>In ALSA you usually open devices in O_RDONLY or O_WRONLY mode (and not in O_RDWR) anyway, so falling back to it is not really necessary.</p> Lennart PoetteringThu, 13 Jul 2006 21:54:00 +0200tag:0pointer.net,2006-07-13:/blog/projects/pulse-davidz-reply.htmlprojectsPhotos from GUADEChttps://0pointer.net/blog/photos/guadec2006.html <p><a href="http://0pointer.de/public/guadec-2006-pics/">The few pictures of GNOME people</a> I took at GUADEC are now online, too.</p> Lennart PoetteringTue, 11 Jul 2006 20:38:00 +0200tag:0pointer.net,2006-07-11:/blog/photos/guadec2006.htmlphotosPhotos from Vilanova/Barcelonahttps://0pointer.net/blog/photos/barcelona.html <p>I finally found the time to sort my photos from Vilanova i la Geltr&uacute; and Barcelona.</p> <p>My <i>Windows of Barcelona</i> series:</p> <p><a href="http://0pointer.de/static/windows.html"><img src="http://0pointer.de/static/windows-small.png" width="150" height="148" alt="Windows of Barcelona" /></a></p> <p>A few other nice shots:</p> <p> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=361"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-361.jpg" width="80" height="120" alt="Photo #361" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=371"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-371.jpg" width="80" height="120" alt="Photo #371" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=366"><img 
src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-366.jpg" width="80" height="120" alt="Photo #366" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=381"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-381.jpg" width="80" height="120" alt="Photo #381" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=386"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-386.jpg" width="80" height="120" alt="Photo #386" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=222"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-222.jpg" width="80" height="120" alt="Photo #222" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=210"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-210.jpg" width="80" height="120" alt="Photo #210" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=125"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-125.jpg" width="80" height="120" alt="Photo #125" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=137"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-137.jpg" width="80" height="120" alt="Photo #137" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=5"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-5.jpg" width="80" height="120" alt="Photo #5" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=311"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-311.jpg" width="80" height="120" alt="Photo #311" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=301"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-301.jpg" width="80" height="120" alt="Photo #301" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=317"><img 
src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-317.jpg" width="80" height="120" alt="Photo #317" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=281"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-281.jpg" width="80" height="120" alt="Photo #281" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=269"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-269.jpg" width="80" height="120" alt="Photo #269" /></a> <br /> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=268"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-268.jpg" width="80" height="120" alt="Photo #268" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=89"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-89.jpg" width="80" height="120" alt="Photo #89" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=49"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-49.jpg" width="80" height="120" alt="Photo #49" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=35"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-35.jpg" width="80" height="120" alt="Photo #35" /></a> <a href="http://0pointer.de/photos/?gallery=Barcelona&amp;photo=95"><img src="http://0pointer.de/photos/galleries/Barcelona/thumbs/img-95.jpg" width="80" height="120" alt="Photo #95" /></a> </p> <p>These are: <br /> 1st row: <i>Casa Mil&agrave;</i>; ditto; ditto; ditto; ditto; <br /> 2nd row: <i>Palau de la M&uacute;sica Catalana</i>; ditto; <i>Mies van der Rohe</i> Pavilion; ditto; Vilanova Lighthouse; <br /> 3rd row: <i>Sagrada Fam&iacute;lia</i>; ditto; ditto; <i>Hospital de Sant Pau</i>; ditto; <br /> 4th row: <i>Sagrada Fam&iacute;lia</i>, seen from <i>Sant Pau</i>; City Center/<i>Barri G&ograve;tic</i>; ditto; ditto; <i>Pla&ccedil;a Reial</i></p> <p>A panoramic view of Barcelona photographed from the Montjuic towards 
the north:</p> <p><a href="http://0pointer.de/static/montjuic1.html"><img src="http://0pointer.de/static/montjuic1-small.jpeg" width="1024" height="146" alt="Barcelona Panorama" /></a></p> <p>Those "thunderclouds" on the right side of the image are actually a result of not using the same exposure settings on all photos that are part of the panorama. That is a mistake I didn't repeat with my second panoramic view, which again shows Barcelona from the Montjuic, but this time towards the east:</p> <p><a href="http://0pointer.de/static/montjuic2.html"><img src="http://0pointer.de/static/montjuic2-small.jpeg" width="1024" height="146" alt="Barcelona Panorama 2" /></a></p> <p>Don't miss <a href="http://0pointer.de/photos/?gallery=Barcelona">the entire album</a>!</p> Lennart PoetteringTue, 11 Jul 2006 15:43:00 +0200tag:0pointer.net,2006-07-11:/blog/photos/barcelona.htmlphotosPulseAudio 0.9.2 releasedhttps://0pointer.net/blog/projects/pulse-release.html <p>We're proud to announce the first release of <a href="http://pulseaudio.org/">PulseAudio</a> after the name change from <i>Polypaudio</i>. Besides a variety of <tt>sed -i -e s/polyp/pulse/g</tt> changes, it mostly contains minor bugfixes. <a href="http://0pointer.de/lennart/projects/pulseaudio/pulseaudio-0.9.2.tar.gz">Get it while it is hot!</a></p> <p>In related news PulseAudio has now gained its own domain and a new Trac-based homepage: <a href="http://pulseaudio.org/"><tt>http://pulseaudio.org/</tt></a>. And thanks to Rafael Jannone and Pierre Ossman we now have a logo:</p> <img src="http://pulseaudio.org/chrome/site/patitle.png" width="470" height="85" alt="PulseAudio Logo" /> <p>Together with PulseAudio 0.9.2 we released updated versions of all the <a href="http://pulseaudio.org/wiki/AboutPulseAudio#RelatedSoftware">auxiliary GUI tools</a>. A new utility has been released as well, named <a href="http://0pointer.de/lennart/projects/padevchooser/"><i>PulseAudio Device Chooser</i></a>. 
It installs a tray icon and allows the user to quickly change the sound server attached to the local X11 display, showing a list of servers that is accumulated using ZeroConf service browsing. In addition it allows you to quickly start one of the other GUI tools and shows a notification whenever a new PulseAudio server/sink/source appears on the network. Everybody loves screenshots:</p> <img src="http://0pointer.de/lennart/projects/padevchooser/screenshot.png" width="865" height="498" alt="PulseAudio Device Chooser Screenshot" /> Lennart PoetteringSun, 09 Jul 2006 12:39:00 +0200tag:0pointer.net,2006-07-09:/blog/projects/pulse-release.htmlprojectsGUADEC Sound BOF, Part 2https://0pointer.net/blog/projects/guadec-bof2.html <p>There has been some confusion about the date of the Sound BOF, since the <a href="http://live.gnome.org/GUADEC2006/AfterHoursWorkshops">BOF Wiki</a> gave a different date than my blog <a href="http://0pointer.de/blog/projects/guadec-bof.html">story of yesterday</a>. To make this clear: the BOF will happen on Friday, 4 p.m.</p> Lennart PoetteringWed, 28 Jun 2006 17:07:00 +0200tag:0pointer.net,2006-06-28:/blog/projects/guadec-bof2.htmlprojectsGUADEC Sound BOF on Fridayhttps://0pointer.net/blog/projects/guadec-bof.html <p>There will be a Linux/Gnome <a href="http://live.gnome.org/GUADEC2006/AfterHoursWorkshops">Sound BOF</a> on Friday, 4:00 p.m. I will be there, promoting PulseAudio, as will Marc-Andre of GSmartMix fame. Everyone interested in the future of audio in Gnome is welcome to join us!</p> Lennart PoetteringWed, 28 Jun 2006 03:48:00 +0200tag:0pointer.net,2006-06-28:/blog/projects/guadec-bof.htmlprojectsAttending GUADEChttps://0pointer.net/blog/projects/guadec-2006.html <p>Due to the generosity of the GNOME Foundation I have been able to get to GUADEC 2006 this year. I'd like to thank Jeff Waugh and Quim Gil for the "last-minute" funding of my trip to Vilanova, and all the sponsors who are actually providing the funds. 
If anyone wants to talk to me about <a href="http://avahi.org/">Avahi</a> and/or <a href="http://0pointer.de/lennart/projects/polypaudio/">PulseAudio (aka Polypaudio)</a> (or any of my <a href="http://0pointer.de/lennart">other projects</a>), just try to find and speak to me. (Bungalow 870)</p> <p>In related news, the <a href="http://pulseaudio.org/">new PulseAudio homepage</a> will be "inaugurated" soon, becoming the official new home of PulseAudio/Polypaudio as soon as we release 0.9.2, which hopefully will be pretty soon.</p> Lennart PoetteringSat, 24 Jun 2006 17:33:00 +0200tag:0pointer.net,2006-06-24:/blog/projects/guadec-2006.htmlprojectsTPFKAPA: The Project Formerly Known as Polypaudiohttps://0pointer.net/blog/projects/pulse.html <p>It came to our attention that some people really disliked the name of <a href="http://0pointer.de/lennart/projects/polypaudio/">Polypaudio</a>, because it reminded them of that <a href="http://en.wikipedia.org/wiki/Polyp_%28medicine%29">medical condition</a>, though the software was actually named after the <a href="http://en.wikipedia.org/wiki/Polyp">sea dweller</a>. I actually liked that double entendre, but many did not and expressed concerns that the name would hinder Polypaudio's adoption. After a long discussion on <tt>#polypaudio</tt> we came to the conclusion that a name change is a good idea in this case. Name changes are usually a bad idea, but this time it's worth it, we think.</p> <p>The new name we agreed on is <i>PulseAudio</i>, or shorter just <i>Pulse</i>. It has the nice advantage that it abbreviates to <i>pa</i>, just as <i>Polypaudio</i> did. This allows us to keep source code compatibility (and binary compatibility to a certain degree) with the current releases of Polypaudio, because the symbol prefix can stay <tt>pa_</tt>. 
In addition the auxiliary tools <a href="http://0pointer.de/lennart/projects/paman/">paman</a>, <a href="http://0pointer.de/lennart/projects/pavucontrol/">pavucontrol</a>, <a href="http://0pointer.de/lennart/projects/pavumeter/">pavumeter</a> need not be renamed.</p> <p>We will try to make the transition as smooth as possible and would like to apologize to all the packagers, who need to rename their packages now.</p> <p>The next release of Polypaudio (0.9.2) will be a bugfix release and be the first to bear the new name: <i>PulseAudio 0.9.2</i>.</p> <p><i>Polypaudio is dead. Long live PulseAudio!</i></p> Lennart PoetteringFri, 16 Jun 2006 18:22:00 +0200tag:0pointer.net,2006-06-16:/blog/projects/pulse.htmlprojectsPolypaudio article on LWNhttps://0pointer.net/blog/projects/polypaudio-lwn.html <p>The <a href="http://lwn.net/Articles/185613/">current issue of the Linux Weekly News</a> features a short article about <a href="http://0pointer.de/lennart/projects/polypaudio/">Polypaudio</a>. 
(The article is not (yet) accessible for free, come back in a week if you aren't an LWN subscriber.)</p> <p>Quoting:</p> <blockquote><p><i>With its support for a wide variety of popular audio utilities, actively developed code, and broad capabilities, the Polypaudio project fills an important role in Linux-based audio development.</i></p></blockquote> Lennart PoetteringThu, 01 Jun 2006 17:44:00 +0200tag:0pointer.net,2006-06-01:/blog/projects/polypaudio-lwn.htmlprojectsHamburg Dockland IIhttps://0pointer.net/blog/photos/dockland2.html <p>Another view of the "Dockland" in Hamburg-Altona:</p> <div> <a href="http://0pointer.de/photos/?gallery=Hamburg%20Dockland&amp;photo=9&amp;exif_style=&amp;show_thumbs="><img src="http://0pointer.de/photos/galleries/Hamburg%20Dockland/lq/img-9.jpg" width="320" height="480" alt="Hamburg Dockland II" /></a> </div> Lennart PoetteringMon, 29 May 2006 22:26:00 +0200tag:0pointer.net,2006-05-29:/blog/photos/dockland2.htmlphotosLooking for a Logohttps://0pointer.net/blog/projects/polypaudio-logo.html <p><a href="http://0pointer.de/lennart/projects/polypaudio/">Polypaudio</a> needs a logo! If you have some time to spare and graphic talent please send us your suggestions! Perhaps something in a nice <a href="http://tango-project.org/">Tango</a> design? See <a href="http://en.wikipedia.org/wiki/Polyp">Wikipedia</a> for an explanation of what a <i>polyp</i> is.</p> <p>Please send your suggestions to <tt>lennart (at) poettering (dot) net</tt> or join <tt>#polypaudio</tt> on freenode.</p> Lennart PoetteringMon, 29 May 2006 15:46:00 +0200tag:0pointer.net,2006-05-29:/blog/projects/polypaudio-logo.htmlprojectsAnyone Interested in Packaging Polypaudio 0.9.0 for Debian?https://0pointer.net/blog/projects/polypaudio-rfp.html <p>I opened an <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=369089">RFP for Polypaudio 0.9.0</a> in the Debian BTS. 
Anyone interested?</p> <p><i>Update: We have found a volunteer in Franz Pletz.</i></p> Lennart PoetteringSun, 28 May 2006 19:29:00 +0200tag:0pointer.net,2006-05-28:/blog/projects/polypaudio-rfp.htmlprojectsPolypaudio 0.9.0 releasedhttps://0pointer.net/blog/projects/polypaudio-0.9.0.html <p>We are proud to announce <a href="http://0pointer.de/lennart/projects/polypaudio/">Polypaudio 0.9.0</a>. This is a major step ahead since we decided to freeze the current API. From now on we will maintain API compatibility (or at least try to). To emphasize this, starting with this release the shared library sonames are properly versioned. While Polypaudio 0.9.0 is not API/ABI compatible with 0.8 it is protocol compatible.</p> <p>Other notable changes beyond bug fixing, bug fixing and bug fixing are: a new Open Sound System <tt>/dev/dsp</tt> wrapper named <tt>padsp</tt> and a module <tt>module-volume-restore</tt> have been added.</p> <p><tt>padsp</tt> works more or less like that ESOUND tool known as <tt>esddsp</tt>. However, it is much cleaner in design and thus works with many more applications than the original tool. Proper locking is implemented which allows it to work in multithreaded applications. In addition to mere <tt>/dev/dsp</tt> emulation it wraps <tt>/dev/sndstat</tt> and <tt>/dev/mixer</tt>. Proper synchronization primitives are also available, which enables lip-sync movie playback using <tt>padsp</tt> on <tt>mplayer</tt>. Other applications that are known to work properly with <tt>padsp</tt> are <tt>aumix</tt>, <tt>libao</tt>, <tt>XMMS</tt>, <tt>sox</tt>. There are some things <tt>padsp</tt> doesn't support (yet): most notably recording and <tt>mmap()</tt> wrapping. Recording will be added in a later version. <tt>mmap()</tt> support is available in <tt>esddsp</tt> but not in <tt>padsp</tt>. I am reluctant to add support for this, because it cannot work properly when it comes to playback latency handling. 
However, latency handling is the primary reason for using <tt>mmap()</tt>. In addition the hack that is included in <tt>esddsp</tt> works only for Quake2 and Quake3, both being Free Software now. It probably makes more sense to fix those two games than to implement a really dirty hack in <tt>padsp</tt>. Remember that you can always use the original <tt>esddsp</tt> tools since Polypaudio offers full protocol compatibility with ESOUND.</p> <p><tt>module-volume-restore</tt> is a small module that stores the volume of all playback streams and restores it when the application which created them creates a new stream. If this module is loaded, Polypaudio will make sure that your Gaim sounds are always played at low volume, while your XMMS music is always played at full volume.</p> <p>Besides the new release of Polypaudio itself we released a bunch of other packages to work with the new release:</p> <ul> <li><a href="http://0pointer.de/lennart/projects/gst-polyp/"><tt>gst-polyp</tt> 0.9.0</a>, a Polypaudio plugin for <a href="http://gstreamer.freedesktop.org/">GStreamer 0.10</a>. The plugin is quite sophisticated. In fact it is probably the only sink/source plugin for GStreamer that reaches the functionality of the ALSA plugin that is shipped with upstream. It implements the <tt>GstPropertyProbe</tt> and <tt>GstImplementsInterface</tt> interfaces, which allow <tt>gnome-volume-meter</tt> and other GStreamer tools to control the volume of a Polypaudio server. The sink element listens for <tt>GST_EVENT_TAG</tt> events, and can thus use ID3 tags and other meta data to name the playback stream in the Polypaudio server. This is useful to identify the stream in the <a href="http://0pointer.de/lennart/projects/pavucontrol/">Polypaudio Volume Control</a>. 
In short: Polypaudio 0.9.0 now offers first class integration into GStreamer.</li> <li><a href="http://0pointer.de/lennart/projects/libao-polyp/"><tt>libao-polyp</tt> 0.9.0</a>, a simple plugin for <a href="http://www.xiph.org/ao/"><tt>libao</tt></a>, which is used for audio playback by tools like <tt>ogg123</tt> and Gaim, besides others.</li> <li><a href="http://0pointer.de/lennart/projects/xmms-polyp/"><tt>xmms-polyp</tt> 0.9.0</a>, an output plugin for XMMS. As special feature it uses the currently played song name for naming the audio stream in Polypaudio.</li> <li><a href="http://0pointer.de/lennart/projects/paman/">Polypaudio Manager 0.9.0</a>, updated for Polypaudio 0.9.0</li> <li><a href="http://0pointer.de/lennart/projects/pavucontrol/">Polypaudio Volume Control 0.9.0</a>, updated for Polypaudio 0.9.0</li> <li><a href="http://0pointer.de/lennart/projects/pavumeter/">Polypaudio Volume Meter 0.9.0</a>, updated for Polypaudio 0.9.0</li> </ul> <p>A screenshot showing most of this in action:</p> <p><a href="http://0pointer.de/public/polypaudio.png"><img src="http://0pointer.de/public/polypaudio-small.png" width="640" height="256" alt="Polypaudio Screenshot" /></a>.</p> <p><i>This screenshot shows: the Polypaudio Manager, the Polypaudio Volume Control, the Polypaudio Volume Meter, the XMMS plugin, the GStreamer plugin used by Rhythmbox and <tt>gstreamer-properties</tt>, <tt>pacat</tt> playing some noise from <tt>/dev/urandom</tt>, <tt>padsp</tt> used on MPlayer. (This screenshot actually shows some post-0.9.0 work, like the icons used by the application windows) </i></p> Lennart PoetteringSun, 28 May 2006 18:21:00 +0200tag:0pointer.net,2006-05-28:/blog/projects/polypaudio-0.9.0.htmlprojectsA big bear hugged one and then there were twohttps://0pointer.net/blog/projects/howl.html <p>Scott Herscher decided to <a href="http://www.porchdogsoft.com/products/howl/">cease development of HOWL</a>. 
That means only <a href="http://avahi.org/">Avahi</a> and <a href="http://www.apple.com/macosx/features/bonjour/">Bonjour</a> are left as widely known mDNS/DNS-SD implementations.</p> <p>Scott, your work on HOWL has not been in vain. Many Linux/Free Software people (including me) got to know <a href="http://www.zeroconf.org/">Zeroconf</a> through your software. Without the troubles surrounding the licensing, I would never have started what is now known as Avahi, and HOWL would still be the number one of the Linux mDNS/DNS-SD implementations.</p> <p>The HOWL legacy will live on, since Avahi includes a HOWL compatibility layer which will be kept around for a while.</p> <p> A year and a few weeks ago Trent and I decided to merge our efforts and form Avahi from our separate works. I wonder how much time it will take us until we see a similar R.I.P. note from the Bonjour camp, on our route to <b>AVAHI WORLD DOMINATION</b>. <tt>;-)</tt></p> <p>In contrast to what Scott wrote in his announcement, Avahi is far from being strictly Linux. Avahi has been ported to FreeBSD, OpenBSD, MacOSX and recently (not yet official) Solaris. 
(However, he's right with what he writes about me.)</p> Lennart PoetteringFri, 26 May 2006 03:17:00 +0200tag:0pointer.net,2006-05-26:/blog/projects/howl.htmlprojectsHamburg Docklandhttps://0pointer.net/blog/photos/dockland.html <p>The "Dockland" in Hamburg-Altona:</p> <div> <a href="http://0pointer.de/photos/?gallery=Hamburg%20Dockland&amp;photo=13&amp;exif_style=&amp;show_thumbs="><img src="http://0pointer.de/photos/galleries/Hamburg%20Dockland/lq/img-13.jpg" width="320" height="480" alt="Hamburg Dockland" /></a> </div> Lennart PoetteringTue, 23 May 2006 20:17:00 +0200tag:0pointer.net,2006-05-23:/blog/photos/dockland.htmlphotosIntroducing the Polypaudio Volume Controlhttps://0pointer.net/blog/projects/pavucontrol.html <p>The result of a few hours of hacking:</p> <img src="http://0pointer.de/lennart/projects/pavucontrol//screenshot.png" width="508" height="523" alt="pavucontrol screenshot" /> <p><tt>pavucontrol</tt> can control not only the volume of hardware devices of the <a href="http://0pointer.de/lennart/projects/polypaudio/">Polypaudio sound server</a> but also that of all playback streams separately, much like the new Windows Vista volume control application.</p> <p><a href="http://0pointer.de/lennart/projects/pavucontrol/">Get the Polypaudio Volume Control while it is hot</a>.</p> <p>On a side note I released updated versions of both the <a href="http://0pointer.de/lennart/projects/pavumeter/">Polypaudio Volume Meter</a> and the <a href="http://0pointer.de/lennart/projects/paman/">Polypaudio Manager</a> which are compatible with Polypaudio 0.8.</p> Lennart PoetteringFri, 21 Apr 2006 23:20:00 +0200tag:0pointer.net,2006-04-21:/blog/projects/pavucontrol.htmlprojectsPolypaudio 0.8 Releasedhttps://0pointer.net/blog/projects/polypaudio-0.8.html <p style="margin-left: 1cm"><i>The <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=339589">reports of Polypaudio's death</a> are greatly exaggerated.</i></p> <p>We are proud to announce the release of <a 
href="http://0pointer.de/lennart/projects/polypaudio/">Polypaudio</a> 0.8, our networked sound daemon for Linux, other Unix-like operating systems, and Microsoft Windows. Since the last official release, 0.7, more than a year has passed. In the meantime Polypaudio experienced major improvements. Major contributions have been made by both Pierre Ossman and me. Pierre is being paid by <a href="http://www.cendio.com/">Cendio AB</a> to work on Polypaudio. Cendio distributes Polypaudio along with their <a href="http://www.cendio.com/products/thinlinc">ThinLinc Terminal Server</a>.</p> <p>Some of the major changes: </p> <ul> <li>New playback buffer model that allows applications to freely seek in the server side playback buffer (both with relative and absolute indexes) and to synchronize multiple streams together, in a way that the playback times are guaranteed to stay synchronized even in the case of a buffer underrun. (Lennart)</li> <li>Ported to <i>Microsoft Windows</i> and <i>Sun Solaris</i> (Pierre)</li> <li>Many inner loops (like sample type conversions) have been ported to <a href="http://liboil.freedesktop.org/wiki/">liboil</a>, which enables us to take advantage of modern SIMD instruction sets, like MMX or SSE/SSE2. (Lennart)</li> <li>Support for channel maps which allow applications to assign specific speaker positions to logical channels. This enables support for "surround sound". In addition we now support separate volumes for all channels. (Lennart)</li> <li>Support for hardware volume control for drivers that support it. (Lennart, Pierre)</li> <li>Local users may now be authenticated just by the membership in a UNIX group, without the need to exchange authentication cookies. (Lennart)</li> <li>A new driver module <tt>module-detect</tt> which detects automatically what local output devices are available and loads the needed drivers. Supports ALSA, OSS, Solaris and Win32 devices. 
(Lennart, Pierre)</li> <li>Two new modules implementing <a href="http://en.wikipedia.org/wiki/Real-time_Transport_Protocol">RTP</a>/<a href="http://en.wikipedia.org/wiki/Session_Description_Protocol">SDP</a>/<a href="http://en.wikipedia.org/wiki/Session_Announcement_Protocol">SAP</a> based multicast audio streaming. Useful for streaming music to multiple PCs with speakers simultaneously. Or for implementing a simple "always-on" conferencing solution for the LAN. Or for sharing a single MIC/LINE-IN jack on the LAN. (Lennart)</li> <li>Two new modules for connecting Polypaudio to a <a href="http://jackit.sourceforge.net/">JACK</a> audio server (Lennart)</li> <li>A new Zeroconf (mDNS/DNS-SD) publisher module. (Lennart)</li> <li>A new module to control the volume of an output sink with a <a href="http://www.lirc.org/">LIRC</a> supported infrared remote control, and another one for doing so with a multimedia keyboard. (Lennart)</li> <li>Support for resolving remote host names asynchronously using <a href="http://0pointer.de/lennart/projects/libasyncns/">libasyncns</a>. (Lennart)</li> <li>A simple proof-of-concept HTTP module, which dumps the current daemon status to HTML. (Lennart)</li> <li>Added proper validity checking of passed parameters to every single API function. (Lennart)</li> <li>Last but not least, the documentation has been beefed up a lot and is no longer just a simple doxygen-based API documentation (Pierre, Lennart)</li> </ul> <p>Sounds good, doesn't it? But that's not all!</p> <p>We're really excited about this new Polypaudio release. However, there is more exciting news in the Polypaudio world. Pierre implemented a Polypaudio plugin for <tt>alsa-libs</tt>. This means you may now use any ALSA-aware application to access a Polypaudio sound server! 
The patch has already been merged upstream, and will probably appear in the next official release of <tt>alsa-plugins</tt>.</p> <p>Due to the massive internal changes we had to make a lot of modifications to the public API. Hence applications which currently make use of the Polypaudio 0.7 API need to be updated. The patches or packages I maintain will be updated in the next weeks one-by-one. (That is: xmms-polyp, the MPlayer patch, the libao patch, the GStreamer patch and the PortAudio patch)</p> <p>A side note: I wonder what this new release means for Polypaudio in Debian. I've never been informed by the Debian maintainers of Polypaudio that it has been uploaded to Debian, and never of the removal either. In fact I never exchanged a single line with those who were the Debian maintainers of Polypaudio. Is this the way the Debian project wants its developers to communicate with upstream? I doubt that!</p> <h4>How does Polypaudio compare to <a href="http://www.tux.org/~ricdude/EsounD.html">ESOUND</a>?</h4> <p>Polypaudio does everything that ESOUND does, and much more. It is a fully compatible drop-in replacement. With a small script you can make it command line compatible (including autospawning). ESOUND clients may connect to our daemon just like they did to the original ESOUND daemon, since we implemented a compatibility module for the ESOUND protocol. </p> <p>Support for other well known networked audio protocols (such as NAS) should be easy to add - if there is a need.</p> <p>For a full list of the features that Polypaudio has over ESOUND, see <a href="http://0pointer.de/lennart/projects/polypaudio/">Polypaudio's homepage</a>.</p> <h4>How does Polypaudio compare to <a href="http://www.alsa-project.org/">ALSA</a>'s dmix?</h4> <p>Some people might ask whether there still is a need for a sound server in times where ALSA's <tt>dmix</tt> plugin is available. The answer is: yes!</p> <p>Firstly, Polypaudio is networked, which <tt>dmix</tt> is not. 
However, there are many reasons why Polypaudio is useful on non-networked systems as well. Polypaudio is portable: it is available not just for Linux but for FreeBSD, Solaris and even Microsoft Windows. Polypaudio is extensible: there is a broad range of <a href="http://0pointer.de/lennart/projects/polypaudio/modules.html">additional modules</a> available which allow the user to use Polypaudio in many exciting ways ALSA doesn't offer. In Polypaudio streams, devices and other server internals can be monitored and introspected freely. The volume of the multiple streams may be manipulated independently of each other, which allows new exciting applications like a work-alike of the new per-application mixer tool featured in upcoming Windows Vista. In multi-user systems, Polypaudio offers a secure and safe way to allow multiple users to access the sound device simultaneously. Polypaudio may be accessed through the ESOUND and the ALSA APIs. In addition, ALSA dmix is still not supported properly by many ALSA clients, and is difficult to set up.</p> <p>A side note: <tt>dmix</tt> forks off its own simple sound daemon anyway, hence there is no big difference to using Polypaudio with the ALSA plugin in auto-spawning mode. (Though admittedly, those ALSA clients that don't work properly with dmix won't do so with our ALSA plugin either, since they actually use the ALSA API incorrectly.)</p> <h4>How does Polypaudio compare to <a href="http://jackit.sourceforge.net/">JACK</a>?</h4> <p>Every time people discuss sound servers on Unix/Linux and which is the right way to go for desktops, JACK gets mentioned and suggested by some as a replacement for ESOUND for the desktop. However, this is not practical. JACK is not intended to be a desktop sound server, instead it is designed with professional audio in mind. Its semantics are different from other sound servers: e.g. 
it uses exclusively floating point samples, doesn't deal directly with interleaved channels and maintains a server global time-line which may be stopped and seeked around. All that translates badly to desktop usage. JACK is really nice software, but just not designed for the normal desktop user, who's not working on professional audio production. </p> <p>Since we think that JACK is really a nice piece of work, we added two new modules to Polypaudio which can be used to hook it up to a JACK server.</p> <p><a href="http://0pointer.de/lennart/projects/polypaudio/">Get Polypaudio 0.8, while it is hot!</a></p> <p>BTW: We're looking for a logo for Polypaudio. Feel free to send us your suggestions!</p> <p><i>Update: The Debian rant is unjust to Jeff Waugh. In fact, he had informed me that he prepared Debian packages of Polypaudio. I just never realized that he had actually uploaded them to Debian. What still stands, however, is that I've not been informed or asked about the removal.</i></p> Lennart PoetteringThu, 13 Apr 2006 21:45:00 +0200tag:0pointer.net,2006-04-13:/blog/projects/polypaudio-0.8.htmlprojectsPanoramic View of St.-Pauli-Landungsbruecken, Hamburg, Germanyhttps://0pointer.net/blog/photos/stintfang.html <p>The result of stitching six photos together with <a href="http://hugin.sourceforge.net/">Hugin</a>, <tt>autopano-sift</tt>, and <tt>enblend</tt>:</p> <a href="http://0pointer.de/static/landungsbruecken.html"><img alt="Picture of St.-Pauli-Landungsbrücken" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/qm-1000.jpeg" width="1000" height="242" /></a> <p>The <i>St.-Pauli-Landungsbr&uuml;cken</i> with the <i>Queen Mary 2</i> in the drydock, Hamburg, Germany in November 2005. Photographed from the <i>Stintfang</i>. 
The full image has a size of 9256x2240.</p> Lennart PoetteringTue, 04 Apr 2006 16:43:00 +0200tag:0pointer.net,2006-04-04:/blog/photos/stintfang.htmlphotosPanoramic View of Les Deux Alpeshttps://0pointer.net/blog/photos/2alpes.html <p>The result of stitching 16 photos together with <a href="http://hugin.sourceforge.net/">Hugin</a>, <tt>autopano-sift</tt>, and <tt>enblend</tt>:</p> <a href="http://0pointer.de/static/2alpes.html"><img alt="Picture of Les Deux Alpes" style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/2alpes-small.jpeg" width="1024" height="116" /></a> <p>The <i>Massif du Soreiller</i> with the <i>Aiguille du Plat de la Selle</i> near <i>Les Deux Alpes</i>, France. Photographed from the <i>D&ocirc;me de Puy Sali&eacute;</i>. The full image has a size of 14443x2156. <a href="http://0pointer.de/photos/?gallery=Les%20Deux%20Alpes">More pictures from Les Deux Alpes</a>.</p> Lennart PoetteringSun, 02 Apr 2006 00:48:00 +0200tag:0pointer.net,2006-04-02:/blog/photos/2alpes.htmlphotosLCD Brightness Control on MSI S270 Laptopshttps://0pointer.net/blog/projects/s270ctrl.html <p>In response to <a href="http://mjg59.livejournal.com/57875.html">mjg59</a>'s rant about controlling the LCD brightness on laptops, I invested some time to reverse engineer the Windows driver of my <a href="http://0pointer.de/lennart/tchibo.html">MSI S270</a> laptop which implements changing LCD brightness. It requires some ugly fiddling with registers of the "embedded controller" on ports 0x62 and 0x66. The result of my work is <a href="http://0pointer.de/lennart/projects/s270ctrl/">s270ctrl</a>, a small userspace utility. 
I'm planning to turn this into a proper kernel module soon.</p> Lennart PoetteringMon, 13 Feb 2006 22:25:00 +0100tag:0pointer.net,2006-02-13:/blog/projects/s270ctrl.htmlprojectsAvahi Articles in German "Linux Magazin"https://0pointer.net/blog/projects/avahi-linuxmag.html <p>If you have access to the current issue (03/06) of the German <a href="http://www.linux-magazin.de/">Linux Magazin</a> make sure to read the two extensive articles about <a href="http://avahi.org/">Avahi</a> (p.64 and p.90). Daniel S. Haischt wrote the second article, I wrote the other. Both are a worthy read!</p> Lennart PoetteringMon, 13 Feb 2006 22:22:00 +0100tag:0pointer.net,2006-02-13:/blog/projects/avahi-linuxmag.htmlprojectsDebian Packages of mod_dnssd and mod_mime_xattrhttps://0pointer.net/blog/projects/mod-dnssd-debian.html <p>Due to the great work of Sebastien Estienne there are now Debian packages of <a href="http://0pointer.de/lennart/projects/mod_dnssd/"><tt>mod_dnssd</tt></a> and <a href="http://0pointer.de/lennart/projects/mod_mime_xattr/"><tt>mod_mime_xattr</tt></a> available from my little <a href="http://0pointer.de/debian/">Debian package repository</a>. They've been uploaded to Ubuntu as well, but we are still looking for some Debian developer who would be willing to upload them to Debian proper. Feel free to contact me if you are interested!</p> Lennart PoetteringSun, 12 Feb 2006 01:31:00 +0100tag:0pointer.net,2006-02-12:/blog/projects/mod-dnssd-debian.htmlprojectsPack Ice on the Elbe Riverhttps://0pointer.net/blog/photos/pack-ice.html <p>It has been pretty cold in Hamburg the last few days. 
There's now a thick but holey ice cover on the Elbe river:</p> <div> <a href="http://0pointer.de/photos/?gallery=Hamburg%20Ice%20Floes%202&amp;photo=24"><img src="http://0pointer.de/photos/galleries/Hamburg%20Ice%20Floes%202/lq/img-24.jpg" width="640" height="427" alt="River Elbe" /></a> </div> Lennart PoetteringMon, 30 Jan 2006 02:47:00 +0100tag:0pointer.net,2006-01-30:/blog/photos/pack-ice.htmlphotosAdding Extended Attribute Support to Apache 2.0https://0pointer.net/blog/projects/mod-mime-xattr.html <p>I updated my little Apache module <a href="http://0pointer.de/lennart/projects/mod_mime_xattr/"><tt>mod_mime_xattr</tt></a> to be compatible with Apache 2.0.</p> <p>What is it useful for? Linux (2.4 with patch, 2.6 out-of-the-box) has been supporting <a href="http://acl.bestbits.at/">extended attributes for files (EAs)</a> for ages, but very few applications use them. To change that I wrote a small module for Apache which interprets the EA <tt>user.mime_type</tt> and uses its value as MIME type for all files served by Apache. The EA has been standardized by the <a href="http://www.freedesktop.org/Standards/shared-mime-info-spec">XDG MIME system</a>, but apparently neither Gnome nor KDE support it right now. </p> <p>Usage of <tt>mod_mime_xattr</tt> is simple. To enable interpretation of the EA on the entire tree use something like this in your Apache configuration file:</p>
<pre>&lt;Directory /&gt;
    XAttrMimeType On
&lt;/Directory&gt;
</pre>
<p>That's all that is required to make use of <tt>user.mime_type</tt> on all files where it is set. 
To set the EA use a command like this one:</p> <pre>setfattr -n "user.mime_type" -v "text/html" foo.txt</pre> <p>And <tt>foo.txt</tt> will become a file with the MIME type of <tt>text/html</tt>, although its suffix is <tt>.txt</tt>!</p> Lennart PoetteringMon, 23 Jan 2006 17:24:00 +0100tag:0pointer.net,2006-01-23:/blog/projects/mod-mime-xattr.htmlprojectsAvahi Support for Apachehttps://0pointer.net/blog/projects/mod_dnssd.html <p>The first release of <a href="http://0pointer.de/lennart/projects/mod_dnssd/"><tt>mod_dnssd</tt></a> is now available. It adds DNS-SD based Zeroconf support to Apache 2.0 using <a href="http://avahi.org">Avahi</a>.</p> <p>This work has been inspired by Sander Temme's and Sebastien Estienne's <a href="http://www.temme.net/sander/mod_zeroconf/"><tt>mod_zeroconf</tt></a> module, but supersedes it in every way. MacOSX ships with <tt>mod_rendezvous</tt>/<tt>mod_bonjour</tt>, but <tt>mod_dnssd</tt> is much more powerful than this piece of software as well. In short: <tt>mod_dnssd</tt> is definitely the greatest way to add Zeroconf support to Apache available today.</p> <p>A few examples just to show how great <tt>mod_dnssd</tt> is:</p> <pre> DNSSDEnable On </pre> <p>This is everything you need to enable DNS-SD support in Apache after loading the module. It will publish all virtual hosts and all existing <tt>mod_userdir</tt> directories (i.e. 
<tt>~/public_html</tt>) as services of type <tt>_http._tcp</tt>.</p> <p>In case you want to publish some subdirectory of the web server as a service, just place <tt>DNSSDServiceName</tt> inside a &lt;Location&gt; section for that path:</p>
<pre>
&lt;Location /foobar&gt;
    DNSSDServiceName "A special service called foobar"
&lt;/Location&gt;
</pre>
<p>You can even use it to publish WebDAV shares using Apache's <tt>mod_dav</tt> module:</p>
<pre>
&lt;Location /webdav&gt;
    Dav On
    DNSSDServiceName "A WebDAV folder"
    DNSSDServiceTypes _webdav._tcp
&lt;/Location&gt;
</pre>
<p>This is especially cool since we now have a free software server counterpart for Gnome's and KDE's WebDAV client functionality.</p> <p>Or to publish your blog as an RSS service:</p>
<pre>
&lt;Location /blog.cgi?rss&gt;
    DNSSDServiceName "The blog"
    DNSSDServiceTypes _rss._tcp
&lt;/Location&gt;
</pre>
<p><a href="http://0pointer.de/lennart/projects/mod_dnssd/">Get it while it is hot!</a></p> Lennart PoetteringThu, 19 Jan 2006 16:53:00 +0100tag:0pointer.net,2006-01-19:/blog/projects/mod_dnssd.htmlprojectsAvahi 0.6.3https://0pointer.net/blog/projects/avahi-0.6.3.html <p>A few days ago we released Avahi 0.6.3. This is an important bugfix release, everyone should update as soon as possible.</p> <p>Avahi now has its own domain <a href="http://avahi.org/"><tt>avahi.org</tt></a> and finally has a logo, thanks to the great work of Mathieu Drouet:</p> <p><img src="http://avahi.org/chrome/site/avahi-trac.png" width="200" height="96" alt="Avahi Logo" /></p> <p>Avahi has moved from <a href="ftp://ftp.debian.org/debian/pool/main/a/avahi/">Debian</a> Experimental to Unstable. <a href="http://packages.ubuntu.com/dapper/source/avahi">Ubuntu</a> moved it from Universe to Main since it successfully passed their security auditing. 
The <a href="http://fr2.rpmfind.net/linux/rpm2html/search.php?query=avahi&amp;submit=Search+...&amp;system=fedora&amp;arch=">Fedora Core</a> development distribution contains it too, as do <a href="http://fr2.rpmfind.net/linux/rpm2html/search.php?query=avahi&amp;submit=Search+...&amp;system=suse&amp;arch=">SuSE</a>'s and <a href="http://packages.gentoo.org/ebuilds/?avahi-0.6.2">Gentoo</a>'s. But where's Mandriva? Apparently they are <a href="http://qa.mandriva.com/show_bug.cgi?id=19659">considering it</a>, for whatever it is worth. <a href="http://www.freshports.org/net/avahi">FreeBSD Ports</a> has it too. I guess this means that Avahi has now been accepted by all major distributions. Hurrah!</p> Lennart PoetteringTue, 10 Jan 2006 02:09:00 +0100tag:0pointer.net,2006-01-10:/blog/projects/avahi-0.6.3.htmlprojectsWinterhttps://0pointer.net/blog/photos/winter.html <p>Impressions of the winter in Val Thorens, Savoie, France and in Hamburg, Germany:</p> <div> <a href="http://0pointer.de/photos/?gallery=Val%20Thorens&amp;photo=34"><img src="http://0pointer.de/photos/galleries/Val%20Thorens/lq/img-34.jpg" width="320" height="480" alt="Val Thorens" /></a> &nbsp; <a href="http://0pointer.de/photos/?gallery=Hamburg%20Brook&amp;photo=13"><img src="http://0pointer.de/photos/galleries/Hamburg%20Brook/lq/img-13.jpg" width="320" height="480" alt="Hamburg Duvenstedter Brook" /></a> </div> Lennart PoetteringTue, 10 Jan 2006 01:33:00 +0100tag:0pointer.net,2006-01-10:/blog/photos/winter.htmlphotosFractals with Pythonhttps://0pointer.net/blog/projects/mandelbrot.html <p>It's impressive how easy it is to draw fractals with Python. 
Using the ubercool <a href="http://www.pythonware.com/products/pil/index.htm">Python Imaging Library</a> and native complex number support in Python you can code an elaborate and easy to understand fractal generator in less than 50 lines of code:</p> <pre>
#!/usr/bin/python
import Image, ImageDraw, math, colorsys

dimensions = (800, 800)
scale = 1.0/(dimensions[0]/3)
center = (2.2, 1.5)       # Use this for Mandelbrot set
#center = (1.5, 1.5)       # Use this for Julia set
iterate_max = 100
colors_max = 50

img = Image.new("RGB", dimensions)
d = ImageDraw.Draw(img)

# Calculate a tolerable palette
palette = [0] * colors_max
for i in xrange(colors_max):
    f = 1-abs((float(i)/colors_max-1)**15)
    r, g, b = colorsys.hsv_to_rgb(.66+f/3, 1-f/2, f)
    palette[i] = (int(r*255), int(g*255), int(b*255))

# Calculate the mandelbrot sequence for the point c with start value z
def iterate_mandelbrot(c, z = 0):
    for n in xrange(iterate_max + 1):
        z = z*z +c
        if abs(z) > 2:
            return n
    return None

# Draw our image
for y in xrange(dimensions[1]):
    for x in xrange(dimensions[0]):
        c = complex(x * scale - center[0], y * scale - center[1])

        n = iterate_mandelbrot(c)            # Use this for Mandelbrot set
        #n = iterate_mandelbrot(complex(0.3, 0.6), c)  # Use this for Julia set

        if n is None:
            v = 1
        else:
            v = n/100.0

        d.point((x, y), fill = palette[int(v * (colors_max-1))])

del d
img.save("result.png")
</pre> <p>Some example pictures:</p> <p><a href="http://0pointer.de/public/julia.png"><img src="http://0pointer.de/public/julia-small.png" width="93" height="100" alt="Julia Set" /></a>&nbsp;<a href="http://0pointer.de/public/mandelbrot.png"><img src="http://0pointer.de/public/mandelbrot-small.png" width="113" height="100" alt="Mandelbrot Set" /></a>.</p> Lennart PoetteringTue, 29 Nov 2005 01:31:00 +0100tag:0pointer.net,2005-11-29:/blog/projects/mandelbrot.htmlprojectsIntroducing nss-myhostnamehttps://0pointer.net/blog/projects/nss-myhostname.html <p>I am doing a lot of embedded Linux work lately. 
The machines we use configure their hostname depending on some external configuration options. They boot from a CF card, which is mostly mounted read-only. Since the hostname changes often but we wanted to use <tt>sudo</tt> we had a problem: <tt>sudo</tt> requires the local host name to be resolvable using <tt>gethostbyname()</tt>. On Debian this is usually done by patching <tt>/etc/hosts</tt> correctly. Unfortunately that file resides on a read-only partition. Instead of hacking some ugly symlink based solution I decided to fix it the right way and wrote a tiny NSS module which does nothing more than mapping the hostname to the IP address 127.0.0.2 (and back). (That IP address is on the loopback device, but is not identical to <tt>localhost</tt>.)</p> <p>Get <a href="http://0pointer.de/lennart/projects/nss-myhostname/"><tt>nss-myhostname</tt></a> while it is hot!</p> <p>BTW: <a href="http://0pointer.de/lennart/projects/peekvc/">This tool I wrote</a> is pretty useful on embedded machines too, and certainly easier to use than <tt>setterm -dump 1 -file /dev/stdout | fold -w 80</tt>. And it does color too. And looping. And is much cooler anyway.</p> Lennart PoetteringSun, 20 Nov 2005 02:29:00 +0100tag:0pointer.net,2005-11-20:/blog/projects/nss-myhostname.htmlprojectsMission accomplishedhttps://0pointer.net/blog/projects/avahi-0.6.html <p><a href="http://www.freedesktop.org/Software/Avahi">Avahi</a> 0.6 is now officially released. 
<a href="http://www.freedesktop.org/~lennart/avahi-0.6.tar.gz">Get it</a> while it is hot!</p> <p><a href="http://www.freedesktop.org/~lennart/announcement-0.6">Read the announcement</a>.</p> <p>In related news: I prepared <a href="http://0pointer.de/public/distcc-avahi.patch">a patch for distcc that adds Zeroconf support using Avahi</a>.</p> Lennart PoetteringFri, 18 Nov 2005 23:03:00 +0100tag:0pointer.net,2005-11-18:/blog/projects/avahi-0.6.htmlprojectsAvahi 0.6 in Betahttps://0pointer.net/blog/projects/avahi-0.6-pre.html <p>Unless we find any major bugs <a href="http://www.freedesktop.org/Software/Avahi">Avahi</a> 0.6 will be released on Friday. We ask everyone to do some testing for us:</p> <ul> <li><a href="http://0pointer.de/public/avahi-snapshot.tar.gz">Current Avahi SVN snapshot</a></li> <li><a href="http://0pointer.de/public/libdaemon-snapshot.tar.gz">Current libdaemon SVN snapshot</a></li> </ul> <p>There have been a bunch of <a href="http://0pointer.de/cgi-bin/viewcvs.cgi/*checkout*/trunk/docs/API-CHANGES-0.6">API changes</a>. However, the API is now frozen, so feel free to start porting your application to the new API now.</p> <p>A rough overview of the many improvements in Avahi 0.6:</p> <ul> <li>Support for (read-only) wide area support. (i.e. 
DNS-SD over unicast DNS)</li> <li>Ported to FreeBSD, NetBSD, Darwin/MacOSX and to some extent OpenBSD</li> <li>Compatibility layers for HOWL and Bonjour</li> <li>Support for registering/browsing arbitrary records</li> <li>Proper support for DNS-SD service subtypes</li> <li>Native C implementations of the client utilities</li> <li>Now passes the Bonjour conformance test suite without any exceptions</li> <li>"Passive observation of failures"</li> <li><tt>chroot()</tt> support</li> <li>Many traffic reduction improvements</li> <li>Bugfixes, cleanups</li> </ul> Lennart PoetteringThu, 17 Nov 2005 01:10:00 +0100tag:0pointer.net,2005-11-17:/blog/projects/avahi-0.6-pre.htmlprojectsHamburg Harbourhttps://0pointer.net/blog/photos/hheurokai.html <p>The Eurokai in the Harbour of Hamburg in the early evening:</p> <a href="http://0pointer.de/photos/?gallery=Hamburg%20Eurokai&amp;photo=7"><img src="http://0pointer.de/public/hheurokai.jpeg" alt="Hamburg Eurokai" width="480" height="700" /></a> Lennart PoetteringTue, 01 Nov 2005 15:57:00 +0100tag:0pointer.net,2005-11-01:/blog/photos/hheurokai.htmlphotosAvahi Gains Compatibility Layers for Apple Bonjour and HOWLhttps://0pointer.net/blog/projects/avahi-compat.html <p>A short while ago I checked in to SVN two API/ABI compatibility modules which implement the <a href="http://www.porchdogsoft.com/products/howl/">HOWL</a> and the <a href="http://developer.apple.com/documentation/Networking/Reference/DNSServiceDiscovery_CRef/">Apple Bonjour (<tt>dns_sd.h</tt>)</a> DNS-SD/mDNS APIs on top of Avahi's native API. Effectively this means that you can run <b>*all*</b> Zeroconf-enabled software that is available for free operating systems seamlessly on top of Avahi. Or at least the software that uses the limited subset of API functions we support. Missing functions will be implemented on an on-demand basis. 
Gnome-VFS/Nautilus works perfectly, as does Gobby; these are the only real-world applications we have tested so far.</p> <p>The list of supported/unsupported functions is available from SVN <a href="http://0pointer.de/cgi-bin/viewcvs.cgi/trunk/avahi-compat-howl/funcs.txt?view=auto">for HOWL</a> and for <a href="http://0pointer.de/cgi-bin/viewcvs.cgi/trunk/avahi-compat-libdns_sd/funcs.txt?view=auto"><tt>dns_sd.h</tt></a>.</p> <p>The compatibility layers are actually pretty interesting pieces of code: for compatibility with the way HOWL/Bonjour integrates with event loops we had to hook up the timeout and I/O watches D-BUS depends on to a single file descriptor. This involves all kinds of ugly things like threading and "creative" ways to use the event loop abstraction Avahi provides. Some might call this "cracktastic", but it actually works pretty well.</p> <p>The compatibility layers are not intended to be long term solutions. For every session object we create a background thread that polls for events and a D-BUS session object. This is an utter waste of resources, especially on <tt>dns_sd.h</tt> where every basic operation uses a session object of its own. In addition, our compatibility layers are incomplete. We do not offer the full set of functions or the full semantics. Our compatibility is just good enough to make most Zeroconf-aware programs work with Avahi right now.</p> <p>We consider neither <tt>dns_sd.h</tt> nor the HOWL API a "well designed" API and encourage people to port their programs to our more powerful native API. To stress this the two modules will warn the user about their usage and write a warning line to STDERR and <tt>syslog</tt>. Hopefully this will annoy people sufficiently that Avahi adoption speeds up a little.</p> <p>To our own surprise we actually support at least one API function more than each of the reference implementations! 
From <tt>dns_sd.h</tt> we support <tt>DNSServiceEnumerateDomains()</tt> which is actually unsupported by Apple Bonjour on POSIX/Linux systems. The documented HOWL function <tt>sw_ipv4_address_decompose()</tt> is actually a NOOP in the reference implementation, but isn't in our compatibility layer.</p> <p>Since <tt>dns_sd.h</tt> is the only file licensed under a BSD license in the otherwise APSL-licensed mDNSResponder distribution, we were able to copy it into our sources untouched.</p> <p>Here's <a href="http://0pointer.de/public/avahi-compat.png">a screenshot of Nautilus and Gobby</a> running on top of Avahi through the HOWL compatibility layers.</p> Lennart PoetteringSun, 16 Oct 2005 17:29:00 +0200tag:0pointer.net,2005-10-16:/blog/projects/avahi-compat.htmlprojectsAvahi Gains "Wide-Area" Supporthttps://0pointer.net/blog/projects/avahi-wide-area.html <p>Yesterday in the late evening I committed "Wide Area" support to <a href="http://www.freedesktop.org/Software/Avahi">Avahi</a> SVN, i.e. "DNS-SD over Unicast DNS". Only browsing, no "Long-Lived Query" support and no publishing for now, but it is a start.</p> <p>To show off how cool this is, here is a "screenshot" of <tt>avahi-browse</tt> showing all services defined in the domain <tt>0pointer.de</tt>:</p> <pre>
$ <b>avahi-browse -a -d 0pointer.de</b>
Browsing domain '0pointer.de' on any.-1 ...
Browsing for services of type '_http-rss091._tcp' (Web Syndication RSS 0.91) in domain '0pointer.de' on any.-1 ...
Browsing for services of type '_http-rss20._tcp' (Web Syndication RSS 2.0) in domain '0pointer.de' on any.-1 ...
Browsing for services of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1 ...
Found service 'Lennart's Blog' of type '_http-rss091._tcp' (Web Syndication RSS 0.91) in domain '0pointer.de' on any.-1.
Found service 'Lennart's Blog' of type '_http-rss20._tcp' (Web Syndication RSS 2.0) in domain '0pointer.de' on any.-1.
Found service 'Lennart's Homepage' of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1.
Found service 'Avahi mDNS/DNS-SD' of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1.
Found service 'Lennart's Photos' of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1.
Found service 'Lennart's Blog' of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1.
Service data for service 'Lennart's Blog' of type '_http-rss091._tcp' (Web Syndication RSS 0.91) in domain '0pointer.de' on any.-1:
        Host 0pointer.de (217.160.223.3), port 80, TXT data: ['path=/blog/index.rss']
Service data for service 'Lennart's Blog' of type '_http-rss20._tcp' (Web Syndication RSS 2.0) in domain '0pointer.de' on any.-1:
        Host 0pointer.de (217.160.223.3), port 80, TXT data: ['path=/blog/index.rss2']
Service data for service 'Lennart's Homepage' of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1:
        Host 0pointer.de (217.160.223.3), port 80, TXT data: ['path=/lennart/']
Service data for service 'Avahi mDNS/DNS-SD' of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1:
        Host freedesktop.org (131.252.208.82), port 80, TXT data: ['path=/Software/Avahi']
Service data for service 'Lennart's Photos' of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1:
        Host 0pointer.de (217.160.223.3), port 80, TXT data: ['path=/photos/']
Service data for service 'Lennart's Blog' of type '_http._tcp' (Web Site) in domain '0pointer.de' on any.-1:
        Host 0pointer.de (217.160.223.3), port 80, TXT data: ['path=/blog']
</pre> Lennart PoetteringMon, 26 Sep 2005 17:18:00 +0200tag:0pointer.net,2005-09-26:/blog/projects/avahi-wide-area.htmlprojectsLinux on the MSI S270 aka Cytron/TCM/Medion/Tchibo MD96100https://0pointer.net/blog/projects/tchibo-linux.html <p>I finally found the time to write up my experiences running Linux on my new shiny laptop. 
<a href="http://0pointer.de/lennart/tchibo.html">Read it here</a>.</p> Lennart PoetteringThu, 15 Sep 2005 22:11:00 +0200tag:0pointer.net,2005-09-15:/blog/projects/tchibo-linux.htmlprojectsKDE Ported to Avahihttps://0pointer.net/blog/projects/avahi-kde.html <p>Jakub Stachowski completed support for using Avahi as backend for KDE's KDNSSD subsystem. This means that you can use any Zeroconf-enabled KDE application (including Konqueror) with Avahi as mDNS stack. You can find more information in the <a href="http://wiki.kde.org/tiki-index.php?page=Zeroconf+in+KDE">KDNSSD Wiki</a>.</p> <p>The list of software supporting Avahi grows longer and longer. There are some patches for vino and GnomeMeeting floating around, Rhythmbox already merged DAAP support based on Avahi, KDE is now fully compatible with Avahi. Shall your project be the next in this list? To get started with Avahi, read the <a href="http://www.freedesktop.org/~lennart/doxygen/">developer's documentation</a>.</p> <p>Oh, yes, we released Avahi 0.3 and 0.4 recently. <a href="http://www.freedesktop.org/Software/Avahi">Get it while it's hot</a>. No major changes, just bugfixes and Qt main loop support.</p> Lennart PoetteringThu, 08 Sep 2005 23:52:00 +0200tag:0pointer.net,2005-09-08:/blog/projects/avahi-kde.htmlprojectsAvahi 0.2 Releasehttps://0pointer.net/blog/projects/avahi-0.2-release.html <p>Yesterday we released <a href="http://www.freedesktop.org/Software/Avahi">Avahi 0.2</a>. Get it while it is hot! Full <a href="http://www.freedesktop.org/~lennart/announcement-0.2">announcement here</a>. </p> <p>In related news: Jakub Stachowski is working on a <tt>kdnssd</tt>-to-Avahi bridge. 
Soon KDE applications will be able to make use of Avahi without even knowing.</p> <p>Sebastien's Zeroconf Gnome Applet now has an SVN repository: <tt>svn checkout svn://svn.0pointer.de/service-discovery-applet/trunk service-discovery-applet</tt>.</p> Lennart PoetteringMon, 29 Aug 2005 15:14:00 +0200tag:0pointer.net,2005-08-29:/blog/projects/avahi-0.2-release.htmlprojectsGnomeMeeting Supports Avahihttps://0pointer.net/blog/projects/avahi-sebest.html <p>Sebastien successfully completed porting GnomeMeeting to <a href="http://www.freedesktop.org/Software/Avahi">Avahi</a>. Therefore I declare him the first one to port a "real world" application to Avahi. Hurrah! <a href="http://0pointer.de/public/gnomemeeting-avahi.png">Screenshot here</a>.</p> <p>Shortly after, Sebastien - not lazy - announced his new Zeroconf service browser applet based on Avahi. It contains a drop down menu with all Zeroconf services found on your LAN. If you select a menu item the applet will execute the application that has been defined as Gnome URL handler for the specific protocol.</p> <img src="http://0pointer.de/public/avahi-applet-browser.png" width="381" height="216" alt="s-d-a" /> Lennart PoetteringSat, 27 Aug 2005 15:24:00 +0200tag:0pointer.net,2005-08-27:/blog/projects/avahi-sebest.htmlprojectsAvahi on Linux Weekly Newshttps://0pointer.net/blog/projects/avahi-lwn.html <p>Seems today's edition of <a href="http://lwn.net/">LWN</a> features a front page story about <a href="http://www.freedesktop.org/Software/Avahi">Avahi</a>. 
It's actually quite nice, even though I missed an emphasis on the fact that Avahi's mDNS stack itself is embeddable into applications via a shared library.</p> <p>I guess you'll have to wait a week if you want to read the article without subscription.</p> Lennart PoetteringThu, 25 Aug 2005 13:33:00 +0200tag:0pointer.net,2005-08-25:/blog/projects/avahi-lwn.htmlprojectsMe toohttps://0pointer.net/blog/google-talk.html <p>I am on Google Talk now: <b>poettering (at) googlemail.com</b></p> Lennart PoetteringThu, 25 Aug 2005 01:14:00 +0200tag:0pointer.net,2005-08-25:/blog/google-talk.htmlmiscAvahi 0.1 Finally Releasedhttps://0pointer.net/blog/projects/avahi-0.1-release.html <p>We finally released <a href="http://www.freedesktop.org/Software/Avahi">Avahi 0.1</a>. Full release announcement <a href="http://bur.st/~lathiat/avahi/announcement-0.1">here</a>. Avahi comes with a powerful DBUS API. Just to show off the coolness of that interface, here is a Python example:</p> <pre>
import avahi, dbus, gobject

bus = dbus.SystemBus()
server = dbus.Interface(bus.get_object(avahi.DBUS_NAME, avahi.DBUS_PATH_SERVER), avahi.DBUS_INTERFACE_SERVER)

def new_service(interface, protocol, name, type, domain):
    print "Found service '%s' of type '%s' in domain '%s'" % (name, type, domain)

def remove_service(interface, protocol, name, type, domain):
    print "Service '%s' of type '%s' in domain '%s' disappeared." % (name, type, domain)

path = server.ServiceBrowserNew(avahi.IF_UNSPEC, avahi.PROTO_UNSPEC, "_http._tcp", "")
b = dbus.Interface(bus.get_object(avahi.DBUS_NAME, path), avahi.DBUS_INTERFACE_SERVICE_BROWSER)

b.connect_to_signal('ItemNew', new_service)
b.connect_to_signal('ItemRemove', remove_service)

gobject.MainLoop().run()
</pre> <p>This short program will connect to a running <tt>avahi-daemon</tt> and browse for web services.</p> Lennart PoetteringMon, 22 Aug 2005 00:54:00 +0200tag:0pointer.net,2005-08-22:/blog/projects/avahi-0.1-release.htmlprojectsBee on a Thistlehttps://0pointer.net/blog/photos/biene.html <p>A bee on a thistle in the wheat field behind our house:</p> <a href="http://0pointer.de/photos/?gallery=Hamburg%20Weizenfeld&amp;photo=18"><img src="http://0pointer.de/public/biene.jpeg" alt="Bee on a Thistle" /></a> Lennart PoetteringTue, 16 Aug 2005 23:07:00 +0200tag:0pointer.net,2005-08-16:/blog/photos/biene.htmlphotosSimplified "Draft" Plugin for pyblosxomhttps://0pointer.net/blog/projects/pyblosxom-ignore.html <p>The pyblosxom plugin registry links a <a href="http://pyblosxom.sourceforge.net/blog/registry/authentication/draft">plugin</a> which allows hiding "draft" stories before publishing them, so that only you can see them. Unfortunately the link to this plugin is broken. 
So here's my (simplified) reimplementation:</p> <pre>
def cb_prepare(args):
    request = args["request"]
    query = request.getHttp().get('QUERY_STRING', '')

    if not query.endswith("&amp;ignore") and not query == "ignore":
        data = request.getData()
        data["entry_list"] = filter(lambda e: not e.has_key('ignore'), data["entry_list"])
</pre> <p>To mark a story as "draft" simply insert this at line #2:</p> <pre>
#ignore yes
</pre> <p>To browse unpublished stories simply append <tt>?ignore</tt> (or <tt>&amp;ignore</tt>) to your blog URL.</p> Lennart PoetteringThu, 11 Aug 2005 21:00:00 +0200tag:0pointer.net,2005-08-11:/blog/projects/pyblosxom-ignore.htmlprojectsLinking pyblosxom to SVNhttps://0pointer.net/blog/projects/pyblosxom-svn.html <p>If you run a pyblosxom blog with auto-copied stories from SVN you are probably interested in getting stable story dates that don't change every time you update a story. The date of the initial SVN log entry of a story is something like the "day of birth" of a story, so it's a good value to use. Christopher Baus implemented a <a href="http://www.baus.net/svnpyblosxom">plugin</a> for pyblosxom, which looks overly complicated to me: it depends on memcached and comes in two large Python scripts.</p> <p>To simplify things I wrote this minimal replacement:</p> <pre>
import pysvn, os, sys, anydbm
from config import py

def get_mtime(fname):
    cache_fname = os.path.join(py['datadir'], 'SVNDATES')
    cache = anydbm.open(cache_fname, "c")

    if cache.has_key(fname):
        d = float(cache[fname])
    else:
        client = pysvn.Client(fname)
        l = client.log(fname)

        if len(l) > 0:
            d = l[0]['date']
            cache[fname] = str(d)
        else:
            d = -1

        del client

    del cache
    return d

def cb_filestat(args):
    args["mtime"] = list(args["mtime"])

    d = get_mtime(args["filename"])
    if d >= 0:
        args["mtime"][8] = d

    return args
</pre> <p>Since accessing SVN logs is quite slow the script caches the "date of birth" in a dbm file. 
Make sure that your web server has enough privileges to access that database file which is stored in <tt>$datadir/SVNDATES</tt> by default.</p> Lennart PoetteringWed, 10 Aug 2005 15:57:00 +0200tag:0pointer.net,2005-08-10:/blog/projects/pyblosxom-svn.htmlprojectsAvahi 0.1 Loominghttps://0pointer.net/blog/projects/avahi-0.1.html <p><a href="http://www.freedesktop.org/Software/Avahi">Avahi 0.1</a> is due in the next few days. The last missing piece is a simplifying C wrapper around the DBUS API. Though Avahi is currently pre-0.1 it is already quite complete and mature. To quote Ross Burton: "<i>... this doesnt count as 0.1 because it has docs, man pages *and* works</i>"</p> <p>Unfortunately python-dbus has quite a few bugs which make it very difficult to code with. For example, it doesn't handle sending empty arrays, fails to send byte values, and so on. It is difficult to work around all these issues, therefore the Avahi client tools will not work with an unpatched python-dbus. You need to apply <a href="https://bugs.freedesktop.org/show_bug.cgi?id=4023">this patch</a> (applying to 0.35.2) to fix at least the byte value bug to get them working.</p> Lennart PoetteringWed, 10 Aug 2005 14:13:00 +0200tag:0pointer.net,2005-08-10:/blog/projects/avahi-0.1.htmlprojectsTiger Lilieshttps://0pointer.net/blog/photos/tiger-lilies.html <p>In the garden:</p> <a href="http://0pointer.de/photos/?gallery=Hamburg%20Garden%203&amp;photo=39"><img src="http://0pointer.de/photos/galleries/Hamburg%20Garden%203/lq/img-39.jpg" alt="Tiger Lilies" /></a> Lennart PoetteringWed, 10 Aug 2005 14:11:00 +0200tag:0pointer.net,2005-08-10:/blog/photos/tiger-lilies.htmlphotosWheathttps://0pointer.net/blog/photos/wheat.html <p>The wheat field behind our house:</p> <a href="http://0pointer.de/photos/?gallery=Hamburg%20Garden%203&amp;photo=27"><img src="http://0pointer.de/photos/galleries/Hamburg%20Garden%203/lq/img-27.jpg" alt="Wheat Field" /></a> Lennart PoetteringWed, 10 Aug 2005 14:11:00 
+0200tag:0pointer.net,2005-08-10:/blog/photos/wheat.htmlphotosSt.-Pauli-Elbtunnelhttps://0pointer.net/blog/photos/elbtunnel.html <p>I really like this photo I made in the St.-Pauli-Elbtunnel in Hamburg:</p> <a href="http://0pointer.de/photos/?gallery=Hamburg%20Skyline&amp;photo=19"><img src="http://0pointer.de/photos/galleries/Hamburg%20Skyline/lq/img-19.jpg" alt="St.-Pauli-Elbtunnel" /></a> Lennart PoetteringWed, 10 Aug 2005 12:54:00 +0200tag:0pointer.net,2005-08-10:/blog/photos/elbtunnel.htmlphotos