Tag: debian (subscribe)

I am not on Freenode anymore


Tags:

This is a quick public announcement to say that I am not on the Freenode IRC network anymore. My nickname (sergiodj), which was more than a decade old, has just been deleted (along with every other nickname and channel in that network) from their database today, 2021-06-14.

For your safety, you should assume that everybody you knew at Freenode is not there either, even if you see their nicknames online. Do not trust without verifying. In fact, I would strongly encourage that you do not join Freenode anymore: their new policies are absolutely questionable and their disregard for their users is blatant.

If you would like to chat with me, you can find me at OFTC (preferred) and Libera.


Hi there. Long time no write!

On Tuesday, February 23, 2021, I made an announcement at debian-devel-announce about a new service that I configured for Debian: a debuginfod server.

This post serves two purposes: pay the promise I made to Jonathan Carter that I would write a blog post about the service, and go into a bit more detail about it.

What's debuginfod?

From the announcement above:

debuginfod is a new-ish project whose purpose is to serve
ELF/DWARF/source-code information over HTTP.  It is developed under the
elfutils umbrella.  You can find more information about it here:

  https://sourceware.org/elfutils/Debuginfod.html

In a nutshell, by using a debuginfod service you will not need to
install debuginfo (a.k.a. dbgsym) files anymore; the symbols will be
served to GDB (or any other debuginfo consumer that supports debuginfod)
over the network.  Ultimately, this makes the debugging experience much
smoother (I myself never remember the full URL of our debuginfo
repository when I need it).

Perhaps not everybody knows this, but until last year I was a Debugger Engineer (a.k.a. GDB hacker) at Red Hat. I was not involved with the creation of debuginfod directly, but I witnessed discussions about "having way to serve debug symbols over the internet" multiple times during my tenure at the company. So this is not a new idea, and it's not even the first implementation, but it's the first time that some engineers actually got their hands dirty enough to have something concrete to show.

The idea to set up a debuginfod server for Debian started to brew after 2019's GNU Tools Cauldron, but as usual several things happened in $LIFE (including a global pandemic and leaving Red Hat and starting a completely different job at Canonical) which had the effect of shuffling my TODO list "a little".

Benefits for Debian

Debian unfortunately is lagging behind when it comes to offer its users a good debugging experience. Before the advent of our debuginfod server, if you wanted to debug a package in Debian you would need to:

  1. Add the debian-debug apt repository to your /etc/apt/sources.list.

  2. Install the dbgsym package that contains the debug symbols for the package you are debugging. Note that the version of the dbgsym package needs to be exactly the same as the version of the package you want to debug.

  3. Figure out which shared libraries your package uses and install the dbgsym packages for all of them. Arguably, this step is optional but recommended if you would like to perform a more in-depth debugging.

  4. Download the package source, possibly using apt source or some equivalent command.

  5. Open GDB, and make sure you adjust the source paths properly (more below). This can be non-trivial.

  6. Finally, debug the program.

Now, with the new service, you will be able to start from step 4, without having to mess with sources.list, dbgsym packages and version mismatches.

The package source

It is important to mention an existing (but perhaps not well-known) limitation of our debugging experience in Debian: the need to manually download the source packages and adjust GDB to properly find them (see step 4 above). debuginfod is able to serve source code as well, but our Debian instance is not doing that at the moment.

Debian does not provide a patched source tree that is ready to be consumed by GDB or debuginfod (for a good example of a distribution that does this, see Fedora's debugsource packages). Let me show you an example of debugging GDB itself (using debuginfod) on Debian:

$ HOME=/tmp DEBUGINFOD_URLS=https://debuginfod.debian.net gdb -q gdb
Reading symbols from gdb...
Downloading separate debug info for /tmp/gdb...
Reading symbols from /tmp/.cache/debuginfod_client/02046bac4352940d19d9164bab73b2f5cefc8c73/debuginfo...
(gdb) start
Temporary breakpoint 1 at 0xd18e0: file /build/gdb-Nav6Es/gdb-10.1/gdb/gdb.c, line 28.
Starting program: /usr/bin/gdb 
Downloading separate debug info for /lib/x86_64-linux-gnu/libreadline.so.8...
Downloading separate debug info for /lib/x86_64-linux-gnu/libz.so.1...
Downloading separate debug info for /lib/x86_64-linux-gnu/libncursesw.so.6...
Downloading separate debug info for /lib/x86_64-linux-gnu/libtinfo.so.6...
Downloading separate debug info for /tmp/.cache/debuginfod_client/d6920dbdd057f44edaf4c1fbce191b5854dfd9e6/debuginfo...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Downloading separate debug info for /lib/x86_64-linux-gnu/libexpat.so.1...
Downloading separate debug info for /lib/x86_64-linux-gnu/liblzma.so.5...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libbabeltrace.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libbabeltrace-ctf.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libipt.so.2...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libmpfr.so.6...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libsource-highlight.so.4...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libxxhash.so.0...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libdebuginfod.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libstdc++.so.6...
Downloading separate debug info for /lib/x86_64-linux-gnu/libgcc_s.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0...
Downloading separate debug info for /tmp/.cache/debuginfod_client/dbfea245d26065975b4084f4e9cd2d83c65973ee/debuginfo...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libdw.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libelf.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libuuid.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libgmp.so.10...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.74.0...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4...
Downloading separate debug info for /lib/x86_64-linux-gnu/libbz2.so.1.0...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libicui18n.so.67...
Downloading separate debug info for /tmp/.cache/debuginfod_client/acaa831dbbc8aa70bb2131134e0c83206a0701f9/debuginfo...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libicuuc.so.67...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libnghttp2.so.14...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libidn2.so.0...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/librtmp.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libssh2.so.1...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libpsl.so.5...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libnettle.so.8...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libgnutls.so.30...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libbrotlidec.so.1...
Downloading separate debug info for /tmp/.cache/debuginfod_client/39739740c2f8a033de95c1c0b1eb8be445610b31/debuginfo...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libunistring.so.2...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libhogweed.so.6...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libgcrypt.so.20...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libp11-kit.so.0...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libtasn1.so.6...
Downloading separate debug info for /lib/x86_64-linux-gnu/libcom_err.so.2...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libsasl2.so.2...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libbrotlicommon.so.1...
Downloading separate debug info for /lib/x86_64-linux-gnu/libgpg-error.so.0...
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/libffi.so.7...
Downloading separate debug info for /lib/x86_64-linux-gnu/libkeyutils.so.1...

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffebf8) at /build/gdb-Nav6Es/gdb-10.1/gdb/gdb.c:28
28      /build/gdb-Nav6Es/gdb-10.1/gdb/gdb.c: Directory not empty.
(gdb) list
23      in /build/gdb-Nav6Es/gdb-10.1/gdb/gdb.c
(gdb) 

(See all those Downloading separate debug info for... lines? Nice!)

As you can see, when we try to list the contents of the file we're in, nothing shows up. This happens because GDB doesn't know where the file is. So you have to tell it. In this case, it's relatively easy: you see that the GDB package's build directory is /build/gdb-Nav6Es/gdb-10.1/. When you apt source gdb, you will end up with a directory called $PWD/gdb-10.1/ containing the full source of the package. Notice that the last directory's name in both paths is the same, so in this case we can use GDB's set substitute-path command do the job for us (in this example $PWD is /tmp/):

$ HOME=/tmp DEBUGINFOD_URLS=https://debuginfod.debian.net gdb -q gdb
Reading symbols from gdb...
Reading symbols from /tmp/.cache/debuginfod_client/02046bac4352940d19d9164bab73b2f5cefc8c73/debuginfo...
(gdb) set substitute-path /build/gdb-Nav6Es/ /tmp/
(gdb) start
Temporary breakpoint 1 at 0xd18e0: file /build/gdb-Nav6Es/gdb-10.1/gdb/gdb.c, line 28.
Starting program: /usr/bin/gdb 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffebf8) at /build/gdb-Nav6Es/gdb-10.1/gdb/gdb.c:28
warning: Source file is more recent than executable.
28        memset (&args, 0, sizeof args);
(gdb) list
23      int
24      main (int argc, char **argv)
25      {
26        struct captured_main_args args;
27
28        memset (&args, 0, sizeof args);
29        args.argc = argc;
30        args.argv = argv;
31        args.interpreter_p = INTERP_CONSOLE;
32        return gdb_main (&args);
(gdb)

Much better, huh? The problem is that this process is manual, and changes depending on how the package you're debugging was built.

What can we do to improve this? What I personally would like to see is something similar to what the Fedora project already does: create a new debug package which will contain the full, patched source package. This would mean changing our building infrastructure and possibly other somewhat complex things.

Using the service (by default)

At the time of this writing, I am working on an elfutils Merge Request whose purpose is to implement a debconf question to ask the user whether she wants to use our service by default.

If you would like to start using the service right now, all you have to do is set the following environment variable in your shell:

DEBUGINFOD_URLS="https://debuginfod.debian.net"

More information

You can find more information about our debuginfod service here. Try to keep an eye on the page as it's being constantly updated.

If you'd like to get in touch with me, my email is my domain at debian dot org.

I sincerely believe that this service is a step in the right direction, and hope that it can be useful to you :-).


Back in September, we had the GNU Tools Cauldron in the gorgeous city of Montréal (perhaps I should write a post specifically about it...). One of the sessions we had was the GDB BoF, where we discussed, among other things, how to improve our patch review system.

I have my own personal opinions about the current review system we use (mailing list-based, in a nutshell), and I haven't felt very confident to express it during the discussion. Anyway, the outcome was that at least 3 global maintainers have used or are currently using the Gerrit Code Review system for other projects, are happy with it, and that we should give it a try. Then, when it was time to decide who wanted to configure and set things up for the community, I volunteered. Hey, I'm already running the Buildbot master for GDB, what is the problem to manage yet another service? Oh, well.

Before we dive into the details involved in configuring and running gerrit in a machine, let me first say that I don't totally support the idea of migrating from mailing list to gerrit. I volunteered to set things up because I felt the community (or at least the its most active members) wanted to try it out. I don't necessarily agree with the choice.

Ah, and I'm writing this post mostly because I want to be able to close the 300+ tabs I had to open on my Firefox during these last weeks, when I was searching how to solve the myriad of problems I faced during the set up!

The initial plan

My very initial plan after I left the session room was to talk to the sourceware.org folks and ask them if it would be possible to host our gerrit there. Surprisingly, they already have a gerrit instance up and running. It's been set up back in 2016, it's running an old version of gerrit, and is pretty much abandoned. Actually, saying that it has been configured is an overstatement: it doesn't support authentication, user registration, barely supports projects, etc. It's basically what you get from a pristine installation of the gerrit RPM package in RHEL 6.

I won't go into details here, but after some discussion it was clear to me that the instance on sourceware would not be able to meet our needs (or at least what I had in mind for us), and that it would be really hard to bring it to the quality level I wanted. I decided to go look for other options.

The OSCI folks

Have I mentioned the OSCI project before? They are absolutely awesome. I really love working with them, because so far they've been able to meet every request I made! So, kudos to them! They're the folks that host our GDB Buildbot master. Their infrastructure is quite reliable (I never had a single problem), and Marc Dequénes (Duck) is very helpful, friendly and quick when replying to my questions :-).

So, it shouldn't come as a surprise the fact that when I decided to look for other another place to host gerrit, they were my first choice. And again, they delivered :-).

Now, it was time to start thinking about the gerrit set up.

User registration?

Over the course of these past 4 weeks, I had the opportunity to learn a bit more about how gerrit does things. One of the first things that negatively impressed me was the fact that gerrit doesn't handle user registration by itself. It is possible to have a very rudimentary user registration "system", but it relies on the site administration manually registering the users (via htpasswd) and managing everything by him/herself.

It was quite obvious to me that we would need some kind of access control (we're talking about a GNU project, with a copyright assignment requirement in place, after all), and the best way to implement it is by having registered users. And so my quest for the best user registration system began...

Gerrit supports some user authentication schemes, such as OpenID (not OpenID Connect!), OAuth2 (via plugin) and LDAP. I remembered hearing about FreeIPA a long time ago, and thought it made sense using it. Unfortunately, the project's community told me that installing FreeIPA on a Debian system is really hard, and since our VM is running Debian, it quickly became obvious that I should look somewhere else. I felt a bit sad at the beginning, because I thought FreeIPA would really be our silver bullet here, but then I noticed that it doesn't really offer a self-service user registration.

After exchanging a few emails with Marc, he told me about Keycloak. It's a full-fledged Identity Management and Access Management software, supports OAuth2, LDAP, and provides a self-service user registration system, which is exactly what we needed! However, upon reading the description of the project, I noticed that it is written in Java (JBOSS, to be more specific), and I was afraid that it was going to be very demanding on our system (after all, gerrit is also a Java program). So I decided to put it on hold and take a look at using LDAP...

Oh, man. Where do I start? Actually, I think it's enough to say that I just tried installing OpenLDAP, but gave up because it was too cumbersome to configure. Have you ever heard that LDAP is really complicated? I'm afraid this is true. I just didn't feel like wasting a lot of time trying to understand how it works, only to have to solve the "user registration" problem later (because of course, OpenLDAP is just an LDAP server).

OK, so what now? Back to Keycloak it is. I decided that instead of thinking that it was too big, I should actually install it and check it for real. Best decision, by the way!

Setting up Keycloak

It's pretty easy to set Keycloak up. The official website provides a .tar.gz file which contains the whole directory tree for the project, along with helper scripts, .jar files, configuration, etc. From there, you just need to follow the documentation, edit the configuration, and voilà.

For our specific setup I chose to use PostgreSQL instead of the built-in database. This is a bit more complicated to configure, because you need to download the JDBC driver, and install it in a strange way (at least for me, who is used to just editing a configuration file). I won't go into details on how to do this here, because it's easy to find on the internet. Bear in mind, though, that the official documentation is really incomplete when covering this topic! This is one of the guides I used, along with this other one (which covers MariaDB, but can be adapted to PostgreSQL as well).

Another interesting thing to notice is that Keycloak expects to be running on its own virtual domain, and not under a subdirectory (e.g, https://example.org instead of https://example.org/keycloak). For that reason, I chose to run our instance on another port. It is supposedly possible to configure Keycloak to run under a subdirectory, but it involves editing a lot of files, and I confess I couldn't make it fully work.

A last thing worth mentioning: the official documentation says that Keycloak needs Java 8 to run, but I've been using OpenJDK 11 without problems so far.

Setting up Gerrit

The fun begins now!

The gerrit project also offers a .war file ready to be deployed. After you download it, you can execute it and initialize a gerrit project (or application, as it's called). Gerrit will create a directory full of interesting stuff; the most important for us is the etc/ subdirectory, which contains all of the configuration files for the application.

After initializing everything, you can try starting gerrit to see if it works. This is where I had my first trouble. Gerrit also requires Java 8, but unlike Keycloak, it doesn't work out of the box with OpenJDK 11. I had to make a small but important addition in the file etc/gerrit.config:

[container]
    ...
    javaOptions = "--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED"
    ...

After that, I was able to start gerrit. And then I started trying to set it up for OAuth2 authentication using Keycloak. This took a very long time, unfortunately. I was having several problems with Gerrit, and I wasn't sure how to solve them. I tried asking for help on the official mailing list, and was able to make some progress, but in the end I figured out what was missing: I had forgotten to add the AddEncodedSlashes On in the Apache configuration file! This was causing a very strange error on Gerrit (as you can see, a java.lang.StringIndexOutOfBoundsException!), which didn't make sense. In the end, my Apache config file looks like this:

<VirtualHost *:80>
    ServerName gnutoolchain-gerrit.osci.io

    RedirectPermanent / https://gnutoolchain-gerrit.osci.io/r/
</VirtualHost>

<VirtualHost *:443>
    ServerName gnutoolchain-gerrit.osci.io

    RedirectPermanent / /r/

    SSLEngine On
    SSLCertificateFile /path/to/cert.pem
    SSLCertificateKeyFile /path/to/privkey.pem
    SSLCertificateChainFile /path/to/chain.pem

    # Good practices for SSL
    # taken from: <https://mozilla.github.io/server-side-tls/ssl-config-generator/>

    # intermediate configuration, tweak to your needs
    SSLProtocol             all -SSLv3
    SSLCipherSuite          ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS
    SSLHonorCipherOrder     on
    SSLCompression          off
    SSLSessionTickets       off

    # OCSP Stapling, only in httpd 2.3.3 and later
    #SSLUseStapling          on
    #SSLStaplingResponderTimeout 5
    #SSLStaplingReturnResponderErrors off
    #SSLStaplingCache        shmcb:/var/run/ocsp(128000)

    # HSTS (mod_headers is required) (15768000 seconds = 6 months)
    Header always set Strict-Transport-Security "max-age=15768000"

    ProxyRequests Off
    ProxyVia Off
    ProxyPreserveHost On
    <Proxy *>
        Require all granted
    </Proxy>

    AllowEncodedSlashes On
        ProxyPass /r/ http://127.0.0.1:8081/ nocanon
        #ProxyPassReverse /r/ http://127.0.0.1:8081/r/
</VirtualHost>

I confess I was almost giving up Keycloak when I finally found the problem...

Anyway, after that things went more smoothly. I was finally able to make the user authentication work, then I made sure Keycloak's user registration feature also worked OK...

Ah, one interesting thing: the user logout wasn't really working as expected. The user was able to logout from gerrit, but not from Keycloak, so when the user clicked on "Sign in", Keycloak would tell gerrit that the user was already logged in, and gerrit would automatically log the user in again! I was able to solve this by redirecting the user to Keycloak's logout page, like this:

[auth]
    ...
    logoutUrl = https://keycloak-url:port/auth/realms/REALM/protocol/openid-connect/logout?redirect_uri=https://gerrit-url/
    ...

After that, it was already possible to start worrying about configure gerrit itself. I don't know if I'll write a post about that, but let me know if you want me to.

Conclusion

If you ask me if I'm totally comfortable with the way things are set up now, I can't say that I am 100%. I mean, the set up seems robust enough that it won't cause problems in the long run, but what bothers me is the fact that I'm using technologies that are alien to me. I'm used to setting up things written in Python, C, C++, with very simple yet powerful configuration mechanisms, and an easy to discover what's wrong when something bad happens.

I am reasonably satisfied with the Keycloak logs things, but Gerrit leaves a lot to be desired in that area. And both projects are written in languages/frameworks that I am absolutely not comfortable with. Like, it's really tough to debug something when you don't even know where the code is or how to modify it!

All in all, I'm happy that this whole adventure has come to an end, and now all that's left is to maintain it. I hope that the GDB community can make good use of this new service, and I hope that we can see a positive impact in the quality of the whole patch review process.

My final take is that this is all worth as long as the Free Software and the User Freedom are the ones who benefit.

P.S.: Before I forget, our gerrit instance is running at https://gnutoolchain-gerrit.osci.io.


Back in 2016, when life was simpler, a Fedora GDB user reported a bug (or a feature request, depending on how you interpret it) saying that GDB's gcore command did not respect the COREFILTER_ELF_HEADERS flag, which instructs it to dump memory pages containing ELF headers. As you may or may not remember, I have already written about the broader topic of revamping GDB's internal corefile dump algorithm; it's an interesting read and I recommend it if you don't know how Linux (or GDB) decides which mappings to dump to a corefile.

Anyway, even though the bug was interesting and had to do with a work I'd done before, I couldn't really work on it at the time, so I decided to put it in the TODO list. Of course, the "TODO list" is actually a crack where most things fall through and are usually never seen again, so I was blissfully ignoring this request because I had other major priorities to deal with. That is, until a seemingly unrelated problem forced me to face this once and for all!

What? A regression? Since when?

As the Fedora GDB maintainer, I'm routinely preparing new releases for Fedora Rawhide distribution, and sometimes for the stable versions of the distro as well. And I try to be very careful when dealing with new releases, because a regression introduced now can come and bite us (i.e., the Red Hat GDB team) back many years in the future, when it's sometimes too late or too difficult to fix things. So, a mandatory part of every release preparation is to actually run a regression test against the previous release, and make sure that everything is working correctly.

One of these days, some weeks ago, I had finished running the regression check for the release I was preparing when I noticed something strange: a specific, Fedora-only corefile test was FAILing. That's a no-no, so I started investigating and found that the underlying reason was that, when the corefile was being generated, the build-id note from the executable was not being copied over. Fedora GDB has a local patch whose job is to, given a corefile with a build-id note, locate the corresponding binary that generated it. Without the build-id note, no binary was being located.

Coincidentally or not, at the same I started noticing some users reporting very similar build-id issues on the freenode's #gdb channel, and I thought that this bug had a potential to become a big headache for us if nothing was done to fix it right now.

I asked for some help from the team, and we managed to discover that the problem was also happening with upstream gcore, and that it was probably something that binutils was doing, and not GDB. Hmm...

Ah, so it's ld's fault. Or is it?

So there I went, trying to confirm that it was binutils's fault, and not GDB's. Of course, if I could confirm this, then I could also tell the binutils guys to fix it, which meant less work for us :-).

With a lot of help from Keith Seitz, I was able to bisect the problem and found that it started with the following commit:

commit f6aec96dce1ddbd8961a3aa8a2925db2021719bb
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Feb 27 11:34:20 2018 -0800

    ld: Add --enable-separate-code

This is a commit that touches the linker, which is part of binutils. So that means this is not GDB's problem, right?!? Hmm. No, unfortunately not.

What the commit above does is to simply enable the use of --enable-separate-code (or -z separate-code) by default when linking an ELF program on x86_64 (more on that later). On a first glance, this change should not impact the corefile generation, and indeed, if you tell the Linux kernel to generate a corefile (for example, by doing sleep 60 & and then hitting C-\), you will notice that the build-id note is included into it! So GDB was still a suspect here. The investigation needed to continue.

What's with -z separate-code?

The -z separate-code option makes the code segment in the ELF file to put in a completely separated segment than data segment. This was done to increase the security of generated binaries. Before it, everything (code and data) was put together in the same memory region. What this means in practice is that, before, you would see something like this when you examined /proc/PID/smaps:

00400000-00401000 r-xp 00000000 fc:01 798593                             /file
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd ex mr mw me dw sd

And now, you will see two memory regions instead, like this:

00400000-00401000 r--p 00000000 fc:01 799548                             /file
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         4 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd mr mw me dw sd
00401000-00402000 r-xp 00001000 fc:01 799548                             /file
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd ex mr mw me dw sd

A few minor things have changed, but the most important of them is the fact that, before, the whole memory region had anonymous data in it, which means that it was considered an anonymous private mapping (anonymous because of the non-zero Anonymous amount of data; private because of the p in the r-xp permission bits). After -z separate-code was made default, the first memory mapping does not have Anonymous contents anymore, which means that it is now considered to be a file-backed private mapping instead.

GDB, corefile, and coredump_filter

It is important to mention that, unlike the Linux kernel, GDB doesn't have all of the necessary information readily available to decide the exact type of a memory mapping, so when I revamped this code back in 2015 I had to create some heuristics to try and determine this information. If you're curious, take a look at the linux-tdep.c file on GDB's source tree, specifically at the functions dump_mapping_p and linux_find_memory_regions_full.

When GDB is deciding which memory regions should be dumped into the corefile, it respects the value found at the /proc/PID/coredump_filter file. The default value for this file is 0x33, which, according to core(5), means:

Dump memory pages that are either anonymous private, anonymous
shared, ELF headers or HugeTLB.

GDB had the support implemented to dump almost all of these pages, except for the ELF headers variety. And, as you can probably infer, this means that, before the -z separate-code change, the very first memory mapping of the executable was being dumped, because it was marked as anonymous private. However, after the change, the first mapping (which contains only data, no code) wasn't being dumped anymore, because it was now considered by GDB to be a file-backed private mapping!

Finally, that is the reason for the difference between corefiles generated by GDB and Linux, and also the reason why the build-id note was not being included in the corefile anymore! You see, the first memory mapping contains not only the program's data, but also its ELF headers, which in turn contain the build-id information.

gcore, meet ELF headers

The solution was "simple": I needed to improve the current heuristics and teach GDB how to determine if a mapping contains an ELF header or not. For that, I chose to follow the Linux kernel's algorithm, which basically checks the first 4 bytes of the mapping and compares them against \177ELF, which is ELF's magic number. If the comparison succeeds, then we just assume we're dealing with a mapping that contains an ELF header and dump it.

In all fairness, Linux just dumps the first page (4K) of the mapping, in order to save space. It would be possible to make GDB do the same, but I chose the faster way and just dumped the whole mapping, which, in most scenarios, shouldn't be a big problem.

It's also interesting to mention that GDB will just perform this check if:

  • The heuristic has decided not to dump the mapping so far, and;
  • The mapping is private, and;
  • The mapping's offset is zero, and;
  • There is a request to dump mappings with ELF headers (i.e., coredump_filter).

Linux also makes these checks, by the way.

The patch, finally

I submitted the patch to the mailing list, and it was approved fairly quickly (with a few minor nits).

The reason I'm writing this blog post is because I'm very happy and proud with the whole process. It wasn't an easy task to investigate the underlying reason for the build-id failures, and it was interesting to come up with a solution that extended the work I did a few years ago. I was also able to close a few bug reports upstream, as well as the one reported against Fedora GDB.

The patch has been pushed, and is also present at the latest version of Fedora GDB for Rawhide. It wasn't possible to write a self-contained testcase for this problem, so I had to resort to using an external tool (eu-unstrip) in order to guarantee that the build-id note is correctly present in the corefile. But that's a small detail, of course.

Anyway, I hope this was an interesting (albeit large) read!


Heya!

This past Saturday, April 27th, 2019, Samuel Vale, Alex Volkov and I organized the Toronto Bug Squashing Party here in the city. I was very happy with the outcome, especially the fact that we had more than 10 people attending, including a bunch of folks that came from Montréal!

The start

It was a cold day in Toronto, and we met at the Mozilla Toronto office at 9 in the morning. Right there at the door I met anarcat, who had just arrived from Montréal. Together with Alex, we waited for Will to arrive and open the door for us. Then, some more folks started showing up, and we waited until 10:30h to start the first presentation of the day.

Packaging 101

Anarcat kindly gave us his famous "Packaging 101" presentation, in which he explains the basics of Debian packaging. Here's a picture of the presentation:

anarcat presenting Packaging 101, side

And another one:

anarcat presenting Packaging 101, front

The presentation was great, and Alex recorded it! You can watch it here (sorry, youtube link...).

During the day, we've also taught a few tricks about the BTS, in order to help people file bugs, add/remove tags, comment on bugs, etc.

Then, we moved on to the actual hacking.

Bug fixing

This part took most of the day, as was expected. We started by looking at the RC bugs currently filed against Buster, and deciding which ones would be interesting for us. I won't go into details here, but I think we made great progress, considering this was the first BSP for many of us there (myself included).

You can look at the bugs we worked on, and you will see that we have actually fixed 6 of them! I even fixed a JavaScript bug, which is something totally out of my area of expertise ;-).

I also noticed something interesting. The way we look at bugs can vary wildly between one DD and another. I mean, this is something I always knew, especially when I was more involved with the debian-mentors effort, but it's really amazing to feel this in person. I tend to be more picky when it comes to defining what to do when I start to work on a bug; I try really hard to reproduce it (and spend a lot of time doing so), and will really dive deep into the code trying to understand why some test is failing. Other developer may be less "pedantic", and choose to (e.g.) disable certain test that is failing. In the end, I think everything is a balance and I tried to learn from this experience.

Anyway, given that we looked at 12 bugs and solved 6, I think we did great! And this also helped me to get my head "back in the Debian game"; I was too involved with GDB these past months (there's a post about one of the things I did which is coming soon, stay tunned).

Look at us hacking:

Everybody hacking

Wrap up

At 19h (or 7p.m.), we had to wrap up and prepare to go. Because we had a sizeable number of Brazilians in the group (5!), the logical thing to do was to go to a pub and resume the conversation there :-). If I say it was one of the first times I went to a pub to drink with newly made friends in Toronto, you probably wouldn't believe, so I won't say anything...

I know one thing for sure: we want to make this again, and soon! In fact, my idea is to do another one after Buster is released (and after the summer is gone, of course), so maybe October. We'll see.

Acknowledgements

I would like to thank Mozilla Toronto for hosting us; it was awesome to finally visit their office and enjoy their hospitality, personified by Will Hawkins. It is impossible not to thank anarcat, who came all the way from Montréal to give us his Debian Packaging 101 talk. Speaking of the French-Canadian (and Brazilian), it was super awesome meeting Tiago Vaz and Tássia Camões, and it was great seeing Valessio Brito again.

Let me also thank the "locals" who attended the party; it was great seeing everybody there! Hope I can see everybody again when we make the second edition of our BSP :-).


Hello, Planet Debian


Tags:

Hey, there. This is long overdue: my entry in Planet Debian! I'm creating this post because, until now, I didn't have a debian tag in my blog! Well, not anymore.

Stay tunned!