Improve gcore and support dumping ELF headers
Back in 2016, when life was simpler, a Fedora GDB user
reported a bug
(or a feature request, depending on how you interpret it) saying that
the gcore command did not respect the coredump_filter flag that
instructs it to dump memory pages containing ELF headers.
As you may or may not remember, I have
written about the broader topic of revamping GDB's internal corefile dump algorithm;
it's an interesting read and I recommend it if you don't know how
Linux (or GDB) decides which mappings to dump to a corefile.
Anyway, even though the bug was interesting and had to do with work I'd done before, I couldn't really work on it at the time, so I decided to put it in the TODO list. Of course, the "TODO list" is actually a crack where most things fall through and are usually never seen again, so I was blissfully ignoring this request because I had other major priorities to deal with. That is, until a seemingly unrelated problem forced me to face this once and for all!
What? A regression? Since when?
As the Fedora GDB maintainer, I'm routinely preparing new releases for Fedora Rawhide distribution, and sometimes for the stable versions of the distro as well. And I try to be very careful when dealing with new releases, because a regression introduced now can come and bite us (i.e., the Red Hat GDB team) back many years in the future, when it's sometimes too late or too difficult to fix things. So, a mandatory part of every release preparation is to actually run a regression test against the previous release, and make sure that everything is working correctly.
One of these days, some weeks ago, I had finished running the regression check for the release I was preparing when I noticed something strange: a specific, Fedora-only corefile test was FAILing. That's a no-no, so I started investigating and found that the underlying reason was that, when the corefile was being generated, the build-id note from the executable was not being copied over. Fedora GDB has a local patch whose job is to, given a corefile with a build-id note, locate the corresponding binary that generated it. Without the build-id note, no binary was being located.
Coincidentally or not, at the same time I started noticing some users
reporting very similar build-id issues on freenode, and I thought
that this bug had the potential to become a big
headache for us if nothing was done to fix it right now.
I asked for some help from the team, and we managed to discover that
the problem was also happening with upstream
gcore, and that it was
probably something that binutils was doing, and not GDB. Hmm...
Ah, so it's ld's fault. Or is it?
So there I went, trying to confirm that it was binutils's fault, and not GDB's. Of course, if I could confirm this, then I could also tell the binutils guys to fix it, which meant less work for us :-).
With a lot of help from Keith Seitz, I was able to bisect the problem and found that it started with the following commit:
commit f6aec96dce1ddbd8961a3aa8a2925db2021719bb
Author: H.J. Lu <firstname.lastname@example.org>
Date:   Tue Feb 27 11:34:20 2018 -0800

    ld: Add --enable-separate-code
This is a commit that touches the linker, which is part of binutils. So that means this is not GDB's problem, right?!? Hmm. No, unfortunately not.
What the commit above does is simply enable the use of
--enable-separate-code (or -z separate-code) by default when
linking an ELF program on x86_64 (more on that later). At first
glance, this change should not impact corefile generation, and
indeed, if you tell the Linux kernel to generate a corefile (for
example, by doing sleep 60 & and then hitting C-\), you will
notice that the build-id note is included in it! So GDB was
still a suspect here. The investigation needed to continue.
The -z separate-code option puts the code of the ELF file in a
segment completely separate from the data segment. This was
done to increase the security of generated binaries. Before it,
everything (code and data) was put together in the same memory
region. What this means in practice is that, before, you would see
something like this when you examined /proc/PID/smaps:
00400000-00401000 r-xp 00000000 fc:01 798593 /file
Size: 4 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Anonymous: 4 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd ex mr mw me dw sd
And now, you will see two memory regions instead, like this:
00400000-00401000 r--p 00000000 fc:01 799548 /file
Size: 4 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 4 kB
Private_Dirty: 0 kB
Referenced: 4 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd mr mw me dw sd

00401000-00402000 r-xp 00001000 fc:01 799548 /file
Size: 4 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Anonymous: 4 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd ex mr mw me dw sd
A few minor things have changed, but the most important of them is the
fact that, before, the whole memory region had anonymous data in
it, which means that it was considered an anonymous private
mapping (anonymous because of the non-zero Anonymous amount of
data; private because of the
p in the
r-xp permission bits).
After -z separate-code was made the default, the first memory mapping
does not have Anonymous contents anymore, which means that it is
now considered to be a file-backed private mapping instead.
GDB, corefile, and coredump_filter
It is important to mention that, unlike the Linux kernel, GDB doesn't
have all of the necessary information readily available to decide the
exact type of a memory mapping, so when I revamped this code back in
2015 I had to create some heuristics to try and determine this
information. If you're curious, take a look at the
linux_find_memory_regions_full function in GDB's source tree.
When GDB is deciding which memory regions should be dumped into the
corefile, it respects the value found at the
/proc/PID/coredump_filter file. The default value for this file is
0x33, which, according to the documentation, means:
Dump memory pages that are either anonymous private, anonymous shared, ELF headers or HugeTLB.
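To make the 0x33 value less magical, here is a small sketch that decodes a coredump_filter value into the mapping types it selects. The bit positions follow the core(5) manual page; the table and function names are made up for this illustration and are not GDB's.

```python
# Bit meanings as documented in core(5). The names below are illustrative
# only and do not correspond to identifiers in GDB or the kernel.
COREDUMP_FILTER_BITS = {
    0: "anonymous private",
    1: "anonymous shared",
    2: "file-backed private",
    3: "file-backed shared",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
}


def decode_coredump_filter(value):
    """Return the descriptions of the bits set in a coredump_filter value."""
    return [name for bit, name in sorted(COREDUMP_FILTER_BITS.items())
            if value & (1 << bit)]


print(decode_coredump_filter(0x33))
```

Running it on 0x33 lists the anonymous private, anonymous shared, ELF headers and private huge pages bits, which matches the description quoted above.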
GDB had the support implemented to dump almost all of these pages,
except for the ELF headers variety. And, as you can probably infer,
this means that, before the
-z separate-code change, the very first
memory mapping of the executable was being dumped, because it was
marked as anonymous private. However, after the change, the first
mapping (which contains only data, no code) wasn't being dumped
anymore, because it was now considered by GDB to be a file-backed
private mapping.
Finally, that is the reason for the difference between corefiles generated by GDB and Linux, and also the reason why the build-id note was not being included in the corefile anymore! You see, the first memory mapping contains not only the program's data, but also its ELF headers, which in turn contain the build-id information.
gcore, meet ELF headers
The solution was "simple": I needed to improve the current heuristics
and teach GDB how to determine if a mapping contains an ELF header or
not. For that, I chose to follow the Linux kernel's algorithm, which
basically checks the first 4 bytes of the mapping and compares them
against \177ELF, which is ELF's magic number. If the comparison
succeeds, then we just assume we're dealing with a mapping that
contains an ELF header and dump it.
In all fairness, Linux just dumps the first page (4K) of the mapping, in order to save space. It would be possible to make GDB do the same, but I chose the faster way and just dumped the whole mapping, which, in most scenarios, shouldn't be a big problem.
It's also interesting to mention that GDB will just perform this check if:
- The heuristic has decided not to dump the mapping so far, and;
- The mapping is private, and;
- The mapping's offset is zero, and;
- There is a request to dump mappings with ELF headers (i.e., the corresponding coredump_filter bit is set).
Linux also makes these checks, by the way.
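The conditions above can be sketched in a few lines of Python. This is only an illustration of the logic, not the code that went into GDB: the function name, its parameters, and the ELF_HEADERS_BIT constant are hypothetical, and the bit position is taken from core(5).

```python
ELF_MAGIC = b"\x7fELF"      # the same four bytes as the octal escape \177ELF
ELF_HEADERS_BIT = 1 << 4    # "ELF headers" bit of coredump_filter, per core(5)


def final_dump_decision(dump_so_far, is_private, offset,
                        coredump_filter, first_bytes):
    """Apply the ELF-header check on top of an earlier dump decision.

    All parameter names are hypothetical; first_bytes stands for the
    beginning of the mapping's contents as read from the process.
    """
    if dump_so_far:
        return True  # already selected, no need to look at the contents
    if not is_private or offset != 0:
        return False
    if not (coredump_filter & ELF_HEADERS_BIT):
        return False
    return first_bytes[:4] == ELF_MAGIC


print(final_dump_decision(False, True, 0, 0x33, b"\x7fELF\x02\x01\x01"))
```

Note that the contents are only inspected once all the cheaper conditions have passed.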
The patch, finally
I submitted the patch to the mailing list, and it was approved fairly quickly (with a few minor nits).
The reason I'm writing this blog post is that I'm very happy with, and proud of, the whole process. It wasn't an easy task to investigate the underlying reason for the build-id failures, and it was interesting to come up with a solution that extended the work I did a few years ago. I was also able to close a few bug reports upstream, as well as the one reported against Fedora GDB.
The patch has been pushed upstream,
and is also present in the latest version of Fedora GDB for Rawhide.
It wasn't possible to write a self-contained testcase for this
problem, so I had to resort to using an external tool
in order to guarantee that the build-id note is correctly present in
the corefile. But that's a small detail, of course.
Anyway, I hope this was an interesting (albeit large) read!
Memory mappings, core dumps, GDB and Linux
After spending the last weeks struggling with this, I decided to write a blog post. First, what is “this” that you are talking about? The answer is: the Linux kernel's concept of memory mapping. I found it utterly confusing, beyond my expectations, and so I believe that a blog post is the right way to (a) preserve and (b) share this knowledge. So, let's do it!
First things first
First, I cannot begin this post without a few acknowledgements and “thank you's”. The first goes to Oleg Nesterov (sorry, I could not find his website), a Linux kernel guru who really helped me a lot through the whole task. Another “thank you” goes to Jan Kratochvil, who also provided valuable feedback by commenting on my GDB patch. Now, back to the point.
The task, as requested, was this: GDB
needed to respect the
/proc/<PID>/coredump_filter file when generating
a coredump (i.e., when you use the gcore command).
Currently, GDB has its own coredump mechanism implemented which, despite its limitations and bugs, has been around for quite some time. However, as you may not know, the Linux kernel has its own algorithm for generating the corefile of a process. And unfortunately, GDB and Linux were not really following the same standards here...
So, in the end, the task was about synchronizing GDB and Linux. To do
that, I first had to decipher the contents of the /proc/<PID>/smaps file.
This special file, generated by the Linux kernel when you read it,
contains detailed information about each memory mapping of a certain
process. Some of the fields on this file are documented in the
manpage, but others are missing there (asking for a patch!). Here is an
explanation of everything I needed:
The first line of each memory mapping has the following format:
address perms offset dev inode pathname
The fields here are:
a) address is the address range, in the process' address space, that the mapping occupies. This part was already treated by GDB, so I did not have to worry about it.
b) perms is a set of permissions (r = read, w = write, x = execute, s = shared, p = private [COW -- copy-on-write]) applied to the memory mapping. GDB was already dealing with rwx permissions, but I needed to include the p flag as well. I also made GDB ignore the mappings that did not have the r flag active, because it does not make sense to dump something that you cannot read.
c) offset is the offset into the file, if the mapping is file-backed (see below). GDB already handled this correctly.
d) dev is the device (major:minor) related to the file, if there is one. GDB already handled this correctly, though I was using this field for more things (continue reading).
e) inode is the inode on the device above. The value of zero means that no inode is associated with the memory mapping. Nothing to do here.
f) pathname is the file associated with this mapping, if there is one. This is one of the most important fields that I had to use, and one of the most complicated to understand completely. GDB now uses this to heuristically identify whether the mapping is anonymous or not.
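As a rough illustration of the fields above, here is a minimal Python sketch that splits the first line of a mapping into its parts. The Mapping type and the parse_mapping_header name are invented for this example; GDB does this in C, and differently.

```python
from collections import namedtuple

# Hypothetical container for the six fields described above.
Mapping = namedtuple("Mapping", "start end perms offset dev inode pathname")


def parse_mapping_header(line):
    """Parse 'address perms offset dev inode pathname' into a Mapping."""
    # Split at most 5 times so that a pathname containing spaces
    # (for example one ending in " (deleted)") stays in one piece.
    fields = line.split(None, 5)
    address, perms, offset, dev, inode = fields[:5]
    pathname = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in address.split("-"))
    # The offset is hexadecimal; the inode is decimal.
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), pathname)


print(parse_mapping_header("00400000-00401000 r-xp 00000000 fc:01 798593 /file"))
```

A mapping with no associated file simply has an empty pathname and an inode of zero.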
GDB is now also interested in the Anonymous: and AnonHugePages: fields from the smaps file. Those fields represent the content of anonymous data on the mapping; if GDB finds that this content is greater than zero, this means that the mapping is anonymous.
The last, but perhaps most important field, is the VmFlags: field. It contains a series of two-letter flags that provide very useful information about the mapping. A description of the flags I needed is:
a) sh: the mapping is shared (VM_SHARED)
b) dd: this mapping should not be dumped in a corefile
c) ht: this is a HugeTLB mapping
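The remaining lines of each smaps entry are simple "Key: value" pairs, so collecting the sizes and the VmFlags is straightforward. The sketch below assumes the layout shown in smaps output (sizes expressed in kB, flags separated by spaces); parse_smaps_fields is a made-up name.

```python
def parse_smaps_fields(lines):
    """Collect the kB-sized fields and the VmFlags of one smaps entry.

    Returns (sizes_kb, vmflags); vmflags is None when the kernel did not
    print a VmFlags: line at all.
    """
    sizes_kb = {}
    vmflags = None
    for line in lines:
        key, _, rest = line.partition(":")
        if key == "VmFlags":
            vmflags = rest.split()
        elif rest.strip().endswith("kB"):
            sizes_kb[key] = int(rest.split()[0])
    return sizes_kb, vmflags


entry = [
    "Anonymous: 4 kB",
    "AnonHugePages: 0 kB",
    "THPeligible: 0",
    "VmFlags: rd ex mr mw me dw sd",
]
sizes, flags = parse_smaps_fields(entry)
print(sizes, flags)
```

Keeping "no VmFlags line" distinct from "an empty list of flags" matters, because the private/shared decision falls back to the permission bits only when the field is absent.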
With that in hand, the next task was to determine whether a memory mapping is anonymous or file-backed, private or shared.
Types of memory mappings
There can be four types of memory mappings:
- Anonymous private mapping
- Anonymous shared mapping
- File-backed private mapping
- File-backed shared mapping
It should be possible to uniquely identify each mapping based on the
information provided by the
smaps file; however, you will see that
this is not always the case. Below, I will explain how to determine each
of the four characteristics that define a mapping.
A mapping is anonymous if one of these conditions applies:
- The pathname associated with it is either /SYSV%08x (deleted) or <filename> (deleted) (see below).
- There is content in the Anonymous: or in the AnonHugePages: fields of the mapping in the smaps file.
A special explanation is needed for the
<filename> (deleted) case. It
is not always guaranteed that it identifies an anonymous mapping; in
fact, it is possible to have the
(deleted) part for file-backed
mappings as well (say, when you are running a program that uses shared
libraries, and those shared libraries have been removed because of an
update, for example). However, we are trying to mimic the behavior of
the Linux kernel here, which checks to see if a file has no hard links
associated with it (and therefore is truly deleted).
Although it may be possible for userspace to do an extensive check
(by stat-ing the file, for example), the Linux kernel certainly could
give more information about this.
A mapping is file-backed (i.e., not anonymous) if:
- The pathname associated with it contains a <filename>, without the (deleted) part.
As has been explained above, a mapping whose pathname contains the
(deleted) string could still be file-backed, but we decide to consider
it anonymous.
It is also worth mentioning that a mapping can be simultaneously
anonymous and file-backed: this happens when the mapping contains a
pathname (without the (deleted) part), but also contains
Anonymous: or AnonHugePages: contents.
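The anonymous and file-backed rules can be written down as two small predicates. This is a simplified sketch of the heuristic as described here, not GDB's implementation; in particular, treating any pathname that starts with "/" as a <filename> is an assumption of this example.

```python
DELETED_SUFFIX = " (deleted)"


def is_anonymous(pathname, anonymous_kb, anon_huge_pages_kb):
    """True if the pathname is marked (deleted) or anonymous content exists."""
    if pathname.endswith(DELETED_SUFFIX):
        return True
    return anonymous_kb > 0 or anon_huge_pages_kb > 0


def is_file_backed(pathname):
    """True if there is a filename and it is not marked (deleted).

    Assumption for this sketch: a filename is any pathname starting with '/'.
    """
    return pathname.startswith("/") and not pathname.endswith(DELETED_SUFFIX)


# A mapping of /file with 4 kB of anonymous data is both at once.
print(is_anonymous("/file", 4, 0), is_file_backed("/file"))
# The same file with no anonymous data is only file-backed.
print(is_anonymous("/file", 0, 0), is_file_backed("/file"))
```

The two predicates are deliberately independent, since a mapping may satisfy both.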
A mapping is considered to be private (i.e., not shared) if:
- In the absence of the VmFlags field (in the smaps file), its permission field has the flag p.
- If the VmFlags field is present, then the mapping is private if we do not find the sh flag there.
A mapping is shared (i.e., not private) if:
- In the absence of VmFlags in the smaps file, the permission field of the mapping does not have the p flag. Not having this flag actually means VM_MAYSHARE and not necessarily VM_SHARED (which is what we want), but it is the best approximation we have.
- If the VmFlags field is present, then the mapping is shared if we find the sh flag there.
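In code, the private/shared rules boil down to preferring VmFlags when it exists and falling back to the permission bits otherwise. A minimal sketch, with a hypothetical function name and vmflags passed as a list of two-letter flags (or None when the field is absent):

```python
def is_private(perms, vmflags=None):
    """Decide private vs. shared for one mapping.

    perms is the four-character permission field (for example "r-xp");
    vmflags is the list of two-letter flags, or None if smaps has no
    VmFlags: line for this mapping.
    """
    if vmflags is not None:
        # Authoritative: "sh" corresponds to VM_SHARED.
        return "sh" not in vmflags
    # Approximation: a missing "p" really means VM_MAYSHARE.
    return perms.endswith("p")


print(is_private("r-xp"))                        # falls back to the p flag
print(is_private("rw-s", ["rd", "wr", "mr"]))    # no "sh" flag present
```

The second call shows why VmFlags is preferred: the permission field lacks p, yet the mapping is not marked sh, so it is treated as private.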
With all that in mind, I hacked GDB to improve the coredump mechanism for GNU/Linux operating systems. The main function which decides the memory mappings that will or will not be dumped on GNU/Linux is linux_find_memory_regions_full; the Linux kernel obviously uses its own function, vma_dump_size, to do the same thing.
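To give an idea of how these pieces fit together, here is a much-simplified sketch of the final decision. It is not linux_find_memory_regions_full: the function name and parameters are hypothetical, HugeTLB and ELF-header handling are left out, the bit positions come from core(5), and letting a mapping that is both anonymous and file-backed match either bit is an assumption of this example.

```python
def should_dump(perms, vmflags, anonymous, file_backed, private, coredump_filter):
    """Simplified decision on whether one mapping goes into the corefile."""
    if not perms.startswith("r"):
        return False  # unreadable mappings are never dumped
    if vmflags is not None and "dd" in vmflags:
        return False  # the kernel marked this mapping as "do not dump"
    wanted = 0
    if anonymous:
        wanted |= (1 << 0) if private else (1 << 1)
    if file_backed:
        wanted |= (1 << 2) if private else (1 << 3)
    return bool(coredump_filter & wanted)


# Private mapping of a file that also holds anonymous data: dumped with 0x33.
print(should_dump("r-xp", ["rd", "ex"], True, True, True, 0x33))
# Purely file-backed private mapping: not selected by 0x33.
print(should_dump("r--p", ["rd"], False, True, True, 0x33))
```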
Linux has one advantage: it is a kernel, and therefore has much more
knowledge about processes' internals than a userspace program. For
example, inside Linux it is trivial to check if a file marked as
"(deleted)" in the output of the smaps file has no hard links
associated with it (and therefore is truly deleted); the same
operation in userspace, however, would require root access.
The case described above, if you remember, is something that impacts the
ability to tell whether a mapping is anonymous or not. I am talking to
the Linux kernel guys to see if it is possible to export this
information directly via the
smaps file, instead of having to guess it in userspace.
While doing this work, some strange behaviors were found in the Linux
kernel. Oleg is working on them, along with other Linux hackers. From
our side, there is still room for improvement on this code. The first
thing I can think of is to improve the heuristics for finding anonymous
mappings. Another relatively easy thing to do would be to let the user
specify a value for
coredump_filter on the command line, without
editing the /proc file. And of course, keep this code always updated
with its counterpart in the Linux kernel.
Upstream discussions and commit
If you are interested, you can see the discussions that happened upstream by going to this link. This is the fourth (and final) submission of the patch; you should be able to find the other submissions in the archive.
The final commit can be found in the official repository.