Question
Understanding the flow of the kernel upon receiving a SIGSEGV for null-dereference
I'm trying to figure out the sequence of events inside the Linux kernel (x86_64, v6.9) when a program executes either of these two statements:
// Null-dereference + writing to page zero
*(char *)0 = 0;
// Null-dereference + only reading from page zero
char c = *(char *)0;
I traced it with ftrace, and this is what I got:
handle_mm_fault <-- do_user_addr_fault
sanitize_fault_flags <-- handle_mm_fault
arch_vma_access_permitted <-- handle_mm_fault
bad_area_nosemaphore <-- do_user_addr_fault
__bad_area_nosemaphore <-- do_user_addr_fault
force_sig_fault <-- __bad_area_nosemaphore
So from my understanding, we cause a page fault, and somehow arch_vma_access_permitted() or sanitize_fault_flags() decides to return VM_FAULT_SIGSEGV, and __bad_area_nosemaphore() uses that to send a SIGSEGV to the process with force_sig_fault().
My question is: what are the permissions of page zero? Does it even get mapped in the first place? If it doesn't, then I think vma_is_foreign() should cover this situation and cause the segmentation fault.
I also found something interesting in load_elf_binary(), which emulates legacy ABI behavior:
if (current->personality & MMAP_PAGE_ZERO) {
        /* Why this, you ask??? Well SVr4 maps page 0 as read-only,
           and some applications "depend" upon this behavior.
           Since we do not have the power to recompile these, we
           emulate the SVr4 behavior. Sigh. */
        error = vm_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
                        MAP_FIXED | MAP_PRIVATE, 0);
}
The most confusing part is the PROT_EXEC. Why would we need to store instructions in page zero? And since the mapping also has PROT_READ, reading from page zero should not cause a segmentation fault while current->personality has MMAP_PAGE_ZERO set, right? I couldn't find the SVr4 spec, so I'm not sure about the details.
I'm also not certain when this personality applies to a task, but we can conclude that in some scenarios page zero does get mapped. (Of course we can map it ourselves with mmap() if mmap_min_addr is zero, but I'm asking about the default behavior here, not remapping.) I can't find any other vm_mmap() or do_mmap() call that maps page zero.
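For anyone who wants to check their own setup, here is a small sketch that queries whether MMAP_PAGE_ZERO is set for the calling task and reads the current mmap_min_addr. The function names page_zero_flag_set and read_mmap_min_addr are my own. It relies on personality(0xffffffff) returning the current persona without changing it, and on /proc/sys/vm/mmap_min_addr being readable. Since the check quoted above lives in load_elf_binary(), my assumption is that the flag only matters at the next execve(), not for the already-running image.

```c
#include <stdio.h>
#include <sys/personality.h>

/*
 * Returns 1 if MMAP_PAGE_ZERO is set in the calling task's personality,
 * 0 if it is clear, and -1 if the query failed.
 */
int page_zero_flag_set(void)
{
    /* 0xffffffff queries the current persona without modifying it. */
    int persona = personality(0xffffffff);
    if (persona == -1)
        return -1;
    return (persona & MMAP_PAGE_ZERO) ? 1 : 0;
}

/*
 * Returns the value of vm.mmap_min_addr as exposed under /proc,
 * or -1 if the file cannot be opened or parsed.
 */
long read_mmap_min_addr(void)
{
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

On a task started normally I would expect page_zero_flag_set() to return 0, which would be consistent with page zero not being mapped by default.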