What is Seccomp?
A large number of system calls are exposed to every userland process with many of them going unused for the entire lifetime of the process. Seccomp filtering provides a means for a process to specify a filter for incoming system calls. The filter is expressed as a Berkeley Packet Filter (BPF) program, as with socket filters, except that the data operated on is related to the system call being made: system call number and the system call arguments.
Seccomp has three primary modes:
SECCOMP_MODE_STRICT: Only the read, write, _exit and sigreturn system calls are allowed; any other system call gets the process killed with SIGKILL.
SECCOMP_MODE_FILTER: Allows the developer/user to restrict system calls via BPF filters.
SECCOMP_MODE_DISABLED: Seccomp is not enabled for the process.
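As a rough illustration of strict mode, the sketch below forks a child, switches it to SECCOMP_MODE_STRICT with prctl, and lets it make a system call that strict mode does not allow. The function name strict_mode_demo and the choice of getppid as the forbidden call are mine, not part of any challenge; the work happens in a child so that the calling process stays unrestricted.

```c
#include <assert.h>
#include <signal.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the signal that terminated a strict-mode child,
 * 0 if the child exited normally, or -1 if fork/waitpid failed. */
static int strict_mode_demo(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* From here on only read, write, _exit and sigreturn are allowed. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);               /* strict mode not available */
        syscall(SYS_getppid);       /* not on the allowed list: kernel sends SIGKILL */
        syscall(SYS_exit, 0);       /* not reached */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Usage: assert(strict_mode_demo() == SIGKILL); */
```

The raw SYS_exit call is used instead of the libc _exit wrapper because the wrapper typically issues exit_group, which strict mode does not allow either.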
We can easily find out which syscalls a process's filter blocks by using seccomp-tools.
Bypassing Seccomp
Bypassing Seccomp from userland is not possible when the filter is written correctly; it can only be evaded when the implementation is flawed. For the purpose of demonstration I will be exploiting the gissa 2 challenge from the Midnight Sun CTF Quals 2019. In this challenge seccomp was not properly implemented and can be bypassed easily. You can download it here.
If we use seccomp-tools on this file we get the following result.
seccomp-tools dump ./gissa_igen
As we can see, syscalls like open, execve, etc. are disabled, but on a closer look we find that this filter differs from a typical seccomp filter.
On lines 3 and 4 of the latter example, the filter blocks syscalls whose number is 0x40000000 or greater. The first example has no such check, so we can try to pass syscall numbers above 0x40000000. On x86-64 this value is the x32 ABI bit (__X32_SYSCALL_BIT): on a kernel built with x32 support, adding 0x40000000 to the original syscall number reaches the same syscall under a number that the blacklist does not match. Thus seccomp is bypassed. As an example, to call write we need to pass the file descriptor (let us assume 0x1), the address of the buffer (let us assume 0xdeadbeef) and the number of bytes to write (0x100). The normal assembly would be:
mov rdi, 0x1
mov rsi, 0xdeadbeef
mov rdx, 0x100
mov rax, 0x1
syscall
The same call, written so that it bypasses the improper implementation, would be:
mov rdi, 0x1
mov rsi, 0xdeadbeef
mov rdx, 0x100
mov rax, 0x40000001
syscall
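To make the difference between the two kinds of filter concrete, here is a small user-space model. It is not the filter from the challenge: the two instruction arrays (naive and hardened), the evaluator run_filter and the constants are my own illustration, and the evaluator only understands the handful of classic BPF instructions used here. A real filter would normally also check seccomp_data.arch first, which is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define X32_BIT   0x40000000u   /* __X32_SYSCALL_BIT on x86-64 */
#define NR_OPEN   2u            /* open on x86-64 */
#define NR_EXECVE 59u           /* execve on x86-64 */
#define LEN(a)    (sizeof(a) / sizeof((a)[0]))

/* Blacklist that only compares exact syscall numbers (the flawed shape). */
static const struct sock_filter naive[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NR_OPEN, 1, 0),    /* open   -> KILL */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NR_EXECVE, 0, 1),  /* execve -> KILL */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Same blacklist, but anything >= 0x40000000 is rejected up front. */
static const struct sock_filter hardened[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, X32_BIT, 2, 0),    /* x32 range -> KILL */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NR_OPEN, 1, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NR_EXECVE, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Evaluate the subset of classic BPF used above for one syscall number. */
static uint32_t run_filter(const struct sock_filter *f, size_t len, uint32_t nr)
{
    struct seccomp_data data;
    uint32_t a = 0;

    memset(&data, 0, sizeof(data));
    data.nr = (int)nr;
    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *i = &f[pc];
        switch (i->code) {
        case BPF_LD | BPF_W | BPF_ABS:          /* A = 32-bit word at offset k */
            assert(i->k + sizeof(a) <= sizeof(data));
            memcpy(&a, (const char *)&data + i->k, sizeof(a));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:         /* offsets are relative to pc + 1 */
            pc += (a == i->k) ? i->jt : i->jf;
            break;
        case BPF_JMP | BPF_JGE | BPF_K:
            pc += (a >= i->k) ? i->jt : i->jf;
            break;
        case BPF_RET | BPF_K:
            return i->k;
        default:                                /* instruction outside the model */
            return SECCOMP_RET_KILL;
        }
    }
    return SECCOMP_RET_KILL;
}
```

In this model, run_filter(naive, LEN(naive), X32_BIT | NR_OPEN) evaluates to SECCOMP_RET_ALLOW, while the hardened array returns SECCOMP_RET_KILL for the same number.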
You can find a detailed writeup of the challenge in the references section.
Advanced Bypassing techniques
In the examples above the seccomp filter is installed at the start of the process, but it can also be installed later. A filter is inherited only by children created after it is installed, so if the process creates a child process before installing the filter, seccomp will not be applied to that child. Thus we can go through the memory of the child process.
One such challenge was write only from Google CTF 2020. In that challenge all syscalls except open and write were blocked. Thus, the flag had to be obtained through the memory of the child process, which was not restricted by the filter. A writeup of that challenge can be found in the references section.
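The effect of fork ordering can be checked with a short experiment. The sketch below is not taken from the challenge: deny_getppid, probe_and_exit, wait_status and fork_order_demo are hypothetical helpers, and the filter simply makes getppid fail with EPERM so that each child can report whether it is filtered. Everything runs in a throwaway process so the filter never touches the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: make getppid() fail with EPERM in the calling process. */
static int deny_getppid(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = { .len = 4, .filter = filter };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Exit with 0 if getppid() worked, otherwise with the errno it produced. */
static void probe_and_exit(void)
{
    errno = 0;
    _exit(syscall(SYS_getppid) == -1 ? errno : 0);
}

/* Exit code of a child, or -1 if it did not exit normally. */
static int wait_status(pid_t pid)
{
    int st;
    if (waitpid(pid, &st, 0) < 0 || !WIFEXITED(st))
        return -1;
    return WEXITSTATUS(st);
}

/* Returns (early << 4) | late, where each part is the exit code of the child
 * forked before / after the filter was installed; 255 or -1 on setup errors. */
static int fork_order_demo(void)
{
    pid_t sandbox = fork();
    if (sandbox < 0)
        return -1;
    if (sandbox == 0) {
        int gate[2], e, l;
        char c;
        pid_t early, late;

        if (pipe(gate) != 0)
            _exit(255);
        early = fork();                     /* created BEFORE the filter */
        if (early < 0)
            _exit(255);
        if (early == 0) {
            close(gate[1]);
            if (read(gate[0], &c, 1) < 0)   /* blocks until the parent is filtered */
                _exit(255);
            probe_and_exit();
        }
        close(gate[0]);
        if (deny_getppid() != 0)
            _exit(255);
        close(gate[1]);                     /* EOF releases the early child */

        late = fork();                      /* created AFTER the filter */
        if (late < 0)
            _exit(255);
        if (late == 0)
            probe_and_exit();

        e = wait_status(early);
        l = wait_status(late);
        if (e < 0 || l < 0)
            _exit(255);
        _exit((e << 4) | l);
    }
    return wait_status(sandbox);
}

/* Usage: assert(fork_order_demo() == EPERM);  early part 0, late part EPERM */
```

The early child only probes after the parent has installed its filter, so a zero in the upper part of the result shows that the filter was not applied to it retroactively.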
Disabling Seccomp
Seccomp filters are expressed as Berkeley Packet Filter programs and are enforced by the kernel in the __secure_computing function (more info here). The seccomp state of a process is stored in the seccomp struct, which is inside the task_struct (more info here). The kernel sets the TIF_SECCOMP bit to 1 to indicate that seccomp is enabled.
If we manage to change the TIF_SECCOMP bit to 0 then we would disable seccomp in the process. In kernel mode the gs register points to per-CPU data, and a pointer to the task_struct of the current process is stored there, at an offset of 0x15d00 from the base of the gs register in the kernel used here. Dereferencing this pointer, we find the flags variable, whose bit 8 (TIF_SECCOMP) is the bit we want to reset.
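#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>    /* current, struct task_struct */

MODULE_LICENSE("GPL");

/* Clear TIF_SECCOMP in the calling task's thread_info flags so that
 * the syscall entry path no longer runs the seccomp checks for it. */
void disable_seccomp(void)
{
    current->thread_info.flags &= ~(_TIF_SECCOMP);
}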
The module above disables seccomp in the current process. Here the _TIF_SECCOMP macro is used, whose value is equal to 1<<(TIF_SECCOMP).
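The bit arithmetic can be tried out in user space. This is only a model of the flag update, assuming TIF_SECCOMP is 8 as described above; TIF_SECCOMP_MASK stands in for the kernel's _TIF_SECCOMP (renamed because identifiers of that form are reserved in user programs) and clear_seccomp_flag is a hypothetical helper.

```c
#include <assert.h>

#define TIF_SECCOMP      8                      /* bit index, as described in the text */
#define TIF_SECCOMP_MASK (1UL << TIF_SECCOMP)   /* stands in for the kernel's _TIF_SECCOMP */

/* Return the flags word with only the seccomp bit cleared. */
static unsigned long clear_seccomp_flag(unsigned long flags)
{
    return flags & ~TIF_SECCOMP_MASK;
}

/* Usage: assert(clear_seccomp_flag(0x1a5UL) == 0x0a5UL); */
```

All other bits of the flags word are left as they were, which is why the module uses &= with the inverted mask rather than assigning a new value.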
Writing the Shellcode
The above code works fine. To convert it to working shellcode, we can run the following command to get the assembly.
objdump -M intel -d (kernel module name)
We can then use the shellcode that is generated... but it won't work. If we look carefully, we find that the offset from the gs register is wrong. This is because fs and gs are exceptions that were added to address thread-specific data. Their real base addresses are stored in MSRs (model-specific registers) instead of the descriptor table. The MSRs are only accessible in kernel mode. Thus we have to write the assembly manually and then convert it to shellcode using defuse.ca.
I will not provide the exact details on how to generate the shellcode since it is a challenge in the pwn college kernel module. You can always ask for hints on this challenge on their official Discord server. If you want to try this exact challenge, solve the babykernel_level8.0 challenge in the kernel module of pwn college.
References
Article 1 : https://ajxchapman.github.io/linux/2016/08/31/seccomp-and-seccomp-bpf.html
Article 2 : http://blog.redrocket.club/2019/04/11/midnightsunctf-quals-2019-gissa2/
Article 3 : https://blog.bi0s.in/2020/08/24/Pwn/GCTF20-Writeonly/
Article 4 : https://reverseengineering.stackexchange.com/questions/21033/windbg-why-does-the-gs-register-resolve-to-offset-0x0