A reimplementation of the UOJ run program in Go: go-judger. I started it after discovering libseccomp, which builds on the seccomp filter mode introduced in Linux 3.5 (2012). Since I had participated in that project (UOJ) only a little, I decided to try to make some contributions.
Original implementation
The original run program restricted resources (CPU, memory, output) and file access with ptrace. It included the following steps:
Setup steps after fork, in the child:
- Set resource limits with setrlimit
- Set environment variables
- Set input / output files
- execv
Tracing after fork, in the parent:
- Set up ptrace options when trapped at execv
- wait4 at syscall entrance
  - Check resource usage, wait status, signals and the syscall blacklist to decide whether to terminate or soft ban
- ptrace syscall to enter the syscall
- wait4 at syscall exit
  - Set the syscall return value
- ptrace syscall to exit the syscall
In this scenario, the traced process has to stop at every syscall, on both entry and exit. For harmless syscalls (e.g. brk, read), this introduces unnecessary overhead.
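The cost of stopping at every syscall can be observed with a small tracer. The following is only a sketch in Go, not the original run program: `countSyscallStops` is a hypothetical helper that runs a command under PTRACE_SYSCALL and counts how many syscall stops the tracer has to service. It assumes Linux, permission to ptrace a child process, and that `/bin/true` exists.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"syscall"
)

// countSyscallStops runs path under PTRACE_SYSCALL and returns how many
// syscall-entry and syscall-exit stops the tracer had to handle.
func countSyscallStops(path string, args ...string) (int, error) {
	// ptrace requests must come from the thread that started the tracee.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Ptrace: true} // PTRACE_TRACEME in the child
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	pid := cmd.Process.Pid

	// The child is trapped right after its exec; set the options there.
	var ws syscall.WaitStatus
	if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
		return 0, err
	}
	if err := syscall.PtraceSetOptions(pid, syscall.PTRACE_O_TRACESYSGOOD); err != nil {
		cmd.Process.Kill()
		return 0, err
	}

	stops, sig := 0, 0
	for {
		// Resume until the next syscall entry or exit, forwarding any signal.
		if err := syscall.PtraceSyscall(pid, sig); err != nil {
			cmd.Process.Kill()
			return stops, err
		}
		if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
			return stops, err
		}
		if ws.Exited() || ws.Signaled() {
			return stops, nil
		}
		sig = 0
		if ws.StopSignal() == syscall.SIGTRAP|0x80 {
			stops++ // a real judge would inspect the registers here
		} else {
			sig = int(ws.StopSignal()) // ordinary signal: pass it on
		}
	}
}

func main() {
	n, err := countSyscallStops("/bin/true")
	if err != nil {
		fmt.Println("tracing failed:", err)
		return
	}
	fmt.Println("syscall stops handled by the tracer:", n)
}
```

Every stop counted here costs a round trip between the tracee, the kernel and the tracer, even for a program that does nothing interesting.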
Reimplementation
With the seccomp BPF filter provided by libseccomp, these harmless syscalls are handled by the kernel, which avoids excessive context switches. Also, a single traced syscall triggers a seccomp event only once.
Thus, the new implementation becomes:
Setup step after fork in child:
- Set resource limits by setrlimit
- Set input / output files
- Load seccomp filter
- Stop itself by SIGSTOP
- execve with environment variables
Tracing after fork, in the parent:
- Set up ptrace options when trapped by SIGSTOP
- wait4 at seccomp event
  - Check resource usage, wait status and signals, and call the syscall event handlers. The handler determines whether to terminate or soft ban
- ptrace continue to enter the syscall
Notice that the SIGSTOP before execve is required: if execve is traced but the ptrace options have not been set up yet, ENOSYS will be returned to execve. Safe syscalls are allowed by the filter, so they trigger no ptrace events.
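The split between allowed and traced syscalls can be sketched as a hand-built classic BPF program. This is not libseccomp's API and not necessarily the filter go-judger generates: `buildFilter` and `run` are hypothetical helpers, the x86-64 architecture check and the kill-on-wrong-architecture policy are assumptions, and `run` is a toy interpreter covering only the three opcodes used here, so that the result can be inspected without loading anything into the kernel.

```go
package main

import "fmt"

// Classic BPF opcodes and seccomp constants, copied from
// <linux/bpf_common.h>, <linux/seccomp.h> and <linux/audit.h>.
const (
	opLoadAbs = 0x20 // BPF_LD | BPF_W | BPF_ABS
	opJumpEq  = 0x15 // BPF_JMP | BPF_JEQ | BPF_K
	opReturn  = 0x06 // BPF_RET | BPF_K

	retKill  = 0x00000000 // SECCOMP_RET_KILL
	retTrace = 0x7ff00000 // SECCOMP_RET_TRACE
	retAllow = 0x7fff0000 // SECCOMP_RET_ALLOW

	offNr   = 0 // offsetof(struct seccomp_data, nr)
	offArch = 4 // offsetof(struct seccomp_data, arch)

	auditArchX8664 = 0xc000003e // AUDIT_ARCH_X86_64
)

// insn has the same layout as the kernel's struct sock_filter.
type insn struct {
	Code   uint16
	Jt, Jf uint8
	K      uint32
}

// buildFilter allows the listed syscalls inside the kernel and reports
// everything else to the tracer.
func buildFilter(allowed []uint32) []insn {
	prog := []insn{
		{opLoadAbs, 0, 0, offArch},
		{opJumpEq, 1, 0, auditArchX8664}, // right architecture: skip the kill
		{opReturn, 0, 0, retKill},
		{opLoadAbs, 0, 0, offNr},
	}
	for _, nr := range allowed {
		prog = append(prog,
			insn{opJumpEq, 0, 1, nr}, // no match: skip the allow
			insn{opReturn, 0, 0, retAllow},
		)
	}
	return append(prog, insn{opReturn, 0, 0, retTrace})
}

// run interprets the three opcodes above for one (arch, nr) pair.
func run(prog []insn, arch, nr uint32) uint32 {
	var acc uint32
	for pc := 0; pc < len(prog); pc++ {
		in := prog[pc]
		switch in.Code {
		case opLoadAbs:
			if in.K == offArch {
				acc = arch
			} else {
				acc = nr
			}
		case opJumpEq: // jump offsets are relative to the next instruction
			if acc == in.K {
				pc += int(in.Jt)
			} else {
				pc += int(in.Jf)
			}
		case opReturn:
			return in.K
		}
	}
	return retKill
}

func main() {
	prog := buildFilter([]uint32{0, 12}) // read and brk on x86-64
	for _, nr := range []uint32{0, 12, 2} { // 2 is open
		fmt.Printf("syscall %d -> %#x\n", nr, run(prog, auditArchX8664, nr))
	}
}
```

Because the program is just a slice of fixed-size instructions, it can be prepared in the parent and handed to the kernel later without further allocation.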
Also, by setting the syscall number to -1 and writing the return value to the register, the soft ban mechanism becomes much more efficient.
With all that implemented, process_vm_readv is used instead of ptrace peekdata to speed up copying syscall arguments.
Conclusion
In conclusion, by restricting CPU time, memory, output, syscalls and file access, the run program is able to block potential attacks.
Since the Go language does not provide an official implementation of fork because of the runtime duplication issue, it took some time to figure out how to use the raw syscall interface. Because I cannot call any Go function in the child after fork, I buffered the seccomp filter in advance so that it can be loaded after fork. Also, process_vm_readv is not provided, so I wrote a wrapper for it.