SK-SKB eBPF hook

Miscellaneous Content

Before I get into the technical details, I want to share the state I was in while writing this post about eBPF. I am pretty sure most people do not care and will skip this section, but I still think it's important for me to share a couple of lines first. I read a lot of blogs, and I feel most of them lack the engaging part, the intention behind them. I want to try a different style of blog writing (I am not saying I invented it, I am merely using it), one that is fun and might also help you learn something.

I recently started climbing (specifically bouldering). It's a tough sport and requires a lot of upper-body strength. I am writing this blog after 90 minutes of bouldering. Yes, my forearms are killing me and I fell multiple times, but once I got home I was still excited to write this blog. Do not ask me why!! You might think this guy is crazy and should watch a movie, rest, or do something else, but he is writing instead. I must say you are absolutely right. However, my brain does not work like that. When I am tired, whether after a rough mountain biking, gym, or bouldering session, it is in its super-creative mode. This is the critical time when I get my brilliant ideas and start working on them.

Having said all of this, I guess it's time to get into eBPF.

eBPF

I am not going to start from scratch and explain what eBPF stands for and everything it can do. It would be good practice for you to read about eBPF yourself. In simple words, it is a small virtual machine inside the Linux kernel that lets you run custom programs in kernel space.

You might ask why this is important. It matters because you do not need to change the kernel source code. With eBPF, you write a program, load it into the kernel at runtime, and the eBPF virtual machine runs it for you. Without it, even a simple addition was painful. Say you wanted a monitoring tool that counts how many times a function (e.g., a syscall handler) gets called. You had to add the counter inside the function, then recompile the kernel, and so on. You went through hell before you could even start monitoring a function, and you wasted 10 minutes of your life just compiling the kernel :))

With eBPF (and specifically eBPF tools such as bpftrace), you can now start monitoring a syscall in less than a minute with a simple AI query. I just asked Gemini for a bpftrace one-liner that counts how many times the read system call is invoked:

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_read { @count = count(); }'

So this eBPF thingy seems super beneficial, easy to use, and practical. What matters most for us is that it has specific hooks in the kernel for packet processing. You can use it for monitoring, but for me its most important use is packet processing: it lets you process packets at different points in the kernel network stack.

The earliest packet hook is XDP. It runs in the NIC driver, or is offloaded onto the NIC itself if the hardware supports it. When the driver lacks native support, it runs in a slower generic mode in the core networking stack. This is the L2 hook.

After XDP comes the TC hook. On ingress it runs before Netfilter, and this is where you typically work with IP/TCP headers, so I think of it as the L3/L4 hook. It is one of the most powerful hooks: it lets you bypass Netfilter completely, do DNAT/SNAT, and change a packet's metadata. It's like a gold mine.

After these come the socket-layer hooks (the L7 layer, in some sense). They are called when the network stack is about to hand data to user space. Today we are going to talk about these L7 hooks specifically, and how they can actually improve traffic redirection in an L7 proxy.

L7 Proxy

You are probably asking yourself: what is this dude saying? What is an L7 proxy?? In the real world, many companies have exposed servers that act as load balancers, or something similar. These servers redirect traffic to an actual backend server based on policies, for instance matching the SNI, IP address, port, or protocol. Simply put, this is an L7 proxy.

Why L7 and not L4? Because it looks at application-layer (L7) data such as the TLS SNI, which in practice means it runs in user space rather than in the kernel. So I am specifically talking about a proxy that sits in user space and monitors the traffic. Another job of an L7 proxy is to check whether the traffic passing through it is malicious. Either way, the traffic has to go up to user space to be monitored or matched against one of these policies.

Here is a simple analogy: think of the proxy as a postman who can read the letters. Let's say the postman is good old BOB. Every time you want to visit a website, you must go to the post office and hand your request to him. He reads the letter and does every necessary check to see if you are a good person. Then he decides whether to send your letter to the website you requested. As you can see, everything goes through BOB before reaching the actual server.

However, most of the time BOB knows you and trusts you because you gave him a cookie, and he is happy with that 😄. So you do not need to hand the letter to him. You can just put it on the send pile and go home. You are not bothering BOB with more work; the letter is on the send pile and it will be delivered. Keep this last part in mind, because we are coming back to it (right now I feel like a director doing flashbacks 🤣).

One major problem with these L7 proxies is that once traffic reaches user space, you cannot ask the kernel to forward it for you. User space is now responsible for reading the data from one socket and writing it to another socket (the server side). We see this kind of traffic a lot in real-world scenarios. Say we have a proxy that does certificate inspection. The proxy does the TLS handshake with the client and opens another connection to the requested website. After it checks the certificates, if everything is fine, it starts moving the traffic from one socket to the other.
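To make the bottleneck concrete, here is a minimal C sketch of that user-space forwarding loop. It makes simplifying assumptions: blocking file descriptors, one direction only, and a hypothetical helper name, `relay_copy`. A real proxy would run both directions on non-blocking sockets driven by epoll or io_uring. Every byte passes through `buf`, a user-space buffer. That means one copy out of the kernel on `read` and one copy back in on `write`, plus a syscall for each.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: forward everything readable on src_fd to dst_fd
 * until EOF. Assumes blocking fds. Returns bytes relayed, or -1 on error. */
ssize_t relay_copy(int src_fd, int dst_fd)
{
    char buf[16384];   /* user-space buffer: all forwarded data lands here */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof buf);   /* kernel -> user copy */
        if (n == 0)
            return total;                            /* EOF: peer closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;                            /* interrupted, retry */
            return -1;
        }
        /* write() may be short, so loop until the whole chunk is sent */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off)); /* user -> kernel copy */
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```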

You see, this is a performance bottleneck. Why? After the certificate inspection, user space does basically nothing: it takes data from one side and passes it to the other. But the traffic still has to travel up to user space. So we pay for context switches, for waking up the application, and for copying data between kernel and user space. Some of these costs can be mitigated, for instance with io_uring. This is where eBPF can help you overcome the problem.

SK-SKB eBPF hook

By now you might have guessed what this section is about. If not, that is okay, I needed to write it anyway :) The previous section showed that once traffic reaches user space, the kernel basically tells the application: from now on, you are responsible for this traffic. As a result, even if the traffic just needs to be forwarded to the actual website, user space has to read from one socket (the client side) and write to the other one (the server side).

But what if there were a way to tell the kernel: please forward this traffic for us, inside the kernel? That would take user space out of the equation: no context switches, no application wake-ups, and no copies. It turns out eBPF gives you exactly this. But before getting into eBPF, let's look at the traditional way of doing this.

One traditional way is the splice syscall. In simple words, it takes two file descriptors (fds) and moves data from the source to the destination inside the kernel. At least one of the two fds must be a pipe. So to forward between two sockets, you splice from the source socket into a pipe, and then from the pipe into the destination socket. However, this still needs user-space involvement. User space waits for an event (using epoll or io_uring) and then calls splice. splice then moves the data inside the kernel without copying it through user space.
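For comparison, here is a minimal C sketch of splice-based forwarding (Linux-only). As before, it assumes blocking fds and one direction, and `relay_splice` is a hypothetical helper name. Because splice needs a pipe on one side, the data moves from the source into a pipe and from the pipe into the destination. It never lands in a user-space buffer, but user space is still the one driving the loop.

```c
#define _GNU_SOURCE            /* splice() and SPLICE_F_* are GNU extensions */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: move data from src_fd to dst_fd through an
 * intermediate pipe until EOF. Returns bytes moved, or -1 on error. */
ssize_t relay_splice(int src_fd, int dst_fd)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    for (;;) {
        /* source -> pipe: no copy into a user-space buffer */
        ssize_t n = splice(src_fd, NULL, p[1], NULL, 65536,
                           SPLICE_F_MOVE | SPLICE_F_MORE);
        if (n == 0)
            break;                       /* EOF on the source */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        /* pipe -> destination: drain everything we just moved in */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, dst_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                close(p[0]);
                close(p[1]);
                return -1;
            }
            n -= w;
            total += w;
        }
    }
    close(p[0]);
    close(p[1]);
    return total;
}
```

In a real proxy, src_fd and dst_fd would be the client and upstream TCP sockets. The application would still have to wake up on every readiness event to call splice, which is exactly the part SK_SKB removes.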

splice is a good way to achieve traffic redirection. Fittingly, doing this for TCP sockets is called TCP splicing. Just look how famous this syscall is! However, this method still needs user space to drive it. eBPF does the job much more cleanly. With the SK_SKB hook, an eBPF program watches the sockets (fds) you want to redirect data from. This part is a little technical, but I guarantee I will explain it well.

Now let's talk about how SK_SKB redirection works. Imagine you have two TCP sockets: the first with fd 1 and the second with fd 2. You want to move data from socket 1 to socket 2. To do this, you add the fds to an eBPF SOCKMAP and attach an SK_SKB program to that map.

A SOCKMAP is a BPF map that stores sockets. It is indexed by an integer key, like an array; its sibling, SOCKHASH, is the hash-map version. When data arrives on the receive queue of the first socket (fd 1), the eBPF program runs for that message. It runs because the socket is in the SOCKMAP, so the kernel knows it must execute the program for every message on that socket. Inside the program, we decide where the message should go. Et voilà!! Magic: the traffic is redirected entirely inside the kernel.

Remember I told you I would come back to the post office analogy? Let's assume BOB has an assistant. Once BOB decides that letters from a specific sender are safe, he tells the assistant: do not bring those letters to me anymore, put them directly on the send pile. This assistant is the eBPF program.
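Here is a minimal sketch of the kernel side, written in libbpf-style C and assuming a recent (v6.x) kernel. It cannot run as a normal user-space program; it has to be compiled for the BPF target and loaded by a loader. Several names are hypothetical choices for illustration:

- the map `sock_map`
- the program `redirect_verdict`
- the two-slot layout (slot 0 = client-facing socket, slot 1 = upstream socket)
- `PROXY_PORT`

The program tells the two directions apart with `skb->local_port`, which the kernel exposes in host byte order. On older kernels an `sk_skb/stream_parser` program also had to be attached; recent kernels accept a verdict program on its own.

```c
// redirect.bpf.c: kernel-side sketch, all names are hypothetical
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define PROXY_PORT 8080   /* hypothetical port the proxy listens on */

/* Slot 0 = client-facing socket, slot 1 = upstream (server) socket.
 * Value is __u32 so user space can store a plain int socket fd. */
struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 2);
	__type(key, __u32);
	__type(value, __u32);
} sock_map SEC(".maps");

/* Runs for every message received on a socket stored in sock_map. */
SEC("sk_skb/stream_verdict")
int redirect_verdict(struct __sk_buff *skb)
{
	/* Data arriving on the client-facing socket (its local port is the
	 * proxy's listening port) goes upstream; everything else goes back
	 * to the client. */
	__u32 target = (skb->local_port == PROXY_PORT) ? 1 : 0;

	/* flags = 0: transmit the data on the target socket (egress).
	 * Returns SK_PASS on success, SK_DROP if the redirect fails. */
	return bpf_sk_redirect_map(skb, &sock_map, target, 0);
}

char _license[] SEC("license") = "GPL";
```

You could compile it with something like `clang -O2 -g -target bpf -c redirect.bpf.c -o redirect.bpf.o`. The user-space side (BOB in the analogy) still accepts the client, connects upstream, and does the inspection.

- At startup, it attaches the program to the map once with `bpf_prog_attach(prog_fd, map_fd, BPF_SK_SKB_STREAM_VERDICT, 0)`.
- Once it decides a connection is safe, it inserts both sockets with `bpf_map_update_elem(map_fd, &key, &sock_fd, BPF_ANY)`.

For many concurrent connections, a SOCKHASH keyed by the connection 4-tuple would scale better than this two-slot SOCKMAP.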

I believe SK_SKB was mostly designed to redirect TCP traffic. UDP sockets are allowed in some cases, but you need to call the connect syscall on the socket that receives the traffic. Still, I'd like to invite you to think about QUIC, or UDP traffic more generally. This kind of traffic is essentially stateless at the transport layer (because of UDP) and mostly consists of small packets. Redirecting it as early as possible saves a lot of CPU cycles and is more energy efficient, which keeps our planet safe. So if you come up with an XDP or TC hook to redirect this kind of traffic, you are not only helping your company profit, you are saving the planet!

To be clear, this hook was not that great in kernel v4.14, when it was introduced. Since then it has been refined and delivers good performance. So I suggest using it with recent kernels, specifically v6.x and newer. You can read the kernel mailing list to see what kinds of optimizations it went through; I put a link below.

I will not include the full code in this blog, but I will post it on my GitHub soon and put the link here. So stay tuned.

Final Thoughts

How did I come across this hook? I noticed that in L7 proxies, the redirection happens in user space even when the traffic does not need to be (deeply) inspected, for example once the SSL inspection is done. I told myself there must be a way to redirect the traffic in the kernel. I searched and searched until I found this hook.

Resources

SK_SKB page

SK_SKB cloudflare

SK_SKB performance on newer kernels

UDP sockets