LinuxMeerkat

I swear! Meerkats can do Linux



Who does the cleanup, the process or the kernel?

For a long time I believed that cleanup was work done by the kernel. However, that is not completely true: part of the cleanup is also done by the process itself. Let’s dive into the details.

Ways to terminate a program

There are a few ways to terminate a process from within the process itself (sometimes referred to as “normal termination”):

  • exit()
  • _exit()
  • return from main()

(break cannot be used, as it must appear within a switch or loop.)

There are also ways to terminate the process from outside by sending it a signal. A signal can be sent from a terminal by typing

kill [processID]

or by pressing CTRL+C while the program runs in the foreground. By default kill sends SIGTERM, while CTRL+C sends SIGINT to the process. There are also ways to send signals from within the process, for example the function abort(). abort() sends SIGABRT and thus terminates the process (if the default handler for SIGABRT is in place). However, since the same signal can also be sent from outside the process by the user (from the terminal), we can easily put this way of terminating a process in the same basket as the other “abnormal” ways of termination.
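From the parent’s side, an abnormal termination shows up as a signal number rather than as an exit status. The sketch below assumes a POSIX system, and child_termination_signal() is a hypothetical helper (call it from your own main()): it forks a child that calls abort() and then asks waitpid() how the child ended.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>    /* SIGABRT */
#include <stdlib.h>    /* abort() */
#include <sys/types.h>
#include <sys/wait.h>  /* waitpid(), WIFSIGNALED(), WTERMSIG() */
#include <unistd.h>    /* fork() */

/* Hypothetical helper: fork a child that calls abort() and report which signal
   terminated it. Returns the signal number, or -1 on error or if the child
   was not killed by a signal. */
int child_termination_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        abort();                /* child: raises SIGABRT, default action terminates */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

If things behave as described, the returned value equals SIGABRT, and none of the child’s atexit() handlers (see the next section) would have run.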

Do something on exit (atexit)

This is not strictly related to cleanup, but I mention it because it gives a deeper understanding of what happens when a process terminates.

On every normal termination of a process we can have some specific work done. Say I want something printed on the screen every time the process terminates normally:

#include <stdio.h>
#include <stdlib.h>

void onexit(void){
  puts("Process terminated normally.");
}

int main(void) {
  atexit(onexit);
  /* do stuff here */
  return 0;
}

Now every time main returns, our onexit() function is invoked. Note that even “return 1;” works, as it is still considered a normal termination of the process. Remember, “normal termination” has nothing to do with the value returned, but rather with what caused the termination: was it an outside signal, or was the exit function simply invoked?
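To check that a non-zero status still counts as a normal termination, the sketch below (POSIX assumed; run_exit_handlers(), first_handler() and second_handler() are hypothetical names) forks a child that registers two handlers and then calls exit(1). The handlers report through a pipe, so the parent can also see the order in which they ran: the C standard calls them in reverse order of registration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>     /* fflush() */
#include <stdlib.h>    /* atexit(), exit() */
#include <sys/types.h>
#include <sys/wait.h>  /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>    /* fork(), pipe(), read(), write(), close() */

static int report_fd = -1;   /* write end of a pipe back to the parent */

static void first_handler(void)  { write(report_fd, "1", 1); }
static void second_handler(void) { write(report_fd, "2", 1); }

/* Hypothetical helper: the child registers two handlers and calls exit(1).
   The handlers' output is collected into buf (NUL-terminated).
   Returns the child's exit status, or -1 on error. */
int run_exit_handlers(char *buf, size_t size)
{
    int fds[2];
    if (size == 0 || pipe(fds) < 0)
        return -1;

    fflush(NULL);                  /* so the child does not re-flush our pending output */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(first_handler);
        atexit(second_handler);
        exit(1);                   /* non-zero status, but still a normal termination */
    }

    close(fds[1]);                 /* parent keeps only the read end */
    size_t len = 0;
    ssize_t n;
    while (len < size - 1 && (n = read(fds[0], buf + len, size - 1 - len)) > 0)
        len += (size_t)n;
    buf[len] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Under these assumptions the parent reads “21” (the last registered handler runs first) and gets exit status 1.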

A more complicated example using atexit() would be to free some memory when exiting the program:

#include <stdlib.h>

void *a; //we need global scope so that the exit handler can reach it

void *allocSomeMemory(void) {
  return malloc(10);
}

void freeSomeMemory(void) {
  free(a);
}

int main(void) {
  a = allocSomeMemory();
  if (a == NULL)
    return 1;               //allocation failed
  atexit(freeSomeMemory);
  /* do stuff with variable a */
  return 0;
}

This allocates 10 bytes and stores their address in the global variable a. Then we register a handler (a function that takes care of something) that frees the allocated memory whenever the process exits normally.

This works, but whether it is a practical thing to do is subjective. The kernel is going to free all of the process’s resources anyway, so freeing memory explicitly on exit just adds delay to the termination. However, it can be good practice in some cases, for example when debugging.

What is process cleanup?

Cleanup is the work of bringing the kernel’s resources back to the state they were in before the program ran. That means freeing memory, flushing buffers, closing files, removing the process ID from the kernel’s process table, decrementing reference counts of open files, removing kernel timers, sending a signal to the parent of the process, and much more.

There are two main players when it comes to cleanup:

  • The kernel
  • The process

We will start with the cleanup done by the process, which is somewhat bulkier to describe.

Cleanup from the process

Process cleanup happens on normal termination, that is, when exit() is called or main returns. (When a process is killed by a signal whose default action is to terminate it, these steps are skipped.) In fact, returning from main invokes exit() automatically, with main’s return value as the exit status. The process itself carries some extra code (linked in from the C library when we compiled it) that tells it what to clean and how. The cleanup done by the process consists of these steps:

  1. Call the functions registered with atexit(), in reverse order of registration
  2. Flush all unwritten buffered data
  3. Close open streams
  4. Remove all files created by tmpfile()
  5. Return the exit status and control to the kernel

Personally I don’t consider point 1 cleanup in the broader sense, but it gives the programmer the opportunity to add some behavior that runs on every normal termination, which in many cases is tidying things up.

All of this happens when exit() is called. There is another exit function that skips the first four steps and goes directly to step 5: _exit(). Calling _exit() instead of exit() hands control directly to the kernel.
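The difference shows up as soon as a stream still holds unwritten data. In this sketch (POSIX assumed; captured_output() is a hypothetical helper) a child writes “hello” into a stream connected to a pipe, which is fully buffered, and then ends with either exit() or _exit(); the parent reports what actually arrived.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>     /* FILE, fdopen(), fprintf(), fflush() */
#include <stdlib.h>    /* exit() */
#include <sys/types.h>
#include <sys/wait.h>  /* waitpid() */
#include <unistd.h>    /* fork(), pipe(), read(), close(), _exit() */

/* Hypothetical helper: the child ends with exit() if use_exit != 0, else _exit().
   Copies what reached the pipe into buf (NUL-terminated) and returns its length,
   or -1 on error. */
int captured_output(int use_exit, char *buf, size_t size)
{
    int fds[2];
    if (size == 0 || pipe(fds) < 0)
        return -1;

    fflush(NULL);                        /* the child must not inherit pending output */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w"); /* stream on a pipe: fully buffered */
        if (out == NULL)
            _exit(2);
        fprintf(out, "hello");           /* stays in the stream's buffer for now */
        if (use_exit)
            exit(0);                     /* flushes and closes all streams first */
        _exit(0);                        /* skips that: the buffered "hello" is lost */
    }

    close(fds[1]);                       /* parent: read until the child is gone */
    size_t len = 0;
    ssize_t n;
    while (len < size - 1 && (n = read(fds[0], buf + len, size - 1 - len)) > 0)
        len += (size_t)n;
    buf[len] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (int)len;
}
```

With exit() the parent should receive the 5 bytes of “hello”; with _exit() it receives nothing, because the flushing step never happens.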

“Ok, so what’s the big deal with using _exit() instead of exit() and vice versa?”

In most cases exit() is the way to go. Sometimes, however, you don’t want the same things “cleaned twice”. One example is fork(), which creates a child process. The child inherits a lot from the parent process, among other things a copy of the parent’s unflushed buffers. If the child calls exit(), the inherited buffers are flushed. Later, when the parent exits, it flushes its own copy of the same buffers. In this scenario we get the output twice.

By using _exit() in the child, we skip the flushing in the child process and thus avoid unwanted side effects such as double output.
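The double-output problem can be reproduced with a small sketch (POSIX assumed; count_output_lines() is a hypothetical helper). To keep the demonstration away from the caller’s own stdout, a helper process plays the role of the parent: it buffers one line in a stream attached to a pipe and then forks. The caller counts how many copies of the line come out of the pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>     /* FILE, fdopen(), fprintf(), fflush() */
#include <stdlib.h>    /* exit() */
#include <sys/types.h>
#include <sys/wait.h>  /* waitpid() */
#include <unistd.h>    /* fork(), pipe(), read(), close(), _exit() */

/* Hypothetical helper: returns how many times "hello\n" reached the pipe,
   or -1 on error. The forked child ends with exit() if child_uses_exit != 0,
   otherwise with _exit(). */
int count_output_lines(int child_uses_exit)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    fflush(NULL);                        /* start without pending output */
    pid_t helper = fork();
    if (helper < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (helper == 0) {                   /* helper process: plays the "parent" */
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w");
        if (out == NULL)
            _exit(2);
        fprintf(out, "hello\n");         /* buffered, not yet written to the pipe */
        pid_t child = fork();            /* the child gets a copy of that buffer */
        if (child == 0) {
            if (child_uses_exit)
                exit(0);                 /* flushes its copy of the buffer */
            _exit(0);                    /* discards its copy */
        }
        if (child > 0)
            waitpid(child, NULL, 0);
        exit(0);                         /* flushes the helper's own copy */
    }

    close(fds[1]);
    char c;
    int lines = 0;
    while (read(fds[0], &c, 1) == 1)     /* read until every writer has exited */
        if (c == '\n')
            lines++;
    close(fds[0]);
    waitpid(helper, NULL, 0);
    return lines;
}
```

Expect 2 lines when the child uses exit() and 1 line when it uses _exit().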

Cleanup from the kernel

Whether exit() or _exit() is used, in the end the kernel is the big reaper. We will not go too deep into exactly what the kernel does, but a few points of its cleanup routine are:

  • destroying the kernel structures that were created for the process
  • freeing the memory allocated for the process
  • closing open files and decrementing their reference counts
  • sending a signal (SIGCHLD) to the parent process

At this point the process is dead, that is, it’s no longer loaded in memory. Only a very few structures (such as the process ID and the exit status) remain in the kernel, solely in case the parent process is interested. A process in this state is called a zombie process.

For these last structures to be destroyed, the parent must wait() for the child process. Once that has happened, the zombie disappears and all resources associated with the dead process are freed.
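Reaping a child takes a single call. In this minimal sketch (POSIX assumed; reap_child() is a hypothetical helper) the child exits with status 7. From the moment it dies until waitpid() returns, it is a zombie; waitpid() collects the status and lets the kernel free what was left.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>  /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>    /* fork(), _exit() */

/* Hypothetical helper: fork a child that exits with status 7, then reap it.
   Returns the child's exit status, or -1 on error. */
int reap_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);                          /* child: terminate right away */

    int status;
    if (waitpid(pid, &status, 0) != pid)   /* collects the zombie's exit status */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```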



File descriptors explained

File descriptors are used all the time for file input and output. However, many people are not clear on what file descriptors essentially are, and that makes coding with them harder. In this article I will try to explain that, so that you really know what you’re dealing with when you close file descriptors, duplicate them, pipe them, etc. Note that this article uses code and conventions from C.

Files vs File descriptors

First of all we’ll start with the difference between a file structure and a file descriptor. Imagine you have an array of files like this:

files[] -> file1 | file2 | file3

file1, file2 and file3 are file data structures. A file structure is an opaque data structure. Opaque means that we don’t know what the data structure looks like, and we don’t need to care either. All we need to know is that a file structure represents a file on a hard disk, a USB stick or any other storage device.

Going back to our example, the file descriptors are the indices (plural of index) of the array. So the indices in the above example would look like this:

index            0        1        2
file structure   file1    file2    file3

Why do we need file descriptors when we have file structures?

The truth is that a process keeps track only of file descriptors; the file structures live on the kernel’s side. So the reason is the same as the reason we generally use pointers rather than whole data structures when programming: to save space and time, in other words for efficiency. (There is also a security aspect, but we will not go into that.)

In C code you will often see something like this:

FILE* myfile;

This is something totally different from the file structure in the kernel that we saw earlier. FILE in C is just a wrapper, or simply put, a structure holding another structure. In fact a FILE in C is just a file descriptor with some extra bells and whistles (such as a buffer), nothing more. So why use FILE in C when we can use the file descriptors directly? Well, file descriptors are just numbers. Imagine if we had to open and close numbers all the time. It would be hard to keep track of what we are doing. Everything would be a mess! Apart from the much friendlier name, a C FILE also lets us use more advanced functions that take a FILE as argument but not a naked file descriptor.
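The link between a FILE and its file descriptor is visible through two POSIX functions: fileno() returns the descriptor inside a FILE, and fdopen() builds a FILE on top of an existing descriptor. This sketch (file_wraps_fd() is a hypothetical helper) writes through a FILE wrapped around the write end of a pipe (pipes are covered at the end of this article) and reads the raw bytes back with read() on a plain descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>     /* FILE, fdopen(), fileno(), fprintf(), fclose() */
#include <unistd.h>    /* pipe(), read(), close() */

/* Hypothetical helper: write through a FILE wrapped around a pipe's write end,
   then read the raw bytes from the read end with plain read().
   Returns the number of bytes read (buf is NUL-terminated), or -1 on error. */
int file_wraps_fd(char *buf, size_t size)
{
    int fds[2];
    if (size == 0 || pipe(fds) < 0)
        return -1;

    FILE *f = fdopen(fds[1], "w");       /* a FILE built on an existing descriptor */
    if (f == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    int same_fd = (fileno(f) == fds[1]); /* fileno() gives the descriptor back */
    fprintf(f, "linuxmeerkat");          /* lands in the FILE's buffer first */
    fclose(f);                           /* flushes the buffer and closes fds[1] */

    ssize_t n = read(fds[0], buf, size - 1);
    close(fds[0]);
    if (n < 0 || !same_fd)
        return -1;
    buf[n] = '\0';
    return (int)n;
}
```

The FILE adds buffering and formatted output; underneath, the bytes still travel through an ordinary file descriptor.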

When a new process starts, it normally has three file descriptors open by default. These are called the standard file descriptors and are given the numbers 0, 1 and 2. If you remember the Unix mantra, it says that everything in a Unix system is a file. That is even true for hardware devices like your monitor and keyboard. In fact there are file structures in the kernel that correspond to just those. The file descriptors 0, 1 and 2 are indices to these special files. To be more exact, 0 is the index corresponding to the “keyboard file”, and 1 and 2 are indices corresponding to the “monitor file”.

Streams

Earlier we said that there are three file descriptors created by default: 0, 1 and 2. We said that 0 corresponds to the “keyboard file” in the kernel, and 1 and 2 correspond to the “monitor file”. If we were to sketch all this on paper, it would look roughly like the figure below.

Process1 has the three default file descriptors we talked about. Notice how we hide the “keyboard file” and “monitor file” in the kernel box. That is mostly for simplicity, as there are a lot of things going on in the kernel. Besides, the user (in this case the programmer) is not meant to know the inner workings of the kernel. The only things the programmer can see are the C FILE structures and the file descriptors, so we play along.

Now there is something more than file descriptors in this figure: the arrows that show the flow of data. These arrows (call them channels, buses, rivers or whatever you like) have a special name: streams. Stream is just an abstract name that makes it easier for the programmer to visualize what is happening with the data. It’s simply easier to talk about a stream of data than about indices and the kernel file structures those indices correspond to.

Now the three default file descriptors we talked about earlier, have in fact been baptized with their own special names: stdin, stdout and stderr.

These names are just abstract words that we use to talk about three specific channels of data (in most cases characters). Stdin is the data that we get from the user. Stdout is the flow of normal output to the user, and stderr is the flow of output when errors happen in the program. Hopefully it now makes sense why file descriptor 0 corresponds to the keyboard and file descriptors 1 and 2 correspond to the monitor: the program gets input from the keyboard, while output goes to the monitor.

Now, in C there are three specific macros called stdin, stdout and stderr. Despite the shared names, these are not the streams themselves. That wouldn’t be possible anyway, as a stream is just an abstract idea (as said earlier) that makes it easier for the programmer to visualize what is happening (even if it makes it a living hell for some). The stdin, stdout and stderr macros in C are just pointers to FILE structures (the C ones). You can in fact use these macros as you would use any FILE pointer when programming in C. For example, see the code below.

fprintf(stdout, "linux");
fflush(stdout);             //stdout is buffered: push "linux" out before writing to stderr
fprintf(stderr, "meerkat");

This prints “linuxmeerkat” on the screen. (The fflush() matters: stdout is buffered while stderr is not, so without it “meerkat” could appear before “linux”.) A question arises: why do we have two macros that direct data to the same place? This is the same question as: why do we have two streams to the monitor? The answer is rather simple. When we develop a program we often need to output errors on a different channel than the normal output. It’s just neater to keep different things separated than to mix them. Think, for example, how much easier it is now to hide all the error messages of our program, or to redirect them to a different place than the monitor.

Finally, here is a table with the standard file descriptors (fd) and the corresponding standard streams:

fd   stream
0    stdin
1    stdout
2    stderr
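The mapping in this table can be checked with fileno(). The constants STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO from <unistd.h> are POSIX’s names for 0, 1 and 2; standard_fds_match() is a hypothetical helper used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>     /* stdin, stdout, stderr, fileno() */
#include <unistd.h>    /* STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO */

/* Hypothetical helper: returns 1 if the three standard FILE streams sit on top
   of file descriptors 0, 1 and 2, 0 otherwise. */
int standard_fds_match(void)
{
    return fileno(stdin)  == STDIN_FILENO    /* 0 */
        && fileno(stdout) == STDOUT_FILENO   /* 1 */
        && fileno(stderr) == STDERR_FILENO;  /* 2 */
}
```

This only checks the numbers; what those numbers point to in the kernel can change, as the next section shows.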

File descriptor tables

It’s important to understand that file descriptors are the only file-related thing that a process keeps track of. As said earlier, C FILE structures are just wrappers around file descriptors, so they can also be thought of as file descriptors; this is consistent with the fact that a process only knows about file descriptors. Now, each process has its own file descriptor table. Say that we have two processes: process 1 and process 2.

process1    process2
0           0
1           1
2           2

When a new process gets created, file descriptors 0, 1 and 2 are created automatically and mapped to stdin, stdout and stderr.

At first glance the two file descriptor tables above look the same, since they contain the same numbers. However, file descriptor 0 in process1 can point to something totally different from file descriptor 0 in process2.

(You may ask yourself: “How can file descriptor 0 point to different things in the two processes if, as we assumed, file descriptors are indices?” That is a very logical question, and the explanation is rather simple. A file descriptor is bundled with a pointer (in the abstract sense), or if you like, an index into the global Open File Table. That is nothing we need to worry about, however, so it’s not shown in the diagrams of this article.)
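One user-visible consequence of the Open File Table is the file offset, which belongs to the table entry rather than to the descriptor. This sketch assumes a POSIX system with a writable /tmp, and compare_offsets() is a hypothetical helper: it opens the same file twice with open(), and also makes a dup() of the first descriptor (dup() is covered later in this article).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open(), O_RDWR */
#include <stdlib.h>    /* mkstemp() */
#include <unistd.h>    /* write(), lseek(), dup(), close(), unlink() */

/* Hypothetical helper: writes 4 bytes through one descriptor, then reports the
   offset seen by (a) an independent open() of the same file and (b) a dup() of
   the first descriptor. Returns 0 on success, -1 on error. */
int compare_offsets(long *independent, long *duplicated)
{
    char path[] = "/tmp/fdtableXXXXXX";    /* assumption: /tmp is writable */
    int fd1 = mkstemp(path);               /* creates and opens a unique file */
    if (fd1 < 0)
        return -1;
    int fd2 = open(path, O_RDWR);          /* new entry in the open file table */
    int fd3 = dup(fd1);                    /* new index, same table entry as fd1 */
    unlink(path);                          /* open descriptors keep the file alive */

    int rc = -1;
    if (fd2 >= 0 && fd3 >= 0 && write(fd1, "data", 4) == 4) {
        *independent = (long)lseek(fd2, 0, SEEK_CUR);   /* own offset: still 0 */
        *duplicated  = (long)lseek(fd3, 0, SEEK_CUR);   /* shared offset: now 4 */
        rc = 0;
    }
    close(fd1);
    if (fd2 >= 0)
        close(fd2);
    if (fd3 >= 0)
        close(fd3);
    return rc;
}
```

The independent descriptor should still report offset 0, while the duplicate reports 4, because it shares the table entry (and therefore the offset) with the descriptor that was written to.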

Just because we call file descriptors 0, 1 and 2 standard doesn’t mean that they will always correspond to stdin, stdout and stderr. The standard streams stay as we expect only until the programmer decides it’s time to change things around, as I will demonstrate. Below you see a visual representation of file descriptors and their connection with the kernel.

This is what the processes look like if we assume that the file descriptors are not altered in any way. The arrows show the flow of data. The keyboard and screen icons in the kernel are the file structures of the special files. In truth, things are more complicated in the kernel, but for now we focus on keeping track of the file descriptors. That’s also the only thing we can alter directly from inside the process. However, don’t assume that just because there are two icons there are only two file structures.

Now let’s change the file descriptors in the first process a bit and see what happens:

close(1);                             //we close the stdout stream
FILE* f=fopen("myarticle.txt", "w");  //open a file for writing
fprintf(stderr, "file has fd: %d\n", fileno(f));

Note that we fprintf() to stderr, as stdout is closed. When we run the program, the code above prints:

file has fd: 1

From this output it’s crystal clear that file descriptor 1 no longer points to stdout. In fact it points to the file myarticle.txt. How do I know? Whenever a new file descriptor is created, the kernel gives it the lowest number that is not currently in use (POSIX requires this for open()). That keeps things clean. Following this rule, after we close the stdout stream in the example, we immediately open a file for writing. This file needs a file descriptor. The kernel sees that the lowest number that can be used is 1, so the file gets that index number.

“What about the file in the kernel that already has index 1?” you might ask. As we said, the kernel is a bit more complicated than shown in the above figure. The kernel doesn’t mix up files from different processes, even if they are the same files. So file descriptor 1 in the first process corresponds to a totally different file structure in the kernel than file descriptor 1 in the second process. That’s something a programmer shouldn’t worry about. As we said, care only about the file descriptors. The kernel is a magician, and you don’t need to know how its tricks work.

Here follows a figure with the file descriptors and the kernel after we ran the above code. The file descriptor tables of both processes look the same as before. However, look how the streams of the first process have changed!

We see a new arrow pointing to a new icon in the kernel. That icon is a new file data structure created in the kernel. The new arrow is a stream like stdin, stdout and stderr; the only difference is that it has no standard name. However, we can clearly see from the figure that the data flows to a file, namely myarticle.txt. Notice that stderr still flows to the monitor. That’s also why our fprintf() to stderr can still print text to the screen. If we closed this channel as well, we wouldn’t be able to output anything on the screen anymore.

TIP: When I want to sketch file descriptors on paper, I find it convenient to write the file descriptor in the first column and, in the second column, the stream name, or the filename if the stream is nameless. In this example I would write it like this:

process1
0 stdin
1 myarticle.txt
2 stderr
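The lowest-number rule that gave myarticle.txt descriptor 1 can be checked without closing stdout. In this sketch (POSIX assumed; lowest_fd_reused() is a hypothetical helper, and /dev/null is used only because it always exists) three descriptors are opened, the middle one is closed, and the next open() takes over its number.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open(), O_RDONLY */
#include <unistd.h>    /* close() */

/* Hypothetical helper: returns 1 if a newly opened descriptor reuses the
   lowest freed number, 0 if not, -1 on error. */
int lowest_fd_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0 || c < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        if (c >= 0) close(c);
        return -1;
    }

    close(b);                              /* b is now the lowest free number... */
    int d = open("/dev/null", O_RDONLY);   /* ...so the kernel hands it out again */
    int reused = (d == b);

    close(a);
    close(c);
    if (d >= 0)
        close(d);
    return reused;
}
```

Note that d gets b’s number even though c is higher: the rule is “lowest free number”, not “next number”.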

Why clones are helpful (duplicating file descriptors)

When we say that a file descriptor is removed or closed, it means that the file descriptor is destroyed! Deallocated. No way back. Nada! It’s gone forever and ever! We have lost it, and with it we have also lost the stream it was connected to.

Now I want to remind you of the infamous memory leak in C. Say we allocate some memory in a function and store its address in a pointer. If that pointer is our only reference to the allocated memory and we somehow lose it, we get a memory leak, as there is no way to reach the memory the pointer was pointing at anymore. In C a solution would be to keep a backup pointer, and that’s also the solution used with file descriptors.

When we duplicate a file descriptor with dup(), what we actually do is create a second index to the same file structure in the kernel. Let’s take an example:

int newfd;      //we declare a new file descriptor
newfd=dup(1);   //we make it a clone of file descriptor 1
printf("newfd: %d\n", newfd);

Output:
newfd: 3

The new file descriptor newfd is a clone of file descriptor 1 (stdout). In other words, file descriptors 1 and 3 point to the same file structure in the kernel. That means that closing file descriptor 1 has no effect on newfd. See the code below, where we continue with the same example.

close(1);                 //destroying file descriptor 1 (stdout)
close(2);                 //destroying file descriptor 2 (stderr)
dprintf(newfd, "test");   //sending some data to the cloned file descriptor

Output:
test

As you see, we closed both file descriptors that pointed to the monitor. However, since we had made a backup of file descriptor 1, we can still print on the screen. By the way, dprintf() is the same as fprintf(), with the only difference that it takes a file descriptor as parameter instead of a FILE.

Here is also a visualisation of what we did:

I hope you can clearly see now how dup() works. The function merely duplicates a file descriptor. You might wonder about something however. If file descriptors are numbers, then why not just copy the file descriptor number to a new integer variable? Something like this:

FILE *f=fopen("test.txt", "r");   //open a file
int filefd=fileno(f);             //get file's fd
int filefd2;                      //a secondary fd to the file
filefd2=filefd;                   //this is where we "duplicate"

This is wrong and will not work. We don’t create a new file descriptor in this case: filefd2 and filefd have the exact same value and are therefore the exact same index. The idea of duplication is to make a new file descriptor, a new index. In this example, if we close filefd, there is no way to access the file structure in the kernel through filefd2, since filefd2 and filefd are the exact same thing and closing one is closing the other.

The code below is the correct way to do it.

FILE *f=fopen("test.txt", "r");   //open a file
int filefd=fileno(f);             //get file's fd
int filefd2;                      //a secondary fd to the file
filefd2=dup(filefd);              //this is where we "duplicate"

If filefd is closed now, we can still access the file through filefd2.
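The difference between copying the integer and calling dup() can be tested directly. This sketch (POSIX assumed; read_through_backup() is a hypothetical helper) uses a pipe with one byte waiting in it, closes the original read end, and then tries to read through the “backup”.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>     /* errno, EBADF */
#include <unistd.h>    /* pipe(), dup(), read(), write(), close() */

/* Hypothetical helper: closes a pipe's read end and reads one byte through a
   backup of it, made with dup() if use_dup != 0, else by copying the integer.
   Returns the read() result and stores errno (0 on success) in *err.
   Returns -2 if the setup itself fails. */
ssize_t read_through_backup(int use_dup, int *err)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -2;
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }

    int backup = use_dup ? dup(fds[0]) : fds[0];   /* new index vs. same index */
    close(fds[0]);                                 /* "delete" the original */

    char c;
    errno = 0;
    ssize_t n = read(backup, &c, 1);   /* works only if backup is a real duplicate */
    *err = errno;

    if (use_dup)
        close(backup);
    close(fds[1]);
    return n;
}
```

With dup() the read should return 1; with a plain integer copy it should fail with EBADF, because that number no longer refers to anything.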

A few words on pipes

I was considering leaving pipes out of this article. However, I want people to fully grasp the tight relation between pipes and file descriptors, so I will cover the basics of pipes here.

A pipe is essentially a pair of file descriptors. One file descriptor is used for input while the other is used for output. Whatever we feed into one file descriptor will magically pop out of the other. Binary data, characters, integers: everything is welcome and works. You can think of a pipe as a physical pipe: whatever you drop in at the top end comes out at the bottom end.

We declare a pipe just as an array of two integers like this:

int mypipe[2];

Now this alone doesn’t do anything. We have to tell the kernel to set up the pipe for us so that we can use it:

pipe(mypipe); //initializing the pipe

Now we have our fully working pipe.

As we said, data goes in through one file descriptor and comes out through the other. The file descriptor that takes input is mypipe[1], and the one used for output is mypipe[0]. Notice that the numbers 1 and 0 here are not file descriptors; they are the indices of the mypipe array. Be very, very careful about which end of the pipe is supposed to receive data and which end is supposed to give data.

Programmers are probably some very egoistic bastards. I mean the people who implemented pipes in the kernel, and you will see why. When it comes to pipes, we talk about the write end or the read end of the pipe from the programmer’s point of view, not the pipe’s. So the write end of the pipe is mypipe[1], while the read end is mypipe[0]. However, if you are a plumber or an electronics person, you are used to naming input and output from the point of view of the pipe itself. So be extra careful with that detail! You might spend endless hours, days or weeks debugging code just because you mixed up the read end with the write end.
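Mixing up the ends fails loudly rather than silently, which helps when debugging. In this sketch (POSIX assumed; check_pipe_ends() is a hypothetical helper) writing to mypipe[0] fails with EBADF, because the read end is not open for writing, while writing to mypipe[1] succeeds.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>     /* errno, EBADF */
#include <unistd.h>    /* pipe(), write(), close() */

/* Hypothetical helper: returns 0 if writing to the read end fails with EBADF
   while writing to the write end succeeds, -1 otherwise. */
int check_pipe_ends(void)
{
    int mypipe[2];
    if (pipe(mypipe) < 0)
        return -1;

    errno = 0;
    ssize_t wrong = write(mypipe[0], "x", 1);   /* read end: not open for writing */
    int wrong_errno = errno;
    ssize_t right = write(mypipe[1], "x", 1);   /* write end: accepted */

    close(mypipe[0]);
    close(mypipe[1]);
    return (wrong == -1 && wrong_errno == EBADF && right == 1) ? 0 : -1;
}
```

Checking the return value of write() and read() is therefore a cheap way to catch swapped ends early.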

Now let’s test our pipe:

char buffer[5]="";
write(mypipe[1], "test", 5);      //writing to pipe: 4 characters plus the terminating '\0'
read(mypipe[0], buffer, 5);       //reading from pipe
puts(buffer);

This should output the text “test” on the terminal.
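Pipes become really useful between processes. A minimal sketch, assuming a POSIX system (send_to_child() is a hypothetical helper): after fork() both processes hold both ends, so each one closes the end it doesn’t use. The child reads until end-of-file and reports the byte count through its exit status, ending with _exit() as discussed in the article on process cleanup above.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>  /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>    /* pipe(), fork(), read(), write(), close(), _exit() */

/* Hypothetical helper: the parent sends "test" through a pipe; the child reads
   it and exits with the number of bytes received. Returns that count, or -1. */
int send_to_child(void)
{
    int mypipe[2];
    if (pipe(mypipe) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(mypipe[0]);
        close(mypipe[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: the reader */
        close(mypipe[1]);                /* otherwise read() would never see EOF */
        char buffer[16];
        ssize_t n, total = 0;
        while ((n = read(mypipe[0], buffer, sizeof buffer)) > 0)
            total += n;                  /* keep reading until end-of-file */
        close(mypipe[0]);
        _exit((int)total);               /* no stdio cleanup in the child */
    }

    close(mypipe[0]);                    /* parent: the writer */
    ssize_t written = write(mypipe[1], "test", 4);
    close(mypipe[1]);                    /* the child now sees end-of-file */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || written != 4)
        return -1;
    return WEXITSTATUS(status);
}
```

If the child forgot to close its own copy of the write end, its read() would block forever, since a pipe reports end-of-file only once every write end is closed.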

I will not go any deeper into pipes as that needs its own article in my humble opinion. I hope I made it a bit clearer what file descriptors really are and their relation to pipes and files.