
On UNIX File Permissions

It is pretty shameful to say that I still get UNIX permissions wrong sometimes. I hope to fix this for good by writing about them here 🙂

Every file and directory has three sets of permissions associated with it: owner's permissions, group members' permissions and others' permissions. The user's identity is checked against these sets in this order and the first matching set is chosen, which has a side effect that I always tend to forget:

* The file owner may have fewer permissions than group members

* Group members may have fewer permissions than others (i.e., the rest of the world)

For example, a file can have rwx permissions for others, but no permissions for the owner and group members. (Of course the owner can change the file permissions to get what they want, but that's irrelevant here.)
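The first-match rule above can be sketched in a few lines of C. This is only a model of the class selection, not kernel code: the function name applicable_bits is made up for illustration, and it deliberately ignores root, supplementary groups and ACLs.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns the rwx bits (0..7) that apply to a caller.
 * Exactly one class is picked, in owner -> group -> others order, and the
 * other two classes are never consulted.
 */
unsigned int applicable_bits(mode_t mode, uid_t file_uid, gid_t file_gid,
                             uid_t uid, gid_t gid)
{
    if (uid == file_uid)
        return (mode >> 6) & 7;   /* owner class matched first */
    if (gid == file_gid)
        return (mode >> 3) & 7;   /* group class */
    return mode & 7;              /* everybody else */
}
```

With mode 0007, the owner and the group members get no bits at all, while an unrelated user gets all three.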

The second point I always miss is the semantics of directory permissions.

* Read permission on a directory means you can read the contents of the directory, which means you can read the file names inside the directory (but not their attributes; you need execute permission for this.)

* Write permission on a directory means you can modify the contents of the directory, which means you can add new files, remove files, rename files, etc. (You don't need to be the owner of any of these files, but in practice you also need execute permission on the directory for these operations.)

* Execute permission on a directory means you can traverse the directory, i.e. reach the files and subdirectories in it by name (but you cannot list the files; you need read permission for this.)

What is interesting here is that write permission on a directory (together with execute) is sufficient to add or remove any file in it, which also means you don't need to be the owner of a file to delete it. This point is necessary to understand the need for the sticky bit, and it is also a cause of the old mkdir time-of-check to time-of-use (TOCTOU) bug.

I will write some notes on set-user-id, set-group-id and the sticky bit some other time.

Bottom Halves

Bottom halves are the non-critical portions of interrupt handlers, i.e. interrupt-handler code that is executed with interrupts enabled. Process-context code can mask bottom halves while it accesses data shared with them, using functions such as local_bh_disable and spin_lock_bh.

Though bottom halves are deferred work, in priority they are treated almost like interrupts, so the kernel processes them as quickly as possible. The kernel guarantees to execute all unmasked bottom halves at the following two points:

1. Following the execution of any top-half, after interrupts are enabled.
2. Following any process’s context switch, on a call to schedule(), before another process is scheduled.

Two types of bottom halves are supported, with different concurrency semantics.

Softirqs

Bottom halves that can be executed on multiple CPUs simultaneously are called softirqs. Softirqs are completely reentrant, which means the softirq code for an interrupt X can be executing on multiple CPUs at the same time.

This concurrency requirement suggests that the device being serviced is of very high priority and/or can generate interrupts at a very high rate. Very few devices need this level of interrupt service, so Linux supports only a fixed number of softirqs, and which devices belong to this class is decided at compile time.

Tasklets

Tasklets are sufficient to service almost all devices. Tasklets are a type of bottom half that is guaranteed to execute serially. While the tasklet code for interrupt X is executing on CPU ONE, it will not run on any other CPU at the same time. In fact, as a cache optimization, it is queued to execute on CPU ONE again after completion.

There is no limit on the number of tasklets. Device drivers can add or remove tasklets dynamically.

Implementation Notes

Since not every interrupt handler has work to defer to a bottom half, Linux provides only a fixed number of bottom-half slots. After completing any top half, it iterates over all the slots looking for active (or activated) bottom halves and executes them.

One of these slots hosts a framework which in turn iterates over all scheduled tasklets and executes the pending ones (of course, only when a given tasklet is not already executing on another CPU at that time). This is how tasklets are built on top of softirqs and provide serialization.
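To make the layering concrete, here is a single-threaded user-space model in C. It is a sketch under heavy assumptions and not the kernel's code: every name in it (demo_tasklet, raise_slot, run_pending_slots, the slot count and the slot index) is invented for illustration. It shows a fixed set of slots with a pending bitmask, where one slot's handler walks a dynamic list of tasklets.

```c
#include <stddef.h>

#define NR_SLOTS     4   /* hypothetical fixed number of bottom-half slots */
#define TASKLET_SLOT 1   /* hypothetical slot whose handler runs tasklets */

struct demo_tasklet {
    void (*func)(unsigned long);
    unsigned long data;
    int scheduled;              /* already queued, so do not queue twice */
    int running;                /* a real kernel checks this across CPUs */
    struct demo_tasklet *next;
};

static void run_tasklets(void);

/* The fixed table of slot handlers, and a bitmask of slots with work. */
static void (*slot_handlers[NR_SLOTS])(void) = { [TASKLET_SLOT] = run_tasklets };
static unsigned int pending;
static struct demo_tasklet *tasklet_list;

void raise_slot(int nr) { pending |= 1u << nr; }

/* Tasklets are added dynamically; scheduling one marks its slot pending. */
void schedule_tasklet(struct demo_tasklet *t)
{
    if (t->scheduled)
        return;
    t->scheduled = 1;
    t->next = tasklet_list;
    tasklet_list = t;
    raise_slot(TASKLET_SLOT);
}

static void run_tasklets(void)
{
    struct demo_tasklet *list = tasklet_list;

    tasklet_list = NULL;
    while (list) {
        struct demo_tasklet *t = list;

        list = t->next;
        if (t->running) {       /* busy elsewhere: requeue for a later pass */
            t->next = tasklet_list;
            tasklet_list = t;
            raise_slot(TASKLET_SLOT);
            continue;
        }
        t->next = NULL;
        t->running = 1;
        t->scheduled = 0;
        t->func(t->data);
        t->running = 0;
    }
}

/* What conceptually happens after a top half: run every pending slot. */
int run_pending_slots(void)
{
    unsigned int todo = pending;
    int ran = 0;

    pending = 0;
    for (int nr = 0; nr < NR_SLOTS; nr++) {
        if ((todo & (1u << nr)) && slot_handlers[nr]) {
            slot_handlers[nr]();
            ran++;
        }
    }
    return ran;
}

/* Example tasklet body: bump the counter that `data` points to. */
void count_hit(unsigned long data) { *(unsigned long *)data += 1; }
```

Scheduling the same tasklet twice before it runs queues it only once, which is the serialization property in miniature.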

Ticket Spin Locks

A nice explanation of how spin locks are currently implemented in Linux, and of a new design that was checked in to make them fair (as in first-come-first-served):

http://lwn.net/SubscriberLink/267968/60e6ec9a59ec9677/
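For a feel of the first-come-first-served idea, here is a minimal ticket lock sketched with C11 atomics in user space. It is not the kernel's implementation (which is architecture-specific and packs both counters into one word). The struct and function names are mine.

```c
#include <stdatomic.h>

struct ticket_lock {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint owner;   /* ticket currently being served */
};

void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Take a ticket, then spin until it is our turn. Returns our ticket. */
unsigned int ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int me = atomic_fetch_add(&l->next, 1);

    while (atomic_load(&l->owner) != me)
        ;   /* busy-wait; waiters are served strictly in ticket order */
    return me;
}

/* Hand the lock to whoever holds the next ticket. */
void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}
```

Fairness comes from the ticket order: a CPU that arrives later always gets a higher ticket, so it cannot jump ahead of an earlier waiter.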

Building Linux Kernel

The steps below should be sufficient to build and install a Linux kernel manually.

$ tar xzvf linux-2.6.9.tar.gz
$ cd linux-2.6.9
$ make mrproper
$ make config                    # see below
$ make

# make modules_install install   # -- as root

In step 4 (make config) you need to select the drivers you want to build along with your kernel, and also how (built into the kernel, or as modules). The easiest way to get this is to use an existing kernel config file from /boot/config* (with a matching kernel version); see item 3 below.

A couple of alternatives exist for this "make config" step. Any one of the steps below is also OK.

1. "make defconfig" – this will choose default settings for all drivers. This will work for most systems, but not always!
2. "make xconfig" – this will bring up an X Window based graphical driver selection utility.
3. "make oldconfig" – this will look for a file named ".config" in the source tree and use that configuration. This is useful for kernel re-builds, where you have already chosen your drivers in a previous build session.
4. Others like "make allyesconfig", "make allnoconfig" or "make allmodconfig" are also available, but these are used mostly for testing purposes. Try "make help"; it will show the complete list of make commands (targets) accepted by the kernel build system.

Here is the step-by-step process to build and install your custom linux kernel:

1. Execute the 'uname -r' command on your terminal. It will give you the version of the running Linux kernel.
2. Download the vanilla kernel from http://www.kernel.org whose version most closely matches your running kernel version, up to 3 digits.
3. $ tar xzvf linux-2.6.???.tar.gz
4. $ cd linux-2.6.???
5. $ cp /boot/config-`uname -r` .config
6. $ make oldconfig
7. $ make

8. # make modules_install install

9. Add your kernel's paths to the /boot/grub/menu.lst file as shown below, where the XXX and YYY values should be the same as those used by the other entries in that file.

title "my custom kernel"
root XXX
kernel /boot/vmlinuz-2.6.??? root=YYY
initrd /boot/initrd.img-2.6.???

If everything goes fine, you can reboot your machine and select the "my custom kernel" option in the grub menu. You will boot with your brand new kernel. Be prepared to see kernel panics 🙂

Adding a New System-call

First, I would like you to build the Linux kernel manually and boot the system with that new kernel.

Once you are able to boot with your custom-built kernel, you can follow the same procedure after adding new files or modifying existing kernel files.

If you modify any kernel files, you have to build, install and reboot the machine with the new kernel. Only then do your kernel modifications take effect.

If you add any new files to the kernel, you have to change the kernel's Makefiles too. Every directory in the kernel source tree contains a file named Makefile. Its format is easy; if you read it once, you will be able to add your new files to the kernel's build process easily. Unless you add your file to one of the Makefiles, it will not be part of the kernel build.

Now I have told you the basic procedure to modify kernel files or add new files to the kernel. Now let's talk about adding a new system-call. From this point on I assume you know how to build the kernel and boot from it.

Whenever a program does a system-call, the kernel looks for the system-call function in a system-call table. This table contains function pointers to all the system-call function definitions. It used to be in the entry.S file, but in the latest kernels it has moved into the syscall_table.S file in the arch/i386/kernel directory. By looking at it you will be able to figure out how to add one more entry to that table. You should add the new entry to that table AT THE END.

What are those sys_* names? They are the C function names for the system-calls. Look at the include/linux/syscalls.h file. It contains the C function declarations for _almost_ all system-calls, so you should also add your system-call's declaration there.

OK, I assume you have added your system-call's declaration to include/linux/syscalls.h and an entry to the syscall_table.S file. Note down the index of your system-call's entry in the system-call table. Why?

Now a million-dollar question: when a program calls the read system-call, how does the kernel identify that it's a read system-call, not a write or a fork? This is where that __NR_???? symbol comes into the picture. Have a look at the include/asm-i386/unistd.h file.

Each system-call is given a unique number in the include/asm-i386/unistd.h file. When a program does a read system-call, it puts the __NR_read number in the EAX CPU register and invokes the system-call instruction (int 0x80). The Linux kernel looks at this EAX register to identify which system-call the program has asked for. This value is used as an index into the system-call table defined in the syscall_table.S file. So your system-call also needs a unique number, and your system-call's address *must* be at the same index in the table (in the syscall_table.S file). BTW don't forget to add +1 to the NR_syscalls value defined in the unistd.h file. Got it? Read the above four paragraphs again :-).
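If the number-as-index idea is still fuzzy, here is a toy user-space model of it in C. Nothing in it is kernel code: the DEMO_NR_* numbers, the demo_* functions and their return values are all invented stand-ins for the __NR_* numbers, the sys_* functions and the real table.

```c
#include <errno.h>

typedef long (*syscall_fn)(void);

/* Stand-ins for sys_* functions; each returns a recognisable value. */
static long demo_first(void)    { return 100; }
static long demo_second(void)   { return 200; }
static long demo_sayhello(void) { return 300; }

/* Stand-ins for the __NR_* numbers; the new call goes AT THE END. */
enum {
    DEMO_NR_first,
    DEMO_NR_second,
    DEMO_NR_sayhello,
    DEMO_NR_syscalls      /* table size, bumped by one for the new entry */
};

/* The table: entry N must hold the function for system-call number N. */
static const syscall_fn demo_call_table[DEMO_NR_syscalls] = {
    [DEMO_NR_first]    = demo_first,
    [DEMO_NR_second]   = demo_second,
    [DEMO_NR_sayhello] = demo_sayhello,
};

/* What the kernel conceptually does with the number it finds in EAX. */
long demo_dispatch(unsigned long nr)
{
    if (nr >= DEMO_NR_syscalls)
        return -ENOSYS;   /* unknown system-call number */
    return demo_call_table[nr]();
}
```

If the number and the table position ever disagree, the caller silently gets a different system-call, which is why the index matters so much.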

Now pick any kernel source file, or write a new file, and add your system-call's definition to it. If you choose to write it in a new file, edit the Makefile to include it in the kernel build. That's it. Now go ahead, build the kernel and fix any errors/warnings you face. Install the new kernel and reboot your machine.

Now your kernel has a new system-call implemented. But how can user-space programs make use of it? The library files shipped with our OS don't know about the existence of our new system-call, so we have to write assembly stub code which can invoke it. How do we do it? Fortunately the Linux kernel source tree contains macros which can generate assembly stubs for new system-calls, and we can make use of them or copy them. They are available in the include/asm-i386/unistd.h file under the #ifdef __KERNEL__ guard. We could use this information as below:

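#include <linux/unistd.h>   // _syscall0 and __NR_sayhello, from the modified kernel tree
#include <errno.h>          // the generated stub reports failures through errno

// generates an assembly stub with the prototype
// long sayhello(void)   // no parameters

_syscall0(long, sayhello);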

int main(int argc, char *argv[])
{
    sayhello();
    return 0;
}

You should compile it with special gcc flags as below:

gcc -nostdinc -I /path/to/linux-2.6.18/include -I /usr/include -D__KERNEL__ sayhello.c

This will put our modified Linux kernel header files before the system's header files in the header search path. The -D__KERNEL__ flag is necessary because _syscall0 is guarded by an #ifdef on it.
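As far as I know, the _syscall macros were dropped from the headers exported to user space in later kernels (the _syscall(2) man page says starting around 2.6.18), and the generic syscall() library function is the usual replacement. Below is a small sketch of that route. Since the number of sayhello exists only in a modified kernel, it uses SYS_getpid as a stand-in; the wrapper name raw_getpid is mine.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* asks glibc to declare syscall() */
#endif
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Invoke a system-call purely by its number, without a dedicated libc stub.
 * For a new system-call you would pass the number you assigned in unistd.h
 * instead of SYS_getpid (for example a hypothetical __NR_sayhello).
 */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

This needs neither -nostdinc nor -D__KERNEL__, only the system-call number itself.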