The Linux File Access Primitives

by Erik Troan

One of the most important abstractions of the POSIX API is the file. While nearly all operating systems provide files for permanent storage, all versions of UNIX provide access to most system resources through the file abstraction.

More concretely, this means that Linux uses the same set of system calls to provide access to devices (such as floppy disks and tape devices), networking resources (most commonly TCP/IP connections), system terminals, and even kernel status information. Because these system calls are so ubiquitous, fluency in them is important for every Linux programmer. Let's examine the basic concepts behind the file API and describe the most important file-related system calls.

Linux provides many different kinds of files. The most common type is simply called a regular file, which stores hunks of information for later access. The vast majority of files you work with, such as executables (e.g., /bin/vi), data files (e.g., /etc/passwd), and system libraries (e.g., /lib/libc.so.6), are regular files. Usually these reside somewhere on disk, but that is not necessarily the case (as we'll see later).

Another type of file is the directory, which contains a list of other files and their locations. When you use the ls command to list the files in a directory, it opens the file for that directory and prints out information on all of the files mentioned in it.

Other files include block devices (which represent filesystem-cached devices such as hard drives), character devices (which represent uncached devices like tape drives, mice, and system terminals), pipes and sockets (which allow processes to talk to one another), and symbolic links (which allow files to be given more than one name in the directory hierarchy).

Most files have one or more symbolic names which refer to them. These symbolic names are a set of strings delimited by the / character, and identify the file to the kernel. These are the pathnames with which all Linux users are quite familiar; for example, the pathname /home/ewt/article refers to the file that contains the text of this article on my laptop. No two files share the same name (a single file can have more than one name, however), so a pathname uniquely identifies a single file.

Each file a process has access to is identified by a small nonnegative integer, called a "file descriptor". File descriptors are created by system calls which open files and are inherited by new processes which are forked off from the current process. That is, when a process starts a new program, the original process's open files are normally inherited by the new program.
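To make descriptor inheritance concrete, here is a minimal sketch. It relies on pipe() and fork(), two calls this article does not otherwise cover, and read_from_child() is a made-up name used only for illustration. The parent creates a pipe, forks, and the child writes into a descriptor it never opened itself.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that writes msg into a pipe it
 * inherited from the parent, then reads the message back in the parent.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t read_from_child(const char *msg, char *out, size_t outlen)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: fds[1] was created by the parent, yet it works here. */
        close(fds[0]);
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: close our copy of the write end, then read the child's data. */
    close(fds[1]);
    ssize_t n = read(fds[0], out, outlen);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

Calling read_from_child("hello", buf, sizeof(buf)) should leave the five bytes of "hello" in buf, even though only the parent ever called pipe().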

By convention, most programs reserve the first three file descriptors (0, 1, and 2) for special purposes -- access to the so-called standard input, standard output, and standard error streams. File descriptor 0 is standard input, where many programs expect to receive input from the outside world. File descriptor 1 is standard output. Most programs display normal output there. For output related to error conditions, file descriptor 2 (standard error) is used.

Anyone comfortable with Linux shells has seen the use of the standard in, out, and error file descriptors. Normally, the shell runs commands with file descriptors 0, 1, and 2 all referring to the shell's terminal. When the > character is used to instruct the shell to send a program's output to another file, the shell opens that file as file descriptor 1 before invoking the new program. This causes the program to send its output to the given file rather than the user's terminal -- the beauty is that this is transparent to the program itself!

Similarly, the < character instructs the shell to use a particular file as file descriptor 0. This forces the program to read its input from that file -- in both cases, any errors from the program will still appear on the terminal, as those are sent to standard error on file descriptor 2. (Under the "bash" shell, you can redirect standard error using 2> rather than >.) This type of file redirection is one of the most powerful features of the Linux command line.
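For the curious, the sketch below shows roughly what a shell might do to implement the > redirection. It is an approximation under stated assumptions, not how any particular shell is written: run_with_stdout_to() is a hypothetical name, and fork(), dup2(), execvp(), and waitpid() are calls this article does not otherwise discuss.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with its standard output sent to path,
 * roughly what a shell does for "command > path".
 * Returns the command's exit status, or -1 on failure. */
int run_with_stdout_to(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: open the target file before starting the new program. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0)
            _exit(126);

        /* Make descriptor 1 refer to the file, then drop the extra copy. */
        if (fd != 1) {
            if (dup2(fd, 1) < 0)
                _exit(126);
            close(fd);
        }

        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp() failed */
    }

    /* Parent: wait for the command to finish and report its status. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The program being run never learns about the file; it simply writes to file descriptor 1 as usual.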

Before using any file-related system calls, programs should include <fcntl.h> and <unistd.h>; these provide the function prototypes and constants for the most common file routines. In the example code below, we'll assume that each program begins with

#include <fcntl.h>
#include <unistd.h>

First, let's look at how to read and write from a file. Intuitively enough, the read() and write() system calls are the most common ways of doing this. Both system calls expect three arguments: the file descriptor to access, a pointer to the information to read or write, and the number of characters which should be read or written. The number of characters which were successfully read or written is returned (or -1 if an error occurred). Figure 1 illustrates a simple program which reads a line from standard input (file descriptor 0) and writes it to standard output (file descriptor 1).

Figure 1

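int main(void) {
    char buf[100];
    int num;

    /* read() returns the number of characters read, 0 at end of file, or -1 on error */
    num = read(0, buf, sizeof(buf));
    if (num <= 0)
        return 1;

    write(1, "I got: ", 7); /* Length of "I got: " is 7! */
    write(1, buf, num);
}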

There are two things worth noting about this process. First, we asked read() to return 100 characters, but if we run this program, we only get input until the user presses the "enter" key. Many file operations work on a best-effort basis: they try to return all of the information the program asks for, but they may succeed only partially. By default, the terminal is configured to return from a read() call as soon as a \n is available (which is generated by pressing the "enter" key). This is actually quite convenient, since most users expect programs to be line-oriented anyway. Regular data files don't show this kind of behavior, however, and relying on it may cause unexpected results.

The other thing to notice is that we didn't have to write a \n after displaying our output. The read() call gave us the \n from the user, and we just write() that \n back to standard out. If you'd like to see what happens without that newline, try changing the last line to

write(1, buf, num - 1);

One last point about this simple example -- at no point does buf contain an actual C string! C strings are terminated by a single \0 character which marks the end of the string. As read() doesn't add a \0 to the end of the buffer, using strlen() (or any other C string function) on a read() buffer would be a big mistake! This behavior allows read() and write() to manipulate data which includes \0 characters, which is an impossibility for normal string functions.
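If you do want to treat the data as a C string, you have to add the terminator yourself. Here is one way it might be done; read_string() is a hypothetical helper name, not part of any library.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads at most size - 1 characters from fd into buf
 * and appends a \0 so the result can be handed to C string functions.
 * Returns the number of characters read, or -1 on error. */
ssize_t read_string(int fd, char *buf, size_t size)
{
    if (size == 0)
        return -1;

    /* Ask for one character less than the buffer holds, leaving room
     * for the terminator. */
    ssize_t n = read(fd, buf, size - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

Keep in mind that if the data itself contains a \0, strlen() will still stop at the first one, so the return value is the only reliable length.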

The read() and write() system calls work on the vast majority of files. They don't work on directories, which should be accessed through special functions such as readdir(). Also, read() and write() don't work for certain types of sockets.
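As a quick illustration of the directory functions, the following sketch counts the entries in a directory using opendir(), readdir(), and closedir() from <dirent.h>. The helper name count_entries() is an assumption made for this example.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: counts the entries in a directory, skipping the
 * "." and ".." entries. Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;

    /* readdir() returns NULL once every entry has been seen. */
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }

    closedir(dir);
    return count;
}
```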

Some files, such as regular files and block device files, use the concept of a file pointer. It specifies where in the file the next read() call will read from, and where the next write() call will write to. After a read() or a write(), the file pointer is advanced (internally, by the kernel) by the number of characters which were processed. This makes it easy to read all of the data in a file with a simple loop [See Figure 2].

Figure 2

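char buffer[1024];
int num;

/* read() returns 0 at end of file and -1 on error; stop on either.
   printf() also needs #include <stdio.h> */
while ((num = read(0, buffer, sizeof(buffer))) > 0) {
    printf("got some data\n");
}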

This loop will read all of the data on standard in, automatically advancing the kernel's internal file pointer after every read. When the file pointer is at the end of the file, the read() will return 0 and the loop will exit. Some files (such as character devices -- the terminal is a good example) don't have a file pointer per se, so on them this program will continue running until the user provides an end of file marker (by pressing "Ctrl-D").
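Putting the read loop together with the best-effort behavior mentioned earlier, a simple copy routine might look like the sketch below. Both write_all() and copy_fd() are hypothetical helper names; the retry on EINTR (a call interrupted by a signal) is a common precaution rather than something this article requires.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until all len characters
 * have been written. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal; try again */
            return -1;
        }
        data += n; /* skip past what was actually written */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copies everything from in_fd to out_fd.
 * Returns the number of characters copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    char buffer[1024];
    long total = 0;
    ssize_t n;

    /* read() returns 0 once the end of the file is reached. */
    while ((n = read(in_fd, buffer, sizeof(buffer))) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buffer, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return total;
}
```

Calling copy_fd(0, 1) would behave much like the cat command run with no arguments.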

Now that we've seen how to read and write from a file, the next thing to learn is how to open a new file. There are different ways of opening different types of files; the only one we'll discuss here is opening files that are represented in the filesystem through a pathname, including regular files, directories, device files, and named pipes. While some socket files have path names, those must be opened through an alternate method.

Disclaimers aside, the open() system call allows programs to access most system files. open() is an unusual system call as it takes either two or three arguments:

int open(const char *pathname, int flags);

or,

int open(const char *pathname, int flags, int perm);

The first form is more common; it opens a file which already exists. The second form should be used when the file may need to be created. The third argument specifies the access permissions that the new file should be given.

The first parameter to open() is the path name of the file (either absolute or relative to the current directory) as a normal C string (that is, terminated with a \0). The second parameter specifies how the file should be opened. It must include exactly one of O_RDONLY, O_WRONLY, or O_RDWR, optionally bitwise ORed with any of the other flags below:

O_RDONLY: The file may only be read

O_WRONLY: The file may only be written to

O_RDWR: The file may be read from or written to

O_APPEND: Every write is appended to the end of the file (use together with O_WRONLY or O_RDWR)

O_CREAT: If the file does not already exist, it should be created

O_EXCL: If the file already exists, fail rather than create it (should only be used with O_CREAT)

O_TRUNC: If the file already exists, remove all data from it (this is similar to creating a new file)

The third parameter to open() is needed only when O_CREAT is used; it specifies the file permissions as a number, which is the same format as the numeric permissions argument to the chmod command. The permissions specified to open() are filtered through the process's umask, which lists the permission bits that should be turned off in every new file. Most programs creating files call open() with a third argument of 0666, leaving it to the user to restrict the final permissions through the umask (the umask command of most shells can change this).
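To see the umask at work, consider the following sketch. It assumes an ordinary local filesystem without default ACLs, uses umask() and fstat() (neither of which is covered in this article), and created_mode() is a hypothetical name.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates path with the requested permissions while
 * the given umask is in effect, and returns the permission bits the new
 * file actually received (or -1 on failure). */
int created_mode(const char *path, mode_t requested, mode_t mask)
{
    mode_t old_mask = umask(mask); /* umask() returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
    umask(old_mask);               /* restore the caller's mask */

    if (fd < 0)
        return -1;

    struct stat st;
    int result = -1;
    if (fstat(fd, &st) == 0)
        result = (int)(st.st_mode & 0777); /* keep only permission bits */

    close(fd);
    return result;
}
```

With a umask of 022, asking for 0666 should yield a file with permissions 0644; with a umask of 077, the same request should yield 0600.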

Figure 3

int fd;

fd = open("myfile", O_RDWR | O_CREAT | O_TRUNC, 0666);
if (fd < 0) {
    /* Some error occurred */
    /* ... */
}

For example, Figure 3 shows how to open a file for reading and writing, creating it if it doesn't exist, and discarding any data which is in it if it does.
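A useful variation is to create a file only if it does not already exist, which is what O_EXCL is for. The sketch below is one possible way to wrap this; create_if_absent() is a hypothetical name, and EEXIST is the error code open() reports when the file is already there.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: creates path only if it does not exist yet.
 * Returns 1 if the file was created, 0 if it already existed,
 * and -1 for any other error. */
int create_if_absent(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

    if (fd < 0)
        return (errno == EEXIST) ? 0 : -1;

    close(fd);
    return 1;
}
```

Because the existence check and the creation happen in a single system call, two programs racing to create the same file cannot both succeed.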

open() returns a file descriptor which references the file. Recall that file descriptors are always >= 0; if open() returns -1, an error occurred and the global variable errno contains the UNIX error code describing the problem. open() always returns the smallest file descriptor that is not already in use; for example, if file descriptor 0 is not being used, open() will return 0.
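Error handling along these lines might look like the following sketch, which uses strerror() from <string.h> to turn errno into a readable message. The helper name open_or_report() is made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: opens path for reading. On failure it prints the
 * reason to standard error and returns -1 with errno left intact. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        /* Save errno first: later library calls may overwrite it. */
        int saved = errno;
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));
        errno = saved;
        return -1;
    }
    return fd;
}
```

If a single-threaded program opens a file with this helper, closes it, and immediately opens another, it should get the same descriptor number back, since that number is once again the smallest one available.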

When a process is finished with a file, it should close it through the close() system call, which takes the form:

int close(int fd);

The file descriptor to close is the only argument to close(), and it returns 0 on success. While it may seem odd for close() to fail, if the file descriptor refers to a file on a remote server, say, and the system cannot properly flush its caches, close() can actually fail. When a process terminates, the kernel automatically closes any files left open.
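A program that cares about its data reaching the file should therefore check close() as well as write(). One way to do that is sketched below; save_text() is a hypothetical helper name.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes text to path, replacing any old contents.
 * Returns 0 only if open(), write(), and close() all succeeded. */
int save_text(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    int write_ok = (n >= 0 && (size_t)n == len);

    /* Always close the descriptor, and remember whether that worked:
     * some errors are only reported at this point. */
    int close_ok = (close(fd) == 0);

    return (write_ok && close_ok) ? 0 : -1;
}
```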

The final common file operation is moving the file pointer. This only makes sense for files with file pointers (naturally), and attempting this on inappropriate files will return an error. The lseek() system call is used for this purpose:

off_t lseek(int fd, off_t pos, int whence);

The off_t type is a signed integer type; traditionally it was simply a long int (long is where the "l" in lseek comes from), though on modern systems it may be wider. lseek() returns the final position of the file's file pointer relative to the start of the file, or -1 if there was an error. This system call expects the file descriptor whose file pointer is being moved as the first argument, and the offset to move it by as the second. The last argument describes how that offset is interpreted:

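SEEK_SET moves it to pos bytes from the beginning of the file

SEEK_END moves it to pos bytes past the end of the file (pos is usually zero or negative, which lands at or before the end)

SEEK_CUR moves it pos bytes from its current position (toward the end of the file if pos is positive, toward the beginning if it is negative)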
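As a small example of the three modes working together, the sketch below finds the size of an open file by seeking to its end and then restoring the original position. The name file_size() is hypothetical, and the approach only works for files that have a file pointer.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the size of the file open on fd, leaving
 * the file pointer where it was. Returns -1 on error (for example, if fd
 * refers to a pipe or terminal). */
off_t file_size(int fd)
{
    /* Moving 0 bytes from the current position just reports where we are. */
    off_t here = lseek(fd, 0, SEEK_CUR);
    if (here == (off_t)-1)
        return -1;

    /* Moving 0 bytes from the end reports the length of the file. */
    off_t end = lseek(fd, 0, SEEK_END);
    if (end == (off_t)-1)
        return -1;

    /* Put the file pointer back where we found it. */
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)
        return -1;

    return end;
}
```

On a ten-byte file, lseek(fd, -3, SEEK_END) would return 7, and the next read() would start with the eighth byte.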

The combination of open(), close(), write(), read(), and lseek() provides the basic file access API for Linux. While there are numerous other functions which manipulate files, those described here are used most of the time.

Most programmers use the familiar ANSI C library file functions (such as fopen() and fread()), rather than the lower-level system calls described here. fopen() and fread() are, as you would expect, implemented on top of these system calls in a user-level library. Still, it's not uncommon to see usage of the low-level system calls, especially in more complex programs. By familiarizing yourself with these routines and interfaces you'll be on your way to becoming a true UNIX hacker.

Erik Troan is a developer for Red Hat Software and co-author of the book Linux Application Development. He can be reached at ewt@redhat.com.