Introduction

The Linux file system is a hierarchically structured tree where every location has its distinct meaning. The file system structure is standardized through the Filesystem Hierarchy Standard (FHS), of which this chapter is a description. Lately, however, more and more distributions are making small (but mutually consistent) changes to their file system layout, so the standard is in need of updating. When a setting deviates from the current standard, I will say so in this chapter.

Of course, a file system is always stored on media (be it a hard drive, a CD or a memory fragment); how these media relate to the file system and how Linux keeps track of those is also covered in this chapter.

Structure

The file system is a tree-shaped structure. The root of the tree, which is called the file system root even though it is always depicted as being above everything else, is identified by the slash character: “/”. It is the highest place you can go to. Beneath it are almost always only directories:

~$ cd /
~$ ls -F
bin/     home/     opt/      srv/     var/
boot/    lib/      proc/     sys/
dev/     media/    root/     tmp/
etc/     mnt/      sbin/     usr/

The ls -F command shows the content of the root location but appends an additional character to special files. For instance, it appends a “/” to directories, an “@” to symbolic links and a “*” to executable files. The advantage is that, in this book, you can easily see what type of files you have. By default, Gentoo enables colour mode for the ls command, telling you what kind of files there are by their colour. For a book, however, the appended character is the more practical choice.
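To see the suffixes in action, the following minimal sketch creates one entry of each kind in a scratch directory and lists it. The function name ls_f_demo and all file names are made up for this illustration; it assumes a POSIX shell with mktemp available.

```shell
# Build a scratch directory with one entry of each kind and list it.
# All names below are illustrative only.
ls_f_demo() {
    demo=$(mktemp -d) || return 1
    mkdir "$demo/docs"                  # directory       -> "/"
    touch "$demo/notes.txt"             # regular file    -> no suffix
    touch "$demo/run.sh"
    chmod +x "$demo/run.sh"             # executable file -> "*"
    ln -s notes.txt "$demo/latest"      # symbolic link   -> "@"
    ls -1F "$demo"                      # -1: one entry per line
    rm -r "$demo"
}

ls_f_demo
```

The listing should show docs/, latest@, notes.txt and run.sh*, one per line.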

A popular way of representing the file system is through a tree. An example would be for the top level:

/
+- bin/
+- boot/
+- dev/
+- etc/
+- home/
+- lib/
+- media/
+- mnt/
+- opt/
+- proc/
+- root/
+- sbin/
+- srv/
+- sys/
+- tmp/
+- usr/
`- var/

The deeper you descend, the larger the tree becomes, and it soon becomes too large to show in a single view. Still, the tree format is a good way of representing the file system because it shows you exactly what the file system looks like.

/
+- bin/
+- ...
+- home/
|  +- thomas/
|  |  +- Documents/
|  |  +- Movies/
|  |  +- Music/
|  |  +- Pictures/         <-- You are here
|  |  |  `- Backgrounds/
|  |  `- opentasks.txt
|  +- jane/
|  `- jack/
+- lib/
+- ...
`- var/

We’ve briefly covered navigating through the tree previously: suppose that you are currently located inside /home/thomas/Pictures. To descend further (into the Backgrounds directory) you would type “cd Backgrounds”. To ascend again (to /home/thomas) you would type “cd ..” (.. being short for “parent directory”).
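The same walk can be tried safely in a scratch copy of the example tree. This is only a sketch: walk_demo is a made-up name, and the tree is recreated under a temporary directory rather than under the real /home.

```shell
# Recreate part of the example tree and walk through it.
# The function body runs in a subshell, so the caller's current
# directory is left untouched.
walk_demo() (
    root=$(mktemp -d) || exit 1
    mkdir -p "$root/home/thomas/Pictures/Backgrounds"
    cd "$root/home/thomas/Pictures" || exit 1
    cd Backgrounds && basename "$PWD"    # descend into a subdirectory
    cd ..          && basename "$PWD"    # ascend to the parent directory
    cd ..          && basename "$PWD"    # and ascend once more
    cd / && rm -r "$root"
)

walk_demo
```

Each basename call prints the name of the directory you are in at that moment: first Backgrounds, then Pictures, then thomas.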

Before we explain the various locations, let’s first consider how the file system is stored on one (or more) media…

Mounting File Systems

The root of a file system is stored somewhere. Most of the time, it is stored on a partition of a disk. In many cases you will want to combine multiple partitions into a single file system tree. Attaching a partition to the file system is called mounting. Your file system is always seen as one tree structure, but parts of the tree (branches) can be located on a different partition, disk or even another medium (network storage, DVD, USB stick, …).

Mounting

Suppose that you have the root of a file system stored on one partition, but that all the users’ files are stored on another. This would mean that /, and everything beneath it, is on one partition except /home and everything beneath that, which is on a second one.

Figure 5.1. Two partitions used for the file system structure

The act of mounting requires that you identify a location of the file system as being a mount point (in the example, /home is the mount point) under which every file is actually stored in a different place (in the example, everything below /home is on the second partition). The partition you “mount” to the file system doesn’t need to know where it is mounted. In fact, it doesn’t know. You can mount the users’ home directories at /home (which is preferable) but you could just as well mount them at /srv/export/systems/remote/disk/users. Why you would want to do that is beyond me, but you could if you wanted to.

The mount command by itself, without any arguments, shows you a list of mounted file systems:

$ mount
/dev/sda8 on /             type ext3       (rw,noatime)
proc      on /proc         type proc       (rw)
sysfs     on /sys          type sysfs      (rw,nosuid,nodev,noexec,relatime)
udev      on /dev          type devtmpfs   (rw,nosuid,relatime,size=10240k,mode=755)
devpts    on /dev/pts      type devpts     (rw,nosuid,noexec,relatime,gid=5,mode=620)
/dev/sda7 on /home         type ext3       (rw,noatime)
none      on /dev/shm      type tmpfs      (rw)
/dev/sda1 on /mnt/data     type ext3       (rw,noatime)
usbfs     on /proc/bus/usb type usbfs      (rw,noexec,nosuid,devmode=0664,devgid=85)

The above example, although bloated with a lot of other file systems we know nothing about yet, tells us that the file system can be seen as follows:

/             (on /dev/sda8)
+- ...
+- dev/       (special: "udev")
|  +- pts     (special: "devpts")
|  `- shm     (special: "none")
+- proc/      (special: "proc")
|  `- bus/
|     `- usb/ (special: "usbfs")
+- sys/       (special: "sysfs")
+- home/      (on /dev/sda7)
`- mnt/
   `- data/   (on /dev/sda1)

Ignoring the special mounts, you can see that the root of the file system is on device /dev/sda8. From /home onwards, the file system is stored on /dev/sda7 and from /mnt/data onwards, the file system is stored on /dev/sda1. More on this specific device syntax later.

The concept of mounting allows programs to be agnostic about where your data is stored. From an application (or user) point of view, the file system is one tree. Under the hood, the file system structure can be on a single partition, but also on a dozen partitions, network storage, removable media and more.
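To find out which mount actually holds a given path, you can look for the longest mount point that is a parent of that path. The sketch below does this on mount-style output; mount_for_path is a hypothetical helper, and it assumes the "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" line format and mount points without spaces.

```shell
# Print "DEVICE MOUNTPOINT" for the mount that holds the given path.
# Reads mount-style lines from standard input.
mount_for_path() {
    awk -v path="$1" '
        $2 == "on" {
            mp = $3
            # A mount point matches when it equals the path or is a
            # parent directory of it; "/" is a parent of everything.
            prefix = (mp == "/") ? "/" : mp "/"
            if (path == mp || index(path, prefix) == 1) {
                # Prefer the longest match; on a tie the later line wins.
                if (length(mp) >= length(best)) { best = mp; dev = $1 }
            }
        }
        END { if (best != "") print dev, best }
    '
}

# Usage on a live system:
#   mount | mount_for_path /home/thomas/Pictures
```

With the mount output shown earlier, a path below /home resolves to /dev/sda7, while a path such as /etc/fstab resolves to the root device /dev/sda8.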

File Systems

Each medium which can contain files is internally structured. What this structure looks like is defined by the file system it uses. Windows users might remember that Microsoft Windows originally used FAT16 and later FAT32 before migrating to one of the NTFS revisions currently in use. All of these are file systems, and Linux has its own set as well.

Linux, however, doesn’t restrict its partitions to a single file system (as in “only NTFS is supported”): as long as Linux understands the file system and the file system supports things like ownership and permissions, you are free to choose whichever one you want. In fact, during most distribution installations, you are asked which file system to use. The following is a small list of popular file systems, each with a brief explanation of its advantages and disadvantages…

  • The ext2 file system is Linux’ old, yet still used, file system. Its name stands for second extended file system and it is quite simple. It has been in use almost since the birth of Linux and is quite resilient against file system fragmentation – although this is true for almost all Linux file systems. It is however slowly being replaced by journaled file systems.
  • The ext3 file system is an improvement on the ext2 file system, adding, amongst other things, the concept of journaling.
  • The ext4 file system is an improvement on the ext3 file system, adding, amongst other things, support for very large file systems/files, extents (contiguous physical blocks), pre-allocation and delayed allocation and more. The ext4 file system is backwards compatible with ext3 as long as you do not use extents. Ext4 is frequently seen as the default file system of choice amongst administrators and distributions.
  • The reiserfs file system is written from scratch. It provides journaling as well, but its main focus is on speed. The file system provides quick access to locations with hundreds of files inside (ext2 and ext3 are much slower in these situations) and keeps the disk footprint for small files small (some other file systems reserve an entire block for every file, whereas reiserfs is able to share blocks between several files). Although quite popular a few years back, the file system suffered from a lack of support even during its popular years (harmful bugs stayed in for quite some time) and is not frequently advised by distributions any more. Its successor, reiser4, is still quite immature and is, due to the imprisonment of the main developer Hans Reiser, not being developed that actively any more.
  • The btrfs file system is a promising file system. It addresses concerns regarding huge storage backend volumes, multi-device spanning, snapshotting and more. Although its primary target was enterprise usage, it also offers interesting features to home users such as online grow/shrink (both on file system as well as underlying storage level), object-level redundancy, transparent compression and cloning.
  • The xfs file system is an enterprise-ready, high performance journaling file system. It offers very high parallel throughput and is therefore a common choice amongst enterprises.
  • The zfs file system (ZFSonLinux) is a multi-featured file system offering block-level checksumming, compression, snapshotting, copy-on-write, deduplication, extremely large volumes, remote replication and more. It has been recently ported from (Open)Solaris to Linux and is gaining ground.

A file system journal keeps track of write operations (like adding new files or changing the content of files) by first recording the write in a journal. Then, it performs the write on the file system itself, after which it removes the entry from the journal. This sequence ensures that, if the operation is interrupted at any point (for instance through a power failure), the file system is able to recover when it is back up and running by either replaying the journal or removing the incomplete entry: as such, the file system is always in a consistent state.
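The order of operations can be imitated with ordinary files. The following is a toy model only, not how any real file system implements its journal (real journals live inside the file system and are handled by the kernel); the names journal_write and journal_recover and the file naming scheme are invented for the illustration.

```shell
# Toy journal kept in a directory DIR:
#   DIR/incomplete    a journal entry that is still being written
#   DIR/journal.NAME  a complete journal entry for file NAME
#   DIR/data.NAME     the "real" file on the file system

journal_write() {    # usage: journal_write DIR NAME CONTENT
    dir=$1; name=$2; content=$3
    # 1. Record the write in the journal; the rename marks it complete.
    printf '%s\n' "$content" > "$dir/incomplete"
    mv "$dir/incomplete" "$dir/journal.$name"
    # 2. Perform the write on the file system itself.
    printf '%s\n' "$content" > "$dir/data.$name"
    # 3. Remove the entry from the journal.
    rm "$dir/journal.$name"
}

journal_recover() {  # usage: journal_recover DIR (run after a "crash")
    dir=$1
    rm -f "$dir/incomplete"              # incomplete entry: remove it
    for entry in "$dir"/journal.*; do
        [ -e "$entry" ] || continue      # no entries left to replay
        name=${entry##*/journal.}
        cp "$entry" "$dir/data.$name"    # complete entry: replay it
        rm "$entry"
    done
}
```

Whatever step an interruption hits, journal_recover leaves every data file holding either its old content or the complete new content, never a half-written mix.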

It is usually not possible to switch an existing partition from one file system to another without reformatting it (except between ext2 and ext3), but as most file systems are mature enough you do not need to panic about choosing “the right file system”.

Now, if we take a look at the following mount output, we notice that there is a part of the line that says which “type” a mount has. Well, this type is the file system used for that particular mount.

$ mount
rootfs      on /                        type rootfs      (rw)
sysfs       on /sys                     type sysfs       (rw,seclabel,relatime)
selinuxfs   on /sys/fs/selinux          type selinuxfs   (rw,relatime)
/dev/md3    on /                        type ext4        (rw,seclabel,noatime,nodelalloc)
/dev/md4    on /srv/virt                type ext4        (rw,noatime,data=ordered,barrier=0)
proc        on /proc                    type proc        (rw,nosuid,nodev,noexec,relatime)
tmpfs       on /run                     type tmpfs       (rw,rootcontext=system_u:object_r:var_run_t,seclabel,nosuid,nodev,noexec,relatime)
udev        on /dev                     type devtmpfs    (rw,seclabel,nosuid,relatime,size=10240k,nr_inodes=493003,mode=755)
/dev/mapper/volgrp-nfs    on /srv/nfs   type ext4        (rw,noatime,data=journal)
/dev/mapper/volgrp-usr    on /usr       type ext4        (rw,noatime,data=journal)
/dev/mapper/volgrp-home   on /home      type ext4        (rw,noatime,nosuid,nodev,data=journal)
/dev/mapper/volgrp-opt    on /opt       type ext4        (rw,noatime,data=journal)
/dev/mapper/volgrp-var    on /var       type ext4        (rw,noatime,data=journal)
/dev/mapper/volgrp-vartmp on /var/portage type ext4      (rw,noatime,data=ordered,rootcontext="system_u:object_r:tmp_t")
mqueue      on /dev/mqueue              type mqueue      (rw,seclabel,nosuid,nodev,noexec,relatime)
devpts      on /dev/pts                 type devpts      (rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620)
shm         on /dev/shm                 type tmpfs       (rw,seclabel,nosuid,nodev,noexec,relatime)
securityfs  on /sys/kernel/security     type securityfs  (rw,nosuid,nodev,noexec,relatime)
debugfs     on /sys/kernel/debug        type debugfs     (rw,nosuid,nodev,noexec,relatime)
none        on /selinux                 type selinuxfs   (rw)
tmpfs       on /var/tmp                 type tmpfs       (rw,nosuid,noexec,nodev,rootcontext="system_u:object_r:tmp_t")
tmpfs       on /tmp                     type tmpfs       (rw,nosuid,noexec,nodev,rootcontext="system_u:object_r:tmp_t")
rc-svcdir   on /lib64/rc/init.d         type tmpfs       (rw,nosuid,nodev,noexec,relatime,rootcontext="system_u:object_r:initrc_state_t",seclabel,size=1024k,mode=755)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nodev,noexec,nosuid)
rpc_pipefs  on /var/lib/nfs/rpc_pipefs  type rpc_pipefs  (rw)
nfsd        on /proc/fs/nfsd            type nfsd        (rw,nodev,noexec,nosuid)

As you can see, all partitions (the non-special lines) are typed as ext4. But what are those other file systems?

  • proc is a special file system which doesn’t exist on a device, but is a sort of gateway to the Linux kernel. Everything you see below /proc is something the kernel displays the moment you read it. It is a way to communicate with the kernel (and vice versa) using a very simple interface: file reading and file writing, something well supported.

    I will elaborate on /proc more later in this chapter.

    proc is known to be a pseudo file system: it does not contain real files, but runtime information.

  • sysfs is a special file system just like proc: it doesn’t exist on a device, and is a sort of gateway to the Linux kernel. It differs from proc in the way it is programmed as well as structured: sysfs is more structured and tailored towards computer-based parsing of the files and directories, whereas proc is more structured and tailored towards human-based reading/writing to the files and directories.

    The original idea was that much of the non-process information in proc would eventually move to the sysfs file system (although there is no milestone set, since many people like the simple way /proc gives them information).

    Like /proc, sysfs is known to be a pseudo file system and will be elaborated more later in this chapter.

  • tmpfs is a temporary file system. Its contents are stored in memory and not on a persistent disk. As such, its storage is usually very quick (memory is a lot faster than even the fastest SSDs and hard disks out there). I do say usually, because tmpfs can swap out pages of its memory to the swap location, effectively making those parts of the tmpfs file system slower (as they need to be read from disk again before they can be used).

    Within Linux, tmpfs is used for things like the shared memory in /dev/shm and /tmp.

  • devtmpfs is similar to the tmpfs file system, but contains device files managed by the kernel. The devtmpfs file system was brought to life to handle the concern of providing important device files before udev (the device manager) is able to start up and take control.
  • devpts is a pseudo file system like proc and sysfs. It contains device files used for terminal emulation (like getting a console through the graphical environment using xterm, uxterm, eterm or another terminal emulation program). In earlier days, those device files were created statically, which caused most distributions to allocate a lot of terminal emulation device files (as it is difficult to know how many of those emulations a user would start at any point in time). To manage those device files better, a pseudo file system was developed that creates and destroys the device files as they are needed.
  • usbfs is also a pseudo file system and can be compared with devpts. It also contains files which are created or destroyed as USB devices are added or removed from the system. However, unlike devpts, it doesn’t create device files, but pseudo files that can be used to interact with the USB device.

    As most USB devices are generic USB devices (belonging to certain classes, like generic USB storage devices) Linux has developed a framework that allows programs to work with USB devices based on their characteristics, through the usbfs file system.

  • selinuxfs is a pseudo file system that represents the SELinux subsystem in the Linux kernel. It is used by the SELinux libraries to interact with the SELinux security server, querying the SELinux policy and more. Linux systems that do not have SELinux enabled will not have this file system mounted.
  • mqueue is a pseudo file system used for inter-process message queue support (POSIX message queues).
  • binfmt_misc is a pseudo file system used to register executable formats. Through binfmt, the Linux kernel is able to execute arbitrary executable file formats by recognizing the registered formats and passing such files on to userspace applications.

Many more special file systems exist (some even appear in the mount output above), but I leave it to the interested reader to find out more about them.
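To separate the device-backed lines from the special ones in mount output, a small filter can help. This is only a sketch: device_backed_mounts is a made-up name, and it assumes the "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" line format and that the interesting lines are those whose source starts with /dev/.

```shell
# Print "MOUNTPOINT FSTYPE" for every mount whose source is a device
# file under /dev; pseudo file systems such as proc or tmpfs are skipped.
# Reads mount-style lines from standard input.
device_backed_mounts() {
    awk '$2 == "on" && $4 == "type" && index($1, "/dev/") == 1 {
        print $3, $5
    }'
}

# Usage on a live system:
#   mount | device_backed_mounts
```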

Partitions and Disks

Every hardware device (except the network interface) available to the Linux system is represented by a device file inside the /dev location. Partitions and disks are no exception. Let’s take a serial ATA hard disk as an example.

A SATA disk driver internally uses the SCSI layer to represent and access data. As such, a SATA device is represented as a SCSI device. The first SATA disk on your system is represented as /dev/sda, its first partition as /dev/sda1. You could read sda1 backwards as: “1st partition (1) on the first (a) scsi device (sd)”.

~$ ls -l /dev/sda1
brw-rw---- 1 root disk 8,  1  Nov 12  10:10  /dev/sda1
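The naming scheme can be taken apart mechanically. The helper below is a sketch only: describe_sd is a made-up name, and it assumes single-letter disk names (sda to sdz) followed by an optional partition number.

```shell
# Describe a name such as "sda1" or "/dev/sdb" (illustrative helper).
describe_sd() {
    name=${1#/dev/}                        # accept "sda1" or "/dev/sda1"
    case $name in
        sd[a-z]|sd[a-z][0-9]*) ;;
        *) echo "not an sd device name: $1" >&2; return 1 ;;
    esac
    letter=${name#sd}                      # "a1"
    letter=${letter%%[0-9]*}               # "a"
    part=${name#sd"$letter"}               # "1", or empty for a whole disk
    case $part in
        *[!0-9]*) echo "not an sd device name: $1" >&2; return 1 ;;
    esac
    if [ -n "$part" ]; then
        echo "partition $part on SCSI disk $letter"
    else
        echo "whole SCSI disk $letter"
    fi
}

describe_sd /dev/sda1
```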

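A regular (parallel) ATA disk or DVD-ROM drive was historically represented by a device file such as /dev/hda (hd stood for hard disk but came to be seen as the identification of an ATA device). On current kernels such devices are handled through the SCSI layer as well, so a disk appears as /dev/sdc, for instance, and an optical drive as /dev/sr0.

The device management software on the system will most likely create symbolic links to the optical drive’s device file, called /dev/cdrom or /dev/dvdrom, for the administrator’s convenience. On a system that still uses the older naming, the device file of a DVD-ROM drive looks like this:

$ ls -l /dev/hda
brw-rw---- 1 root cdrom 3, 0  Apr 23  21:00 /dev/hda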

On a default Gentoo installation, the device manager (which is called udev) creates the device files as it encounters the hardware. For instance, on my system, the partitions for my first SATA device can be listed as follows:

$ ls -l /dev/sda*
brw-r----- 1 root disk 8, 0 Sep 30 18:11 /dev/sda
brw-r----- 1 root disk 8, 1 Sep 30 18:11 /dev/sda1
brw-r----- 1 root disk 8, 2 Sep 30 18:11 /dev/sda2
brw-r----- 1 root disk 8, 5 Sep 30 18:11 /dev/sda5
brw-r----- 1 root disk 8, 6 Sep 30 18:11 /dev/sda6
brw-r----- 1 root disk 8, 7 Sep 30 18:11 /dev/sda7
brw-r----- 1 root disk 8, 8 Sep 30 18:11 /dev/sda8

Inside the /dev location, there are also symbolic links (pointers) towards those device files, which can be used to identify the partitions or disks by other means. For instance, to list the disk devices by their UUID (Universally Unique IDentifier):

$ ls -l /dev/disk/by-uuid
total 0
lrwxrwxrwx. 1 root root 10 Dec 16 20:01 1628b93d-3448-4b8c-b72b-1d68e89bd2fa -> ../../sda2
lrwxrwxrwx. 1 root root 10 Dec 16 20:01 5550c45f-9660-44f2-8e86-05a612d028a3 -> ../../dm-2
lrwxrwxrwx. 1 root root 10 Dec 16 20:01 77eb40be-f571-49c6-bbb0-a12677615fe3 -> ../../dm-5
lrwxrwxrwx. 1 root root 10 Dec 16 20:01 9644f675-6eaf-4974-9e1a-0b8eafa931ae -> ../../sdb2
lrwxrwxrwx. 1 root root 10 Dec 16 20:01 9beda062-6e15-4323-9ad1-53b6a9e39676 -> ../../dm-0
lrwxrwxrwx. 1 root root 10 Dec 16 20:01 9e7a1178-b0ad-4cd8-8977-a471a5d2b797 -> ../../dm-4
lrwxrwxrwx. 1 root root  9 Dec 16 20:01 b06fa545-0d5a-4c9a-97cb-83b4e1799f9a -> ../../md3
lrwxrwxrwx. 1 root root 10 Dec 16 20:01 b80c76a3-d52f-4006-9bf5-62f4d7edc791 -> ../../dm-3
lrwxrwxrwx. 1 root root 10 Dec 16 20:01 bdb15de1-3430-47b4-9e63-ee58557f1d17 -> ../../dm-1
lrwxrwxrwx. 1 root root  9 Dec 16 20:01 c44503ce-e52e-452c-b4bc-767ddd1d3b27 -> ../../md1
lrwxrwxrwx. 1 root root  9 Dec 16 20:01 d8a2bb27-15da-49bb-b205-3160c307835c -> ../../md4

The advantage of using UUIDs is that they uniquely identify a partition or disk. If the disks in the system are later juggled around, or we are talking about removable devices, then by using the UUID we know for sure that we are looking at the right partition (and not another disk that got named /dev/sda2 for instance).
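Looking up the device behind a UUID comes down to reading the symbolic link. A minimal sketch, assuming the by-uuid layout shown above; device_for_uuid is a hypothetical helper, and its optional second argument exists only so that it can be tried on a scratch directory.

```shell
# Print the kernel device name that a UUID symlink points to.
# usage: device_for_uuid UUID [DIRECTORY]
device_for_uuid() {
    dir=${2:-/dev/disk/by-uuid}
    [ -L "$dir/$1" ] || { echo "unknown UUID: $1" >&2; return 1; }
    # The link target looks like ../../sda2; keep only the last part.
    basename "$(readlink "$dir/$1")"
}

# Usage on a live system (UUID taken from the listing above):
#   device_for_uuid 1628b93d-3448-4b8c-b72b-1d68e89bd2fa
```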

The ‘mount’ Command and the fstab file

The act of mounting a medium to the file system is performed by the mount command. To be able to perform its duty well, it requires some information, such as the mount point, the file system type, the device and optionally some mounting options.

For instance, the mount command to mount /dev/sda7, housing an ext3 file system, to /home, would be:

# mount -t ext3 /dev/sda7 /home

One can also see the act of mounting a file system as “attaching” a certain storage somewhere on the file system, effectively expanding the file system with more files, directories and information.

However, if your system has several different partitions, it would be tedious to have to enter these commands over and over again. This is one of the reasons why Linux has a file system definition file called /etc/fstab. The fstab file contains all the information mount needs in order to successfully mount a device. An example fstab is shown below:

/dev/sda8  /          ext4    defaults,noatime      0 0
/dev/sda5  none       swap    sw                    0 0
/dev/sda6  /boot      ext4    noauto,noatime        0 0
/dev/sda7  /home      ext4    defaults,noatime      0 0
/dev/sdb1  /media/usb auto    user,noauto,gid=users 0 0

The file is structured as follows:

  1. The device to mount (also supports labels – we’ll discuss that later)
  2. The location to mount the device to (mount point)
  3. The file system type, or auto if you want Linux to automatically detect the file system
  4. Additional options (use “defaults” if you don’t want any specific option), such as noatime (don’t register access times on the file system, to improve performance) and user (allow a regular user to mount the device; the related users option allows any user to mount and unmount it)
  5. Dump-number (you can leave this at 0)
  6. File check order (you can leave this at 0 as well)
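The six fields can be read back with a few lines of shell. This is a sketch under the assumption that fields are separated by whitespace and that comment lines start with “#”; fstab_describe is a made-up name. When the last two fields are omitted, they are treated as 0.

```shell
# Print one labelled line per fstab entry read from standard input.
fstab_describe() {
    while read -r device mountpoint fstype options dump pass; do
        case $device in
            ''|'#'*) continue ;;         # skip blank lines and comments
        esac
        printf '%s: device=%s type=%s options=%s dump=%s fsck=%s\n' \
            "$mountpoint" "$device" "$fstype" "$options" \
            "${dump:-0}" "${pass:-0}"
    done
}

# Usage on a live system:
#   fstab_describe < /etc/fstab
```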

Thanks to this file, the previous mount command example is not necessary any more (as the mount is performed automatically) but in case the mount has not been done already, the command is simplified to:

# mount /home

If you ever need to remove a medium from the file system, use the umount command:

# umount /home

This is of particular interest for removable media: if you want to access a CD or DVD (or even USB stick), you need to mount the media on the file system first before you can access it. Likewise, before you can remove the media from your system, you first need to unmount it:

# mount /media/dvd
(The DVD is now mounted and accessible)
# umount /media/dvd
(The DVD is now not available on the file system any more and can be
 removed from the tray)

Of course, modern Linux operating systems have tools in place which automatically mount removable media on the file system and unmount them when they are removed. Gentoo Linux does not offer such a tool by default though (you need to install it).

Swap location

You can (and probably will) have a partition dedicated for paging: this partition will be used by Linux when there is insufficient physical memory to keep all information about running processes (and their resources). When this is the case, the operating system will start putting information (which it hopes will not be used soon) on the disk, freeing up physical memory.

This swap partition is a partition like any other, but instead of a file system usable by end users, it holds a special format used for paging memory and is identified as a swap partition in the partition table:

# fdisk -l /dev/sda
Disk /dev/sda: 60.0 GB, 60011642880 bytes
255 heads, 63 sectors/track, 7296 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x8504eb57

   Device Boot  Start    End      Blocks   Id  System
/dev/sda1   *       1   1275    10241406   83  Linux
/dev/sda2        1276   7296    48363682+   5  Extended
/dev/sda5        1276   1525     2008093+  82  Linux swap / Solaris
/dev/sda6        1526   1532       56196   83  Linux
/dev/sda7        1533   2778    10008463+  83  Linux
/dev/sda8        2779   7296    36290803+  83  Linux

The swap partition is referenced in the /etc/fstab file and enabled at boot-up.

To view the currently active swap partitions (or files, as swap files are supported as well), view the content of the /proc/swaps file or run the swapon -s command:

# cat /proc/swaps
Filename     Type      Size    Used  Priority
/dev/sda5    partition 2008084 0     -1
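When several swap locations are active, the totals can be summed from the same file. A minimal sketch, assuming the column layout shown above with sizes in kibibytes; swap_totals is a made-up name.

```shell
# Sum the Size and Used columns of /proc/swaps-style input.
# The first line is the header and is skipped.
swap_totals() {
    awk 'NR > 1 { size += $3; used += $4 }
         END { printf "total=%d KiB used=%d KiB\n", size, used }'
}

# Usage on a live system:
#   swap_totals < /proc/swaps
```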

The Linux File System Locations

As said before, every location on the Linux file system has its specific meaning. We’ve already covered a few of them without explicitly mentioning that they are standard locations, such as /home which houses the local users’ home directories. The Filesystem Hierarchy Standard covers all these standard locations, but this chapter would be very incomplete if it didn’t talk about them as well.

System Required Locations

The system required locations are locations you cannot place on another file system medium because those locations are required by the mount command itself to function properly:

  • /bin usually contains executable programs needed to bring the system up and running. Recently however, more and more distributions are moving all applications towards /usr/bin and are using symbolic links to transition towards this new structure.
  • /etc contains all the configuration files for the system (not the user-specific configurations)
  • /lib usually contains the system libraries necessary to successfully boot the system and run the commands which are located inside /bin. Recently however, these files are also being migrated towards /usr/lib.
  • /sbin, just like /bin, contains executable programs. However, whereas /bin has programs which users can use as well, /sbin contains programs solely for system administrative purposes

Userland Locations

Userland locations are the locations which contain the files for the regular operation of a system (such as application data and the applications themselves). These can be stored on separate media if you want, but if you do, you will need to set up an initial ram disk to boot your system with. More about initial ram file systems later. The root of the userland locations is /usr (a name often explained as Unix System Resources).

  • /usr is the root of the userland locations (and usually the mount point of the separate medium)
  • /usr/X11R6 contains all the files necessary for the graphical window server (X11); they are subdivided in binaries (bin/), libraries (lib/) and header definitions (include/) for programs relying on the X11 system.
  • /usr/bin contains all the executable programs
  • /usr/lib contains all the libraries for the above mentioned programs
  • /usr/share contains all the application data for the various applications (such as graphical elements, documentation, …)
  • /usr/local is often a separate mount as well, containing programs specific to the local system (the /usr might be shared across different systems in large environments)
  • /usr/sbin is, like /usr/bin, a location for executable programs, but just like /bin and /sbin, /usr/sbin contains programs for system administrative purposes only.

General Locations

General locations are, well, everything else which might be placed on a separate medium…

  • /home contains the home directories of all the local users
  • /boot contains the static boot-related files, not actually necessary once the system is booted (for instance, it includes the bootloader configuration and kernel image)
  • /media contains the mount points for the various detachable storage (like USB disks, DVDs, …)
  • /mnt is a location for temporarily mounted media (read: not worth the trouble of defining them in fstab)
  • /opt contains add-on packages and is usually used to install applications that are neither provided natively by your package manager (those should reside in /usr) nor built specifically for the local system (those belong in /usr/local).
  • /tmp contains temporary files for the system tools. The location can be cleansed at boot up.
  • /var contains data that changes in size, such as log files, caches, etc.

Special Kernel-provided File Systems

Some locations on the file system are not actually stored on a disk or partition, but are created and managed on-the-fly by the Linux kernel.

  • /proc contains information about the running system, kernel and processes
  • /sys contains information about the available hardware and kernel tasks
  • /dev contains device files

These locations will often also have other (pseudo) file systems mounted underneath.

The Root File System /

As said before, the root file system / is the parent of the entire file system. It is the first file system that is mounted when the kernel boots (unless you use an initial ramdisk), and your system will not function properly if the kernel detects corruption on this file system. Also, due to the nature of the boot process, this file system will eventually become writable (as the boot process needs to store its state information, etc.)

Some locations on the root file system are strongly advised to remain on the root file system (i.e. you should never ever mount another file system on top of that location). These locations are:

  • /bin and /sbin as these contain the binaries (commands) or links to binaries that are needed to get a system up to the point where it can mount other file systems. Although this requirement is gradually becoming less strict, it would still break systems if you made separate mounts for these (small) locations.
  • /lib as this contains the libraries that are needed by the commands in /bin.
  • /etc as this contains the systems’ configuration files, including those that are needed during the boot-up of the system.

    A prime example of a configuration file inside /etc is fstab (which contains information about the other file systems to mount at boot time).

The Variable Data Location /var

The var location contains variable data. You should expect this location to be used frequently during the life time of your installation. It contains log files, cache data, temporary files, etc.

For many, this alone is a reason to give /var its own separate file system: by using a dedicated file system, you ensure that flooding the /var location doesn’t harm the root file system (as it is on a different file system).

The Userland Location /usr

The usr location contains the systems’ day-to-day application files. A specific property of the location is that, if you are not updating your system, it could be left unmodified. In other words, you should be able to have only read-only access to the /usr location. Most distributions however do not support this feature anymore and assume that the /usr location is writable by the administrator at all times.

Having /usr on a separate file system also has other advantages (although some might be quite far-fetched):

  • If you are performing system administration tasks, you could unmount /usr so that end users don’t run any programs they shouldn’t during the administrative window.
  • By placing /usr (and some other locations) on separate media, you keep your root file system small which lowers the chance of having a root file system corruption that will make booting impossible.
  • You can use a file system that is optimized for fast reading (writing doesn’t require specific response times)

The advantages however are becoming less and less relevant nowadays. Instead, distributions are focusing more on initial ram file systems (a small, in-memory file system used to boot the system), which will be discussed later in this book.

The Home Location /home

Finally, the /home location. This location contains the end users’ home directories. Inside these directories, these users have full write access. Outside these directories, users usually have read-only rights (or even no rights at all). The structure inside a home directory is also not bound to specific rules. In effect, a user’s home directory is that user’s sole responsibility.

However, that also means that users have the means of filling up their home location as they see fit, possibly flooding the root file system if /home isn’t on a separate partition. For this reason, using a separate file system for /home is a good thing.

Another advantage of using a separate file system for /home is when you would decide to switch distributions: you can reuse your /home file system for other Linux distributions (or after a re-installation of your Linux distribution).
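
Whether a location such as /var, /usr or /home already lives on its own file system can be checked by comparing device numbers. What follows is only a minimal sketch: it assumes GNU stat, the helper name same_fs is made up for this illustration, and bind mounts may report the same device as their source.

```shell
#!/bin/sh
# Hypothetical helper: succeed when both paths are on the same file system.
# Assumes GNU stat, where %d prints the device number the path resides on.
same_fs() {
    [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

# Compare each location discussed above against the root file system.
for location in /var /usr /home; do
    [ -d "$location" ] || continue
    if same_fs / "$location"; then
        echo "$location shares the root file system"
    else
        echo "$location is on a separate file system"
    fi
done
```

A location that reports "shares the root file system" can fill up the root file system when it floods.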

Permissions and Attributes

By default, Linux supports what is called a discretionary access control (DAC) permission system where privileges are based on the file ownership and user identity. However, projects exist that enable mandatory access control (MAC) on Linux, which bases privileges on roles and where the administrator can force security policies on files and processes.

As most MAC-based security projects (such as RSBAC, LIDS and grSecurity) are not part of the default Linux kernel yet, I will talk about the standard, discretionary access control mechanism used by almost all Linux distributions. SELinux, which is part of the default Linux kernel, will also not be discussed. If you are interested in running an SELinux-powered system, I recommend using Gentoo Hardened, which supports SELinux. There is also a Gentoo Hardened SELinux Handbook which is worth reading through.

Read, Write and Execute

The Linux file system supports various permission flags for each file or directory. You should see a flag as a feature or privilege that is either enabled or disabled and is set independently of the other flags. The most used flags on a file system are the read (r), write (w) and execute (x) flags. Their meaning differs a bit based on the target.

However, supporting these flags alone wouldn’t make a system secure: you want to assign these privileges based on who works with the file. For instance, the system configuration files should only be writable by the administrator(s); some might not even be readable by the users (like the file containing the user passwords).

To enable this, Linux supports three kinds of privilege destinations:

  • the owner of the file (1st group of privileges)
  • the group owner of the file (2nd group of privileges)
  • everybody else (3rd group of privileges)

This way, you can place one set of privileges for the file owner, another set for the group (which means everybody who is a member of the group is matched against these privileges) and a third set for everybody else.

In case of a file,

  • the read privilege informs the system that the file can be read (viewed)
  • the write privilege informs the system that the file can be written to (edited)
  • the execute privilege informs the system that the file is a command which can be executed

As an example, see the output of the ls -l command:

$ ls -l /etc/fstab
-rw-r--r-- 1 root root 905 Nov 21 09:10 /etc/fstab

In the above example, the fstab file is readable and writable by the root user (rw-) and only readable by anyone else (r--).

In case of a directory,

  • the read privilege informs the system that the directory’s content can be viewed
  • the write privilege informs the system that the directory’s content can be changed (files or directories can be added or removed)
  • the execute privilege informs the system that you are able to jump inside the directory (using the cd command)

As an example, see the output of the ls -ld command:

$ ls -ld /etc/cron.daily
drwxr-x--- 2 root root 4096 Nov 26 18:17 /etc/cron.daily/

In the above example, the cron.daily directory is viewable (r), writable (w) and “enterable” (x) by the root user. People in the root group have view and enter rights (r-x) whereas all other people have no rights to view, write or enter the directory (---).

Viewing Privileges

To view the privileges on a file, you can use the long listing format support of the ls command. For instance, to view the permissions on the system’s passwd file (which contains the user account information):

$ ls -l /etc/passwd
-rw-r--r-- 1 root root 3108 Dec 26 14:41 /etc/passwd

This file’s permissions are read/write rights for the root user and read rights for everybody else.

The first character in the permission output shows the type of the file:

  • ‘-’: a regular file
  • ‘d’: a directory
  • ‘l’: a symbolic link
  • ‘b’: a block device (like /dev/sda1)
  • ‘c’: a character device (like /dev/console)
  • ‘p’: a named pipe
  • ‘s’: a unix domain socket
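
The type characters listed above can also be derived with the shell’s own file tests. This is a sketch for illustration only; the helper name file_type_char is hypothetical and not part of any standard tool.

```shell
#!/bin/sh
# Hypothetical helper: print the ls-style type character for a path.
file_type_char() {
    # Test for a symbolic link first: the other tests follow links.
    if   [ -L "$1" ]; then printf '%s\n' 'l'
    elif [ -d "$1" ]; then printf '%s\n' 'd'
    elif [ -b "$1" ]; then printf '%s\n' 'b'
    elif [ -c "$1" ]; then printf '%s\n' 'c'
    elif [ -p "$1" ]; then printf '%s\n' 'p'
    elif [ -S "$1" ]; then printf '%s\n' 's'
    elif [ -f "$1" ]; then printf '%s\n' '-'
    else                   printf '%s\n' '?'
    fi
}

# Build a few sample files in a scratch directory and show their types.
demo=$(mktemp -d)
touch "$demo/file"
ln -s file "$demo/link"
mkfifo "$demo/pipe"

for path in "$demo" "$demo/file" "$demo/link" "$demo/pipe" /dev/null; do
    printf '%s  %s\n' "$(file_type_char "$path")" "$path"
done

rm -rf "$demo"
```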

The rest of the permission output is divided in three parts: one for the file owner, one for the file owning group and one for all the rest. So, in the given example, we can read the output ‘-rw-r--r--’ as:

  1. the file is a regular file
  2. the owner (root – see third field of the output) has read-write rights
  3. the members of the owning group (also root – see fourth field of the output) have read rights
  4. everybody else has read rights

Another example would be the privileges of the /var/log/sandbox directory. In this case, we also use the -d argument of ls to make sure ls shows the information on the directory rather than its contents:

$ ls -ld /var/log/sandbox
drwxrwx--- 2 root portage 4096 Jul 14 18:47 /var/log/sandbox

In this case:

  1. the file is a directory
  2. the owner (root) has read, write and execute rights
  3. the members of the owning group (portage) also have read, write and execute rights
  4. everybody else can’t do anything (no read, no execute and certainly no write rights)

Another method to obtain the access rights is to use the stat command:

$ stat /etc/passwd
  File: `/etc/passwd'
  Size: 3678        Blocks: 8          IO Block: 4096   regular file
Device: 808h/2056d  Inode: 3984335     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-03-18 21:46:06.000000000 +0100
Modify: 2013-03-18 21:46:06.000000000 +0100
Change: 2013-03-18 21:46:06.000000000 +0100

In the output of the stat command, you notice the same access flags as we identified before (-rw-r--r-- in this case), but also a number. This number identifies the same rights in a short-hand notation.

To be able to read the number, you need to know the values of each right:

  • execute rights get the number 1
  • write rights get the number 2
  • read rights get the number 4

To get the access rights of a particular group (owner, group or everybody else), add the numbers together.

For a file with privileges (-rw-r--r--), this gives the number 644:

  • 6 = 4 + 2, meaning read and write rights for the owner
  • 4 = 4, meaning read rights for the group
  • 4 = 4, meaning read rights for everybody else
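
The addition above can be sketched as a small shell script. The function names triplet_to_digit and perms_to_octal are made up for this illustration, and the sketch only understands plain r, w and x flags (not the special flags).

```shell
#!/bin/sh
# Hypothetical helper: convert one "rwx"-style triplet into its digit.
triplet_to_digit() {
    digit=0
    case "$1" in r??) digit=$((digit + 4)) ;; esac   # read    = 4
    case "$1" in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
    case "$1" in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
    echo "$digit"
}

# Convert nine permission characters (owner, group, others) to three digits.
perms_to_octal() {
    owner=$(printf '%s' "$1" | cut -c1-3)
    group=$(printf '%s' "$1" | cut -c4-6)
    other=$(printf '%s' "$1" | cut -c7-9)
    echo "$(triplet_to_digit "$owner")$(triplet_to_digit "$group")$(triplet_to_digit "$other")"
}

perms_to_octal rw-r--r--   # prints 644 (the /etc/passwd example)
perms_to_octal rwxr-x---   # prints 750 (the /etc/cron.daily example)
```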

The first 0 that we notice in stat’s output indicates that the file has none of the specific privileges set.

Specific Privileges

There are a few specific privileges inside Linux as well.

The restricted deletion flag, or sticky bit, has been identified before. When set on a directory, it prevents people who have write access to the directory, but not to a file inside it, from deleting that file (by default, write access to a directory means that you can delete files inside that directory regardless of their ownership). The most well-known use for this flag is the /tmp location:

$ stat /tmp
  File: `/tmp'
  Size: 28672      Blocks: 56         IO Block: 4096   directory
Device: 808h/2056d Inode: 3096577     Links: 759
Access: (1777/drwxrwxrwt)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2010-01-10 17:44:04.000000000 +0100
Modify: 2013-04-02 00:04:36.000000000 +0200
Change: 2013-04-02 00:04:36.000000000 +0200

Another specific privilege that we have identified before is the setuid or setgid flag. When set on an executable (non-script!), the executable is executed with the rights of the owner (setuid) or owning group (setgid) instead of with the rights of the person that is executing it. That does mean that people with no root privileges can still execute commands with root privileges if those commands are owned by root and have the setuid flag set. For this reason, the number of executables with the setuid/setgid bit set needs to be limited, and those executables need to be well audited for possible security exposures. A nice example for this flag is /bin/mount:

$ stat /bin/mount
  File: `/bin/mount'
  Size: 59688      Blocks: 128        IO Block: 4096   regular file
Device: 808h/2056d Inode: 262481      Links: 1
Access: (4711/-rws--x--x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2010-02-06 13:50:35.000000000 +0100
Modify: 2013-01-02 13:50:35.000000000 +0100
Change: 2013-01-02 13:50:43.000000000 +0100
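
To see how these flags show up in the numeric and symbolic notation, you can experiment on scratch files. The sketch below assumes GNU stat and GNU find; show_mode is a hypothetical helper, and /usr/bin is only an example location for the audit.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric and symbolic mode (assumes GNU stat).
show_mode() {
    stat -c '%a %A' "$1"
}

demo=$(mktemp -d)

# Sticky bit: leading 1, shown as "t" in the last position.
mkdir "$demo/shared"
chmod 1777 "$demo/shared"
show_mode "$demo/shared"

# setuid bit: leading 4, shown as "s" in the owner's execute position.
touch "$demo/tool"
chmod 4711 "$demo/tool"
show_mode "$demo/tool"

rm -rf "$demo"

# List setuid executables so they can be reviewed.
find /usr/bin -type f -perm -4000 2>/dev/null
```

The two show_mode calls should report the same modes as the stat output for /tmp and /bin/mount above.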

Changing Privileges

To change the privileges of a file or directory, you should use the chmod command (change mode). Its syntax is easy enough to remember. First, you name the target of the permissions:

  • ‘u’ for user,
  • ‘g’ for group, and
  • ‘o’ for everybody else (others)

Then you can set (=), add (+) or remove (-) privileges. For instance, to make /etc/passwd writable for the members of the owning group:

# chmod g+w /etc/passwd

You can also combine privileges. For instance, if you want to remove write privileges for the owning group and remove read privileges for the others:

# chmod g-w,o-r /etc/passwd

Finally, you can use the numeric notation if you want as well:

# chmod 644 /etc/passwd
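
If you would rather not experiment on /etc/passwd, the same commands can be tried on a scratch file. This is a sketch assuming GNU stat; the function name chmod_walkthrough is made up for the illustration.

```shell
#!/bin/sh
# Apply the chmod forms discussed above to a temporary file and print the
# resulting mode after each step (assumes GNU stat for the -c option).
chmod_walkthrough() {
    file=$(mktemp)

    chmod 644 "$file";            stat -c '%a %A' "$file"   # numeric notation
    chmod g+w "$file";            stat -c '%a %A' "$file"   # add group write
    chmod g-w,o-r "$file";        stat -c '%a %A' "$file"   # combined removal
    chmod u=rw,g=r,o=r "$file";   stat -c '%a %A' "$file"   # set explicitly

    rm -f "$file"
}

chmod_walkthrough
```

The four lines printed are 644 -rw-r--r--, 664 -rw-rw-r--, 640 -rw-r----- and 644 -rw-r--r--, in that order.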

Changing Ownership

When you need to change the ownership of a file or directory, use the chown (change owner) or chgrp (change group) command. For instance, to change the owner of a file to the user “jack”:

# chown jack template.txt

To change the owner of a file, you need to be root though – it will not help if you are the current owner. This is not true for the group though: if you are a member of the target group, you can change the owning group:

$ ls -l bar
-rw-r--r-- 1 swift  users   0 May 13 20:41 bar
$ chgrp dialout bar
$ ls -l bar
-rw-r--r-- 1 swift  dialout 0 May 13 20:41 bar

If you need to change the owner and group, you can use a single chown command: just separate the target owner and group with a colon, like so:

# chown jack:dialout template.txt

Attributes

Some file systems allow you to add additional attributes to files. These attributes might have influence on the permissions / usage of these files, or on how the operating system works with these files. Not many distributions use these attributes, because not all file systems support them.

Listing and Modifying Attributes

To view the attributes of a file, you can use the lsattr command (list attributes); to modify the attributes, use chattr (change attributes). As Gentoo does not have an example file, let’s create one first:

# touch /tmp/foo
# chattr +asS /tmp/foo

Now let’s see what lsattr has to say:

# lsattr /tmp/foo
s-S--a---------  /tmp/foo

Not a big surprise, given the chattr command before. But what does it mean? Well, man chattr gives us the information we need, but here it is in short:

  • s: when the file is deleted, its blocks are zeroed and written back to disk (unlike regular files where only the reference to the file is deleted)
  • S: when changes are made to the file, the changes are immediately synchronized to disk (no memory caching allowed)
  • a: the file can only be appended (data is added to the file); changes are not allowed to existing content. Very useful for log files.

Another very interesting attribute is the immutable flag (i) that doesn’t allow the file to be deleted, changed, modified, renamed or moved.

POSIX ACLs

Next to the discretionary access controls applicable to a Linux file (user, group and others), it is possible to add more access controls on files and directories through POSIX ACLs.

With the getfacl command, the access controls on a file or directory are shown, together with the POSIX access controls (if applicable). The setfacl command can be used to add or remove POSIX ACLs from the file or directory.

For instance, to allow the user minidlna read access on a file that this user otherwise has no access to:

$ setfacl -m u:minidlna:r TEMPFILE
$ getfacl TEMPFILE
# file: home/swift/TEMPFILE
# owner: swift
# group: users
user::rw-
group::r--
other::r--
user:minidlna:r--

Supporting POSIX ACLs requires specific file system support, so it might be necessary to enable this in the kernel. Also, the file system should be mounted with the “acl” mount option.

Generic extended attributes

Files can have additional extended attributes assigned to them. POSIX ACLs for instance use an extended attribute called system.posix_acl_access, whereas SELinux uses an extended attribute called security.selinux. Extended attributes are metadata assigned to files that are used by one or more applications for a particular function.

When the extended attribute is in the security namespace (the name starts with “security.”), then this is a security-sensitive attribute and can only be modified by administrators or properly privileged users.

To list all extended attributes assigned to a file, use getfattr:

$ getfattr -m . -d TEMPFILE
# file: home/swift/TEMPFILE
security.selinux="staff_u:object_r:user_home_t"

Usually users do not need to modify extended attributes directly; instead, the application(s) supporting these extended attributes will take care of this. But it is and remains possible to directly modify extended attributes (again, if you have the proper permissions) using setfattr.

Locating Files

With all these locations, it might be difficult to locate a particular file. Most of the time, the file you want to locate is inside your home directory (as it is the only location where you have write privileges). However, in some cases you want to locate a particular file somewhere on your entire system.

Luckily, there are a few commands at your disposal to do so.

mlocate

The locate command manages and uses a database of files to help you find a particular file. Before you can use locate, you first need to install it (the package is called sys-apps/mlocate) and then create the file database. Also, this database is not automatically brought up to date while you modify your system, so you’ll need to run this command (which is the same for creating a new database or updating an existing one) every now and then:

# updatedb

A popular way of keeping this database up to date is to use the system scheduler (called cron) which is discussed later.

When your database is built and somewhat up to date, you can locate any particular file on your file system using locate:

# locate make.conf
/etc/portage/make.conf
(...)
/usr/portage/local/layman/make.conf

As you can see, the locate command returns all files it has found where the string (in this case, “make.conf”) occurs in the path, even when the file name itself is different.

The name mlocate is the name of the project that maintains the package. Earlier in history, the package of choice for the locate functionality was slocate.

find

The find command is a very important and powerful command. Unlike locate, it only returns live information (so it doesn’t use a database). This makes searches with find somewhat slow, but find’s strength isn’t speed: it is the range of options you can give to find a particular file.

Regular find patterns

The most simple find construct is to locate a particular file inside one or more directories. For instance, to find files or directories inside /etc whose name is dhcpd.conf (exact matches):

$ find /etc -name dhcpd.conf
/etc/dhcp/dhcpd.conf

To find files (not directories) where dhcpd is in the filename, also inside /etc directory:

$ find /etc -type f -name '*dhcpd*'
/etc/conf.d/dhcpd
/etc/init.d/dhcpd
/etc/udhcpd.conf
/etc/dhcp/dhcpd.conf

To find files in the /etc directory that have been modified within the last 7 days (read: “less than 7 days ago”):

$ find /etc -type f -mtime -7
/etc/mtab
/etc/adjtime
/etc/wifi-radar.conf
/etc/genkernel.conf

You can even find files based on their ownership. For instance, find the files in /etc that do not belong to the root user:

$ find /etc -type f -not -user root

Combining find patterns

You can also combine find patterns. For instance, find files modified within the last 7 days but whose name does not contain .conf:

$ find /etc -type f -mtime -7 -not -name '*.conf'
/etc/mtab
/etc/adjtime

Or, find the same files, but the name should also not be mtab:

$ find /etc -type f -mtime -7 -not \( -name '*.conf' -or -name mtab \)
/etc/adjtime
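
To try such a combined pattern without touching /etc, you can build a small scratch directory first. The following is a sketch that assumes GNU find and GNU touch (for the -d option); the function name find_recent_other and the file names are made up for the example.

```shell
#!/bin/sh
# Hypothetical wrapper around the combined pattern shown above: recently
# modified regular files whose name is neither *.conf nor mtab.
find_recent_other() {
    find "$1" -type f -mtime -7 -not \( -name '*.conf' -or -name mtab \)
}

demo=$(mktemp -d)
touch "$demo/app.conf" "$demo/mtab" "$demo/adjtime"
touch -d '10 days ago' "$demo/stale"    # too old for -mtime -7

find_recent_other "$demo"               # only lists .../adjtime

rm -rf "$demo"
```

Note that both parentheses are escaped and surrounded by spaces; otherwise the shell or find will misinterpret them.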

Working with the results

With find, you can also perform tasks on the results. For instance, if you want to view the “ls -l” output against the files that find finds, you can add the -exec option. The string after -exec should contain two special character sequences:

  • '{}' represents the file found by the find command. The command given to the -exec option is executed and '{}' is substituted with the filename.
  • \; ends the command in the -exec clause.

$ find /etc -type f -mtime -7 -exec ls -l '{}' \;

On the Internet, you’ll also find the following construction:

$ find /etc -type f -mtime -7 | xargs ls -l

The result is the same, but the behaviour is somewhat different.

When using -exec, the find command executes the command once for every file it encounters. The xargs construction will attempt to execute the command as few times as possible, based on the argument limits.

For instance, if the find command returns 10000 files, the command given to -exec is executed 10000 times, once for every file. With xargs, the command might be executed only a few dozen times. This is possible because xargs appends multiple files for a single command as it assumes that the command given can cope with multiple files.

Example run for find -exec:

ls -l file1
ls -l file2
...
ls -l file10000

Example run for xargs:

ls -l file1 file2 ... file4210
ls -l file4211 file4212 ... file9172
ls -l file9173 file9174 ... file10000
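
The difference in the number of runs can be made visible with a small experiment. This sketch assumes GNU find and xargs; the three function names are hypothetical. It also shows two variants not covered above: ending -exec with + instead of \; lets find itself batch the files, and -print0 together with xargs -0 keeps file names containing spaces intact.

```shell
#!/bin/sh
# Each helper prints one line per time the command is started, stating how
# many file names that run received.
runs_with_exec() {
    find "$1" -type f -exec sh -c 'echo "$# file(s)"' sh '{}' \;
}

runs_with_plus() {
    find "$1" -type f -exec sh -c 'echo "$# file(s)"' sh '{}' +
}

runs_with_xargs() {
    find "$1" -type f -print0 | xargs -0 sh -c 'echo "$# file(s)"' sh
}

demo=$(mktemp -d)
for i in 1 2 3 4 5; do
    touch "$demo/file $i"    # names contain a space on purpose
done

echo "-exec ... \\; :"; runs_with_exec "$demo"    # five runs, one file each
echo "-exec ... + :";   runs_with_plus "$demo"    # one run with five files
echo "xargs -0 :";      runs_with_xargs "$demo"   # one run with five files

rm -rf "$demo"
```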