RHCE 2: Manipulating files in Linux
This post is a concise summary of how to interact with files on a Linux system. In fact, the information here applies to most Linux and UNIX systems. If you learn everything contained in this one post of the many on my blog, you’ll be well on your way when it comes to turning your hand to any UNIX or Linux system. Basic but essential knowledge.
Creating a file
“Everything is a file”. You’ll hear that said about UNIX and/or Linux. Unlike Windows, there is no registry, just the filesystem. As such, everything is represented by a file somewhere in the filesystem. More on the different types of file later.
cat, touch, vi, vim, nano, tee and > (after a command) can all be used to create files. tee is special: when you pipe a command into tee, it writes its standard input to the named file as well as to standard output, so the results are still displayed on screen (using > alone would hide the output, as it is redirected to the file instead).
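To make the difference between touch, > and tee concrete, here is a minimal sketch. The file names are made up for illustration and everything happens in a throwaway directory created with mktemp.

```shell
# Work in a throwaway directory (all file names here are hypothetical).
workdir=$(mktemp -d)

# touch creates an empty file (or updates the timestamp of an existing one).
touch "$workdir/empty.txt"

# > redirects a command's output into a file; nothing appears on screen.
echo "hello" > "$workdir/redirect.txt"

# tee writes its input to the file AND passes it on to standard output,
# which is captured here so it can be inspected.
shown=$(echo "hello" | tee "$workdir/tee.txt")

empty_size=$(wc -c < "$workdir/empty.txt" | tr -d ' ')
redirect_content=$(cat "$workdir/redirect.txt")
tee_content=$(cat "$workdir/tee.txt")

echo "tee passed through: $shown"
echo "tee also wrote to file: $tee_content"

rm -rf "$workdir"
```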
Listing files
ls and ll (on many distributions an alias for ls -l) are used with many possible switches to display directory listings; adding -i displays the inode number of each file. For example, ls -al (long listing showing permissions, including hidden files) and ls -lart (the same, but sorted by modification time with the newest last) are common uses of the ls command.
Display contents of a file
cat (the whole file), more and less (a page at a time), head (the top lines), tail (the bottom lines), view, vi, vim and nano (in an editor), uniq (with adjacent duplicate lines collapsed) and strings (just the printable text, not the binary data, contained within a file) are all commands used to display files in similar but slightly different ways.
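As a quick illustration of three of these, the sketch below runs head, tail and uniq against a small sample file. The file contents are invented for the example.

```shell
# A small sample file with some adjacent duplicate lines.
sample=$(mktemp)
printf 'alpha\nalpha\nbeta\ngamma\ngamma\n' > "$sample"

first_two=$(head -n 2 "$sample")    # the top two lines
last_one=$(tail -n 1 "$sample")     # the bottom line
unique_lines=$(uniq "$sample")      # adjacent duplicates collapsed

echo "$unique_lines"

rm -f "$sample"
```

Note that uniq only collapses duplicates that sit next to each other, which is why it is so often seen after sort in a pipeline.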
Copy or Rename a file
cp, rsync, tar, cpio and mv can all be used to copy, move or rename files. In Linux, you don’t rename a file, you move it.
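A minimal sketch of copying and then "renaming by moving", using hypothetical file names in a temporary directory:

```shell
workdir=$(mktemp -d)
echo "data" > "$workdir/original.txt"

# cp leaves the source in place and creates a second file.
cp "$workdir/original.txt" "$workdir/copy.txt"

# mv within the same directory is what Linux calls a rename.
mv "$workdir/original.txt" "$workdir/renamed.txt"

listing=$(ls "$workdir")
echo "$listing"

rm -rf "$workdir"
```

After this runs, original.txt no longer exists under that name; only copy.txt and renamed.txt remain.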
Remove a file
rm removes files and rmdir removes empty directories. rm -r recurses through the tree, removing subdirectories as well as the files contained beneath the specified starting point. This is dangerous, especially when combined with -f (rm -rf) to force it.
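The sketch below shows the difference between rm, rmdir and rm -r. It only ever touches a directory created by mktemp, and the names inside it are made up.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/tree/sub" "$workdir/emptydir"
touch "$workdir/tree/sub/file.txt" "$workdir/single.txt"

rm "$workdir/single.txt"        # remove a single file
rmdir "$workdir/emptydir"       # remove an empty directory

# rmdir refuses to remove a directory that still has contents.
if rmdir "$workdir/tree" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi

# rm -r removes the directory and everything beneath it.
rm -r "$workdir/tree"

remaining=$(ls -A "$workdir" | wc -l | tr -d ' ')
echo "rmdir on a non-empty directory: $rmdir_result, entries left: $remaining"

rmdir "$workdir"
```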
Ownership and Permissions
Like Windows, files have an owning user. They also have an owning group, as well as permissions that dictate what level of access the owning user, the owning group and everyone else have to the file (or directory). This is slightly different to Windows, where permissions can be set for multiple groups added to the ACL (access control list) of a file (or directory), and takes some getting used to.
To change owner or group use the chown and chgrp commands, or just the chown user:group command to do both in one go.
To change the permissions, use the chmod command.
-rwxrwxrwx where the leading - means regular file (more on different file types later), then the first rwx is the read, write and execute permissions of the owner, the second rwx is the same for the group and the third rwx is for everyone else. Each permission bit has a value:
- 421 421 421
So to give the owner full access and the group and everyone else read access, i.e. rwxr--r--, the values would be 4+2+1, 4, 4, i.e. 744, so chmod 744 filename. Full access for everyone would be chmod 777 filename.
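The arithmetic can be checked in the shell itself. This is a small sketch that builds the mode from the bit values and applies it to a temporary file; the variable names are just for illustration.

```shell
# Add up the bit values for each of owner, group and everyone else.
owner=$((4 + 2 + 1))   # rwx
group=4                # r--
other=4                # r--
mode="${owner}${group}${other}"

target=$(mktemp)
chmod "$mode" "$target"

# The first ten characters of a long listing are the file type and permissions.
perms=$(ls -l "$target" | cut -c1-10)
echo "$mode -> $perms"

rm -f "$target"
```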
Types of file
Regular (ascii or binary)
Executable (allowed to execute)
Directory (contains zero or more files and subdirectories)
Link (hard or soft link to another file: a hard link shares the inode of the original file, so it is simply another name for the same data, whereas a soft (symbolic) link has its own inode and just stores the path of the linked file. ln -s realfile linkfile is a common use. It’s common to get the order the wrong way around.)
Device (character/raw or block special files are used to send streams of data to kernel modules, which control the sending of the data stream to hardware, e.g. a volume group has a character special file, a disk device has a block special file)
Named Pipe (FIFO: first in, first out. Used to send one-way streams of data to other processes, i.e. inter-process communication or IPC)
Socket (similar to a named pipe but two-way. Used by system services, for example, where information is both received and transmitted)
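Hard and soft links are easy to mix up, so here is a minimal sketch that creates one of each and compares inode numbers. The names realfile, hardlink and softlink are hypothetical, and inode_of is a small helper defined only for this example.

```shell
workdir=$(mktemp -d)
echo "content" > "$workdir/realfile"

ln "$workdir/realfile" "$workdir/hardlink"       # hard link: target first, link name second
ln -s "$workdir/realfile" "$workdir/softlink"    # soft link: same argument order

# Helper for this example: print the inode number of a path without following links.
inode_of() { ls -di "$1" | awk '{print $1}'; }

real_inode=$(inode_of "$workdir/realfile")
hard_inode=$(inode_of "$workdir/hardlink")
soft_inode=$(inode_of "$workdir/softlink")

# Remove the original name: the hard link still reaches the data,
# while the soft link now points at a path that no longer exists.
rm "$workdir/realfile"
hard_after=$(cat "$workdir/hardlink")
if [ -e "$workdir/softlink" ]; then soft_after="ok"; else soft_after="dangling"; fi

echo "real=$real_inode hard=$hard_inode soft=$soft_inode soft link is $soft_after"

rm -rf "$workdir"
```

The hard link reports the same inode number as the original file, while the soft link has one of its own.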
File attributes
Besides permissions that control access to a file, files on a Linux system can also have attributes applied to them that control what can and can’t be done to the file, even by the root user.
stat - Display statistics about a file.
wc - Word count a file (can also be used as wc -l to count lines in a file, wc -c to count bytes or wc -m to count characters)
lsattr - List attributes of a file.
chattr - Change attributes of a file. The attributes include:
a - can only be appended to
A - access time is not updated
c - automatically compressed
d - skipped by the dump command when backing up
D - contents of the directory are written synchronously to disk
i - immutable (cannot be changed or deleted)
j - data is written to the journal before being written to the file itself, on journalling file systems
s - securely deleted, i.e. the actual data blocks are wiped too
S - changes to the file are written synchronously to disk
u - undeletable, i.e. the contents are kept when the file is deleted so that it can be recovered
Not every filesystem honours every attribute, so check the chattr man page for yours.
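A short sketch of wc against a sample file follows, with lsattr tried at the end. chattr is left out because it generally needs root, and lsattr is allowed to fail here since the temporary directory may sit on a filesystem without attribute support.

```shell
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

line_count=$(wc -l < "$sample" | tr -d ' ')   # lines
word_count=$(wc -w < "$sample" | tr -d ' ')   # words
byte_count=$(wc -c < "$sample" | tr -d ' ')   # bytes, newlines included

echo "lines=$line_count words=$word_count bytes=$byte_count"

# List attributes if the tool and the filesystem both allow it.
if command -v lsattr >/dev/null 2>&1; then
    lsattr "$sample" 2>/dev/null || echo "attributes not supported on this filesystem"
fi

rm -f "$sample"
```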
Pattern matching
The famous grep command is used to simply match lines of text contained in a file, or more cleverly lines containing patterns of text (defined by regular expressions) in a file or files. More on Regular Expressions will be covered later.
grep -l pattern file1 file2 file3 - lists the names of those files among file1, file2 and file3 that contain the pattern (without -l, grep prints the matching lines themselves)
grep -n pattern file1 - finds the pattern and displays the line numbers where the matches occur
grep -v pattern file1 - displays every line except those that match the pattern
grep ^pattern file1 or grep pattern$ file1 - matches the pattern only when it occurs at the beginning or the end of a line respectively
grep -i pattern file1 - ignores case (because Linux is case sensitive, of course)
egrep or grep -E 'pattern1|pattern2' file1 - displays lines matching either pattern
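Here is each of those switches run against a tiny sample, as a sketch. The files menu.txt and drinks.txt and their contents are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple pie\nbanana split\nApple tart\ncherry pie\n' > menu.txt
printf 'tea\ncoffee\n' > drinks.txt

files_with_pie=$(grep -l pie menu.txt drinks.txt)   # names of files that contain a match
numbered=$(grep -n pie menu.txt)                    # matching lines with line numbers
without_pie=$(grep -v pie menu.txt)                 # lines that do NOT match
starts_with_apple=$(grep -i '^apple' menu.txt)      # anchored to line start, any case
ends_with_pie=$(grep 'pie$' menu.txt)               # anchored to line end
either=$(grep -E 'banana|cherry' menu.txt)          # either pattern

echo "$numbered"

cd / && rm -rf "$workdir"
```

Quoting the pattern keeps the shell from interpreting characters such as |, $ and ^ before grep sees them.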
Comparing files
diff, comm and grep can be used to compare two files and print matching and differing lines. diff -c file1 file2 displays the differences with a few lines of surrounding context. comm file1 file2 prints three columns instead: column 1 contains lines unique to file1, column 2 contains lines unique to file2 and column 3 contains lines found in both. The options -1, -2 and -3 suppress the corresponding column, so comm -12 file1 file2 shows only the lines common to both files. comm expects both files to be sorted. Use of comm takes some getting used to, so read the man page to be sure you’re getting the results you’re after and not something else, or just use diff -c. comm is a very cool tool though, and I find myself using it more than diff. A new favourite is grep -Fxv -f decommissioned backupclients, which lists any lines in the backupclients list that are not found in the decommissioned list.
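To see comm and the grep trick side by side, here is a minimal sketch using the backupclients and decommissioned file names from above. The host names inside them are made up, and both files are already sorted, which comm requires.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\nbravo\ncharlie\ndelta\n' > backupclients
printf 'bravo\ndelta\necho\n' > decommissioned

# Suppress columns 2 and 3: lines only in backupclients.
only_active=$(comm -23 backupclients decommissioned)

# Suppress columns 1 and 2: lines present in both files.
in_both=$(comm -12 backupclients decommissioned)

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file.
still_backed_up=$(grep -Fxv -f decommissioned backupclients)

echo "$still_backed_up"

cd / && rm -rf "$workdir"
```

Unlike comm, the grep version does not care whether either file is sorted.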
Finding files
The find command in UNIX/Linux is fantastic, but like Linux itself, it has a reputation for having a steep learning curve. I’ll try to make it easy by keeping this short and sweet.
find path option action, where option and action have values and commands specified respectively, i.e. find path option value action command
e.g. find ./ -size -1048576k -exec ls -al {} \; will find files from the present working directory down that are smaller than 1GB and will long list any matches. (With GNU find, sizes are rounded up to the given unit before comparing, so -size -1G would only match empty files.)
other options are
-name match names (using shell wildcards such as *.log; use -regex for regular expressions like grep)
-atime last accessed time
-user owning user is
-mtime last modified time
-ctime change time
-group owning group
-perm permissions are e.g. 744
-inum inode number is
-exec can be replaced with -ok (which asks for confirmation before running each command) or -print (which just prints the matching paths) to keep the command simpler for simpler finding requirements. -exec can execute any command upon the files that match the specified conditions, e.g. ls, cp, mv or rm (very dangerous).
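A sensible habit is to preview with -print before letting -exec loose. The sketch below does exactly that inside a temporary directory; the directory layout and file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p logs/archive
touch logs/app.log logs/archive/old.log notes.txt
chmod 744 notes.txt

# Preview first: which regular files match the name pattern?
logs_found=$(find . -type f -name '*.log' -print | LC_ALL=C sort)

# Match on exact permissions instead of name.
perm_found=$(find . -type f -perm 744 -print)

# Same name test as the preview, now with rm as the action.
find . -type f -name '*.log' -exec rm {} \;
left_over=$(find . -type f -print)

echo "removed:"
echo "$logs_found"
echo "left: $left_over"

cd / && rm -rf "$workdir"
```

Quoting '*.log' matters: unquoted, the shell may expand the wildcard before find ever sees it.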
The locate command can also be used to find files. For executable binary commands, it might be quicker to use which or whereis to display the path of the binary that would be executed if the full path was not specified (relying upon the PATH environment variable to locate and prioritise). Also check for any command aliases in your ~/.profile and ~/.bashrc if whereis or which turns nothing up, as a command alias by one name may be calling a binary by another name. I begin to digress!
Sorting files
sort
sort -k2 -n - sort on column 2, numerically (useful if the file contains columns of data). sort can also sort by month, e.g. ls -al | sort -k 6M, and -o outputfile writes the results to a file instead of using > or >>
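A minimal sketch of those sort options against a made-up two-column file follows. LC_ALL=C is set so that the ordering and month names do not depend on the local language settings.

```shell
workdir=$(mktemp -d)
printf 'carol 12\nalice 30\nbob 4\n' > "$workdir/ages.txt"

# Plain sort: alphabetical on the whole line.
by_name=$(LC_ALL=C sort "$workdir/ages.txt")

# Sort on column 2 as a number rather than as text.
by_age=$(LC_ALL=C sort -k2 -n "$workdir/ages.txt")

# -o writes the result to a file instead of standard output.
LC_ALL=C sort -k2 -n -o "$workdir/sorted.txt" "$workdir/ages.txt"
saved=$(cat "$workdir/sorted.txt")

# -M understands month abbreviations.
by_month=$(printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M)

echo "$by_age"

rm -rf "$workdir"
```

One advantage of -o over > is that the output file can safely be the same as the input file.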
Extracting data from a file
cut and awk can be used to extract delimited lines of data from a file or columns of data from a file respectively, e.g.
cut -d, -f3 filename - displays the third field of each line in a comma delimited file
awk '{print $3}' filename - displays the third column in a file
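Here are both tools run against small sample files, as a sketch. The file names and their contents are invented for the example.

```shell
workdir=$(mktemp -d)
printf 'id,name,city\n1,Ann,Leeds\n2,Bob,York\n' > "$workdir/people.csv"
printf 'one two three\nred green blue\n' > "$workdir/columns.txt"

# cut: comma as the delimiter, third field of every line.
third_field=$(cut -d, -f3 "$workdir/people.csv")

# awk: splits on whitespace by default, $3 is the third column.
third_column=$(awk '{print $3}' "$workdir/columns.txt")

# awk can also be given a delimiter with -F.
second_csv_field=$(awk -F, '{print $2}' "$workdir/people.csv")

echo "$third_field"

rm -rf "$workdir"
```

The single quotes go around the whole awk program, braces included, so that the shell does not try to expand $3 itself.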
Translating data in a file
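sed and tr are tools for filtering and transforming text: sed is a stream editor, while tr translates or deletes individual characters. Many great examples of sed are to be found on the internet.
A simple example of sed would be echo day | sed s/day/night/g to convert all occurrences of the word day into night (without the trailing g, only the first occurrence on each line is replaced).
tr works on single characters rather than words, so tr "day" "night" would map d to n, a to i and y to g. A simple example of tr would be tr 'a-z' 'A-Z' < input.txt > output.txt to convert a file to upper case.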
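The difference between sed working on patterns and tr working on characters is easiest to see by running them. This is a small sketch with made-up input strings.

```shell
# sed without g replaces only the first match on each line.
first_only=$(echo "day after day" | sed 's/day/night/')

# With g, every match on the line is replaced.
all_matches=$(echo "day after day" | sed 's/day/night/g')

# tr maps characters one-to-one: here every lower-case letter to upper case.
upper=$(echo "day" | tr 'a-z' 'A-Z')

# tr with word-like sets is still character mapping: d->n, a->i, y->g.
char_mapped=$(echo "day" | tr 'day' 'night')

# tr -d deletes characters instead of translating them.
deleted=$(echo "hello world" | tr -d 'l')

echo "$first_only / $all_matches / $upper / $char_mapped / $deleted"
```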