With sort, you can order the lines of a file in dictionary order or by numerical value, randomize lines, remove duplicate lines, and check whether a file is sorted.
You may be able to do other things with it, but first, let's wrap our heads around how to use sort in bash scripts.
What is sort?
Sort is an external command that concatenates files while sorting their contents according to a sort type, and writes the result to standard output.
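To see the concatenate-and-sort behavior, here is a minimal sketch, assuming GNU coreutils sort. The two temporary files and their contents are made up for the example, and LC_ALL=C pins the collation so the order is predictable.

```bash
# Create two small, hypothetical input files in a throwaway directory.
dir=$(mktemp -d)
printf 'cherry\napple\n' > "$dir/f1"
printf 'banana\n' > "$dir/f2"

# sort reads both files, merges their lines, and writes one sorted stream
# to standard output; the input files themselves are left unchanged.
LC_ALL=C sort "$dir/f1" "$dir/f2"   # prints: apple, banana, cherry (one per line)

rm -r "$dir"
```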
Sort command options for bash
The sort command comes with 31 options (13 main and 18 categorized as other). Most experienced bash programmers (even experts) know only the few main sort options required to get by. Others are seldom touched. Lucky for you, we have time to touch them all.
Main sort options
These are the options that get things done: they sort (Sorting), apply filters prior to sorting (Pre sort), and manipulate sorted results (Post sort).
Sorting
Sort comes with six different types of sorting. Here is a table showing each sort type with its associated options.
Sort type | Short option | Long option | --sort=WORD | Notes |
Numeric (general) | -g | --general-numeric-sort | general-numeric | supports scientific notation: 0.1234e4 = 1234 |
Numeric (human) | -h | --human-numeric-sort | human-numeric | 1.234K = 1234 |
Numeric | -n | --numeric-sort | numeric | … < -1 < 0 < 1 < … |
Month | -M | --month-sort | month | Unknown < Jan < Feb < … < Nov < Dec |
Random | -R | --random-sort | random | |
Version | -V | --version-sort | version | |
Note that each type of sort has a long option ending in -sort. In addition to the specific sort options, the --sort=WORD option selects a sort type by keyword. For example, --sort=random may be used in place of --random-sort or -R.
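As a quick check of that equivalence, here is a minimal sketch, assuming GNU coreutils sort (the --sort=WORD form is a GNU extension). All three commands select the same numeric sort.

```bash
# Three numbers that sort differently as text and as numbers.
printf '10\n9\n100\n' | sort -n              # short option
printf '10\n9\n100\n' | sort --numeric-sort  # long option
printf '10\n9\n100\n' | sort --sort=numeric  # keyword form
# each prints: 9, 10, 100 (one per line)
```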
Examples
Here are some sort command examples for each sorting method.
Example) Sorting names
Sort has no issues sorting lines alphabetically. Consider an unsorted list of famous people.
Function
{
# fetch the page, strip HTML tags, and extract "Name (birth - death)" entries
curl --silent https://www.biographyonline.net/people/famous-100.html |
grep post-content |
sed -e 's/<[^>]*.//g' -e 's/WWII//g' -e 's/\(Wilbur\)/\1 Wright/' |
grep -o -e '\(\([A-Z]\+[.]\?\)\+[a-z]*\s\)\+([0-9]\+\s[^)]\+.'
}
Command line
Output
Steve Jobs (1955 – 2012)
Sting (1951 – )
Tiger Woods (1975 – )
Tom Cruise (1962 – )
Usain Bolt (1986 – )
Vinci (1452 – 1519)
Walt Disney (1901 – 1966)
Wilbur Wright (1867 – 1912)
Woodrow Wilson (1856 – 1924)
Example) General numeric sort
If we need to sort numeric values taking scientific notation such as 99e2 into account, we can use general numeric sort.
Function
{
seq 100 | sort --random-sort | sed '3i 9e2' | sed '3i 99K'
}
Consider the sorted output using each method. Note that in addition to containing the values 1 through 100, the list also includes '9e2' (900) and '99K' (99000).
Command line
Output
97
98
99
99K
100
What about 900 and 99000? That's right, it's just numeric sort. Next.
Command line
Output
97
98
99
100
99K
What about 900? That's right, it's just human numeric sort. Next.
Command line
Output
97
98
99
99K
100
9e2
What about 99000? That's right, it's just general numeric sort. As you can see, no single sorting method handles this case; however, that doesn't mean you can't come up with a fix.
Command line
Output
97
98
99
100
9e2
99e3
Now that’s more like it.
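The command behind that fixed output isn't shown. One way to get there, sketched here as an assumption rather than the author's exact command, is to rewrite the K suffix as scientific notation with sed so that general numeric sort can read every value.

```bash
# Rewrite a trailing K as e3 (99K -> 99e3) so -g parses it as 99000,
# then order everything by general numeric value.
printf '100\n99K\n9e2\n97\n' | sed 's/K$/e3/' | sort --general-numeric-sort
# prints: 97, 100, 9e2, 99e3 (one per line)
```

The sed expression only handles K; other suffixes (M, G, ...) would need their own substitutions.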
Example) Human numeric sort
If we need to sort numerical values taking into account the meaning of suffixes such as K, G, M, and E, we can use human numeric sort.
Command line
Output
97
98
99
100
3k
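Here is a minimal sketch of human numeric sort on a made-up list of sizes, assuming GNU coreutils sort. It orders values by suffix first (no suffix < K < M < G) and then by the number in front of it; GNU sort also accepts a lowercase k, as the 3k above shows.

```bash
# Hypothetical sizes with SI suffixes, in no particular order.
printf '2G\n512K\n1M\n3k\n100\n' | sort --human-numeric-sort
# prints: 100, 3k, 512K, 1M, 2G (one per line)
```

A common real-world pairing is with human-readable disk usage, for example du -sh * | sort -h, which lists entries from smallest to largest.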
Example) Numeric sort
If all we need is to sort integers, numeric sort does the trick.
Command line
Output
96
97
98
99
100
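To see why numeric sort matters, compare it with the default dictionary order in a minimal sketch; LC_ALL=C is an assumption added to pin the collation.

```bash
# Without -n, lines are compared character by character, so "100" < "9".
printf '9\n100\n10\n' | LC_ALL=C sort        # prints: 10, 100, 9
# With -n, the leading number on each line is compared instead.
printf '9\n100\n10\n' | sort --numeric-sort  # prints: 9, 10, 100
```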
Example) Month sort
Month sort allows you to order lines by month. It could prove useful for grouping lines by month, especially when sorting by time is not an option.
Function
{
cat <<EOF
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
EOF
}
Suppose that our months are not sorted.
Command line
Output
Oct
Dec
Apr
May
Sep
Aug
Nov
Jul
Jan
Feb
Jun
We can always sort by month.
Command line
Output
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Note that if we change Dec to any prefix of November, say 'Novem', it will appear after 'Nov' in the sorted output.
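The behavior described above can be checked with a minimal sketch, assuming GNU coreutils sort in the C locale (so month names are the English abbreviations). Lines that don't start with a month name sort before Jan, and a line such as Novem is matched on its leading abbreviation.

```bash
# "foo" is not a month, so it ranks as unknown (lowest).
# "Novem" starts with NOV, so it ties with "Nov"; the tie is then
# broken by comparing whole lines, which puts "Nov" first.
printf 'Mar\nNovem\nJan\nNov\nfoo\nFeb\n' | LC_ALL=C sort --month-sort
# prints: foo, Jan, Feb, Mar, Nov, Novem (one per line)
```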
Example) Random sort – kill someone else’s terminal
As expected, random sort does the opposite of sorting: it mixes up lines.
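Before the terminal example, here is a minimal sketch of random sort on its own, assuming GNU coreutils sort. GNU -R orders lines by a random hash of each key, so identical lines still end up next to each other; shuf, also from coreutils, shuffles every line independently if that matters.

```bash
# Shuffle 1..5; the order differs from run to run.
seq 5 | sort --random-sort

# Duplicates stay grouped: the result is either a a b b or b b a a.
printf 'a\nb\na\nb\n' | sort -R
```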
Suppose that, for educational purposes, we want to kill another user's terminal. We would have to make sure it is not our own pty, and we randomize the listing so that we can say the pty was selected at random.
Commands
{
{
local pty;
pty="${1}"
};
echo -n "You’re going down in" > /dev/${pty};
for i in 5 4 3 2 1;
do
sleep 1;
echo -n " ${i}" > /dev/${pty};
done;
echo " Bye!" > /dev/${pty};
sleep 1
}
{
ps | grep pty | grep -v -e $( mypty ) | sort --random-sort | head -1 > stdin;
{
message-pty $( pty < stdin );
kill $( pid < stdin )
}
}
Output in someone else’s terminal
You're going down in 5 4 3 2 1 Bye!
(exit)
Example) Version sort – sorting ips
As you know, source files may be versioned using strings such as 1.0. Versions may also go deeper, with version numbers like 1.0.0, as seen in popular semantic versioning schemes.
Version sort allows you to sort version numbers. Great! Now what? Let’s test it out.
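Here is a minimal sketch of the difference on a few made-up version strings, assuming GNU coreutils sort (-V is a GNU extension) and the C locale for the plain-text comparison.

```bash
# As plain text, "1.10.0" sorts before "1.2.0" because '1' < '2'.
printf '1.10.0\n1.2.0\n1.9.3\n' | LC_ALL=C sort  # prints: 1.10.0, 1.2.0, 1.9.3
# Version sort compares each run of digits as a number.
printf '1.10.0\n1.2.0\n1.9.3\n' | sort -V        # prints: 1.2.0, 1.9.3, 1.10.0
```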
For this example, I've prepared a bash script to generate random ips so that we don't have to go there. It's in the repo. For those of us who don't have the repo, here's a quick start.
Commands
alias random-ips='test -f "linuxhint.com/generate-random-ips.sh" ; bash ${_}'
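If the repo script isn't at hand, the following hypothetical stand-in (not the repo's generate-random-ips.sh) prints random dotted-quad addresses using bash's RANDOM variable. The function name random_ips and its default count of 10 are made up for this sketch.

```bash
# Hypothetical stand-in for the repo script: print N random IPv4-style addresses.
random_ips() {
  local count="${1:-10}" i
  for (( i = 0; i < count; i++ )); do
    # RANDOM is 0..32767, so % 256 yields one octet in 0..255.
    echo "$(( RANDOM % 256 )).$(( RANDOM % 256 )).$(( RANDOM % 256 )).$(( RANDOM % 256 ))"
  done
}

random_ips 5
```

Running random_ips 100 > ips would produce an ips file like the one sorted below.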
Now that you are ready, let's get started.
Command line
Output
180.33.247.107
87.130.125.109
76.86.8.20
162.41.183.150
226.58.10.196
83.121.11.145
80.199.197.19
44.214.89.52
185.174.143.111
Okay, it works. Now let's see what happens when we try to sort ips.
Command line
Output
8.96.11.181
82.169.213.206
84.218.132.51
84.3.101.97
87.137.131.40
87.59.32.91
89.149.111.242
97.121.162.244
98.145.130.186
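At first glance, it appears to work, but lines like 8.96.11.181 should appear elsewhere: dictionary order compares characters one at a time, so 8.96.11.181 lands among the addresses starting with 8 instead of ahead of addresses with two- and three-digit first octets such as 44 or 87.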
Commands
# sort ips with each method; ${o,,} lowercases the option letter for the file name
for o in d h n V g M
do
sort ips -${o} > ips${o,,}
done
# compare every result against numeric sort
{
diff ips{n,d} 1>/dev/null || echo dictionary order != numeric sort
diff ips{n,h} 1>/dev/null || echo human numeric sort != numeric sort
diff ips{n,g} 1>/dev/null || echo general numeric sort != numeric sort
diff ips{n,v} 1>/dev/null || {
echo version sort != numeric sort
show_n_v_ips_diff="true"
}
}
test ! "${show_n_v_ips_diff}" || diff ips{n,v}
Output
dictionary order != numeric sort
version sort != numeric sort
13,14d12
< 44.221.43.20
< 44.27.108.172
15a14,15
> 44.27.108.172
> 44.221.43.20
27d26
< 84.218.132.51
29c28
< 87.137.131.40
As you can see, version sort allows you to sort version numbers when other sorting methods fail.
Example) Version sort – sorting file names with version numbers
Building on the last example, let’s use version sort a little closer to its intended use. As you know, version numbers commonly appear in filenames. See Details about version sort.
First, let's transform ips into something more like project source file names.
Commands
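# alpha: print one random lowercase letter
alpha () {
local alpha="abcdefghijklmnopqrstuvwxyz";
echo -n ${alpha:$(( RANDOM % 26 )):1}
}
# beta: print a random "a" or "b"
beta () {
local alpha="ab";
echo -n ${alpha:$(( RANDOM % 2 )):1}
}
# build names like k-v117.38.14.165a.tar.gz; about 1 in 5 gets no a/b suffix
{
cat ips | while read -r line; do
echo $(alpha)-v${line}$(test $(( RANDOM % 5 )) -eq 0 || beta).tar.gz;
done | tee sips
}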
Output
k-v117.38.14.165a.tar.gz
d-v87.59.32.91a.tar.gz
h-v115.215.64.100.tar.gz
s-v72.174.246.218b.tar.gz
h-v163.93.19.173.tar.gz
u-v184.225.11.92b.tar.gz
y-v205.53.5.211a.tar.gz
t-v175.196.164.17b.tar.gz
e-v167.42.221.178b.tar.gz
c-v126.54.190.189b.tar.gz
b-v169.180.221.131a.tar.gz
y-v210.125.170.231a.tar.gz
x-v71.56.120.9b.tar.gz
Exercise
Make the above commands run faster using xargs
See the example in How to use xargs command in bash scripts.
This time, we won’t even bother using any of the other sorting methods.
Command line
Output
e-v62.140.229.42a.tar.gz
e-v149.77.211.215a.tar.gz
e-v167.42.221.178b.tar.gz
e-v194.189.236.29a.tar.gz
e-v198.145.199.84b.tar.gz
e-v240.1.147.196b.tar.gz
f-v50.100.142.42b.tar.gz
f-v117.58.230.116.tar.gz
f-v139.17.210.68b.tar.gz
f-v153.18.145.133b.tar.gz
g-v201.153.203.60b.tar.gz
g-v213.58.67.108.tar.gz
h-v5.206.37.224.tar.gz
Now you see that version sort may be useful when sorting file names with version numbers.
Pre sort
Sort has four main options that affect the actual sorting, namely --ignore-leading-blanks, --ignore-case, --ignore-nonprinting, and --dictionary-order, which may or may not overlap. Examples using each option follow.
Sort ignoring leading blanks
Sort can ignore leading blanks in its input as an option. Leading blanks are still preserved in the sorted output.
Option
Usage
Commands
cat >> fp << EOF
Marilyn Monroe (1926 – 1962)
Abraham Lincoln (1809 – 1865)
EOF
cat fp | sort | tac
Output
Albert Einstein (1879 – 1955)
Al Gore (1948 – )
Abraham Lincoln (1809 – 1865)
Marilyn Monroe (1926 – 1962)
Abraham Lincoln (1809 – 1865)
Note that the leading spaces in the lines added to fp cause them to appear first in sort output.
To fix this, we need to ignore leading blanks as follows.
Commands
cat >> fp << EOF
Marilyn Monroe (1926 – 1962)
Abraham Lincoln (1809 – 1865)
EOF
cat fp | sort --ignore-leading-blanks | tac
Output
Marilyn Monroe (1926 – 1962)
Marie Antoinette (1755 – 1793)
…
Albert Einstein (1879 – 1955)
Al Gore (1948 – )
Abraham Lincoln (1809 – 1865)
Abraham Lincoln (1809 – 1865)
Alternatives
Note that the alternative does not preserve leading blanks in sort output.
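Here is a minimal sketch of both approaches on two made-up lines, assuming the C locale, where a space sorts before any letter. The sed command is one possible alternative, assumed here because the original alternative isn't shown.

```bash
# Without -b, the line with leading blanks comes first.
printf '  Marilyn\nAbraham\n' | LC_ALL=C sort      # prints: "  Marilyn", "Abraham"
# With -b, blanks are skipped when comparing but kept in the output.
printf '  Marilyn\nAbraham\n' | LC_ALL=C sort -b   # prints: "Abraham", "  Marilyn"

# Assumed alternative: strip leading blanks before sorting (they are lost).
printf '  Marilyn\nAbraham\n' | sed 's/^[[:blank:]]*//' | LC_ALL=C sort
```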
Sort ignoring case
Sort can ignore case in its input as an option. The original case is preserved in the sorted output.
Option
Usage
Commands
cat >> fp << EOF
abraham Lincoln (1809 – 1865)
ABraham Lincoln (1809 – 1865)
EOF
cat fp | sort | tac
Output
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948 – )
Abraham Lincoln (1809 – 1865)
ABraham Lincoln (1809 – 1865)
Note that the lowercase 'abraham' line added to fp does not appear next to the other Abraham lines in sort output.
To fix this, we need to ignore case as follows.
Commands
cat >> fp << EOF
abraham Lincoln (1809 – 1865)
ABraham Lincoln (1809 – 1865)
EOF
cat fp | sort --ignore-case | tac
Output
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948 – )
Abraham Lincoln (1809 – 1865)
abraham Lincoln (1809 – 1865)
ABraham Lincoln (1809 – 1865)
Alternatives
Note that the alternative does not preserve case in sort output.
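Here is a minimal sketch of case folding on made-up input, assuming the C locale, where every uppercase letter sorts before every lowercase one. The tr pipeline is one possible alternative, assumed here because the original isn't shown.

```bash
# Plain sort: uppercase first, lowercase after.
printf 'b\nB\na\nA\n' | LC_ALL=C sort      # prints: A, B, a, b
# -f folds lowercase to uppercase for comparison only; output keeps case.
printf 'b\nB\na\nA\n' | LC_ALL=C sort -f   # prints: A, a, B, b
# Assumed alternative: lowercase everything first, losing the original case.
printf 'b\nB\na\nA\n' | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort   # prints: a, a, b, b
```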
Sort ignoring nonprinting
Sort can ignore nonprinting characters in its input as an option. Nonprinting characters are preserved in the sorted output.
Option
Usage
Commands
echo -e "\x90Abe" >> fp
cat fp | sort | tac
Output
Angelina Jolie (1975 – )
Amelia Earhart (1897 – 1937)
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948 – )
Abraham Lincoln (1809 – 1865)
Looks like we are missing an 'Abe' due to non-printing characters in the sort input.
To fix this, we need to ignore non-printing characters.
Commands
echo -e "\x90Abe" >> fp
cat fp | sort --ignore-nonprinting | tac
Output
Amelia Earhart (1897 – 1937)
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948 – )
Abraham Lincoln (1809 – 1865)
▒Abe
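Here is a minimal sketch with a control character, assuming GNU coreutils sort in the C locale. \001 (an octal escape understood by printf) is nonprinting, and od -c makes it visible in the output.

```bash
# Plain sort compares the raw byte, and \001 is lower than any letter.
printf 'b\n\001c\n' | LC_ALL=C sort | od -c      # \001c comes first
# -i skips nonprinting characters when comparing, so "c" is compared with "b".
printf 'b\n\001c\n' | LC_ALL=C sort -i | od -c   # b comes first
```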
Sort dictionary order
Sort can ignore every character except blanks and alphanumeric characters when comparing lines, as an option. The input is preserved in the sorted output.
echo -e "\x90Abe" >> fp
cat fp | sort -d | tac
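Here is a minimal sketch of dictionary order on two made-up lines, assuming the C locale, where '-' sorts before letters.

```bash
# Plain sort: "-b" comes first because '-' < 'a'.
printf 'a\n-b\n' | LC_ALL=C sort      # prints: -b, a
# -d considers only blanks and alphanumerics, so "-b" is compared as "b".
printf 'a\n-b\n' | LC_ALL=C sort -d   # prints: a, -b
```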
Post sort
Sort has one main option that doesn't affect sorting, namely --reverse. Instead, it affects the output, allowing order to be toggled between ascending and descending. An example follows.
Sort reverse output
Sort can display output in reverse order as an option.
Option
Usage
Command line
Output
Amelia Earhart (1897 – 1937)
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948 – )
Abraham Lincoln (1809 – 1865)
Alternatives
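Here is a minimal sketch of --reverse next to the sort | tac pattern used throughout this article; for distinct lines like these, both give the same result.

```bash
printf '1\n3\n2\n' | sort --reverse   # prints: 3, 2, 1
# Sort ascending, then reverse the whole stream with tac.
printf '1\n3\n2\n' | sort | tac       # prints: 3, 2, 1
```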
Other options for sort
There are twenty-two other options for sort. Examples follow.
Sort check
Sort has an option that allows you to check whether the input is sorted. It stops after reporting the first out-of-order line. In the case that input is required to be sorted but is likely already in order, using sort check is appropriate.
Option
Usage
Command line
Output
Command line
Output
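Here is a minimal sketch of both check modes, assuming GNU coreutils sort. The check result is reported through the exit status (0 when sorted, 1 when not), so it fits naturally into && and || chains.

```bash
printf '1\n2\n3\n' | sort --check && echo "sorted"
# Reports the first out-of-order line on stderr and exits with status 1.
printf '1\n3\n2\n' | sort --check || echo "not sorted (status $?)"
# -C (--check=quiet) performs the same check silently; only the status is set.
printf '1\n3\n2\n' | sort -C || echo "not sorted"
```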
Sort output
Sort has an option that allows you to specify a file to write to instead of using standard output or redirection. Unlike redirection, the output file may safely be one of the input files.
Option
Usage
Command line
Output
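Here is a minimal sketch that sorts a throwaway temporary file in place with -o, something plain redirection can't do safely.

```bash
f=$(mktemp)
printf 'c\na\nb\n' > "$f"

# Sort the file in place; -o may name one of the input files.
LC_ALL=C sort -o "$f" "$f"
cat "$f"   # prints: a, b, c (one per line)

# By contrast, sort "$f" > "$f" would let the shell truncate the file
# before sort reads it, leaving it empty.
rm "$f"
```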
Sort null terminated
Sort has an option that allows you to set the line delimiter to null instead of a newline.
Option
Usage
Command line
Output
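Here is a minimal sketch of NUL-terminated input. Items are separated by NUL bytes instead of newlines, so an item may even contain a newline, and tr turns the NULs back into newlines for display.

```bash
# Three NUL-separated items, sorted and shown one per line.
printf 'b\0a\0c\0' | sort --zero-terminated | tr '\0' '\n'   # prints: a, b, c

# A common pairing: NUL-separated file names from find, sorted safely.
find . -maxdepth 1 -type f -print0 | sort -z | tr '\0' '\n'
```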
Sort stable
Sort has an option that allows you to disable the last-resort comparison. As a result, lines that compare equal keep their original input order (a stable sort). Skipping the last-resort comparison may also save some time when sorting large inputs.
Option
Usage
Command line
Output
user 0m9.201s
sys 0m0.107s
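Here is a minimal sketch of what stability changes, using -f so that "b" and "B" compare equal, in the C locale.

```bash
# Without -s, the tie between "b" and "B" falls back to a byte-by-byte
# comparison of whole lines, and "B" < "b".
printf 'b\nB\nA\n' | LC_ALL=C sort -f      # prints: A, B, b
# With -s, the tie is left alone, so "b" and "B" keep their input order.
printf 'b\nB\nA\n' | LC_ALL=C sort -f -s   # prints: A, b, B
```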
Sort buffer size
Sort has an option that allows you to set the amount of memory used as a buffer while sorting. It can be used to limit memory consumption when sorting larger inputs, though performance may be affected.
Option
Usage
Command line
time seq 1000000 | sort --random-sort | sort --stable --buffer-size=64 >/dev/null
Output
user 0m9.858s
sys 0m2.092s
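In GNU sort, a bare SIZE is read in kibibytes, so --buffer-size=64 above asks for a 64 KiB buffer. Suffixes such as M or G, or % of physical memory, are also accepted. Here is a minimal sketch, assuming GNU coreutils sort.

```bash
# Reverse the numbers first so the final sort has real work to do, then
# cap its buffer at 1 MiB; the result is the same as with a larger buffer.
seq 100000 | sort -nr | sort -n -S 1M | tail -n 1   # prints: 100000
# Allow sort to use up to half of physical memory.
seq 10 | sort -n --buffer-size=50% | head -n 1      # prints: 1
```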
Sort unique
Sort has an option that allows you to remove duplicate lines in sort output.
Option
Usage
Command line
echo 1 2 2 4 5 | tr '\040' '\000' | sort --zero-terminated --unique
Output
Alternatives
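Here is a minimal sketch of --unique next to one common alternative, piping sorted output to uniq, which drops adjacent duplicate lines.

```bash
printf 'b\na\nb\na\n' | sort --unique   # prints: a, b
# Alternative: sort first so duplicates are adjacent, then remove them with uniq.
printf 'b\na\nb\na\n' | sort | uniq     # prints: a, b
```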
Conclusion
Sort is an external command that is useful not only in combination with other external commands but also with commands that have no built-in ordering method, such as user-defined functions or bash scripts in general.