Bash sort command

Good luck trying to implement a sort algorithm in bash that finishes before tomorrow. No worries, you don’t need to, because you have the sort command.

With sort, you can order files based on the order in the dictionary or by numerical value, randomize file lines, remove duplicate lines, and check if a file is sorted.

You may be able to do other things with it but first, let’s worry about wrapping our heads around how to use sort in bash scripts.

What is sort?

Sort is an external command that concatenates files while sorting their contents according to a sort type and writes results of sort to standard output.
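
To make the "concatenates files while sorting" part concrete, here is a minimal sketch. The function name demo_concat and the two scratch files are made up for illustration only.

```shell
demo_concat() {
  # Hypothetical scratch files, created only for this illustration.
  dir=$(mktemp -d)
  printf '%s\n' pear apple > "${dir}/a.txt"
  printf '%s\n' fig banana > "${dir}/b.txt"

  # sort reads both files as one concatenated input and writes a single
  # sorted stream to standard output; the files themselves are unchanged.
  sort "${dir}/a.txt" "${dir}/b.txt"

  rm -r "${dir}"
}

demo_concat
```

Because the result goes to standard output, it can be piped into other commands or redirected to a new file.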

Sort command options for bash

The sort command comes with 31 options (13 main and 18 categorized as other). Most experienced bash programmers (even experts) know only the few main sort options required to get by. Others are seldom touched. Lucky for you, we have time to touch them all.

Main sort options

These are the options that help you get things done: sort (Sorting), apply filters prior to sorting (Pre sort), and manipulate sorted results (Post sort).

Sorting

Sort comes with 6 different types of sorting. Here is a table showing each sort type with associated options.

Sort                    Short option  Long option             WORD             Notes
Numeric sort (general)  -g            --general-numeric-sort  general-numeric  Supports scientific notation, 0.1234e4 = 1234
Numeric sort (human)    -h            --human-numeric-sort    human-numeric    1.234K = 1234
Numeric                 -n            --numeric-sort          numeric          … < -1 < 0 < 1 < …
Month                   -M            --month-sort            month            Unknown < Jan < Feb < … < Nov < Dec
Random                  -R            --random-sort           random
Version                 -V            --version-sort          version

Note that each type of sort has a long option ending with -sort. In addition to the specific sort options, the --sort=WORD option may be used to select the sort type by word. For example, --sort=random may be used in place of --random-sort or -R.
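
As a quick check of that equivalence, the following sketch sorts the same input with a short option and with --sort=WORD. It assumes GNU sort, since the --sort=WORD form is not available in every implementation, and the sample values are made up.

```shell
# Made-up sample input: numbers that plain dictionary order would misplace.
sample() {
  printf '%s\n' 10 9 100 1
}

# Both commands select numeric sort; the second spells it with --sort=WORD.
sample | sort -n
sample | sort --sort=numeric
```

Both commands should print the same four lines in the same order.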

Examples

Here are some sort command examples for each sorting method.

Example) Sorting names

Sort has no issues sorting lines alphabetically. Consider an unsorted list of famous people.

Function

famous-people()
{
curl --silent https://www.biographyonline.net/people/famous-100.html \
| grep post-content | sed -e 's/<[^>]*.//g' -e 's/WWII//g' -e 's/\(Wilbur\)/\1 Wright/' \
| grep -o -e '\(\([A-Z]\+[.]\?\)\+[a-z]*\s\)\+([0-9]\+\s[^)]\+.'
}

Command line

famous-people | sort

Output

Stephen King (1947)
Steve Jobs (1955 – 2012)
Sting (1951)
Tiger Woods (1975)
Tom Cruise (1962)
Usain Bolt (1986)
Vinci (1452 – 1519)
Walt Disney (1901 – 1966)
Wilbur Wright (1867 – 1912)
Woodrow Wilson (1856 – 1924)

Example) General numeric sort

If we need to sort numeric values taking into account scientific notation such as 99e2, we can use general numeric sort.

Function

unsorted-numeric-values ()
{
seq 100 | sort --random-sort | sed '3i 9e2' | sed '3i 99K'
}

Consider the sorted output using each method. Note that in addition to containing values 1 through 100, the list also includes ‘9e2’ (900) and ‘99K’ (99000).

Command line

unsorted-numeric-values | sort -n

Output

96
97
98
99
99K
100

What about 900 and 99000? That’s right, it’s just numeric sort. Next.

Command line

unsorted-numeric-values | sort -h

Output

96
97
98
99
100
99K

What about 900? That’s right, it’s just human numeric sort. Next.

Command line

unsorted-numeric-values | sort -g

Output

96
97
98
99
99K
100
9e2

What about 99000? That’s right, it’s just general numeric sort. As you see, no single sorting method handles both notations in this case; however, that doesn’t mean you can’t come up with a fix.

Command line

unsorted-numeric-values | sed 's/[kK]/e3/' | sort -g

Output

96
97
98
99
100
9e2
99e3

Now that’s more like it.
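
To compare the three numeric sorts side by side without a hundred lines of output, here is a minimal sketch using a small, made-up input. The function name mixed_values is hypothetical, and LC_ALL=C is set only to keep the result independent of the local locale.

```shell
# Hypothetical fixed input: two plain integers, one value in scientific
# notation (9e2 = 900), and one value with a human-readable suffix (99K).
mixed_values() {
  printf '%s\n' 100 9e2 99K 98
}

for option in -n -h -g; do
  echo "sort ${option}:"
  mixed_values | LC_ALL=C sort "${option}"
done

# The workaround from above: rewrite the K suffix as an exponent so that
# general numeric sort understands every line.
echo "sed + sort -g:"
mixed_values | sed 's/[kK]/e3/' | LC_ALL=C sort -g
```

Only the last command should place all four values in true numerical order (98, 100, 9e2, 99e3).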

Example) Human numeric sort

If we need to sort numerical values taking into account the meaning of suffixes such as K, M, G, and E, we can use human numeric sort.

Command line

seq 100 | sort --random-sort | sed '3i 3k' | sort -h

Output

96
97
98
99
100
3k
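
Human numeric sort pairs naturally with tools that print sizes with unit suffixes. The sketch below uses made-up size values rather than real command output, and sets LC_ALL=C only for reproducibility.

```shell
# Made-up sizes with unit suffixes, in no particular order.
sizes() {
  printf '%s\n' 2K 1G 512 3M
}

# -h ranks values by suffix first (none < K < M < G), then by number.
sizes | LC_ALL=C sort -h
```

The same option can be applied to size listings, for example du -h | sort -h.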

Example) Numeric sort

If all we need is to sort integers, numeric sort does the trick.

Command line

seq 100 | sort --random-sort | sort --numeric-sort

Output

95
96
97
98
99
100

Example) Month sort

Month sort allows you to order lines by month. It could prove useful for grouping lines together by month, especially when sorting by time is not an option.

Function

months ()
{
cat  <<EOF
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
EOF

}

Suppose that our months are not sorted.

Command line

months | sort --random-sort

Output

Mar
Oct
Dec
Apr
May
Sep
Aug
Nov
Jul
Jan
Feb
Jun

We can always sort by month.

Command line

months | sort --random-sort | sort --month-sort

Output

Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec

Note that if we change Dec to any prefix of November longer than ‘Nov’, say ‘Novem’, it will appear after ‘Nov’ in the sorted output.
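
Month sort also works when the month is only the start of a longer line. Here is a minimal sketch with made-up log-style entries; the function name entries is hypothetical, and LC_ALL=C is set so that the English month abbreviations are the ones recognized.

```shell
# Made-up log-style lines keyed by a leading month abbreviation.
entries() {
  printf '%s\n' 'Mar backup' 'Jan install' 'n/a unknown' 'Feb update'
}

# Lines that do not start with a month abbreviation count as "unknown"
# and sort before Jan.
entries | LC_ALL=C sort -M
```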

Example) Random sort – kill someone else’s terminal

As expected, random sort does the opposite of sorting: it mixes up lines.

Suppose that for educational purposes we want to kill another user’s terminal. We would have to make sure it is not our own pty, and randomize the listing so that we can say the pty was selected at random.

Commands

message-pty ()
{
{
local pty;
pty="${1}"
};
echo -n "You’re going down in" > /dev/${pty};
for i in 5 4 3 2 1;
do
sleep 1;
echo -n " ${i}" > /dev/${pty};
done;
echo " Bye!" > /dev/${pty};
sleep 1
}
# mypty, pty, and pid are helper functions that are not shown here
{
ps | grep pty | grep -v -e $( mypty ) | sort --random-sort | head -1 > stdin;
{
message-pty $( pty < stdin );
kill $( pid < stdin )
}
}

Output in someone else’s terminal

You’re going down in 5 4 3 2 1 Bye!
(exit)
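
For a less destructive use of random sort, the sketch below draws one line at random from a made-up list (the names are placeholders). Note that GNU sort shuffles by hashing keys, so identical lines stay next to each other in the output.

```shell
# Placeholder candidates; any list of lines works.
candidates() {
  printf '%s\n' alice bob carol dave
}

# Shuffle the lines, then keep the first one as the random pick.
candidates | sort --random-sort | head -n 1

# Shuffling only reorders lines; sorting again restores the same set.
candidates | sort --random-sort | sort
```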

Example) Version sort – sorting ips

As you know, source files may be versioned using strings such as 1.0. Furthermore, versions may go deeper with version numbers like 1.0.0, as seen in popular semantic versioning schemes.

Version sort allows you to sort version numbers. Great! Now what? Let’s test it out.

For this example, I’ve prepared a bash script to generate random ips so that we don’t have to go there. It’s in the repo. For those of us that don’t have the repo here’s a quick start.

Commands

git clone https://github.com/temptemp3/linuxhint.com.git
alias random-ips='test -f "linuxhint.com/generate-random-ips.sh" ; bash ${_}'

Now that you are ready let’s get started.

Command line

random-ips 200 | tee ips

Output

199.174.177.98
180.33.247.107
87.130.125.109
76.86.8.20
162.41.183.150
226.58.10.196
83.121.11.145
80.199.197.19
44.214.89.52
185.174.143.111

Okay, it works. Now let’s see what happens when we try and sort ips.

Command line

sort ips

Output

76.88.194.157
8.96.11.181
82.169.213.206
84.218.132.51
84.3.101.97
87.137.131.40
87.59.32.91
89.149.111.242
97.121.162.244
98.145.130.186

At first glance, it appears to work but lines like 8.96.11.181 should appear elsewhere.

Commands

{
for o in d h n V g M
do
sort ips -${o} > ips${o,,}
done
{
echo  all sorts equal numeric sort
diff ips{n,d} 1>/dev/null || echo dictionary order != numeric sort
diff ips{n,h} 1>/dev/null || echo human numeric sort != numeric sort
diff ips{n,g} 1>/dev/null || echo general numeric sort != numeric sort
diff ips{n,v} 1>/dev/null || {
echo version sort != numeric sort
show_n_v_ips_diff="true"
}
}
test ! "${show_n_v_ips_diff}" || diff ips{n,v}
}

Output

all sorts equal numeric sort
dictionary order != numeric sort
version sort != numeric sort
13,14d12
< 44.221.43.20
< 44.27.108.172
15a14,15
> 44.27.108.172
> 44.221.43.20
27d26
< 84.218.132.51
29c28
< 87.137.131.40

As you see version sort allows you to sort version numbers when other sorting methods fail.
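
If you don’t have the repo or a generated ips file at hand, the difference can be reproduced with just a few addresses. This is a minimal sketch; the function name addresses is hypothetical, the values are borrowed from the outputs above, and LC_ALL=C is set only for reproducibility.

```shell
# A few addresses borrowed from the outputs above.
addresses() {
  printf '%s\n' 76.88.194.157 8.96.11.181 44.221.43.20 44.27.108.172
}

echo "dictionary order:"
addresses | LC_ALL=C sort

echo "version sort:"
addresses | LC_ALL=C sort -V
```

Dictionary order compares character by character, so 8.96.11.181 lands after 76.88.194.157 and 44.221.43.20 lands before 44.27.108.172. Version sort compares each run of digits as a number, which puts both pairs in the expected order.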

Example) Version sort – sorting file names with version numbers

Building on the last example, let’s use version sort a little closer to its intended use. As you know, version numbers commonly appear in filenames. See Details about version sort.

First, let’s transform ips into something that looks more like project source file names.

Commands

alpha () {
alpha="abcdefghijklmnopqrstuvwxyz";
echo -n ${alpha:$(( RANDOM % 26 )):1}
}
beta () {
alpha="ab";
echo -n ${alpha:$(( RANDOM % 2 )):1}
}
{
cat ips | while read -r line; do
echo $(alpha)-v${line}$(test $(( RANDOM % 5 )) -eq 0 || beta).tar.gz;
done | tee sips

}

Output

x-v56.16.109.54.tar.gz
k-v117.38.14.165a.tar.gz
d-v87.59.32.91a.tar.gz
h-v115.215.64.100.tar.gz
s-v72.174.246.218b.tar.gz
h-v163.93.19.173.tar.gz
u-v184.225.11.92b.tar.gz
y-v205.53.5.211a.tar.gz
t-v175.196.164.17b.tar.gz
e-v167.42.221.178b.tar.gz
c-v126.54.190.189b.tar.gz
b-v169.180.221.131a.tar.gz
y-v210.125.170.231a.tar.gz
x-v71.56.120.9b.tar.gz

Exercise

Make the above commands run faster using xargs

See example in how to use xargs command in bash scripts.

This time, we won’t even bother using any of the other sorting methods.

Command line

sort -V sips

Output

d-v127.100.108.192.tar.gz
e-v62.140.229.42a.tar.gz
e-v149.77.211.215a.tar.gz
e-v167.42.221.178b.tar.gz
e-v194.189.236.29a.tar.gz
e-v198.145.199.84b.tar.gz
e-v240.1.147.196b.tar.gz
f-v50.100.142.42b.tar.gz
f-v117.58.230.116.tar.gz
f-v139.17.210.68b.tar.gz
f-v153.18.145.133b.tar.gz
g-v201.153.203.60b.tar.gz
g-v213.58.67.108.tar.gz
h-v5.206.37.224.tar.gz

Now you see that version sort may be useful when sorting file names with version numbers.

Pre sort

Sort has four main options that affect the actual sorting, namely --ignore-leading-blanks, --ignore-case, --ignore-nonprinting, and --dictionary-order, which may or may not overlap. Examples using each option follow.

Sort ignoring leading blanks

As an option, sort can ignore leading blanks in input lines. Leading blanks are preserved in the sorted output.

Option

--ignore-leading-blanks

Usage

sort --ignore-leading-blanks

Commands

famous-people > fp
cat >> fp << EOF
  Marilyn Monroe (1926 – 1962)
  Abraham Lincoln (1809 – 1865)
EOF

cat fp | sort | tac

Output

Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948)
Abraham Lincoln (1809 – 1865)
  Marilyn Monroe (1926 – 1962)
  Abraham Lincoln (1809 – 1865)

Note that leading spaces in lines added to fp appear first in sort output.

To fix this we need to ignore leading blanks as follows.

Commands

famous-people > fp
cat >> fp << EOF
  Marilyn Monroe (1926 – 1962)
  Abraham Lincoln (1809 – 1865)
EOF

cat fp | sort --ignore-leading-blanks | tac

Output

Marilyn Monroe (1926 – 1962)
  Marilyn Monroe (1926 – 1962)
Marie Antoinette (1755 – 1793)

Albert Einstein (1879 – 1955)
Al Gore (1948)
Abraham Lincoln (1809 – 1865)
  Abraham Lincoln (1809 – 1865)

Alternatives

cat fp | sed 's/^\s*//' | sort | tac

Note that the alternative does not preserve leading blanks in sort output.
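
The effect is easier to see on a tiny input. Here is a minimal sketch with made-up lines, one of which is indented; the function name indented is hypothetical, and LC_ALL=C is set only for reproducibility.

```shell
# Made-up input; the first line is indented by two spaces.
indented() {
  printf '%s\n' '  banana' 'apple' 'cherry'
}

echo "plain sort:"
indented | LC_ALL=C sort

echo "ignoring leading blanks:"
indented | LC_ALL=C sort --ignore-leading-blanks
```

With plain sort the indented line comes first because a space sorts before any letter. With --ignore-leading-blanks it lands between apple and cherry, still indented.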

Sort ignoring case

As an option, sort can ignore case in input lines. The case is preserved in the sorted output.

Option

--ignore-case

Usage

sort --ignore-case

Commands

famous-people > fp
cat >> fp << EOF
abraham Lincoln (1809 – 1865)
ABraham Lincoln (1809 – 1865)
EOF

cat fp | sort | tac

Output

Amelia Earhart (1897 – 1937)
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948)
Abraham Lincoln (1809 – 1865)
ABraham Lincoln (1809 – 1865)

Note that the lines added to fp are not grouped together in the sort output: the lowercase ‘abraham Lincoln’ does not appear next to the other two.

To fix this we need to ignore case as follows.

Commands

famous-people > fp
cat >> fp << EOF
abraham Lincoln (1809 – 1865)
ABraham Lincoln (1809 – 1865)
EOF

cat fp | sort --ignore-case | tac

Output

Amelia Earhart (1897 – 1937)
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948)
Abraham Lincoln (1809 – 1865)
abraham Lincoln (1809 – 1865)
ABraham Lincoln (1809 – 1865)

Alternatives

cat fp | while read -r line ; do echo ${line,,} ; done | sort | tac

Note that the alternative does not preserve case in sort output.
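
Here is the same idea on a tiny, made-up input. The function name mixed_case is hypothetical, and LC_ALL=C is set so that uppercase letters sort before lowercase ones in the plain case.

```shell
# Made-up input mixing uppercase and lowercase first letters.
mixed_case() {
  printf '%s\n' banana Apple apple Banana
}

echo "plain sort:"
mixed_case | LC_ALL=C sort

echo "ignoring case:"
mixed_case | LC_ALL=C sort --ignore-case
```

Plain sort in the C locale lists all capitalized lines first. With --ignore-case, Apple and apple end up next to each other, as do Banana and banana.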

Sort ignoring nonprinting

As an option, sort can ignore nonprinting characters in input lines. Nonprinting characters are preserved in the sorted output.

Option

--ignore-nonprinting

Usage

sort --ignore-nonprinting

Commands

famous-people > fp
echo -e "\x90Abe" >> fp
cat fp | sort | tac

Output

Audrey Hepburn (1929 – 1993)
Angelina Jolie (1975)
Amelia Earhart (1897 – 1937)
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948)
Abraham Lincoln (1809 – 1865)

Looks like we are missing an ‘Abe’ due to non-printing characters in sort input.

To fix this we need to ignore non-printing characters.

Commands

famous-people > fp
echo -e "\x90Abe" >> fp
cat fp | sort --ignore-nonprinting | tac

Output

Amelia Earhart (1897 – 1937)
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948)
Abraham Lincoln (1809 – 1865)
▒Abe
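
The same behavior can be seen with a control character that is nonprinting in any locale. This is a minimal sketch with made-up input; the function name with_control is hypothetical, cat -v is used only to make the control character visible as ^A, and LC_ALL=C is set for reproducibility.

```shell
# Made-up input; the first line starts with the control character \001.
with_control() {
  printf '\001banana\napple\n'
}

echo "plain sort:"
with_control | LC_ALL=C sort | cat -v

echo "ignoring nonprinting characters:"
with_control | LC_ALL=C sort --ignore-nonprinting | cat -v
```

Plain sort puts the line with the control character first. With --ignore-nonprinting, that line is compared as if it were just banana and comes after apple.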

Sort dictionary order

As an option, sort can consider only blanks and alphanumeric characters when comparing lines. All other characters are ignored for comparison but preserved in the sorted output.

Commands

famous-people > fp
echo -e "\x90Abe" >> fp
cat fp | sort -d | tac
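
Punctuation is a simpler way to see dictionary order at work. Here is a minimal sketch with two made-up lines; the function name punctuated is hypothetical, and LC_ALL=C is set only for reproducibility.

```shell
# Made-up input: the hyphen is neither blank nor alphanumeric.
punctuated() {
  printf '%s\n' 'a-c' 'ab'
}

echo "plain sort:"
punctuated | LC_ALL=C sort

echo "dictionary order:"
punctuated | LC_ALL=C sort --dictionary-order
```

Plain sort puts a-c first because the hyphen sorts before the letter b. Dictionary order skips the hyphen, compares ac with ab, and puts ab first.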

Post sort

Sort has one main option that doesn’t affect sorting, namely --reverse. However, it affects the output, allowing order to be toggled between ascending and descending. An example follows.

Sort reverse output

Sort allows output to be displayed in reverse order as an option.

Option

--reverse

Usage

sort --reverse

Command line

famous-people | sort --reverse

Output

Angelina Jolie (1975)
Amelia Earhart (1897 – 1937)
Alfred Hitchcock (1899 – 1980)
Albert Einstein (1879 – 1955)
Al Gore (1948)
Abraham Lincoln (1809 – 1865)

Alternatives

sort | tac
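
Reverse is not limited to dictionary order. As a small sketch, the following combines it with numeric sort to list values in descending order; the input is generated with seq just for illustration.

```shell
# Reverse combines with any sort type; here, numeric sort descending.
seq 5 | sort --numeric-sort --reverse
```

Piping the result into head is a common way to keep only the largest few values.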

Other options for sort

There are other options for sort beyond the main ones. Examples of some of them follow.

Sort check

Sort has an option that allows you to check if the input is sorted. It returns after the first instance of an unsorted line. In the case that input is required to be sorted but is likely already in order, using sort check is appropriate.

Option

--check

Usage

sort --check

Command line

seq 10 | sort --random-sort | sort --check

Output

sort: -:3: disorder: 10

Command line

seq 10 | sort --random-sort | sort | sort --check

Output

(blank)
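
In a script, it is usually the exit status of sort check that matters rather than the message. Here is a minimal sketch; the function name already_sorted is hypothetical, and -C is the quiet form of --check, which sets the same exit status without printing a diagnostic.

```shell
# Returns success (0) when standard input is already in numeric order.
already_sorted() {
  sort -n -C
}

if printf '%s\n' 1 2 10 | already_sorted; then
  echo "sorted, nothing to do"
fi

if ! printf '%s\n' 2 1 10 | already_sorted; then
  echo "not sorted, sorting is required"
fi
```

Note that the check has to use the same sort type as the order you expect; here -n is passed because the input is numeric.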

Sort output

Sort has an option that allows you to specify a file to write to instead of using standard output or redirection. Its use may improve compatibility across scripting environments.

Option

--output=FILE

Usage

sort --output=FILE

Command line

seq 10 | sort --random-sort --output=random-10

Output

(blank)
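
One practical reason to prefer --output over redirection is sorting a file in place: the shell truncates the target of > before sort gets to read it, whereas sort reads all of its input before opening the --output file. A minimal sketch, using a temporary file created only for the demonstration:

```shell
# Hypothetical scratch file with unsorted content.
file=$(mktemp)
printf '%s\n' 3 1 2 > "${file}"

# Safe: sort finishes reading ${file} before it writes the result back.
sort --numeric-sort --output="${file}" "${file}"
cat "${file}"

# Unsafe (left commented out): the shell would empty the file first.
# sort --numeric-sort "${file}" > "${file}"
```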

Sort null terminated

Sort has an option that allows you to set the line delimiter to null instead of a newline.

Option

--zero-terminated

Usage

sort --zero-terminated

Command line

seq 10 | tr '\012' '\000' | sort --zero-terminated --random-sort

Output

25346178910
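
Null termination earns its keep when a record may itself contain a newline. The sketch below uses two made-up records and turns the null delimiter into a visible | only for display; the function name records is hypothetical.

```shell
# Two made-up records separated by NUL; the first contains a newline.
records() {
  printf 'b line one\nb line two\000a single line\000'
}

# Sort NUL-terminated records, then make the delimiter visible.
records | LC_ALL=C sort --zero-terminated | tr '\000' '|'
```

The two-line record is moved as a single unit. This is the same convention used by tools such as find -print0 and xargs -0.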

Sort stable

Sort has an option that allows you to disable last-resort comparison. As a result, lines whose keys compare equal keep their original relative order instead of being reordered by a final whole-line comparison. Skipping that extra comparison may also save some time on large inputs.

Option

--stable

Usage

sort --stable

Command line

time seq 1000000 | sort --random-sort | sort --stable >/dev/null

Output

real    0m9.138s
user    0m9.201s
sys     0m0.107s
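
Timing alone does not show what stable changes, so here is a minimal sketch on two made-up lines that tie under numeric sort. The function name tied is hypothetical, and LC_ALL=C is set only for reproducibility.

```shell
# Made-up input: both lines have the same numeric key (1).
tied() {
  printf '%s\n' '1 pear' '1 apple'
}

echo "default (last-resort comparison breaks the tie):"
tied | LC_ALL=C sort -n

echo "stable (ties keep input order):"
tied | LC_ALL=C sort -n --stable
```

By default the tie is broken by comparing the whole lines, so 1 apple comes first. With --stable, the lines stay in the order they arrived.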

Sort buffer size

Sort has an option that allows you to set the amount of memory used as a buffer while sorting. It can be used to limit memory consumption when sorting larger inputs. Performance may be affected.

Option

--buffer-size=SIZE

Usage

sort --buffer-size=64

Command line

time seq 1000000 | sort --random-sort | sort --stable --buffer-size=64 >/dev/null

Output

real    0m21.685s
user    0m9.858s
sys     0m2.092s
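
SIZE also accepts unit suffixes, which makes the intent easier to read. The following sketch assumes GNU sort, where a SIZE without a suffix is read as a multiple of 1024 bytes, so 64 and 64K request the same buffer. The input sizes are arbitrary.

```shell
# A small buffer, spelled with an explicit suffix.
seq 1000 | sort --random-sort | sort --numeric-sort --buffer-size=64K | tail -n 1

# A larger buffer for bigger inputs.
seq 1000 | sort --random-sort | sort --numeric-sort --buffer-size=16M | tail -n 1
```

The buffer size affects only memory use and speed; the sorted result is the same either way.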

Sort unique

Sort has an option that allows you to remove duplicate lines in sort output.

Option

--unique

Usage

sort --unique

Command line

echo 1 2 2 4 5 | tr '\040' '\000' | sort --zero-terminated --unique

Output

1245

Alternatives

sort | uniq
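
Here is a minimal sketch of both forms on a made-up, newline-separated input (the function name dupes is hypothetical). The last command shows one difference worth knowing: with --unique, lines count as duplicates when they compare equal under the chosen sort type, not only when they are identical.

```shell
# Made-up input with repeated lines.
dupes() {
  printf '%s\n' pear apple pear fig apple
}

# Both forms print each distinct line once.
dupes | sort --unique
dupes | sort | uniq

# Under numeric sort, 1 and 01 compare equal, so only one of them is kept.
printf '%s\n' 1 01 2 | sort --numeric-sort --unique
```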

Conclusion

Sort is an external command that is useful not only in combination with other external commands but also with anything that has no built-in ordering method, such as a user-defined function or bash scripts in general.
