Linux uniq Command
What’s “uniq”? It’s a handy command-line tool that reports or omits repeated lines in its input. It’s part of GNU coreutils, so you’ll find it on practically every Linux system (and some version of it on every UNIX-like one). Let’s get started with “uniq”!

Uniq usage

Here’s what the basic structure of a “uniq” command looks like.

uniq <options> <input> <output>

For example, let’s check out the content of “duplicate.txt”, piped through “sort” so the duplicates line up. Of course, it contains plenty of duplicate lines for the purpose of this article.

cat duplicate.txt | sort
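
Your file will look different, of course, but for illustration, suppose the sorted output came out like this:

apple
apple
banana
banana
banana
cherry
mango
mango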

There are clearly duplicates in there, right? Let’s filter them out with “uniq”.

cat duplicate.txt | sort | uniq
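
With that sample input, only one copy of each line survives:

apple
banana
cherry
mango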

The output looks much better with only the unique values, right?

However, you don’t have to pipe everything to get the job done. “uniq” can work on files directly as well.

uniq <options> <filename>
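
For example, you could sort the file into a copy first and then let “uniq” read that copy directly, writing the result to a second file (sorted.txt and result.txt are just example names here):

sort duplicate.txt -o sorted.txt
uniq sorted.txt result.txt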

Deleting duplicate content

Yes, deleting duplicate content from the input and keeping only the first occurrence is the default behavior of “uniq”. Note that this deduplication only happens when “uniq” finds adjacent duplicate lines.

Let’s check out this example. I’ve created another “duplicate1.txt” file that contains duplicate items. However, they’re not adjacent to each other.

cat duplicate1.txt
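
Again, for illustration, suppose the file held non-adjacent duplicates like these:

apple
banana
apple
cherry
banana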

Now, filter this output using “uniq”.

cat duplicate1.txt | uniq
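
With the sample above, nothing changes, because no two identical lines sit next to each other:

apple
banana
apple
cherry
banana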

All the duplicates are still there! That’s why, if you’re working with input like this, you should pipe the content through “sort” first, so that everything is sorted and the duplicates end up adjacent to each other.

cat duplicate1.txt | sort

Now, “uniq” will do its job normally.

cat duplicate1.txt | sort | uniq
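
Sorted first, the duplicates become adjacent, and “uniq” removes them as expected:

apple
banana
cherry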

Number of repetitions

If you want, you can check how many times each line is repeated in the content. Just use the “-c” flag with “uniq”.

cat duplicate.txt | sort | uniq -c
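
With the illustrative duplicate.txt from earlier, the counted output would look like this (“uniq” right-aligns the count column):

      2 apple
      3 banana
      1 cherry
      2 mango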

Note: “uniq” still does its regular job of deleting the duplicates; the count is simply prepended to each surviving line.

Printing duplicate lines

Most of the time, we want to get rid of the duplicates, right? This time, how about checking out what’s duplicated instead?

Yes, “uniq” can do that, too. In this case, you have to use the “-D” option. I’ll keep “sort” in between for a better, more refined result.

cat duplicate.txt | sort | uniq -D
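
The “-D” flag prints every line that has a duplicate, and nothing else; with the illustrative file, that’s:

apple
apple
banana
banana
banana
mango
mango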

Wow, that’s a lot of duplicates! However, all the duplicate groups run together, making the output difficult to navigate. How about adding a little gap in between?

uniq --all-repeated=<method>

Here, there are 3 different methods available: none (the default), prepend, and separate.

cat duplicate.txt | sort | uniq --all-repeated=prepend

cat duplicate.txt | sort | uniq --all-repeated=separate
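
With the separate method, a blank line is placed between the groups of duplicates:

apple
apple

banana
banana
banana

mango
mango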

Now, it looks better.

Skipping fields

In many cases, uniqueness has to be checked on a different part of the line, not the beginning.

Let’s understand this with an example. In the file duplicate1.txt, let’s say that duplication is determined by the second field. How do you tell “uniq” to check that? By default, it compares entire lines. Well, we can change that, too: the “-f” flag tells “uniq” to skip a given number of fields before comparing.

uniq -f <number_of_fields_to_skip> <filename>
cat duplicate1.txt | sort -k 2 | uniq -f 1
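
To see what’s going on, suppose (purely for illustration) each line carried an ID in the first field and a value in the second:

102 apple
101 apple
103 banana

Sorted by the second column and compared with the first field skipped, the two “apple” lines count as duplicates, so the pipeline prints:

101 apple
103 banana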

If you’re wondering about the “-k 2” part, it tells “sort” to sort based on the second column.

Printing non-repeated lines only

In all the examples above, “uniq” keeps the first occurrence of the duplicated content and removes the rest. How about dropping the duplicated lines altogether? Yes, using the “-u” flag, we can force “uniq” to keep only the lines that are never repeated.

cat duplicate.txt | sort

cat duplicate.txt | sort | uniq -u
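
With the illustrative file, only “cherry” appears exactly once, so that’s all that’s left:

cherry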

Hmm, with the duplicates gone entirely, there’s not much left…

Skip initial characters

We discussed how to make “uniq” skip fields, right? It’s also possible to start the check after a number of initial characters. For this purpose, the “-s” flag, accompanied by the number of characters to skip, tells “uniq” to do the job.

cat duplicate1.txt | sort -k 2 | uniq -s 2
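
For illustration, take the same kind of two-field lines, assuming single-digit IDs so that exactly two characters (the digit and the space) get skipped:

1 apple
2 apple
3 banana

“uniq -s 2” starts comparing at the third character, so the two “apple” lines count as duplicates:

1 apple
3 banana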

It’s similar to the example where “uniq” did its job on the second field only. Let’s see another example of this trick.

cat duplicate.txt | sort | uniq -s 5

Check initial characters ONLY

Just like the way we told “uniq” to skip the first couple of characters, it’s also possible to limit the check to the first couple of characters only. There’s a dedicated “-w” flag for this purpose.

cat duplicate.txt | sort | uniq -w 5

This command tells “uniq” to perform the uniqueness check within the first 5 characters only.

Let’s see another example of this command.

cat duplicate1.txt | sort | uniq -w 5

It wipes out all the other “duplicate…” entries, because the uniqueness check only looked at the “dupli” part.
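
For instance, lines like these all share the first 5 characters, so “uniq -w 5” collapses them:

duplicate
duplicated
duplicates

…and only the first one survives:

duplicate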

Case insensitivity

When checking for uniqueness, “uniq” also takes the case of the characters into account. In some situations, case doesn’t matter, so we can use the “-i” flag to make “uniq” case-insensitive.

Here’s the demo file:

cat duplicate1.txt

Some really clever duplication with a mixture of uppercase and lowercase letters, right? It’s time to call upon the strength of “uniq” to purge the mess!

cat duplicate1.txt | sort | uniq -i
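
For illustration, say the file held case-mixed duplicates like these (if your locale’s “sort” doesn’t place case variants next to each other, “sort -f” sorts case-insensitively and will):

Apple
APPLE
apple
Banana
banana

“uniq -i” treats each group as one line and keeps the first spelling:

Apple
Banana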

Wish granted!

NULL-terminated output

By default, “uniq” terminates each output line with a newline. With the “-z” flag, lines are delimited with a NUL character instead, on both input and output. That’s pretty useful if you’re going to use it in scripting alongside other NUL-aware tools.

cat duplicate.txt | sort | uniq -z
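
For instance, here’s a fully NUL-delimited pipeline; “sort -z” and “xargs -0” are the matching NUL-aware options of those tools, and the input is just an inline illustration:

printf 'b\0a\0b\0' | sort -z | uniq -z | xargs -0 -n1 echo

This prints “a” and “b”, one per line.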

Combining multiple flags

We’ve learned a number of “uniq” flags, right? How about combining them?

For example, I’m combining case insensitivity and the repetition count together.
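
A command along those lines looks like this; “sort -f” is used so that case variants of the same line end up adjacent before “uniq” sees them:

cat duplicate1.txt | sort -f | uniq -i -c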

If you’re ever planning to mix multiple flags, first make sure that they work the right way together. Sometimes, things just don’t work as they should.

Final thoughts

“uniq” is quite a unique tool that Linux offers. With so many powerful features, it can be useful in tons of ways. For the list of all the flags and their explanations, consult the man and info pages of “uniq”.

man uniq

info uniq

Enjoy!
