TSCMD097 FAQ: I need to remove duplicate entries from the output or a file. Assorted NT/2000/XP/.. CMD.EXE script tricks written by Timo Salmi

97} I need to remove duplicate entries from the output or a file.

Translating a solution from Eric Pement's UNIX-flavored SED collection, the first option is

sed "$!N; /^\(.*\)\n\1$/!P; D"

To test it, let the contents of MyFile.txt be

This is line 1
This is line 2
This is line 2
This is line 2

This is line 6
This is line 6
This is line 8

and the output will be

C:\_D\TEST>cmdfaq
This is line 1
This is line 2

This is line 6
This is line 8
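
A test script along the following lines would produce the output above. It is only a sketch: it assumes that a sed port is available on the path as sed.exe, and it builds the test file the same way as the other scripts in this item.

```batch
@echo off & setlocal enableextensions
::
:: Make a test file
set myfile_=MyFile.txt
for %%f in ("%myfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%myfile_%" echo This is line %%i
>>"%myfile_%" echo.
for %%i in (6 6 8) do >>"%myfile_%" echo This is line %%i
::
:: Drop adjacent duplicate lines (assumes a sed port on the path)
sed "$!N; /^\(.*\)\n\1$/!P; D" "%myfile_%"
::
:: Clean up
for %%f in ("%myfile_%") do if exist %%f del %%f
endlocal & goto :EOF
```

Delayed expansion is deliberately left off in this sketch, since the sed command contains exclamation marks.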

If one is prepared to use third-party utilities beyond awk/sed, then a UNIX port of uniq (uniq.exe) is the second option.
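
As a rough illustration, and assuming a uniq port is installed on the path as uniq.exe, the usage could be as simple as the sketch below. Like the sed command, uniq only drops adjacent duplicates, so the second call first runs the file through the standard sort command, at the cost of changing the line order.

```batch
@echo off & setlocal enableextensions
set myfile_=MyFile.txt
::
:: Drop adjacent duplicate lines only (assumes uniq.exe is on the path)
uniq "%myfile_%"
::
:: Drop duplicates anywhere in the file: sorting makes equal lines adjacent
sort "%myfile_%" | uniq
endlocal & goto :EOF
```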

A Visual Basic Script (VBScript) aided solution is the third option:

@echo off & setlocal enableextensions
::
:: Make a test file
set myfile_=MyFile.txt
for %%f in ("%myfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%myfile_%" echo This is line %%i
>>"%myfile_%" echo.
for %%i in (6 6 8) do >>"%myfile_%" echo This is line %%i
::
:: Build a Visual Basic Script
set skip=
set vbs_=%temp%\tmp$$$.vbs
findstr "'%skip%VBS" "%~f0" > "%vbs_%"
::
:: Run the script with Microsoft Windows Script Host Version 5.6
<"%myfile_%" cscript //nologo "%vbs_%"
::
:: Clean up
for %%f in ("%vbs_%" "%myfile_%") do if exist %%f del %%f
endlocal & goto :EOF
'
'.............................................
'The Visual Basic Script
'
prev = "" 'VBS
first = true 'VBS
Do While Not WScript.StdIn.AtEndOfStream 'VBS
  str = WScript.StdIn.ReadLine 'VBS
  If (str <> prev) Or first Then 'VBS
    WScript.StdOut.WriteLine str 'VBS
  End If 'VBS
  prev = str 'VBS
  first = false 'VBS
Loop 'VBS

The input and output will be as in the sed solution earlier.

If(!) you are prepared to accept the omission of empty lines, and to ignore potential exclamation marks and other special character dilemmas, then a pure script will do.

The output will be (note the dropping of the empty line)
C:\_D\TEST>cmdfaq
This is line 1
This is line 2
This is line 6
This is line 8
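
A minimal sketch of such a pure script, assuming that the test file MyFile.txt already exists, could look like the following. It leans on the fact that for /f skips empty lines, which is why the empty line disappears, and on delayed expansion, which is why exclamation marks in the data are not safe.

```batch
@echo off & setlocal enableextensions enabledelayedexpansion
set myfile_=MyFile.txt
set prev=
::
:: for /f skips empty lines (and, by default, lines starting with a semicolon),
:: so they never reach the comparison
for /f "usebackq delims=" %%a in ("%myfile_%") do (
  if not "%%a"=="!prev!" echo.%%a
  set prev=%%a
)
endlocal & goto :EOF
```

Each line is compared with the previous non-empty line only, so this variant, like sed and uniq, removes adjacent duplicates.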

Can this task be solved with a pure cmd script so that the empty lines are not omitted? Yes, but the solution is a bit kludgy and complicated. And the issue of the poison characters remains.

@echo off & setlocal enableextensions enabledelayedexpansion
::
:: Make a test file
set myfile_=MyFile.txt
for %%f in ("%myfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%myfile_%" echo This is line %%i
>>"%myfile_%" echo.
for %%i in (6 6 8) do >>"%myfile_%" echo This is line %%i
::
:: Process
set prev=
for /f "delims=" %%a in (
  'findstr /n /v /c:"SomeUnlikelyString" "%myfile_%"') do (
  set str=%%a
  call :WriteOneLine "!str!" "!prev!"
  set prev=!str!
)
::
:: Clean up
for %%f in ("%myfile_%") do if exist %%f del %%f
endlocal & goto :EOF
::
:: =============================================
:WriteOneLine
setlocal
set str=%~1
set prev=%~2
:: Strip the "N:" line-number prefix that findstr /n added.
:: Removing everything up to the first colon works for any number
:: of digits, unlike a fixed offset derived from the line count.
set str=!str:*:=!
if defined prev set prev=!prev:*:=!
if not [!str!]==[!prev!] echo.!str!
endlocal & goto :EOF

The output will be
C:\_D\TEST>cmdfaq
This is line 1
This is line 2

This is line 6
This is line 8

There is, however, a concise pure script solution which drops the duplicate lines irrespective of their location in the original file. It also drops all the empty lines. Recall that the original test file is

This is line 1
This is line 2
This is line 2
This is line 2

This is line 6
This is line 6
This is line 8

@echo off & setlocal enableextensions
::
:: Make the test file
set oldfile_=C:\_M\MyOldFile.txt
set newfile_=C:\_M\MyNewFile.txt
for %%f in ("%oldfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%oldfile_%" echo This is line %%i
>>"%oldfile_%" echo.
for %%i in (6 6 8) do >>"%oldfile_%" echo This is line %%i
::
:: Start from scratch
for %%f in ("%newfile_%") do if exist %%f del %%f
::
:: Pick unique lines from the original file, i.e. our test file
for /f "tokens=* delims=" %%a in ('type "%oldfile_%"') do (
find /i "%%a" "%newfile_%">nul
if errorlevel 1 echo %%a>>"%newfile_%"
)
::
:: Display the result
type "%newfile_%"
endlocal & goto :EOF

The output will be
C:\_D\TEST>cmdfaq
File not found - C:\_M\MYNEWFILE.TXT
This is line 1
This is line 2
This is line 6
This is line 8

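Note the use of if errorlevel 1 instead of if !errorlevel! GTR 0, which avoids the need for enabledelayedexpansion. The "File not found" message comes from the very first find call, which is made before the new file exists; >nul only suppresses the standard output of find, not its error message.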
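
One caveat of the script above is that find looks for the text anywhere within a line and, with /i, ignores case, so a line that is merely part of an already written line is dropped as well. The following is a sketch of a stricter variant, not a tested replacement. It assumes that findstr with /x (whole-line match), /l (literal text) and /c: (search string that may contain spaces) is an acceptable substitute for find, and the file names are placeholders. Creating an empty result file first also avoids the "File not found" message.

```batch
@echo off & setlocal enableextensions
:: Placeholder file names; the old file is assumed to exist already
set oldfile_=MyOldFile.txt
set newfile_=MyNewFile.txt
::
:: Start with an empty result file so that the first search has a file to read
type nul > "%newfile_%"
::
:: Append a line only if no identical whole line has been written yet
for /f "usebackq tokens=* delims=" %%a in ("%oldfile_%") do (
  findstr /x /l /c:"%%a" "%newfile_%" >nul
  if errorlevel 1 >>"%newfile_%" echo %%a
)
::
:: Display the result
type "%newfile_%"
endlocal & goto :EOF
```

Empty lines are still dropped by for /f, and lines containing quotes, backslashes or other special characters may still cause trouble.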

To put things in perspective: by quite a coincidence, I needed the other day to perform this task myself, combining two lists of newsgroups and then removing the duplicates. I didn't have to think for one second which route to take when the situation came up for real. I skipped all the nice and fancy scripts and chose the UNIX port uniq.exe without any hesitation.

Also see Item #162. It includes the reverse task of listing duplicate lines.

Assorted NT/2000/XP/.. CMD.EXE Script Tricks From the html version of the tscmd.zip 1cmdfaq.txt file To the Description and the Index
