97} I need to remove duplicate entries from the output or a file.
Adapted from Eric Pement's UNIX-flavored SED collection, the first option is
sed "$!N; /^\(.*\)\n\1$/!P; D"
To test it, let's have
@echo off & setlocal enableextensions
::
:: Make a test file
set myfile_=MyFile.txt
for %%f in ("%myfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%myfile_%" echo This is line %%i
>>"%myfile_%" echo.
for %%i in (6 6 8) do >>"%myfile_%" echo This is line %%i
::
<"%myfile_%" sed "$!N; /^\(.*\)\n\1$/!P; D"
::
:: Clean up
for %%f in ("%myfile_%") do if exist %%f del %%f
endlocal & goto :EOF
The contents of MyFile.txt will be
This is line 1
This is line 2
This is line 2
This is line 2

This is line 6
This is line 6
This is line 8
and the output will be
C:\_D\TEST>cmdfaq
This is line 1
This is line 2

This is line 6
This is line 8
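For readers unfamiliar with the sed commands involved, the one-liner can be annotated roughly as follows. This is only a sketch: it assumes a sed.exe port on the path and that MyFile.txt already exists in the current folder.

```batch
@echo off & setlocal enableextensions
::
:: Assumes a sed.exe port is on the path and MyFile.txt is in the current folder
::
:: $!N               On every line but the last, append the next line
::                   to the pattern space, separated by a newline
:: /^\(.*\)\n\1$/!P  Unless the two lines are identical, print the
::                   first of them
:: D                 Delete the first line from the pattern space and
::                   start the next cycle with what remains
<MyFile.txt sed "$!N; /^\(.*\)\n\1$/!P; D"
endlocal & goto :EOF
```

Since only neighboring lines are compared, duplicates that are not adjacent are left alone.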
If one uses third-party utilities and is prepared to go beyond
awk/sed, then a UNIX port of uniq.exe is one option.
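A minimal sketch of such a call, assuming that some uniq.exe port is on the path and that MyFile.txt already exists in the current folder:

```batch
@echo off & setlocal enableextensions
::
:: Assumes a third-party uniq.exe is on the path
:: and that MyFile.txt already exists in the current folder
<MyFile.txt uniq
endlocal & goto :EOF
```

Like the sed one-liner, uniq only collapses duplicate lines that are adjacent.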
A Visual Basic Script (VBScript) aided solution is a third option:
@echo off & setlocal enableextensions
::
:: Make a test file
set myfile_=MyFile.txt
for %%f in ("%myfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%myfile_%" echo This is line %%i
>>"%myfile_%" echo.
for %%i in (6 6 8) do >>"%myfile_%" echo This is line %%i
::
:: Build a Visual Basic Script
set skip=
set vbs_=%temp%\tmp$$$.vbs
findstr "'%skip%VBS" "%~f0" > "%vbs_%"
::
:: Run the script with Microsoft Windows Script Host Version 5.6
<"%myfile_%" cscript //nologo "%vbs_%"
::
:: Clean up
for %%f in ("%vbs_%" "%myfile_%") do if exist %%f del %%f
endlocal & goto :EOF
'
'.............................................
'The Visual Basic Script
'
prev = "" 'VBS
first = true 'VBS
Do While Not WScript.StdIn.AtEndOfStream 'VBS
str = WScript.StdIn.ReadLine 'VBS
If (str <> prev) Or first Then 'VBS
WScript.StdOut.WriteLine str 'VBS
End If 'VBS
prev = str 'VBS
first = false 'VBS
Loop 'VBS
The input and output will be as in the sed solution earlier.
If(!) you are prepared to accept that empty lines are omitted, and to
ignore the problems caused by exclamation marks and other special
characters, then a pure script solution is
@echo off & setlocal enableextensions enabledelayedexpansion
::
:: Make a test file
set myfile_=MyFile.txt
for %%f in ("%myfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%myfile_%" echo This is line %%i
>>"%myfile_%" echo.
for %%i in (6 6 8) do >>"%myfile_%" echo This is line %%i
::
:: Process
set prev=
for /f "delims=" %%a in ('type "%myfile_%"') do (
set str=%%a
if not [!str!]==[!prev!] echo %%a
set prev=!str!
)
::
:: Clean up
for %%f in ("%myfile_%") do if exist %%f del %%f
endlocal & goto :EOF
The output will be (note the dropping of the empty line)
C:\_D\TEST>cmdfaq
This is line 1
This is line 2
This is line 6
This is line 8
Can this task be solved with a pure cmd script so that the empty
lines are not omitted? Yes, but the solution is a bit kludgy and
complicated. And the issue of the poison characters remains.
@echo off & setlocal enableextensions enabledelayedexpansion
::
:: Make a test file
set myfile_=MyFile.txt
for %%f in ("%myfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%myfile_%" echo This is line %%i
>>"%myfile_%" echo.
for %%i in (6 6 8) do >>"%myfile_%" echo This is line %%i
::
:: Process
:: findstr /n prefixes each line with its number and a colon,
:: so for /f does not skip the empty lines
set prev=
for /f "delims=" %%a in (
'findstr /n /v /c:"SomeUnlikelyString" "%myfile_%"') do (
set str=%%a
rem Strip everything up to and including the first colon
set str=!str:*:=!
if not [!str!]==[!prev!] echo.!str!
set prev=!str!
)
::
:: Clean up
for %%f in ("%myfile_%") do if exist %%f del %%f
endlocal & goto :EOF
The output will be
C:\_D\TEST>cmdfaq
This is line 1
This is line 2

This is line 6
This is line 8
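The empty lines survive because findstr /n puts a line number and a colon in front of every line, so for /f never sees a truly empty line. The sketch below makes that visible on a small three-line file. The file contents and the !str:*:=! substitution, which removes everything up to and including the first colon whatever the width of the number, are illustrative choices.

```batch
@echo off & setlocal enableextensions enabledelayedexpansion
::
:: Make a three-line test file with an empty line in the middle
set myfile_=MyFile.txt
>"%myfile_%" echo First
>>"%myfile_%" echo.
>>"%myfile_%" echo Last
::
:: Show each numbered line, then the same line with the prefix removed
for /f "delims=" %%a in (
  'findstr /n /v /c:"SomeUnlikelyString" "%myfile_%"') do (
  set str=%%a
  echo Raw: [%%a] Stripped: [!str:*:=!]
)
::
:: Clean up
for %%f in ("%myfile_%") do if exist %%f del %%f
endlocal & goto :EOF
```

The middle line should arrive as 2: instead of being skipped, and the stripped text for it should be empty.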
There is, however, a concise pure script solution which drops the
duplicate lines irrespective of their location in the original file.
It also drops all the empty lines. Recall that the original test file
is
This is line 1
This is line 2
This is line 2
This is line 2

This is line 6
This is line 6
This is line 8
@echo off & setlocal enableextensions
::
:: Make the test file
set oldfile_=C:\_M\MyOldFile.txt
set newfile_=C:\_M\MyNewFile.txt
for %%f in ("%oldfile_%") do if exist %%f del %%f
for %%i in (1 2 2 2) do >>"%oldfile_%" echo This is line %%i
>>"%oldfile_%" echo.
for %%i in (6 6 8) do >>"%oldfile_%" echo This is line %%i
::
:: Start from scratch
for %%f in ("%newfile_%") do if exist %%f del %%f
::
:: Pick unique lines from the original file, i.e. our test file
for /f "tokens=* delims=" %%a in ('type "%oldfile_%"') do (
find /i "%%a" "%newfile_%">nul
if errorlevel 1 echo %%a>>"%newfile_%"
)
::
:: Display the result
type "%newfile_%"
endlocal & goto :EOF
The output will be
C:\_D\TEST>cmdfaq
File not found - C:\_M\MYNEWFILE.TXT
This is line 1
This is line 2
This is line 6
This is line 8
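One caveat is that find /i performs a case-insensitive substring search, so a line that happens to be contained in an earlier, longer line is treated as a duplicate. A possible variation, sketched here with hypothetical file names in the current folder, is to use findstr /x /l for an exact whole-line match. The result file is created empty first, which should also avoid the "File not found" message. Backslashes and quotes in the data may still confuse findstr.

```batch
@echo off & setlocal enableextensions
::
:: Hypothetical file names, kept in the current folder for this sketch
set oldfile_=MyOldFile.txt
set newfile_=MyNewFile.txt
::
:: Make a small test file
>"%oldfile_%" echo This is line 10
>>"%oldfile_%" echo This is line 1
>>"%oldfile_%" echo This is line 10
::
:: Start from an empty result file so that findstr always has a file to read
type nul >"%newfile_%"
::
:: /x = the whole line must match, /l = literal search, /c: = string with spaces
for /f "tokens=* delims=" %%a in ('type "%oldfile_%"') do (
  findstr /x /l /c:"%%a" "%newfile_%" >nul
  if errorlevel 1 echo %%a>>"%newfile_%"
)
::
:: Display the result and clean up
type "%newfile_%"
for %%f in ("%oldfile_%" "%newfile_%") do if exist %%f del %%f
endlocal & goto :EOF
```

With this test data the line ending in 1 should be kept, whereas a substring search would find it inside the line ending in 10. Unlike find /i, findstr is case-sensitive unless /i is added.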
Note the use of if errorlevel 1 instead of if !errorlevel! GTR 0.
This avoids the need to enable delayed expansion
(enabledelayedexpansion).
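The difference matters inside a parenthesized block, where percent variables are expanded once, when the whole block is parsed, while if errorlevel is evaluated when its line is executed. A small sketch of the effect (the search strings are arbitrary):

```batch
@echo off & setlocal enableextensions
::
:: Start with errorlevel 0
cmd /c exit 0
for %%i in (1) do (
  rem find returns errorlevel 1 when the search string is not found
  echo abc| find "xyz" >nul
  rem The percent variable was expanded when the block was parsed
  echo Percent expansion sees %errorlevel%
  rem The if errorlevel test is made when this line is executed
  if errorlevel 1 echo Run-time test sees the failed search
)
endlocal & goto :EOF
```

Note that if errorlevel 1 is true for any errorlevel greater than or equal to 1.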
To put things in perspective: by quite a coincidence, I needed to
perform this very task the other day, combining two lists of
newsgroups and then removing the duplicates. When the situation came
up for real, I did not have to think for a second about which route
to take. I skipped all the nice and fancy scripts and chose the UNIX
port uniq.exe without any hesitation.
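A sketch of how such a job might look. The file names are hypothetical, and a third-party uniq.exe is assumed to be on the path. Sorting first makes identical lines adjacent, which uniq requires.

```batch
@echo off & setlocal enableextensions
::
:: Hypothetical file names; uniq.exe is a third-party port assumed to be on the path
set list1_=groups1.txt
set list2_=groups2.txt
::
:: Concatenate the lists, sort so that identical lines become adjacent,
:: then drop the repeats
(type "%list1_%" & type "%list2_%") | sort | uniq
endlocal & goto :EOF
```

The Windows sort ignores case while uniq normally does not, so lines differing only in case may need a second look. Each list should also end with a line break, or the last line of the first list will be joined to the first line of the second.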
Also see Item #162, which includes the reverse task of listing
duplicate lines.