140} How do I count how many times a word appears in my text file?
Here is one way to do it. First decompose the original
text file so that each word is on a line of its own. Then use
find to get the number of lines containing the word.
@echo off & setlocal enableextensions
set source_=C:\wherever\MYFILE.TXT
< "%source_%" sed -e "s/[\x20\x09]/\x0d\x0a/g" > "%temp%\words.tmp"
for /f "tokens=3" %%c in ('
find /i /c "mytext" "%temp%\words.tmp"') do set count_=%%c
echo Found your word %count_% times
for %%f in ("%temp%\words.tmp") do if exist %%f del %%f
endlocal & goto :EOF
The bracket expression [\x20\x09] matches either a space (hexadecimal 20)
or a tab (hexadecimal 09). The replacement \x0d\x0a inserts an eoln
(end of line, i.e. carriage return plus line feed).
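For readers who want to check the decomposition step without sed, the following is a minimal Python sketch of the same substitution. It is only a cross-check, not part of the batch solution; the function name one_word_per_line and the sample string are made up for this illustration.

```python
import re

def one_word_per_line(text: str) -> str:
    """Replace every single space or tab with CR LF, as the sed expression does."""
    # [\x20\x09] matches exactly one space or one tab per match,
    # so two consecutive separators produce an empty line between them.
    return re.sub(r"[\x20\x09]", "\r\n", text)

# Hypothetical sample: one space and one tab as separators.
sample = "First decompose\tthe original"
print(repr(one_word_per_line(sample)))
```

Empty lines produced by repeated separators are harmless here, since neither find nor findstr /x will match the search word on them.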
Consider finding how many times the word "the" appears in the
explanation text before the code above. Substituting "the" in place
of "mytext" we get
C:\_D\TEST>cmdfaq
Found your word 4 times
Why 4 times, not 3? Because find matches substrings, the word "Then"
is also counted. To get the strict answer, adjust the code as follows
so that only lines consisting of exactly the word are matched
@echo off & setlocal enableextensions
set source_=C:\wherever\MYFILE.TXT
< "%source_%" sed -e "s/[\x20\x09]/\x0d\x0a/g" > "%temp%\words.tmp"
:: Start from zero so that no matches gives 0 instead of an empty value
set count_=0
for /f "tokens=*" %%c in ('
findstr /i /x /c:"the" "%temp%\words.tmp"') do set /a count_+=1
echo Found your word %count_% times
for %%f in ("%temp%\words.tmp") do if exist %%f del %%f
endlocal & goto :EOF
C:\_D\TEST>cmdfaq
Found your word 3 times
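The difference between the two counts can be reproduced outside cmd with a short Python sketch. It is an illustration under assumptions: the two function names and the sample sentence are invented here, and the functions only mimic the matching rules of find /i /c (substring, ignoring case) and findstr /i /x (whole line, ignoring case) on a list of one-word lines.

```python
def count_substring_lines(words, needle):
    """Mimic find /i /c: count lines that contain needle, ignoring case."""
    needle = needle.lower()
    return sum(1 for w in words if needle in w.lower())

def count_exact_lines(words, needle):
    """Mimic findstr /i /x: count lines that equal needle, ignoring case."""
    needle = needle.lower()
    return sum(1 for w in words if w.lower() == needle)

# Hypothetical sample, already decomposed to one word per list item.
words = "Then the cat saw the dog near the gate".split()
print(count_substring_lines(words, "the"))  # 4, because "Then" contains "the"
print(count_exact_lines(words, "the"))      # 3
```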
Or with G(nu)AWK:
@echo off & setlocal enableextensions
set source_=C:\wherever\MYFILE.TXT
unxgawk "{j=1;while(j<=NF){if($j==\"myword\"){n++};j++};j=1};END{print \"@set count_=\"n}" "%source_%">"%temp%\setcount.cmd"
for %%c in (call del) do %%c "%temp%\setcount.cmd"
echo Found your word %count_% times
endlocal & goto :EOF
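The gawk solution is more accurate than the find version in the sense
that it does not count partial word occurrences. It is also case
sensitive. Note that gawk splits fields on whitespace only, so a word
with punctuation attached (such as "myword.") is not counted.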
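The field-by-field comparison that the gawk one-liner performs can be sketched in Python as a cross-check. This is an illustration only; the function name count_fields and the sample text are made up, and the sketch assumes gawk's default field splitting on runs of whitespace.

```python
def count_fields(text: str, word: str) -> int:
    """Count whitespace-separated fields that equal word exactly (case sensitive)."""
    total = 0
    for line in text.splitlines():
        # str.split() with no argument splits on runs of whitespace,
        # which corresponds to awk's default field separator.
        total += line.split().count(word)
    return total

# Hypothetical sample: "The" differs in case, "then" is a longer word,
# and "the." carries a period, so only one field matches.
sample = "The the then\nthe."
print(count_fields(sample, "the"))  # 1
```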
Another rendition of the GnuWin32 gawk solution
@echo off & setlocal enableextensions
set source_=C:\_D\TEST\My test file.txt
set word_=line
::
set temp_=%temp%
if defined mytemp if exist "%mytemp%\" set temp_=%mytemp%
set awkcmd_=%temp_%\tscmd$$$.awk
::
> "%awkcmd_%" echo {j=1;
>> "%awkcmd_%" echo while(j^<=NF)
>> "%awkcmd_%" echo {if($j=="%word_%"){n++};j++}};
>> "%awkcmd_%" echo END{print n}
::
for /f "usebackq" %%n in (`
unxgawk -f "%awkcmd_%" "%source_%"`) do set count_=%%n
for %%f in ("%awkcmd_%") do if exist %%f (del %%f)
if not defined count_ set count_=0
::
echo Found the word "%word_%" %count_% times in "%source_%"
endlocal & goto :EOF
The output might be
C:\_D\TEST>cmdfaq
Found the word "line" 11 times in "C:\_D\TEST\My test file.txt"
The problem can also be solved with a Visual Basic Script (VBScript)
aided command-line script. Furthermore, to make the solution more
generic, assume that tabs, periods, and commas, as well as spaces,
can separate words.
@echo off & setlocal enableextensions
::
:: The data file to be inspected
set source_=C:\_M\My test file.txt
::
:: Build a Visual Basic Script
set skip=
set vbs_=%temp%\tmp$$$.vbs
>"%vbs_%" findstr "'%skip%VBS" "%~f0"
::
:: Run it with Microsoft Windows Script Host Version 5.6
< "%source_%" cscript //nologo "%vbs_%" > "%temp%\words.tmp"
::
:: Do the counting
:: Start from zero so that no matches gives 0 instead of an empty value
set count_=0
for /f "tokens=*" %%c in ('
findstr /i /x /c:"MyWord" "%temp%\words.tmp"') do set /a count_+=1
::
:: Display the result
echo Found your word %count_% times
::
:: Clean up
for %%f in ("%vbs_%" "%temp%\words.tmp") do if exist %%f del %%f
endlocal & goto :EOF
'
'............................................
' The Visual Basic Script
'
Do While Not WScript.StdIn.AtEndOfStream 'VBS
LineStr = WScript.StdIn.ReadLine 'VBS
LineStr = Replace(LineStr, Chr(9), " ") 'VBS
LineStr = Replace(LineStr, ".", " ") 'VBS
LineStr = Replace(LineStr, ",", " ") 'VBS
LineArray = Split(LineStr, " ", -1, vbTextCompare) 'VBS
For i = 0 to UBound(LineArray) 'VBS
WScript.StdOut.WriteLine LineArray(i) 'VBS
Next 'VBS
Loop 'VBS
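The tokenizing rule used by the VBScript part (turn tabs, periods, and commas into spaces, then split on single spaces) can be cross-checked with a minimal Python sketch. It is not part of the script above; the function names split_like_vbs and count_word and the sample line are assumptions made for this illustration.

```python
def split_like_vbs(line: str) -> list:
    """Replace tab, period and comma with a space, then split on single spaces."""
    for sep in ("\t", ".", ","):
        line = line.replace(sep, " ")
    # Splitting on an explicit " " keeps empty strings between adjacent
    # separators, as VBScript's Split does.
    return line.split(" ")

def count_word(text: str, word: str) -> int:
    """Count tokens equal to word, ignoring case (like findstr /i /x)."""
    target = word.lower()
    return sum(
        1
        for line in text.splitlines()
        for token in split_like_vbs(line)
        if token.lower() == target
    )

# Hypothetical sample line mixing all four separators.
sample = "MyWord, myword.\tMYWORD notmyword"
print(count_word(sample, "MyWord"))  # 3
```

The empty tokens that appear where two separators are adjacent never equal the search word, so they do not affect the count.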