C:\_G\WWW\~ELISANET\INFO\tscmd140.html
<http://www.elisanet.fi/tsalmi/info/tscmd140.html>
Copyright © 2003- by Prof. Timo Salmi  
Last modified Mon 19-Nov-2018 17:20:30

 
Assorted NT/2000/XP/.. CMD.EXE Script Tricks
From the html version of the tscmd.zip 1cmdfaq.txt file

This page is edited from the 1cmdfaq.txt faq-file contained in my tscmd.zip command line interface (CLI) collection. That zipped file has much additional material, including a number of detached .cmd script files. It is recommended that you also get the zipped version as a companion.

Please see "The Description and the Index page" for the conditions of usage and other such information.



140} How do I count how many times a word appears in my text file?

This is how to do it. First decompose the original text file so that each word is on a line of its own. Then use find to count the lines on which the word appears.
  @echo off & setlocal enableextensions
  set source_=C:\wherever\MYFILE.TXT
  < "%source_%" sed -e "s/[\x20\x09]/\x0d\x0a/g" > "%temp%\words.tmp"
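  :: Feeding the file through stdin makes find print the bare count,
  :: so the parsing does not depend on the path containing no spaces
  for /f %%c in ('
    find /i /c "mytext" ^< "%temp%\words.tmp"') do set count_=%%c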
  echo Found your word %count_% times
  for %%f in ("%temp%\words.tmp") do if exist %%f del %%f
  endlocal & goto :EOF
Hexadecimal [\x20\x09] denotes the appearance of either a space or a tab.
Hexadecimal \x0d\x0a inserts an eoln (end of line).

Consider finding how many times the word "the" appears in the explanation text preceding the code above. Substituting "the" in place of "mytext" we get
C:\_D\TEST>cmdfaq
Found your word 4 times

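Why 4 times, not 3? Because the word "Then" is also matched: find looks for the string anywhere on a line, and /i makes the search case insensitive. To get the strict answer, adjust the code as follows, using findstr with /x so that only lines consisting of exactly the word are accepted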
  @echo off & setlocal enableextensions
  set source_=C:\wherever\MYFILE.TXT
  < "%source_%" sed -e "s/[\x20\x09]/\x0d\x0a/g" > "%temp%\words.tmp"
  :: Start from zero so that the no-match case prints 0
  set count_=0
  for /f "tokens=*" %%c in ('
    findstr /i /x /c:"the" "%temp%\words.tmp"') do set /a count_+=1
  echo Found your word %count_% times
  for %%f in ("%temp%\words.tmp") do if exist %%f del %%f
  endlocal & goto :EOF

C:\_D\TEST>cmdfaq
Found your word 3 times
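
The counting loop is not the only option. As a minimal sketch, assuming the same sed step and the same "%temp%\words.tmp" file as above, the lines that findstr lets through can be piped to find /c /v "", which counts all the lines it receives. The result is then read once instead of being accumulated with set /a.
  @echo off & setlocal enableextensions
  set source_=C:\wherever\MYFILE.TXT
  < "%source_%" sed -e "s/[\x20\x09]/\x0d\x0a/g" > "%temp%\words.tmp"
  :: findstr /x passes only whole-word lines, find /c /v "" counts them
  for /f %%c in ('
    findstr /i /x /c:"the" "%temp%\words.tmp" ^| find /c /v ""') do set count_=%%c
  echo Found your word %count_% times
  for %%f in ("%temp%\words.tmp") do if exist %%f del %%f
  endlocal & goto :EOF
The pipe character has to be escaped as ^| because it sits inside the for /f command string. With no matching lines find should report 0, so no separate initialization of count_ is needed in this variant.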

Or with G(nu)AWK:
  @echo off & setlocal enableextensions
  set source_=C:\wherever\MYFILE.TXT
  unxgawk "{j=1;while(j<=NF){if($j==\"myword\"){n++};j++}};END{print \"@set count_=\"(n+0)}" "%source_%">"%temp%\setcount.cmd"
  for %%c in (call del) do %%c "%temp%\setcount.cmd"
  echo Found your word %count_% times
  endlocal & goto :EOF
Unlike the first, find-based script, the gawk solution does not count partial word occurrences. It is also case sensitive, whereas the find and findstr versions above ignore case because of the /i switch.
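
If case should be ignored in the gawk version as well, one possibility is to fold each field to lower case before comparing. The following is a sketch only, assuming that unxgawk is the same gawk executable as in the script above and that the search word is given in lower case. It prints the result directly instead of building a helper .cmd file.
  @echo off & setlocal enableextensions
  set source_=C:\wherever\MYFILE.TXT
  :: tolower() folds each field, so "The", "THE" and "the" all match;
  :: n+0 forces a numeric 0 when nothing was found
  unxgawk "{for(j=1;j<=NF;j++)if(tolower($j)==\"myword\")n++};END{print \"Found your word \" (n+0) \" times\"}" "%source_%"
  endlocal & goto :EOF
As in the original gawk script, words are whatever gawk splits on white space, so "myword," with a trailing comma is not counted.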

Another rendition of the GnuWin32 gawk solution
  @echo off & setlocal enableextensions
  set source_=C:\_D\TEST\My test file.txt
  set word_=line
  ::
  set temp_=%temp%
  if defined mytemp if exist "%mytemp%\" set temp_=%mytemp%
  set awkcmd_=%temp_%\tscmd$$$.awk
  ::
  >  "%awkcmd_%" echo {j=1;
  >> "%awkcmd_%" echo while(j^<=NF)
  >> "%awkcmd_%" echo {if($j=="%word_%"){n++};j++}};
  >> "%awkcmd_%" echo END{print n}
  ::
  for /f "usebackq" %%n in (`
    unxgawk -f "%awkcmd_%" "%source_%"`) do set count_=%%n
  for %%f in ("%awkcmd_%") do if exist %%f (del %%f)
  if not defined count_ set count_=0
  ::
  echo Found the word "%word_%" %count_% times in "%source_%"
  endlocal & goto :EOF

The output might be
  C:\_D\TEST>cmdfaq
  Found the word "line" 11 times in "C:\_D\TEST\My test file.txt"

The problem can also be solved with a Visual Basic Script (VBScript) aided command-line script. Furthermore, to make the solution more generic, assume that tabs, periods, and commas can separate words in addition to spaces.
  @echo off & setlocal enableextensions
  ::
  :: The data file to be inspected

  set source_=C:\_M\My test file.txt
  ::
  :: Build a Visual Basic Script

  set skip=
  set vbs_=%temp%\tmp$$$.vbs
  >"%vbs_%" findstr "'%skip%VBS" "%~f0"
  ::
  :: Run it with Microsoft Windows Script Host Version 5.6

  < "%source_%" cscript //nologo "%vbs_%" > "%temp%\words.tmp"
  ::
  :: Do the counting

  set count_=0
  for /f "tokens=*" %%c in ('
    findstr /i /x /c:"MyWord" "%temp%\words.tmp"') do set /a count_+=1
  ::
  :: Display the result

  echo Found your word %count_% times
  ::
  :: Clean up

  for %%f in ("%vbs_%" "%temp%\words.tmp") do if exist %%f del %%f
  endlocal & goto :EOF
  '
  '............................................
  ' The Visual Basic Script
  '

  Do While Not WScript.StdIn.AtEndOfStream 'VBS
    LineStr = WScript.StdIn.ReadLine 'VBS
    LineStr = Replace(LineStr, Chr(9), " ") 'VBS
    LineStr = Replace(LineStr, ".", " ") 'VBS
    LineStr = Replace(LineStr, ",", " ") 'VBS
    LineArray = Split(LineStr, " ", -1, vbTextCompare) 'VBS
    For i = 0 to UBound(LineArray) 'VBS
      WScript.StdOut.WriteLine LineArray(i) 'VBS
    Next 'VBS
  Loop 'VBS
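
The same generalization, periods and commas as additional separators, can presumably be had in the sed-based script by widening the bracket expression. This is a sketch under the assumption that the sed port in use treats . and , literally inside [...] as POSIX specifies. It has not been checked against any particular sed port.
  @echo off & setlocal enableextensions
  set source_=C:\wherever\MYFILE.TXT
  :: Space, tab, period and comma all become line breaks
  < "%source_%" sed -e "s/[\x20\x09.,]/\x0d\x0a/g" > "%temp%\words.tmp"
  set count_=0
  for /f "tokens=*" %%c in ('
    findstr /i /x /c:"MyWord" "%temp%\words.tmp"') do set /a count_+=1
  echo Found your word %count_% times
  for %%f in ("%temp%\words.tmp") do if exist %%f del %%f
  endlocal & goto :EOF
Consecutive separators produce empty lines in words.tmp, which do no harm since findstr /x accepts only the lines that consist of exactly the word.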
