Bash command to scan a directory recursively and compute the average line count and average chars per line of its files

Bash command:

find . -type f | xargs wc -lc | sort -n | awk '{prev=curr; curr=$1" "$3} NR==1{print "Min lines:", $1, $3} END{print "Max lines:", prev; print "Total of", $1, "lines in", (NR - 1), "files"; print "Avg line count:", ($1 / (NR - 1)); print "Avg chars per line:", ($2 / $1)}';

Breakdown of bash command. The command above comprises 4 commands (find, xargs wc, sort and awk), separated by the | pipe character. The output of each command is piped in as the input to the next command.

find . -type f
List all regular files in the current directory and its subdirectories, recursively.

xargs wc -lc
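Take the list of filenames from the input and pass them to wc as arguments. wc -lc prints the no. of lines and the no. of bytes for each file, followed by a final "total" row when it is given more than one file. Note that -c counts bytes, not characters, so the "chars per line" figure is really bytes per line; the two only match for single-byte text such as plain ASCII.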
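
One thing to watch out for: plain xargs splits its input on whitespace, so a filename containing a space gets passed to wc as two separate arguments. A minimal sketch of a workaround, assuming your find and xargs support the -print0 and -0 options (GNU and BSD versions do), is shown below. The temporary directory and file names are made up for the demo.

```shell
# Build a throwaway directory containing a file whose name has a space.
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/plain.txt"
printf 'a\nb\nc\n' > "$tmpdir/with space.txt"

# -print0 and -0 pass NUL-delimited names, so "with space.txt" stays one argument.
counts=$(cd "$tmpdir" && find . -type f -print0 | xargs -0 wc -lc | sort -n)
printf '%s\n' "$counts"

# Clean up the demo directory.
rm -rf "$tmpdir"
```

This only fixes the counting step. The awk script further down still splits rows on whitespace, so it would print just the first word of such a filename.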

sort -n
Sort the input numerically in ascending order. Since the no. of lines is the 1st number for each row in the input, the input will be sorted by the no. of lines for each file. The file with the fewest lines will be on top, and the "total" row from wc, which has the largest count, will normally be at the bottom.
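
To see the ordering in isolation, here is a small sketch that feeds three hand-written rows to sort -n. The rows mimic the "lines bytes filename" shape of wc -lc output, and the values are made up for illustration.

```shell
# Unsorted sample rows; the "total" row deliberately comes first.
sorted=$(printf '%s\n' '10 50 total' '2 10 ./a.txt' '8 40 ./b.txt' | sort -n)
printf '%s\n' "$sorted"
# 2 10 ./a.txt
# 8 40 ./b.txt
# 10 50 total
```

The -n option matters here: it makes sort compare the leading numbers by value instead of comparing the rows as text.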

Breakdown of rules in awk script. Note that for each row in the input, the default field separator, i.e. whitespace, is used as delimiter. Hence, each row in the input will have 3 fields: no. of lines in file, no. of bytes in file, filename. (A filename containing spaces would be split into more than 3 fields.) $0 refers to the entire row, $1 refers to the 1st field, and so on.

{prev=curr; curr=$1" "$3}
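This rule has no condition, so it runs for every row in the input. It first copies the old value of curr into prev, then joins the 1st and 3rd fields (no. of lines and filename) with a space and saves the result in curr. At any point, curr holds the summary of the current row and prev holds the summary of the previous row.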
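
The prev/curr trick is easier to follow on a tiny input. The sketch below runs just this rule over three made-up rows and prints prev once the input is exhausted, which gives the second-to-last row.

```shell
# Sample rows in "lines bytes filename" shape (values are made up).
second_to_last=$(printf '%s\n' '2 10 ./a.txt' '8 40 ./b.txt' '10 50 total' |
  awk '{prev=curr; curr=$1" "$3} END{print prev}')
echo "$second_to_last"
# 8 ./b.txt
```

After the last row, curr is "10 total" and prev is "8 ./b.txt", i.e. the largest real file rather than the total row.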

NR==1{print "Min lines:", $1, $3}
If the row number is 1, print out the 1st and 3rd fields, which are the no. of lines in the file and the filename respectively. Since the input is sorted in ascending order, the 1st row is the file with the fewest lines.

END{print "Max lines:", prev; print "Total of", $1, "lines in", (NR - 1), "files"; print "Avg line count:", ($1 / (NR - 1)); print "Avg chars per line:", ($2 / $1)}
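When the end of the input is reached, print out prev, which holds the second-to-last row. This is the file with the most lines, because the last row is the "total" row from wc. At this point $1 and $2 still refer to the fields of that last row, so $1 is the total no. of lines and $2 is the total no. of bytes. The no. of files is NR - 1, i.e. all rows except the "total" row. The average line count is the total lines divided by the no. of files, and the average chars per line is the total bytes divided by the total lines.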
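
As a worked example, the sketch below runs the full awk script over three hand-written rows instead of real wc output. The rows and their values are made up so that the arithmetic is easy to check by hand: 10 lines in 2 files gives an average of 5 lines per file, and 50 bytes over 10 lines gives 5 chars per line.

```shell
# Sample rows in "lines bytes filename" shape, already sorted, ending with the total row.
summary=$(printf '%s\n' '2 10 ./a.txt' '8 40 ./b.txt' '10 50 total' |
  awk '{prev=curr; curr=$1" "$3} NR==1{print "Min lines:", $1, $3} END{print "Max lines:", prev; print "Total of", $1, "lines in", (NR - 1), "files"; print "Avg line count:", ($1 / (NR - 1)); print "Avg chars per line:", ($2 / $1)}')
printf '%s\n' "$summary"
# Min lines: 2 ./a.txt
# Max lines: 8 ./b.txt
# Total of 10 lines in 2 files
# Avg line count: 5
# Avg chars per line: 5
```

Keep in mind that the script relies on the "total" row being present. With a single file wc prints no such row, so the numbers would be off in that case.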

Just a tip: If you need to initialize a variable, e.g. sum, just add a BEGIN{sum = 0} rule at the start of the awk script 🙂
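
Here is a rough sketch of that tip in action. It is not part of the command above: it is a hypothetical variant that keeps its own running sum instead of reading the "total" row, using made-up sample rows. The variable names sum and files are just illustrative.

```shell
# Sample rows in "lines bytes filename" shape (values are made up).
avg=$(printf '%s\n' '2 10 ./a.txt' '8 40 ./b.txt' '10 50 total' |
  awk 'BEGIN{sum = 0; files = 0}
       $3 != "total" {sum += $1; files++}   # skip the summary row added by wc
       END{if (files > 0) print sum / files}')
echo "$avg"
# 5
```

Strictly speaking, awk treats an unset variable as 0 in arithmetic, so the BEGIN rule is optional here, but it makes the intent obvious.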