Utilizing BASH and AWK to Merge Text Files

Sometimes, you need BASH to do some oddball things. This is one such task.

Scenario:

You are given two text files: file P, which contains an alphabetically sorted (A to Z) list of people, and file F, which contains the filename belonging to each person, one per line and in the same order. You would like to merge these files into a single, tab-delimited text file for later processing. Well, here’s how you can use BASH to do so!

$ nano ~/merge.sh
#!/bin/bash
echo "********** DIP FILE PREPARER **********"
echo ""
echo "
This program accepts 2 input files: an index file (f) containing the names of the split PDF files and an index file (p) containing the ids of the people to be matched with those files.  Please make sure that both files are sorted in alphabetical (A to Z) order and that both files have the same number of lines, or the script will be unable to match them properly.
"
echo "==============================================================="
echo "1. Prepare ref or file index file"
echo "2. Merge ref_index.txt and file_index.txt"
echo "3. Display example pre-processed index file"

echo "What would you like to do?"
read -r action

case $action in
  "1") 	echo "Preparing file name index file"
       	echo "Please input full or relative file path to your index file containing a list of files: "
       	read -r file_index
       	echo "Please enter an output file name (no file extension is necessary): "
       	read -r output_file_index
       	# wrap every line in double quotes (\x22); paths are quoted so spaces are safe
       	awk '{ printf "\x22%s\x22\n",$0}' "$file_index" > "$output_file_index.txt"
       	echo "File output to $output_file_index.txt"
       	;;
  "2")	echo "Please input location of index file"
	read -r file_index
	echo "Please input location of ref file"
	read -r ref_index

	# get line count of each file (paths are quoted so spaces are safe)
	ln_file_index=$(awk 'END {print NR}' "$file_index")
	ln_ref_index=$(awk 'END {print NR}' "$ref_index")

	# check if line counts are equal
	if [ "$ln_file_index" == "$ln_ref_index" ]; then
		echo "Files match!"
		echo "Please input Round Name (RG 2017, FED 2017, etc.)"
		read round
		echo "Preparing parser_output.txt"
		# write the header with > so a rerun starts a fresh file instead of appending to the old one
		echo -e "\x22ref\x22\t\x22filename\x22\t\x22round\x22" > parser_output.txt
		# loop and output lines into output file
		for (( x=1; x<=$ln_file_index; x++ ))
		do
			echo "Processing $file_index line $x"
			file_line=$(awk "NR==$x{print;exit}" "$file_index")
			ref_line=$(awk "NR==$x{print;exit}" "$ref_index")
			# printf keeps any backslashes in the data literal, unlike echo -e
			printf '%s\t%s\t"%s"\n' "$ref_line" "$file_line" "$round" >> parser_output.txt
		done
		echo "Processing complete"
	else
		echo "Files do not match!  Please make sure that the files have a matching line count"
		echo "$file_index line count  $ln_file_index"
		echo "$ref_index line count $ln_ref_index"
	fi
	;;
  "3")	less sample_index.txt
	;;
esac
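
The loop in option 2 starts two awk processes for every line, so it slows down noticeably on long index files. As a rough sketch of an alternative (not what the script above does), the same merge can be done in a single pass with paste, which joins line N of each file with a tab. The function name merge_indexes, the temporary directory, and the sample names below are made up for illustration, and the sketch assumes both index files were already quoted by option 1.

```shell
#!/bin/bash
# Sketch only: merge_indexes is a hypothetical helper, not part of merge.sh.
# Usage: merge_indexes REF_INDEX FILE_INDEX ROUND OUTPUT
merge_indexes() {
    local ref_index=$1 file_index=$2 round=$3 output=$4
    local ref_lines file_lines

    # Same safety check as the original script: line counts must match.
    ref_lines=$(awk 'END {print NR}' "$ref_index")
    file_lines=$(awk 'END {print NR}' "$file_index")
    if [ "$ref_lines" -ne "$file_lines" ]; then
        echo "Line counts differ: $ref_lines vs $file_lines" >&2
        return 1
    fi

    # Header row first, then one merged row per input line.
    printf '"ref"\t"filename"\t"round"\n' > "$output"

    # paste joins line N of both files with a tab; awk appends the quoted round.
    paste "$ref_index" "$file_index" |
        awk -v round="$round" 'BEGIN { OFS = "\t" } { print $0, "\"" round "\"" }' >> "$output"
}

# Demo with made-up sample data in a throwaway directory.
demo_dir=$(mktemp -d)
printf '"alice"\n"bob"\n' > "$demo_dir/ref_index.txt"
printf '"alice.pdf"\n"bob.pdf"\n' > "$demo_dir/file_index.txt"
merge_indexes "$demo_dir/ref_index.txt" "$demo_dir/file_index.txt" "RG 2017" "$demo_dir/parser_output.txt"
cat "$demo_dir/parser_output.txt"
```

The demo prints the header row followed by one tab-delimited row per person, each ending in the quoted round name. One caveat: awk -v interprets backslash escapes in the value it is given, so a round name containing a backslash would need different handling.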
