A Program to Search Multiple Files for Text Strings

By Ed Perley

About a year or so ago, a coworker asked me to find a way to quickly search a large number of binary data files for embedded identifying strings. I first thought the UNIX program grep could do this, but the MS-DOS version that I have (from the C Utilities Toolchest by MIX Software) choked when I tried it. Apparently it only works with ASCII files, so I wrote the program shown below.

The program scans every file ending in .JOB in the current directory, reading the first part of each file and searching it from the eleventh position onward. The number of characters read is defined by the SIZE macro, which is set to 60.

The main function displays information and instructions and reads the search string. It selects the first file ending in .JOB in the directory, then loops through the remaining files until none are left. For each file it calls two functions, listfile and compare.

The listfile function opens the selected file and reads its first 60 characters into the array text[]. The search does not begin at the start of that array: in these job files the text always begins at the eleventh character, so compare skips the characters before it.
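The reading step can be sketched in portable C as follows. This is only an illustration, not the program's own routine: the helper name read_header and its return convention are assumptions made for this example. It clears the buffer first, so a file shorter than SIZE bytes cannot leave behind characters from the previous file.

```c
#include <stdio.h>
#include <string.h>

#define SIZE 60   /* same value as the SIZE macro in the program */

/* Hypothetical helper: copies up to SIZE leading bytes of the named file
   into buf (which must hold SIZE bytes) and returns how many bytes were
   read, or -1 if the file could not be opened. */
int read_header(const char *name, char *buf)
{
    FILE *fp;
    size_t got;

    memset(buf, 0, SIZE);           /* no stale bytes from an earlier file */
    fp = fopen(name, "rb");         /* "rb": binary mode, no translation */
    if (fp == NULL)
        return -1;
    got = fread(buf, 1, SIZE, fp);  /* may be fewer than SIZE for short files */
    fclose(fp);                     /* close only a file that was opened */
    return (int) got;
}
```

A caller would test the return value before searching, skipping any file for which it is -1.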

The compare function compares the input string to the 60-character string extracted from each file. In the outer loop, the first character of the input string is compared with each character of the extracted string in turn. If a match is found for the first character, the flag is set to 1, and the inner loop is entered.

The remaining characters of the input string are then compared, one by one, with the characters that follow in the extracted string. If two characters do not match, the flag is set to 0, and the computer exits the inner loop. The outer loop then moves on to the next character of the extracted string. If the flag is still set to 1 when the inner loop is completed, an exact match has been found. The outer loop is exited, and the name of the file that contains the matching string is displayed along with the 60-character string extracted from it.
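The two-loop comparison described above can be sketched as a standalone function. This is a simplified illustration rather than the program's compare routine: the name find_bytes and its parameters are assumptions, and it returns the match position instead of setting a flag. Both lengths are passed in explicitly, so the buffer does not need a terminating zero.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first occurrence of
   pat[0..patlen-1] in buf[0..buflen-1] at or after 'start', or -1 if
   there is none. */
long find_bytes(const char *buf, size_t buflen,
                const char *pat, size_t patlen, size_t start)
{
    size_t x, y;

    if (patlen == 0 || patlen > buflen)
        return -1;
    for (x = start; x + patlen <= buflen; x++) {  /* outer loop: each position */
        for (y = 0; y < patlen; y++)              /* inner loop: each character */
            if (buf[x + y] != pat[y])
                break;                             /* mismatch: try next position */
        if (y == patlen)
            return (long) x;                       /* every character matched */
    }
    return -1;
}
```

The outer-loop bound x + patlen <= buflen lets a match end on the very last byte of the buffer while never reading past it.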

The program was compiled with the Power C Compiler from MIX Software. Portions of the program were inspired by examples given in the Power C manual. String manipulation library functions were not used because of the fear that they, like the grep program, might have problems with certain characters in the file.
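One character that certainly causes trouble is the zero byte, which the standard string functions treat as the end of the string. The sketch below is an illustration only: the sample data is made up and find_in_block is a hypothetical helper. It contrasts strstr with a length-based scan built on memcmp, which compares a fixed number of bytes and does not stop at zero.

```c
#include <stddef.h>
#include <string.h>

/* Sample data, made up for illustration: a zero byte precedes the text. */
static const char sample[] = "\001\002\0\003JOB42";

/* Hypothetical helper: returns the index of pat within the first buflen
   bytes of buf, or -1. memcmp compares exactly patlen bytes, so zero
   bytes inside buf do not end the search early. */
int find_in_block(const char *buf, size_t buflen, const char *pat)
{
    size_t patlen = strlen(pat);
    size_t x;

    if (patlen == 0 || patlen > buflen)
        return -1;
    for (x = 0; x + patlen <= buflen; x++)
        if (memcmp(buf + x, pat, patlen) == 0)
            return (int) x;
    return -1;
}
```

With this data, strstr(sample, "JOB42") returns NULL because it stops at the zero byte in position 2, while find_in_block(sample, sizeof(sample) - 1, "JOB42") returns 4.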

THE PROGRAM FUNCTIONS

#include <direct.h>
#include <stdlib.h>
#include <stdio.h>
#include <dos.h>
#include <string.h>
#include <conio.h>

#define SIZE 60

/*------ FUNCTION PROTOTYPES -----------------------------------------*/

int main(void);
void listfile(struct ffblk *filedata);
void compare(struct ffblk *filedata);

/*------ VARIABLES ----------------------------------------------------*/

struct ffblk filedata;

char fname[14];     /* THE CURRENT FILE NAME */
char text[SIZE];    /* TEXT EXTRACTED FROM THE FILE AND READY TO TEST */
char string[40];    /* THE INPUT TEXT STRING TO BE SEARCHED FOR */
FILE *fp;           /* THE FILE POINTER */

/*------ THE PROGRAM ---------------------------------------------------*/

int main(void)
{
    int i;

    printf("\n \n \n");
    printf("\n LOOK: A PROGRAM TO SEARCH JOB FILES FOR A TEXT STRING");
    printf("\n by E. Perley, February 1996");
    printf("\n \n Enter the text to search for. ");
    printf(" Press any key to abort the search. \n");

    /* fgets, UNLIKE gets, CANNOT WRITE PAST THE END OF string[] */
    if (fgets(string, sizeof(string), stdin) == NULL) return 1;
    for (i = 0; string[i] != '\0'; i++)              /* STRIP THE NEWLINE */
        if (string[i] == '\n') { string[i] = '\0'; break; }

    printf("\n The above text string is found in the following files: \n");
    if (findfirst("*.JOB", &filedata, FA_NORMAL) == 0)  /* READ FIRST FILE */
    {
        do
        {
            listfile(&filedata);
            compare(&filedata);
            if (getkey() != EOF) { fcloseall(); exit(0); }  /* EXIT PROGRAM */
        } while (findnext(&filedata) == 0);  /* READ FILES UNTIL NONE ARE LEFT */
    }
    else perror("findfirst error");
    printf("\n");
    fcloseall();
    return 0;
}

void listfile(struct ffblk *filedata)
{
/* OPEN AND READ THE NUMBER OF CHARACTERS INDICATED BY 'SIZE' */
    memset(text, 0, SIZE);   /* CLEAR TEXT LEFT OVER FROM THE PREVIOUS FILE */
    fp = fopen(filedata->ff_name, "rb");
    if (fp == NULL) printf("open error");
    else
    {
        fread(text, sizeof(char), (size_t) SIZE, fp);
        fclose(fp);          /* CLOSE ONLY A FILE THAT WAS ACTUALLY OPENED */
    }
}

void compare(struct ffblk *filedata)
{
    int x, y;
    int slen;
    int flag;

    slen = (int) strlen(string);   /* LENGTH OF THE SEARCH STRING */
    if (slen == 0) return;         /* NOTHING TO SEARCH FOR */

    flag = 0;
    x = 11; /* INDEX IN text[] WHERE THE SEARCH STARTS */

/* OUTER WHILE LOOP--INCREMENT TEXT ONE CHAR AT A TIME-------*/
    while (x + slen <= SIZE && flag == 0)   /* NEVER READ PAST text[] */
    {
        if (string[0] == text[x])
        {
            flag = 1;
            y = 1;
            /* WHILE LOOP--COMPARE INPUT STRING ONE CHAR AT A TIME */
            while (flag == 1 && y < slen)
            {
                if (text[x+y] != string[y]) flag = 0;
                y++;   /* INCREMENT Y */
            }   /* END OF INNER WHILE LOOP */
        }   /* END OF IF STATEMENT */
        x++;   /* INCREMENT X */
    }   /* END OF OUTER WHILE LOOP */

    if (flag == 1)
    {
        printf("\n%14s", filedata->ff_name);
        /* MASK THE BYTES BEFORE THE SEARCH START WITH PERIODS */
        for (y = 0; y < 11; y++) text[y] = '.';
        /* THE PRECISION .60 STOPS printf AT THE END OF text[], */
        /* WHICH HAS NO TERMINATING NULL */
        printf("%-60.60s", text);
    }
}

Please note:

Your C compiler may not have the direct.h header file referred to in the above code. It contains a mixed bag of functions. Some of them are from the UNIX System V system, and some are designed specifically for PC computers.

The functions you need from this header file are "findfirst" and "findnext." Perhaps you can find something equivalent to them in another header file. Here is how they work:

Required Headers:
#include <direct.h>
#include <dos.h>

The functions:
int findfirst( char *filename, struct ffblk *filedata, int attr );
int findnext( struct ffblk *filedata );

"The 'findfirst' function searches for the file specified by 'filename' and stores information about the file in the structure pointed to by 'filedata.' The 'filename' string may optionally include a drive and/or directory prefix, and the file name may include the wild card character (*). The 'attr' argument specifies one or more file attributes, one of which must match the attribute of the file specified by 'filename.' The file attribute may be one or more of the following values defined in dos.h. Attribute values may be combined using the bitwise OR operator ( | ).

The 'findnext' function may be called to search for subsequent files that match the 'filedata' obtained by a previous call to the 'findfirst' function. The 'findnext' function is typically called if the preceding 'findfirst' function call specifies the filename with one or more wild card characters (e.g. pc*.*)."

File Attributes:
FA_NORMAL     normal file
FA_RDONLY     read only file
FA_HIDDEN     hidden file
FA_SYSTEM     operating system file
FA_LABEL      volume label
FA_DIREC      subdirectory
FA_ARCH       archive
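Combining and testing attribute values with bitwise operators might look like the sketch below. The numeric values are an assumption: they follow the bit layout of the DOS directory attribute byte, and the EX_ prefix marks them as names invented for this example. Check dos.h for the real FA_ definitions supplied with your compiler.

```c
/* Illustrative values only, assumed to mirror the DOS attribute byte. */
#define EX_FA_NORMAL 0x00
#define EX_FA_RDONLY 0x01
#define EX_FA_HIDDEN 0x02
#define EX_FA_SYSTEM 0x04
#define EX_FA_LABEL  0x08
#define EX_FA_DIREC  0x10
#define EX_FA_ARCH   0x20

/* Hypothetical helper: nonzero if every bit in 'wanted' is set in 'attrib'. */
int has_attr(unsigned attrib, unsigned wanted)
{
    return (attrib & wanted) == wanted;
}
```

For example, EX_FA_RDONLY | EX_FA_HIDDEN builds a mask with both bits set, and has_attr can then be applied to an attribute byte such as the ff_attrib member to test for them.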

The structure 'ffblk' defined in direct.h:
struct ffblk {
    char ff_reserved[21];   /* USED BY DOS */
    char ff_attrib;         /* ATTRIBUTES OF THE FILE */
    char ff_ftime;          /* TIME FILE LAST MODIFIED */
    char ff_fdate;          /* DATE FILE LAST MODIFIED */
    char ff_size;           /* SIZE OF FILE */
    char ff_name[13];       /* NAME OF FILE - LIMITED TO 14 CHAR */
};

(Power C The High Performance C Compiler, Mix Software, Inc. 1990, page 279 - )
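On a system without findfirst and findnext, one possible equivalent is the POSIX directory interface in dirent.h. The sketch below assumes a POSIX system. The helpers ends_with and for_each_match are hypothetical names for this example, not part of any library. Where findfirst takes a wild card, this version tests each name's ending itself.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if 'name' ends with 'suffix'. */
int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Hypothetical helper: calls visit() for each entry in directory 'dir'
   whose name ends with 'suffix'. Returns the number of matching entries,
   or -1 if the directory could not be opened. visit may be NULL. */
int for_each_match(const char *dir, const char *suffix,
                   void (*visit)(const char *name))
{
    DIR *d = opendir(dir);
    struct dirent *e;
    int count = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {   /* plays the role of findnext */
        if (ends_with(e->d_name, suffix)) {
            if (visit != NULL)
                visit(e->d_name);
            count++;
        }
    }
    closedir(d);
    return count;
}
```

Calling for_each_match(".", ".JOB", visit) would hand each matching name in the current directory to visit. Unlike on MS-DOS, the comparison here is case-sensitive, so a file named part1.job would not match ".JOB".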



http://www.nfinity.com/~exile/look.htm
Email: See bottom of Programming Menu
Date last updated: March, 2001