C – File Finder – Tiny C Projects

Back in ancient times, one of the most popular MS-DOS utilities I wrote was the Fast File Finder. It wasn’t particularly fast, of course. But it did the job of finding a file anywhere on the PC’s hard drive when given a filename. This program was included on the companion floppy diskettes provided with many of my early computing books. Yes, floppy diskettes.


In today’s operating systems, finding files is a big deal. Both Windows and Mac OS X feature powerful file-finding tools, locating files by not only name but also date, size, and content. The Linux command prompt offers its own slate of file-finding tools, just as powerful as (if not more so than) their graphical counterparts. For a budding C programmer, or anyone desiring to build their C kung fu, these tools are useful, but you can’t improve your programming skills just by using them.

Hunting for files, and potentially doing something with them, relies upon the directory-spelunking tools covered in Chapter 10. From this base, you can expand your knowledge of C by:

  • Reviewing other file-finding utilities
  • Exploring methods for finding text
  • Locating files in a directory tree
  • Using wildcards to match files
  • Finding filename duplicates

When I program a utility, especially one that’s similar to one that’s already available, I look for improvements. Many command line tools feature a parade of options and features. These switches make the command powerful but beyond what I need. I find the abundance of options overwhelming. Better for me is to build a more specific version of the utility. While such a program may not have the muscle of something coded by expert C programmers of yore, it’s specific to my needs. By writing your own file tools, you learn more about programming in C, plus you get a tool you can use — and customize to your workflow.

11.1  The Great File Hunt

My personal file-finding utilities are based on frustration with the existing crop of Linux file-finding tools, specifically find and grep.

Nothing is wrong with these commands that some well-chosen curse words can’t address. Still, I find myself unable to commit the command formats and options to memory. I constantly refer to the documentation when it comes to using these file-finding tools. I understand that this admission could get me kicked out of the neighborhood computer club.

The find command is powerful. In Linux, such power implies options galore, often more command line switches than there are letters of the alphabet, upper- and lowercase. This complexity explains why many nerds resort to GUI file search tools instead of a terminal window to locate lost files.

Here is the deceptively simple format for the find command:

Yep. Easy. Suppose you want to locate a file named budget.csv, located somewhere in your home directory tree. Here is the command to use:

find ~ -name budget.csv -print

The pathname is ~, the shortcut for your home directory. The -name switch identifies the file to locate, budget.csv. The final switch, -print (the one everyone forgets), directs the find command to send the results to standard output. You may think that output would be the default, but the find command can do more with found files than send their names to standard output.

The find command’s desired output may appear on a line by itself, which is fortunate. More common is that you must sift through a long series of errors and duplicate matches. Eventually the desired file is found, and its path revealed:

Yes, you can create an alias to the specific find utility format you use often. No, I’m not going to get into a debate about how powerful and useful the find command is or why I’m a dweeb for not comparing it with a sunshine lollypop for delicious goodness.

The other file-finding command is grep, which I use specifically to locate files containing a specific tidbit of text. In fact, I’ve used grep many times when writing this book to locate defined constants in header files. From the /usr/include directory, here is the command to locate the time_t data type in various header files:

grep -r time_t *

The -r switch directs grep to recursively look through directories. The string to find is time_t and the * wildcard directs the program to search all filenames.

Many lines of text spew forth when issuing this command, as time_t is referenced in multiple header files. Even this trick didn’t locate the specific definition I wanted, though it pointed me in the right direction.
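For a sense of how little code a bare-bones text search needs, here is a minimal sketch in C. It is not grep and not one of this chapter’s listings: the function name search_file() and the 1,024-byte line buffer are assumptions made for illustration, and the sketch handles a single file with plain substring matching.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from the book: output each line of the file
   at `path` that contains `needle`. Returns the number of matching
   lines, or -1 when the file can't be opened. A line longer than the
   buffer is read in pieces, and each piece is tested separately. */
int search_file(const char *path, const char *needle)
{
    FILE *fp;
    char line[1024];
    int hits = 0;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, needle) != NULL) {
            /* the line keeps its own newline, so none is added here */
            printf("%s:%s", path, line);
            hits++;
        }
    }

    fclose(fp);
    return hits;
}
```

Calling search_file("time.h", "time_t") from within /usr/include would report only the lines in that one file. Searching every header, as grep -r does, means pairing a function like this with a directory walk.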

These utilities, find and grep (and its better cousin, egrep), are wonderful and powerful. Yet I want something friendly and usable without the requirement of chronically checking man pages or referring to hefty command line reference books. This is why I code my own versions, covered in this chapter.

With your knowledge of C, you can easily code your own file-finding utilities specific to your needs, as complex or as simple as you desire. Then, if you forget any of the options, you have only yourself to blame.

11.2 A File Finder

My goal for finding files is to type a command like this:

The utility digs deep through the current directory tree, scouring subdirectory after subdirectory, hunting for the specific file. If found, the full pathname is output, which is useful information to me. Add in the capability of using wildcards to locate files, and I’ll never need the find command again, at least not in the specific format used to locate a file.

Oh. Yeah, I suppose my own utility must be named something other than find, already used in Linux. How about ff for Find File?

11.2.1 Coding the Find File utility

Chapter 10 covers the process of directory exploration, using the recursive dir() function to plumb subdirectory depths. Building upon this function is perfect for creating a file-finding utility. The goal is to scan directories and compare those files found with a matching filename supplied by the user.

The Find File utility presented in this chapter doesn’t use the same dir() function from Chapter 10. No, the recursive directory finding function requires modification to locate specific files, not all files. I’ve renamed the function find() because I know the name would infuriate the find utility.

My find() function features the same first two arguments as dir() from Chapter 10. But as shown in Listing 11.1, this updated function adds a third argument, match, to help hunt the named file. Other differences between dir() and find() are commented in the listing.
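To make the idea concrete, here is a rough sketch of a recursive search in C. It is not Listing 11.1. The function name find_sketch() is made up, it takes only two arguments (it drops the parent-path argument and builds each child path by joining strings), and the use of lstat() to avoid following symbolic links is an assumption of this sketch rather than something the chapter states.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 256
#endif

int count = 0;   /* external tally of matched files */

/* Sketch only (hypothetical name): walk dirpath and its subdirectories,
   outputting the path of every entry whose name is exactly `match`. */
void find_sketch(const char *dirpath, const char *match)
{
    DIR *dp;
    struct dirent *entry;
    struct stat fs;
    char path[PATH_MAX];
    int n;

    dp = opendir(dirpath);
    if (dp == NULL) {
        fprintf(stderr, "Unable to read directory '%s'\n", dirpath);
        return;
    }

    while ((entry = readdir(dp)) != NULL) {
        /* skip . and .. so the walk never loops back on itself */
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;

        /* build the child path; skip entries that would be truncated */
        n = snprintf(path, sizeof path, "%s/%s", dirpath, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;

        if (strcmp(entry->d_name, match) == 0) {
            printf("%s\n", path);
            count++;
        }

        /* lstat() reports on a link itself, so linked directories
           are never entered */
        if (lstat(path, &fs) == 0 && S_ISDIR(fs.st_mode))
            find_sketch(path, match);
    }

    closedir(dp);
}
```

Building full paths avoids changing the working directory during the walk, at the cost of skipping any path too long for the buffer.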

Beyond the additions noted in Listing 11.1, I use the PATH_MAX defined constant, which requires including the limits.h header file. Because not every C library implements PATH_MAX, some preprocessor directives are required:

The value of PATH_MAX differs depending on the operating system. For example, in Windows it could be 260 bytes but in my version of Ubuntu Linux it’s 1024. I’ve seen it as high as 4096 bytes, so 256 seems like a good value that won’t blow up anything. If you want to define a higher value, feel free to do so.
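A guard of this kind can be sketched as follows, using the fallback of 256 discussed above. The helper function path_buffer_size() is hypothetical and exists only so the resulting value can be checked at run time.

```c
#include <limits.h>
#include <stddef.h>

/* Not every C library defines PATH_MAX in limits.h;
   supply a fallback only when it's missing */
#ifndef PATH_MAX
#define PATH_MAX 256
#endif

/* Hypothetical helper: report the buffer size in effect */
size_t path_buffer_size(void)
{
    return (size_t)PATH_MAX;
}
```

When the library does define PATH_MAX, the #ifndef test fails and the library’s value is used untouched.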

My Find File utility also counts matched files. To keep track, I use variable count, which is defined externally. I am loath to use global variables, but in this situation having count be external is an effective way to keep track of files found. Otherwise, I could include count as a fourth argument to the find() function, but as a recursive function, maintaining its value consistently introduces all kinds of chaos.

The source code that includes the find() function is named findfile01.c; its main() function is shown in Listing 11.2. The main() function’s job is to fetch the filename from the command line, retrieve the current path, make the call to the find() function, and then report the results.
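The four jobs just described can be sketched as below. This is not Listing 11.2. The names run_search() and find_stub() are made up, the messages are invented, and find_stub() is a deliberately simple stand-in that checks only the starting directory so the sketch stays short.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 256
#endif

int count = 0;   /* external tally of matched files */

/* Stand-in for the recursive search (hypothetical): looks for
   `match` in dirpath only, without descending into subdirectories */
void find_stub(const char *dirpath, const char *match)
{
    char path[PATH_MAX];
    int n = snprintf(path, sizeof path, "%s/%s", dirpath, match);

    if (n > 0 && (size_t)n < sizeof path && access(path, F_OK) == 0) {
        printf("%s\n", path);
        count++;
    }
}

/* Hypothetical driver doing the work described for main().
   Returns 0 on success, 1 on a usage or path error. */
int run_search(int argc, char *argv[])
{
    char current[PATH_MAX];

    /* 1. fetch the filename from the command line */
    if (argc < 2) {
        fprintf(stderr, "Format: ff filename\n");
        return 1;
    }

    /* 2. retrieve the current path */
    if (getcwd(current, sizeof current) == NULL) {
        fprintf(stderr, "Unable to read the current directory\n");
        return 1;
    }

    /* 3. make the call to the search function */
    count = 0;
    printf("Searching for '%s'\n", argv[1]);
    find_stub(current, argv[1]);

    /* 4. report the results */
    printf("Found %d match%s\n", count, count == 1 ? "" : "es");
    return 0;
}
```

A real main() could consist of the single statement return run_search(argc, argv); with the stand-in replaced by the recursive function.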

Both find() and main() are included in the source code file findfile01.c, available in this book’s online repository. I’ve built the source code into the program file named ff. Here are a few sample runs:

Above, the Find File utility locates all the a.out files in my home directory tree.

In the example above, the utility doesn’t find any files named hello.

Above, the utility attempts to locate all files with the .c extension in the current directory. Rather than return them all, you see only the first match reported: finddupe01.c. The problem here is that the code doesn’t recognize wildcards; it finds only specific filenames.
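The difference between an exact comparison and a wildcard comparison can be seen in a small sketch. The POSIX fnmatch() function is one way to test a filename against a pattern; whether the rest of the chapter takes this route is not shown here, so treat the choice as an assumption. The function names match_exact() and match_wildcard() are made up for illustration.

```c
#include <fnmatch.h>
#include <string.h>

/* Exact comparison: returns 1 only when the two names are identical,
   so a pattern such as "*.c" matches nothing but a file literally
   named "*.c" */
int match_exact(const char *pattern, const char *filename)
{
    return strcmp(pattern, filename) == 0;
}

/* Wildcard comparison: fnmatch() returns 0 when filename fits the
   shell-style pattern, so this function returns 1 on a match */
int match_wildcard(const char *pattern, const char *filename)
{
    return fnmatch(pattern, filename, 0) == 0;
}
```

With these definitions, match_wildcard("*.c", "finddupe01.c") returns 1 and match_exact("*.c", "finddupe01.c") returns 0. For a program to receive the pattern itself, it must be quoted on the command line, as in ff "*.c". Otherwise the shell replaces the pattern with matching filenames before the program starts.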

To match files with wildcards, you must understand something known as the glob. Unlike The Blob, star of the eponymous 1958 horror film, knowing the glob won’t get you killed.


Edit from the blogger:

After 11.2.1 – Coding the Find File utility, the chapter continues with the following structure, which I am not going to “spoil” for you:

If you are buying any Manning book, you can use the code blvitaca22 for a 35% discount. The short URL http://mng.bz/M0w2 makes the process easier.

Enjoy it!

Dan Gookin has over 30 years of experience enlightening IT people in his informative and entertaining manner. His best-known book is "DOS For Dummies". The author delivers online training and has his own YouTube channel.
