Which method do you use to recursively read files from a folder?

Generally, here’s how I read all files from a folder in my C++ application (cross-platform):

std::vector<std::filesystem::directory_entry> paths;

for (const auto &entry : std::filesystem::recursive_directory_iterator(folderPath)) {
	paths.push_back(entry);
}
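For comparison, here's a slightly more defensive sketch of that same C++17 loop. It assumes C++17 is available (which is exactly what Rack may not allow), and `collect_entries` is just a hypothetical helper name. It uses the `std::error_code` overloads so an unreadable path ends the walk instead of throwing, and `skip_permission_denied` so folders you can't open are listed but not entered.

```cpp
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collects every entry (files and folders) below
// folderPath. It uses the non-throwing std::error_code overloads, so an
// error ends the walk and returns what was collected so far.
std::vector<fs::directory_entry> collect_entries(const fs::path &folderPath)
{
    std::vector<fs::directory_entry> paths;
    std::error_code ec;
    // skip_permission_denied: subfolders we can't open are listed but not entered.
    fs::recursive_directory_iterator it(
        folderPath, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end{};
    while (!ec && it != end) {
        paths.push_back(*it);
        it.increment(ec); // non-throwing increment
    }
    return paths;
}
```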

The problem is that this requires C++17, which it seems @Vortico won’t support.

How would you do this?
The Boost alternative seems like a pain…

#include <dirent.h>
#include <string>
#include <vector>

// Lists the entries (files and folders) directly inside dir_path.
std::vector<std::string> dir_content(const std::string &dir_path)
{
    std::vector<std::string> dir_content;
    DIR *dir = opendir(dir_path.c_str());

    if(dir)
    {
        struct dirent *d;
        while((d = readdir(dir)))
        {
            std::string filename = d->d_name;

            // skip the "current" and "parent" pseudo-entries
            if(filename != "." && filename != "..")
            {
                std::string filepath = dir_path + "/" + filename;
                dir_content.push_back(filepath);
            }
        }
        closedir(dir);
    }

    return dir_content;
}

Hi @synthi,

thanks for the reply.
dirent.h, right?

Is it cross-platform?
Do you use it in Rack? Does it work fine?

Yes, dirent.h. It works fine on all the platforms.

Of course, you’ll want to check the type of each path (it could be an alias, for example).

And if you want it to be recursive, pass the vector by reference and call the function recursively when the type is DT_DIR.
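A rough sketch of that suggestion, assuming a POSIX system whose dirent.h provides d_type (glibc and macOS do; d_type isn't part of POSIX itself, and some Windows ports of dirent.h may not have it). `dir_content_recursive` is a hypothetical name. Like the function above it lists both files and folders; it descends on DT_DIR, falls back to lstat() when the filesystem reports DT_UNKNOWN, and lists symlinks (DT_LNK) without following them.

```cpp
#include <dirent.h>
#include <string>
#include <sys/stat.h>
#include <vector>

// Hypothetical recursive version of dir_content(): appends every entry
// (files and folders) below dir_path to `out`, which is passed by reference
// so a single vector is shared by all levels of the recursion.
void dir_content_recursive(const std::string &dir_path,
                           std::vector<std::string> &out)
{
    DIR *dir = opendir(dir_path.c_str());
    if (!dir)
        return;
    while (dirent *d = readdir(dir)) {
        const std::string filename = d->d_name;
        if (filename == "." || filename == "..")
            continue;
        const std::string filepath = dir_path + "/" + filename;
        out.push_back(filepath);

        // d_type costs no extra system call. Some filesystems report
        // DT_UNKNOWN, so fall back to lstat() there. Symlinks (DT_LNK)
        // are listed but not followed.
        bool is_dir = d->d_type == DT_DIR;
        if (d->d_type == DT_UNKNOWN) {
            struct stat sb;
            is_dir = lstat(filepath.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode);
        }
        if (is_dir)
            dir_content_recursive(filepath, out);
    }
    closedir(dir);
}
```

Usage: start from an empty vector, e.g. `std::vector<std::string> all; dir_content_recursive("samples", all);` (where "samples" is just a placeholder folder name).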


I took @synthi’s code and turned it into a class. Note that if you’re going to recurse into deeply nested directory trees, you’re better off not building a vector of filenames first, and instead returning the names one at a time as you read them.

#include <sys/stat.h>
#include <sys/types.h>
#include <dirent.h>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

class dirRecurser {
public:
    typedef std::vector<std::string> NameList;
    // initialize recursive list of files based on name
    dirRecurser(const char *name, bool recurse = true)
    {
        const std::string _name(name);
        this->dir_content(_name, recurse);
    }
    // return number of filenames found
    std::size_t size() const
    {
        return this->m_Contents.size();
    }
    // array-like access to filename list
    const std::string &operator[](std::size_t index) const
    {
        if (index >= this->m_Contents.size()) {
            throw std::out_of_range("dirRecurser: index out of bounds");
        }
        return this->m_Contents[index];
    }
    // get a handle on the actual name vector
    const NameList &GetAllNames() const
    {
        return this->m_Contents;
    }
private:
    // the actual recurser
    void dir_content(const std::string &dir_path, bool recurse)
    {
        // assumes that dir_path is a dir. If not, then zero filenames
        // go in the vector
        DIR *dir = opendir(dir_path.c_str());
        if (!dir) {
            return;
        }
        struct dirent *d;
        while ((d = readdir(dir))) {
            std::string filename = d->d_name;

            if (filename != "." && filename != "..") {
                std::string filepath = dir_path + "/" + filename;
                if (!this->isDir(filepath)) {
                    this->m_Contents.push_back(filepath);
                } else if (recurse) {
                    this->dir_content(filepath, recurse);
                }
            }
        }
        closedir(dir);
    }
    // one stat() call per entry; stat() follows symlinks
    bool isDir(const std::string &in_name)
    {
        struct stat statbuf;
        return stat(in_name.c_str(), &statbuf) == 0 &&
            (statbuf.st_mode & S_IFMT) == S_IFDIR;
    }
    NameList m_Contents;
};

int main(int argc, char **argv)
{
    std::string dirname(".");
    if (argc > 1)
        dirname = argv[1];
    std::cout << "Contents of " << dirname << std::endl;
    dirRecurser dir(dirname.c_str(), true);

    for (std::size_t i = 0; i < dir.size(); ++i) {
        std::cout << dir[i] << std::endl;
    }
    return 0;
}
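Here's a rough sketch of the one-name-at-a-time alternative mentioned above: each file path is handed to a callback as soon as readdir() returns it, so nothing accumulates in a vector. `walk_files` and its callback signature are hypothetical, POSIX is assumed, and it swaps isDir()'s stat() for lstat() so that a symlink to a folder is reported rather than followed (which also avoids looping forever on link cycles).

```cpp
#include <dirent.h>
#include <functional>
#include <string>
#include <sys/stat.h>

// Hypothetical streaming walker: calls visit(path) for every non-folder entry
// below dir_path as soon as readdir() returns it, without storing anything.
// If visit returns false, the walk stops early and walk_files returns false.
bool walk_files(const std::string &dir_path,
                const std::function<bool(const std::string &)> &visit)
{
    DIR *dir = opendir(dir_path.c_str());
    if (!dir)
        return true; // unreadable or not a folder: nothing to report
    bool keep_going = true;
    while (keep_going) {
        dirent *d = readdir(dir);
        if (!d)
            break;
        const std::string filename = d->d_name;
        if (filename == "." || filename == "..")
            continue;
        const std::string filepath = dir_path + "/" + filename;
        // lstat() rather than stat(): a symlink to a folder is reported as a
        // name instead of being followed.
        struct stat sb;
        const bool is_dir =
            lstat(filepath.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode);
        keep_going = is_dir ? walk_files(filepath, visit) : visit(filepath);
    }
    closedir(dir);
    return keep_going;
}
```

Returning false from the callback stops the walk early, which is handy when you only need the first match.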


I’ve added system::getEntriesRecursive() to the Rack API. Of course, it won’t be available until the next release.


Thanks all for the replies.

The fact is, compared to std::filesystem::recursive_directory_iterator(folderPath), they are EXTREMELY slow.
In a test with a folder that contains ~70k files, it’s a few ms versus 3 to 4 seconds.

I’ll try to optimize the code you gave me a bit, and let you know :wink:

Do you have 70k files in one dir? Are you Google?
:rofl:


I’m using recursive directory scans, Antonio… lots of sample/pack libraries :stuck_out_tongue:

However, I’ve come up with a fancy “algo” anyway, using dirent.h, which is as fast as the native std::filesystem::recursive_directory_iterator.

I’ll clean up the code and do some experiments, and let you know :wink:


I’m interested in why recursive_directory_iterator is so fast and my example is so slow. I’m betting my code is spending most of its time adding elements to a std::vector: if you overflow the internal array in a std::vector, it has to reallocate it and copy all the elements to the new array.
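The reallocation cost is easy to measure on its own. std::vector grows its capacity geometrically (the growth factor is implementation-defined; common choices are 1.5 and 2), so push_back is amortized constant time and the number of reallocations grows only logarithmically with the element count. Also, since C++11 the strings are moved rather than copied on reallocation, because std::string's move constructor is noexcept. The sketch below counts capacity changes for 70,000 path-like strings; `count_reallocations` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: push n path-like strings and count how many times the
// vector had to reallocate (i.e. how often its capacity changed).
std::size_t count_reallocations(std::size_t n)
{
    std::vector<std::string> names;
    std::size_t reallocations = 0;
    std::size_t capacity = names.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        names.push_back("samples/pack/file_" + std::to_string(i) + ".wav");
        if (names.capacity() != capacity) {
            capacity = names.capacity();
            ++reallocations;
        }
    }
    return reallocations;
}
```

With a doubling policy starting from capacity 1, 70,000 elements need only 18 reallocations (1, 2, 4, …, 131072), so the vector itself is unlikely to account for seconds of runtime.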

However, I’ve come up with a fancy “algo” anyway, using dirent.h, which is as fast as the native std::filesystem::recursive_directory_iterator.

Even given the issue of extending a std::vector (and a bunch of std::string copying), that speedup doesn’t seem possible.

Maybe it’s that isDir call? Did you try simply using:

if (d->d_type == DT_DIR) ?
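A sketch of that suggestion next to the original stat()-based test, so the two can be timed against each other on the same tree (for example with the shell's time command). `count_dirs` and `is_dir_stat` are hypothetical names, POSIX is assumed, and with `use_d_type` set it only calls stat() when the filesystem reports DT_UNKNOWN. One caveat: stat() follows symlinks while d_type reports them as DT_LNK, so the two modes can give different counts on trees containing links to folders, and the stat() mode can loop on a link cycle, just as the dirRecurser version can.

```cpp
#include <cstddef>
#include <dirent.h>
#include <string>
#include <sys/stat.h>

// The directory test used by dirRecurser::isDir(): one stat() call per entry.
static bool is_dir_stat(const std::string &path)
{
    struct stat sb;
    return stat(path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode);
}

// Hypothetical: counts folders below dir_path. With use_d_type set it trusts
// d_type and falls back to stat() only when the type is DT_UNKNOWN.
std::size_t count_dirs(const std::string &dir_path, bool use_d_type)
{
    DIR *dir = opendir(dir_path.c_str());
    if (!dir)
        return 0;
    std::size_t n = 0;
    while (dirent *d = readdir(dir)) {
        const std::string name = d->d_name;
        if (name == "." || name == "..")
            continue;
        const std::string path = dir_path + "/" + name;
        bool is_dir;
        if (use_d_type && d->d_type != DT_UNKNOWN)
            is_dir = (d->d_type == DT_DIR); // no extra system call
        else
            is_dir = is_dir_stat(path);     // extra stat() per entry
        if (is_dir)
            n += 1 + count_dirs(path, use_d_type);
    }
    closedir(dir);
    return n;
}
```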

That is an obvious optimization, yes! Will that speed it up? Let me check!

With stat on every file and a large directory: 6.056 s
With the d->d_type == DT_DIR test: 3.580 s

I didn’t do repeated runs and average them, but that indicates about 41% of the time (2.476 s out of 6.056 s) was spent in unnecessary calls to stat to test whether a name points to a directory; put the other way, the stat version took 69% longer. It doesn’t explain this:

The fact is, compared to std::filesystem::recursive_directory_iterator(folderPath), they are EXTREMELY slow. In a test with a folder that contains ~70k files, it’s a few ms versus 3 to 4 seconds.

And actually, I installed g++ 8.3.1 and used std::filesystem::recursive_directory_iterator to do the same thing (print every path on stdout), and it was noticeably slower, taking 9.639 seconds versus 3.58 seconds. So the massive speedup is still a mystery to me.
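One thing worth ruling out in these comparisons is output: printing every path to stdout can easily cost more than the traversal itself. Here's a sketch (C++17 plus POSIX; `count_with_dirent`, `count_with_filesystem` and `time_ms` are hypothetical names) that times only the walk and counts entries both ways, so the counts can be checked against each other too. Neither version follows directory symlinks, so on the same tree they should agree.

```cpp
#include <chrono>
#include <cstddef>
#include <dirent.h>
#include <filesystem>
#include <string>
#include <sys/stat.h>
#include <system_error>

// Counts entries below dir_path with readdir(), descending via d_type and
// falling back to lstat() on DT_UNKNOWN. Symlinks are counted, not followed.
std::size_t count_with_dirent(const std::string &dir_path)
{
    DIR *dir = opendir(dir_path.c_str());
    if (!dir)
        return 0;
    std::size_t n = 0;
    while (dirent *d = readdir(dir)) {
        const std::string name = d->d_name;
        if (name == "." || name == "..")
            continue;
        ++n;
        const std::string path = dir_path + "/" + name;
        bool is_dir = d->d_type == DT_DIR;
        if (d->d_type == DT_UNKNOWN) {
            struct stat sb;
            is_dir = lstat(path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode);
        }
        if (is_dir)
            n += count_with_dirent(path);
    }
    closedir(dir);
    return n;
}

// Counts entries with std::filesystem; by default it doesn't follow
// directory symlinks either.
std::size_t count_with_filesystem(const std::string &dir_path)
{
    namespace fs = std::filesystem;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        dir_path, fs::directory_options::skip_permission_denied, ec);
    std::size_t n = 0;
    for (const fs::recursive_directory_iterator end{}; !ec && it != end;
         it.increment(ec))
        ++n;
    return n;
}

// Runs walk(dir_path) once, stores its result in count, returns elapsed ms.
template <typename Walk>
double time_ms(Walk walk, const std::string &dir_path, std::size_t &count)
{
    const auto start = std::chrono::steady_clock::now();
    count = walk(dir_path);
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Repeating each measurement a few times and ignoring the first run also helps, since the first traversal of a tree typically pays for filling the OS's directory cache.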