I took @synthi’s code and made it into a class. Note that if you’re going to recurse into deeply nested directory trees, you’re better off not building a vector of filenames first, and instead returning the names one at a time as you read them.
```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <dirent.h>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

class dirRecurser {
public:
    typedef std::vector<std::string> NameList;

    // initialize recursive list of files based on name
    dirRecurser(const char *name, bool recurse = true)
    {
        const std::string _name(name);
        this->dir_content(_name, recurse);
    }

    // return number of filenames found
    size_t size() const
    {
        return this->m_Contents.size();
    }

    // array-like access to filename list
    const std::string &operator[](size_t index) const
    {
        if (index >= this->m_Contents.size()) {
            throw std::out_of_range("dirRecurser: index out of bounds");
        }
        return this->m_Contents[index];
    }

    // get a handle on the actual name vector
    const NameList &GetAllNames() const
    {
        return this->m_Contents;
    }

private:
    // the actual recurser
    void dir_content(const std::string &dir_path, bool recurse)
    {
        // assumes that dir_path is a dir. If not then zero filenames
        // go in vector
        DIR *dir = opendir(dir_path.c_str());
        if (!dir) {
            return;
        }
        struct dirent *d;
        while ((d = readdir(dir))) {
            std::string filename = d->d_name;
            if (filename != "." && filename != "..") {
                std::string filepath = dir_path + "/" + filename;
                if (!this->isDir(filepath)) {
                    this->m_Contents.push_back(filepath);
                } else if (recurse) {
                    this->dir_content(filepath, recurse);
                }
            }
        }
        closedir(dir);
    }

    // stat() follows symlinks, so a link to a directory is treated as a directory
    bool isDir(const std::string &in_name) const
    {
        struct stat statbuf;
        return stat(in_name.c_str(), &statbuf) == 0 && S_ISDIR(statbuf.st_mode);
    }

    NameList m_Contents;
};

int main(int argc, char **argv)
{
    std::string dirname(".");
    if (argc > 1)
        dirname = argv[1];
    std::cout << "Contents of " << dirname << std::endl;
    dirRecurser dir(dirname.c_str(), true);
    for (size_t i = 0; i < dir.size(); ++i) {
        std::cout << dir[i] << std::endl;
    }
    return 0;
}
```
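The streaming approach suggested at the start of this post, which reports names one at a time instead of collecting them in a vector, could look roughly like the following minimal sketch. It assumes a callback interface; `walkDir` and its parameters are hypothetical names, not part of @synthi’s code. It keeps the same stat()-based directory test as the class.

```cpp
#include <sys/stat.h>
#include <dirent.h>
#include <functional>
#include <string>

// Hypothetical helper: visit every non-directory entry under dir_path,
// handing each path to `visit` as soon as it is read instead of storing it.
void walkDir(const std::string &dir_path, bool recurse,
             const std::function<void(const std::string &)> &visit)
{
    DIR *dir = opendir(dir_path.c_str());
    if (!dir) {
        return;  // not a directory, or not readable: nothing is visited
    }
    while (dirent *d = readdir(dir)) {
        const std::string filename = d->d_name;
        if (filename == "." || filename == "..") {
            continue;
        }
        const std::string filepath = dir_path + "/" + filename;
        struct stat statbuf;
        const bool is_dir = stat(filepath.c_str(), &statbuf) == 0 &&
                            S_ISDIR(statbuf.st_mode);
        if (!is_dir) {
            visit(filepath);  // report the name immediately, no vector needed
        } else if (recurse) {
            walkDir(filepath, recurse, visit);
        }
    }
    closedir(dir);
}
```

Usage would be something like `walkDir(".", true, [](const std::string &p) { std::cout << p << '\n'; });` (with `<iostream>` included). Memory use no longer grows with the number of files, although one `DIR*` stays open per level of recursion.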
Fact is, compared to std::filesystem::recursive_directory_iterator(folderPath), they are EXTREMELY slow.
In a test with a folder containing ~70k files, it’s a few ms versus 3 to 4 seconds.
I’ll try to optimize your code a bit and let you know.
I’m interested in why recursive_directory_iterator is so fast and my example is so slow. I’m betting my code is spending most of its time adding elements to a std::vector. When a std::vector outgrows its capacity, it has to allocate a bigger internal array and move all the elements into it.
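One way to test the reallocation theory is to count how often the buffer actually moves. The following sketch uses a hypothetical helper, `countReallocations`, that records every capacity change during n push_back calls. Because std::vector grows its capacity geometrically, the number of reallocations is only logarithmic in n. In addition, std::string’s move constructor is noexcept (since C++11), so existing elements are moved rather than deep-copied when the buffer grows. Taken together, this suggests vector growth alone is unlikely to account for seconds of run time. That is a hedged estimate, not a measurement of the program in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: count how many times the vector's buffer is
// reallocated while n strings are appended with push_back.
std::size_t countReallocations(std::size_t n, bool reserve_first)
{
    std::vector<std::string> names;
    if (reserve_first) {
        names.reserve(n);  // one allocation up front, none while appending
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = names.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        names.push_back("dir/file" + std::to_string(i));
        if (names.capacity() != last_capacity) {  // buffer was replaced
            ++reallocations;
            last_capacity = names.capacity();
        }
    }
    return reallocations;
}
```

If the vector approach is kept, calling reserve() with an estimate of the final count removes the reallocations entirely. The per-file std::string allocations remain either way.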
However, I’ve come up with a fancy “algo” anyway, using dirent.h, which is as fast as the native std::filesystem::recursive_directory_iterator.
Even allowing for the cost of growing a std::vector (and a bunch of std::string copying), that speedup doesn’t seem possible.
Checking d->d_type == DT_DIR instead of calling stat is an obvious optimization, yes! Will it speed things up? Let me check!
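Here is a minimal sketch of such a d_type check, assuming a Linux or BSD-style libc. The d_type field is not required by POSIX, and some filesystems always report DT_UNKNOWN, so the sketch falls back to a system call only in that case. `entryIsDir` is a hypothetical name. The fallback uses lstat() rather than stat() so that it agrees with d_type, which reports a symlink as DT_LNK. This differs from the class’s stat()-based isDir(): symlinked directories are no longer descended into.

```cpp
#include <sys/stat.h>
#include <dirent.h>
#include <string>

// Hypothetical helper: decide whether a readdir() entry is a directory,
// trusting d_type when the filesystem fills it in (d_type is not POSIX).
bool entryIsDir(const std::string &entry_path, const dirent *d)
{
    if (d->d_type == DT_DIR) {
        return true;   // reported as a directory: no system call needed
    }
    if (d->d_type != DT_UNKNOWN) {
        return false;  // regular file, symlink, fifo, ...: no system call needed
    }
    // Type unknown on this filesystem: only now pay for an lstat() call.
    struct stat statbuf;
    return lstat(entry_path.c_str(), &statbuf) == 0 && S_ISDIR(statbuf.st_mode);
}
```

In dir_content, `this->isDir(filepath)` would become `entryIsDir(filepath, d)`.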
- With stat on every file and a large directory: 6.056 s
- With the d->d_type == DT_DIR test: 3.580 s
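I didn’t do repeated runs and average them, but that indicates roughly 41% of the run time (2.476 s of the 6.056 s) was spent in unnecessary calls to stat to test whether a name points to a directory; put the other way, the stat version took about 69% longer. It doesn’t explain this: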
> Fact is, compared to std::filesystem::recursive_directory_iterator(folderPath) they are EXTREMELY slow.
> In a test with a folder that contains ~70k files, is few ms against 3/4 seconds.
And I actually installed g++ 8.3.1 and used std::filesystem::recursive_directory_iterator to do the same thing (print every path to stdout). It was noticeably slower, taking 9.639 seconds versus 3.580 seconds. So the massive speedup is still a mystery to me.
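Here is a sketch of what such a std::filesystem comparison program might look like, printing every non-directory path. `printTree` is a hypothetical name. It uses the error_code overloads so that an unreadable directory does not throw. Two hedged observations may matter when timing these programs. First, the class’s main() uses std::endl, which flushes stdout after every line, whereas this sketch writes '\n'. Second, a directory_entry may cache the file type reported while the directory was read, so is_directory() need not call stat for every entry; whether it does is implementation-dependent. With g++ 8, std::filesystem also needs -lstdc++fs at link time.

```cpp
#include <cstddef>
#include <filesystem>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: print every non-directory path under `root`, one per
// line, and return how many were printed. Directory symlinks are not
// followed (the iterator's default), and unreadable directories are skipped.
std::size_t printTree(const fs::path &root, std::ostream &out)
{
    std::size_t count = 0;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;
    for (; !ec && it != end; it.increment(ec)) {
        std::error_code type_ec;  // separate code so one bad entry does not stop the walk
        if (!it->is_directory(type_ec)) {
            out << it->path().string() << '\n';  // '\n' avoids a flush per line
            ++count;
        }
    }
    return count;
}
```

Usage would be `printTree(argc > 1 ? argv[1] : ".", std::cout);` inside main.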