So far I have built the v1.1.6 tag (adjusted for -march=native) and the plugins I want. I’ve tested some closed-source plugins (Vult etc.) by copying them into my build’s plugins directory, and it all seems OK.
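For anyone wanting to try the same, roughly what I did (a sketch from memory; it assumes compile.mk still hardcodes -march=nocona, so check your checkout before running the sed):

```shell
# Build Rack v1.1.6 from source with native CPU tuning.
git clone https://github.com/VCVRack/Rack.git
cd Rack
git checkout v1.1.6
git submodule update --init --recursive
# Swap the generic nocona tuning for native tuning (file/flag assumed; verify first):
sed -i 's/-march=nocona/-march=native/' compile.mk
make dep -j"$(nproc)"
make -j"$(nproc)"
```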
During this process, a few questions came up that I couldn’t find clear answers to.
Is it normal to see nearly an 80% improvement in performance for a native Linux (AMD Phenom II) build? I expected some improvement, but that much was a shock and has me considering moving to a compile-it-yourself distro.
If I make it my main Rack and use the library to supplement the local plugin builds, how would the library handle my plugin builds?
Would I have to remove the ones I’ve built myself from my account to avoid conflicts?
Can the build use its own plugin folder for the library, so I can keep the official setup intact?
The compiler can make optimizations specific to your CPU, but it is impossible to know in advance how much they will improve performance.
As for the library: if you update a plugin there, it will delete your compiled version on your machine. You will have to download and compile the newer version each time the plugin is updated, or keep using the old one.
I’m surprised you’re seeing that much of a difference. I ran Gentoo and Funtoo for a while but I’m not sure that I could say there was a great difference in speed overall. Certainly if you want to have a lot of control it is good but I don’t think I would bother if you just want to do it for speed alone.
Once you name/rename your compiled version’s slug, e.g. SLUG_dev, it should not get overwritten. Since your compiled version will presumably always be ahead of your repo version, you could also just increment the version and not update it, if you were to run outside of dev mode.
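A minimal sketch of the rename, assuming the plugin’s manifest is plugin.json and its slug is MyPlugin (both names hypothetical):

```shell
# Give the local build its own slug so a library update can't clobber it.
cd MyPlugin
sed -i 's/"slug": "MyPlugin"/"slug": "MyPlugin_dev"/' plugin.json
make dist   # the dist zip is now built under the MyPlugin_dev slug
```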
There is the /.Rack flag which will target the plugin folder. You could `make dist` with the temp slug copied there; safest option imo, and you get to test any changes alongside the previous version.
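Something like this is what I mean, assuming Rack v1’s -u option sets the user directory (the directory and zip names are made up):

```shell
# Keep the self-built plugins in a separate user folder so the official
# ~/.Rack setup stays untouched.
mkdir -p ~/.Rack-dev/plugins-v1
cp dist/MyPlugin_dev-1.0.0-lin.zip ~/.Rack-dev/plugins-v1/   # hypothetical zip name
./Rack -u ~/.Rack-dev
```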
Yeah, I was surprised at the speed difference too. I can only assume that nocona uses some Intel instructions that my AMD CPU has to emulate, so going native avoids that.
I’ll dig deeper into distro -march flags before jumping into anything too daft. I’m sure there would be a lot of AMD-specific distros if it normally made that much of a difference.
I feel like an idiot now. How did I miss those command line options that seem to do just what I’m looking for?
as a former gentoo dev, i’m not surprised. optimizing builds for your machine and your needs can mean significant performance improvements for certain applications, especially cpu heavy ones.
but overall, the most improvement you would see from a system like gentoo is the cutting out of clutter, so your computer feels faster, because it doesn’t need to waste time on stuff you don’t need. the customizability is great.
No, that’s not normal. If you can investigate why, post an issue on GitHub.
Investigating why is a bit beyond my current skill set, unfortunately. I can get as far as running Rack through valgrind, but I don’t know enough to make use of its output.
I had another little look into the performance difference by comparing the assembly from nocona to native.
I’m still out of my depth, but I’m starting to get the idea that it’s lots of little things adding up to the big difference.
One thing that did stand out to me was the lack of PXOR in the native code; instead it’s XORPS. Researching this led me to https://www.agner.org/optimize/microarchitecture.pdf which reminded me why I prefer higher-level languages.
In this case it looks like XORPS is used because AMD runs all its logical instructions in the integer domain, so it avoids the domain-switching latency of PXOR.
That’s the kind of thing I was thinking of when I guessed there was an instruction AMD had to ‘emulate’, and in the small file I compared (engine/Port.cpp) it happened 18 times.
I doubt there is going to be one thing at the root of this, but this looks like it might be a big part of it to me.
Can you share how you measured performance?
When I first posted, it was just from the CPU info shown by pressing F3. Totals for the default demo went from 4.7 to exactly 1.
I’ve since double-checked with conky: on the same patch, CPU use went from ~83% to about 17%, if I recall correctly. It was under a quarter of the CPU use for certain.
As you might imagine, I can tell the difference in use and can do far more before getting under runs.
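For the record, a scriptable cross-check as well (assuming the process is simply named Rack; note that ps reports the average since the process started, not an instant reading like conky):

```shell
# Average CPU use of the Rack process since it started, as a percentage.
pid=$(pgrep -x Rack | head -n 1)
ps -o %cpu= -p "$pid"
```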
I have Intel CPUs, which means the compilation is near as good as it’s going to get, but I do wonder about LLVM polyhedral optimization; just not enough to build my own LLVM toolchain.
You’re not tempting me, I’ve enough to wonder about already… though I did have a quick glance…
In the past I’ve only really compiled things that weren’t already in my distro, so this is all new to me. My machine is generally fast enough even though it’s years old; I’m used to its limits, and it’s usually me keeping it waiting rather than the other way around…
```
domino@desktop1 ~/src/Rack $ systemd-analyze
Startup finished in 2.060s (kernel) + 1.663s (initrd) + 1.812s (userspace) = 5.536s
graphical.target reached after 1.424s in userspace
```