The most limiting factor will be the VRAM. But even the 1660 should render something with small resolutions, this was also written in the excellent blog posts about the text-to-image scripts: Text-to-Image Summary – Part 1 | Softology's Blog
Sorry for being imprecise, CUDA cores is not correct, RT and Tensor cores is what I meant…
According to Softologys blog you may be able to use a non-RTX card, but at the reduction in settings that would be required seem to make it mostly not worth the effort
My assertion that an RTX was required was based directly on the machine learning mode pre requisite installation instructions:
An NVIDIA 3090 GPU with 24GB VRAM is highly recommended and will get the best performance and highest resolution outputs.
An NVIDIA 2080 GPU with 8GB VRAM would be the bare minimum hardware spec and will only give you small resolution images.
If you have an older GPU or one with less than 8GB VRAM, do not bother. You will only be disappointed and complain “it doesn’t work”.
The second image is really nice! What Script are you using? I am half-through, but I have not found something, that comes even close to your creations.
Currently I am using the disco diffusion v5 script
for the above images i used:
a custom size of 640x360 (this gets resized by VoC to an acceptable ratio)
7500 guidance scale
250 range scale
CLIP denoised turn on
Secondary model turned on (turning this off seems to double the rendering time)
350 iterations
Super resolution output using ruDALL -E Real-ESRGAN x2
But really that is all to do with image quality, in terms of what you get it is all about text prompt trial and error, and using some sneaky tricks
I slightly change the text prompt each time, adding a word, changing a word, or removing a word depending on the outcome.
For the final image I ended up with the text prompt: Electronic piano keyboards and Modular synthesizers with dials arcane wires instruments, GI render macro ZBrush CGSociety
The GI render is a trick to make the image look computer generated (check out global illumination to see what that looks like)
I introduced the macro part after the first 5 to get rid of the black and white style and introduce a bit of depth of field
And I think the first prompt had the word scaffolding included, but i removed that at some point
I love all of these!
Especially with the depth of field turned on it looks like you got very close to this weird machine and snapped a picture with a DSLR.
I can’t really hope to run this stuff on my own, but I got an invite to Midjourney yesterday and I have hardly been doing anything else since then Crazy stuff.
Too lazy to read and check if it was suggested already, but it would be really cool to create an AI that works with VCV. Maybe force it replicate the sounds with a set of VCOs. Haha, i hope there is someone crazy enough to do it