So over in (AI image generation) Remote Setup I was looking at the optimal cost-per-image setup. Because I love data so much, I decided to document how I got the values in the table.

I did this by looking at a Tom’s Hardware article that benchmarked images per minute on a range of GPUs. I manually copied the values for the 512 x 512 image generation into a CSV file.
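For reference, the CSV ended up being just two columns, something like this (the exact layout is a sketch, but the column names match what the queries below expect):

GpuName,ImagesPerMinute
RTX 3070,26.04
RTX 3080 Ti,41.62
RTX 4090,75.13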

For the vast.ai data, I found that their API will return every machine if you don’t apply any filters (you don’t even need any headers!). So I downloaded that and ended up with an 8 MB JSON file.
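(As an aside, DuckDB could have skipped the manual download entirely: with the httpfs extension loaded it can read JSON straight off a URL. A rough sketch follows; the endpoint here is from memory, so treat the URL as an assumption.)

INSTALL httpfs;
LOAD httpfs;
-- Query the API response in place of the local file (endpoint may have changed).
SELECT * FROM read_json_auto('https://console.vast.ai/api/v0/bundles/');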

I currently have this fascination with DuckDB. It started with SQLite, but I moved on to DuckDB as I prefer it for analysis rather than actual storage. If you didn’t know, DuckDB can natively consume JSON files.

So I fired up DuckDB and ran the following:

SELECT * FROM 'vastai.json';

This yielded a table with one column and one row…

This is because everything is underneath a single offers entry:

{
	"offers": [
		{ "entry1": ... },
		{ "entry2": ... },
		{ "entryN": ... }
	]
}

This is easily solved with UNNEST. We also need the recursive flag: a plain UNNEST(offers) would give one struct-typed column with a row per offer, while recursive := true expands the struct fields out into proper columns.

SELECT UNNEST(OFFERS, recursive := true) FROM 'vastai.json';

Ahh.. much better. That UNNEST feature is really powerful.

Once I had that I spent a bit of time exploring the data and comparing it to what I could see on the website. I was never able to fully correlate the prices on the website with the prices from the API; they always seemed to be off by a few cents. But I decided close enough was near enough.
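For the curious, the poking around was roughly along these lines (a sketch, nothing rigorous):

-- What columns did the unnest actually produce?
DESCRIBE SELECT UNNEST(offers, recursive := true) FROM 'vastai.json';

-- The cheapest offers, to eyeball against the website listings
SELECT gpu_name, dph_total, num_gpus
FROM (SELECT UNNEST(offers, recursive := true) FROM 'vastai.json')
ORDER BY dph_total
LIMIT 10;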

I went through and stored the data in tables next:

CREATE TABLE VAST AS 
SELECT UNNEST(OFFERS, recursive := true) 
	FROM 'vastai.json'
;
	
CREATE TABLE PERF AS 
SELECT * 
	FROM 'GpuImagePerformance.csv' -- Where I stored data from Tom's Hardware
;

Then, to produce the final table, I ran:

SELECT Average_Cost
		, ImagesPerMinute
		-- images/minute * 60 = images/hour; divide by $/hour to get images per dollar
		, (ImagesPerMinute * 60) / Average_Cost IMAGES_PER_DOLLAR
		, GPU_NAME
		, RENTABLE
		, TOTAL
FROM (
	SELECT AVG(dph_total) "Average_Cost" -- dph_total is vast.ai's total dollars-per-hour for an offer
			, p.ImagesPerMinute 
			, GPU_NAME
			, COUNT(CASE WHEN RENTABLE = 1 THEN 1 END) RENTABLE
			, COUNT(*) TOTAL
		FROM VAST v
			JOIN PERF p on lower(v.GPU_NAME) = lower(p.GpuName)
		WHERE verification = 'verified'
			AND num_gpus = 1 -- I don't need multiple GPUs for Stable Diffusion
		GROUP BY * -- DuckDB shorthand: group by every non-aggregated column
		ORDER BY 1 desc
)
ORDER BY IMAGES_PER_DOLLAR DESC
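To sanity-check the arithmetic on one row: the RTX 3070 does 26.04 images per minute, which is 26.04 * 60 = 1562.4 images per hour; at an average of about $0.1078 per hour, that works out to roughly 14,496 images per dollar, matching the first row of the table below.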

I actually stripped the rentable and total columns out in the end because they weren’t important and the table was getting squished. Here are the results again:

| Average cost per hour ($) | Images Per Minute | Images per dollar | GPU |
| --- | --- | --- | --- |
| 0.10778260869565218 | 26.04 | 14495.845098830172 | RTX 3070 |
| 0.1786 | 41.62 | 13982.082866741319 | RTX 3080 Ti |
| 0.1288 | 29.31 | 13653.726708074533 | RTX 2080 Ti |
| 0.13 | 28.7 | 13246.153846153846 | RTX 3070 Ti |
| 0.20093750000000005 | 37.0 | 11048.211508553652 | RTX 3080 |
| 0.27307692307692305 | 46.85 | 10293.802816901409 | RTX 3090 Ti |
| 0.2656593087016583 | 42.9 | 9689.101475795313 | RTX 3090 |
| 0.15 | 23.71 | 9484.000000000002 | RTX 4060 Ti |
| 0.4755596828767127 | 75.13 | 9478.93642440802 | RTX 4090 |
| 0.35000000000000003 | 51.55 | 8837.142857142857 | RTX 4080 |
| 0.13384000000002 | 17.81 | 7984.160191271968 | RTX 3060 |
| 0.4809400000000001 | 23.3 | 2906.807501975298 | RTX 3060 Ti |
| 0.8 | 31.95 | 2396.25 | Titan RTX |

Conclusion

These results are pretty basic. They don’t take into account many factors, such as the rest of the machine’s performance, or even the GPU VRAM available. If I wanted to, I could easily add an extra condition to the WHERE clause, such as AND gpu_ram >= 12 * 1024, to get all cards with at least 12 GB of VRAM (in case you were wondering, the 3080 Ti wins in that case).
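For concreteness, the filter would slot into the earlier query like so (gpu_ram looks to be reported in MB, hence the * 1024):

WHERE verification = 'verified'
	AND num_gpus = 1
	AND gpu_ram >= 12 * 1024 -- at least 12 GB of VRAM, assuming gpu_ram is in MB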

All in all, it was a fun little use of DuckDB to analyze some data quickly. While I could have used other tools, I feel DuckDB's data ingestion capabilities, coupled with its small binary size and ease of use, let me get the information I wanted quickly. I'm not sure I'd suggest it to someone with little or no SQL knowledge, though.