So over in (Ai image generation) Remote Setup I was looking at the optimal cost per image setup. Because I love data so much I decided to go ahead and document how I got the values in the table.

I did this by looking at this Tom’s Hardware article where they gave a benchmark of images per minute on a few different GPU’s. I manually copied the values for the 512 x 512 image generation into a CSV file.

For the vast.ai data I found that their API call will return every machine if you don’t put on any filters (don’t even need any headers!) So I download that and ended up with an 8 MB JSON file.

I currently have this fascination with DuckDB. It originally started with SQLite, but moved on to DuckDB as I prefer using it for analysis rather than actual storage. If you didn’t know, DuckDB can natively consume JSON files.

So I fired up DuckDB and ran the following:

SELECT * FROM 'vastai.json';

This yielded a table with one column and one row…

This is because everything is underneath a single offers entry:

{
	"offers": [
		{ "entry1" : ...},
		{ "entry2" : ...},
		{ "entryN" : ...},
	]
}

This can be easily solved by using UNNEST. And we need to put on recursive to get it properly unnest it all.

SELECT UNNEST(OFFERS, recursive := true) FROM 'vastai.json';

Ahh.. much better. That UNNEST feature is really powerful.

Once I had that I spent a bit of time exploring the data and comparing it to what I could see on the website. I was never able to fully correlate the price I saw on the website with the prices I saw from the API. They always seemed to be off by a few cents. But I decided close enough was near enough.

I went through and stored the data in tables next:

CREATE TABLE VAST AS 
SELECT UNNEST(OFFERS, recursive := true) 
	FROM 'vastai.json'
;
	
CREATE TABLE PERF AS 
SELECT * 
	FROM 'GpuImagePerformance.csv' -- Where I stored data from Tom's Hardware
;

Then to get the final query I ran:

SELECT Average_Cost
		, ImagesPerMinute
		, (ImagesPerMinute * 60) / Average_Cost IMAGES_PER_DOLLAR
		, GPU_NAME
		, RENTABLE
		, TOTAL
FROM (
	SELECT AVG(dph_total) "Average_Cost"
			, p.ImagesPerMinute 
			, GPU_NAME
			, COUNT(CASE WHEN RENTABLE = 1 THEN 1 END) RENTABLE
			, COUNT(*) TOTAL
		FROM VAST v
			JOIN PERF p on lower(v.GPU_NAME) = lower(p.GpuName)
		WHERE verification = 'verified'
			AND num_gpus = 1 -- I don't need multiple for stable diffusion
		GROUP BY *
		ORDER BY 1 desc
)
ORDER BY IMAGES_PER_DOLLAR DESC

I actually stripped out the rentable and total columns after all because they weren’t important and the table was getting squished. Here are the results again:

Average cost per hour ($)	Images Per Minute	Images per average cost	GPU
0.10778260869565218	26.04	14495.845098830172	RTX 3070
0.1786	41.62	13982.082866741319	RTX 3080 Ti
0.1288	29.31	13653.726708074533	RTX 2080 Ti
0.13	28.7	13246.153846153846	RTX 3070 Ti
0.20093750000000005	37.0	11048.211508553652	RTX 3080
0.27307692307692305	46.85	10293.802816901409	RTX 3090 Ti
0.2656593087016583	42.9	9689.101475795313	RTX 3090
0.15	23.71	9484.000000000002	RTX 4060 Ti
0.4755596828767127	75.13	9478.93642440802	RTX 4090
0.35000000000000003	51.55	8837.142857142857	RTX 4080
0.13384000000002	17.81	7984.160191271968	RTX 3060
0.4809400000000001	23.3	2906.807501975298	RTX 3060 Ti
0.8	31.95	2396.25	Titan RTX

Conclusion

These results are pretty basic. They don’t take into account many factors such as the rest of the machine’s performance, or even the GPU VRAM available. If I wanted to I could easily add an extra statement to the WHERE clause such as AND gpu_ram >= 12 * 1024 to get all cards where the VRAM is at last 12 GB (in case you were wondering, the 3080 Ti wins in that case).

All in all it was a fun little use of DuckDB to analyze some data quickly. While I could have used other tools to do this, I feel the data ingestion capabilities of DuckDB, coupled with its small file size and ease of use allowed me to get the information I wanted quickly. I do not know how to suggest it to someone who has no or limited SQL knowledge though.

🍀 Leaf's Garden

Explorer

Using DuckDB to Compute Optimal Cost per Image

Conclusion

Graph View

Backlinks