PUB /

AIPro

Agents or similar

AI Distributed mining

Like GPU mining, but offering node hardware for rental instead.


AI Server build


2.3 - 2.5 watts per RDIMM for 2133 MHz modules
3.0 - 3.4 watts per RDIMM for 2666 MHz to 3600 MHz DDR4 modules

$20 - 32GB sticks, 2666V
$12 - 16GB sticks, 2666V, from r/homelabsales
https://www.reddit.com/r/homelabsales 

$78 - 32GB 3200 MT/s sticks from https://memory.net/store/page/3/?filter_technology=ddr4-3200
$44 - 16GB 3200 MT/s sticks from https://memory.net/store/page/3/?filter_technology=ddr4-3200
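Working the listings above out to dollars per gigabyte makes the comparison easier (prices and sizes as quoted; the labels are mine):

```python
# $/GB for the RAM listings above (prices and sizes from the notes)
listings = [
    ("32GB 2666V, r/homelabsales", 20, 32),
    ("16GB 2666V, r/homelabsales", 12, 16),
    ("32GB 3200 MT/s, memory.net", 78, 32),
    ("16GB 3200 MT/s, memory.net", 44, 16),
]
for name, price_usd, size_gb in listings:
    print(f"{name}: ${price_usd / size_gb:.2f}/GB")
```

At those prices the used 2666 sticks land well under $1/GB, while the 3200 modules run roughly 3-4x more per gigabyte.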


https://en.wikichip.org/wiki/amd/cores/milan

Super AI server Build:

8x 32GB RDIMM will take around 60 - 75 watts
8x 32GB LRDIMM will take 110 - 120 watts
16x 32GB RDIMM will take around 120 - 150 watts


$744 - Supermicro MBD-H12SSL-CT-O, Socket SP3 - 5x PCIe Gen4 x16 and 2x PCIe Gen4 x8 - 8x DDR4-3200 - dual 10G
			pair with: AMD EPYC 7663 (Milan) 56C/112T up to 3.5 GHz for $640
			or pair with 7773 up to 3.5 GHz, 64C/128T, for $795
			or pair with 73F3 up to 4.0 GHz, 16C/32T, for $1188
			or pair with 74F3 up to 4.0 GHz, 24C/48T, for $1345
			or pair with 75F3 up to 4.0 GHz, 32C/64T, for $1620

$1200 - ASRock Rack ROME2D32GM-2T (32 DIMM slots! Still only 8 memory channels per CPU; more expensive unless 32x 32GB sticks come out cheaper than 16x 32GB sticks)
$699 - AsRock Rack ROME2D16-2T EEB Server Motherboard Dual Socket AMD SP3 https://www.neweggbusiness.com/product/product.aspx?item=9B13-140-060

$558 - single-socket MZ32-AR0, 16 DIMMs (still only 8 channels), with 4x PCIe Gen4 x16 and 1x x8 plus 1x Gen3 x16 and 1x x8 expansion slots; rev 3.0 supports EPYC 7003
$708 - eBay dual-socket H12DSi-N6, 16 DIMMs, 16 channels
$350 - eBay (tugm) H11SSi - 3x PCIe 3.0 x16 / 3x PCIe 3.0 x8
$??? - H12SSL-i - this H12 supports PCIe 4.0, unlike the H11SSi; pair with a 7742

$1350 - full system for a 5+ GPU AI server: Tyan S8030GM4NE-2T & AMD EPYC 7742 64-core & 256GB RAM, X550 10GbE, + heatsink

$3000 - eBay full system, 2x 24C/48T (48C/96T total), 512GB DDR4, 16 channels

Or a better PCIe slot layout here: ASRock Rack EPYCD8

$750 - Gigabyte MZ32-AR0: 16 DIMM slots, single socket, 7002/7003
$1000 - Gigabyte MZ72-HB2: 16 DIMM slots, dual socket, 7002/7003

https://www.ebay.com/str/tugm4470

$305 - AMD EPYC 7K62 48C/96T from tugm
$795 - AMD EPYC 7H12 64C/128T, top of the line
$365 - 8x 64GB LRDIMM DDR4 (512GB RAM)
$390 - Gigabyte MZ31-AR0


https://www.ebay.com/itm/176562276096?_skw=ssl&itmmeta=01JPSDC8CR55PATVCGEGD5FAZ0&hash=item291bee8700:g:F2oAAOSwglBm2spW

double g1/4 connector https://modmymods.com/alphacool-es-double-nipple-plug-in-g1-4-ag-to-g1-4-ag-deep-black-13757.html


https://www.reddit.com/r/LocalLLaMA/comments/1lnin1x/ai_coding_agentswhat_am_i_doing_wrong

IndianaNetworkAdmin:

Instead of doing direct integration, I give detailed instructions for what I need: have a model develop pseudocode first,
then feed that in and have it build individual functions. When using smaller models, it's better to make things as modular as possible.

"I need logic to accomplish the following: Accept two variables, add them together, and return the value.
The function should work for any numeric value, float or otherwise. The function should throw an error if a non-numeric value is provided.
The returned value should be a float."

Take that, and then:
"I need the following pseudocode rendered in Python 3.11 using only native Python capabilities: <Logic from prior response>"
I've done this when I've been too lazy to reinvent the wheel. The pseudocode pass also gives you a chance to review the logic before it's turned to code.
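As a sketch, the add-two-values spec quoted above, rendered in plain Python, might come back looking something like this (the function name is my choice; ints and floats only for this sketch):

```python
def add_values(a, b):
    """Add two numeric values and return the result as a float.

    Raises TypeError for non-numeric input. bool is excluded explicitly
    because bool is a subclass of int in Python.
    """
    for value in (a, b):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"non-numeric value provided: {value!r}")
    return float(a + b)
```

The value of the pseudocode pass is that edge cases like the bool/int subclass quirk surface as explicit decisions before any code exists.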

You can also do a second pass of pseudocode -
"Evaluate the following pseudocode, and determine if there is a more optimal approach with the assumption it will be rendered with Python 3.11:"

This lets it determine whether there is a better or faster way of doing things, for example if the sorting method described could be more easily
accomplished with another method.
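A hypothetical before/after of the kind of change that evaluation pass catches: pseudocode spelling out a hand-rolled selection sort, which the model would likely suggest replacing with the built-in:

```python
def selection_sort(items):
    """What the original pseudocode might describe: an O(n^2) selection sort."""
    result = list(items)
    for i in range(len(result)):
        # Find the smallest remaining element and swap it into place
        lowest = min(range(i, len(result)), key=result.__getitem__)
        result[i], result[lowest] = result[lowest], result[i]
    return result

def optimized_sort(items):
    """What the second pass would suggest: the built-in Timsort, O(n log n)."""
    return sorted(items)
```

Both return the same result; catching the swap at the pseudocode stage is cheaper than refactoring generated code later.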

I don't use a local model at the moment, I'm waiting to see how things look in another six months before adding a dedicated LLM microtower to my cluster.
But the above process does phenomenally well on Gemini at the moment. And by doing pseudocode first and limiting to individual components, you can work
with smaller context limits.