Are you considering purchasing a system with non-upgradable HBM memory like CPU chips?

Pages (2): Previous 1 2
Fijiboys777 (Member) | 10-16-2023, 04:17 PM | #11
I'd like to see processors with HBM close to the die that still include at least one DDR4 or DDR5 memory controller, so you could add slower, expandable memory alongside the HBM while keeping HBM's wide-bus advantage. At 1024 bits per stack, even one or two stacks would suit wide-register workloads like AVX-512; code could be optimized around 512- to 2048-bit accesses, filling cache lines in multiples of those widths. Nor would v-cache make this redundant: the extra 64 MB there is SRAM, a cache layer far faster than any DRAM, serving the tightly integrated CPU components that need ultra-low latency.
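A rough sketch of why that wide bus matters; the transfer rates here are illustrative assumptions, not figures from this thread:

```python
# Rough peak-bandwidth comparison: one HBM stack vs. dual-channel DDR5.
# Transfer rates are assumed for illustration (HBM2e-class and DDR5-5600).

def peak_gbs(bus_bits: int, transfer_rate_mts: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and transfer rate (MT/s)."""
    return bus_bits / 8 * transfer_rate_mts * 1e6 / 1e9

hbm_stack = peak_gbs(1024, 3200)  # one 1024-bit stack at 3.2 GT/s -> ~410 GB/s
ddr5_dual = peak_gbs(128, 5600)   # two 64-bit channels of DDR5-5600 -> ~90 GB/s

print(f"one HBM stack:         {hbm_stack:.1f} GB/s")
print(f"dual-channel DDR5-5600: {ddr5_dual:.1f} GB/s")
```

Even a single stack at a modest clock beats a dual-channel DIMM setup by roughly 4-5x on paper, which is the whole appeal of pairing it with slower expandable DDR.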

mj18wals (Senior Member) | 10-18-2023, 11:20 AM | #12
I've often pondered two possibilities. One is a console-like design with a shared high-speed memory pool feeding both CPU and GPU, essentially a powerful APU that could rival mid-range discrete GPUs, well beyond what current APUs offer. The other is a massive L4 cache, in the gigabytes, which wouldn't substitute for system RAM. To me that suits exactly the situations where AMD's 3D V-Cache is still far too small. Latency would be higher than L3, but the point is avoiding RAM accesses and keeping data within the socket, with a consistent view across multiple CCDs. AMD would need a major shift to make this work, though: Infinity Fabric's bandwidth wouldn't be enough. I believe they'd have to switch to a more direct chip-to-chip approach, much like Apple's Ultra chips, or the way NV links the largest Blackwell parts.

DerVerdelger (Junior Member) | 10-21-2023, 07:43 PM | #13
Absolutely, bandwidth in consumer systems is quite poor right now; feeding 16+ cores from a dual-channel setup is really inefficient. A large pool of HBM, or standard DRAM on a wider bus, would be much better, and 256 to 512 GB would be practical at a fair price. Upgradeability isn't my priority, since I routinely hit the limits of what's available anyway. I'm currently stuck at 128 GB on my AM4 configuration and would love to move to DDR5 once 64 GB DIMMs become common, so I could double my capacity. A compact mini workstation like that would make me very happy. This is essentially the Xeon Max series, by the way.
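To put numbers on how thin dual-channel bandwidth gets when spread across many cores, a quick sketch (DDR5-6000 is assumed here purely for illustration):

```python
# Per-core share of a dual-channel DDR5 memory subsystem.
# DDR5-6000 assumed; real sustained bandwidth is lower than this peak figure.
channels, bus_bits, mts = 2, 64, 6000
total_gbs = channels * bus_bits / 8 * mts * 1e6 / 1e9  # peak GB/s

for cores in (8, 16, 32):
    print(f"{cores:2d} cores: {total_gbs / cores:.1f} GB/s per core")
```

At 16 cores each core gets only a few GB/s of peak bandwidth, which is why bandwidth-bound workloads stop scaling long before the core count runs out.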

Sampo05 (Junior Member) | 10-29-2023, 05:53 PM | #14
Intel attempted something like this with Optane, but over PCIe on consumer hardware. I suspect we'll see more DRAM soldered directly onto motherboards once speeds make it necessary, much like laptops already do; that puts us a step closer to dropping the socket entirely and just bolting on a strong air cooler. In case you haven't seen it yet, AMD's Strix Point APU, a Zen 5 design, is being discussed in leaks. AMD has aimed to push APUs further, with more I/O die area and possibly graphics strong enough to displace a dedicated GPU, but they probably wouldn't have considered such a design change until dGPU sales dropped enough to affect their business.

s3tBR (Member) | 11-06-2023, 07:07 AM | #15
I've sort of wondered about the purpose of 3D V-Cache, since they could otherwise just make the die larger to add that cache. My assumption has been that a larger die requires a larger substrate to make all the connections to the motherboard, so stacking the dies is worth the effective clock-speed reduction from adding a roughly +25 °C insulator on top of the CPU cores. Or it's simply a cost calculation around yield rates. It's not as if there isn't physically more space on AM5's substrate to accommodate a few extra millimeters of silicon. Ryzen CCDs are already quite small compared to Intel's monolithic dies, but I imagine there's a lot of math involved in estimating failure rates and impurities relative to die size.

Arkunir (Junior Member) | 11-10-2023, 01:07 AM | #16
No. There's a tradeoff between speed and size, and consumer Optane Memory sits too far toward the size end and too low on speed. Conversely, I feel AMD's 3D cache is faster and smaller than I'd like. The nearest consumer example of what I'm thinking of is Broadwell-C. At the time you had up to quad-core CPUs with 4 MB of L3, but sitting next to the CPU on the substrate was 128 MB of eDRAM acting as an L4. Best-case official RAM bandwidth back then was ~25 GB/s with dual-channel DDR3-1600; the eDRAM was nominally 50 GB/s, and its latency was likely much lower. Given that we're now reaching low hundreds of MB of L3 per CPU, an order of magnitude more would take us into the GB range. The bandwidth would have to beat RAM, so nothing PCIe-based qualifies. For consumer-tier CPUs I don't feel HBM is a good fit: it runs at low clocks but very wide, so it would likely carry a latency penalty. It makes more sense for server CPUs with many more cores, and for GPUs, which are inherently very wide.
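The Broadwell-C figures above check out arithmetically; a quick sketch (the 50 GB/s eDRAM number is taken from the post, the DDR3 math is the standard peak calculation):

```python
# Sanity check: dual-channel DDR3-1600 peak bandwidth vs. Broadwell's 128 MB eDRAM L4.
ddr3_dual = 2 * 64 / 8 * 1600e6 / 1e9  # two 64-bit channels at 1600 MT/s -> 25.6 GB/s
edram_l4 = 50.0                        # nominal eDRAM bandwidth quoted above, GB/s

print(f"DDR3-1600 dual channel: {ddr3_dual:.1f} GB/s")
print(f"eDRAM advantage:        {edram_l4 / ddr3_dual:.2f}x")
```

So the L4 roughly doubled the bandwidth available to the cores, on top of whatever latency advantage the on-package eDRAM had.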

ERKKIN (Member) | 11-12-2023, 12:31 AM | #17
I hadn't come across that one, though I was familiar with Strix Halo. I don't think it would come with enough RAM, so it wouldn't be appealing for my main setup, but in a compact 32-64 GB version it could work nicely in a lightweight laptop. That's not a big concern for me, since the only thing I need Nvidia for is work, and for my own device I'd rather have something reliable with better battery life. Adding a cache like this raises latency and power use, and expanding it in the 2D plane instead would significantly increase the chip size.

KasieKat (Member) | 11-12-2023, 02:10 AM | #18
The v-cache occupies a substantial area: quick searches put it around 6 mm x 6 mm (36 mm²), while the Zen 3 CCD measures approximately 83.7 mm² (11.27 x 7.43 mm), so the cache die covers nearly half the CPU die's area. Folding that into the main die would expand the chip considerably. The die shots below show the internal layout of a Zen 3 processor, with the 32 MB L3 cache in magenta/pink and the 4 MB of L2 between the L3 and the cores; it illustrates how much space is already dedicated to SRAM cache. Silicon chips are made on wafers, most commonly 300 mm (30 cm) in diameter, and the process isn't flawless: there's always some defect density per unit area. TSMC's processes are highly refined, keeping defect rates low, but smaller dies still benefit AMD, since more functional chips come out of each wafer. When flaws do occur, cores in the affected area can often be disabled, whereas cache defects are harder to work around, and CPUs with only partially functional caches are less attractive to sell. As a rough illustration: 11 x 7 mm CCDs might yield ~400 dies per wafer, and if 20 are defective you still get about 380 working units. A hypothetical 20 x 10 mm die with the v-cache on board might yield only ~250 dies, and since each larger die catches more defects, perhaps 40-50 would be flawed, leaving only around 200 good chips. AMD also reuses these CCDs in other products like Threadripper and EPYC, maximizing profitability. You might find it helpful to watch this video for a clearer grasp of why focusing on compact chiplets is the smarter strategy.
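The yield intuition above can be sketched with the simple Poisson yield model; the defect density and edge-loss factor here are illustrative assumptions, not TSMC figures:

```python
import math

# Toy yield model: good dies per 300 mm wafer as a function of die area,
# using the Poisson yield model Y = exp(-defect_density * die_area).
# defects_per_cm2 and the 15% edge-loss factor are illustrative assumptions.

def good_dies(die_w_mm: float, die_h_mm: float, defects_per_cm2: float = 0.1) -> int:
    wafer_area = math.pi * 150**2               # 300 mm wafer area, mm^2
    die_area = die_w_mm * die_h_mm              # mm^2
    gross = int(wafer_area / die_area * 0.85)   # rough allowance for edge losses
    yield_frac = math.exp(-defects_per_cm2 * die_area / 100)  # area converted to cm^2
    return int(gross * yield_frac)

print(good_dies(11.27, 7.43))  # small Zen 3-sized CCD
print(good_dies(20, 10))       # hypothetical larger die with the cache on board
```

The larger die loses twice: fewer candidates fit on the wafer, and each one presents a bigger target for defects, so the good-die count falls faster than the area grows.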

TheMaaykGamer (Member) | 11-12-2023, 04:08 AM | #19
Yes, but that would pose huge design hurdles and contradict the PC-enthusiast idea of upgrading every couple of months.

CometKalea (Member) | 11-12-2023, 06:57 AM | #20
The main issue I see is with fixes and repairs, which modularity helps with, though that depends on having spare parts or funds, something I doubt most users have. Intel likely manages this through its binning strategy on the 12th-14th generations: disabling each P-core or E-core cluster also removes 3 MB of L3 cache. I share this view, and cache per core also seems to be shrinking with the compact Zen 4c dies. There are real concerns about 3D v-cache, since it acts as a significant heat barrier on top of the chip. Given that Intel hasn't conducted extensive long-term stress tests on its CPUs (as revealed in a recent interview), it's likely AMD hasn't either. The bonding of the 3D v-cache die appears solid, with no visible or tactile differences compared to previous dies. The 7950X3D has been out for most of a year now, so any design flaws should start becoming apparent. The indium solder seems more flexible than expected, helping the CCDs withstand repeated thermal cycles without cracking. The thermal limits of the 3D v-cache module are real, but even for direct-die builds on a 7950X3D the package power cap is around 155 W, well within what the silicon can handle; the stock model, though, is held to 105 W because of those constraints.
