Thursday, May 29, 2008

Nvidia's GTX 280

Well, after plenty of speculation about Nvidia's next generation high-end graphics architecture, the "almost" official specs are in:

- 65nm TSMC process technology.
- ~1.4 billion transistors (almost double that of the current 65nm G92 core used in the Geforce 9800 GTX, 8800 GTS 512MB, 8800 GT, etc.).
- 24x24mm die -576mm^2- (480mm^2 in G80; 330mm^2 in G92).
- 240 scalar processors (128 in G92).
- 32 Render Output Processors -ROP's- (24 in G80; 16 in G92).
- 80 Texture Mapping Units -TMU's- (from 64 in G92).
- 512bit-wide memory bus (doubled from G92).
- 600MHz core clockspeed, 1300MHz shader clockspeed.
- 1024MB of 1100MHz GDDR3 (effectively 2200MHz) which, coupled with the 512bit bus, provides for 140GB/s of bandwidth.
- 236 Watt Thermal Design Power.
- 933 GigaFLOPS (FLOPS -or FLoating Point Operations per Second- is a largely theoretical metric).
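As a sanity check on those last two figures, the math follows directly from the listed specs (assuming Nvidia's usual counting of 3 FLOPs per scalar processor per clock, i.e. a dual-issue MAD+MUL):

```python
# Back-of-the-envelope check on the leaked GTX 280 figures.
# Assumes 3 FLOPs per scalar processor per clock (MAD + MUL),
# the counting convention Nvidia uses for this generation.
scalar_processors = 240
shader_clock_ghz = 1.3
gflops = scalar_processors * shader_clock_ghz * 3
print(round(gflops))  # ~936 GFLOPS, in line with the quoted ~933

# Memory bandwidth: bus width (in bytes) x effective data rate.
bus_width_bytes = 512 // 8
effective_mem_clock_mhz = 2200  # 1100MHz GDDR3, double data rate
bandwidth_gbs = bus_width_bytes * effective_mem_clock_mhz / 1000
print(bandwidth_gbs)  # 140.8 GB/s, quoted as "140GB/s"
```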

It should be released in mid-June in two versions:

A GTX 280 with the specs above, for a 600+ US dollar price tag.
A GTX 260 version featuring a binned version of the same core, with some disabled portions, less memory, but also a much lower price -US$ 449- and lower TDP -182 Watt-.

Early performance figures leaked onto the internet suggest it is much faster than G92, sometimes by over 100%.
They also point to a much more aggressive stance on GPGPU through Nvidia's CUDA API, helped by the brand new FP64 support built into the chip.

As usual, let's wait for proper results on launch day to get a better assessment of its capabilities.

Monday, May 5, 2008

GT200 and the magic ever-flippin' specs

These past few weeks have seen the rebirth of the GT200 leaked-spec craze, after the relative quiet of the 9800 GTX/9800 GX2 marketing blitz.
Only a few loose bits seem to be common among them, namely, a 512bit-wide memory bus (GDDR3, probably) and between 192 and 240 scalar processors for the core.
Things like the process technology (55nm, 65nm), TDP, performance, etc, are huge clouds so far.

Frankly, it's too soon to be speculating about a massively reworked architecture.
Even if it's a dual-core chip tied together by an internal NF200-style bridge (different from an MCM-style layout, mind you...), I sincerely doubt that it's much more than G92 with a few extra execution units and double-precision support (for CUDA, as FP64 is mostly useless for graphics rendering right now).
The reason is that broad acceptance of DX10 in most games is still in its adolescence, and DX11 is still quite a few years ahead. DX10.1 support is also in doubt, and for good reason, as Nvidia doesn't seem interested in retroactively aiding game support for the competitor's current GPU's. It looks like DX10.1 will follow DX9's SM 2.0b into historic obscurity.

Wednesday, April 2, 2008

Atomized

So the Intel Atom was finally (and officially) unveiled in its full glory.

Now, I admit, there are some very questionable design decisions made by Intel.

First, why is it that a company capable of such technical prowess as cramming a fully x86, 64bit-ready, dual-threaded architecture into a tiny 45nm core and package could not pair it properly with its platform? Instead, Intel resorted to a mash-up of a severely crippled "Lakeport-G" (a.k.a. the three-year-old Intel 945G/GC) and a PowerVR mobile core (similar to the one found in the iPhone) for its 3D duties. The 2D and video output engines were designed in-house by Intel, but I wish they weren't, as they can't do much more than 720p-level resolutions over a D-sub/VGA analog output, therefore wasting the "Full-HD" H.264, VC-1 and MPEG-2 codec support of the 3D portion.

Second, with 90nm GM965 and G35 cores out for more than a year (and a 65nm G45 on the way very soon), why did they build this mash-up on a very old and very, very power-inefficient 130nm process? You've read that correctly: a 130nm Northbridge+Southbridge design made with the same process technology as the Pentium 4 "Northwood" -welcome back to 2001 ;)-.


Third, with this "Poulsbo" chipset inflating the board area required for integration in mobile devices (it dwarfs the CPU in both die size and package dimensions), and with its lack of support for the PCI bus, SATA and digital video output, I'm afraid the original concept of "Silverthorne" might be absolutely doomed, as there are already much more power-efficient, feature-rich and popular Linux-based ARM solutions.

So, the core will only really be attractive when paired with the standard 945GC in ultra-low-cost mobile machines, entry-level desktops for the 3rd world, or network access points for medium to large businesses.
Those will be the famed "Diamondville" cores which, because of their higher allowed Vcore and TDP, will also be much cheaper to mass-produce.

Ironically, when it comes down to power consumption, Intel is even "promoting" the superiority of SiS chipsets over their own in this slide:

Yep. Intel 945GC + ICH7 = 22W TDP.
SiS 671 + SiS 968 = 8W TDP.

Of course, "Diamondville" can also *in theory* be paired with any Core 2-ready chipset, since they share the GTL+ quad-pumped Front Side Bus protocol, but I guess that will be up to the motherboard manufacturers themselves, as there will be no socketed version of the CPU (and even if there were one, I doubt it would use the same pin count and layout as standard Core Duo and Core 2 Duo-based Centrino CPU's).

"Silverthorne" won't have that "luxury", because only the "Poulsbo" chipset can function in the FSB's exclusive "CMOS mode" to save additional power.

In all, I think the launch was a double-edged sword.

"Diamondville" may turn out to be one of Intel's best-selling CPU's ever (if its platforms are marketed and sold properly at attractive price levels).

But "Silverthorne" faces a very uphill battle, since the Windows option is not there (and Linux is already extremely widespread among ARM vendors, because of its ease of modification and cross-platform development tools).

For a real fight, we'll have to wait for "Moorestown" in roughly 1.5 years' time.
That -presumably still 45nm-based- successor to the "Silverthorne"/"Diamondville" will do away with the chipset, and integrate the Northbridge and Southbridge functions (memory controller included) directly into the CPU core die itself.
Of course, by then, the dual-core, shared L2 cache, 40nm Cortex A9 basic design from ARM will also be on the market, and it might prove to be a tough nut to crack.

Wednesday, March 12, 2008

AMD RV770/R700

As I suspected earlier, vr-zone.com is reporting that RV770 consists of five clusters with 160 stream processors each (grouped in the same 4+1 fashion -four IEEE754 ALUs plus one transcendental unit- as in the RV6xx/R6xx cores).

My take: five slightly modified HD 3650 cores (each with 40 stream processors more than the RV635's 120), stripped of their individual memory controllers altogether (which, as we saw with the RV670, RV610/20 and RV630/35, significantly impacts the transistor count). These would then access an on-die, but "off-array", common memory controller, bypassing the frame-buffer duplication of the same data that limits multi-chip rendering.
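For illustration, here's the stream-processor arithmetic behind that guess; note that the 32-groups-per-cluster figure is my own assumption to make the numbers line up:

```python
# Rumored RV770 shader math, assuming the R6xx-style 4+1 grouping
# (four IEEE754 ALUs plus one transcendental unit per group).
sps_per_group = 5
groups_per_cluster = 32          # assumption: 32 groups -> 160 SPs/cluster
sps_per_cluster = sps_per_group * groups_per_cluster
clusters = 5
total_sps = clusters * sps_per_cluster

rv635_sps = 120                  # the HD 3650's RV635 core
extra_per_cluster = sps_per_cluster - rv635_sps

print(sps_per_cluster, total_sps, extra_per_cluster)  # 160 800 40
```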

The connection between these "arrays" themselves and/or the common memory controller could be made by some form of on-chip coherent HyperTransport link.

Let's wait a bit more and see whether I was too far off the mark...

Intel Xeon "Dunnington"


The Intel "Dunnington", successor to "Tigerton" (Xeon X73xx, E73xx, L73xx) and "Tulsa", is scheduled for the third quarter of this year, just ahead of the brand new "Nehalem" architecture derivatives for the enterprise and high-performance workstation segments.

This will likely be the "swan song" of the venerable Socket 604, in use since the old "Paxville" MP (an early Pentium 4 "Netburst" offspring), but it will go out with a bang nevertheless.

The big news: 45nm, single die, 6-core design, with up to 16MB of shared L3 cache.
Despite the single die and L3 cache, it is not a brand new architecture with an integrated memory controller, but a mere "packaging" of 3 dual-core 45nm Core 2 derivatives, with each pair of cores sharing a single 3MB L2 cache and only then reaching the other cores through the rather large, but much higher latency, L3 space.
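A quick tally of that rumored layout (the per-pair figures come straight from the paragraph above; the totals are simple multiplication):

```python
# "Dunnington" as rumored: three dual-core Core 2-derived pairs on
# one die, each pair sharing a 3MB L2, all cores sharing one big L3.
core_pairs = 3
cores = core_pairs * 2                      # 6 cores on a single die
l2_per_pair_mb = 3
total_l2_mb = core_pairs * l2_per_pair_mb   # 9MB of L2 overall
shared_l3_mb = 16                           # up to 16MB, higher latency than L2

print(cores, total_l2_mb, shared_l3_mb)  # 6 9 16
```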

The shared L3 therefore offsets -up to a point- the increased computing density's need for one big FSB (impractical with the soon-to-be-retired GTL+ protocol design and the hardware infrastructure already in place).
This also means a new and easy drop-in upgrade path for most high-density setups.

Of course, Nehalem will not benefit from it, so Intel is giving Socket 604 one last chance until the "Quickpath"-enabled motherboards start over with a clean slate (which could take years, given the server market's lag behind typical desktop/notebook upgrade patterns).

Monday, March 10, 2008

JPR Desktop Graphics report for Q4'07

The latest figures are in, and they don't look too promising for AMD.
Despite the "apparent" success of the RV670 launch (Radeon HD 3850 and HD 3870), the reality was that not only could they not take market share away from Nvidia, they actually lost a huge 7% of their own share to their Santa Clara rivals.

Nvidia now commands a mighty 71% of all PC desktop discrete graphics shipments (up from 64%). That market, despite the rumor mills close to console gaming interests saying otherwise, grew a staggering 50.3% year-over-year, and a sizable 23% sequentially against Q3'07.

Source: Jon Peddie Research

Intel Atom Prices




According to vr-zone.com, the suggested price for the Intel Atom 230 SKU (single-core, dual-threaded, 1.6GHz, 512KB L2 cache) will be 29 US dollars -supposedly the per-unit price when purchased in 1000-unit lots by OEM's, as usual-.

This bodes very well for the expected quick mass adoption of the platform for low-cost notebooks, desktop PC's and even entry-level UMPC's.

Sunday, March 9, 2008

Physics effects in games don't need to be in 3D

Download and info:
http://www.acc.umu.se/~emilk/downloads.html

AMD "R700"



Contrary to popular belief, I'm still not entirely convinced that the "R700" next-gen high end from AMD is just two "RV770" GPU's (more powerful RV670-class cores) "stitched" together on a PCB like the Radeon HD3870 X2.

Oh, and it will almost certainly be a 55nm core yet again, but it doesn't take a genius to figure that out once you're aware of the TSMC process roadmaps...

After the Asus Eee PC 701, the Cloudbook, the HP 2133 (gulp) and the recently revealed 9-inch-screen version of the Eee PC, lo and behold, here it comes: the MSI "Wind":



As predicted, the Intel Atom is taking the (low-end) world for its own.

However, I still believe that the VIA "Isaiah"/CN-based HP 2133 UMPC is not only more feature-filled (it even has an ExpressCard/54 slot, for god's sake; not even the top-of-the-line, über-expensive 17-inch Apple Macbook Pro has one :P), but it could also end up being faster in most single-threaded software than a single-core, dual-threaded Intel Atom "Diamondville", due to the former's out-of-order CPU design and significantly improved floating-point capabilities.
The new DX10 IGP from VIA's S3 subsidiary -if actually used on the HP "mini-wonder"- could also give it an edge over the modified GMA950 of the Intel "Poulsbo" chipset that is currently a required companion to the Intel Atom core.

As they say, time will tell.

P.S.:
It is believed that the first versions of the 9-inch Eee PC still sport the Intel Celeron M 900MHz based on the old 90nm "Dothan" core (second generation, but still single-core, Pentium M), underclocked to a mere 630MHz by bringing the "quad-pumped" FSB from 100MHz down to 70MHz, so be careful when you decide to choose one of those as your next "net-machine".
An Intel Atom "Diamondville" should be significantly faster than it any day, especially if its clockspeed hovers around the rumored 1.6~1.8GHz figures. Not to mention much lower power consumption, due to the 45nm process and aggressive clock gating techniques, which are completely absent in the Celeron M.
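That underclock is plain multiplier math: the core clock is the FSB base clock times a fixed multiplier, so lowering the base clock lowers the core clock proportionally:

```python
# The 9-inch Eee PC's Celeron M underclock, derived from the FSB change.
# Core clock = FSB base clock x fixed multiplier; the "quad-pumped"
# part only affects the bus's data rate, not the multiplier math.
stock_fsb_mhz = 100
stock_core_mhz = 900
multiplier = stock_core_mhz // stock_fsb_mhz   # 9x, fixed in hardware

new_fsb_mhz = 70
new_core_mhz = multiplier * new_fsb_mhz
print(new_core_mhz)  # 630 (MHz)
```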

G92 product naming thoughts


Many in the industry question Nvidia's motives for the admittedly awkward naming of their first G92 GPU's, shipped by the millions in "8800 GT", "8800 GTS 512MB" and "8800 GS" cards around the world since last fall.
I can attest first-hand to some confusion in the minds of plenty of people who ask me for advice on their purchase decisions, but I also have somewhat of a theory about it.
Bear with me for a moment.



The G92 was to be the Geforce 9xxx right from the start, but the absence of a high-end refresh to the R600 ("HD 2900 XT") from AMD, and the chance to make it through the whole Holiday Season without having to show all the cards in the company's deck (pun not intended :D), were just too good to pass up.
Think of it as a two-part product family launch.
First, they capitalized on the huge "Geforce 8800" marketing potential (it's, in many ways, Nvidia's Radeon 9700 Pro) by calling two cheaper variants of the G92 by that name and riding the usual "Christmas shopping craze".

Second, it's no coincidence that the 8800 GTS 512MB is very similar to the 9800 GTS and 9800 GTX, whose launch is near.
The 8800 GTS 512MB was probably meant to be launched alongside the 9800 GTX back in November, bearing the name "9800 GTS" and separated from the high-end part only by its core and memory clocks, plus a more elaborate PCB design (à la 7900 GTX vs 7900 GT).

The 9800 GX2 is a quick answer to the Radeon HD3870 X2, but, honestly, it's only there to show off Quad-SLI once more, alongside Three-Way SLI (with the 9800 GTX, 8800 GTX and 8800 Ultra in that case).

Geforce 9800 GTX



With the arrival of the Geforce 9800 GTX, Nvidia hopes to have a somewhat accessible entry point for the "Three-Way SLI" technology introduced with the nForce 680i SLI/780i SLI (at least compared to the aging 8800 GTX/Ultra, which have unacceptable TDP's due to their old 90nm GPU's).
As the Intel X48 high-end core logic (bearing "Crossfire X" support) clearly comes to market later than expected, Nvidia needed a quick and easy way to boost the appeal of its next top-of-the-line chipset, the (almost) all-new nForce 790i SLI/Ultra SLI, built for high-bandwidth, low-voltage DDR3 memory.
I say "almost" because, even though the Northbridge is in fact completely new, and now integrates PCI-Express 2.0 on-chip (besides switching from DDR2 to DDR3), the Southbridge component is still the 2-year-old nForce 590/570 SLI for AMD systems (here relegated to I/O hub duty only).
Now, I'm not against using it per se (it still has plenty to offer as a Southbridge, from 6 SATA 3Gb/s ports with RAID 5 to HD Audio, integrated Gigabit Ethernet, IDE/ATA, USB 2.0, etc), given that its main purpose is to provide 16 lanes of PCI-Express to the third full-length slot -which gives it an edge over the top counterparts from AMD and Intel, which can't go beyond x4 or x8 there-.



But, even with a new revision on its shoulders, its 130nm process tech is a bit long in the tooth, and the fact that a certain DFI motherboard shown at CeBIT has an active cooler on top of it (but only a small passive heatsink on the main chip, the 790i SLI) should strike buyers of such over-the-top parts as somewhat odd.

Incidentally, the nForce 790i SLI Northbridge finally marks Nvidia's transition to 90nm (matching the "P965" and "P35" from Intel, but still behind the soon-to-be-released "P45" -essentially a 65nm version of "P35"-).