
Talk:Xenos (graphics chip): Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 06:35, 11 March 2008

I am not familiar with Wikipedia editing, but now that AMD has bought ATI, could somebody add "(now AMD)" after every mention of ATI in the article? 82.120.186.74 21:31, 24 October 2007 (UTC)

Move

Suggest moving to Xenos (GPU). There is no assertion that this usage is more prominent than other uses of "Xenos". I would favour Xenos (Greek) on the principle of first precedence.--ZayZayEM (talk) 05:55, 10 December 2007 (UTC)

It would make sense to me, as the concept is the source of the other names. Another possibility, though, would be to point this to the disambiguation page. --Falcorian (talk) 17:49, 10 December 2007 (UTC)


Should it be Xenos (GPU), Xenos GPU or Xenos (graphics processing unit)?--ZayZayEM (talk) 02:09, 11 December 2007 (UTC)

So, how was the vertex rate figured?

The article gives a figure of "1.5 billion vertices per second" without saying where it comes from. I assume the reasoning was that a typical polygon is a triangle with three vertices, so 1.5 billion vertices would form 500 million polygons. So it seems they worked backwards from the "500 million" figure released by Microsoft and simply assumed 1.5 billion vertices from there.
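A minimal sketch of the two counting conventions at issue, assuming a non-indexed triangle list (three vertices of its own per triangle) versus a triangle strip or fan (each vertex after the first two adds one triangle). The function names are illustrative only.

```python
def triangles_from_list(vertices: int) -> int:
    """Non-indexed triangle list: every triangle consumes three vertices of its own."""
    return vertices // 3


def triangles_from_strip(vertices: int) -> int:
    """Triangle strip or fan: after the first two vertices, each new vertex adds a triangle."""
    return max(vertices - 2, 0)


v = 1_500_000_000
print(triangles_from_list(v))   # 500000000: the "work backwards by 3" reading
print(triangles_from_strip(v))  # 1499999998: close to 1:1 when vertices are shared
```

Under the strip/fan convention the vertex and triangle counts are nearly equal, which is why dividing by three is not the only possible reading of the 500 million figure.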

It's my understanding that typical GPU listings calculate this figure by assuming it takes 4 clock cycles to complete the simplest vertex position transform (matrix * vector). So each vertex shader you have contributes 1/4 of the clock frequency, in vertices per second. From what I can tell, that's true of all GPUs. And since every polygon in a mesh could (theoretically) share a vertex with its neighbor in strips and fans, the theoretical maximum polygon count is sometimes "considered" to be 1:1 with the vertex rate.

The problem here is that, because the shaders are unified, all of Xenos's ALUs could potentially be processing geometry at once, as would be the case in a Z-only pre-pass. But it wouldn't make much sense to list that as a figure of 6 billion vertices per second. Technically, one block of ALUs devoted to vertex work would give 2 billion vertices per second, but even that wouldn't make much sense, for a number of reasons.
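The 4-cycles-per-transform rule of thumb described above reproduces both of these figures. A sketch, assuming a 500 MHz Xenos core clock and assuming that one "block" means 16 of the 48 ALUs (48 / 3), which is the split that yields 2 billion; neither number is stated in this comment.

```python
def peak_vertex_rate(units: int, clock_hz: float, cycles_per_transform: int = 4) -> float:
    """Peak vertices/second if each unit finishes one matrix*vector transform every N cycles."""
    return units * clock_hz / cycles_per_transform


XENOS_CLOCK_HZ = 500e6  # assumed 500 MHz core clock, as used elsewhere on this page

all_alus = peak_vertex_rate(48, XENOS_CLOCK_HZ)   # every unified ALU doing vertex work
one_block = peak_vertex_rate(16, XENOS_CLOCK_HZ)  # assumed: one block = 16 of the 48 ALUs
print(f"{all_alus:.1e} {one_block:.1e}")  # 6.0e+09 2.0e+09
```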

So Microsoft listed the "setup" limit in their specifications. That is the maximum you could actually draw on screen, after backface and occlusion culling, etc. And with a realistic number of vertex shader instructions (beyond the simple transform), you would not reach that limit anyway.

I'm not sure how it should technically be listed, but I think the 1.5 billion figure is wrong regardless: I believe the GPU can only set up 500 million "vertices" per second, and the traditional "shared vertex" assumption is already applied to the 500 million figure. Thus you would likely never see much more than a few million polygons drawn in any given frame (which I strongly believe to be the case).
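To put the 500 million figure in per-frame terms, here is a quick sketch that treats it as a hard per-second ceiling and assumes target frame rates of 30 and 60 fps (the frame rates are assumptions, not from the thread). Real scenes would land below this once realistic vertex shader costs are included.

```python
SETUP_LIMIT_PER_S = 500e6  # Microsoft's published setup figure, treated as a hard ceiling


def per_frame_ceiling(fps: float) -> float:
    """Upper bound on primitives set up in one frame at a given frame rate."""
    return SETUP_LIMIT_PER_S / fps


for fps in (30, 60):
    print(fps, round(per_frame_ceiling(fps)))
# 30 16666667
# 60 8333333
```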

Anyone disagree? Agree? No problem if you think I'm wrong, but please explain a bit. Thanks. Swapnil 404 (talk) 22:53, 20 December 2007 (UTC)

Double shader performance for the Xenos?

One of the lead engineers who worked on the Xenos (his name escapes me at the moment) stated that the Xenos is capable of 96 billion shader ops per second, which is twice the amount stated by Microsoft. I'm assuming that the pipelines now do 2 vector4 ops and 2 scalar ops, so 4 ops per pipeline, and 48*4*500,000,000 = 96,000,000,000 shader ops per second. I don't know if this is true, or whether what I just said made any sense; I'm just wondering if anyone can confirm or disprove it. If it's right, can you please add it to the article? (If I'm right, I think it may affect everything and would definitely make the flop count per pipeline 20.) Preceding unsigned comment added by Gears, Gears, Gears (talk · contribs) 02:49, 29 January 2008 (UTC)
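The arithmetic behind the question, as a sketch. The baseline line assumes Microsoft's lower figure counts one vector4 op plus one scalar op per pipeline per clock (an assumption that makes it exactly half of 96 billion); the second line is the 4-ops-per-pipeline guess from the comment. The 500 MHz clock is the one used in the comment.

```python
PIPELINES = 48
CLOCK_HZ = 500_000_000

# Baseline reading (assumed): one vector4 op + one scalar op per pipeline per clock.
baseline_ops = PIPELINES * 2 * CLOCK_HZ
# The comment's guess: 2 vector4 ops + 2 scalar ops per pipeline per clock.
guessed_ops = PIPELINES * 4 * CLOCK_HZ

print(f"{baseline_ops:,} {guessed_ops:,}")  # 48,000,000,000 96,000,000,000
```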

Yeah, I remember when he said that; it was Feldstein, I think. But I don't think he meant shader ops. Perhaps shader flops: 4-component MADDs per cycle, per shader, rather than just listing vector and scalar. Or it could be that you can issue a vector op, a scalar op, a vertex fetch, and a texture load, all within one instruction cycle. That would be 4. And it would contrast with Nvidia, because they hadn't decoupled their texturing ops from shader ops, so one stalls the other. Swapnil 404 (talk) 02:00, 23 February 2008 (UTC)

Shader flop or op count

I thought when they said shader op they were referring to shader ALU operations per second. But in that case the same applies to the PS3's RSX, because it performs 2 scalar, 2 vector and one fog op per pipe (is a fog op similar to a vector op, but missing a colour value?). So should we change it to shader flops per second for both? Or am I completely wrong? Preceding unsigned comment added by Gears, Gears, Gears (talk · contribs) 03:52, 25 February 2008 (UTC)

They were, for Xenos: just vector + scalar. They didn't count anything else, like interpolation units, fetch units, etc. I was just suggesting fetching and loading as possible examples of what Feldstein meant by 4 ops, other than him simply misspeaking.
RSX can't perform shader ALU work in the same cycle it issues a texture fetch, and there are still situations where one ALU stalls the other. Xenos has separate logic for such tasks. So on RSX, issuing a texture fetch (and some of its latency) cuts directly into shader operations, while on Xenos it doesn't.
Really, though, you'd have to specify what is meant by "shader op", as it could be any number of things (an fp16 normalize could be considered one). I doubt they count the fog ALU as an op, as I "think" it's there more for legacy software; modern games compute fog in the shader itself. And things like those mini-ALUs seem to be there to meet the Shader Model 3.0 specification requirements.
Just as a base figure, an RSX shader ALU can do 4-component MADDs, split across vector and scalar ops (vector3 + scalar, vector2 + vector2, vector4, etc.).
(MADD is multiply + add, counted as 2 flops.)
So: 2 ALUs per shader, each capable of 4-component MADDs, with MADD = 2 flops.
24 x 2 = 48 ALUs, and 48 x (4 x 2) = 384 flops per clock.
http://www.watch.impress.co.jp/game/docs/20060329/3dps309.htm
But then, you could ask where the vertex shaders are counted in those figures (it could be just a pixel shader slide).
But it is a slide meant for developers, so there's no need to count every flop you can find to inflate the number. Swapnil 404 (talk) 20:47, 29 February 2008 (UTC)
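The RSX pixel-shader count above, written out step by step as a sketch. Only the pixel pipelines are included, mirroring the slide; the vertex units are left out.

```python
MADD_FLOPS = 2  # multiply + add counted as two flops


def alu_flops_per_clock(lanes: int) -> int:
    """Flops per clock for one ALU issuing a MADD on every lane."""
    return lanes * MADD_FLOPS


PIXEL_PIPES = 24
ALUS_PER_PIPE = 2
LANES_PER_ALU = 4  # vector4, vector3 + scalar, vector2 + vector2, ...

rsx_pixel_alus = PIXEL_PIPES * ALUS_PER_PIPE                           # 48 ALUs
rsx_pixel_flops = rsx_pixel_alus * alu_flops_per_clock(LANES_PER_ALU)  # 48 x 8
print(rsx_pixel_alus, rsx_pixel_flops)  # 48 384
```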
Truth is, my knowledge of the subject is more limited than yours, but I would just like to know: should we keep the shader op count on this page at 96 billion per second, or should we change it to something you see as correct? Also, since you seem to be better informed about the Xenos than me, could you work out how many programmable flops per clock the Xenos has? Gears, Gears, Gears (talk) 09:12, 5 March 2008 (UTC)


Nah, I wouldn't change it on my own, really. I've read the 96 billion quote, but I don't think it refers to what's considered raw shader work. There are a bunch of things you could count as a "shader op". A Microsoft rep has said Xenos could do "160 shader operations per cycle" or more, if you count the 32 control flow ops, 16 texture fetches, and 16 programmable vertex fetches per clock, and consider that they can all be issued simultaneously, whereas on RSX they cut directly into shader operations to varying degrees (for example, the first ALU in each of RSX's shaders doubles as a TMU for texture calls). And perhaps he's right. But then there would be other things to consider on RSX as well.

And for straight "shaders", it'd be just the 48 ALUs, each a vector4 MADD plus a scalar special-function unit. (The scalar seems to be 1 flop, according to a few different places I've read, although a Microsoft rep counted it as 2 in one of their flops comparisons.) So 48 x 4 x 2 = 384 for vector, plus 48 more for scalar: 432 generic shader flops per cycle. If we count the scalar as 2, then it's 480, which at 500 MHz matches the Microsoft rep's figure of 240 billion per second. There are flops involved in other operations, but I think most would limit "programmable" to just those. Swapnil 404 (talk) 15:48, 5 March 2008 (UTC)
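Both tallies in this comment can be reproduced with a short sketch. The 160 figure only adds up if each ALU's vector op and scalar op are counted as two separate ops (48 x 2 = 96), which is an assumption; the control flow and fetch counts are the ones quoted above, and the clock is assumed to be 500 MHz.

```python
ALUS = 48
CLOCK_HZ = 500_000_000  # assumed Xenos core clock


def ops_per_cycle() -> int:
    alu_ops = ALUS * 2        # assumption: vector op + scalar op counted separately
    control_flow = 32
    texture_fetches = 16
    vertex_fetches = 16
    return alu_ops + control_flow + texture_fetches + vertex_fetches


def flops_per_cycle(scalar_flops: int) -> int:
    vector = ALUS * 4 * 2     # vector4 MADD: 4 lanes x 2 flops
    return vector + ALUS * scalar_flops


print(ops_per_cycle())                          # 160
print(flops_per_cycle(1), flops_per_cycle(2))   # 432 480
print(flops_per_cycle(2) * CLOCK_HZ / 1e9)      # 240.0 (GFLOPS)
```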

Thanks for clearing all that up, Swapnil, but I would like to know one more thing: how do the Xenos and RSX compare on straight programmable shader flop performance (including the vertex pipelines on the RSX)? If you can't say, that's OK, but it's definitely four-component MADDs for a vector4 op, right, and 1 flop for a scalar op? Gears, Gears, Gears (talk) 07:58, 6 March 2008 (UTC)


Well, I'm sure it depends a lot on the workload: the ratio of vertex to pixel shader work, the number of texture fetches involved, and a list of other factors. It used to be thought that RSX had more raw shader power on paper, while Xenos was more "efficient". Of course, that was before RSX was clocked back to 500 MHz (with 650 MHz RAM), and it doesn't consider any other components involved in "shading".

Of the folks who've worked directly with the hardware, are in a position to know first hand, and are actually willing to talk about it, most have said Xenos > RSX, especially for vertex shader work (by quite a bit), but perhaps for pixel shaders as well in some cases. Code optimized to run really well on RSX could be expected to run OK on Xenos in many instances, but code optimized for Xenos would overwhelm RSX in some areas. (Of course, none of that assumes eventually using Cell to reduce RSX's work with pre-culling, etc.) Overall, for GPU shader performance, most give the edge to Xenos to varying degrees.

And a vertex shader is vector4 + scalar. Xenos's ALUs need to be capable of both pixel and vertex work. I would guess the scalar is a single flop, but it could be either, as I've heard it both ways. Swapnil 404 (talk) 01:44, 7 March 2008 (UTC)

Just one last question (thanks for answering all the rest): what operations in the pipeline make up a pixel shader (or what you could call a general pixel shader)? And can you also tell me if this is right, to the best of your knowledge:

RSX total programmable flops (pixel and vertex pipes) = 24x2x(4x2) + 8x(4x2) + 8 = 456 shader flops per cycle (or 464 if scalar = 2 flops)

Xenos total programmable flops = 48x(4x2) + 48 = 432 shader flops per cycle (or 480 if scalar = 2 flops)

Does the RSX have 24x2 because its instructions are co-issued in the pixel pipelines? Thanks for all the help anyway. Gears, Gears, Gears (talk) 09:58, 7 March 2008 (UTC)
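The two totals above can be checked with a short sketch. It takes the unit counts exactly as written in the question, leaves the "scalar = 1 or 2 flops" choice as a parameter, and adds a per-second figure assuming both GPUs run at 500 MHz (the reduced RSX clock mentioned earlier in the thread; Xenos also runs at 500 MHz).

```python
CLOCK_HZ = 500_000_000  # assumed 500 MHz core clock for both chips


def rsx_flops_per_cycle(scalar_flops: int) -> int:
    pixel = 24 * 2 * (4 * 2)        # 24 pixel pipes x 2 ALUs x 4-lane MADD
    vertex_vector = 8 * (4 * 2)     # 8 vertex pipes x vector4 MADD
    vertex_scalar = 8 * scalar_flops
    return pixel + vertex_vector + vertex_scalar


def xenos_flops_per_cycle(scalar_flops: int) -> int:
    return 48 * (4 * 2) + 48 * scalar_flops  # 48 unified ALUs: vector4 MADD + scalar


for s in (1, 2):
    rsx, xenos = rsx_flops_per_cycle(s), xenos_flops_per_cycle(s)
    print(s, rsx, xenos, rsx * CLOCK_HZ / 1e9, xenos * CLOCK_HZ / 1e9)
# 1 456 432 228.0 216.0
# 2 464 480 232.0 240.0
```

On these assumptions the paper totals are close either way; which chip comes out ahead flips with the scalar convention, which is why the thread keeps coming back to it.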

Yeah, pretty much. There are two ALUs tied together in each pipeline. Pixel shaders are typically vector3 + scalar (red, green, blue + alpha).
An Nvidia pixel shader ALU just did 4 components at a time (MADD capable), so it could do a vector3 + scalar, a vector4, a scalar + scalar, or a vector2 + vector2, depending on what needs processing.
A typical ATI GPU had 2 ALUs per pipe, but only one was MADD capable (the other was just an ADD), and they were just standard "vector3 + scalar". That meant any time vector2 instructions came up, the ALU could only process one at a time, and the other lanes went to waste in that cycle (but I don't think those came up very often).
The difference for ATI was that they had separate logic for issuing texture fetches, so they didn't waste any ALU cycles doing that, and could hide fetch latency with math by processing something else until the fetch returned what was needed. Nvidia ALUs were far more likely to stall waiting for a fetch result.
Xenos has 48 vector4 + scalar ALUs, all MADD capable, and decoupled from texture fetching and filtering. Swapnil 404 (talk) 21:47, 7 March 2008 (UTC)
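A toy model of the co-issue difference described above, as a sketch only. It assumes instructions issue strictly in order and that two adjacent instructions may pair when they fit the ALU's split, and it ignores dependencies, MADD vs ADD capability, and texture latency. `flexible_4` stands in for the Nvidia-style 4-lane ALU and `fixed_3_plus_1` for the ATI-style vector3 + scalar ALU; both names are hypothetical.

```python
from typing import Callable, List

LANES = 4  # both toy ALUs are 4 lanes wide in total


def flexible_4(a: int, b: int) -> bool:
    """Nvidia-style split: any two ops that fit in 4 lanes (3+1, 2+2, 1+1)."""
    return a + b <= LANES


def fixed_3_plus_1(a: int, b: int) -> bool:
    """ATI-style split: one op of up to 3 lanes alongside one scalar op."""
    return (a <= 3 and b == 1) or (a == 1 and b <= 3)


def issue_cycles(widths: List[int], can_pair: Callable[[int, int], bool]) -> int:
    """In-order issue: pair each op with the next one when the split allows it."""
    cycles, i = 0, 0
    while i < len(widths):
        if i + 1 < len(widths) and can_pair(widths[i], widths[i + 1]):
            i += 2  # co-issued in one cycle
        else:
            i += 1  # issued alone; the remaining lanes sit idle
        cycles += 1
    return cycles


def lane_utilization(widths: List[int], can_pair: Callable[[int, int], bool]) -> float:
    return sum(widths) / (issue_cycles(widths, can_pair) * LANES)


vec2_stream = [2, 2, 2, 2]
for name, rule in (("flexible_4", flexible_4), ("fixed_3_plus_1", fixed_3_plus_1)):
    print(name, issue_cycles(vec2_stream, rule), lane_utilization(vec2_stream, rule))
# flexible_4 2 1.0
# fixed_3_plus_1 4 0.5
```

For a typical vector3 + scalar stream such as `[3, 1, 3, 1]` both rules take 2 cycles, which matches the point that the vector2 case did not come up very often.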
I think you may already know this, but the Xenos's scalar ALU is MADD capable, so it's 2 shader flops, which means the 240 gigaflops figure for the Xenos is correct. Gears, Gears, Gears (talk) 08:48, 10 March 2008 (UTC)