A Japanese company has announced an ambitious new system that uses what is essentially a complex, 45nm ray-tracing GPU to accelerate real-time ray-traced rendering. The target market is automotive design, and, unfortunately for any gamers who might fantasize about one day using it for games, it's likely to stay confined to that niche forever.
A Japanese company has announced a massive, 800-teraflop real-time ray tracing (RTRT) system that gangs together nine 73-core chips into a single system that fits inside a desktop computer form factor. The new chip, which is being jointly developed with Toyota and Unisys, is aimed at the auto industry, where designers will use it to prototype body designs and paint combinations.
As for how this system works, there are currently only two sources of information: a Japanese description on the website of the chip's maker, TOPS Systems Corporation, and a Nikkei article in English that's presumably a summary of the Japanese original. Given the paucity of information and the relative shallowness of my technical knowledge of ray tracing, I'll give my best shot at explaining this system and putting it in context, and I'll invite others to weigh in with more info in the comments thread.
As I noted above, the overall system consists of nine identical, 45nm ASICs ganged together via some unspecified interconnect scheme. Each individual ASIC consists of nine compute clusters connected to one another through a shared bus. (See this excellent diagram from Nikkei.) This bus also hosts a 64-bit RISC master controller that presumably takes in work in batches and assigns it to the clusters, which then do the grunt work of computing the rays. There are also I/O and memory interfaces attached to this shared bus, which link the chip to the rest of the system.
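For what it's worth, the 73-core figure squares with this layout if you take each cluster to hold eight cores (as the cluster description in this article states) and count the master controller as one more core. That grouping is my reading of the published description, not something TOPS has spelled out, so treat the following as a back-of-the-envelope check rather than a spec:

```python
# Figures taken from the article. Counting the RISC master controller as
# the 73rd core is an assumption of this sketch, not a vendor statement.
CHIPS_PER_SYSTEM = 9
CLUSTERS_PER_CHIP = 9
CORES_PER_CLUSTER = 8
MASTER_CONTROLLERS_PER_CHIP = 1

# Worker cores in the clusters plus the single master controller.
cores_per_chip = CLUSTERS_PER_CHIP * CORES_PER_CLUSTER + MASTER_CONTROLLERS_PER_CHIP

# Nine identical ASICs make up the full system.
cores_per_system = cores_per_chip * CHIPS_PER_SYSTEM

print(cores_per_chip)    # 73
print(cores_per_system)  # 657
```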
There are a few things that are interesting about these clusters, one of which is pictured below. First, you'll notice that each cluster is made of eight heterogeneous cores, each of which is supposed to handle one part of the ray tracing algorithm. The heterogeneous cores are connected by a high-bandwidth, three-bus link (system bus, data bus, and instruction bus), which lets a job move in stages from one core to the next. Clearly, this is a pipeline setup, with one core per stage, and in this respect the ASIC is very much a ray-tracing GPU, analogous to the fixed-function GPUs of yesteryear, which had custom hardware blocks dedicated to each stage of the rasterization pipeline.
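To make the one-core-per-stage idea concrete, here is a toy simulation of jobs marching through an eight-stage pipeline, one stage per tick. The stage names and the `run_pipeline` function are purely hypothetical; the available sources say only that each core handles one part of the algorithm, not which part, and nothing here models the real hardware's timing.

```python
from collections import deque

# Hypothetical stage names, invented for illustration only.
STAGES = [
    "generate_ray", "traverse", "intersect", "shade",
    "spawn_secondary", "texture", "accumulate", "write_pixel",
]


def run_pipeline(num_jobs, stages=STAGES):
    """Simulate jobs advancing one stage per tick, with one core per stage.

    Returns a list of (tick, job_id) pairs recording when each job
    finishes its final stage.
    """
    slots = [None] * len(stages)      # slots[i] is the job currently on core i
    pending = deque(range(num_jobs))  # jobs waiting to enter the first core
    completed = []
    tick = 0
    while pending or any(job is not None for job in slots):
        tick += 1
        # Every job moves forward one core; the job that was on the last
        # core (already recorded as complete) leaves the pipeline.
        for i in range(len(slots) - 1, 0, -1):
            slots[i] = slots[i - 1]
        slots[0] = pending.popleft() if pending else None
        # A job sitting on the last core finishes during this tick.
        if slots[-1] is not None:
            completed.append((tick, slots[-1]))
    return completed


print(run_pipeline(3))  # [(8, 0), (9, 1), (10, 2)]
```

The first job takes eight ticks to emerge, but after that the pipeline delivers one finished job per tick, which is the whole appeal of dedicating a core to each stage.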
The fact that this is a ray-tracing GPU has very important implications for the part's future in gaming. To wit, it has no such future. But more on that in a moment.
The second interesting thing about this system is that it addresses RTRT as a compute problem, instead of as a data management problem like the much less ambitious Caustic Graphics solution. You'll recall that Caustic's solution relies on the traditional GPU to do the computational heavy lifting, with the Caustic board accelerating the data lookup part of the problem. The TOPS design, in contrast, is a more traditional brute-force, multicore-plus-caches solution whose main novel twist is this one-core-per-pipeline-stage idea.
There's a reason that the other plans for a hardware-based RTRT solution have been homogeneous multicore designs that throw bandwidth and highly parallel math hardware at the problem, and that's the fact that you can actually repurpose a homogeneous design for other applications in different verticals, thereby gaining the sales volume needed to make producing the IC profitable. This brings me to the reason why you shouldn't plan on using a successor to this chip in a computer game.
Source: ars technica