Pushing several steps farther in the multicore direction, Intel on Wednesday demonstrated a fully programmable 48-core processor it thinks will pave the way for massive data computers powerful enough to do more of what humans can.
The 1.3-billion-transistor processor, called the Single-chip Cloud Computer (SCC), is the successor to the 80-core "Polaris" processor that Intel's Tera-scale research project produced in 2007. Unlike that precursor, though, the second-generation model can run the standard software written for Intel's x86 chips, such as its Pentium and Core models.
The cores themselves aren't terribly powerful--more like lower-end Atom processors than Intel's flagship Nehalem models, Intel Chief Technology Officer Justin Rattner said at a press event here. But collectively they pack a lot of power, he said, and Intel has ambitious goals in mind for the overall project.
"The machines will be capable of understanding the world around them much as humans do," Rattner said. "They will see and hear and probably speak and do a number of other things that resemble human-like capabilities, and will demand as a result very (powerful) computing capability."
Intel is working with companies facing large-scale computing challenges that today require thousands of networked servers. That's very much a here-and-now problem compared to the more sci-fi challenges of computer vision.
Intel's idea with the SCC and its ilk, Rattner said: "Could you replace a rack full of equipment today with one or a number of high-core count processors like the SCC?"
The chipmaker found only one flaw with the chip so far and has booted Windows and Linux on SCC systems. The company demonstrated computers using the processor running Microsoft's Visual Studio on Windows and other tasks at the event.
No silver bullet for parallel programming
The Tera-scale project doesn't fundamentally address one of the big challenges in today's computing industry, though: getting multicore chips to run today's computing jobs, which are often designed to run as a single thread of instructions rather than as independent tasks running in parallel. In days of yore, processor clock frequencies rose steadily, letting single threads execute faster, but overheating issues led chip designers down the multicore path instead as the way to increase computing power.
"This isn't a full solution," Rattner said of the programming challenge. He said that from a programmer's perspective, the SCC is similar in many ways to a server with 48 cores.
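To make the programming challenge concrete, here is a minimal Python sketch of what turning a single-threaded job into independent tasks looks like. It is an illustration only, not Intel's software. The prime-counting workload and the function names are invented for the example, and only the 48-worker figure comes from the chip's core count.

```python
from concurrent.futures import ThreadPoolExecutor

CORES = 48  # core count quoted in the article


def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division; one independent task."""
    total = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += 1
    return total


def serial(limit):
    # The single-threaded version: one long stream of instructions.
    return count_primes(0, limit)


def split_into_tasks(limit, workers=CORES):
    # Cut [0, limit) into contiguous, non-overlapping ranges, at most one per worker.
    step = max(1, -(-limit // workers))  # ceiling division, never zero
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel(limit, workers=CORES):
    # Each range is counted independently; the partial counts are summed at the end.
    tasks = split_into_tasks(limit, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_primes(*r), tasks))
```

The hard part in practice is the step this toy problem makes trivial: finding pieces of work that don't depend on one another. Note also that threads in standard CPython do not execute Python code on several cores at once, so the sketch shows the decomposition rather than a speedup. Swapping in a process pool would be the usual route to real parallelism.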
While the chip may not have any silver bullets for the parallel programming challenge, it does have the advantage of some compatibility with existing computer designs. It can run ordinary software for Intel chips, unlike the increasingly capable graphics chips touted by Intel rivals Nvidia and Advanced Micro Devices.
"Our thrust is to maintain the compatibility and familiarity of the Intel architecture as we move to more and more performance," Rattner said. "That's why we could bring up Windows and Linux environments with relatively little effort."
The system is different in some ways, though, notably in its lack of cache coherency--technology that keeps data stored in each core's high-speed memory bank synchronized with the others on the chip. By contrast, Intel's Larrabee processor, a many-core x86 chip under development for graphics acceleration, is a cache-coherent design that has a large amount of real estate devoted to caching data.
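Without cache coherency, software can't assume that a value one core writes will automatically show up in another core's cache, so cores have to hand data to one another explicitly. The Python sketch below illustrates that message-passing style in general terms. It does not use any Intel API, and the worker and queue names are hypothetical. Each worker keeps private state and communicates only through its inbox and a shared results queue.

```python
import queue
import threading


def worker(core_id, inbox, outbox):
    # Private state: nothing here is visible to other workers unless it is sent.
    local_sum = 0
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work for this worker
            break
        local_sum += msg
    outbox.put((core_id, local_sum))  # explicit message back to the coordinator


def run(values, n_workers=4):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, inboxes[i], results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Deal the values out round-robin, one message at a time.
    for i, v in enumerate(values):
        inboxes[i % n_workers].put(v)
    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()
    # Collect one (core_id, partial_sum) message per worker.
    return dict(results.get() for _ in range(n_workers))
```

Calling `run(range(10), 4)` returns each worker's partial sum keyed by worker number. The partial sums add up to 45, the same total a single thread would compute.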
100 chips for research partners
Intel hopes to encourage academics and others to tackle programming challenges on the chip. To that end, Intel plans to share 100 SCC-based systems with various partners in industry and academia.
Microsoft is one such partner. "We're very excited about this as a research vehicle," said Jim Larus, director of cloud-computing futures at Microsoft Research.
One major feature of the SCC design is a high-speed "mesh" network that lets each of the 48 cores communicate with others or with the four linked memory controllers. The first-generation Tera-scale chip had such a network, but the second-generation mesh consumes only a third of the power and is accelerated with built-in hardware instructions for minimum communication delays, Rattner said.
That fast communication was designed in part as a response to what Intel's industry partners wanted, Rattner said. "They were looking for extremely low latency--not just core to core at the chip level, but interchip as well," he said. Each link on the chip can carry 64 gigabytes of data per second.
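A rough way to reason about a mesh is to count how many links a message crosses between two cores. The Python sketch below does that under assumptions the article does not state: that the 24 dual-core modules sit in a 6-by-4 grid, that cores are numbered two per module in row order, and that messages travel along one axis and then the other. Only the dual-core modules and the 64-gigabyte-per-second link rate come from Intel's figures, with gigabytes read here as decimal.

```python
MESH_COLS, MESH_ROWS = 6, 4   # assumed layout: 6 x 4 = 24 modules
CORES_PER_TILE = 2            # from the article: dual-core modules
LINK_BYTES_PER_SEC = 64e9     # from the article: 64 GB/s per link (decimal GB assumed)


def tile_of(core):
    """Map a core number (0..47, hypothetical numbering) to its (x, y) grid position."""
    if not 0 <= core < MESH_COLS * MESH_ROWS * CORES_PER_TILE:
        raise ValueError("core out of range")
    tile = core // CORES_PER_TILE
    return tile % MESH_COLS, tile // MESH_COLS


def hops(src_core, dst_core):
    """Links crossed when routing along x, then y (the Manhattan distance)."""
    (sx, sy), (dx, dy) = tile_of(src_core), tile_of(dst_core)
    return abs(sx - dx) + abs(sy - dy)


def link_time_seconds(n_bytes):
    """Ideal time to push n_bytes over one link, ignoring latency and contention."""
    return n_bytes / LINK_BYTES_PER_SEC
```

Under these assumptions, two cores in the same module are zero hops apart, and the farthest pair, at opposite corners of the grid, is eight hops apart.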
Better power management is one element of the new design. The chip cores can be switched on or off as the chip is running.
"It's extremely clever, because it means the processor could be run in an adaptive mode. Processors could be turned on and off depending on the applications," said Jon Peddie, an analyst with Jon Peddie Research.
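The adaptive mode Peddie describes can be pictured as a simple policy that powers up only as many cores as the current workload needs. The Python sketch below is hypothetical: the one-task-per-core rule, the function names, and the minimum of one active core are assumptions made for illustration, not a description of how the SCC's power management works.

```python
import math

TOTAL_CORES = 48  # from the article


def cores_needed(runnable_tasks, tasks_per_core=1, min_cores=1):
    """Hypothetical policy: power just enough cores to cover the runnable tasks."""
    wanted = math.ceil(runnable_tasks / tasks_per_core)
    return max(min_cores, min(TOTAL_CORES, wanted))


def plan(load_trace):
    """Yield (tasks, cores_on, cores_off) for each sample of a load trace."""
    for tasks in load_trace:
        on = cores_needed(tasks)
        yield tasks, on, TOTAL_CORES - on
```

For a trace of 0, 10, and 200 runnable tasks, this policy keeps 1, 10, and 48 cores on, respectively, leaving the rest switched off.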
Overall, the chip consumes between 25 and 125 watts, Rattner said. It's built using a 45-nanometer manufacturing process.
It consists of 24 dual-core modules linked together. A computer based on the chip can accommodate a maximum of 64GB of memory.
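Those figures lend themselves to some back-of-the-envelope arithmetic, shown here as a short Python calculation. The per-core numbers are simple averages worked out from the article's totals, not figures Intel quoted. The power average spreads the whole chip's consumption, mesh and memory controllers included, across the cores.

```python
MODULES, CORES_PER_MODULE = 24, 2     # from the article
MEMORY_CONTROLLERS = 4                # from the article
MAX_MEMORY_GB = 64                    # from the article
POWER_MIN_W, POWER_MAX_W = 25, 125    # from the article

cores = MODULES * CORES_PER_MODULE                     # 24 * 2
memory_per_core_gb = MAX_MEMORY_GB / cores             # average share per core
memory_per_controller_gb = MAX_MEMORY_GB / MEMORY_CONTROLLERS
watts_per_core = (POWER_MIN_W / cores, POWER_MAX_W / cores)

print(cores, round(memory_per_core_gb, 2), memory_per_controller_gb,
      [round(w, 2) for w in watts_per_core])
# prints: 48 1.33 16.0 [0.52, 2.6]
```

In other words, a fully populated system averages about 1.3GB of memory and between roughly half a watt and 2.6 watts per core.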