Mobile processor designer Arm will make your Android phone -- and maybe your ultralight Windows laptop -- run faster with a new chip generation, the Cortex-A76.
Arriving in Android smartphones in 2019, the new Arm Cortex-A76 CPU, based on DynamIQ technology, promises to deliver laptop-class performance while maintaining the power efficiency of a smartphone. This technology has already been applied to Windows 10 Always Connected PCs with integrated Arm-based SoCs from Qualcomm, which deliver 20-plus hours of battery life, along with LTE connectivity and access to the Windows app ecosystem.
The A76 packs up to 2 Mbytes of L2 cache and 4 Mbytes of L3, and runs at more than 3 GHz in a 7-nm node. It aims to deliver 90% of the SPECint2006 performance of an Intel mobile Skylake chip in one-fourth the area and at half the power, or roughly the same performance in thermally constrained systems.
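To put those ratios in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the 90% performance, half-power and quarter-area figures all apply at once to the same workload, which Arm's claim does not spell out. The function name `relative_efficiency` is purely illustrative.

```python
def relative_efficiency(perf_ratio, power_ratio, area_ratio):
    """Return (perf-per-watt, perf-per-area) relative to a baseline chip.

    Every argument is a ratio against the baseline (1.0 means "same as baseline").
    """
    return perf_ratio / power_ratio, perf_ratio / area_ratio


# Figures quoted above: 90% of the performance, half the power, a quarter of the area.
# Assumption: all three ratios hold simultaneously for the same workload.
perf_per_watt, perf_per_area = relative_efficiency(0.90, 0.50, 0.25)
print(f"perf/watt: {perf_per_watt:.1f}x, perf/area: {perf_per_area:.1f}x")
```

Under that assumption, the claim works out to roughly 1.8x the performance per watt and 3.6x the performance per unit of area of the Skylake part.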
Compared to an A72 core at 10 nm, a 7-nm A76 should deliver 35% more performance or use 40% less power. That's a step up from the 15% to 25% increases that Arm typically delivers with its annual core upgrades. In its day, the A72 delivered about 75% of the performance of Intel's mobile Broadwell processors.
The comparisons are based on CPUs running at similar frequencies, since Intel's chips typically support higher frequencies than Arm's cores.
The A76 aims to expand Arm's dominance in smartphones into laptops with 4+4 A76/A55 configurations sporting large caches.
The Cortex-A76 is also said to deliver 4x compute performance improvements for AI/ML at the edge, enabling responsive, secure experiences on PCs and smartphones.
"We think we've turned a corner relative to the overall performance curve," Rene Haas, president of Arm's intellectual property group, said at a press conference Thursday in San Francisco. He promised "laptop-class performance" and said it should compete with Intel's high-end Core i7 models.
The Cortex-A76 includes microarchitectural improvements that raise performance, either through an instructions-per-cycle (IPC) uplift or through deeper memory-level parallelism.
Some of the key enhancements include:
- Decoupled branch prediction and instruction fetch: Built to hide latency at high bandwidth, the in-order Cortex-A76 front end can fetch 4 to 8 instructions per cycle, using multi-level branch target caches and a hybrid indirect predictor to sustain maximum throughput.
- A wider machine: Cortex-A76 is Arm's first 4-wide decode core, increasing the maximum instructions-per-cycle capability. Up to 8 operations per cycle can then be dispatched to the out-of-order core, supporting a wider area- and power-optimized instruction window.
- More integer and vector execution throughput: Quad-issue integer units are integrated in the core, including three simple ALUs and one multi-cycle integer unit. Moreover, Cortex-A76 supports dual-issue native 16B (128-bit) vector and floating-point units, twice the throughput of any previous Arm CPU. Crucially, this helps deliver the 4x ML performance improvement mentioned earlier.
- Enhanced memory system: The full cache hierarchy is co-optimized for latency and bandwidth, with a sophisticated 4th-generation prefetcher and deep memory-level parallelism.
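As a rough illustration of what the dual-issue 128-bit vector units mean in practice, the sketch below computes a theoretical peak number of element operations per cycle for several element widths. It is an upper bound only. It assumes both vector pipes can execute the same operation every cycle, and it ignores dispatch limits, instruction latency and memory bandwidth. The helper name `peak_vector_lanes_per_cycle` is hypothetical.

```python
def peak_vector_lanes_per_cycle(pipes, vector_bits, element_bits):
    """Theoretical peak element operations per cycle across all vector pipes."""
    lanes_per_pipe = vector_bits // element_bits
    return pipes * lanes_per_pipe


PIPES = 2          # dual-issue vector/floating-point units (from the list above)
VECTOR_BITS = 128  # native 16-byte vectors (from the list above)

# Narrower elements pack more lanes into each 128-bit vector.
for element_bits in (64, 32, 16, 8):
    lanes = peak_vector_lanes_per_cycle(PIPES, VECTOR_BITS, element_bits)
    print(f"{element_bits:>2}-bit elements: up to {lanes} ops/cycle")
```

On these assumptions, 32-bit elements top out at 8 operations per cycle and 8-bit elements at 32.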
Along with the new suite of system IP, Arm offers POP technology that supports the Cortex-A76 and its LITTLE core companion, the Cortex-A55, on various process technologies. The Cortex-A76 POP IP for TSMC 16FFC delivers the fastest performance in one of the most cost-effective process technologies available. For those looking for leading-edge process technologies and targeting premium and high-end applications, the Cortex-A76 and Cortex-A55 POP IPs for TSMC 7FF will also be available by Q4 2017.
Arm claims that the latest 7-nm nodes will only deliver 2% to 3% more speed than the 16-nm node.
The company also announced two accompanying chips, the Mali-G76 graphics processor and the Mali-V76 video chip.