|Release date||2011 (Original); 2018 (Zen based)|
|Cores||2 to 8|
The AMD Accelerated Processing Unit (APU), formerly known as Fusion, is the marketing term for a series of 64-bit microprocessors from Advanced Micro Devices (AMD), designed to act as a central processing unit (CPU) and graphics processing unit (GPU) on a single die. APUs are general purpose processors that feature integrated graphics processors (IGPs).
AMD announced the first-generation APUs, Llano for high-performance and Brazos for low-power devices, in January 2011. The second-generation Trinity for high-performance and Brazos-2 for low-power devices were announced in June 2012. The third-generation Kaveri for high-performance devices was launched in January 2014, while Kabini and Temash for low-power devices were announced in the summer of 2013. Since the launch of the Zen microarchitecture, Ryzen and Athlon APUs have been released to the global market as Raven Ridge on the DDR4 platform, after Bristol Ridge a year prior.
The AMD Fusion project started in 2006 with the aim of developing a system on a chip that combined a CPU with a GPU on a single die. This effort was moved forward by AMD's acquisition of graphics chipset manufacturer ATI in 2006. The project reportedly required three internal iterations of the Fusion concept to create a product deemed worthy of release. Reasons contributing to the delay of the project include the technical difficulties of combining a CPU and GPU on the same die at a 45 nm process, and conflicting views on what the role of the CPU and GPU should be within the project.
The first generation desktop and laptop APU, codenamed Llano, was announced on 4 January 2011 at the 2011 CES show in Las Vegas and released shortly thereafter. It featured K10 CPU cores and a Radeon HD 6000-series GPU on the same die on the FM1 socket. An APU for low-power devices was announced as the Brazos platform, based on the Bobcat microarchitecture and a Radeon HD 6000-series GPU on the same die.
At a conference in January 2012, corporate fellow Phil Rogers announced that AMD would re-brand the Fusion platform as the Heterogeneous System Architecture (HSA), stating that "it's only fitting that the name of this evolving architecture and platform be representative of the entire, technical community that is leading the way in this very important area of technology and programming development." However, it was later revealed that AMD had been the subject of a trademark infringement lawsuit by the Swiss company Arctic, which used the name "Fusion" for a line of power supply products.
The second-generation desktop and laptop APU, codenamed Trinity, was announced at AMD's 2010 Financial Analyst Day and released in October 2012. It featured Piledriver CPU cores and Radeon HD 7000 series GPU cores on the FM2 socket. AMD released a new APU based on the Piledriver microarchitecture, codenamed Richland, on 12 March 2013 for laptops and mobile devices and on 4 June 2013 for desktops. The second-generation APU for low-power devices, Brazos 2.0, used exactly the same APU chip, but ran at a higher clock speed, rebranded the GPU as the Radeon HD 7000 series and used a new I/O controller chip.
A third generation of the technology was released on 14 January 2014, featuring greater integration between CPU and GPU. The desktop and laptop variant is codenamed Kaveri, based on the Steamroller architecture, while the low-power variants, codenamed Kabini and Temash, are based on the Jaguar architecture.
Since the introduction of Zen-based processors, AMD has branded its APUs as Ryzen with Radeon Graphics and Athlon with Radeon Graphics. Desktop units carry a G suffix on their model numbers (e.g. Ryzen 5 3400G and Athlon 3000G) to distinguish them from regular processors and from the former Bulldozer-era A-series APUs. The mobile counterparts were always paired with Radeon Graphics regardless of suffix.
In November 2017, HP released the Envy x360, featuring the Ryzen 5 2500U APU, the first 4th generation APU, based on the Zen CPU architecture and the Vega graphics architecture.
AMD is a founding member of the Heterogeneous System Architecture (HSA) Foundation and is consequently actively working on developing HSA in cooperation with other members. The following hardware and software implementations are available in AMD's APU-branded products:
|Type||HSA feature||First implemented||Notes|
|Optimized Platform||GPU Compute C++ Support||2012||Supports OpenCL C++ directions and Microsoft's C++ AMP language extension. This eases programming of the CPU and GPU working together to process parallel workloads.|
|Optimized Platform||HSA-aware MMU||2012||GPU can access the entire system memory through the translation services and page fault management of the HSA MMU.|
|Optimized Platform||Shared Power Management||2012||CPU and GPU now share the power budget. Priority goes to the processor most suited to the current tasks.|
|Architectural Integration||Heterogeneous Memory Management: the CPU's MMU and the GPU's IOMMU share the same address space.||2014||CPU and GPU now access the memory with the same address space. Pointers can now be freely passed between CPU and GPU, hence enabling zero-copy.|
|Architectural Integration||Fully coherent memory between CPU and GPU||2014||GPU can now access and cache data from coherent memory regions in the system memory, and also reference the data from CPU's cache. Cache coherency is maintained.|
|Architectural Integration||GPU uses pageable system memory via CPU pointers||2014||GPU can take advantage of the shared virtual memory between CPU and GPU, and pageable system memory can now be referenced directly by the GPU, instead of being copied or pinned before accessing.|
|System Integration||GPU compute context switch||2015||Compute tasks on GPU can be context switched, allowing a multi-tasking environment and also faster interpretation between applications, compute and graphics.|
|System Integration||GPU graphics pre-emption||2015||Long-running graphics tasks can be pre-empted so processes have low latency access to the GPU.|
|System Integration||Quality of service||2015||In addition to context switch and pre-emption, hardware resources can be either equalized or prioritized among multiple users and applications.|
|Mainstream||Llano||Trinity||Richland||Kaveri||Kaveri Refresh (Godavari)||Carrizo||Bristol Ridge||Raven Ridge||Picasso|
|Mainstream||Llano||Trinity||Richland||Kaveri||Carrizo||Bristol Ridge||Raven Ridge||Picasso|
|Basic||Desna, Ontario, Zacate||Kabini, Temash||Beema, Mullins||Carrizo-L||Stoney Ridge|
|Embedded||Trinity||Bald Eagle||Merlin Falcon,
|Great Horned Owl||Grey Hawk||Ontario, Zacate||Kabini||Steppe Eagle, Crowned Eagle,
|Prairie Falcon||Banded Kestrel|
|Platform||High, standard and low power||Low and ultra-low power|
|Released||Aug 2011||Oct 2012||Jun 2013||Jan 2014||2015||Jun 2015||Jun 2016||Oct 2017||Jan 2019||Mar 2020||Jan 2021||Jan 2011||May 2013||Apr 2014||May 2015||Feb 2016||Apr 2019|
|CPU microarchitecture||K10||Piledriver||Steamroller||Excavator||"Excavator+"||Zen||Zen+||Zen 2||Zen 3||Bobcat||Jaguar||Puma||Puma+||"Excavator+"||Zen|
|PCI Express version||2.0||3.0||2.0||3.0|
|Fab. (nm)||GF 32SHP
|Die area (mm2)||228||246||245||245||250||210||156||180||75 (+ 28 FCH)||107||?||125||149|
|Min TDP (W)||35||17||12||10||4.5||4||3.95||10||6|
|Max APU TDP (W)||100||95||65||18||25|
|Max stock APU base clock (GHz)||3||3.8||4.1||4.1||3.7||3.8||3.6||3.7||3.8||4.0||1.75||2.2||2||2.2||3.2||3.3|
|Max APUs per node[b]||1||1|
|Max CPU[c] cores per APU||4||8||2||4||2|
|Max threads per CPU core||1||2||1||2|
|i386, i486, i586, CMOV, NOPL, i686, PAE, NX bit, CMPXCHG16B, AMD-V, RVI, ABM, and 64-bit LAHF/SAHF|
|BMI1, AES-NI, CLMUL, and F16C||N/A|
|AVIC, BMI2 and RDRAND||N/A|
|ADX, SHA, RDSEED, SMAP, SMEP, XSAVEC, XSAVES, XRSTORS, CLFLUSHOPT, and CLZERO||N/A||N/A|
|WBNOINVD, CLWB, RDPID, RDPRU, and MCOMMIT||N/A||N/A|
|FPUs per core||1||0.5||1||1||0.5||1|
|Pipes per FPU||2||2|
|FPU pipe width||128-bit||256-bit||80-bit||128-bit|
|CPU instruction set SIMD level||SSE4a[e]||AVX||AVX2||SSSE3||AVX||AVX2|
|FMA4, LWP, TBM, and XOP||N/A||N/A||N/A||N/A|
|L1 data cache per core (KiB)||64||16||32||32|
|L1 data cache associativity (ways)||2||4||8||8|
|L1 instruction caches per core||1||0.5||1||1||0.5||1|
|Max APU total L1 instruction cache (KiB)||256||128||192||256||512||64||128||96||128|
|L1 instruction cache associativity (ways)||2||3||4||8||16||2||3||4|
|L2 caches per core||1||0.5||1||1||0.5||1|
|Max APU total L2 cache (MiB)||4||2||4||1||2||1|
|L2 cache associativity (ways)||16||8||16||8|
|APU total L3 cache (MiB)||N/A||4||8||16||N/A||4|
|APU L3 cache associativity (ways)||16||16|
|L3 cache scheme||Victim||N/A||Victim||Victim|
|Max stock DRAM support||DDR3-1866||DDR3-2133||DDR3-2133, DDR4-2400||DDR4-2400||DDR4-2933||DDR4-3200, LPDDR4-4266||DDR3L-1333||DDR3L-1600||DDR3L-1866||DDR3-1866, DDR4-2400||DDR4-2400|
|Max DRAM channels per APU||2||1||2|
|Max stock DRAM bandwidth (GB/s) per APU||29.866||34.132||38.400||46.932||68.256||?||10.666||12.800||14.933||19.200||38.400|
|GPU microarchitecture||TeraScale 2 (VLIW5)||TeraScale 3 (VLIW4)||GCN 2nd gen||GCN 3rd gen||GCN 5th gen||TeraScale 2 (VLIW5)||GCN 2nd gen||GCN 3rd gen||GCN 5th gen|
|GPU instruction set||TeraScale instruction set||GCN instruction set||TeraScale instruction set||GCN instruction set|
|Max stock GPU base clock (MHz)||600||800||844||866||1108||1250||1400||2100||2100||538||600||?||847||900||1200|
|Max stock GPU base GFLOPS[f]||480||614.4||648.1||886.7||1134.5||1760||1971.2||2150.4||?||86||?||?||?||345.6||460.8|
|3D engine[g]||Up to 400:20:8||Up to 384:24:6||Up to 512:32:8||Up to 704:44:16||Up to 512:32:8||80:8:4||128:8:4||Up to 192:?:?||Up to 192:?:?|
|Video decoder||UVD 3.0||UVD 4.2||UVD 6.0||VCN 1.0||VCN 2.1||VCN 2.2||UVD 3.0||UVD 4.0||UVD 4.2||UVD 6.0||UVD 6.3||VCN 1.0|
|Video encoder||N/A||VCE 1.0||VCE 2.0||VCE 3.1||N/A||VCE 2.0||VCE 3.1|
|AMD Fluid Motion|
|GPU power saving||PowerPlay||PowerTune||PowerPlay||PowerTune|
|PlayReady[h]||N/A||3.0 not yet||N/A||3.0 not yet|
|Supported displays[i]||2-3||2-4||3||3 (desktop)
4 (mobile, embedded)
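The bandwidth and GFLOPS rows in the table above appear to follow from simple peak-rate arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a 64-bit data bus per DRAM channel and two floating-point operations per shader per clock (one fused multiply-add), neither of which is stated in the table itself. Under those assumptions it reproduces several of the listed figures.

```python
def peak_dram_bandwidth_gbs(transfer_rate_mts, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_mts: memory transfer rate in MT/s (e.g. 2400 for DDR4-2400).
    channels: number of populated memory channels.
    bus_width_bits: data width of one channel (assumed 64 bits here).
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


def peak_gpu_gflops(shaders, clock_mhz, flops_per_shader_per_clock=2):
    """Theoretical peak single-precision GFLOPS of the integrated GPU.

    shaders: number of unified shader processors (first number of the
             "3D engine" triple, e.g. 400 in 400:20:8).
    clock_mhz: GPU clock in MHz.
    flops_per_shader_per_clock: assumed 2 (one multiply-add per clock).
    """
    return shaders * flops_per_shader_per_clock * clock_mhz / 1000


if __name__ == "__main__":
    # Dual-channel DDR4-2400 and single-channel DDR3L-1600
    print(f"{peak_dram_bandwidth_gbs(2400, 2):.3f}")  # 38.400
    print(f"{peak_dram_bandwidth_gbs(1600, 1):.3f}")  # 12.800

    # 400 shaders at 600 MHz, 384 at 800 MHz, 704 at 1250 MHz
    print(f"{peak_gpu_gflops(400, 600):.1f}")   # 480.0
    print(f"{peak_gpu_gflops(384, 800):.1f}")   # 614.4
    print(f"{peak_gpu_gflops(704, 1250):.1f}")  # 1760.0
```

These are upper bounds derived from clock rates and bus widths; sustained throughput in real workloads is lower, and the CPU and GPU share the same memory bandwidth.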
AMD APUs have a unique architecture: they have AMD CPU modules, cache, and a discrete-class graphics processor, all on the same die using the same bus. This architecture allows compute APIs such as OpenCL to use the integrated graphics processor as an accelerator. The goal is to create a "fully integrated" APU, which, according to AMD, will eventually feature 'heterogeneous cores' capable of processing both CPU and GPU work automatically, depending on the workload requirement.
The first-generation APU, released in June 2011, was used in both desktops and laptops. It was based on the K10 architecture and built on a 32 nm process, featuring two to four CPU cores with a thermal design power (TDP) of 65 to 100 W, and integrated graphics based on the Radeon HD 6000 series with support for DirectX 11, OpenGL 4.2 and OpenCL 1.2. In performance comparisons against the similarly priced Intel Core i3-2105, the Llano APU was criticised for its poor CPU performance and praised for its better GPU performance. AMD was later criticised for abandoning Socket FM1 after one generation.
The AMD Brazos platform was introduced on 4 January 2011, targeting the subnotebook, netbook and low-power small form factor markets. It features the 9-watt AMD C-Series APU (codename: Ontario) for netbooks and low-power devices as well as the 18-watt AMD E-Series APU (codename: Zacate) for mainstream and value notebooks, all-in-ones and small form factor desktops. Both APUs feature one or two Bobcat x86 cores and a Radeon Evergreen series GPU with full DirectX 11, DirectCompute and OpenCL support, including UVD3 video acceleration for HD video up to 1080p.
AMD expanded the Brazos platform on 5 June 2011 with the announcement of the 5.9-watt AMD Z-Series APU (codename: Desna), designed for the tablet market. The Desna APU is based on the 9-watt Ontario APU. Energy savings were achieved by lowering the CPU, GPU and northbridge voltages, reducing the idle clocks of the CPU and GPU, and introducing a hardware thermal control mode. A bidirectional turbo core mode was also introduced.
AMD announced the Brazos-T platform on 9 October 2012. It comprised the 4.5-watt AMD Z-Series APU (codenamed Hondo) and the A55T Fusion Controller Hub (FCH), designed for the tablet computer market. The Hondo APU is a redesign of the Desna APU. AMD lowered energy use by optimizing the APU and FCH for tablet computers.
The Deccan platform, including the Krishna and Wichita APUs, was cancelled in 2011. AMD had originally planned to release them in the second half of 2012.
The first iteration of the second-generation platform, released in October 2012, brought improvements in CPU and GPU performance to both desktops and laptops. The platform features two to four Piledriver CPU cores built on a 32 nm process with a TDP between 65 W and 100 W, and a GPU based on the Radeon HD 7000 series with support for DirectX 11, OpenGL 4.2, and OpenCL 1.2. The Trinity APU was praised for the improvements to CPU performance compared to the Llano APU.
In January 2013 the Jaguar-based Kabini and Temash APUs were unveiled as the successors of the Bobcat-based Ontario, Zacate and Hondo APUs. The Kabini APU is aimed at the low-power, subnotebook, netbook, ultra-thin and small form factor markets, while the Temash APU is aimed at the tablet, ultra-low power and small form factor markets. The two to four Jaguar cores of the Kabini and Temash APUs feature numerous architectural improvements in power consumption and performance, such as support for newer x86 instructions, higher instructions per cycle (IPC), a CC6 power state mode and clock gating. Kabini and Temash are AMD's first quad-core x86-based SoCs, and also the first from any vendor. The integrated Fusion Controller Hubs (FCH) for Kabini and Temash are codenamed "Yangtze" and "Salton", respectively. The Yangtze FCH features support for two USB 3.0 ports, two SATA 6 Gbit/s ports, as well as the xHCI 1.0 and SD/SDIO 3.0 protocols for SD-card support. Both chips feature DirectX 11.1-compliant GCN-based graphics as well as numerous HSA improvements. They were fabricated on a 28 nm process in an FT3 ball grid array package by Taiwan Semiconductor Manufacturing Company (TSMC), and were released on 23 May 2013.
The PlayStation 4 and Xbox One were revealed to both be powered by 8-core semi-custom Jaguar-derived APUs.
The third generation of the platform, codenamed Kaveri, was partly released on 14 January 2014. Kaveri contains up to four Steamroller CPU cores clocked to 3.9 GHz with a turbo mode of 4.1 GHz, up to a 512-core Graphics Core Next GPU, two decode units per module instead of one (which allows each core to decode four instructions per cycle instead of two), AMD TrueAudio, support for the Mantle API, and an on-chip ARM Cortex-A5 MPCore; it was released with a new socket, FM2+. Ian Cutress and Rahul Garg of AnandTech asserted that Kaveri represented the unified system-on-a-chip realization of AMD's acquisition of ATI. The performance of the 45 W A8-7600 Kaveri APU was found to be similar to that of the 100 W Richland part, leading to the claim that AMD had made significant improvements in on-die graphics performance per watt; however, CPU performance was found to lag behind similarly specified Intel processors, a lag that was unlikely to be resolved in the Bulldozer family APUs. The A8-7600 component was delayed from a Q1 launch to an H1 launch because the Steamroller architecture components allegedly did not scale well at higher clock speeds.
AMD announced the release of the Kaveri APU for the mobile market on 4 June 2014 at Computex 2014, shortly after the accidental announcement on the AMD website on 26 May 2014. The announcement included components targeted at the standard voltage, low-voltage, and ultra-low voltage segments of the market. In early-access performance testing of a Kaveri prototype laptop, AnandTech found that the 35 W FX-7600P was competitive with the similarly priced 17 W Intel i7-4500U in synthetic CPU-focused benchmarks, and was significantly better than previous integrated GPU systems on GPU-focused benchmarks. Tom's Hardware reported the performance of the Kaveri FX-7600P against the 35 W Intel i7-4702MQ, finding that the i7-4702MQ was significantly better than the FX-7600P in synthetic CPU-focused benchmarks, whereas the FX-7600P was significantly better than the i7-4702MQ's Intel HD 4600 iGPU in the four games that could be tested in the time available to the team.