Cell processor to power Wii successor?

File it under rumours, but the latest in a slow trickle of supposed leaks is that the Wii’s successor will use the Cell processor. Jai Menon, a chief technical officer at IBM, had the following to say:

IBM's Cell microprocessor

Don't be fooled by its size; the Cell processor means serious business.

We want to stay in the [console] business, we intend to stay in the business. I think you’ll see [Cell] integrated into our future Power road map. That’s the way to think about it as opposed to a separate line – it’ll just get integrated into the next line of things that we do. We’re working with all of the game folks to provide our capabilities into those next-generation machines.

If the business speak is too much for you, he’s basically saying that IBM intends to use the Cell processor in future consoles. What’s the big hullabaloo about this CPU, then? Well, it’s the beast that powers the PlayStation 3, the most powerful console of this generation. If Nintendo were to use it (or rather, an updated version) in their next console, we would have one very powerful machine on our hands. It may be a little unusual for Nintendo to put effort into the hardware for once, but they have had a recent change of heart with the 3DS, which produces graphics that could be mistaken for an Xbox 360’s without breaking a sweat. It’s quite likely that they’ll continue this in the home console market, seeing how many third parties refuse to develop for the Wii due to its lack of horsepower.

Source: Nintendo Life


Comments, Reactions, and General Hooliganism

  1. I see this as slightly disheartening, yet I’m getting really excited. Since the PS3 uses multiple processors, the Wii may simply add more processors and call it a day. However, I’m afraid that while moving to a (now) couple-of-years-old technology may be Nintendo’s style, this may be the first time they copy hardware from another company. Still, it’s pretty powerful, and I’m actually starting to get excited for the Wii successor!

  2. Zyblorg

    I think #1 raises a certain point: Nintendo has always been about doing things “Nintendo’s Way.” Copying the tech of another console just doesn’t seem like their usual style, and I feel their philosophy will get in the way of them accepting this.

    • F0

      I thought about that, too; it would be unusual indeed for them to use other consoles’ tech (it’s more of a Microsoft thing to do). But the 3DS has already proven that they’re willing to change it up a little; when’s the last time they put graphics hardware first so boldly and proudly?

    • Wolvesgod

      They put graphics first in the GameCube, and weren’t able to put a lot behind it.

  3. wiiboy101

    seems a desperate comment to me, like they know they’re out and ARM or an ARM-like marvell has the contract so to speak. it seems they’re saying they’re trying to hang on to nintendo etc

    they’re talking of cell being used in powerpc. ps3 has no powerpc, nor does x360; they use very very basic in-order cores called powerpe not pc. the only console with a powerPC chip is wii, as it uses a real powerpc 750 with out-of-order execution and branch prediction, things x360 and ps3 lack

    maybe a broadway with loads of cell-like extra co-processors and a big speed bump would make a better chip than cell is now, and much smaller and cooler. add a big cache memory and it could be great

    but it all seems a bit weird to me. ARM or even an ati fusion seems more likely to me, unless IBM are so desperate they’re offering the world at a bargain price

  4. Zyblorg

    Well, I know Nintendo intends to start sales of the next home console at a higher price. They saw the negative impact on their end of selling the Wii at such a low price point so early on, and they don’t want to repeat that.

    • Wolvesgod

      Nintendo was making a $6 operating profit from when they first released the Wii.

  5. wiiboy101

    i’ve found a direct replacement for the wii cpu. its clearly evolved from the powerpc 750cl/gekko/broadway cpus.

    POWERPC 476FP: 32 bit g3/g4 type cpu with gekko/broadway-like 2×32 bit SIMD fpu. its a NEXT GEN wii family cpu..

    its built at 45nm, its only 3.6mm² in size and uses only 1.6 watts at 1.6GHz clockspeed. it is a new SOC (system on chip) version of broadway cpu and has a custom dual bus 128bit and can house 1 to 16 cpus on one chip combined with other processors like gpus… it can execute 5 instructions per clock, gekko was 2, broadway was 3 if i remember correctly… THIS IS 100% A NEXT GEN powerpc 750 cl/broadway family cpu

    powerpc 476fp, go google it. im telling you now i’ve found how wii 2 will be done… 4 x 476fp cpus at 1.6ghz would take up no more space than broadway does now…

  6. wiiboy101

    wii 2 cpu = powerpc 476fp type cpu or even a bunch of them. wii’s cpu family is now multi-cpu-on-same-chip ready and system-on-chip ready, and has a custom 128bit bus and a second custom bus for cache snooping. its looking a hot system on chip choice

    1 to 16 of these broadway 2s, so to call them, can fit on a single 45nm chip. im hoping wii 2 comes late so they can go 22nm and put more in there

    powerpc 476fp 1.6 watts at 1.6ghz vs broadway 3.2 watts at 729mhz

    i think a broadway version will be in wii 2 its the same cpu family directly

  7. wiiboy101

    Embedded Symmetric MultiProcessing system on a SoC with 1.6GHz PowerPC IP in 45nm

    By Gerard Boudon, IBM Microelectronics

    Abstract:

    Because the dimensions of lithography are now closer to the fundamental physical limits, scaling is more and more difficult, and thus multi-core processor solutions are just starting to become more popular in the embedded area. This paper describes in detail the features that allow SoCs to be built with up to eight 1.6 GHz PowerPC CPU cores in an embedded system supporting a Symmetric Multiprocessing (SMP) architecture. Balancing CPU execution speed, memory bandwidth and latency, and coherency overhead has been the objective of the design of the PLB6 and the L2 Cache IPs, to reduce as much as possible the drop-off in performance per core inherent in an SMP approach.

    1. Introduction

    In September 2009, IBM introduced [1] a 1.6 GHz PowerPC CPU IP in 45nm SOI, 3.6 mm² in size, that can be integrated in multi-core system-on-chip (SoC) product families for communication, storage, consumer, and aerospace and defense embedded applications.

    2. PowerPC476 IP

    The PowerPC 476FP embedded processor core is a 5-issue, 5-pipeline, superscalar, 32-bit reduced instruction set computer (RISC) processor. The core supports the Power Instruction Set Architecture V2.05. The core also supports memory coherency to broaden ASIC solutions into multiprocessing system environments and to increase its scalability.

    The overall organization of the processor core is shown in Figure 1.

    Instruction Path

    The PowerPC 476FP processor is a high-performance core capable of issuing up to 5 instructions per cycle. These instructions can feed in parallel the following five fixed-point units as well as the separate floating-point (FP) pipeline:

    * Branch pipeline
    * Load and Store operations
    * Simple arithmetic and logical operations
    * Simple and complex instruction pipeline
    * Multiplication and division pipeline

    L1 Cache

    The L1 32 KB Instruction and L1 32 KB Data caches have two-cycle pipelined cache accesses, with index of real address in the 1024-entry unified translation lookaside buffer (UTLB).

    The L1 Cache Address and Data Caches are snoopable. Early delivery of instructions to the floating-point unit is enabled because all instructions are predecoded.

    Floating-Point Unit

    The floating-point unit (FPU) is a pipelined, double-precision math computation processing unit that is attached to the processor core. The FPU conforms to the IEEE Standard for Binary Floating-Point Arithmetic. The FPU is a six-stage, super-pipelined floating-point arithmetic execution unit with independent floating-point load-and-store and execution units.

    Figure 1: PowerPC476 CPU core block diagram

    Power Saving

    The PPC476FP includes design features to minimize its operating power:

    * All latches are clock gated so that idle functions do not waste power.
    * All non-executing and idle functions are disabled.
    * Static random access memory (SRAM) is partitioned so that only the required memory zone is enabled or selected.
    * Doze and idle sleep modes are available.
    * The central logic and the floating-point unit have separate clock enables.

    3. L2 Cache

    The L2 Cache IP [2] can be configured as 256 KB, 512 KB or 1 MB with a maximum of 4,096 entries. The L2 cache line is 128 bytes and the cache is 4-way set associative. To support the high RAS (Reliability, Availability and Serviceability) requirements of networking applications, the L2 Cache arrays are protected by parity and ECC bits.
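
As a rough illustration of the sizes quoted above, the number of sets in a set-associative cache follows from capacity / (line size × ways). The sketch below applies that standard relation to the three L2 configurations. The function name `l2_geometry` is made up for this example, KB and MB are assumed to be powers of two, and the "4,096 entries" figure is not modelled because the text does not say what an entry is.

```python
# Sketch: derive line and set counts for the three L2 sizes named in the text.
# Assumption: KB/MB mean 1024 and 1024*1024 bytes; l2_geometry is a hypothetical helper.

LINE_BYTES = 128   # cache line size stated in the text
WAYS = 4           # 4-way set associative, as stated in the text


def l2_geometry(size_bytes: int) -> dict:
    """Return line count, set count, and address-bit split for one L2 size."""
    lines = size_bytes // LINE_BYTES
    sets = lines // WAYS
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": LINE_BYTES.bit_length() - 1,  # log2 of the line size
        "index_bits": sets.bit_length() - 1,         # log2 of the set count
    }


if __name__ == "__main__":
    for label, size in (("256 KB", 256 * 1024),
                        ("512 KB", 512 * 1024),
                        ("1 MB", 1024 * 1024)):
        print(label, l2_geometry(size))
```

Under these assumptions, doubling the cache size adds one index bit while the 7-bit line offset stays fixed.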

    4. PLB6

    The 476FP “subsystem”, which includes the PowerPC 476FP CPU core and the Level 2 cache/cache controller, is connected to other “subsystems” through the PLB6, the latest architectural extension of the CoreConnect local bus architecture. This structure enables SoC designers to easily and rapidly develop entire families of products, scaling the number of “master” cores from 1 to 16 (including 1 to 8 coherent CPU cores) on the bus. The PLB4 CoreConnect internal bus is a shared bus of 128-bit data at a maximum speed of one fourth of the CPU speed. It was designed for sub-1GHz CPU cores. High performance is achieved with a dual bus structure: one bus with high throughput and the second with low latency. Each of them is independently capable of handling read and write operations at the same time.

    The new IBM CoreConnect PLB6 bus looks more like a fabric with high speed point to point links, with each of them having 128 bit Read and 128 bit Write Data paths at one half of the CPU clock speed.

    The bus fabric on the PLB6 is capable of supporting up to 8 coherent master elements, giving SoC designers the flexibility to mix and match I/O masters, processors and other accelerators within the fabric.

    The high throughput of this bus is due to its fabric structure with up to eight slave segments that can simultaneously receive or transmit data.

    Each slave segment may have up to 4 slaves. It is possible, in 45nm technology and without any preplacement in silicon, to operate the bus structure at up to 800MHz.
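
A back-of-the-envelope check of the figures above: a 128-bit data path moves 16 bytes per clock, so at half of a 1.6 GHz CPU clock (800 MHz) each read or write path peaks at roughly 12.8 GB/s. The sketch below only does that multiplication. It is a theoretical peak that ignores arbitration and protocol overhead, and the helper name `peak_bandwidth_gb_s` is hypothetical.

```python
# Sketch: theoretical peak bandwidth of one PLB6 data path.
# Assumptions: 1 GB = 1e9 bytes; no arbitration or protocol overhead is modelled.

CPU_CLOCK_HZ = 1.6e9               # CPU clock quoted in the text
PLB6_CLOCK_HZ = CPU_CLOCK_HZ / 2   # "one half of the CPU clock speed" -> 800 MHz
PATH_WIDTH_BITS = 128              # width of each read and each write path


def peak_bandwidth_gb_s(width_bits: int, clock_hz: float) -> float:
    """Bytes moved per second on one data path, expressed in GB/s."""
    bytes_per_cycle = width_bits / 8
    return bytes_per_cycle * clock_hz / 1e9


if __name__ == "__main__":
    one_way = peak_bandwidth_gb_s(PATH_WIDTH_BITS, PLB6_CLOCK_HZ)
    print(f"read path:  {one_way:.1f} GB/s")
    print(f"write path: {one_way:.1f} GB/s")
    # Read and write paths are separate, so both can be busy at once.
    print(f"combined:   {2 * one_way:.1f} GB/s")
```

For comparison, the same helper applied to the PLB4 limits in the text (128 bits at one fourth of a sub-1 GHz clock) gives a much lower ceiling, which is consistent with the paper's motivation for PLB6.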

    Symmetric Multiprocessing (SMP)

    In order to guarantee coherency between data in main memory and data in the various caches, the design of a conventional SMP system follows the MESI protocol:

    * M Modified
    * E Exclusive
    * S Shared
    * I Invalid

    These states are associated with each cache line (L2 for the PPC476). Each CPU performs snooping operations in which these cache states are used. Note that the caches in an SMP architecture must have the same cache line size and the same MESI states.

    The performance of such a coherent SMP system is limited by the fact that transactions are possible only between the cache and the main memory. For example, when CPU1 wants to read data that is in the M (Modified) state in the cache of CPU2, the first operation is for CPU2 to write the data to memory, and then CPU1 can read it. Result: 2 operations with 2 memory accesses are needed for CPU1 to get the data.

    With the symmetric multiprocessing architecture, scaling up the number of processors is efficient only if the hardware coherency is smart enough to handle the huge bandwidth demand of the coherence transactions.

    It is necessary to have a non-blocking coherence resolution which avoids stopping CPU execution most of the time. In the PPC476, three additional states are introduced in the L2 cache in order to allow cache-to-cache transfers and better atomic operations.

    The data transfer is eased by a dedicated path between the different subunits, called the intervention data path. These 3 states are:

    * MU Modified Unsolicited
    * T Tagged
    * SL Shared Last

    The purpose of intervention by a CPU master is to reduce the latency needed to fetch a cache line when it is not present in its own L2 but is present in other L2s.

    The SL (Shared Last) state is used for intervention. It designates one (and only one) cache responsible for providing the data after an intervention. As a result of an intervention among L2s, a cache-to-cache transfer is done instead of a memory access after an L2 miss.
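
The difference between the two schemes described above can be sketched as a toy model that counts main-memory accesses for a single read miss. This is not IBM's implementation: each cache is just a dictionary from address to state, the function names are hypothetical, and the exact transitions chosen for the intervention case (previous owner to T or S, reader to SL) are assumptions made for illustration, since the text does not spell them out.

```python
# Toy model: main-memory accesses for one read miss under plain MESI
# versus cache-to-cache intervention. Each cache is a dict {address: state}.
# Transitions marked "assumed" are illustrative guesses, not taken from the paper.


def read_plain_mesi(reader: dict, owner: dict, addr: int) -> int:
    """Memory accesses when data can only move between a cache and memory."""
    state = owner.get(addr, "I")
    accesses = 0
    if state == "M":
        accesses += 1            # owner first writes the modified line to memory
    if state != "I":
        owner[addr] = "S"        # owner keeps a shared copy
    accesses += 1                # reader then fetches the line from memory
    reader[addr] = "S" if state != "I" else "E"
    return accesses


def read_with_intervention(reader: dict, owner: dict, addr: int) -> int:
    """Memory accesses when another L2 may supply the line directly."""
    state = owner.get(addr, "I")
    if state == "I":
        reader[addr] = "E"       # nobody can intervene: fetch from memory
        return 1
    # Line travels over the intervention data path, so memory is not touched.
    owner[addr] = "T" if state in ("M", "MU", "T") else "S"   # assumed
    reader[addr] = "SL"          # assumed: newest holder is the designated supplier
    return 0


if __name__ == "__main__":
    addr = 0x1000
    print("plain MESI:  ", read_plain_mesi({}, {addr: "M"}, addr), "memory accesses")
    print("intervention:", read_with_intervention({}, {addr: "M"}, addr), "memory accesses")
```

In this model the Modified-line case from the text costs two memory accesses under plain MESI and none with intervention, which is the latency saving the paper is describing.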

    5. Example of SoC implementation

    An example of a system implementation is shown in figure 4. The PLB6 is mainly reserved for high-speed access and for handling the memory coherency required by the use of multiple CPU cores. The system memory is also attached to the PLB6 because fast access to memory is very important for running code and for providing data not already in cache. Due to the high speed of the CPU, it is necessary to provide data from main memory at a speed that only the latest generation of DDR3-1600MHz SDRAM can give.

    A SoC also requires high-speed I/Os, which are attached here below a PLB6-to-PLB4 interface. These I/Os are commonly PCI Express with second-generation 5 Gbps per port throughput. Legacy Ethernet is also mandatory because it is important at least for loading code into the system. Other IP blocks such as USB or SATA can be connected through an AXI bus, for example.

    6. Physical implementation

    SOI: In order to reduce power and electrical leakage, the choice of 45nm silicon-on-insulator (SOI) technology was made. SOI can provide up to a 30% chip performance improvement and 40% power reduction, compared to standard bulk silicon technology [3]. This technology is used by IBM in a wide range of application-specific integrated circuits (ASIC) and foundry clients as well as in chips for its servers and storage products.

    For performance and power dissipation optimization, the PowerPC476FP CPU IP has been designed as a hard core, while for flexibility in personalization, the L2 cache and PLB6 are synthesizable; see the layout of the PPC476 core in figure 5. The following table indicates various areas of the IP block necessary to build an SMP system.

    7. Design verification by emulation

    Functional verification of the PowerPC 476FP CPU, L2 cache and PLB6 bus complex made extensive use of hardware emulation through a custom multi-core FPGA based test board. In addition to greatly speeding up the design verification effort, this emulation platform has provided for Linux kernel and device driver configuration and testing, and it is providing for extensive early code development and benchmarking.

    8. Conclusions

    It has been agreed in the industry that the future of embedded systems is multi-core. The PPC476 includes 3 key IP cores: the 1.6GHz CPU, the L2 cache and the PLB6, which, combined together, help SoC designers to build embedded systems with the highest performance. This performance is achieved with the coherence of data managed by hardware assist.

  8. F0

    Wowie. You seem very confident about the PPC476. Clearly, you’re far more of a techie than me, but from what I can understand, it would be 100% backwards-compatible with Dolphin and Broadway software while offering exponentially more power as a next-gen CPU should. Sounds pretty awesome, and I’m assuming it’s fairly low-cost, too, amirite?

    Thanks for the info; I’ll write a post about it.

  9. wiiboy101

    im not saying that exact chip, im saying the wii family of 32bit chips clearly lives on and it allows both a next gen wii so to speak and 100% backward compatibility…..

    the chip above is basically a shrunk, high clock speed, high bus speed, 128 bit bus version of broadway give or take

    the dual fpu unit pipes are clearly as found on gekko, the first ever cpu to do that on a G3 chip, then powerpc 750cl and broadway, the only 3 designs with that 2x SIMD engine

    the above cpu clearly is a broadway base design including that 2x 32bit SIMD engine. it might lack the game-centric stuff and compression stuff broadway has, but its the same direct family of processors THATS WHAT IM SAYING DUDE

    im just trying to help this blog get off the ground. we shall see one day, but 100% backward compatibility and a smaller cooler design is right up nintendo’s street

    reminder: nintendo stated wii successor to be 100% backward compatible and smaller and cooler and cheaper to build

    this very system on chip system and cpu family allows EXACTLY THAT

    they may even go DMP gpu and leave hollywood on a separate chip with its 1t sram for backward compatibility. the new dmp gpu would be on chip with the above type of design

  10. F0

    Thanks for all that. Seriously. It does seem like a logical choice for Nintendo to use this chip. I just don’t remember them ever confirming that the Wii’s successor will, in fact, be backwards-compatible with Wii games; link to the source of that would be nice.

    The only reason I haven’t written formally about it yet is because I’d like to save this material for when I relaunch this blog under its new domain, wii2blog.com, which should become available any day now…

  11. wiiboy101

    ok dude just trying to help. neogaf (rollseyes) etc say the family of processors is over, i’ve proved otherwise. in ibm articles it states high performance and backward compatibility with legacy code, meaning g3/g4 powerpc 32bit like macs gamecubes wiis etc

    if we are waiting a lot longer for wii 2 then maybe an all new cpu etc or even further shrinking of the above. thats a lot of spec in a tiny area

    cell doesn’t make sense to me lol