



# Computational SRAM: Towards Efficient Near-Memory Computing through Tightly Coupled HW/SW Design

**Emanuele Valea**, Jean-Philippe Noel, Thaddée Bricout, Henri-Pierre Charles, Leo De La Fuente, Bastien Giraud, Maha Kooli, Benjamin Lacour, Manuel Pezzin, Maria Ramirez Corrales

2nd In-Memory Architectures and Computing Applications Workshop



M. Kooli et al., "Towards a Truly Integrated Vector Processing Unit for Memory-bound Applications Based on a Cost-competitive Computational SRAM Design Solution", ACM JETC, Vol. 18, Issue 2, No. 40, 2022, pp. 1–26.
J.-P. Noel et al., "A 35.6 TOPS/W/mm² 3-stage pipelined computational SRAM with adjustable form factor for highly data-centric applications", IEEE SSCL, Vol. 3, 2020, pp. 286–289.
J.-P. Noel et al., "Computational SRAM Design Automation using Pushed-Rule Bitcells for Energy-Efficient Vector Processing", DATE Conference, 2020, pp. 1187–1192.

## Computational SRAM (C-SRAM)

What about programming and **SW compatibility**?

- The proposed near-memory computing (NMC) solution is compatible with any existing memory macro (from off-the-shelf memory compilers or custom designs)
- It is coupled with a Vector Processing Unit (VPU) that executes NMC instructions issued by the host CPU
- A dedicated C-SRAM instruction set architecture (ISA) is supported
- In SoC architectures, the data RAM can advantageously be replaced by a C-SRAM
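The slide does not say how the host hands NMC instructions to the VPU. The Python sketch below only illustrates the general idea of a host CPU packing an operation and its line operands into one command word and storing it to the VPU. The opcode values, the 32-bit field layout and the `FakeCommandPort` stand-in for a memory-mapped register are all hypothetical.

```python
from enum import IntEnum


class Opcode(IntEnum):
    # Hypothetical opcode values: the real C-SRAM encoding is not given on this slide.
    COPY = 0x01
    BCAST = 0x02
    ADD = 0x10
    XOR = 0x20


def encode(op: Opcode, dst: int, src1: int, src2: int) -> int:
    """Pack a hypothetical 32-bit word: [opcode:8 | dst:8 | src1:8 | src2:8] (line indices)."""
    for field in (dst, src1, src2):
        if not 0 <= field < 256:
            raise ValueError("line index must fit in 8 bits")
    return (int(op) << 24) | (dst << 16) | (src1 << 8) | src2


class FakeCommandPort:
    """Stands in for the memory-mapped register the host would store to (assumed)."""

    def __init__(self) -> None:
        self.issued: list[int] = []

    def issue(self, word: int) -> None:
        self.issued.append(word)


port = FakeCommandPort()
port.issue(encode(Opcode.ADD, dst=3, src1=1, src2=2))  # line 3 <- line 1 + line 2
print(hex(port.issued[0]))  # 0x10030102
```

In this picture the data stays in the C-SRAM lines. Only the short command word travels from the CPU, which is the point of issuing NMC instructions instead of loading operands into the core.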

| Category   | Mnemonic                    | Description                                        |
|------------|-----------------------------|----------------------------------------------------|
| Memory     | copy                        | Copy a line into another line                      |
|            | bcast                       | Broadcast an 8/16/32-bit value to the whole line   |
|            | hswap                       | Horizontal 32/64-bit word swap                     |
| Logical    | slli, srli                  | Shift Left or Right Logical Immediate              |
|            | (n)and, (n)or,<br>(n)xor    | Logical AND, OR & XOR (and their negations)        |
| Arithmetic | add, sub                    | Arithmetic 8/16/32-bit Addition & Subtraction      |
|            | mullo, mulhi                | Arithmetic 8-bit integer Multiply                  |
|            | maclo                       | Arithmetic 8-bit integer Multiply-Accumulate       |

#### C-SRAM Instruction Set Architecture
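The operations in the ISA table work on whole memory lines split into independent lanes. The Python model below is a minimal sketch of that lane-wise behaviour for a few mnemonics. It assumes a 128-bit line, unsigned lanes, and that `mullo`/`mulhi` return the low/high byte of each 16-bit product and that `maclo` accumulates the low byte. None of these details is confirmed by the slide, and `hswap` and the shifts are left out because their exact semantics are not given.

```python
# Assumed line width: 128 bits (16 bytes). The slide does not state the C-SRAM line width.
LINE_BYTES = 16


def _lanes(line: bytes, width: int) -> list[int]:
    """Split a line into unsigned little-endian lanes of `width` bits (8, 16 or 32)."""
    step = width // 8
    return [int.from_bytes(line[i:i + step], "little") for i in range(0, len(line), step)]


def _join(lanes: list[int], width: int) -> bytes:
    """Pack unsigned lane values back into a line."""
    step = width // 8
    return b"".join(v.to_bytes(step, "little") for v in lanes)


def bcast(value: int, width: int) -> bytes:
    """bcast: replicate one 8/16/32-bit value into every lane of the line."""
    mask = (1 << width) - 1
    return _join([value & mask] * (LINE_BYTES * 8 // width), width)


def add(a: bytes, b: bytes, width: int) -> bytes:
    """add: lane-wise sum modulo 2**width; carries never cross lane boundaries."""
    mask = (1 << width) - 1
    return _join([(x + y) & mask for x, y in zip(_lanes(a, width), _lanes(b, width))], width)


def xor(a: bytes, b: bytes) -> bytes:
    """xor: a bitwise operation, so the lane width is irrelevant."""
    return bytes(x ^ y for x, y in zip(a, b))


def mullo(a: bytes, b: bytes) -> bytes:
    """mullo (assumed semantics): low byte of each unsigned 8-bit x 8-bit product."""
    return bytes((x * y) & 0xFF for x, y in zip(a, b))


def mulhi(a: bytes, b: bytes) -> bytes:
    """mulhi (assumed semantics): high byte of each unsigned 8-bit x 8-bit product."""
    return bytes((x * y) >> 8 for x, y in zip(a, b))


def maclo(acc: bytes, a: bytes, b: bytes) -> bytes:
    """maclo (assumed semantics): acc += low byte of a*b per 8-bit lane, wrapping mod 256."""
    return bytes((c + x * y) & 0xFF for c, x, y in zip(acc, a, b))


x = bcast(200, 8)
y = bcast(100, 8)
print(add(x, y, 8)[0])                 # 44: (200 + 100) mod 256
print(mullo(x, y)[0], mulhi(x, y)[0])  # 32 78: 200 * 100 = 20000 = 78 * 256 + 32
```

Splitting the 8-bit product into `mullo` and `mulhi` keeps every result the same width as its lane, so one line in still yields one line out.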



#### C-SRAM Testchip






## HybroGen compilation toolchain (open-source)

- HybroGen provides a compilation flow from C source code to a parallelized binary that uses the dedicated NMC instructions
- A dynamic compilation approach adapts NMC scheduling to the format and size of the input data
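To show why deferring code generation to run time helps, the toy Python scheduler below picks the lane width and the number of line operations for a vector addition only once the element format and the element count are known. It is not HybroGen's code generator or API. The 128-bit line width and the tuple-based "instruction" format are assumptions for illustration.

```python
import math

# Assumed values for illustration: neither the line width nor HybroGen's internal
# scheduling policy is described on this slide.
LINE_BITS = 128
LANE_WIDTHS = (8, 16, 32)  # element widths supported by add/sub in the ISA table


def schedule_vector_add(n_elems: int, elem_bits: int,
                        src1: int, src2: int, dst: int) -> list[tuple[str, int, int, int, int]]:
    """Toy run-time scheduler: emit one ('add', lane_width, dst, src1, src2) op per line.

    The lane width and the number of line operations are fixed only once the
    element format (elem_bits) and the data size (n_elems) are known.
    """
    width = next((w for w in LANE_WIDTHS if w >= elem_bits), None)
    if width is None:
        raise ValueError(f"{elem_bits}-bit elements exceed the widest lane")
    lanes_per_line = LINE_BITS // width
    n_lines = math.ceil(n_elems / lanes_per_line)
    return [("add", width, dst + i, src1 + i, src2 + i) for i in range(n_lines)]


ops_u8 = schedule_vector_add(1000, 8, src1=0, src2=100, dst=200)
ops_u16 = schedule_vector_add(1000, 16, src1=0, src2=200, dst=400)
print(len(ops_u8), len(ops_u16))  # 63 125
```

The same 1000-element addition needs half as many line operations with 8-bit data as with 16-bit data. A statically compiled binary would have to fix this choice in advance.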



M. Kooli et al., "Towards a Truly Integrated Vector Processing Unit for Memory-bound Applications Based on a Cost-competitive Computational SRAM Design Solution", ACM JETC, Vol. 18, Issue 2, No. 40, 2022, pp. 1-26.
K. Mambu et al., "Dedicated Instruction Set for Pattern-based Data Transfers: an Experimental Validation on Systems Containing In-Memory Computing Units," in IEEE TCAD

### **Come to the poster!**






- We show some **preliminary results** on several applications:
  - Image processing
  - Artificial Intelligence
  - Cryptography (post-quantum)

|                     | Sobel Filter<br>(Edge Detection) | Matrix<br>Multiplication | Frame<br>Differencing | Convolution<br>(TensorFlow)  |
|---------------------|----------------------------------|--------------------------|-----------------------|------------------------------|
| Speed-Up            | ~x4.3                            | ~x4.9                    | ~x7.7                 | ~x3.5                        |
| Energy<br>Reduction | ~x8.3                            | ~x13                     | ~x10.3                | ~x4                          |

K. Mambu et al., "Towards Integration of a Dedicated Memory Controller and Its Instruction Set to Improve Performance of Systems Containing Computational SRAM", J. Low Power Electron. Appl., Vol. 12, Art. 18, 2022.
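The results table reports both a speed-up and an energy reduction for each kernel, so the two can be combined into derived metrics. The short Python sketch below computes the energy-delay product (EDP) gain and the ratio of average power (P = E / T) from the approximate table values. It assumes both factors are measured against the same baseline and workload, which the slide does not specify.

```python
# "~x" values transcribed from the results table; the baseline they are measured
# against is not stated on this slide.
RESULTS = {
    "Sobel filter": (4.3, 8.3),
    "Matrix multiplication": (4.9, 13.0),
    "Frame differencing": (7.7, 10.3),
    "Convolution (TensorFlow)": (3.5, 4.0),
}


def edp_gain(speedup: float, energy_reduction: float) -> float:
    """EDP_base / EDP_new = (E_base / E_new) * (T_base / T_new)."""
    return energy_reduction * speedup


def power_ratio(speedup: float, energy_reduction: float) -> float:
    """P_new / P_base = (E_new / E_base) * (T_base / T_new), since P = E / T."""
    return speedup / energy_reduction


for app, (s, e) in RESULTS.items():
    print(f"{app}: EDP gain ~{edp_gain(s, e):.1f}x, power ratio ~{power_ratio(s, e):.2f}")
```

Under that assumption, the energy reduction exceeds the speed-up for all four kernels, so the power ratio stays below 1. The C-SRAM runs would therefore also draw lower average power than the baseline.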







# Thank you!

Emanuele Valea Email: <u>emanuele.valea@cea.fr</u>

