Mapping Spiking Neural Networks: a paper round-up and some thoughts
First of all, thanks to the platform for its support; SNN mapping is not a challenge I face alone. After working through the literature last year I realized that innovation is not easy, since optimization tends to follow something like biological evolution, and only when you implement things yourself do you discover how far your results fall short of others'. So I decided to step away from this area for a while. There were plenty of reasons: the experimental setups are complex, the simulation data are large, and the algorithm-tuning space is huge, so compared with others I felt out of my depth. It was genuinely discouraging.
Tedious routine work is off-putting, but as someone who always finishes his tasks conscientiously I am still willing to shoulder the load and do my part; responsibility weighs heavier than Mount Tai, after all. Deep down, though, there was always some resistance, and if I had acted earlier to adjust my mindset things might have turned out differently. In 2021 I still had not escaped this rut, and I am still stuck in it now. Enough of that: honestly, my heart said no, but my answer was still "fine, then". In any case I have decided to start over this year. If you spot anything I get wrong, please point it out so that I do not take too many detours. Your support and help really do give me the motivation to overcome difficulties and keep improving! Thanks♪(・ω・)ノ
The papers on Mapping SNNs that I have found so far:
Research on algorithms for simulating spiking neural networks has progressed significantly with the introduction of many-core neuromorphic architectures. These specialized designs emulate the parallel information processing of biological neural systems and therefore offer better energy efficiency than conventional von Neumann systems. Bringing these computational frameworks into practical applications is expected to open new opportunities in domains including artificial intelligence and computational neuroscience.
The research team maps spiking neural networks onto a many-core neuromorphic architecture.
Efficient Mapping of Spiking Neural Networks into the Network-on-Chip architecture
- A method to map a cross-layer, event-driven (spiking) neural network onto the Network-on-Chip architecture.
These all seem to come from the blogger I follow, “嘀嗒一声小刺猬”; thanks to him.
Item 5: [Mapping Spiking Neural Networks to Neuromorphic Hardware](https://ieeexplore.ieee.org/document/8913677 "Mapping Spiking Neural Networks to Neuromorphic Hardware")
A. Balaji et al., "Mapping Spiking Neural Networks to Neuromorphic Hardware," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 1, pp. 76-86, Jan. 2020, doi: 10.1109/TVLSI.2019.2951493.
Neuromorphic hardware realizes biological neurons and synapses to operate a spiking neural network (SNN)-based machine learning system. We introduce SpiNeMap, a design methodology aimed at optimizing SNN mappings onto crossbar-based neuromorphic hardware by minimizing spike latency and energy expenditure. The SpiNeMap methodology comprises two sequential stages: SpiNeCluster and SpiNePlacer. In the first stage, SpiNeCluster employs a heuristic clustering technique to partition an SNN into clusters of local synapses. Synapses within each cluster are mapped within dedicated crossbars of the hardware, while inter-cluster global synapses are distributed across shared interconnects. A key objective of this clustering phase is to reduce the number of spikes on global synapses, thereby alleviating spike congestion and enhancing system performance. The second stage, SpiNePlacer, utilizes a metaheuristic approach to determine the optimal placement of both local and global synapses on the hardware. This placement optimization focuses on minimizing energy consumption and spike latencies. Through comprehensive evaluation using synthetic and realistic SNN datasets on state-of-the-art neuromorphic platforms, we demonstrate that our proposed SpiNeMap method achieves significant performance improvements compared to existing techniques. Specifically, we observe a 45% reduction in average energy consumption coupled with a 21% improvement in spike latency performance compared to current best-performing SNN mapping methods.
Mapping spiking neural networks to neuromorphic hardware during runtime
Balaji et al. conducted a comprehensive study on the real-time mapping of spiking neural networks to hardware designed for neuromorphic computing and published their findings in the Journal of Signal Processing Systems in 2020.
Balaji A et al. Real-time mapping of spiking neural networks onto neuromorphic hardware[J]. arXiv e-prints, 2020.
Neuromorphic architectures implement biological neurons and synapses to execute machine learning algorithms that use spikes, as well as bio-inspired learning algorithms. These architectures are energy efficient and therefore well suited to cognitive information processing in resource- and power-constrained environments, such as sensor nodes and edge devices in the Internet of Things (IoT). To map a spiking neural network (SNN) to a neuromorphic architecture, prior work has proposed design-time approaches: the SNN is first analyzed offline using representative data and then mapped to the hardware to optimize some objective, such as minimizing spike communication or maximizing resource utilization. In many emerging applications, however, the machine learning model learns online from the input data, so new connections form and existing connections disappear; an SNN already mapped to the neuromorphic architecture may therefore need to be remapped to retain optimal performance. Because of their long computation time, design-time approaches are unsuitable for remapping the model in real time after every learning epoch. This paper proposes a design methodology for partitioning and mapping SNN applications to neuromorphic architectures dynamically at run time. The methodology has two steps: first, a layer-wise greedy strategy partitions the SNN into clusters of neurons and synapses while respecting the architectural constraints; second, a hill-climbing optimization minimizes inter-cluster communication, improving the energy consumed on the architecture's shared interconnect. We evaluate the algorithm with synthetic and realistic SNN applications; the results show that it reduces mapping time by 780x on average compared with a state-of-the-art design-time partitioning approach, with only about a 6.25% loss in solution quality.
To map a spiking neural network (SNN) to a neuromorphic architecture, existing research has developed design-time strategies in which SNNs are analyzed offline using representative datasets and then systematically mapped onto hardware to optimize objectives such as minimizing spike communication or improving resource utilization. In dynamic machine learning applications, however, models adapt through online learning. During runtime, new connections within the SNN may form or be pruned based on input activity, so the already-deployed SNN must be re-mapped onto the neuromorphic hardware to maintain optimal performance. Because of their long computation time, conventional design-time approaches are impractical for re-mapping after every training epoch.
Approach: the methodology has two steps. First, a layer-wise greedy algorithm partitions the SNN's neurons and synapses into clusters while respecting the constraints of the neuromorphic architecture; second, an optimization algorithm minimizes the total spikes communicated between clusters, which improves the energy consumed on the architecture's shared interconnect.
Our algorithm reduces SNN mapping time by 780x on average compared with the state-of-the-art design-time SNN partitioning approach, at the cost of only a 6.25% loss in solution quality.
Why crossbars connected by a NoC?
The common mapping approach: when mapping an SNN onto these architectures, the usual practice is to partition the SNN's neurons and synapses into clusters and map the clusters onto crossbars. Hardware performance is then optimized mainly by reducing the number of spikes communicated between crossbars, which lowers energy consumption.
Prior methods to partition and map an SNN to neuromorphic hardware, such as PSOPART [16], SpiNeMap [6], PyCARL [4], NEUTRAMS [25] and DFSynthesizer [42] are design-time approaches that require significant exploration time to generate a good solution. Although suitable for mapping supervised machine learning models, these approaches cannot be used at run-time to remap SNNs frequently.
DFSynthesizer: S. Song et al., "Compiling Spiking Neural Networks to Neuromorphic Hardware," in Proceedings of LCTES, 2020.
For online learning, we present a method that maps SNNs onto crossbar-based neuromorphic hardware at run time, once per learning interval. The method has two phases: first, the neurons of the SNN are greedily grouped into clusters; then hill-climbing optimization (HCO) minimizes inter-crossbar spike communication.
However, the data collected by IoT sensors evolves over time and may no longer resemble the representative data used to train the neural network model. This change in the relationship between the input data and the offline-trained model is called concept drift [23]. Over time, concept drift degrades the model's prediction accuracy and hence its quality, so the model must be retrained periodically on recent data using an adaptive learning algorithm.
Design-time decisions for a supervised SNN are established prior to deploying the trained model. However, when a model undergoes retraining in an online learning scenario, two primary changes occur: (1) new synaptic connections may emerge while existing ones may be pruned as the model learns new events, and (2) synaptic weights experience continuous adjustments following each learning epoch. Maintaining optimal hardware performance necessitates a runtime optimization strategy that periodically reconfigures the SNN architecture to match hardware capabilities after each learning epoch.
Adaptive algorithms: the model is retrained periodically with adaptive learning algorithms to keep its performance up. Examples include transfer learning [38], lifelong learning [43], and deep reinforcement learning.
Contributions:
Following are our key contributions.
We present an algorithm to partition and map online-learning SNNs onto neuromorphic hardware at run time, targeting IoT applications.
We demonstrate the effectiveness of our online mapping approach by comparing its exploration time and the total number of spikes communicated between crossbars against a state-of-the-art design-time approach.

The architecture of SNN hardware: (a) pre- and post-synaptic neurons interconnected via synapses in a spiking neural network; (b) a crossbar organization providing full connectivity between pre- and post-synaptic neurons; (c) a modern neuromorphic hardware system integrating multiple crossbars with a time-multiplexed interconnect.
A crossbar is a two-dimensional arrangement of n² synapses connecting n neurons. Figure 1b shows a single crossbar and its connectivity: n pre-synaptic neurons and n post-synaptic neurons connected through synaptic elements. The crossbar size n is limited (typically below 512) because dynamic and leakage energy grow exponentially with the crossbar size; large-scale neuromorphic hardware therefore integrates multiple crossbars through a shared interconnect.
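To make the tiling concrete, here is a tiny back-of-the-envelope sketch (my own illustration, not from any of the papers) of how many fixed-size crossbars a given SNN would occupy; the 256-neuron size mirrors the DYNAP-SE crossbars mentioned later.

```python
# Minimal sketch (illustrative only): crossbar bookkeeping for a tiled neuromorphic design.
# Assumes an n x n crossbar provides n * n programmable synapses for n neurons, and that a
# large SNN is spread over ceil(N / n) crossbars connected by a shared interconnect.
import math

def synapses_per_crossbar(n: int) -> int:
    return n * n

def crossbars_needed(total_neurons: int, n: int = 256) -> int:
    return math.ceil(total_neurons / n)

print(synapses_per_crossbar(256))     # 65536 programmable synapses in one 256x256 crossbar
print(crossbars_needed(3000, n=256))  # 12 crossbars for a 3,000-neuron SNN
```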

These partitioning methods aim for optimal hardware performance, which requires considerable exploration time and makes them unsuitable for partitioning and remapping online-learning SNNs.
[44] Wen, W., Wu, C. R., Hu, X., Liu, B., Ho, T. Y., Li, X., & Chen, Y. (2015). An EDA framework for large scale hybrid neuromorphic computing systems. In 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC) (pp. 1–6). IEEE.
Wijesinghe et al. investigated an all-memristor deep spiking neural computing system, presenting it as a milestone toward energy-efficient, probabilistic brain-like computation.
Xia, Q., & Yang, J. J. (2019). Memristive crossbar arrays are a promising platform for biologically inspired computation. Nature Materials, 18(4), 309.
Papers [44], [45], and [46] focus on improving crossbar performance.
Other mapping tools:
- PSOPART (2018) [16]: Das et al. presented this design-time partitioning work on spiking neuromorphic hardware at DATE 2018.
- NEUTRAMS (2016) [25].
- SpiNeMap (2019) [6]: also by the same authors.
- DFSynthesizer (2020): the reference the authors give as [42] is actually wrong; it should be S. Song et al., "Compiling Spiking Neural Networks to Neuromorphic Hardware," in LCTES, 2020.
- PyCARL (2020): the authors also propose using PyCARL, another tool of their own.
PyCARL [4] supports hardware-software co-simulation of SNN-based applications. The framework provides analysis and optimization facilities for partitioning and mapping SNN applications onto neuromorphic processor architectures with an accurate timing model.
The authors presented PyCARL, a PyNN-based interface designed to enable hardware-software co-simulation of spiking neural networks, at the 2020 International Joint Conference on Neural Networks (IJCNN).
What run-time mapping methods already exist? This group cites itself heavily; nearly all of the citations are to the same group's own work.
Many approaches have been proposed for task mapping on multiprocessor systems. One is a heuristic-based run-time manager for resource allocation and energy efficiency on multiprocessor systems. Another uses a genetic algorithm combined with dynamic voltage scaling to improve real-time task scheduling while reducing power consumption. There is also a load-aware task scheduling algorithm designed for dynamic load balancing on multiprocessor architectures. Most of this work proposes design-time solutions, and there is as yet no effective method for run-time mapping of spiking neural networks (SNNs) onto neuromorphic hardware; the existing literature focuses mainly on static mapping. This work therefore proposes a run-time dynamic mapping strategy that executes far faster than existing design-time schemes and reduces energy by cutting communication on the time-multiplexed interconnect.
References: [11], [12], [13], [14], [20], [30].
Detailed algorithm:
The network model was constructed using a digraph structure, where each connection corresponds to a synapse with weights equal to the total number of spikes communicated between SNN neurons. The input required by the mapping algorithm includes neurons involved in computation (A), along with their spike communication rates across each synapse and crossbar size (k). The algorithm proceeds in two distinct phases, as illustrated in Figure 3.

Mapping of online learning SNN on Neuromorphic Hardware.

The SNN is modeled as a directed graph in which each edge represents a synaptic connection, and the edge weight is the total number of spikes communicated between the two SNN neurons.
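As a concrete illustration, here is a minimal sketch of that weighted digraph built from raw spike events; the input format and names are my own assumptions, not an interface of the paper or of CARLsim.

```python
# Minimal sketch (illustrative assumptions): build the weighted SNN digraph from spike
# events recorded as (pre_neuron, post_neuron) pairs, e.g. exported from a simulator.
from collections import defaultdict

def build_snn_digraph(spike_events):
    """Edge (u, v) = synapse u -> v; weight = total spikes sent over that synapse."""
    graph = defaultdict(int)
    for pre, post in spike_events:
        graph[(pre, post)] += 1
    return dict(graph)

# Example: neuron 0 spikes to neuron 2 three times, neuron 1 spikes to neuron 2 once.
events = [(0, 2), (0, 2), (0, 2), (1, 2)]
print(build_snn_digraph(events))  # {(0, 2): 3, (1, 2): 1}
```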

Figure 4 shows how an SNN of six neurons is divided into three sublists. The communication between neurons is shown through their synaptic connections. First (Section 3.1), the input neuron list is organized into sublists, each sized to fit an available crossbar. Then (Section 3.2), the number of synaptic connections between sublists is reduced by moving specific neurons between sublists (highlighted in blue).
Building Sub-lists
Algorithm 1 outlines a greedy partitioning strategy. The goal is to partition the input neuron list into s sublists, where s is the number of crossbars in the design; the length of each sublist is bounded by the crossbar size k of the target hardware. A margin variable (line 3) tracks the neuron slots still available in the current sublist. The average spike count per crossbar is computed from the total spikes communicated in the SNN application. A cost function (Algorithm 2) computes the communication cost, measured in total spikes, between each pair of sublists.
The algorithm iterates over all neurons n_i in the input list A, adjusting the slots of the current sublist (line 8). Neurons are added until either (1) k neurons have been placed in the sublist, or (2) a threshold cost, in spike counts, is exceeded while slots tracked by margin are still available. Once either condition is met, the sublist is validated and its boundary is recorded. After all but the final sublist have been validated, the process ends, since the last boundary is simply position n at the end of the list. List p holds these boundaries.
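The following is a simplified sketch of how I read the greedy phase (Algorithms 1 and 2); the helper names, the cost threshold, and the handling of the last sublist are my assumptions, not the authors' code.

```python
# Simplified sketch of the greedy phase (my reading of Algorithms 1-2, not the authors' code).
def spike_cost(sublist, graph):
    """Spikes on synapses leaving this sublist, i.e. its inter-cluster traffic."""
    members = set(sublist)
    return sum(w for (u, v), w in graph.items() if u in members and v not in members)

def greedy_partition(A, graph, k, num_crossbars):
    """Walk the neuron list A and close a sublist when it is full (k neurons)
    or when its outgoing spike cost exceeds the per-crossbar average."""
    avg_cost = sum(graph.values()) / max(num_crossbars, 1)   # per-crossbar spike budget
    sublists, current = [], []
    for n in A:
        current.append(n)
        full = len(current) == k                             # no margin (free slots) left
        over_budget = spike_cost(current, graph) > avg_cost
        if (full or over_budget) and len(sublists) < num_crossbars - 1:
            sublists.append(current)                         # validate and record boundary
            current = []
    if current:
        sublists.append(current)                             # the final sublist takes the rest
    return sublists
```

For example, on the toy digraph above, `greedy_partition([0, 1, 2], {(0, 2): 3, (1, 2): 1}, k=2, num_crossbars=2)` returns `[[0], [1, 2]]`.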
Local Search
The partition produced by Algorithm 1 is easy to compute but not particularly good. Although every sublist from Algorithm 1 satisfies the cost criterion, the cost may be distributed unevenly across sublists. To find a better solution, we balance the cost across sublists by running several rounds of local search, implemented as hill-climbing optimization that iterates over the sublists and moves their boundaries.
Algorithm 3 outlines the hill-climbing optimization. It relies on a cost function (line 2), detailed in Algorithm 2, which returns the maximum cost, measured in number of spikes, over the selected sublists; the best solution is the one with the lowest such cost. The algorithm examines each sublist in turn and moves its boundary one position to the left or right; each neuron n_i shifted across the boundary lands in the adjacent sublist, and the costs are recomputed. Moves that reach a local minimum cost are kept. This is repeated over the neurons in list A until every sublist has settled at its minimum cost.
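Below is a sketch of that boundary-moving local search under the same contiguous-sublist representation as the greedy sketch above; it is my paraphrase of Algorithm 3 (the capacity check against k is omitted for brevity), not the published implementation.

```python
# Sketch of the hill-climbing local search over sublist boundaries (illustrative only).
def spike_cost(sublist, graph):                   # same helper as in the greedy sketch
    members = set(sublist)
    return sum(w for (u, v), w in graph.items() if u in members and v not in members)

def max_sublist_cost(boundaries, A, graph):
    """Cost of a partition = spike cost of its worst (most talkative) sublist."""
    costs, start = [], 0
    for b in list(boundaries) + [len(A)]:
        costs.append(spike_cost(A[start:b], graph))
        start = b
    return max(costs)

def hill_climb(A, graph, boundaries):
    """Shift each boundary one position left or right and keep strictly improving moves."""
    boundaries = list(boundaries)                 # end positions of all but the last sublist
    best = max_sublist_cost(boundaries, A, graph)
    improved = True
    while improved:
        improved = False
        for i in range(len(boundaries)):
            for step in (-1, +1):
                cand = list(boundaries)
                cand[i] += step
                valid = (0 < cand[i] < len(A)
                         and (i == 0 or cand[i] > cand[i - 1])
                         and (i == len(cand) - 1 or cand[i] < cand[i + 1]))
                if valid:
                    cost = max_sublist_cost(cand, A, graph)
                    if cost < best:               # accept only strictly better moves
                        boundaries, best, improved = cand, cost, True
    return boundaries, best
```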
Evaluation
Simulation Environment
The experiments are carried out on a system with 8 CPUs, 32 GB of RAM, and an NVIDIA Tesla GPU, running Ubuntu 16.04.
CARLsim [10]: A GPU-accelerated simulator used to train and test SNN-based applications. CARLsim reports spike times for every synapse in the SNN.
DYNAP-SE [36]: Our approach is evaluated using the DYNAP-SE model, with 256-neuron crossbars interconnected using a NoC [47].
Evaluated Applications
To evaluate the online mapping algorithm, we use two synthetic and two realistic SNN-based applications. The synthetic applications are named with an 'S_' prefix followed by the number of neurons in the application. The realistic applications are edge detection (EdgeDet) and MLP-based digit recognition (MLP-MNIST). Table 2 also lists the synapses (column 3), topology (column 4), and spike counts of each application, obtained from simulations with CARLsim [10].

Evaluated Design-time vs. Run-time Approaches
To evaluate our proposed runtime approach's performance, we selected a state-of-the-art design-time benchmark as our baseline. Both algorithms were configured with a crossbar size of 256 (k=256). In this study, we contrast various approaches:
PSOPART [16]: The PSOPART approach is a design-time partitioning technique that uses an instance of particle swarm optimization (PSO) to minimize the number of spikes communicated on the time-multiplexed interconnect.
HCO-Partitioning: our layer-wise partitioning approach, consisting of a greedy partitioning phase followed by an HCO-based local search phase that reduces the number of spikes communicated between crossbars.
Results
Table 3 reports the execution times (in seconds) of the design-time and run-time mapping algorithms on the synthetic and realistic applications. Two findings stand out. First, our HCO partitioning algorithm runs significantly faster than PSOPART, reducing execution time by 780x on average. Second, the runtime stays below 50 seconds, which makes it feasible to re-map online-learning SNNs on edge devices before the next training epoch starts.

Figure 5 illustrates the lifetime of an online learning application in terms of the execution time of a training epoch (t) and of the HCO partitioning algorithm (h). The partitioning algorithm must finish well within the interval between epochs, which HCO-partitioning achieves by running about 780x faster than state-of-the-art design-time methods.
Note: if you spot any mistakes, please let the author know so the text can be corrected or clarified.

PSOPART communicates fewer spikes.
In the present paper, we introduce an innovative algorithm designed for remapping online learning SNNs onto neuromorphic hardware. The algorithm's runtime mapping process is executed in two sequential stages: (1) implementing a layered greedy partitioning of SNN neurons, and (2) enhancing this partitioning through hill-climbing optimization techniques aimed at minimizing spike communication between crossbars. We address the feasibility issue of applying state-of-the-art design-time approaches during runtime remapping. Through comprehensive evaluation using both synthetic and realistic SNN applications, our algorithm achieves an average 780x reduction in runtime mapping time compared to state-of-the-art design-time approaches while maintaining only a negligible 6.25% performance loss.
Discussion
In this section, we discuss the scalability of our approach. Each run of Algorithm 1 performs only elementary arithmetic operations. The hill-climbing algorithm computes up to 2x(s-2) candidate solutions and compares them to find the one with minimum cost; the codomain of the cost function is the well-ordered positive integers. The cost function itself scales linearly with n, and hill climbing stops as soon as a local minimum has been found, so it pays to keep the number of cost-function evaluations small.
Bottom line: solution quality (spike count) drops by 6.25%, but mapping is 780x faster.

Adarsha Balaji received a Bachelor's degree in 2012 from Visvesvaraya Technological University, India, and a Master's degree in 2017 from Drexel University, Philadelphia, PA, where he is currently pursuing a Ph.D. in Electrical and Computer Engineering. His research focuses on the design of neuromorphic computing systems, with particular emphasis on optimizing data flow and power in spiking neural network (SNN) hardware.
Visvesvaraya Technological University (VTU) is among the top 25 universities in India (ranked around 15th nationally) and has, since its founding, pursued cutting-edge research and trained senior technical talent. Drexel University (DU), founded in 1891, is a leading four-year comprehensive private university in Philadelphia, the largest city in Pennsylvania on the US East Coast, with a branch campus in Sacramento, California. It is known as one of Philadelphia's "big three" universities, alongside the University of Pennsylvania and Temple University. Philadelphia is the fifth-largest city in the US, and its metropolitan area is the fourth largest after New York, Los Angeles, and Chicago. Drexel ranked 59th among US universities in the 2020 QS rankings.
7. Compiling Spiking Neural Networks to Neuromorphic Hardware
S. Song et al., "Compiling Spiking Neural Networks to Neuromorphic Hardware," in LCTES, 2020.
Shihao Song et al., 2020
Machine learning applications implemented using spike-based computation models, such as Spiking Neural Networks (SNNs), offer significant potential to reduce energy consumption when executed on neuromorphic hardware. However, compiling and mapping SNNs to such hardware presents challenges, particularly when shared compute and storage resources (e.g., crossbars) must accommodate both neurons and synapses. To address these limitations, we present an approach for analyzing and implementing SNNs on resource-constrained neuromorphic platforms while ensuring key performance metrics like execution time and throughput are maintained. Our method delivers three primary contributions: first, a greedy algorithm for partitioning SNNs into neuron/synapse clusters that fit within crossbar resources; second, leveraging the semantic richness of Synchronous Dataflow Graphs (SDFGs) to model clustered networks and assess their performance through Max-Plus Algebra, considering available computational capacity, buffer sizes, and communication bandwidth; third, self-timed execution-based techniques for compiling and dynamically adapting SNN-based applications to neuromorphic hardware in real-time. The evaluation of our approach was conducted using standard SNN-based applications across various neuromorphic platforms, yielding results that significantly outperform current implementations.
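Since the throughput analysis hinges on Synchronous Dataflow Graphs evaluated with Max-Plus Algebra, here is a tiny illustration of the algebra itself (my own toy example, not DFSynthesizer's code or model).

```python
# Toy max-plus example (illustrative only): in max-plus algebra, "addition" is max and
# "multiplication" is +, so iterating x <- A (x) x yields completion times of successive
# graph iterations under self-timed execution.
import numpy as np

NEG_INF = -np.inf                       # encodes "no dependency" between two clusters

def maxplus_matvec(A, x):
    """(A (x) x)_i = max_j (A[i, j] + x[j])."""
    return np.max(A + x[None, :], axis=1)

# Two clusters with execution times 3 and 5; cluster 1 also waits for cluster 0's result.
A = np.array([[3.0, NEG_INF],
              [8.0, 5.0]])
x = np.zeros(2)                         # start times of iteration 0
for _ in range(3):
    x = maxplus_matvec(A, x)
    print(x)
# The long-run growth rate of these completion times (5 time units per iteration here)
# is the max-plus eigenvalue; its reciprocal bounds the achievable throughput.
```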
SNEAP: A High-Speed and Energy-Efficient Toolchain for Mapping Large-Scale Spiking Neural Networks onto NoC-Based Neuromorphic Platforms
Li S, Guo S, Zhang L et al. SNEAP: A High-Speed and Energy-Efficient Toolchain for the Mapping of Large-Scale Spiking Neural Networks onto NoC-Based Neuromorphic Platforms[J]. 2020.
Spiking neural network (SNN), as the third generation of artificial neural networks, has been widely adopted in vision and audio tasks. Nowadays, many neuromorphic platforms support SNN simulation and adopt Network-on-Chips (NoC) architecture for multi-cores interconnection. However, interconnection brings huge area overhead to the platform. Moreover, run-time communication on the interconnection has a significant effect on the total power consumption and performance of the platform. In this paper, we propose a toolchain called SNEAP for mapping SNNs to neuromorphic platforms with multi-cores, which aims to reduce the energy and latency brought by spike communication on the interconnection. SNEAP includes two key steps: partitioning the SNN to reduce the spikes communicated between partitions, and mapping the partitions of SNN to the NoC to reduce average hop of spikes under the constraint of hardware resources. SNEAP can reduce more spikes communicated on the interconnection of NoC and spend less time than other toolchains in the partitioning phase. Moreover, the average hop of spikes is reduced more by SNEAP within a time period, which effectively reduces the energy and latency on the NoC-based neuromorphic platform. The experimental results show that SNEAP can achieve 418x reduction in end-to-end execution time, and reduce energy consumption and spike latency, on average, by 23% and 51% respectively, compared with SpiNeMap.
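To make the second step concrete, here is a small sketch of the average-hop metric SNEAP minimizes when placing partitions on a 2D-mesh NoC; the data layout and function names are my own, not part of the SNEAP toolchain.

```python
# Illustrative sketch (not SNEAP code): average hop count of spikes for a given placement
# of SNN partitions on a 2D-mesh NoC, with hops measured as Manhattan distance (XY routing).
def average_spike_hops(traffic, placement):
    """traffic:   {(src_partition, dst_partition): number_of_spikes}
       placement: {partition: (x, y) core coordinate on the mesh}"""
    total_spikes = total_hops = 0
    for (src, dst), spikes in traffic.items():
        (x1, y1), (x2, y2) = placement[src], placement[dst]
        total_hops += (abs(x1 - x2) + abs(y1 - y2)) * spikes
        total_spikes += spikes
    return total_hops / total_spikes if total_spikes else 0.0

# Two placements of the same 3-partition traffic on a 2x2 mesh: keeping the chatty pair
# (partitions 0 and 1) on adjacent cores roughly halves the average hop count.
traffic = {(0, 1): 100, (1, 2): 10}
print(average_spike_hops(traffic, {0: (0, 0), 1: (0, 1), 2: (1, 1)}))  # 1.0
print(average_spike_hops(traffic, {0: (0, 0), 1: (1, 1), 2: (0, 1)}))  # ~1.91
```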
A Reservoir Computing Approach to Application Mapping onto Network-on-Chip-based Neuromorphic Platforms
Li Shou, Wang Ling, Wang Shou, et al. Liquid State Machines' Application Mapping in NoC-Based Neuromorphic Platforms[C]// Advanced Computer Architecture: Conference on Computer Architecture (ACA), Kunming, China, 2020. Proceedings.
The Liquid State Machine (LSM) is a typical spiking neural network (SNN) with recurrent connections. LSMs are now widely deployed for vision and audio tasks on various programmable neuromorphic platforms, which use a Network-on-Chip (NoC) architecture to interconnect multiple cores. However, the large volume of communication generated by the recurrent part of an LSM significantly affects overall platform performance. In this paper, we propose an LSM mapping method built on the SNEAP toolchain, aimed at reducing the energy and latency caused by spike communication. The method has two key steps: partitioning the LSM to reduce the spikes communicated between partitions, and mapping the partitions onto the NoC to reduce the average hop count of spikes under hardware resource constraints. The method also scales well to large LSMs. Experimental results show that, compared with SpiNeMap, our method achieves a 1.5x reduction in end-to-end execution time, reduces average energy consumption by 57% on an 8x8 2D-mesh NoC platform, and reduces the average spike hop count by 23% on a 4x4 2D-mesh NoC platform.
Comprehensive End-to-End Realization of Diverse Hybrid Neural Network Architectures on a Cross-Paradigm Neuromorphic Chip
Wang G. et al. present an end-to-end implementation of various hybrid neural networks on a cross-paradigm neuromorphic chip [J]. Frontiers in Neuroscience, 2021, 15:615279.
Affiliation:
Department of Precision Instrument, Center for Brain-Inspired Computing Research (CBICR), Innovation Center for Future Chips, National Engineering Research Center for Optical Memory, Tsinghua University, Beijing
The integration of computer-science-focused artificial neural networks (ANNs) with neuroscience-oriented spiking neural networks (SNNs) has become a highly promising approach for advancing artificial intelligence through complementary strengths. Such integration requires supporting individual modeling for ANNs and SNNs, along with their hybrid configurations. This includes simultaneously calculating networks within a single paradigm while converting between different information representations. Despite the advancements in dedicated hardware platforms, realizing efficient computation and signal conversion remains challenging. To address these challenges, we propose an end-to-end mapping framework designed for implementing various hybrid neural networks on many-core neuromorphic architectures based on the cross-paradigm Tianjic chip. We develop hardware configuration schemes for four common signal conversion methods and establish a global timing synchronization mechanism across various heterogeneous components. Experimental results demonstrate that our framework can implement these hybrid models with low execution latency, low power consumption, and minimal accuracy loss. This research provides a novel approach for developing hybrid neural network models tailored to brain-inspired computing chips, thereby unlocking the full potential of these models.

Figure 1. Illustration of the Tianjic chip architecture: (A) fine-grained configurable operation modules; (B) unified communication format; (C) adjustable timing schedule.
Fine-Grained Configurable Operation Modules
The functional core (FCore) represents the essential component of the Tianjic chip, comprising four key components: an axon dedicated to input organization, a dendrite equipped with synapses for integration operations, a soma responsible for non-linear neuronal transformations, and a router designed for activation transmission (Figure 1A). Each module is programmable to operate in distinct modes or execute specific tasks, thereby enabling the chip to support both artificial neural networks (ANN) and spiking neural networks (SNN). Among these modules, the dendrite and soma are primarily dedicated to computational tasks. The dendrite, augmented with synapse memory, forms a 256 × 256 virtual crossbar structure capable of performing a range of vector and matrix computations. Table 1 enumerates the vector-matrix multiplication (VMM), vector-vector accumulation (VVA), and vector buffering (VB) operations utilized in this study.

Table 1. Integration and transformation operations in Tianjic.
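As a rough feel for the dendrite operations listed in Table 1, here is a toy numpy sketch of VMM, VVA, and VB on a 256-wide array; this is purely my illustration and not Tianjic's actual programming interface or datapath.

```python
# Toy sketch (illustrative only) of the dendrite operations named in Table 1:
# vector-matrix multiplication (VMM), vector-vector accumulation (VVA), vector buffering (VB).
import numpy as np

N = 256
weights = np.random.randn(N, N)        # stands in for the 256 x 256 virtual crossbar

def vmm(x, w=weights):
    """VMM: integrate an input vector through the synaptic array."""
    return x @ w

def vva(a, b):
    """VVA: element-wise accumulation of two vectors."""
    return a + b

def vb(x):
    """VB: buffer a vector unchanged for later use (e.g., hand-off between cores)."""
    return x.copy()

x = np.random.randn(N)
out = vva(vmm(x), vb(x))               # e.g., combining an integrated input with a buffered copy
print(out.shape)                       # (256,)
```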
