
Performance Testing of ARM Embedded Platforms


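When a new project calls for an entirely new ARM-based embedded processor platform, evaluating its performance is a key question. There are usually three sources of information.

The first is the performance figures published by the chip vendor, but these are mostly theoretical. The second is to buy a development board for the same platform, port the application and test it directly. This is intuitive and effective, but it can take considerable time and effort. The third is to run benchmark software on the target platform. This is a compromise that has some reference value. However, its results depend on the hardware itself as well as on the BSP and software configuration, so they are only meaningful when the test setups are closely matched.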

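Following this approach, this article uses Toradex industrial-grade ARM computer modules as the benchmark platform, together with Toradex's latest official Linux BSP, V2.5 Beta3. During testing, the CPU clock frequency and the display output resolution, which have the largest influence on the results, were kept as consistent as possible.

The modules tested were:
- Colibri T20 512MB, based on NVIDIA Tegra 2
- Colibri i.MX6DL 512MB, based on NXP i.MX6DL
- Colibri VF61 256MB, based on NXP Vybrid

The first two use dual-core Cortex-A9 ARM processors. The third uses a heterogeneous Cortex-A5 + Cortex-M4 dual-core architecture, and only its Cortex-A5 core is tested here.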

1). Hardware platforms, test items and tools used in this article

a). Hardware platforms

The three interface-compatible Colibri ARM computer modules above, plus one Colibri Evaluation Board (reference link: https://www.toradex.cn/zh_cn/products/carrier-board/colibri-evaluation-carrier-board)

b). Test items and corresponding tools

- CPU test: nbench

- Memory test: stream

- Storage test: dd, hdparm

- Ethernet test: iperf

- CPU stress test: stress

- GPU stress test: glmark2

Note: all of the tools except glmark2 are preinstalled in the BSP.

2). Test procedure and results

a). Presets

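As recommended in the referenced material, DVFS (dynamic voltage and frequency scaling) was disabled on the two Cortex-A9 platforms. The Colibri T20 CPU was fixed at 1 GHz and the Colibri i.MX6DL CPU at 800 MHz. The Colibri VF61 does not support DVFS, so its CPU frequency did not need to be set.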
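The text does not say how DVFS was disabled. One common approach uses the Linux cpufreq sysfs interface: switch each core to the `userspace` governor and write the target clock, in kHz, to `scaling_setspeed`. From a shell, the equivalent is `echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`.

The sketch below is a minimal version of that approach. It assumes the module's kernel was built with the `userspace` governor, which is not confirmed for these BSPs. It also assumes a Python interpreter is available on the target image. The function name `pin_cpu_frequency` is hypothetical, and it must run as root.

```python
from pathlib import Path


def pin_cpu_frequency(khz, cpus, sysfs_root="/sys/devices/system/cpu"):
    """Fix the clock of each listed CPU via the cpufreq 'userspace' governor (hypothetical helper).

    khz        -- target frequency in kHz, the unit cpufreq uses (1 GHz = 1000000)
    cpus       -- CPU indices to configure, e.g. range(2) for a dual-core module
    sysfs_root -- overridable so the function can be exercised on a scratch directory
    """
    for cpu in cpus:
        policy = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
        # The governor must be 'userspace' before scaling_setspeed accepts a value.
        (policy / "scaling_governor").write_text("userspace\n")
        (policy / "scaling_setspeed").write_text(f"{khz}\n")


# Example settings matching the presets in the text (run as root on the target):
#   Colibri T20:     pin_cpu_frequency(1000000, range(2))
#   Colibri i.MX6DL: pin_cpu_frequency(800000, range(2))
```

Reading back `scaling_cur_freq` afterwards is a simple way to confirm that the setting took effect.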

The display resolution was set to the same default, 640x480, on all platforms.

b). CPU test

Change to /usr/bin and run the following command:

nbench

- Colibri T20 results:

=============LINUX DATA BELOW=============

CPU : Dual

L2 Cache :

OS : Linux 3.1.10-V2.5b3+gc8ead50

C compiler : arm-angstrom-linux-gnueabi-gcc

libc : static

MEMORY INDEX : 5.042

INTEGER INDEX : 5.245

FLOATING-POINT INDEX: 6.401

Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

--------------------------------------------------------------------------

- Colibri i.MX6DL results:

==============LINUX DATA BELOW=========

CPU : Dual ARMv7 Processor rev 10 (v7l)

L2 Cache :

OS : Linux 3.14.28-V2.5b3+g0632def

C compiler : arm-angstrom-linux-gnueabi-gcc

libc : static

MEMORY INDEX : 4.028

INTEGER INDEX : 4.177

FLOATING-POINT INDEX: 5.137

Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

----------------------------------------------------------------------------

- Colibri VF61 results:

==============LINUX DATA BELOW=========

CPU : ARMv7 Processor rev 1 (v7l)

L2 Cache :

OS : Linux 4.1.15-v2.5b3+ge6d111c

C compiler : arm-angstrom-linux-gnueabi-gcc

libc : static

MEMORY INDEX : 1.896

INTEGER INDEX : 2.337

FLOATING-POINT INDEX: 2.139

Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

---------------------------------------------------------------------

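The nbench results show that single-core performance depends mainly on clock frequency. Of the two Cortex-A9 processors, the T20 outperforms the i.MX6DL because it runs at a higher clock (1 GHz versus 800 MHz). The VF61 uses a different core, the Cortex-A5, which runs at a clearly lower clock than the other two, so its overall performance is noticeably weaker. lmbench is another CPU evaluation tool that can be consulted.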
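The frequency argument can be checked directly. With the T20 fixed at 1 GHz and the i.MX6DL at 800 MHz, the clock ratio is 1.25. Each of the three nbench indices differs between the two Cortex-A9 modules by that same ratio, to within about 0.5%.

The sketch below parses the summary lines shown above and compares the ratios. The helper name is illustrative. The VF61 is left out because the text does not state its clock frequency.

```python
import re


def parse_nbench_indices(text):
    """Extract the MEMORY / INTEGER / FLOATING-POINT indices from nbench's LINUX DATA block."""
    pattern = re.compile(r"^(MEMORY|INTEGER|FLOATING-POINT) INDEX\s*:\s*([\d.]+)", re.MULTILINE)
    return {name: float(value) for name, value in pattern.findall(text)}


T20 = parse_nbench_indices("""
MEMORY INDEX : 5.042
INTEGER INDEX : 5.245
FLOATING-POINT INDEX: 6.401
""")
IMX6DL = parse_nbench_indices("""
MEMORY INDEX : 4.028
INTEGER INDEX : 4.177
FLOATING-POINT INDEX: 5.137
""")

CLOCK_RATIO = 1000 / 800  # T20 fixed at 1 GHz, i.MX6DL at 800 MHz

for name in T20:
    ratio = T20[name] / IMX6DL[name]
    print(f"{name:15} T20/i.MX6DL = {ratio:.3f}  (clock ratio {CLOCK_RATIO:.2f})")
```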

c). Memory test

Run the following command:

stream

- Colibri T20 results:

==================================

STREAM copy latency: 33.38 nanoseconds

STREAM copy bandwidth: 479.33 MB/sec

STREAM scale latency: 35.58 nanoseconds

STREAM scale bandwidth: 449.65 MB/sec

STREAM add latency: 41.73 nanoseconds

STREAM add bandwidth: 575.10 MB/sec

STREAM triad latency: 42.90 nanoseconds

STREAM triad bandwidth: 559.44 MB/sec

---------------------------------------------------------

- Colibri i.MX6DL results:

=================================

STREAM copy latency: 18.33 nanoseconds

STREAM copy bandwidth: 873.08 MB/sec

STREAM scale latency: 23.45 nanoseconds

STREAM scale bandwidth: 682.30 MB/sec

STREAM add latency: 26.90 nanoseconds

STREAM add bandwidth: 892.26 MB/sec

STREAM triad latency: 25.58 nanoseconds

STREAM triad bandwidth: 938.16 MB/sec

------------------------------------------------------

- Colibri VF61 results:

=================================

STREAM copy latency: 30.53 nanoseconds

STREAM copy bandwidth: 524.09 MB/sec

STREAM scale latency: 30.78 nanoseconds

STREAM scale bandwidth: 519.82 MB/sec

STREAM add latency: 134.66 nanoseconds

STREAM add bandwidth: 178.23 MB/sec

STREAM triad latency: 149.24 nanoseconds

STREAM triad bandwidth: 160.81 MB/sec

-----------------------------------------------------------
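The latency and bandwidth lines in the stream output are not independent. Assume 8-byte (double) elements, as in the standard STREAM kernels. Then every result above matches bandwidth = bytes moved per iteration / latency, where:
- copy and scale move 16 bytes per iteration (one read, one write)
- add and triad move 24 bytes per iteration (two reads, one write)

For example, 16 bytes / 33.38 ns ≈ 479.3 MB/s for the T20 copy result. The sketch below checks this, with illustrative helper names.

```python
import re

# Bytes touched per loop iteration for each STREAM kernel, assuming 8-byte (double) elements:
#   copy:  c[i] = a[i]             -> 1 read + 1 write  = 16 bytes
#   scale: b[i] = q * c[i]         -> 1 read + 1 write  = 16 bytes
#   add:   c[i] = a[i] + b[i]      -> 2 reads + 1 write = 24 bytes
#   triad: a[i] = b[i] + q * c[i]  -> 2 reads + 1 write = 24 bytes
BYTES_PER_ITERATION = {"copy": 16, "scale": 16, "add": 24, "triad": 24}


def parse_stream(text):
    """Return {kernel: (latency_ns, bandwidth_mb_s)} from stream's text output."""
    latency = {k: float(v) for k, v in re.findall(r"STREAM (\w+) latency: ([\d.]+) nanoseconds", text)}
    bandwidth = {k: float(v) for k, v in re.findall(r"STREAM (\w+) bandwidth: ([\d.]+) MB/sec", text)}
    return {k: (latency[k], bandwidth[k]) for k in latency if k in bandwidth}


def implied_bandwidth(kernel, latency_ns):
    """Bandwidth in MB/s (10**6 bytes) implied by the per-iteration latency."""
    # Bytes per nanosecond equals GB/s, so multiply by 1000 to get MB/s.
    return BYTES_PER_ITERATION[kernel] / latency_ns * 1000


T20_OUTPUT = """
STREAM copy latency: 33.38 nanoseconds
STREAM copy bandwidth: 479.33 MB/sec
STREAM scale latency: 35.58 nanoseconds
STREAM scale bandwidth: 449.65 MB/sec
STREAM add latency: 41.73 nanoseconds
STREAM add bandwidth: 575.10 MB/sec
STREAM triad latency: 42.90 nanoseconds
STREAM triad bandwidth: 559.44 MB/sec
"""

for kernel, (lat, reported) in parse_stream(T20_OUTPUT).items():
    print(f"{kernel:5}  reported {reported:7.2f} MB/s  implied {implied_bandwidth(kernel, lat):7.2f} MB/s")
```

The same relation holds, to within 0.1%, for the i.MX6DL and VF61 figures, including the VF61's much lower add and triad results. The latency column therefore carries no information beyond the bandwidth column.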

d). Storage test

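The T20 and VF61 store their firmware in NAND flash, which hdparm cannot test. For that reason, dd was used on all three modules to measure the performance of the on-module storage.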

Run the following commands:

sync; time -p bash -c "(dd if=/dev/zero bs=1024 count=100000 of=/test.file; sync)"   # write speed

echo 3 > /proc/sys/vm/drop_caches; time dd if=/test.file of=/dev/null bs=1024   # read speed (drop the page cache first)

- Colibri T20 results:

Read test: about 14.7 MB/sec

=======================

100000+0 records in

100000+0 records out

real 0m6.795s

user 0m0.030s

sys 0m1.830s

-----------------------------------------

Write test: about 9 MB/sec

========================

100000+0 records in

100000+0 records out

real 11.08

user 0.01

sys 2.19

-----------------------------------------

- Colibri i.MX6DL results:

Read test: about 43.5 MB/sec

========================

100000+0 records in

100000+0 records out

real 0m2.306s

user 0m0.020s

sys 0m0.680s

--------------------------------------------

Write test: about 10 MB/sec

=========================

100000+0 records in

100000+0 records out

real 10.07

user 0.09

sys 3.64

-------------------------------------------

- Colibri VF61 results:

Read test: about 24 MB/sec

========================

[ 1178.378483] sh (407): drop_caches: 3

100000+0 records in

100000+0 records out

real 0m4.161s

user 0m0.100s

sys 0m3.180s

------------------------------------------

Write test: about 12.8 MB/sec

========================

100000+0 records in

100000+0 records out

real 7.78

user 0.13

sys 3.85

-----------------------------------
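The approximate rates quoted above can be reproduced from the logs. Each test moves 100000 blocks of 1024 bytes, or 102,400,000 bytes, so the rate is that size divided by the real time. The write time includes the final sync, so cached data is flushed to flash before the timer stops.

Each quoted figure falls between the decimal (MB/s) and binary (MiB/s) results. For the T20 read, for example, the two results are 15.07 MB/s and 14.37 MiB/s, against the quoted 14.7 MB/sec. The sketch below shows the arithmetic; names are illustrative.

```python
BLOCK_SIZE = 1024        # bs=1024 in both dd commands
BLOCK_COUNT = 100_000    # count=100000 in the write command
TOTAL_BYTES = BLOCK_SIZE * BLOCK_COUNT  # 102,400,000 bytes


def throughput(real_seconds):
    """Return (MB/s, MiB/s) for TOTAL_BYTES moved in `real_seconds` of wall-clock time."""
    rate = TOTAL_BYTES / real_seconds
    return rate / 1e6, rate / 2**20


# 'real' times copied from the dd logs above
REAL_TIMES = {
    ("Colibri T20", "read"): 6.795,
    ("Colibri T20", "write"): 11.08,
    ("Colibri i.MX6DL", "read"): 2.306,
    ("Colibri i.MX6DL", "write"): 10.07,
    ("Colibri VF61", "read"): 4.161,
    ("Colibri VF61", "write"): 7.78,
}

for (module, op), seconds in REAL_TIMES.items():
    mb, mib = throughput(seconds)
    print(f"{module:16} {op:5} {mb:6.2f} MB/s  {mib:6.2f} MiB/s")
```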

hdparm was used to measure the read speed of an external 8 GB SD card.

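Run the following command, adjusting the device node for each module. The outputs below show the SD card as /dev/mmcblk0p1 on the T20 and VF61 and as /dev/mmcblk1p1 on the i.MX6DL.

hdparm -t /dev/mmcblk1p1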

- Colibri T20 results:

====================

/dev/mmcblk0p1:

Timing buffered disk reads: 52 MB in 3.02 seconds = 17.22 MB/sec

------------------------------------

- Colibri i.MX6DL results:

====================

/dev/mmcblk1p1:

Timing buffered disk reads: 56 MB in 3.09 seconds = 18.13 MB/sec

------------------------------------

- Colibri VF61 results:

=====================

/dev/mmcblk0p1:

Timing buffered disk reads: 54 MB in 3.07 seconds = 17.60 MB/sec

-------------------------------------

e). Ethernet test

Connect the target board and a Linux host to the same LAN. The target boards have 100 Mbit/s Ethernet ports.

On the Linux host, run the following command. A TCP test is used as the example here; other test types can be run by adjusting the parameters.

iperf -s

On the target board, run the following command, where $hostip is the IP address of the Linux host:

iperf -c $hostip -t 60 -P 8

- Colibri T20 results:

=========================

[SUM] 0.0-60.1 sec 676 MBytes 94.3 Mbits/sec

------------------------------------------

- Colibri i.MX6DL results:

=======================

[SUM] 0.0-60.2 sec 677 MBytes 94.4 Mbits/sec

---------------------------------------

- Colibri VF61 results:

=======================

[SUM] 0.0-60.1 sec 674 MBytes 94.2 Mbits/sec

---------------------------------------
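All three modules land at about 94.3 Mbit/s, which is roughly what a saturated 100 Mbit/s port can deliver. This test therefore measures the link rather than the SoC.

A back-of-the-envelope check, under these assumptions:
- a standard 1500-byte MTU
- 38 bytes of Ethernet framing overhead per frame (preamble/SFD, header, FCS and inter-frame gap)
- 40 bytes of IPv4 and TCP headers

Under these assumptions, the TCP payload ceiling is 100 × 1460 / 1538 ≈ 94.9 Mbit/s. If the 12-byte TCP timestamp option is in use, it drops to about 94.1 Mbit/s.

The second helper below also confirms that iperf's MBytes column uses 2^20-byte units: 676 MBytes in 60.1 s only reproduces the reported rate on that basis. The helper names are illustrative.

```python
LINK_RATE_MBPS = 100.0   # 100 Mbit/s port on the target boards
MTU = 1500               # assumed standard Ethernet MTU
# Bytes on the wire per frame beyond the MTU: preamble+SFD 8, MAC header 14, FCS 4, inter-frame gap 12
FRAMING_OVERHEAD = 8 + 14 + 4 + 12
IPV4_TCP_HEADERS = 20 + 20
TCP_TIMESTAMP_OPTION = 12


def tcp_goodput_ceiling(timestamps):
    """Maximum TCP payload rate in Mbit/s on a saturated link with full-size frames."""
    payload = MTU - IPV4_TCP_HEADERS - (TCP_TIMESTAMP_OPTION if timestamps else 0)
    return LINK_RATE_MBPS * payload / (MTU + FRAMING_OVERHEAD)


def iperf_rate_mbps(mbytes, seconds):
    """Rate from iperf's transfer column: MBytes are 2**20 bytes, Mbits/sec are 10**6 bit/s."""
    return mbytes * 2**20 * 8 / seconds / 1e6


print(f"ceiling without timestamps: {tcp_goodput_ceiling(False):.2f} Mbit/s")
print(f"ceiling with timestamps:    {tcp_goodput_ceiling(True):.2f} Mbit/s")
print(f"T20 measured:               {iperf_rate_mbps(676, 60.1):.2f} Mbit/s")
```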

f). CPU stress test

Run the following command on each of the three platforms:

stress -c 2

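Running top in another terminal shows that both CPUs of the dual-core T20 and i.MX6DL modules are fully loaded. The VF61 runs Linux on its single Cortex-A5 core, so it has only one CPU to load.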
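The load is checked interactively with top above. A scriptable alternative is to read the per-CPU counters in /proc/stat twice and compute the busy share in between; a core saturated by stress should read close to 100%.

The sketch below is a minimal version of that approach. It assumes the usual Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are hypothetical.

```python
import time


def read_cpu_times(stat_text):
    """Map each per-CPU line of /proc/stat ('cpu0', 'cpu1', ...) to (busy, total) jiffies."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr' or 'ctxt'.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal; later guest fields are already counted in user/nice
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def cpu_utilization(before, after):
    """Percentage of time each CPU was busy between two read_cpu_times() snapshots."""
    usage = {}
    for cpu, (busy1, total1) in after.items():
        if cpu not in before:  # CPU came online between the two samples
            continue
        busy0, total0 = before[cpu]
        elapsed = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return usage


def sample_utilization(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-CPU busy percentages."""
    with open(path) as f:
        first = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_cpu_times(f.read())
    return cpu_utilization(first, second)
```

While `stress -c 2` is running, calling `sample_utilization()` on a dual-core module should report both `cpu0` and `cpu1` near 100%.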

g). GPU stress test

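To use glmark2, its IPK packages were built in the Toradex OpenEmbedded environment; see Toradex's OpenEmbedded build documentation for setting up that environment. The Colibri i.MX6DL is used as the example below.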

Installation:

opkg install libpng12_1.2.51-r0_armv7at2hf-vfp-neon.ipk

opkg install glmark2_2014.03-r0_armv7at2hf-vfp-neon-mx6qdl.ipk

Run:

glmark2-es2

=======================================================

glmark2 2014.03

=======================================================

OpenGL Information

GL_VENDOR: Vivante Corporation

GL_RENDERER: Vivante GC880

GL_VERSION: OpenGL ES 3.0 V5.0.11.p4.25762

=======================================================

[build] use-vbo=false: FPS: 495 FrameTime: 2.020 ms

[build] use-vbo=true: FPS: 908 FrameTime: 1.101 ms

[texture] texture-filter=nearest: FPS: 702 FrameTime: 1.425 ms

[texture] texture-filter=linear: FPS: 664 FrameTime: 1.506 ms

[texture] texture-filter=mipmap: FPS: 704 FrameTime: 1.420 ms

[shading] shading=gouraud: FPS: 485 FrameTime: 2.062 ms

[shading] shading=blinn-phong-inf: FPS: 248 FrameTime: 4.032 ms

[shading] shading=phong: FPS: 151 FrameTime: 6.623 ms

[shading] shading=cel: FPS: 114 FrameTime: 8.772 ms

[bump] bump-render=high-poly: FPS: 159 FrameTime: 6.289 ms

[bump] bump-render=normals: FPS: 426 FrameTime: 2.347 ms

[bump] bump-render=height: FPS: 340 FrameTime: 2.941 ms

[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 104 FrameTime: 9.615 ms

[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 37 FrameTime: 27.027 ms

[pulsar] light=false:quads=5:texture=false: FPS: 601 FrameTime: 1.664 ms

[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 52 FrameTime: 19.231 ms

[desktop] effect=shadow:windows=4: FPS: 212 FrameTime: 4.717 ms

[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 52 FrameTime: 19.231 ms

[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 51 FrameTime: 19.608 ms

[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 62 FrameTime: 16.129 ms

[ideas] speed=duration: FPS: 46 FrameTime: 21.739 ms

[jellyfish] : FPS: 89 FrameTime: 11.236 ms

[terrain] : FPS: 4 FrameTime: 250.000 ms

[shadow] : FPS: 175 FrameTime: 5.714 ms

[refract] : FPS: 27 FrameTime: 37.037 ms

[conditionals] fragment-steps=0:vertex-steps=0: FPS: 427 FrameTime: 2.342 ms

[conditionals] fragment-steps=5:vertex-steps=0: FPS: 93 FrameTime: 10.753 ms

[conditionals] fragment-steps=0:vertex-steps=5: FPS: 383 FrameTime: 2.611 ms

[function] fragment-complexity=low:fragment-steps=5: FPS: 173 FrameTime: 5.780 ms

[function] fragment-complexity=medium:fragment-steps=5: FPS: 51 FrameTime: 19.608 ms

[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 156 FrameTime: 6.410 ms

[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 156 FrameTime: 6.410 ms

[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 82 FrameTime: 12.195 ms

=======================================================

glmark2 Score: 255

=======================================================
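The final score appears to be the arithmetic mean of the per-scene FPS values. The 33 scenes above sum to 8429 FPS, and 8429 / 33 ≈ 255.4, which matches the reported score of 255. Treating the score as a plain mean is an inference from these numbers, not something taken from glmark2's documentation.

The check below recomputes it. The helper name is illustrative.

```python
# Per-scene FPS values copied from the glmark2-es2 log above, in order.
FPS = [
    495, 908,                 # build
    702, 664, 704,            # texture
    485, 248, 151, 114,       # shading
    159, 426, 340,            # bump
    104, 37,                  # effect2d
    601,                      # pulsar
    52, 212,                  # desktop
    52, 51, 62,               # buffer
    46, 89, 4, 175, 27,       # ideas, jellyfish, terrain, shadow, refract
    427, 93, 383,             # conditionals
    173, 51,                  # function
    156, 156, 82,             # loop
]


def mean_fps(values):
    """Arithmetic mean of the per-scene frame rates."""
    return sum(values) / len(values)


print(f"{len(FPS)} scenes, total {sum(FPS)} FPS, mean {mean_fps(FPS):.1f}")
```

A plain mean is dominated by the light scenes, such as build with use-vbo=true at 908 FPS. The terrain scene at 4 FPS barely moves it. When comparing GPUs, the per-scene numbers are therefore more informative than the single score.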
