
Optimised Memcpy Function For Nokia 770?
lardman
post Mar 15 2006, 09:06 AM
Post #1
Group: Members
Posts: 4,515
Joined: 25-October 03
From: Bath, UK
Member No.: 464



Has anyone been following this maemo-developers thread (which has unfortunately been split in two)?
http://maemo.org/pipermail/maemo-developer...rch/003269.html
http://maemo.org/pipermail/maemo-developer...rch/003276.html

The author is asking for people to supply results for processors other than the OMAP, and I thought we might be the perfect people to help: most people reading this forum have a Zaurus (or a 770 and a Zaurus), so between us we have a variety of processor types (and possibly optimisations).

I wonder how much difference the inlining makes, though. Does anyone have any ideas?

-finline-functions would inline things, but presumably a function like memcpy() would have to be available in source form to be inlined (which makes things more difficult for the test program, but not for patching glibc)?
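A minimal sketch of that distinction, using a hypothetical `small_copy()` helper (not part of glibc or the test program): a routine defined `static inline` where the caller can see its body is a candidate for inlining when optimising (more aggressively with `-finline-functions`), whereas a call into the shared C library's `memcpy()` normally stays a real function call. GCC can also expand `memcpy()` itself as a built-in when the size is a small compile-time constant, but variable-size copies generally go to the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from glibc or the test program. Because its body
   is visible in this translation unit, the compiler may inline it at each
   call site; a call into the shared libc memcpy() cannot be inlined this way. */
static inline void *small_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Simple byte loop: adequate for tiny blocks (e.g. the 0-15 byte case),
       where call overhead matters more than per-byte throughput. */
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Example caller with a constant size: once small_copy() is inlined, the
   compiler is free to specialise the loop for n == 8. */
static inline void copy8(unsigned char *out, const unsigned char *in)
{
    small_copy(out, in, 8);
}
```

Comparing the assembly generated at -O2 for a caller of `copy8()` against one calling `memcpy()` with a variable length is a quick way to see which calls actually get inlined.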

I need to rebuild my toolchains, so I won't be able to do much for a day or so, but if anyone else has a working toolchain I'd be interested to see their results.


Si
lardman
post Mar 26 2006, 08:14 AM
Post #2
Right, I ran some tests on my C750, 5500 and 770:
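For anyone unfamiliar with this kind of output: a bandwidth line is essentially bytes processed divided by elapsed time. A minimal sketch of such a measurement for memset(), assuming a hypothetical `memset_bandwidth_mb_s()` helper (this is not the fastmem test program's actual code, and how that program defines "MB" is not known here; the sketch uses 1,000,000 bytes):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical harness, not the real fastmem-arm-test code: fill a buffer
   `iterations` times and return the rate in MB/s (1 MB = 1,000,000 bytes). */
static double memset_bandwidth_mb_s(size_t buf_size, int iterations)
{
    unsigned char *buf;
    clock_t start, end;
    double seconds;

    assert(buf_size > 0 && iterations > 0);
    buf = malloc(buf_size);
    assert(buf != NULL);

    start = clock();
    for (int i = 0; i < iterations; i++)
        memset(buf, i & 0xFF, buf_size);
    end = clock();

    /* Read one byte back through a volatile so the fills have an observable use. */
    volatile unsigned char sink = buf[buf_size - 1];
    (void)sink;
    free(buf);

    /* clock() measures CPU time, which suits a single-threaded busy loop. */
    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0; /* timer too coarse for this run; use more iterations */
    return (double)buf_size * (double)iterations / seconds / 1e6;
}
```

Using a buffer much larger than the CPU's data cache (1MB, say) means the result reflects memory rather than cache bandwidth.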

================================================================
================================================================
Sharp Zaurus sl-C750 (c7x0/Shepherd)
XScale-PXA255 rev 6 (v5l), 400MHz
================================================================
================================================================

root@c7x0:/media/cf/other# ./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 182.36MB/s
memset8() memory bandwidth: 182.36MB/s
memcpy() memory bandwidth (perfectly aligned): 80.04MB/s
memcpy16() memory bandwidth (perfectly aligned): 34.49MB/s
memcpy() memory bandwidth (16-bit aligned): 73.07MB/s
memcpy16() memory bandwidth (16-bit aligned): 31.02MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.820
memset8 time: 0.750
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.080
memset8 time: 2.060

================================================================

root@c7x0:/media/cf/other# ./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 183.96MB/s
memset8() memory bandwidth: 182.36MB/s
memcpy() memory bandwidth (perfectly aligned): 81.92MB/s
memcpy16() memory bandwidth (perfectly aligned): 34.89MB/s
memcpy() memory bandwidth (16-bit aligned): 74.63MB/s
memcpy16() memory bandwidth (16-bit aligned): 31.35MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.790
memset8 time: 0.720
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.060
memset8 time: 2.060
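The "(16-bit aligned)" rows presumably time copies whose buffers are only guaranteed 2-byte alignment, which can force a word-oriented routine into extra halfword handling or shifting. A minimal sketch of a copy specialised for that case, assuming a hypothetical `copy_halfwords()` that takes a count of 16-bit units (the test program's real memcpy16() may differ in both interface and implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical halfword copy: both pointers must be 16-bit aligned and
   `count` is in 16-bit units, not bytes. */
static void copy_halfwords(uint16_t *dst, const uint16_t *src, size_t count)
{
    assert(((uintptr_t)dst & 1) == 0 && ((uintptr_t)src & 1) == 0);

    /* Unroll by four to cut loop overhead; the second loop copies the tail. */
    while (count >= 4) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
        count -= 4;
    }
    while (count--)
        *dst++ = *src++;
}
```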




================================================================
================================================================
Sharp Zaurus sl-5500 (Collie)
StrongARM-1110 rev 9 (v4l) 206MHz
================================================================
================================================================

root@collie:/media/cf/other# ./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 35.67MB/s
memset8() memory bandwidth: 101.80MB/s
memcpy() memory bandwidth (perfectly aligned): 59.07MB/s
memcpy16() memory bandwidth (perfectly aligned): 59.24MB/s
memcpy() memory bandwidth (16-bit aligned): 48.88MB/s
memcpy16() memory bandwidth (16-bit aligned): 59.24MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.740
memset8 time: 0.540
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 7.840
memset8 time: 3.090

================================================================

root@collie:/media/cf/other# ./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 35.67MB/s
memset8() memory bandwidth: 101.80MB/s
memcpy() memory bandwidth (perfectly aligned): 59.07MB/s
memcpy16() memory bandwidth (perfectly aligned): 58.91MB/s
memcpy() memory bandwidth (16-bit aligned): 49.00MB/s
memcpy16() memory bandwidth (16-bit aligned): 59.07MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.730
memset8 time: 0.540
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 7.850
memset8 time: 3.100




================================================================
================================================================
Nokia N770
ARM926EJ-Sid(wb) rev 3 (v5l) 200MHz
OMAP1710 ?
================================================================
================================================================

./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 117.16MB/s
memset8() memory bandwidth: 262.14MB/s
memcpy() memory bandwidth (perfectly aligned): 102.30MB/s
memcpy16() memory bandwidth (perfectly aligned): 110.96MB/s
memcpy() memory bandwidth (16-bit aligned): 69.21MB/s
memcpy16() memory bandwidth (16-bit aligned): 99.39MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.400
memset8 time: 0.280
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.430
memset8 time: 1.190

================================================================

./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 119.16MB/s
memset8() memory bandwidth: 265.46MB/s
memcpy() memory bandwidth (perfectly aligned): 100.82MB/s
memcpy16() memory bandwidth (perfectly aligned): 109.80MB/s
memcpy() memory bandwidth (16-bit aligned): 68.53MB/s
memcpy16() memory bandwidth (16-bit aligned): 98.46MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.400
memset8 time: 0.280
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.430
memset8 time: 1.170

================================================================
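To put the memset/memset8 rows side by side, here is a small worked comparison using the arm5 figures posted above; the struct and helper names are just for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Figures copied from the arm5-fastmem-arm-test runs above (MB/s). */
struct result {
    const char *device;
    double memset_mb_s;
    double memset8_mb_s;
};

static const struct result arm5_results[] = {
    { "C750 (PXA255, 400MHz)",          182.36, 182.36 },
    { "5500 (StrongARM-1110, 206MHz)",   35.67, 101.80 },
    { "770 (ARM926EJ-S, 200MHz)",       117.16, 262.14 },
};

/* Speed-up of memset8() over the stock memset(). */
static double memset8_speedup(const struct result *r)
{
    assert(r->memset_mb_s > 0.0);
    return r->memset8_mb_s / r->memset_mb_s;
}

static void print_speedups(void)
{
    size_t n = sizeof arm5_results / sizeof arm5_results[0];
    for (size_t i = 0; i < n; i++)
        printf("%-30s memset8/memset = %.2fx\n",
               arm5_results[i].device, memset8_speedup(&arm5_results[i]));
}
```

On these numbers memset8() is roughly 2.85x faster than memset() on the 5500 and 2.24x on the 770, but makes no difference on the C750.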

Would someone mind running the tests on a PXA270 system? (The arm4 and arm5 binaries perform about the same, so I've only attached the c7x0 one.)

Thanks,


Si

P.S. Excuse the .zip container: I'm posting from a WinXP machine, and the board won't accept any other format I've tried.
Attached file: c7x0_fastmem_arm_test.zip (7.93K)