Author Topic: Optimised Memcpy Function For Nokia 770?  (Read 6692 times)

lardman

    • http://people.bath.ac.uk/enpsgp/Zaurus/
Optimised Memcpy Function For Nokia 770?
« on: March 15, 2006, 12:06:24 pm »
Has anyone been following this thread (which has unfortunately become split)?
http://maemo.org/pipermail/maemo-developer...rch/003269.html
http://maemo.org/pipermail/maemo-developer...rch/003276.html

The chap is asking for people to supply results for processors other than the OMAP, and I thought we might be the perfect people to help: most people reading this forum have a Zaurus (or a 770 and a Zaurus), so between us we cover a variety of processor types (and possibly optimisations).

I wonder how much of a difference the in-lining makes though. Anyone have any ideas?

-finline-functions would in-line things, but presumably a function like memcpy() would have to be available in source form to be in-lined (which makes it harder for the test program, but not for a glibc patch)?
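
To illustrate the in-lining question: as far as I know, -finline-functions lets GCC consider functions for in-lining even when they are not declared inline, but it can only do so when the function body is visible in the file being compiled. The sketch below is a made-up example, not code from the maemo thread; the names copy16_inline and blit_row are hypothetical. (Separately, GCC can expand memcpy() as a builtin for small constant sizes, which is a different mechanism.)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the fastmem test program.
 * Copies `count` 16-bit values. Because the body is visible in this
 * translation unit, the compiler is free to in-line it at call sites. */
static inline void copy16_inline(uint16_t *dst, const uint16_t *src, size_t count)
{
    while (count--)
        *dst++ = *src++;
}

/* Caller: with optimisation enabled, the loop above may be merged
 * into this function instead of being reached through a call. */
void blit_row(uint16_t *dst, const uint16_t *src, size_t width)
{
    copy16_inline(dst, src, width);
}
```

A routine that lives only in a shared libc cannot be merged into the caller this way, which is presumably why the source would have to be available.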

I need to rebuild my toolchains, so I won't be able to do much for a day or so, but if anyone else has a working toolchain I'd be interested to see the results.


Si
C750 OZ3.5.4 (GPE, 2.6.x kernel)
SL5500 OZ3.5.4 (Opie)
Nokia 770
Serial GPS, WCF-12, Socket Ethernet & BT, Ratoc USB
WinXP, Mandriva

lardman

« Reply #1 on: March 26, 2006, 11:14:26 am »
Right, I ran some tests on my C750, 5500 and 770:

================================================================
================================================================
Sharp Zaurus SL-C750 (c7x0/Shepherd)
XScale-PXA255 rev 6 (v5l), 400MHz
================================================================
================================================================

root@c7x0:/media/cf/other# ./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 182.36MB/s
memset8() memory bandwidth: 182.36MB/s
memcpy() memory bandwidth (perfectly aligned): 80.04MB/s
memcpy16() memory bandwidth (perfectly aligned): 34.49MB/s
memcpy() memory bandwidth (16-bit aligned): 73.07MB/s
memcpy16() memory bandwidth (16-bit aligned): 31.02MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.820
memset8 time: 0.750
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.080
memset8 time: 2.060

================================================================

root@c7x0:/media/cf/other# ./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 183.96MB/s
memset8() memory bandwidth: 182.36MB/s
memcpy() memory bandwidth (perfectly aligned): 81.92MB/s
memcpy16() memory bandwidth (perfectly aligned): 34.89MB/s
memcpy() memory bandwidth (16-bit aligned): 74.63MB/s
memcpy16() memory bandwidth (16-bit aligned): 31.35MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.790
memset8 time: 0.720
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.060
memset8 time: 2.060




================================================================
================================================================
Sharp Zaurus SL-5500 (Collie)
StrongARM-1110 rev 9 (v4l) 206MHz
================================================================
================================================================

root@collie:/media/cf/other# ./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 35.67MB/s
memset8() memory bandwidth: 101.80MB/s
memcpy() memory bandwidth (perfectly aligned): 59.07MB/s
memcpy16() memory bandwidth (perfectly aligned): 59.24MB/s
memcpy() memory bandwidth (16-bit aligned): 48.88MB/s
memcpy16() memory bandwidth (16-bit aligned): 59.24MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.740
memset8 time: 0.540
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 7.840
memset8 time: 3.090

================================================================

root@collie:/media/cf/other# ./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 35.67MB/s
memset8() memory bandwidth: 101.80MB/s
memcpy() memory bandwidth (perfectly aligned): 59.07MB/s
memcpy16() memory bandwidth (perfectly aligned): 58.91MB/s
memcpy() memory bandwidth (16-bit aligned): 49.00MB/s
memcpy16() memory bandwidth (16-bit aligned): 59.07MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.730
memset8 time: 0.540
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 7.850
memset8 time: 3.100




================================================================
================================================================
Nokia 770
ARM926EJ-Sid(wb) rev 3 (v5l) 200MHz
OMAP1710 ?
================================================================
================================================================

./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 117.16MB/s
memset8() memory bandwidth: 262.14MB/s
memcpy() memory bandwidth (perfectly aligned): 102.30MB/s
memcpy16() memory bandwidth (perfectly aligned): 110.96MB/s
memcpy() memory bandwidth (16-bit aligned): 69.21MB/s
memcpy16() memory bandwidth (16-bit aligned): 99.39MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.400
memset8 time: 0.280
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.430
memset8 time: 1.190

================================================================

./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 119.16MB/s
memset8() memory bandwidth: 265.46MB/s
memcpy() memory bandwidth (perfectly aligned): 100.82MB/s
memcpy16() memory bandwidth (perfectly aligned): 109.80MB/s
memcpy() memory bandwidth (16-bit aligned): 68.53MB/s
memcpy16() memory bandwidth (16-bit aligned): 98.46MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.400
memset8 time: 0.280
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.430
memset8 time: 1.170

================================================================

Would someone mind running the tests on a PXA270 system? (The two binaries should give about the same results, so I only included the c7x0 one.)
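
For anyone who wants a rough figure without the test binaries, here is a minimal sketch of how a memcpy() bandwidth number of this kind could be measured. It is not the method used by the fastmem test program; the function name memcpy_bandwidth, the use of clock(), and the choice of 1 MB = 1,000,000 bytes are all assumptions of this sketch, so the numbers will not be directly comparable with the ones above.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: rough memcpy() bandwidth in MB/s, assuming
 * 1 MB = 1,000,000 bytes. Returns -1.0 if the arguments are unusable,
 * allocation fails, or the run is too short for clock() to measure. */
double memcpy_bandwidth(size_t buf_size, int iterations)
{
    unsigned char *src, *dst;
    double result = -1.0;

    if (buf_size == 0 || iterations <= 0)
        return result;

    src = malloc(buf_size);
    dst = malloc(buf_size);

    if (src != NULL && dst != NULL) {
        volatile unsigned char sink = 0; /* stops the copies being optimised away */
        clock_t start, end;
        int i;

        memset(src, 0xA5, buf_size);
        memset(dst, 0, buf_size);        /* touch both buffers before timing */

        start = clock();
        for (i = 0; i < iterations; i++) {
            src[0] = (unsigned char)i;   /* make each pass copy different data */
            memcpy(dst, src, buf_size);
            sink ^= dst[buf_size - 1];
        }
        end = clock();
        (void)sink;

        if (end > start) {
            double seconds = (double)(end - start) / CLOCKS_PER_SEC;
            result = ((double)buf_size * iterations) / seconds / 1e6;
        }
    }

    free(src);
    free(dst);
    return result;
}
```

Calling it with a buffer much larger than the data cache, for example memcpy_bandwidth(4 * 1024 * 1024, 100), and printing the result with "%.2fMB/s" should give a main-memory figure rather than a cache figure.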

Thanks,


Si

P.S. Excuse the .zip container; I'm sending this from a WinXP machine and the board won't accept anything else I've tried.