
> Mplayer Development And Optimization For Arm
Serge
post Dec 5 2006, 02:43 PM
Post #1





Group: Members
Posts: 51
Joined: 8-October 06
Member No.: 11,724



It is probably a good idea to consolidate our efforts and try to submit some of the useful ARM-related patches upstream:
http://lists.mplayerhq.hu/pipermail/ffmpeg...ust/014460.html
http://lists.mplayerhq.hu/pipermail/mplaye...ber/046207.html

I can only test MPlayer on the Nokia 770, so I can't be sure whether any ARM9E-specific optimizations (ARM9E is the core used in the Nokia 770) are also good for the Zaurus. So people who are able to compile MPlayer from source and test it on a Zaurus are welcome in this thread. One example is the new ARMv5TE-optimized IDCT in MPlayer 1.0rc1: can anybody benchmark it on a Zaurus?
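For anyone who wants to try, here is a minimal sketch of a benchmark driver in the spirit of the doom-test.sh runs posted in this thread (the real script is not shown, so this is not it). It assumes an `mplayer` binary on PATH and a test clip called `doom.avi` (a hypothetical name). It only uses MPlayer's `-benchmark`, `-nosound`, `-vo null` and `-lavdopts idct=N` options. The idct numbers are the ones compared in the results posted below. The names attached to them are the libavcodec `FF_IDCT_*` constants they are believed to map to, so check `avcodec.h` in your own tree.

```python
import shutil
import subprocess

# IDCT selectors compared in this thread. The names are the libavcodec
# FF_IDCT_* constants these numbers are believed to map to; verify them
# against libavcodec/avcodec.h of your MPlayer tree.
IDCT_ALGOS = {2: "simple", 7: "arm", 10: "simplearm", 16: "simplearmv5te"}


def build_command(clip, idct, player="mplayer"):
    """MPlayer command line that decodes `clip` with a forced IDCT.

    -benchmark decodes as fast as possible and prints a BENCHMARKs summary;
    -nosound and -vo null keep audio decoding and video output out of the run.
    """
    return [player, "-benchmark", "-nosound", "-vo", "null",
            "-lavdopts", f"idct={idct}", clip]


def run_benchmarks(clip, runs=5, player="mplayer"):
    """Run every IDCT `runs` times; return {idct: [BENCHMARKs lines]}."""
    results = {}
    for idct in IDCT_ALGOS:
        lines = []
        for _ in range(runs):
            proc = subprocess.run(build_command(clip, idct, player),
                                  capture_output=True, text=True,
                                  errors="replace")
            # Search both streams; which one carries the summary line
            # may depend on the MPlayer version and message level.
            output = proc.stdout + proc.stderr
            lines += [l for l in output.splitlines() if "BENCHMARKs:" in l]
        results[idct] = lines
    return results


if __name__ == "__main__":
    clip = "doom.avi"  # hypothetical test clip name
    if shutil.which("mplayer"):
        for idct, lines in run_benchmarks(clip).items():
            print(f"idct is {idct} ({IDCT_ALGOS[idct]})")
            print("\n".join(lines))
```

Running each setting several times matters: the per-run spread can be as large as the difference between IDCT implementations.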

Also, this is not strictly ARM-related, but the libmad-based decoder in MPlayer seems to have trouble with variable bitrate audio (it loses sync with the video). More details can be found here http://lists.mplayerhq.hu/pipermail/mplaye...ust/045017.html and in the follow-up messages. Any volunteers to investigate this problem?
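As background, one common way VBR audio drifts out of sync is an audio clock derived from bytes consumed divided by a nominal bitrate. This is an assumption for illustration, not a diagnosis of MPlayer's libmad wrapper. Every MPEG-1 Layer III frame decodes to 1152 samples whatever its bitrate, so counting samples stays exact while a byte-based clock goes wrong as soon as the bitrate changes. The stream below is made up.

```python
SAMPLES_PER_FRAME = 1152  # every MPEG-1 Layer III frame decodes to 1152 samples


def frame_bytes(bitrate_bps, sample_rate):
    """MPEG-1 Layer III frame length in bytes, ignoring the padding slot."""
    return 144 * bitrate_bps // sample_rate


def audio_clocks(bitrates, sample_rate=44100):
    """Playback position after decoding frames with the given bitrates.

    Returns (sample_clock, byte_clock): the first counts decoded samples,
    the second divides consumed bytes by the first frame's bitrate, as a
    naive constant-bitrate clock would.
    """
    sample_clock = len(bitrates) * SAMPLES_PER_FRAME / sample_rate
    consumed = sum(frame_bytes(b, sample_rate) for b in bitrates)
    byte_clock = consumed * 8 / bitrates[0]
    return sample_clock, byte_clock


if __name__ == "__main__":
    # Hypothetical VBR stream: 500 frames at 128 kbit/s, then 500 at 256 kbit/s.
    stream = [128_000] * 500 + [256_000] * 500
    exact, naive = audio_clocks(stream)
    print(f"sample clock {exact:.3f} s, byte/bitrate clock {naive:.3f} s")
```

For this stream the sample clock reads about 26.12 s while the byte-based clock reads 39.125 s, an error of roughly 13 s after only 1000 frames.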

All in all, the ffmpeg optimizations for ARM are not nearly as good as those for x86, so investing some time in them may bring a noticeable performance improvement.
Replies
koen
post Dec 27 2006, 01:27 AM
Post #2





Group: Members
Posts: 1,014
Joined: 4-January 05
From: Enschede, The Netherlands
Member No.: 6,107



I ran the benchmark on my iPAQ h2200 (400MHz PXA255), and I can see that the memory bus is a bottleneck, since the 770 and PXA270 machines run the bus at a higher speed.
If that isn't the case, ARM926 cores kick XScale ass :)

CODE
root@h2200:/data# sh doom-test.sh
idct is 2
BENCHMARKs: VC:  82.432s VO:   0.071s A:   0.000s Sys:   1.293s =   83.796s
BENCHMARKs: VC:  80.798s VO:   0.066s A:   0.000s Sys:   0.916s =   81.780s
BENCHMARKs: VC:  80.758s VO:   0.067s A:   0.000s Sys:   0.912s =   81.737s
BENCHMARKs: VC:  80.676s VO:   0.070s A:   0.000s Sys:   0.897s =   81.643s
BENCHMARKs: VC:  80.649s VO:   0.067s A:   0.000s Sys:   0.950s =   81.665s
idct is 7
BENCHMARKs: VC:  75.593s VO:   0.069s A:   0.000s Sys:   0.902s =   76.564s
BENCHMARKs: VC:  78.993s VO:   0.069s A:   0.000s Sys:   0.903s =   79.965s
BENCHMARKs: VC:  79.248s VO:   0.066s A:   0.000s Sys:   0.933s =   80.246s
BENCHMARKs: VC:  79.242s VO:   0.067s A:   0.000s Sys:   0.931s =   80.239s
BENCHMARKs: VC:  79.080s VO:   0.066s A:   0.000s Sys:   0.904s =   80.050s
idct is 10
BENCHMARKs: VC:  77.020s VO:   0.067s A:   0.000s Sys:   0.905s =   77.992s
BENCHMARKs: VC:  80.152s VO:   0.066s A:   0.000s Sys:   0.905s =   81.124s
BENCHMARKs: VC:  80.219s VO:   0.181s A:   0.000s Sys:   0.903s =   81.303s
BENCHMARKs: VC:  80.238s VO:   0.066s A:   0.000s Sys:   1.024s =   81.328s
BENCHMARKs: VC:  80.359s VO:   0.066s A:   0.000s Sys:   0.906s =   81.331s
idct is 16
BENCHMARKs: VC:  73.140s VO:   0.068s A:   0.000s Sys:   0.916s =   74.124s
BENCHMARKs: VC:  76.616s VO:   0.066s A:   0.000s Sys:   1.014s =   77.695s
BENCHMARKs: VC:  76.927s VO:   0.066s A:   0.000s Sys:   0.905s =   77.899s
BENCHMARKs: VC:  76.992s VO:   0.069s A:   0.000s Sys:   0.906s =   77.966s
BENCHMARKs: VC:  77.157s VO:   0.067s A:   0.000s Sys:   0.940s =   78.165s
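To compare these runs at a glance, a small sketch can parse the log above and report the mean VC (video codec) time per idct setting and the speedup over idct=2. The parsing assumes the exact "idct is N" / "BENCHMARKs:" layout shown in the post; the function names are made up for this example.

```python
import re
from statistics import mean

# doom-test.sh output on the iPAQ h2200 (400MHz PXA255), copied from the post.
LOG = """\
idct is 2
BENCHMARKs: VC:  82.432s VO:   0.071s A:   0.000s Sys:   1.293s =   83.796s
BENCHMARKs: VC:  80.798s VO:   0.066s A:   0.000s Sys:   0.916s =   81.780s
BENCHMARKs: VC:  80.758s VO:   0.067s A:   0.000s Sys:   0.912s =   81.737s
BENCHMARKs: VC:  80.676s VO:   0.070s A:   0.000s Sys:   0.897s =   81.643s
BENCHMARKs: VC:  80.649s VO:   0.067s A:   0.000s Sys:   0.950s =   81.665s
idct is 7
BENCHMARKs: VC:  75.593s VO:   0.069s A:   0.000s Sys:   0.902s =   76.564s
BENCHMARKs: VC:  78.993s VO:   0.069s A:   0.000s Sys:   0.903s =   79.965s
BENCHMARKs: VC:  79.248s VO:   0.066s A:   0.000s Sys:   0.933s =   80.246s
BENCHMARKs: VC:  79.242s VO:   0.067s A:   0.000s Sys:   0.931s =   80.239s
BENCHMARKs: VC:  79.080s VO:   0.066s A:   0.000s Sys:   0.904s =   80.050s
idct is 10
BENCHMARKs: VC:  77.020s VO:   0.067s A:   0.000s Sys:   0.905s =   77.992s
BENCHMARKs: VC:  80.152s VO:   0.066s A:   0.000s Sys:   0.905s =   81.124s
BENCHMARKs: VC:  80.219s VO:   0.181s A:   0.000s Sys:   0.903s =   81.303s
BENCHMARKs: VC:  80.238s VO:   0.066s A:   0.000s Sys:   1.024s =   81.328s
BENCHMARKs: VC:  80.359s VO:   0.066s A:   0.000s Sys:   0.906s =   81.331s
idct is 16
BENCHMARKs: VC:  73.140s VO:   0.068s A:   0.000s Sys:   0.916s =   74.124s
BENCHMARKs: VC:  76.616s VO:   0.066s A:   0.000s Sys:   1.014s =   77.695s
BENCHMARKs: VC:  76.927s VO:   0.066s A:   0.000s Sys:   0.905s =   77.899s
BENCHMARKs: VC:  76.992s VO:   0.069s A:   0.000s Sys:   0.906s =   77.966s
BENCHMARKs: VC:  77.157s VO:   0.067s A:   0.000s Sys:   0.940s =   78.165s
"""

BENCH_RE = re.compile(r"BENCHMARKs:\s+VC:\s+([\d.]+)s")


def parse_vc_times(log):
    """Group VC times in seconds by the preceding 'idct is N' line."""
    times, current = {}, None
    for line in log.splitlines():
        if line.startswith("idct is "):
            current = int(line.split()[-1])
            times[current] = []
            continue
        match = BENCH_RE.search(line)
        if match and current is not None:
            times[current].append(float(match.group(1)))
    return times


def summarize(times, baseline=2):
    """Return {idct: (mean VC time, speedup over the baseline idct)}."""
    base = mean(times[baseline])
    return {idct: (mean(v), base / mean(v)) for idct, v in times.items()}


if __name__ == "__main__":
    for idct, (avg, speedup) in summarize(parse_vc_times(LOG)).items():
        print(f"idct={idct:2d}  mean VC {avg:7.3f}s  speedup x{speedup:.3f}")
```

By mean VC time, idct=16 comes out roughly 6% faster than idct=2 on this machine. The first run of each group differs from the other four by about 1.7 to 3.8 s, so a median, or discarding a warm-up run, may be a more robust summary.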
Serge
post Dec 27 2006, 02:16 PM
Post #3





Group: Members
Posts: 51
Joined: 8-October 06
Member No.: 11,724



Thanks for running the benchmarks. They show that these ARMv5TE IDCT optimizations are useful on XScale too. I was unsure whether it is possible to write shared code that runs well on both ARM926 and XScale, or whether two different versions would be needed. I'll keep optimizing this IDCT as much as possible, primarily for ARM926, but I'll keep in mind that it is also useful on XScale and take that into account :) Still, an iWMMXt implementation of the IDCT optimized specifically for XScale may be a better choice (the IDCT takes a noticeable fraction of decoding time, so it would be useful at least for machines like the Zaurus C3000). If anybody skilled in ARM assembly would like to try it, I can help with information (though I don't have any machine that can run iWMMXt code myself).
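Anyone writing a new IDCT (ARMv5TE or iWMMXt) needs a slow but exact reference to check its output against. The sketch below is the textbook separable 8x8 inverse DCT with orthonormal scaling, done in floating point. It is not ffmpeg's simple_idct, which is a fixed-point integer approximation; the function names are illustrative.

```python
import math

N = 8
# Basis table: BASIS[u][x] = C(u) * cos((2x + 1) * u * pi / 16),
# with C(0) = 1/sqrt(2) and C(u) = 1 otherwise.
BASIS = [[(1 / math.sqrt(2) if u == 0 else 1.0)
          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
          for x in range(N)] for u in range(N)]


def idct_1d(coeffs):
    """8-point inverse DCT: f(x) = 1/2 * sum_u C(u) F(u) cos((2x+1)u*pi/16)."""
    return [0.5 * sum(BASIS[u][x] * coeffs[u] for u in range(N))
            for x in range(N)]


def idct_2d(block):
    """Separable 8x8 inverse DCT of block[v][u] (row v, column u).

    Rows are transformed first, then columns: the same row/column split
    that ffmpeg's simple_idct uses, just without integer rounding.
    """
    rows = [idct_1d(r) for r in block]                         # rows[v][x]
    cols = [idct_1d([rows[v][x] for v in range(N)])            # cols[x][y]
            for x in range(N)]
    return [[cols[x][y] for x in range(N)] for y in range(N)]  # out[y][x]


if __name__ == "__main__":
    dc_only = [[0] * N for _ in range(N)]
    dc_only[0][0] = 8
    for row in idct_2d(dc_only):
        print(" ".join(f"{p:5.2f}" for p in row))
```

A DC-only block with F(0,0) = 8 decodes to a flat block of 1.0. To check an optimized IDCT, feed it the same coefficient blocks and compare the rounded results; IEEE Std 1180-1990 describes the accuracy test traditionally used for this.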

QUOTE(koen @ Dec 27 2006, 01:27 AM)
I ran the benchmark on my ipaq h2200 (400MHz pxa255) and I can see that the memory bus is a bottleneck, since the 770 and pxa270 machines run the bus at a higher speed.

That's interesting. If memory performance really is that important for MPlayer, it should be possible to find the parts of the code with heavy memory use and optimize their access patterns for better cache and memory bus utilization. Some time ago I did some tests to figure out how to make the best use of memory bandwidth on the Nokia 770: http://maemo.org/pipermail/maemo-developer...ber/006579.html
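To make the point about access patterns concrete, here is a toy model, not a model of any specific CPU: a fully associative LRU cache that counts line fills while an 8-bit plane is walked row by row versus column by column. The 32-byte line, the deliberately small 8 KB capacity and the 360x288 frame are assumed values for illustration. Real ARM9/XScale caches are set-associative and may use other replacement policies.

```python
from collections import OrderedDict


def count_line_fills(addresses, line_size=32, cache_bytes=8 * 1024):
    """Count line fills for an address trace in a fully associative LRU cache."""
    capacity = cache_bytes // line_size
    cache = OrderedDict()
    fills = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            fills += 1                     # miss: fetch the line from memory
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return fills


def row_major(width, height):
    """Byte addresses of an 8-bit plane visited row by row."""
    return (y * width + x for y in range(height) for x in range(width))


def column_major(width, height):
    """The same plane visited column by column."""
    return (y * width + x for x in range(width) for y in range(height))


if __name__ == "__main__":
    w, h = 360, 288  # hypothetical frame size
    print("row-major fills:   ", count_line_fills(row_major(w, h)))
    print("column-major fills:", count_line_fills(column_major(w, h)))
```

With these parameters the row-major walk fetches each line once (3240 fills), while the column-major walk misses on every byte (103680 fills), because one column touches 288 distinct lines and the cache only holds 256.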

This information may turn out to be very useful for further optimizations :)

QUOTE
If that isn't the case, arm926 cores kick xscale ass :)

Well, the ARM926 core should be somewhat faster per clock. Here are some links to optimization docs for different ARM flavours: http://www.internettablettalk.com/forums/s...read.php?t=2406

But I expected the 416MHz chip to still be a lot faster because of its higher CPU clock frequency. Maybe memory performance really is the limiting factor here, and it brings the performance of all these chips closer together.

Another possible explanation could be a non-optimal set of optimization options, or an older version of gcc, used for the Zaurus builds of MPlayer. It should be relatively easy to test MPlayer with a different set of optimization options: take the upstream MPlayer 1.0rc1 tarball and compile it using:
CFLAGS="-O4 -mcpu=iwmmxt -fomit-frame-pointer -ffast-math" ./configure
make

It may have some problems with the video/audio output drivers when compiled without the Zaurus-specific patches, but this should not matter when testing decoding performance only :)

Posts in this topic
Serge   Mplayer Development And Optimization For Arm   Dec 5 2006, 02:43 PM
washo   I second that a better player would be great Im a ...   Dec 6 2006, 09:14 AM
ldrolez   Hi! Check atty sources, 99% of mplayer for the...   Dec 7 2006, 09:34 AM
koen   QUOTE(ldrolez @ Dec 7 2006, 05:34 PM)Hi! ...   Dec 7 2006, 09:53 AM
Antikx   QUOTE(koen @ Dec 7 2006, 11:53 AM)mpeg-video ...   Dec 7 2006, 10:54 AM
Serge   QUOTE(ldrolez @ Dec 7 2006, 09:34 AM)Check at...   Dec 7 2006, 11:06 AM
koen   QUOTE(Serge @ Dec 7 2006, 07:06 PM)The check ...   Dec 7 2006, 02:29 PM
danboid   I'm very happy to learn that the ARM specific ...   Dec 7 2006, 02:45 PM
Serge   QUOTE(danboid @ Dec 7 2006, 02:45 PM)I'm ...   Dec 11 2006, 12:30 PM
Serge   Just to keep you informed, the work on implementin...   Dec 25 2006, 02:30 AM
koen   QUOTE(Serge @ Dec 25 2006, 10:30 AM)By the wa...   Dec 25 2006, 04:16 AM
koen   QUOTE(koen @ Dec 25 2006, 12:16 PM)The cxxx m...   Dec 25 2006, 05:38 AM
Serge   QUOTE(koen @ Dec 25 2006, 04:16 AM)The cxxx m...   Dec 31 2006, 12:40 PM
Serge   Some information about mplayer benchmarking. It co...   Dec 26 2006, 03:53 PM
danboid   Hi Serge! I conducted a bunch of benchmark te...   Dec 27 2006, 12:36 AM
koen   I ran the benchmark on my ipaq h2200 (400MHz pxa25...   Dec 27 2006, 01:27 AM
Serge   Thanks for running benchmarks. They show that thes...   Dec 27 2006, 02:16 PM
danboid   Hi Serge! I'm willing to do some more ben...   Jan 1 2007, 12:29 AM
Civil   QUOTECFLAGS="-O4 -mcpu=iwmmxt -fomit-frame-po...   Jan 1 2007, 02:24 AM
Serge   civil: http://www.hpc.ru/board/viewtopic.php?t=990...   Jan 1 2007, 03:00 AM
danboid   Yeah Civil, be civil (Sorry, couldn't resist ...   Jan 1 2007, 03:29 AM
Civil   Serge It was just comments... I don't know eng...   Jan 1 2007, 03:47 AM
Serge   Done some patch for 'dct_unquantize_h263_intra...   Jan 1 2007, 06:37 PM
Serge   OK, committed 'dct_unquantize_h263_intra' ...   Jan 2 2007, 09:32 AM
Serge   Well, some more optimizations for h263 unquantizer...   Jan 6 2007, 08:24 AM
Serge   Just for additional statistics, 'Doom benchmar...   Jan 8 2007, 02:29 PM
Serge   Hello again. I guess the benchmarks of -Os vs. -O2...   Jan 17 2007, 03:37 PM
Meanie   QUOTE(Serge @ Jan 18 2007, 09:37 AM)Hello aga...   Jan 17 2007, 04:40 PM
Serge   Here is a new progress update report I have imple...   Jan 22 2007, 02:30 PM
lardman   Serge, I'll build your comparison benchmarks ...   Jan 22 2007, 02:55 PM
Civil   QUOTEDo you need any assistance in benchmarking? I...   Jan 28 2007, 11:58 AM
Serge   QUOTE(Civil @ Jan 28 2007, 11:58 AM)P.S. mpla...   Jan 28 2007, 12:53 PM
Civil   QUOTEWouldn't it be better to create a new top...   Jan 28 2007, 01:04 PM
Serge   Some more mplayer related news, mplayer port for m...   Feb 14 2007, 01:57 PM
tjchick   Hmm. It looks like the mplayer 1.0rc1 code include...   Mar 14 2007, 07:39 AM
Serge   QUOTE(tjchick @ Mar 14 2007, 07:39 AM)Hmm. It...   Mar 14 2007, 08:14 AM
tjchick   QUOTE(Serge @ Mar 14 2007, 05:14 PM)QUOTE(tjc...   Mar 14 2007, 08:29 AM
Meanie   QUOTE(tjchick @ Mar 15 2007, 02:29 AM)QUOTE(S...   Mar 14 2007, 08:53 AM
Serge   QUOTE(tjchick @ Mar 14 2007, 08:29 AM)Yes, yo...   Mar 14 2007, 09:32 AM
tjchick   Yes, IWMMX needs OS support, as well as having th...   Mar 15 2007, 01:51 AM
Serge   QUOTE(tjchick @ Mar 15 2007, 01:51 AM)Yes, IW...   Mar 15 2007, 10:52 AM
tjchick   QUOTE(Serge @ Mar 15 2007, 07:52 PM)QUOTE(tjc...   Mar 15 2007, 12:05 PM
Meanie   actually, i think your new build is much faster th...   Mar 15 2007, 03:39 PM
tjchick   On cacko on c1000, I see: VC: 36.186 VC: 36.927 VC...   Mar 21 2007, 07:10 AM
Serge   You can try to override idct by using '-lavdop...   Mar 21 2007, 08:26 AM
koen   QUOTE(Serge @ Mar 21 2007, 04:26 PM)By the wa...   Mar 21 2007, 08:42 AM
Serge   QUOTE(koen @ Mar 21 2007, 08:42 AM)QUOTE(Serg...   Mar 22 2007, 10:56 AM
koen   QUOTE(Serge @ Mar 22 2007, 06:56 PM)QUOTE(koe...   Mar 22 2007, 01:35 PM
tjchick   QUOTE(Serge @ Mar 21 2007, 05:26 PM)You can t...   Mar 23 2007, 02:00 PM
Serge   Hi, I'm working on further optimizing ARMv5 ID...   Jul 14 2007, 01:16 PM
Capn_Fish   I'll see if I can give it a try. How much is ...   Jul 14 2007, 01:47 PM
Serge   QUOTE(Capn_Fish @ Jul 14 2007, 01:47 PM)I...   Jul 14 2007, 02:04 PM
Civil   pxa270, 416MHz (Zaurus C3100), Gentoo 2007.0, eabi...   Jul 15 2007, 05:45 AM
Serge   QUOTE(Civil @ Jul 15 2007, 05:45 AM)pxa270, 4...   Jul 15 2007, 08:40 AM
Serge   I'm sorry for a long delay with an answer. Cou...   Aug 28 2007, 09:33 PM
speculatrix   Any improvement at all is very much welcomed - I h...   Aug 30 2007, 01:50 PM
Serge   QUOTE(speculatrix @ Aug 30 2007, 01:50 PM)Any...   Sep 2 2007, 10:18 AM
XorA   A zaurus C3200 px27x Before new idct mplayer -no...   Sep 3 2007, 02:32 AM
Serge   OK, thanks, so at least this IDCT optimization is ...   Sep 4 2007, 11:03 AM
speculatrix   could there be other factors affecting memory acce...   Sep 4 2007, 01:03 PM
XorA   Todays SVN mplayer with rev 257 of IDCT code produ...   Sep 20 2007, 05:46 AM
zap   Did some benchmarks today in different environment...   Nov 6 2007, 07:00 AM
zap   Took a look at tcpmp sources this evening. ffmpeg ...   Nov 6 2007, 02:50 PM
Serge   Hello zap, Please also try testing atty's bui...   Nov 6 2007, 11:26 PM
zap   QUOTE(Serge @ Nov 7 2007, 10:26 AM) Pleas...   Nov 7 2007, 10:23 AM
tjchick   Just a quick update from me, mostly of interest to...   Nov 13 2007, 02:38 PM
speculatrix   good news indeed, anything which improves the medi...   Nov 13 2007, 02:59 PM

