> Mplayer Development And Optimization For Arm
Serge
post Dec 5 2006, 02:43 PM
Post #1

Probably it is a good idea to consolidate efforts and try to submit some of the useful ARM related patches upstream:
http://lists.mplayerhq.hu/pipermail/ffmpeg...ust/014460.html
http://lists.mplayerhq.hu/pipermail/mplaye...ber/046207.html

I can only test MPlayer on the Nokia 770, so I can't be sure whether optimizations specific to ARM9E (the core used in the Nokia 770) are also good for the Zaurus. So people who are able to compile MPlayer from source and test it on a Zaurus are welcome in this thread. One example is the new armv5te optimized idct in MPlayer 1.0rc1; can anybody benchmark it on a Zaurus?

Also, this is not really ARM related, but the libmad-based decoder in MPlayer seems to have trouble with variable bitrate audio (it loses sync with video). Some more details can be found here http://lists.mplayerhq.hu/pipermail/mplaye...ust/045017.html and in the follow-up messages. Any volunteers to investigate this problem?

All in all, ffmpeg optimizations for ARM are not nearly as good as for x86, so investing some time in them may provide some performance improvement.
 
tjchick
post Mar 14 2007, 07:39 AM
Post #2

Hmm. It looks like the mplayer 1.0rc1 code includes iwmmxt stuff, but does not actually use it unless you change the code. I have done this for the results below.

Here are my benchmark results on a standard SL-C3200, not overclocked, running OpenZaurus:

BENCHMARKs: VC: 44.056s VO: 0.078s A: 0.000s Sys: 0.831s = 44.965s
BENCHMARK%: VC: 97.9787% VO: 0.1734% A: 0.0000% Sys: 1.8479% = 100.0000%
BENCHMARKs: VC: 43.234s VO: 0.079s A: 0.000s Sys: 0.816s = 44.128s
BENCHMARK%: VC: 97.9734% VO: 0.1785% A: 0.0000% Sys: 1.8481% = 100.0000%
BENCHMARKs: VC: 43.487s VO: 0.076s A: 0.000s Sys: 0.813s = 44.376s
BENCHMARK%: VC: 97.9957% VO: 0.1715% A: 0.0000% Sys: 1.8328% = 100.0000%
BENCHMARKs: VC: 43.669s VO: 0.076s A: 0.000s Sys: 0.820s = 44.565s
BENCHMARK%: VC: 97.9891% VO: 0.1712% A: 0.0000% Sys: 1.8398% = 100.0000%
BENCHMARKs: VC: 43.497s VO: 0.078s A: 0.000s Sys: 0.810s = 44.386s
BENCHMARK%: VC: 97.9976% VO: 0.1764% A: 0.0000% Sys: 1.8260% = 100.0000%

Tim

Serge
post Mar 14 2007, 08:14 AM
Post #3

QUOTE(tjchick @ Mar 14 2007, 07:39 AM)
Hmm. It looks like the mplayer 1.0rc1 code includes iwmmxt stuff, but does not actually use it unless you change the code.

Do you really need to change the code to use iwmmx? Isn't it a simple matter of properly running configure?

Did you try using something similar to what I suggested in this thread before?
CFLAGS="-O4 -mcpu=iwmmxt -fomit-frame-pointer -ffast-math" ./configure
make

tjchick
post Mar 14 2007, 08:29 AM
Post #4

QUOTE(Serge @ Mar 14 2007, 05:14 PM)
Do you really need to change the code to use iwmmx? Isn't it a simple matter of properly running configure?

Did you try using something similar to what I suggested in this thread before?
CFLAGS="-O4 -mcpu=iwmmxt -fomit-frame-pointer -ffast-math" ./configure
make

Yes, you really do - the code gets compiled, but not used, as the code is only installed following a test like this:
if( mm_flags & MM_IWMMXT ) -> install dsp code.

It fills in mm_flags with 0! There is some code to override this using avctx->dsp_mask & FF_MM_FORCE, but I did not look too hard at getting this going. I wonder if this is related to the lavdopts somehow?

That's why the others only saw a 2% improvement (compiling with the better tune options), and I see a 30% or so improvement.
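
The gating described above can be modelled in a few lines. The following is a minimal Python sketch, not the actual ffmpeg source: the flag values and the helpers `effective_flags` and `select_dsp` are hypothetical, and only the names `mm_flags`, `MM_IWMMXT`, `dsp_mask` and `FF_MM_FORCE` come from the post. It illustrates why code that is compiled in stays unused when detection yields 0, and how a force bit in a mask could override that.

```python
# Illustrative constants; the real values in ffmpeg may differ.
MM_IWMMXT = 0x0100        # "CPU has iwMMXt" capability bit (assumed value)
FF_MM_FORCE = 0x80000000  # "trust dsp_mask over detection" bit (assumed value)


def effective_flags(detected_flags, dsp_mask=0):
    """Combine detected CPU flags with an optional user-supplied mask."""
    if dsp_mask & FF_MM_FORCE:
        # Forced: add the low capability bits from the mask on top of detection.
        return detected_flags | (dsp_mask & 0xFFFF)
    return detected_flags


def select_dsp(detected_flags, dsp_mask=0):
    """Return the name of the (hypothetical) DSP implementation to install."""
    flags = effective_flags(detected_flags, dsp_mask)
    if flags & MM_IWMMXT:
        return "iwmmxt"
    return "generic"


if __name__ == "__main__":
    print(select_dsp(0))                           # detection returned 0
    print(select_dsp(0, FF_MM_FORCE | MM_IWMMXT))  # forced through the mask
    print(select_dsp(MM_IWMMXT))                   # detection worked
```

In this model, a mask without the force bit changes nothing, which matches the observation that building with the right compiler flags alone is not enough.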

Tim

Serge
post Mar 14 2007, 09:32 AM
Post #5

QUOTE(tjchick @ Mar 14 2007, 08:29 AM)
Yes, you really do - the code gets compiled, but not used, as the code is only installed following a test like this:
if( mm_flags & MM_IWMMXT ) -> install dsp code.

It fills in mm_flags with 0! There is some code to override this using avctx->dsp_mask & FF_MM_FORCE, but I did not look too hard at getting this going. I wonder if this is related to the lavdopts somehow?

That's why the others only saw a 2% improvement (compiling with the better tune options), and I see a 30% or so improvement.

Thanks for the detailed explanation, it clarifies the current situation a lot. When I submitted ARMv5TE instructions support for MPlayer configure, I could not verify that IWMMXT works as well (for an obvious reason, I don't have any device that supports IWMMXT): http://lists.mplayerhq.hu/pipermail/mplaye...ber/046537.html

Please check the latest MPlayer SVN just as Meanie suggested, and if it still has problems with enabling iwmmxt, please try to make a clean fix and submit this patch upstream. If you check the first post in this thread, you will see that upstream developers are not very familiar with ARM platform. Only atty did some improvements for MPlayer at some time in the past, but he is unwilling to help upstream to integrate his fixes for whatever reason. So it is up to us (and you as well) to work on improving ARM support in MPlayer (including IWMMXT support). Nobody else can do this job. And upstream developers are not obliged to fix our problems.

PS. I'm sorry if it was me who created a false impression of IWMMXT being fully supported in MPlayer 1.0rc1 :(

edit: IWMMX has some additional registers, so their save/restore on context switches should probably be supported by the kernel? Maybe these extra checks in mplayer are there to ensure that it is safe to use iwmmxt even though the CPU itself may support it? Anyway, that was just a wild guess; I'm not familiar with XScale at all.

And thanks for actually digging into the code and checking if iwmmxt really works; the results posted in this thread were suspicious from the very start :)
 
tjchick
post Mar 15 2007, 01:51 AM
Post #6

QUOTE(Serge @ Mar 14 2007, 06:32 PM)
Thanks for the detailed explanation, it clarifies the current situation a lot. When I submitted ARMv5TE instructions support for MPlayer configure, I could not verify that IWMMXT works as well (for an obvious reason, I don't have any device that supports IWMMXT): http://lists.mplayerhq.hu/pipermail/mplaye...ber/046537.html

Please check the latest MPlayer SVN just as Meanie suggested, and if it still has problems with enabling iwmmxt, please try to make a clean fix and submit this patch upstream.

I already did this stuff yesterday, before I saw your messages. Yes Meanie, even the latest SVN does not fix matters. I posted a patch to the ffmpeg dev mailing list, got some feedback and posted another patch. I am awaiting the response.

QUOTE
If you check the first post in this thread, you will see that upstream developers are not very familiar with ARM platform. Only atty did some improvements for MPlayer at some time in the past, but he is unwilling to help upstream to integrate his fixes for whatever reason. So it is up to us (and you as well) to work on improving ARM support in MPlayer (including IWMMXT support). Nobody else can do this job. And upstream developers are not obliged to fix our problems.

PS. I'm sorry if it was me who created a false impression of IWMMXT being fully supported in MPlayer 1.0rc1 :(

edit: IWMMX has some additional registers, so their save/restore on context switches should probably be supported by the kernel? Maybe these extra checks in mplayer are there to ensure that it is safe to use iwmmxt even though the CPU itself may support it? Anyway, that was just a wild guess; I'm not familiar with XScale at all.

And thanks for actually digging into the code and checking if iwmmxt really works; the results posted in this thread were suspicious from the very start :)

Yes, IWMMX needs OS support, as well as having the right processor. Unfortunately, I (and others) cannot find a simple, portable method for detecting this. So the only option is to try to use iwmmxt if it is compiled in; you need to turn on compile switches to get it.
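
On Linux, one non-portable hint is the `Features` line of `/proc/cpuinfo`, where ARM kernels built with iwMMXt support list `iwmmxt`. The Python sketch below is only an illustration: the helper names `cpuinfo_has_feature` and `kernel_reports_iwmmxt` are made up, the sample text is invented, and whether the kernel shipped in a given Zaurus ROM reports the flag is an assumption that would need checking on the device. It says nothing about other operating systems, which is exactly the portability problem.

```python
def cpuinfo_has_feature(cpuinfo_text, feature):
    """Return True if `feature` is listed on a 'Features' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            if feature in value.split():
                return True
    return False


def kernel_reports_iwmmxt(path="/proc/cpuinfo"):
    """Read cpuinfo from `path`; an unreadable file counts as 'not reported'."""
    try:
        with open(path) as f:
            return cpuinfo_has_feature(f.read(), "iwmmxt")
    except OSError:
        return False


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    sample = "Processor\t: XScale\nFeatures\t: swp half thumb fastmult edsp iwmmxt\n"
    print(cpuinfo_has_feature(sample, "iwmmxt"))
    print(kernel_reports_iwmmxt())
```

A check like this could at most complement the compile-time switch; it cannot replace it on systems without `/proc/cpuinfo`.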

I also noted one more thing - the iwmmxt code does not provide the h263_inter function, so I changed ffmpeg to use the armv5 version. This provided a small speed increase. So either the version which was in use was pretty good (be warned - it is easy to spend a lot of time writing ARM assembler which is *worse* than the compiler output), or the system is memory bound as others have suggested. It might be worth looking at joining together more of the reads and writes if possible (the system uses SDRAM, so the performance for single words sucks compared to 2 words etc., in the case of an overstretched cache).

Here are the new results:
BENCHMARKs: VC: 43.497s
BENCHMARKs: VC: 42.813s
BENCHMARKs: VC: 43.040s
BENCHMARKs: VC: 43.269s
BENCHMARKs: VC: 43.090s
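
For a rough comparison, the VC times from the two sets of runs can be averaged. This is a small Python sketch that uses only the numbers posted in this thread; treating the Post #2 runs as the baseline for the h263_inter change is an assumption, since the post does not say so explicitly, and `percent_faster` is a made-up helper name.

```python
from statistics import mean

# VC (video codec) times in seconds, copied from the posts.
before = [44.056, 43.234, 43.487, 43.669, 43.497]  # Post #2
after = [43.497, 42.813, 43.040, 43.269, 43.090]   # Post #6


def percent_faster(old, new):
    """Relative reduction of the mean time, in percent."""
    return 100.0 * (mean(old) - mean(new)) / mean(old)


if __name__ == "__main__":
    print(f"mean before: {mean(before):.3f}s")
    print(f"mean after:  {mean(after):.3f}s")
    print(f"improvement: {percent_faster(before, after):.2f}%")
```

The difference works out to roughly 1%, which is of the same order as the run-to-run spread within each set, so it should be read with caution.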

Thanks,
Tim

Serge
post Mar 15 2007, 10:52 AM
Post #7

QUOTE(tjchick @ Mar 15 2007, 01:51 AM)
Yes, IWMMX needs OS support, as well as having the right processor. Unfortunately, I (and others) cannot find a simple, portable method for detecting this. So the only option is to try to use iwmmxt if it is compiled in; you need to turn on compile switches to get it.

That's probably fine. By the way, you can also try to compile MPlayer with the use of Intel IPP (Integrated Performance Primitives) library and check if it helps to improve performance.

QUOTE
I also noted one more thing - the iwmmxt code does not provide the h263_inter function, so I changed ffmpeg to use the armv5 version. This provided a small speed increase.

This should not be a problem as dct_unquantize_h263_inter is not a performance critical function. But it is pretty much similar to dct_unquantize_h263_intra (which consumes a noticeable amount of decoding time, something like ~7%), so implementing it was quite easy. You can see some gprof output with the statistics about decoding this Doom video clip on Nokia 770: http://lists.mplayerhq.hu/pipermail/ffmpeg...ary/050363.html

QUOTE
So either the version which was in use was pretty good

It was just not performance critical, I wonder why you even managed to see some improvement ;)

QUOTE
(be warned - it is easy to spend a lot of time writing arm assembler which is *worse* than the compiler output),

Actually, I find compiler-generated code for ARM quite poorly optimized. The compiler can't make good use of conditionally executed instructions, can't use DSP instructions, and can't schedule code optimally to avoid pipeline stalls. Of course, it only makes sense to optimize code that is a bottleneck if you want any visible overall performance improvement.

I prefer to always develop some simple performance and correctness tests for the performance critical functions I'm optimizing. That way I can ensure that they really provide a performance improvement and do not introduce stability issues.
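
Such a test can be very small. The following Python sketch only illustrates the workflow and is not taken from ffmpeg: `unquantize_ref` is a simplified stand-in loosely modelled on an H.263-style dequantizer (it ignores details such as special handling of the first coefficient), `unquantize_candidate` plays the role of the optimized rewrite, and all function names are hypothetical. For real assembly code the same idea would be applied from a C test program.

```python
import random
import time


def unquantize_ref(block, qscale):
    """Plain reference: scale each nonzero coefficient away from zero."""
    qmul = qscale << 1
    qadd = (qscale - 1) | 1
    out = []
    for level in block:
        if level > 0:
            level = level * qmul + qadd
        elif level < 0:
            level = level * qmul - qadd
        out.append(level)
    return out


def unquantize_candidate(block, qscale):
    """Candidate rewrite under test (sign handled without if/else)."""
    qmul = qscale << 1
    qadd = (qscale - 1) | 1
    # (v > 0) - (v < 0) is the sign of v: -1, 0 or 1.
    return [v * qmul + ((v > 0) - (v < 0)) * qadd for v in block]


def check_equal(trials=1000, seed=0):
    """Compare both versions on random, mostly-zero 64-coefficient blocks."""
    rng = random.Random(seed)
    for _ in range(trials):
        qscale = rng.randint(1, 31)
        block = [rng.randint(-2048, 2047) if rng.random() < 0.3 else 0
                 for _ in range(64)]
        if unquantize_ref(block, qscale) != unquantize_candidate(block, qscale):
            return False
    return True


def time_it(func, repeats=2000):
    """Wall-clock time for `repeats` calls on one fixed block."""
    block = list(range(-32, 32))
    start = time.perf_counter()
    for _ in range(repeats):
        func(block, 5)
    return time.perf_counter() - start


if __name__ == "__main__":
    print("outputs match:", check_equal())
    print("reference: %.4fs" % time_it(unquantize_ref))
    print("candidate: %.4fs" % time_it(unquantize_candidate))
```

Running the correctness check before looking at the timings keeps a fast but wrong rewrite from being mistaken for an improvement.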

Random assembly hacking is not a productive way of working for sure :)

QUOTE
or the system is memory bound as others have suggested.

This particular function is run on fully cached data, so memory access time is not important here. I investigated mplayer memory access pattern using valgrind (callgrind tool) getting more or less precise information about cache misses.

Code that heavily depends on memory performance is in motion compensation functions and partially idct (cache write misses for destination buffer).

QUOTE
It might be worth looking at joining together more of the reads and writes if possible (the system uses SDRAM, so the performance for single words sucks compared to 2 words etc, in the case of an overstretched cache)

Yes, paying special attention to accessing memory properly and using prefetch can improve performance quite noticeably.

PS. In order to ensure that video is decoded not only fast, but also right, you can use the '-vo md5' option. I noticed some really ugly video decoding artefacts when using the standard ARM optimized IDCT (some vertical stripes on panning scenes), whereas the output of the ARMv5TE optimized IDCT is identical to that of the C implementation.

