Topic: Mplayer Development And Optimization For Arm

speculatrix

  • Administrator
  • Hero Member
  • Posts: 3706
« Reply #60 on: September 04, 2007, 05:03:30 pm »
Could there be other factors affecting memory access: ensuring the right word size is used and not just aligning on the correct byte boundary, and reading the data in the right-sized chunks to make the best use of any ARM caching and prefetch logic in the CPU?
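To illustrate the word-size point, here is a minimal C sketch (hypothetical function names, not mplayer code). It assumes the data is already laid out in 4-byte-aligned 32-bit words and that the cache line is 32 bytes, as on XScale. `sum_bytes` issues one load per byte. `sum_words` issues one 32-bit load per four bytes and uses GCC's `__builtin_prefetch` to request the next cache line early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: one byte load (LDRB on ARM) per byte. */
uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Same result, but reads whole aligned 32-bit words (one LDR per four bytes)
   and hints the next 32-byte cache line to the prefetcher. */
uint32_t sum_words(const uint32_t *words, size_t nwords)
{
    assert(words != NULL || nwords == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++) {
#if defined(__GNUC__)
        /* Every 8 words (32 bytes), prefetch the following line; on
           ARMv5TE GCC can emit a PLD hint for this. */
        if ((i & 7u) == 0 && i + 8 < nwords)
            __builtin_prefetch(words + i + 8);
#endif
        uint32_t w = words[i];
        /* Add the four byte lanes; the total does not depend on endianness. */
        sum += (w & 0xffu) + ((w >> 8) & 0xffu) + ((w >> 16) & 0xffu) + (w >> 24);
    }
    return sum;
}
```

Whether the prefetch hint actually helps depends on the core and the access pattern, so it is worth timing both variants on the device itself.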

The Z has quite poor CF speed because (so I understand) it shares bus cycles with the main memory; the SD slot is faster because it's on a separate bus off the CPU.
Gemini 4G/Wi-Fi owner, formerly zaurus C3100 and 860 owner; also owner of an HTC Doubleshot, a Zaurus-like phone.

XorA

  • Full Member
  • Posts: 101
« Reply #61 on: September 20, 2007, 09:46:49 am »
Today's SVN mplayer with rev 257 of the IDCT code produces these benchmarks.

I'm thinking this is an improvement.

Machine is an SL-C3200 Zaurus.

mplayer -nosound -vo null -quiet -benchmark -loop 12 -lavdopts idct=16 matrixbench_normdivx_vbrmp3.avi | grep BENCHMARKs
BENCHMARKs: VC: 186.320s VO:   0.064s A:   0.000s Sys:   2.718s =  189.103s
BENCHMARKs: VC: 188.632s VO:   0.065s A:   0.000s Sys:   3.130s =  191.827s
BENCHMARKs: VC: 188.897s VO:   0.065s A:   0.000s Sys:   2.742s =  191.704s
BENCHMARKs: VC: 189.111s VO:   0.065s A:   0.000s Sys:   2.710s =  191.886s
BENCHMARKs: VC: 188.934s VO:   0.065s A:   0.000s Sys:   2.699s =  191.698s
BENCHMARKs: VC: 189.177s VO:   0.064s A:   0.000s Sys:   2.727s =  191.968s
BENCHMARKs: VC: 188.932s VO:   0.064s A:   0.000s Sys:   2.725s =  191.721s
BENCHMARKs: VC: 189.237s VO:   0.064s A:   0.000s Sys:   2.705s =  192.007s
BENCHMARKs: VC: 188.937s VO:   0.066s A:   0.000s Sys:   2.707s =  191.709s
BENCHMARKs: VC: 189.076s VO:   0.065s A:   0.000s Sys:   2.717s =  191.857s
BENCHMARKs: VC: 189.161s VO:   0.065s A:   0.000s Sys:   2.713s =  191.939s
BENCHMARKs: VC: 189.101s VO:   0.065s A:   0.000s Sys:   2.721s =  191.887s
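Runs like these are easier to compare with a small parser. The following is a minimal C sketch, assuming the exact line layout shown above (`BENCHMARKs: VC: <t>s VO: <t>s A: <t>s Sys: <t>s = <t>s`). `struct bench`, `parse_bench` and `min_total` are hypothetical names. It reports the minimum total, the same "minimal time" figure quoted further down the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Fields of one "BENCHMARKs:" summary line from mplayer -benchmark, in seconds. */
struct bench {
    double vc, vo, audio, sys, total;
};

/* Parse one line; returns 1 if all five numbers were read, 0 otherwise. */
int parse_bench(const char *line, struct bench *b)
{
    return sscanf(line, "BENCHMARKs: VC: %lfs VO: %lfs A: %lfs Sys: %lfs = %lfs",
                  &b->vc, &b->vo, &b->audio, &b->sys, &b->total) == 5;
}

/* Smallest total over n lines; the minimum is usually the run least
   disturbed by background activity. Returns -1.0 if no line parses. */
double min_total(const char *const lines[], size_t n)
{
    assert(n > 0);
    double best = -1.0;
    for (size_t i = 0; i < n; i++) {
        struct bench b;
        if (parse_bench(lines[i], &b) && (best < 0.0 || b.total < best))
            best = b.total;
    }
    return best;
}
```

Fed all twelve lines above, `min_total` would return 189.103 s, the first run.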
--
SL-C860 XorABuild/GPE
Sandisk Connect Plus SD/1GMB CF/512M
BT PCMCIA

zap

  • Newbie
  • Posts: 3
« Reply #62 on: November 06, 2007, 10:00:22 am »
Did some benchmarks today in different environments.

Same command line as above, same video clip: mplayer 1.0 rc2 24587-r5, built by XorA (from the Angstrom iwmmxt feed), gave a minimum time of about 181 seconds.

atty's mplayer on the Cacko ROM gave about 162 seconds (same command line, same clip).

TCPMP on an iPAQ 4700 (624MHz PXA270 CPU) gave "228%" in benchmark mode, which translates, I think, to 187.64/2.28 = 82.3 seconds.
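The conversion above treats TCPMP's figure as a multiple of real-time speed. A clip of duration D decoded at P% takes D / (P/100) seconds. Here is a minimal C sketch of the conversion in both directions, assuming (as the post's own arithmetic implies) that 187.64 s is the clip's playback length. The function names are hypothetical.

```c
#include <assert.h>

/* TCPMP-style speed: 100% means the clip decodes in exactly its playback time. */
double tcpmp_percent_to_seconds(double clip_seconds, double percent)
{
    assert(percent > 0.0);
    return clip_seconds / (percent / 100.0);
}

/* Inverse: express a measured decode time as a TCPMP-style percentage. */
double seconds_to_tcpmp_percent(double clip_seconds, double decode_seconds)
{
    assert(decode_seconds > 0.0);
    return 100.0 * clip_seconds / decode_seconds;
}
```

By the same conversion, atty's 162 s corresponds to roughly 116% and the Angstrom build's 181 s to roughly 104%.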

Serge, perhaps TCPMP is worth looking at as well? As far as I know, it is open source.
Greetings,
Andrew

zap

  • Newbie
  • Posts: 3
« Reply #63 on: November 06, 2007, 05:50:38 pm »
Took a look at the tcpmp sources this evening. The ffmpeg sources were not modified (except for PalmOS-specific hacks), so I believe the big speed difference is either just because the hx4700 is quite a fast device, or because mplayer does something terribly wrong (?).

Aside from this, there is nothing interesting in the tcpmp sources: just a collection of codecs from various sources plus the glue code, with lots of custom assembly for fast blitting and scaling.
Greetings,
Andrew

Serge

  • Jr. Member
  • Posts: 51
« Reply #64 on: November 07, 2007, 02:26:29 am »
Hello zap,

Please also try testing atty's build without '-lavdopts idct=16' option (it forces armv5te optimized idct from ffmpeg, but atty's build should be able to use a more efficient iwmmxt optimized idct from IPP).

Anyway, as already mentioned in this thread, there is something wrong with mplayer running on Zaurus devices (or devices with an XScale core). For example, even the Nokia 770 with its 252MHz ARM9E CPU appears to be faster than the Zaurus when playing this matrix video clip (decoding takes ~158 seconds). Intuitively, though, it should be quite the opposite: the Zaurus has a much higher CPU clock frequency and supports iwmmxt SIMD instructions in addition to armv5te.

TCPMP might be an interesting option (for somebody else to try), but I'm satisfied with mplayer/ffmpeg on the Nokia 770 and N800 at the moment. Translating mplayer performance on the Nokia 770 to 'TCPMP percent', it would be something like 118%, and if we estimate how it would theoretically run at 624MHz, that would be ~290%. I know this approximation is wrong, as memory speed also matters a lot, but anyway, it looks like TCPMP and ffmpeg should provide at least comparable performance.
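The two estimates in this paragraph can be reproduced with a short calculation. This minimal C sketch assumes the 187.64 s clip length used earlier in the thread and purely linear CPU-clock scaling, which, as noted, ignores memory speed. The function names are hypothetical. With 158 s on the 252 MHz Nokia 770 it gives about 119%, close to the 118% quoted (158 s is itself approximate). Scaling 118% by 624/252 gives about 292%.

```c
#include <assert.h>

/* Decode speed as a percentage of real time (100% = exactly real time). */
double percent_of_realtime(double clip_seconds, double decode_seconds)
{
    assert(decode_seconds > 0.0);
    return 100.0 * clip_seconds / decode_seconds;
}

/* Naive estimate: assumes decode speed is proportional to CPU clock alone.
   Memory bandwidth and latency are ignored, so treat the result as an upper bound guess. */
double scale_by_clock(double percent, double from_mhz, double to_mhz)
{
    assert(from_mhz > 0.0);
    return percent * to_mhz / from_mhz;
}
```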

In order to get optimal mplayer performance on the Zaurus, somebody just needs to profile it there (doing it with gprof is quite simple), find the performance bottlenecks and try to fix them. I might have a look at what's wrong if I had an XScale device to experiment with (I had plans to buy a Motorola EZX phone, A1200 or E6, but these plans are on hold now).
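With gprof, the usual cycle is to build and link with `-pg`, run the program so that it writes `gmon.out`, and then run `gprof` on the binary and that file. Once gprof points at a suspect function, a manual timer is a quick way to check whether a change to it actually helps. This is a complement to gprof, not a replacement. The following is a minimal C sketch of such a timer using POSIX `clock_gettime(CLOCK_MONOTONIC)`. `hypothetical_hotspot` is a made-up stand-in, not an mplayer function.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical stand-in for a suspected hotspot (e.g. an IDCT or blit loop). */
static uint32_t hypothetical_hotspot(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc = acc * 31u + i;
    return acc;
}

/* Run fn(arg) reps times and return the average seconds per call. */
double time_per_call(uint32_t (*fn)(uint32_t), uint32_t arg, int reps, uint32_t *sink)
{
    assert(reps > 0);
    double start = now_seconds();
    uint32_t s = 0;
    for (int r = 0; r < reps; r++)
        s ^= fn(arg);
    *sink = s;  /* keep the result live so the calls are not optimized away */
    return (now_seconds() - start) / reps;
}
```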
Siarhei Siamashka (ssvb on #maemo, irc.freenode.net)
currently taking part in porting MPlayer to Nokia 770 and Nokia N800, feel free to join :)

zap

  • Newbie
  • Posts: 3
« Reply #65 on: November 07, 2007, 01:23:59 pm »
Quote from: Serge
Please also try testing atty's build without '-lavdopts idct=16' option (it forces armv5te optimized idct from ffmpeg, but atty's build should be able to use a more efficient iwmmxt optimized idct from IPP).
I ran it without this option since I thought atty's mplayer uses the right IDCT by default. By the way, Angstrom's mplayer also seems to use the best IDCT by default; at least I haven't noticed any difference when running mplayer with and without this option.

Quote from: Serge
Anyway, as already mentioned in this thread, there is something wrong with mplayer running on Zaurus devices (or devices with an XScale core). For example, even the Nokia 770 with its 252MHz ARM9E CPU appears to be faster than the Zaurus when playing this matrix video clip (decoding takes ~158 seconds). Intuitively, though, it should be quite the opposite: the Zaurus has a much higher CPU clock frequency and supports iwmmxt SIMD instructions in addition to armv5te.
Tried the same clip on TCPMP on my old Dell Axim X5 (400MHz PXA255, 64MB RAM). It shows 131.68%, so it does indeed look like something is wrong on the Zaurus: my Dell Axim has a 100MHz bus, while the C3100 has, AFAIK, a 143MHz bus (i.e. faster RAM).
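One way to compare devices with different clock speeds is to multiply decode time by clock frequency. This gives the number of CPU cycles (in millions) spent on the clip. The following is a minimal C sketch, assuming the core runs at its nominal clock throughout and ignoring bus and RAM speed. The names are hypothetical.

From the figures in this thread:

- The 400 MHz Axim (131.68%, about 142.5 s) spends roughly 57,000 Mcycles. Its 142.5 s is already faster than the 162 s and 181 s Zaurus results reported earlier.
- The 624 MHz hx4700 (228%) spends roughly 51,400 Mcycles.
- The 252 MHz Nokia 770 (~158 s) spends roughly 39,800 Mcycles.

```c
#include <assert.h>

/* Convert a TCPMP-style percentage back to decode seconds. */
double percent_to_seconds(double clip_seconds, double percent)
{
    assert(percent > 0.0);
    return clip_seconds * 100.0 / percent;
}

/* Millions of CPU cycles spent decoding, assuming full nominal clock the
   whole time. Lower means more work done per cycle; bus speed is ignored. */
double megacycles(double decode_seconds, double cpu_mhz)
{
    assert(decode_seconds > 0.0 && cpu_mhz > 0.0);
    return decode_seconds * cpu_mhz;
}
```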

Quote from: Serge
In order to get optimal mplayer performance on the Zaurus, somebody just needs to profile it there (doing it with gprof is quite simple), find the performance bottlenecks and try to fix them. I might have a look at what's wrong if I had an XScale device to experiment with (I had plans to buy a Motorola EZX phone, A1200 or E6, but these plans are on hold now).
I'll try to do that when time permits.
Greetings,
Andrew

tjchick

  • Newbie
  • Posts: 14
« Reply #66 on: November 13, 2007, 05:38:08 pm »
Just a quick update from me, mostly of interest to the Angstrom people...

You may remember I hacked mplayer/ffmpeg to actually use iwmmxt code rather than just compiling them with iwmmxt support.

I got VC times of approx. 43 seconds for the doom clip running on Angstrom.

Now I am using *the same binary* and get VC times of 37 seconds on the latest Angstrom test images, so something has changed, maybe cache support or iwmmxt support in the kernel. Anyhow, my results are now about the same as my tests on Cacko with atty's mplayer.

If I use the default mplayer included in the Angstrom iwmmxt feeds, I see a VC time of 52 seconds. I'm going to take a look and try the SVN version.
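These changes are easier to gauge as ratios. The following is a minimal C sketch with hypothetical helper names, using the VC times from this post:

- 43 s to 37 s with the same binary is about a 1.16× speedup, or about 14% less decode time.
- The feed build's 52 s versus 37 s is about 1.41×.

```c
#include <assert.h>

/* Speedup factor of a new run over an old one (> 1 means faster). */
double speedup(double old_seconds, double new_seconds)
{
    assert(new_seconds > 0.0);
    return old_seconds / new_seconds;
}

/* Percentage of decode time saved by the new run. */
double percent_saved(double old_seconds, double new_seconds)
{
    assert(old_seconds > 0.0);
    return 100.0 * (old_seconds - new_seconds) / old_seconds;
}
```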

Cheers,
Tim

speculatrix

  • Administrator
  • Hero Member
  • Posts: 3706
« Reply #67 on: November 13, 2007, 05:59:24 pm »
good news indeed, anything which improves the media performance on the zaurus is great!