Mar 21 2007, 08:26 AM
Post #46
Group: Members Posts: 51 Joined: 8-October 06 Member No.: 11,724
You can try to override the IDCT by using '-lavdopts idct=<some_number>' in atty's build and test it. Once we have the numbers, we can see whether it is really IPP that matters or whether atty's build has some other optimizations.
By the way, IWMMXT seems to be very close to MMX (the Intel manual even has a table mapping the instructions). FFmpeg has an MMX-optimized IDCT implementation, so maybe a direct MMX->IWMMXT conversion is not so hard?

Mar 21 2007, 08:42 AM
Post #47
Group: Members Posts: 1,014 Joined: 4-January 05 From: Enschede, The Netherlands Member No.: 6,107
QUOTE(Serge @ Mar 21 2007, 04:26 PM) By the way, IWMMXT seems to be very close to MMX (there is even a table of mapping of the instructions in intel manual). FFmpeg has MMX optimized IDCT implementation. So maybe direct conversion of MMX->IWMMXT is not so hard?
Except that ARM has no immediate assignments and needs aligned data...

Mar 22 2007, 10:56 AM
Post #48
Group: Members Posts: 51 Joined: 8-October 06 Member No.: 11,724
QUOTE(koen @ Mar 21 2007, 08:42 AM) QUOTE(Serge @ Mar 21 2007, 04:26 PM) By the way, IWMMXT seems to be very close to MMX (there is even a table of mapping of the instructions in intel manual). FFmpeg has MMX optimized IDCT implementation. So maybe direct conversion of MMX->IWMMXT is not so hard?
Except that ARM has no immediate assignments
The MMX instruction set does not have immediate assignments either.
QUOTE and needs aligned data...
FFmpeg takes special care of alignment: many functions have a guaranteed alignment specified for the data they process (some SSE instructions require 16-byte alignment, after all, so ARM is not the strictest in this respect). The input data for the IDCT, for example, is 16-byte aligned, which is more than enough for ARM.
Anyway, somebody just needs to give it a try. To encourage you and show that it can work: it looks like atty took the existing MMX implementation, dct_unquantize_h263_intra_mmx, and converted it to dct_unquantize_h263_intra_iwmmxt.
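
Why 16-byte alignment is "more than enough": every multiple of 16 is also a multiple of 8 and 4, so a buffer aligned for SSE also satisfies any smaller requirement. A minimal sketch, assuming (for illustration only) that the widest 64-bit IWMMXT loads/stores want 8-byte alignment; the addresses are made up, not taken from FFmpeg.

```python
def is_aligned(addr: int, alignment: int) -> bool:
    """Return True if addr is a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # For a power of two, the low log2(alignment) bits of addr must be zero.
    return addr & (alignment - 1) == 0


# Assumed requirements, for illustration only.
IWMMXT_ALIGN = 8   # 64-bit IWMMXT load/store (assumption)
SSE_ALIGN = 16     # aligned SSE loads/stores

# Hypothetical 16-byte-aligned block addresses also pass the 8-byte check.
for addr in (0x1000, 0x1010, 0x1020):
    assert is_aligned(addr, SSE_ALIGN)
    assert is_aligned(addr, IWMMXT_ALIGN)

# The converse does not hold: 8-byte alignment does not imply 16-byte alignment.
print(is_aligned(0x1008, SSE_ALIGN), is_aligned(0x1008, IWMMXT_ALIGN))
```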

Mar 22 2007, 01:35 PM
Post #49
Group: Members Posts: 1,014 Joined: 4-January 05 From: Enschede, The Netherlands Member No.: 6,107
QUOTE(Serge @ Mar 22 2007, 06:56 PM) ... FFmpeg does special care for alignment, many functions have guaranteed alignment specified for the data they are processing (some SSE instructions require 16-byte alignment after all, so ARM is not the most strict in this respect). Input data for IDCT is also 16-byte aligned for example, that's more than enough for ARM
Right. o-hand ported the fbmmx layer in the X server to IWMMXT, but it wasn't faster because the data had to be aligned by hand. Maybe FFmpeg can gain more.
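
"Aligning the data by hand" typically means splitting each buffer into an unaligned head and tail handled by scalar code, with only the aligned middle going through the SIMD loop; that extra bookkeeping is overhead FFmpeg avoids when alignment is guaranteed. A minimal sketch of the split, assuming 8-byte SIMD chunks; the helper name is hypothetical and this is not o-hand's fbmmx code.

```python
def split_for_simd(addr: int, length: int, alignment: int = 8) -> tuple[int, int, int]:
    """Split the byte range [addr, addr + length) into (head, body, tail) sizes.

    head: bytes before the first aligned address (scalar code)
    body: whole aligned chunks (SIMD code)
    tail: leftover bytes after the last full chunk (scalar code)
    """
    misalign = addr % alignment
    head = min(length, (alignment - misalign) % alignment)
    remaining = length - head
    body = remaining - remaining % alignment
    tail = remaining - body
    return head, body, tail


# An 8x8 block of 16-bit values is 128 bytes; at a misaligned address
# both a scalar prologue and a scalar epilogue are needed.
head, body, tail = split_for_simd(0x1003, 128)
print(head, body, tail)
```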

Mar 23 2007, 02:00 PM
Post #50
Group: Members Posts: 14 Joined: 9-May 06 Member No.: 9,821
QUOTE(Serge @ Mar 21 2007, 05:26 PM) You can try to override idct by using '-lavdopts idct=<some_number>' in atty's build and test it. After getting the numbers we can see if it is really IPP that matters, or maybe atty's build has some other optimizations.
I did try it, and with the non-IPP IDCT the results are roughly comparable. atty's mplayer is still about 10% faster, so there are a few more tweaks I need to sort out, but it was 40% better when using IPP.
Cheers, Tim

Jul 14 2007, 01:16 PM
Post #51
Group: Members Posts: 51 Joined: 8-October 06 Member No.: 11,724
Hi, I'm working on further optimizing the ARMv5 IDCT for mplayer/ffmpeg. The older implementation from mplayer 1.0rc1 was optimized only for ARM9E cores. Now it should be noticeably faster on long-pipeline cores such as XScale (Sharp Zaurus) and ARM11 (Nokia N800).
Can anybody compile and run the following test on XScale?
> svn checkout https://garage.maemo.org/svn/mplayer/trunk/libavcodec
> cd libavcodec/tests
> make test-idct
You may need to specify the name of your cross-compiler when running make (e.g. 'CC="arm-softfloat-linux-gnueabi-gcc" make test-idct'). After that, please copy the 'test-idct' binary to your device and run it, specifying the CPU clock frequency on the command line (for a 416MHz Zaurus it would be './test-idct --freq=416').
For those who are curious, here are the results from running this test on the Nokia 770:
CODE
> ./test-idct --freq=252
Assuming cpu clock frequency 252MHz (ARMv6 disabled)
Please be patient and wait for the results, test requires quite a lot of time to run...
correctness tests passed
--- benchmarking with zero idct coefficients ---
simple_idct_armv5te time=886.0
simple_idct_put_armv5te cache=no, time=1062.2
simple_idct_put_armv5te cache=yes, time=1032.8
simple_idct_add_armv5te cache=no, time=1323.7
simple_idct_add_armv5te cache=yes, time=1186.2
simple_idct_armv5te_ref time=1041.8
simple_idct_put_armv5te_ref cache=no, time=1257.6
simple_idct_put_armv5te_ref cache=yes, time=1253.0
simple_idct_add_armv5te_ref cache=no, time=1561.9
simple_idct_add_armv5te_ref cache=yes, time=1445.6
--- benchmarking with random idct coefficients ---
simple_idct_armv5te time=1423.4
simple_idct_put_armv5te cache=no, time=1665.7
simple_idct_put_armv5te cache=yes, time=1655.3
simple_idct_add_armv5te cache=no, time=1934.6
simple_idct_add_armv5te cache=yes, time=1783.8
simple_idct_armv5te_ref time=1698.6
simple_idct_put_armv5te_ref cache=no, time=1914.0
simple_idct_put_armv5te_ref cache=yes, time=1911.6
simple_idct_add_armv5te_ref cache=no, time=2221.2
simple_idct_add_armv5te_ref cache=yes, time=2098.9
Results for the Nokia N800:
CODE
> ./test-idct --freq=330 --enable-armv6
Assuming cpu clock frequency 330MHz (ARMv6 enabled)
Please be patient and wait for the results, test requires quite a lot of time to run...
correctness tests passed
--- benchmarking with zero idct coefficients ---
simple_idct_armv5te time=751.3
simple_idct_put_armv5te cache=no, time=947.7
simple_idct_put_armv5te cache=yes, time=866.9
simple_idct_add_armv5te cache=no, time=1099.2
simple_idct_add_armv5te cache=yes, time=937.6
simple_idct_armv5te_ref time=1084.5
simple_idct_put_armv5te_ref cache=no, time=1288.4
simple_idct_put_armv5te_ref cache=yes, time=1280.5
simple_idct_add_armv5te_ref cache=no, time=1538.2
simple_idct_add_armv5te_ref cache=yes, time=1397.9
simple_idct_armv6 time=762.4
simple_idct_put_armv6 cache=no, time=1034.9
simple_idct_put_armv6 cache=yes, time=765.4
simple_idct_add_armv6 cache=no, time=1063.2
simple_idct_add_armv6 cache=yes, time=903.2
--- benchmarking with random idct coefficients ---
simple_idct_armv5te time=1220.0
simple_idct_put_armv5te cache=no, time=1413.3
simple_idct_put_armv5te cache=yes, time=1355.4
simple_idct_add_armv5te cache=no, time=1576.0
simple_idct_add_armv5te cache=yes, time=1417.2
simple_idct_armv5te_ref time=1872.0
simple_idct_put_armv5te_ref cache=no, time=2079.6
simple_idct_put_armv5te_ref cache=yes, time=2081.5
simple_idct_add_armv5te_ref cache=no, time=2342.7
simple_idct_add_armv5te_ref cache=yes, time=2190.1
simple_idct_armv6 time=1138.9
simple_idct_put_armv6 cache=no, time=1426.7
simple_idct_put_armv6 cache=yes, time=1144.8
simple_idct_add_armv6 cache=no, time=1444.1
simple_idct_add_armv6 cache=yes, time=1281.9
Test results from XScale are needed to check whether my assumptions are correct (I used the ARM9E, ARM11 and XScale manuals as references to write code that works best on all these CPUs, but could only test it on the Nokia 770 and N800). In theory, the XScale results should be very similar to those from the Nokia N800 (ARM11). Lower numbers are better (they are IDCT run times in CPU cycles).
Functions with the '_ref' suffix belong to the reference ARMv5TE-optimized IDCT implementation from mplayer 1.0rc1. If anybody wants to build an optimized mplayer, download this file and replace simple_idct_armv5te.S in your mplayer sources.
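
Since every optimized function has a '_ref' counterpart measured under the same conditions, the speedup is simply the old cycle count divided by the new one. A minimal sketch that parses test-idct output in the format shown above and computes those ratios; the regexes assume exactly this output format, and the helper names are hypothetical.

```python
import re

# Section headers and result lines as printed by test-idct, e.g.
#   --- benchmarking with random idct coefficients ---
#   simple_idct_armv5te time=1220.0
#   simple_idct_put_armv5te cache=no, time=1413.3
SECTION_RE = re.compile(r"^--- benchmarking with (\w+) idct coefficients ---$")
RESULT_RE = re.compile(r"^(\w+)(?: cache=(yes|no),)? time=([\d.]+)$")


def parse_test_idct(output):
    """Return {(section, function, cache): cycles} from test-idct output.

    section is 'zero' or 'random'; cache is 'yes', 'no', or None for the
    plain simple_idct_* entries, which have no cache column.
    """
    results = {}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            continue
        m = RESULT_RE.match(line)
        if m and section is not None:
            name, cache, cycles = m.groups()
            results[(section, name, cache)] = float(cycles)
    return results


def speedups_vs_ref(results):
    """old/new cycle ratio for every entry that has a matching '_ref' entry.

    Lower cycle counts are better, so a ratio above 1.0 means the new code wins.
    """
    ratios = {}
    for (section, name, cache), cycles in results.items():
        ref = results.get((section, name + "_ref", cache))
        if ref is not None:
            ratios[(section, name, cache)] = ref / cycles
    return ratios


# Excerpt of the Nokia N800 results posted above.
sample = """\
--- benchmarking with random idct coefficients ---
simple_idct_armv5te time=1220.0
simple_idct_put_armv5te cache=no, time=1413.3
simple_idct_armv5te_ref time=1872.0
simple_idct_put_armv5te_ref cache=no, time=2079.6
"""

for key, ratio in speedups_vs_ref(parse_test_idct(sample)).items():
    print(key, f"{ratio:.2f}x")
```

On this excerpt the plain IDCT comes out about 1.53x faster and the uncached put variant about 1.47x faster.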

Jul 14 2007, 01:47 PM
Post #52
Group: Members Posts: 2,350 Joined: 30-July 06 Member No.: 10,575
I'll see if I can give it a try.
How much is this likely to speed up MPlayer, or is that what you're trying to determine?

Jul 14 2007, 02:04 PM
Post #53
Group: Members Posts: 51 Joined: 8-October 06 Member No.: 11,724
QUOTE(Capn_Fish @ Jul 14 2007, 01:47 PM) I'll see if I can give it a try. How much is this likely to speed up MPlayer, or is that what you're trying to determine?
The IDCT usually takes 20-40% of video decoding time. There will be no huge overall speedup, but the improvement should be quite noticeable (the IDCT itself becomes up to 1.5x faster on ARM11). The goal is to narrow the performance gap with mplayer compiled with IPP (see tjchick's earlier post) and possibly beat it.
The best results can be achieved with IWMMXT instructions, though. But some older cores (the PXA255, for example) do not support IWMMXT, and a tweaked ARMv5TE IDCT would be handy there. Also, an IWMMXT-optimized IDCT still needs to be written, and this ARMv5TE IDCT can serve as a placeholder until then.
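
Why a 1.5x faster IDCT gives only a modest overall gain follows from Amdahl's law: only the IDCT's share of decoding time shrinks. A minimal sketch, assuming the 20-40% share and 1.5x speedup quoted above and that nothing else changes (memory stalls and cache effects are ignored); the function names are hypothetical.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only `fraction` of the
    run time is accelerated by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def time_saved(fraction: float, local_speedup: float) -> float:
    """Share of the total decoding time that disappears."""
    return fraction * (1.0 - 1.0 / local_speedup)


# IDCT share of decoding time: the 20-40% range from the post.
for p in (0.2, 0.3, 0.4):
    print(f"IDCT share {p:.0%}: saves {time_saved(p, 1.5):.1%}, "
          f"overall {overall_speedup(p, 1.5):.2f}x")
```

Under these assumptions the total decoding time drops by roughly 7% to 13%, i.e. an overall speedup of about 1.07x to 1.15x.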

Jul 15 2007, 05:45 AM
Post #54
Group: Members Posts: 103 Joined: 22-August 05 From: Moscow, Russia. Member No.: 7,924
pxa270, 416MHz (Zaurus C3100), Gentoo 2007.0, eabi.
CODE
Assuming cpu clock frequency 416MHz (ARMv6 disabled)
Please be patient and wait for the results, test requires quite a lot of time to run...
correctness tests passed
--- benchmarking with zero idct coefficients ---
simple_idct_armv5te time=751.9
simple_idct_put_armv5te cache=no, time=1988.0
simple_idct_put_armv5te cache=yes, time=860.2
simple_idct_add_armv5te cache=no, time=1136.2
simple_idct_add_armv5te cache=yes, time=923.1
simple_idct_armv5te_ref time=1131.8
simple_idct_put_armv5te_ref cache=no, time=1297.1
simple_idct_put_armv5te_ref cache=yes, time=1281.0
simple_idct_add_armv5te_ref cache=no, time=1625.5
simple_idct_add_armv5te_ref cache=yes, time=1385.5
--- benchmarking with random idct coefficients ---
simple_idct_armv5te time=1168.7
simple_idct_put_armv5te cache=no, time=2281.7
simple_idct_put_armv5te cache=yes, time=1277.0
simple_idct_add_armv5te cache=no, time=1595.2
simple_idct_add_armv5te cache=yes, time=1340.3
simple_idct_armv5te_ref time=1821.7
simple_idct_put_armv5te_ref cache=no, time=1988.0
simple_idct_put_armv5te_ref cache=yes, time=1981.6
simple_idct_add_armv5te_ref cache=no, time=2326.5
simple_idct_add_armv5te_ref cache=yes, time=2084.4

Jul 15 2007, 08:40 AM
Post #55
Group: Members Posts: 51 Joined: 8-October 06 Member No.: 11,724
QUOTE(Civil @ Jul 15 2007, 05:45 AM) pxa270, 416MHz (Zaurus C3100), Gentoo 2007.0, eabi. ...
Thanks for running this test. Almost everything is just as I expected: the XScale pipeline really is very similar to ARM11. The number-crunching part of the IDCT is now ~1.5x faster ('simple_idct_armv5te' vs. 'simple_idct_armv5te_ref': 751.9 vs. 1131.8 cycles with zero coefficients). Everything is also very fast as long as we ignore memory performance and all memory accesses hit the cache.
But generally we are interested in the performance of 'simple_idct_put_armv5te' and 'simple_idct_add_armv5te' when the results are stored to a memory region that is not in the cache. Everything is fine with 'simple_idct_add_armv5te', and it really got quite a lot faster. But there seems to be an unexpected problem with 'simple_idct_put_armv5te': with an uncached destination it takes 1988.0 cycles, slower even than the old code's 1297.1. Probably the write buffer (temporary storage in the CPU for memory writes that bypass the cache) overflows and the XScale pipeline stalls, resulting in very bad performance. When 'simple_idct_put_armv5te' stores its results to a memory region that is in the cache, it is very fast (860.2 cycles).
I'll try to tweak the code a bit and will ask you to rerun this test a bit later. Thanks again for running it: if we had not checked this code on XScale before submitting it to ffmpeg, performance on XScale would not have been very good (I don't know how it would have affected the overall results, since 'simple_idct_add_armv5te' would speed up and 'simple_idct_put_armv5te' would slow down). Anyway, once the code is fixed for XScale, I think we can expect something like a 5-10% overall video decoding improvement on it (depending on the video file).
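
The write-buffer symptom shows up as a large gap between the cache=no and cache=yes results for the put variant only. This sketch just computes that ratio from the zero-coefficient numbers in the pxa270 run above; it measures the symptom and does not model the write buffer itself. The dictionary and helper names are hypothetical.

```python
# Zero-coefficient cycle counts from the pxa270 (Zaurus C3100) run above:
# (function, cache) -> cycles
XSCALE_ZERO = {
    ("simple_idct_put_armv5te", "no"): 1988.0,
    ("simple_idct_put_armv5te", "yes"): 860.2,
    ("simple_idct_add_armv5te", "no"): 1136.2,
    ("simple_idct_add_armv5te", "yes"): 923.1,
    ("simple_idct_put_armv5te_ref", "no"): 1297.1,
    ("simple_idct_put_armv5te_ref", "yes"): 1281.0,
    ("simple_idct_add_armv5te_ref", "no"): 1625.5,
    ("simple_idct_add_armv5te_ref", "yes"): 1385.5,
}


def uncached_penalty(results, name):
    """How many times slower a function runs when its destination is not cached."""
    return results[(name, "no")] / results[(name, "yes")]


for fn in ("simple_idct_put_armv5te", "simple_idct_add_armv5te",
           "simple_idct_put_armv5te_ref", "simple_idct_add_armv5te_ref"):
    print(f"{fn:30s} {uncached_penalty(XSCALE_ZERO, fn):.2f}x")
```

The new put variant slows down about 2.3x with an uncached destination, while the new add variant slows down about 1.2x and the old put variant barely changes, which is the pattern described in the post.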

Aug 28 2007, 09:33 PM
Post #56
Group: Members Posts: 51 Joined: 8-October 06 Member No.: 11,724
Sorry for the long delay in answering. Could you run this IDCT test on XScale again? I believe the performance regression in 'simple_idct_put_armv5te' should be fixed now.

Aug 30 2007, 01:50 PM
Post #57
Group: Admin Posts: 3,277 Joined: 29-July 04 From: Cambridge, England Member No.: 4,149
Any improvement at all is very welcome; I hope these optimisations will make it into Angstrom as soon as they are proven and stable!

Sep 2 2007, 10:18 AM
Post #58
Group: Members Posts: 51 Joined: 8-October 06 Member No.: 11,724
QUOTE(speculatrix @ Aug 30 2007, 01:50 PM) Any improvement at all is very much welcomed - I hope that these optimisations will make it into Angstrom as soon as proven and stable!
Well, I'm maintaining the mplayer package for maemo and already have some good stuff that I would like to contribute to ffmpeg. I'm only posting test code samples here to make sure my submissions will not cause any regressions on XScale and will not hurt you.
By the way, here are the latest synthetic benchmarks of the ARMv5TE-optimized IDCT (SVN revision 249) on the Nokia N800, whose ARM11 CPU is similar to XScale:
CODE
$ ./test-idct --freq=330
Assuming cpu clock frequency 330MHz (ARMv6 disabled)
Please be patient and wait for the results, test requires quite a lot of time to run...
correctness tests passed
--- benchmarking with zero idct coefficients ---
simple_idct_armv5te time=685.8
simple_idct_put_armv5te cache=no, time=780.4
simple_idct_put_armv5te cache=yes, time=770.0
simple_idct_add_armv5te cache=no, time=984.9
simple_idct_add_armv5te cache=yes, time=853.3
simple_idct_add_pf_pld_armv5te cache=no, time=940.9
simple_idct_add_pf_pld_armv5te cache=yes, time=863.1
simple_idct_add_pf_ldr_armv5te cache=no, time=958.3
simple_idct_add_pf_ldr_armv5te cache=yes, time=862.5
simple_idct_armv5te_ref time=1088.1
simple_idct_put_armv5te_ref cache=no, time=1286.2
simple_idct_put_armv5te_ref cache=yes, time=1282.9
simple_idct_add_armv5te_ref cache=no, time=1518.2
simple_idct_add_armv5te_ref cache=yes, time=1393.9
--- benchmarking with random idct coefficients ---
simple_idct_armv5te time=1147.0
simple_idct_put_armv5te cache=no, time=1240.9
simple_idct_put_armv5te cache=yes, time=1233.8
simple_idct_add_armv5te cache=no, time=1467.0
simple_idct_add_armv5te cache=yes, time=1317.2
simple_idct_add_pf_pld_armv5te cache=no, time=1403.5
simple_idct_add_pf_pld_armv5te cache=yes, time=1366.2
simple_idct_add_pf_ldr_armv5te cache=no, time=1438.8
simple_idct_add_pf_ldr_armv5te cache=yes, time=1341.3
simple_idct_armv5te_ref time=1872.6
simple_idct_put_armv5te_ref cache=no, time=2065.1
simple_idct_put_armv5te_ref cache=yes, time=2064.9
simple_idct_add_armv5te_ref cache=no, time=2308.4
simple_idct_add_armv5te_ref cache=yes, time=2179.2
Also, here is a more realistic test with the matrixbench_normdivx_vbrmp3.avi video clip from http://samples.mplayerhq.hu/benchmark/testsuite1/
CODE
Benchmark with current IDCT:
# mplayer -nosound -vo null -quiet -benchmark -loop 12 -lavdopts idct=16 matrixbench_normdivx_vbrmp3.avi | grep BENCHMARKs
BENCHMARKs: VC: 135.127s VO: 0.163s A: 0.000s Sys: 1.387s = 136.677s
BENCHMARKs: VC: 132.337s VO: 0.153s A: 0.000s Sys: 1.382s = 133.872s
BENCHMARKs: VC: 133.986s VO: 0.148s A: 0.000s Sys: 1.351s = 135.485s
BENCHMARKs: VC: 134.576s VO: 0.174s A: 0.000s Sys: 1.351s = 136.102s
BENCHMARKs: VC: 132.979s VO: 0.161s A: 0.000s Sys: 1.387s = 134.527s
BENCHMARKs: VC: 132.987s VO: 0.145s A: 0.000s Sys: 1.408s = 134.539s
BENCHMARKs: VC: 132.945s VO: 0.150s A: 0.000s Sys: 1.394s = 134.489s
BENCHMARKs: VC: 132.248s VO: 0.152s A: 0.000s Sys: 1.353s = 133.753s
BENCHMARKs: VC: 131.673s VO: 0.152s A: 0.000s Sys: 1.366s = 133.191s
BENCHMARKs: VC: 132.138s VO: 0.149s A: 0.000s Sys: 1.370s = 133.656s
BENCHMARKs: VC: 132.536s VO: 0.144s A: 0.000s Sys: 1.364s = 134.044s
BENCHMARKs: VC: 132.332s VO: 0.148s A: 0.000s Sys: 1.329s = 133.810s
Benchmark with the new optimized IDCT (after replacing 'simple_idct_armv5te.S' and recompiling mplayer):
# mplayer -nosound -vo null -quiet -benchmark -loop 12 -lavdopts idct=16 matrixbench_normdivx_vbrmp3.avi | grep BENCHMARKs
BENCHMARKs: VC: 122.543s VO: 0.162s A: 0.000s Sys: 1.416s = 124.120s
BENCHMARKs: VC: 120.901s VO: 0.152s A: 0.000s Sys: 1.371s = 122.424s
BENCHMARKs: VC: 122.490s VO: 0.147s A: 0.000s Sys: 1.338s = 123.975s
BENCHMARKs: VC: 124.826s VO: 0.151s A: 0.000s Sys: 1.325s = 126.302s
BENCHMARKs: VC: 123.052s VO: 0.143s A: 0.000s Sys: 1.393s = 124.588s
BENCHMARKs: VC: 121.897s VO: 0.146s A: 0.000s Sys: 1.366s = 123.409s
BENCHMARKs: VC: 122.406s VO: 0.139s A: 0.000s Sys: 1.359s = 123.903s
BENCHMARKs: VC: 123.448s VO: 0.150s A: 0.000s Sys: 1.381s = 124.979s
BENCHMARKs: VC: 119.141s VO: 0.143s A: 0.000s Sys: 1.360s = 120.644s
BENCHMARKs: VC: 120.555s VO: 0.147s A: 0.000s Sys: 1.340s = 122.042s
BENCHMARKs: VC: 120.686s VO: 0.141s A: 0.000s Sys: 1.377s = 122.203s
BENCHMARKs: VC: 120.902s VO: 0.143s A: 0.000s Sys: 1.358s = 122.402s
This really confirms a video decoding speedup in the 5-10% range, as estimated earlier. It will be interesting to see how it works on XScale. It would also be very interesting to compare this IDCT implementation with the one from IPP, to check which one is faster now and by how much.
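
To turn the BENCHMARKs logs into a single number, one can average the per-loop totals before and after the change and take the relative reduction. A minimal sketch, assuming the summary-line format shown above; the helper names are hypothetical, and the sample uses only the first two runs of each N800 log for brevity.

```python
import re
from statistics import mean

# One summary line per loop iteration of `mplayer -benchmark`, e.g.
#   BENCHMARKs: VC: 135.127s VO: 0.163s A: 0.000s Sys: 1.387s = 136.677s
BENCH_RE = re.compile(
    r"BENCHMARKs:\s+VC:\s*([\d.]+)s\s+VO:\s*([\d.]+)s\s+A:\s*([\d.]+)s\s+"
    r"Sys:\s*([\d.]+)s\s+=\s*([\d.]+)s"
)


def parse_benchmarks(text):
    """Return [(video_codec_seconds, total_seconds), ...] for each BENCHMARKs line."""
    runs = []
    for m in BENCH_RE.finditer(text):
        vc, _vo, _audio, _sys, total = (float(x) for x in m.groups())
        runs.append((vc, total))
    return runs


def compare(before_text, after_text):
    """Mean total time before and after, plus the relative reduction."""
    before = mean(total for _vc, total in parse_benchmarks(before_text))
    after = mean(total for _vc, total in parse_benchmarks(after_text))
    return before, after, (before - after) / before


# First two runs of each N800 log posted above (use the full logs in practice).
before_log = """\
BENCHMARKs: VC: 135.127s VO: 0.163s A: 0.000s Sys: 1.387s = 136.677s
BENCHMARKs: VC: 132.337s VO: 0.153s A: 0.000s Sys: 1.382s = 133.872s
"""
after_log = """\
BENCHMARKs: VC: 122.543s VO: 0.162s A: 0.000s Sys: 1.416s = 124.120s
BENCHMARKs: VC: 120.901s VO: 0.152s A: 0.000s Sys: 1.371s = 122.424s
"""
before, after, reduction = compare(before_log, after_log)
print(f"{before:.3f}s -> {after:.3f}s ({reduction:.1%} less)")
```

Averaging all 12 loops rather than picking single runs smooths out the run-to-run noise visible in the logs (for example the 119.141s outlier).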

Sep 3 2007, 02:32 AM
Post #59
Group: Members Posts: 101 Joined: 23-June 04 Member No.: 3,800
A Zaurus C3200 (PXA27x)
Before new idct:
mplayer -nosound -vo null -quiet -benchmark -loop 12 -lavdopts idct=16 matrixbench_normdivx_vbrmp3.avi | grep BENCHMARKs
BENCHMARKs: VC: 209.368s VO: 0.168s A: 0.000s Sys: 3.011s = 212.547s
BENCHMARKs: VC: 213.062s VO: 0.170s A: 0.000s Sys: 3.022s = 216.253s
BENCHMARKs: VC: 214.726s VO: 0.169s A: 0.000s Sys: 3.039s = 217.935s
BENCHMARKs: VC: 214.936s VO: 0.170s A: 0.000s Sys: 2.674s = 217.780s
BENCHMARKs: VC: 215.113s VO: 0.170s A: 0.000s Sys: 3.182s = 218.464s
BENCHMARKs: VC: 215.065s VO: 0.170s A: 0.000s Sys: 2.618s = 217.853s
BENCHMARKs: VC: 215.700s VO: 0.170s A: 0.000s Sys: 2.611s = 218.482s
BENCHMARKs: VC: 215.293s VO: 0.170s A: 0.000s Sys: 2.606s = 218.069s
BENCHMARKs: VC: 215.575s VO: 0.170s A: 0.000s Sys: 2.621s = 218.366s
BENCHMARKs: VC: 215.655s VO: 0.169s A: 0.000s Sys: 2.608s = 218.433s
BENCHMARKs: VC: 215.323s VO: 0.170s A: 0.000s Sys: 2.614s = 218.107s
BENCHMARKs: VC: 215.373s VO: 0.170s A: 0.000s Sys: 2.610s = 218.153s
After new idct:
mplayer -nosound -vo null -quiet -benchmark -loop 12 -lavdopts idct=16 matrixbench_normdivx_vbrmp3.avi | grep BENCHMARKs
BENCHMARKs: VC: 203.236s VO: 0.169s A: 0.000s Sys: 2.651s = 206.056s
BENCHMARKs: VC: 207.844s VO: 0.170s A: 0.000s Sys: 2.641s = 210.654s
BENCHMARKs: VC: 207.917s VO: 0.171s A: 0.000s Sys: 2.633s = 210.722s
BENCHMARKs: VC: 207.760s VO: 0.170s A: 0.000s Sys: 2.634s = 210.564s
BENCHMARKs: VC: 207.879s VO: 0.172s A: 0.000s Sys: 2.617s = 210.668s
BENCHMARKs: VC: 207.367s VO: 0.170s A: 0.000s Sys: 2.635s = 210.172s
BENCHMARKs: VC: 208.025s VO: 0.170s A: 0.000s Sys: 2.629s = 210.824s
BENCHMARKs: VC: 207.421s VO: 0.170s A: 0.000s Sys: 2.623s = 210.213s
BENCHMARKs: VC: 207.879s VO: 0.170s A: 0.000s Sys: 2.618s = 210.667s
BENCHMARKs: VC: 207.960s VO: 0.171s A: 0.000s Sys: 2.635s = 210.765s
BENCHMARKs: VC: 207.909s VO: 0.170s A: 0.000s Sys: 2.628s = 210.707s
BENCHMARKs: VC: 207.877s VO: 0.170s A: 0.000s Sys: 2.627s = 210.675s

Sep 4 2007, 11:03 AM
Post #60
Group: Members Posts: 51 Joined: 8-October 06 Member No.: 11,724
OK, thanks, so at least this IDCT optimization is useful on the Zaurus too. I'll try to submit it upstream soon, so that we will all have it in mplayer 1.0rc2, whenever that gets released.
But video performance on the Zaurus looks quite bad according to this benchmark, hence the significantly smaller relative effect of the IDCT optimization. The poor performance is partly caused by a bug that keeps the IWMMXT optimizations from being enabled in the default mplayer 1.0rc1 sources. Also, earlier in this thread we got benchmarks from atty's build of mplayer, and it performed much better. A large part of that improvement was attributed to the use of IPP. But IPP only provides IDCT acceleration, and the IDCT already looks quite fast: if a 1.5x IDCT speedup saves 7-8 seconds, the whole IDCT probably takes no more than 30 seconds of the total decoding time. Even if IPP magically reduced the IDCT overhead to zero, too much time would still be wasted somewhere else. Maybe it is still a good idea to find the source of this performance bottleneck and fix it once and for all (submitting all the relevant patches to upstream mplayer/ffmpeg)?
There was an idea that slow memory causes the performance problems. But memory performance (both bandwidth and latency) can easily be benchmarked. Also, could I/O performance (reading from flash memory or HDD) affect video decoding time that much on the Zaurus? If so, putting the video clip on a ramdisk should eliminate this factor.
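
The "no more than 30 seconds" estimate can be made explicit. A short derivation, assuming the whole saving measured on the C3200 (roughly 218 s before vs. 210.7 s after, i.e. about 7-8 s) comes from the IDCT and that the IDCT itself became exactly 1.5x faster:

```latex
\begin{aligned}
T_{\text{new}} &= \frac{T_{\text{old}}}{s}, \qquad s = 1.5 \\
\Delta T &= T_{\text{old}} - T_{\text{new}}
          = T_{\text{old}}\left(1 - \frac{1}{s}\right) = \frac{T_{\text{old}}}{3} \\
T_{\text{old}} &= 3\,\Delta T, \qquad \Delta T \in [7, 8]\ \text{s}
  \;\Rightarrow\; T_{\text{old}} \in [21, 24]\ \text{s} \\
\frac{T_{\text{old}}}{T_{\text{total}}} &\in \left[\frac{21}{218}, \frac{24}{218}\right]
  \approx [0.096,\ 0.110]
\end{aligned}
```

So on this clip the IDCT is only about a tenth of the roughly 218 s decoding time, and removing it entirely would still leave about 194-197 s.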