[FFmpeg-devel] PATCH: av_strtod

Tue Jun 2 04:59:06 CEST 2009

-----Original Message-----
...
> This is a known issue with msvcrt. Search the ffmpeg-devel archives
for some past discussions. MinGW32 uses a POSIX compliant strtod
implementation and it suits us well.

Well, I understand that it uses it's own implementation, otherwise it
wouldn't work :)

> > Then my lib doesn't pass seek_test.exe.

> This is also known issue. Look for a thread on this list about huge
negative numbers, MinGW32, and printf or something. MinGW32 has a printf
implementation that recently fixed this issue, and it also suits us
well.

Basically, nodoby wants to "pollute code with ugly workarounds for
broken platforms/compilers", that's like a regular answer :)
In my original answer, I wrote about a test app that prints result of
the expression the 1<<63 / 44100 and *both* icl and gcc produce results
different from what seek_test prints. I don't think it's something about
presision and that gcc uses higher precision, because GCC PRODUCES THE
SAME RESULT AS ICL, but in seek_test it produces different result. 
Seek_test could instead print string like
"43243245324543543253245235432/44100" instead of doing float point
division and printing the result. In case of seek test its task to to
veryfy that code produces correct results and the values that need to be
verified are the pkt.dts/pts and there is not need to devide them or do
any math, all of them could be just printed to a file and compared to
.ref

[...]
> > plus the worst ever code is in recently added
mlp_filter_channel_x86, 
> > which mangles lables. WTF, why not a standalone .asm file instead?!?

> A patch based in configure disabling that code on supported compilers
where it fails is welcome.

See my previous email on how I made it work. I didn't disable any
optimizations/inline asm to compile entire ffmpeg
Instead of storing table of lables and then jumping in the loop I create
multiple functions and I use function pointer table.

> It's not quite clear how you got around some issues. Specially the
ones listed here:
> http://wiki.multimedia.cx/index.php?title=Icc#Windows_version
> Did you rewrite inline assembly in ms style, or is there a way to
switch inline assembly modes inside the asm block so that you can insert
your ALIGN directive? Did you disable the code that needs MANGLE?

Off course I didn't rewite any assembly in ms style. It's TOO error
prone, but for most of my changes I actually did icl/gcc -S to see
produced assembly listing and then I compared with unmodified version to
make sure that they are identical. On top of that, in ms style assembly
compiler is more likely to produce slow junk or the programmer has to
manually do mov eax, input_var; which might be unnecessary.

The MANGLE problem was quite a disaster, as it would require too many
changes that almost for sure would lead to bugs and endless testing, but
there is a nice way to fix it. Hopefully, it will be reviewed by ffmpeg
developers and taken into ffmpeg.
I redefined MANGLE as
//#define MANGLE(a) EXTERN_PREFIX LOCAL_MANGLE(a)
#define MANGLE(mem) "%["#mem"]"

Which makes final asm look instead of:
	"movzbl _ff_h264_lps_range(%0, %%ebx, 2), %%esi\n\t"
It became
	"movzbl %[ff_h264_lps_range](%0, %%ebx, 2), %%esi\n\t"

Then ff_h264_lps_range has to be properly declared to make sure that
compiler understands that this variable isn't externally imported.

And all I had to do is to to modify list of asm inputs everywhere where
I had undefined variables. Moreover, named asm inputs are obviously a
good idea especially with ffmpeg's inline asm where there are multiple
defines that reference asm args by %0 %1 etc. which makes this code
understandable only by the original author. Named args are less error
prone and alow 30 inputs instead of 10. One more issue - all 64 bit
constants had to be changed into arrays, so that there would never be
any standalone 64-bit static value as input to asm block; otherwise
compiler tries to optimize it and use that 64 bit value as immidiate val
and produces complete junk that runs like 3-4 time slower.
Instead of 
	movq _some64bit_const, mmx_reg //_some64bit_const is {0x1234,
0x5678}
It would emit junk like 
	push 0x1234
	push 0x5678
	movq esp, mmx_reg 

For .align I didn't do anything, I redefine it as "#.align 1<< ...".
That's just to align code; After reviewing the code that uses it, I
don't think that ommiting .aling will have big impact.
It's however possible to make it work:
__asm__("att asm");
__asm{ ALIGN 16 };
__asm__("1:  \n\t att asm");

I verified and it worked in some places, but in some others after that
__asm {ALIGN 16}; compiler inserts some instructions (those that have to
be executed as part of the input args, like loading addess into eax or
whatever). And those instructions are like 5 bytes long each which
completely breaks code alignment. As I understand code missalignment on
ia32 can't have big performance penalty, since almost all instructions
are like 3-4-5-6 or whaever bytes long (unlike arm with 32bit
instructions), so entire code is randomly aligned.
I'm sure it's possible to find a workaround, for that align directive.
Maybe move input args that require setup to the first __asm__ block, so
that all setup would be done before the ALIGN 16.