[FFmpeg-devel] [PATCH v7 2/2] lavc/tiff: Decode embedded JPEGs in DNG images

Nick Renieris velocityra at gmail.com
Sun Jul 28 11:40:57 EEST 2019


Actually, I checked a more accurate version of my loop, and GCC
optimizes away the LUT check anyway:
https://godbolt.org/z/G1e1R4
As you can see it's smart enough to create 2 versions of my functions
(started at L3 with a lookup and L7 without it) and it does the check
outside.

There's no guarantee this is happening with the actual version of
course (it could be slower, or even faster if it also optimizes it
through dng_blit).
I could check the actual disasm in FFmpeg, but I don't think it's
worth it at this point (my mentor agrees).

Στις Κυρ, 28 Ιουλ 2019 στις 10:40 π.μ., ο/η Nick Renieris
<velocityra at gmail.com> έγραψε:
>
> Στις Κυρ, 28 Ιουλ 2019 στις 1:30 π.μ., ο/η Reimar Döffinger
> <Reimar.Doeffinger at gmx.de> έγραψε:
> >
> > Huh? Are you thinking of templates? always_inline can usually be used the same way.
> > I haven't looked into the how or anything, it was just a comment in principle.
>
> You're totally right (I checked on godbolt just to make sure), I
> hadn't considered that. Would need a trivial change (plus the function
> is already inlined) so I'll do it.
>
> > > I believe branch prediction and instruction pipelining will hide this
> > > delay. I doubt it has any measurable effect on performance.
> >
> > There are CPUs without that.
>
> Yes, but with neither? Branch prediction, sure, but I'd like to hear
> one example of a CPU FFmpeg runs on without pipelining (I genuinely
> don't know if there are).
>
> > Of course would need to look at what the actual requirements are concerning precision, rounding and range. But that should be done anyway.
> Like I said, I did. It's not possible :)
>
> > Well, I did not find it obvious where src and dst are from and what format they are in.
> > Essentially if they are decoder output it's likely fine, if they are read from a file without decoding step it's likely wrong.
>
> Right, I see your concern. They are read from a file and they can be
> both LE/BE, but that's specified in the file and there is logic in
> tiff.c to account for it. So when we process them they are guaranteed
> to be the same endianness as the host.
>
> > >> Also not sure if since these are essentially brightness/contrast adjustments if we should't rather just have a way to export the transform to use...
> > >
> > > Export to where?
> >
> > Just provide as metadata and leave to e.g. libavfilter.
>
> Not sure how I'd do that or if it'd be appropriate/feasible.
>
> > > I don't see why you'd need to complicate this by
> > > calling into other components, the transformation code is concise,
> > > clear and accurate for this use case.
> >
> > Slow and unoptimized and in it's current form hard to SIMD optimize, especially without changing output as well though (in addition to the large bit width of floats reducing the benefit, denormals can be an issue for SIMD-accelerating float code).
> > Unless I miss a reason why nobody would ever want this to be faster?
>
> I'll take out the comparison which should make it faster, I'll talk
> with my mentor if he thinks I should use libavfilter.
>
> > >> There is no need for an "else" block if the "if" block ends in a return.
> > >
> > > Indeed, but in my opinion it makes the code clearer on first glance
> > > (if you miss the return). I can change it if you insist.
> >
> > The second comment was the more relevant one actually.
> > I only really care about finding some way to make this part a whole lot more readable.
>
> Ok, hopefully the code I posted right after your review is readable enough.


More information about the ffmpeg-devel mailing list