[FFmpeg-devel] [PATCH] avcodec/utils/avpriv_find_start_code: optimization. If HAVE_FAST_UNALIGNED is true, handle "1 + sizeof(long)" bytes per step.
Michael Niedermayer
michaelni at gmx.at
Thu Jan 1 20:18:37 CET 2015
On Fri, Jan 02, 2015 at 01:27:51AM +0800, zhaoxiu.zeng wrote:
> On 2015/1/1 11:49, Michael Niedermayer wrote:
> > On Thu, Jan 01, 2015 at 10:13:58AM +0800, zhaoxiu.zeng wrote:
> >> libavcodec/utils.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----------
> >> 1 file changed, 56 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/libavcodec/utils.c b/libavcodec/utils.c
> >> index 1ec5cae..14a43e2 100644
> >> --- a/libavcodec/utils.c
> >> +++ b/libavcodec/utils.c
> >> @@ -3772,30 +3772,74 @@ const uint8_t *avpriv_find_start_code(const uint8_t *av_restrict p,
> >> uint32_t *av_restrict state)
> >> {
> >> int i;
> >> + uint32_t stat;
> >>
> >> av_assert0(p <= end);
> >> if (p >= end)
> >> return end;
> >>
> >> + stat = *state;
> >> for (i = 0; i < 3; i++) {
> >> - uint32_t tmp = *state << 8;
> >> - *state = tmp + *(p++);
> >> - if (tmp == 0x100 || p == end)
> >> + uint32_t tmp = stat << 8;
> >> + stat = tmp + *(p++);
> >> + if (tmp == 0x100 || p == end) {
> >> + *state = stat;
> >> return p;
> >> + }
> >> }
> >>
> >> - while (p < end) {
> >> - if (p[-1] > 1 ) p += 3;
> >> - else if (p[-2] ) p += 2;
> >> - else if (p[-3]|(p[-1]-1)) p++;
> >> - else {
> >> +#if HAVE_FAST_UNALIGNED
> >> +#if HAVE_FAST_64BIT
> >> + for (; p + 6 <= end; p += 9) {
> >> + uint64_t t = AV_RN64A(p - 2);
> >> + if (!((t - 0x0100010001000101ULL) & ~(t | 0x7fff7fff7fff7f7fULL)))
> >> + continue;
> >> +#else
> >> + for (; p + 2 <= end; p += 5) {
> >> + uint32_t t = AV_RN32A(p - 2);
> >> + if (!((t - 0x01000101U) & ~(t | 0x7fff7f7fU)))
> >> + continue;
> >> +#endif
> >> + /* find the first zero byte in t */
> >> +#if HAVE_BIGENDIAN
> >> + while (t >> (sizeof(t) * 8 - 8)) {
> >> + t <<= 8;
> >> + p++;
> >> + }
> >> +#else
> >> + while (t & 0xff) {
> >> + t >>= 8;
> >> + p++;
> >> + }
> >> +#endif
> >
> > this maybe can be simplified by using ff_startcode_find_candidate_c()
> >
> There is a small difference: ff_startcode_find_candidate_c() finds the first "0x00", but we only care about "0x00 0x00".
> Using 0x0100010001000101ULL rather than 0x0101010101010101ULL reduces the hit rate on lone "0x00" bytes, so it can be faster
> if the buffer contains some lone "0x00" bytes.
If you can improve ff_startcode_find_candidate_c(), please do so.
But the code should not be duplicated in a slightly different way.
[...]
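
For readers following the mask discussion, here is a minimal sketch (not code from the patch or from FFmpeg) comparing the classic "any zero byte" test with the sparse constants quoted above. The function names are hypothetical. Counting bytes from the least significant end of the 64-bit word, the sparse test only checks bytes 0, 1, 3, 5 and 7, so a lone zero at byte 2, 4 or 6 is not reported, while any two adjacent zero bytes always include a checked position.

```c
#include <assert.h>
#include <stdint.h>

/* Classic test: nonzero iff any of the 8 bytes of v is 0x00. */
static inline int has_zero_any(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Sparse test using the constants from the quoted patch: nonzero iff
 * byte 0, 1, 3, 5 or 7 of v (counted from the least significant byte)
 * is 0x00. Bytes 2, 4 and 6 are deliberately not checked. */
static inline int has_zero_sparse(uint64_t v)
{
    return ((v - 0x0100010001000101ULL) & ~(v | 0x7fff7fff7fff7f7fULL)) != 0;
}

/* Hypothetical self-check illustrating the difference. */
void check_examples(void)
{
    /* Lone zero at byte 2: the classic test hits, the sparse one does not. */
    assert(has_zero_any(0x0101010101000101ULL));
    assert(!has_zero_sparse(0x0101010101000101ULL));

    /* Zero pair at bytes 2 and 3: byte 3 is checked, so the sparse test hits. */
    assert(has_zero_sparse(0x0101010100000101ULL));

    /* Lone zero at byte 1 is still reported; the hit rate is reduced, not eliminated. */
    assert(has_zero_sparse(0x0101010101010001ULL));
}
```

Which memory byte corresponds to which byte of the word depends on endianness, so a real scanner would still need the per-endianness handling shown in the quoted diff.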
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Those who are too smart to engage in politics are punished by being
governed by those who are dumber. -- Plato