[FFmpeg-devel] [PATCH] avutil: add av_memcpy() to avoid undefined behavior with NULL, NULL, 0

Michael Niedermayer michael at niedermayer.cc
Thu Jul 4 21:04:02 EEST 2019


On Wed, Jul 03, 2019 at 12:50:59PM +0200, Hendrik Leppkes wrote:
> On Wed, Jul 3, 2019 at 10:46 AM Michael Niedermayer
> <michael at niedermayer.cc> wrote:
> >
> > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote:
> > >
> > >
> > > On 03.07.2019, at 08:29, Michael Niedermayer <michael at niedermayer.cc> wrote:
> > >
> > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
> > > >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote:
> > > >>> Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> > > >>> ---
> > > >>> doc/APIchanges      |  3 +++
> > > >>> libavutil/mem.h     | 13 +++++++++++++
> > > >>> libavutil/version.h |  2 +-
> > > >>> 3 files changed, 17 insertions(+), 1 deletion(-)
> > > >>>
> > > >>> diff --git a/doc/APIchanges b/doc/APIchanges
> > > >>> index b5fadc2a48..65b8ed74ad 100644
> > > >>> --- a/doc/APIchanges
> > > >>> +++ b/doc/APIchanges
> > > >>> @@ -15,6 +15,9 @@ libavutil:     2017-10-21
> > > >>>
> > > >>> API changes, most recent first:
> > > >>>
> > > >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
> > > >>> +  Add av_memcpy()
> > > >>> +
> > > >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
> > > >>>   Add FF_DECODE_ERROR_DECODE_SLICES
> > > >>>
> > > >>> diff --git a/libavutil/mem.h b/libavutil/mem.h
> > > >>> index 5fb1a02dd9..a35230360d 100644
> > > >>> --- a/libavutil/mem.h
> > > >>> +++ b/libavutil/mem.h
> > > >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size);
> > > >>>  */
> > > >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
> > > >>>
> > > >>> +/**
> > > >>> + * memcpy() implementation without a NULL pointer special case
> > > >>> + *
> > > >>> + * @param dst  Destination buffer
> > > >>> + * @param src  Source buffer
> > > >>> + * @param cnt  Number of bytes to copy; must be >= 0
> > > >>> + */
> > > >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt)
> > > >>
> > > >> How many cases are there in the codebase where cnt can be 0, and dst or
> > > >> src NULL, without it having been checked before calling memcpy? And from
> > > >> those, which would not be from situations where the code should have
> > > >> instead aborted and returned ENOMEM, or EINVAL if either of them are
> > > >> function arguments?
> > > >
> > > > There are around 2500 occurances of memcpy in the codebase
> > > > To awnser your question it would be needed to review all of them and in many
> > > > cases their surrounding code.
> > > > So that is unlikely to be awnsered by anyone accuratly
> > > >
> > > > Also iam not sure i understand why you ask or why this would matter
> > > > the suggested function allows to simplify cases where the NULL can
> > > > occur, not where it cannot or should not. That is this is intended for
> > > > the cases where we already have or are adding explicit checks to
> > > > avoid the NULL case.
> > > >
> > > > i could rename this to av_memcpy_nullsafe which would make it clearer but
> > > > also its more to write and read
> > >
> > > I admit I thought that a worthwhile idea originally,
> > > but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it.
> > > And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense.
> > > Of course another question might be if it might make sense to replace all memcpy uses with it.
> > > I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about...
> >
> > in addition to the 2 there are these related commits found by very dumb git log greps
> > In further addition there would be cases that deal with src == dst, something we
> > could add a check for in av_memcpy() too
> >
> 
> Personally I really don't like hiding too much magic in a function
> like this. It can easily lead to brittle code, someone may think the
> pointer is always valid since its memcpy'ed to, and uses it
> afterwards, and there you have a disaster.

Why would someone think that a pointer to an array of length 0 is valid ?
malloc(0) for example does not gurantee that


> Either the function name should make perfectly clear what it does, or
> preferably it should just not exist and code should just validate its
> stuff.

what name would you suggest ?

but we can just drop this and add the checks to memcpy() where needed
i was just trying to make all the undefined behaviour fixes a bit less
ugly as multiple people complained ...

thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20190704/a5ab19c4/attachment.sig>


More information about the ffmpeg-devel mailing list