[FFmpeg-devel] [PATCH] avformat/hlsenc: Fix path handling on Windows

Sat Jan 15 23:45:04 EET 2022

Soft Works:
> 
> 
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Andreas
>> Rheinhardt
>> Sent: Saturday, January 15, 2022 7:34 PM
>> To: ffmpeg-devel at ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH] avformat/hlsenc: Fix path handling on
>> Windows
>>
>> Soft Works:
>>>
>>>
>>>> -----Original Message-----
>>>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Andreas
>>>> Rheinhardt
>>>> Sent: Saturday, January 15, 2022 7:40 AM
>>>> To: ffmpeg-devel at ffmpeg.org
>>>> Subject: Re: [FFmpeg-devel] [PATCH] avformat/hlsenc: Fix path handling on
>>>> Windows
>>>>
>>>> ffmpegagent:
>>>>> From: softworkz <softworkz at hotmail.com>
>>>>>
>>>>> Signed-off-by: softworkz <softworkz at hotmail.com>
>>>>> ---
>>>>>     avformat/hlsenc: Fix path handling on Windows
>>>>>
>>>>>     Handling for DOS path separators was missing
>>>>>
>>>>> Published-As: https://github.com/ffstaging/FFmpeg/releases/tag/pr-
>>>> ffstaging-19%2Fsoftworkz%2Fsubmit_hlspath-v1
>>>>> Fetch-It-Via: git fetch https://github.com/ffstaging/FFmpeg pr-ffstaging-
>>>> 19/softworkz/submit_hlspath-v1
>>>>> Pull-Request: https://github.com/ffstaging/FFmpeg/pull/19
>>>>>
>>>>>  libavformat/hlsenc.c | 4 ++++
>>>>>  1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/libavformat/hlsenc.c b/libavformat/hlsenc.c
>>>>> index ef8973cea1..eff7f4212e 100644
>>>>> --- a/libavformat/hlsenc.c
>>>>> +++ b/libavformat/hlsenc.c
>>>>> @@ -3028,6 +3028,10 @@ static int hls_init(AVFormatContext *s)
>>>>>                  }
>>>>>
>>>>>                  p = strrchr(vs->m3u8_name, '/');
>>>>> +#if HAVE_DOS_PATHS
>>>>> +                p = FFMAX(p, strrchr(vs->m3u8_name, '\\'));
>>>>> +#endif
>>>>> +
>>>>>                  if (p) {
>>>>>                      char tmp = *(++p);
>>>>>                      *p = '\0';
>>>>>
>>>>> base-commit: c936c319bd54f097cc1d75b1ee1c407d53215d71
>>>>>
>>>>
>>>
>>> Thanks for reviewing.
>>>
>>>> 1. You seem to be under the impression that NULL <= all other pointers.
>>>> This is wrong. Relational operators acting on pointers are only defined
>>>> when both point to the same object (the case of "one past the last
>>>> element of an array" is also allowed) and are undefined behaviour
>> otherwise.
>>>
>>> The case about NULL is interesting - I wasn't aware of that.
>>> Is it practically relevant, i.e. is there any platform where casting
>>> (void *)0 does not evaluate to 0 ?
>>>
>>
>> "An integer constant expression with the value 0, or such an expression
>> cast to type
>>  void *, is called a null pointer constant." (C11, 6.3.2.3 3) (void*)0
>> is therefore a valid null pointer constant and is also commonly used for
>> the NULL macro. (void*)0 == 0 is always true, because the right hand
>> side is converted to the type of the pointer (namely to a null pointer)
>> and all null pointers compare equal. But this is irrelevant to
>> relational comparisons, because checking for equality of pointers is not
>> subject to these pointers pointing to the same object (or one past the
>> last element of an array...), whereas this is so for relational operations.
>>
>> (If one uses unsigned for pointers, then one only needs to reserve two
>> values that can not be used as part of an object: 0 and the max value
>> (the latter can't be used for an object, because using a pointer one
>> past the object is legal and has to be consistent with "<=" (and anyway
>> said pointer must compare unequal to NULL)); if one used signed
>> comparisons for pointers
> 
> On initial read I had thought you were implying something like "signed 
> pointers". But maybe one should really create a computer system with 
> "symmetric addressing" where pointers are always pointing to the center 
> of a memory area (odd-number sized). I think it would be a great 
> invention for bored and under-challenged developers who love to 
> deep-dive into academic details - hihi..
> 
> 
> (I'd bet, concepts and theory exist somewhere anywhere)
>  
>> one would have to reserve -1, 0 and the max
>> value, the former because a one past the end array element needs to
>> compare unequal to NULL and the latter to be consistent with <= and a
>> potential one-past-the-end element. But this is a very small advantage.
>> Honestly, I don't know whether compilers consistently use unsigned
>> comparisons for pointer comparisons at all (even when restricted to
>> compilers for systems with HAVE_DOS_PATHS). The fact that comparisons of
>> pointers to different objects is UB means that compiler writers actually
>> can choose what they want.)
> 
> So one could cast both pointers to unsigned to make the relational
> comparison valid, as long as either (1) both are pointing to the same object,
> or (2) one or (3) both are NULL/0?
> 

Cast to unsigned? The above is not meant to require the programmer to
add casts; it is about whether an implementation can simply use numbers
for pointers under the hood and what it has to do to ensure that it is
compliant with the C specs.
Also notice that (void*)0 <= (void*)0 is UB, although (void*)0
==(void*)0 is defined and true. Whether NULL <= NULL is UB depends upon
the implementation details of said macro (if it is an integer constant,
the comparisons are valid (and true)).

> C++ defines NULL as 0 (which I had actually also assumed to be the case
> in C), so this wouldn't be an issue there I guess?

This?

> (I mean "hypothetical" issue same as what it is in C)
> 
> BTW, thanks for the insight. That part is really interesting.
> 
> Now for the other one..
> 
>> (Furthermore, it is not guaranteed by the spec that zeroing a pointer
>> via memset (or calloc) generates a valid null pointer. E.g. the
>> documentation of calloc has this footnote: "Note that this [the bitwise
>> zero-initialization] need not be the same as the representation of
>> floating-point zero or a null pointer constant." But I don't know a
>> system where this is not so and we definitely require it to be so.)
>>
>>>> 2. Apart from that: Your code would potentially evaluate strrchr()
>>>> multiple times which is bad style (given that this function is likely
>>>> marked as pure the compiler could probably optimize the second call
>>>> away, but this is not a given).
>>>
>>> It's not my code. It's code copied from avstring.c - so please blame
>>> whoever that wrote.
>>>
>>
>> I couldn't find strrchr() being evaluated multiple times unnecessarily
>> due to a macro in avstring.c.
> 
> I don't understand. av_basename() does this:
> 
>     p = strrchr(path, '/');
> #if HAVE_DOS_PATHS
>     q = strrchr(path, '\\');
>     d = strchr(path, ':');
>     p = FFMAX3(p, q, d);
> #endif
> 
> and I do this:
> 
>                 p = strrchr(vs->m3u8_name, '/');
> #if HAVE_DOS_PATHS
>                 p = FFMAX(p, strrchr(vs->m3u8_name, '\\'));
> #endif
> 
> So, for Windows, I'm counting 3 vs. 2...
> 

Indeed, you completely misunderstood:
#define FFMAX(a,b) ((a) > (b) ? (a) : (b)),
so what you intend to add will expand to
((p) > (strrchr(vs->m3u8_name, '\\')) ? (p) : (strrchr(vs->m3u8_name,
'\\')))
As you can see, there are two calls to strrchr(). A good compiler can
optimize the second call away (especially if the function is declared as
pure or similar), but it is better not to rely on this.

FFMAX3 uses FFMAX internally and evaluates its argument even more often
than FFMAX; but this doesn't matter, because these are just pointers;
there are no calls or side-effects or anything.

> 
>>> Regarding performance, I'm not sure whether this is relevant in any way,
>>> given the low frequency of execution and putting it into relation to
>>> all the other things that ffmpeg is doing usually.
>>>
>>
>> The above would be a valid point if there were a tradeoff between
>> writing the code without repeated evaluations and writing clear code.
>> (And even then you'd be ignoring that the performance difference might
>> be negligible for code only run very infrequently, but bloated code
>> takes more space in the binary even when executed infrequently.) But
>> there is no such tradeoff here.
> 
> A HLS segment typically has 0.5 to 20 MB. Let's assume 1 MB. Even when the 
> file name/path would be re-evaluated for each segment, it would be looping
> over something like 100 byte _additionally_ per segment. To produce the
> segment, those 1 MB data would need to be "touched" or looped over multiple
> times (effectively), even when only remuxing. Let's say 10x.
> That leaves us with 100 iterations vs. 10 Million iterations.
> Static ffmpeg binaries have like 60 MB. How many bytes does it add to the 
> code? 20? => makes 20 vs. 60 Million bytes
> 

To quote myself: "The above would be a valid point if there were a
tradeoff between writing the code without repeated evaluations and
writing clear code."

> 
>>>> 3. The code in av_basename() is also wrong.
>>>
>>> ...
>>>
>>>> 4. Is there actually a reason why you don't use av_basename() directly
>> here?
>>>
>>> Yes, multiple:
>>>
>>> 1. Different behavior - those functions are returning a "." when not found
>>
>> av_basename() also has precise guarantees when this happens.
> 
> by "those", I meant av_basename and av_dirname(). I wanted to do an
> absolute minimal fix that is safe without rewriting the code and without
> having to do much testing. I don't even use that code, the fix was
> a favor for somebody else and I've been once again stupid enough to 
> submit it to the ML - thinking it might be useful for somebody.
> 
> 
>>> 2. The docs tell it's required to copy a string before supplying it to
>>>    those functions (as they may changing the string).
>>
>> You are confusing av_basename() and av_dirname().
> 
> No. av_dirname() is actually the one that is needed. 
> 

One could use av_basename() to get the beginning of the basename to
temporarily zero it. I specifically asked why you didn't use av_basename().

>>> 3. The hlsenc code changes the string temporarily and restores it after
>>>    wards. The same couldn't be done when using the avstring functions.
>>>
>>
>> Why? 
> 
> I don't know. You need to ask the author. 

I asked you why you believed that this couldn't be done with the
avstring functions, not why the code is as it is. (I know why the code
is as it is: To simplify appending two strings. See
6db81e93a95d150ec828214ba7eb6183577c748c.)

> 
>> (Actually, your code is still slightly different from av_basename(): 
>           ---^^^^---
> 
> Are you aware that I haven't written a single line of this code besides 
> the 3 lines of the patch?
> 

And we are talking about this three line patch which you have written...

> I know nothing about the code. I have just fixed that single thing.
> To do it in a way that is conforming and acceptable, I looked around
> how it's done elsewhere. avstring.c is a central part of the code base,
> used virtually everywhere, so I used the same approach, assuming that
> it must be the most acceptable one. I didn't even spend a second for
> thinking about it, because it's pointless here anyway: Somebody will
> always come with another opinion. 
> But when even the project's own code is not good enough for submission,
> then it will sooner or later inflate.
>