[FFmpeg-devel] [PATCH v3 1/2] avformat/url: fix logic for removing ".." path components

Marton Balint cus at passwd.hu
Wed Jul 29 23:02:48 EEST 2020



On Wed, 29 Jul 2020, Nicolas George wrote:

> Zlomek, Josef (12020-07-29):
>> I also noticed that there are many more bugs in ff_make_absolute_url()
>> I just fixed one of them.
>> I am looking forward to your complete fix.
>
> I think the key to a working version is to properly parse URLs first.
> Then we can parse the base and relative part, and build the absolute URL
> from the components, simplifying the path component, and only the path
> component, at the same time.
>
> Here is my preliminary code for parsing, in case somebody wants to look
> at it.

Thanks for working on this. I agree that proper URL parsing, as RFC 
compliant as we can make it, is needed here. Where we cannot follow the 
RFC entirely, we should probably document the differences clearly.

> diff --git a/libavformat/url.c b/libavformat/url.c
> index 20463a6674..7612a2ee6e 100644
> --- a/libavformat/url.c
> +++ b/libavformat/url.c
> @@ -78,6 +78,75 @@ int ff_url_join(char *str, int size, const char *proto,
>      return strlen(str);
>  }
> 
> +static const char *find_delim(const char *delim, const char *cur, const char *end)
> +{
> +    while (cur < end && !strchr(delim, *cur))
> +        cur++;
> +    return cur;
> +}
> +
> +int ff_url_decompose(URLComponents *uc, const char *url, const char *end)
> +{
> +    const char *cur, *aend, *p;
> +
> +    if (!end)
> +        end = url + strlen(url);
> +    cur = uc->url = url;
> +
> +    /* scheme */
> +    uc->scheme = cur;
> +    p = find_delim(":/", cur, end); /* lavf "schemes" can contain options */
> +    if (*p == ':')
> +        cur = p + 1;
> +
> +    /* authority */
> +    uc->authority = cur;
> +    if (end - cur >= 2 && cur[0] == '/' && cur[1] == '/') {
> +        cur += 2;
> +        aend = find_delim("/", cur, end);

'?' and '#' can also terminate the authority, not only '/' (see RFC 
3986, section 3.2).
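
A minimal sketch of what I mean, reusing find_delim() from the patch. 
The authority_end() helper and the example URL are only illustrations, 
not part of the patch:

```c
#include <assert.h>
#include <string.h>

/* Same helper as in the quoted patch. */
static const char *find_delim(const char *delim, const char *cur, const char *end)
{
    while (cur < end && !strchr(delim, *cur))
        cur++;
    return cur;
}

/* Hypothetical helper: cur points just past the leading "//".
 * The authority stops at the first '/', '?' or '#', or at end. */
static const char *authority_end(const char *cur, const char *end)
{
    assert(cur <= end);
    return find_delim("/?#", cur, end);
}
```

With "http://example.com?x=1" this stops at the '?', so the host is 
"example.com" and the query is not swallowed into the authority.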

> +
> +        /* userinfo */
> +        uc->userinfo = cur;
> +        p = find_delim("@", cur, aend);
> +        if (*p == '@')
> +            cur = p + 1;
> +
> +        /* host */
> +        uc->host = cur;
> +        if (*cur == '[') { /* hello IPv6, thanks for using colons! */
> +            p = find_delim("]", cur, aend);
> +            if (*p != ']')
> +                return AVERROR(EINVAL);
> +            if (p + 1 < aend && p[1] != ':')
> +                return AVERROR(EINVAL);
> +            cur = p + 1;
> +        } else {
> +            cur = find_delim(":", cur, aend);
> +        }
> +
> +        /* port */
> +        uc->port = cur;
> +        cur = aend;
> +    } else {
> +        uc->userinfo = uc->host = uc->port = cur;
> +    }
> +
> +    /* path */
> +    uc->path = cur;
> +    cur = find_delim("?#", cur, end);
> +
> +    /* query */
> +    uc->query = cur;
> +    if (*cur == '?')
> +        cur = find_delim("#", cur, end);
> +
> +    /* fragment */
> +    uc->fragment = cur;
> +
> +    uc->end = end;
> +    return 0;
> +}
> +
>  static void trim_double_dot_url(char *buf, const char *rel, int size)
>  {
>      const char *p = rel;
> diff --git a/libavformat/url.h b/libavformat/url.h
> index de0d30aca0..99d453b378 100644
> --- a/libavformat/url.h
> +++ b/libavformat/url.h
> @@ -344,4 +344,41 @@ const AVClass *ff_urlcontext_child_class_iterate(void **iter);
>  const URLProtocol **ffurl_get_protocols(const char *whitelist,
>                                          const char *blacklist);
> 
> +typedef struct URLComponents {
> +    const char *url;        /**< whole URL, for reference */
> +    const char *scheme;     /**< possibly including lavf-specific options */
> +    const char *authority;  /**< "//" if it is a real URL */
> +    const char *userinfo;   /**< including final '@' if present */
> +    const char *host;
> +    const char *port;       /**< including initial ':' if present */
> +    const char *path;
> +    const char *query;      /**< including initial '?' if present */
> +    const char *fragment;   /**< including initial '#' if present */
> +    const char *end;
> +} URLComponents;
> +
> +#define url_component_end_scheme      authority
> +#define url_component_end_authority   userinfo
> +#define url_component_end_userinfo    host
> +#define url_component_end_host        port
> +#define url_component_end_port        path
> +#define url_component_end_path        query
> +#define url_component_end_query       fragment
> +#define url_component_end_fragment    end

I am not sure about this approach: having a known delimiter character at 
the start or at the end of a component will make further operations a 
bit harder. I'd simply add another field for each URL component to mark 
its end, e.g.

url
url_end
scheme
scheme_end
...
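
As a rough sketch of that layout (the struct name and both helpers are 
hypothetical, and only the first two components are spelled out), each 
component becomes a half-open range [start, end) with no delimiter in 
it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, for illustration only: every component has its
 * own end pointer, so no range includes ':' '@' '?' or '#'. */
typedef struct URLComponentsEx {
    const char *url,    *url_end;
    const char *scheme, *scheme_end;
    /* ... the remaining components would follow the same pattern ... */
} URLComponentsEx;

/* Length of the half-open range [start, end). */
static size_t component_len(const char *start, const char *end)
{
    assert(start <= end);
    return (size_t)(end - start);
}

/* Returns 1 if the range [start, end) equals the NUL-terminated str. */
static int component_equals(const char *start, const char *end, const char *str)
{
    size_t len = component_len(start, end);
    return strlen(str) == len && !memcmp(start, str, len);
}
```

A caller can then compare the scheme against "http" directly, without 
first skipping or trimming a ':'.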

Regards,
Marton

