[FFmpeg-devel] [PATCH v4 1/4] doc: Explain what "context" means

Wed May 15 18:54:19 EEST 2024

Derived from detailed explanations kindly provided by Stefano Sabatini:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-April/325903.html
---
 doc/context.md | 394 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 394 insertions(+)
 create mode 100644 doc/context.md

diff --git a/doc/context.md b/doc/context.md
new file mode 100644
index 0000000000..fb85b3f366
--- /dev/null
+++ b/doc/context.md
@@ -0,0 +1,394 @@
+# Introduction to contexts
+
+“%Context” is a name for a widely-used programming idiom.
+This document explains the general idiom and the conventions FFmpeg has built around it.
+
+This document uses object-oriented analogies to help readers familiar with
+[object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming)
+learn about contexts.  But contexts can also be used outside of OOP,
+and even in situations where OOP isn't helpful.  So these analogies
+should only be used as a first step towards understanding contexts.
+
+## “Context” as a way to think about code
+
+A context is any data structure that is passed to several functions
+(or several instances of the same function) that all operate on the same entity.
+For example, [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming)
+languages usually provide member functions with a `this` or `self` value:
+
+```cpp
+#include <iostream>
+
+class my_cxx_class {
+  void my_member_function() {
+    // the implicit object parameter provides context for the member function:
+    std::cout << this;
+  }
+};
+```
+
+Contexts are a fundamental building block of OOP, but can also be used in procedural code.
+For example, most callback functions can be understood to use contexts:
+
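+```c
+#include <stdio.h>
+
+// hypothetical registration function, provided elsewhere:
+void register_callback( void (*callback)(void *), void *data );
+
+struct MyStruct {
+  int counter;
+};
+
+void my_callback( void *my_var_ ) {
+  // my_var provides context for the callback function:
+  struct MyStruct *my_var = (struct MyStruct *)my_var_;
+  printf("Called %d time(s)\n", ++my_var->counter);
+}
+
+void init(void) {
+  // static storage, so the context outlives init():
+  static struct MyStruct my_var;
+  my_var.counter = 0;
+  register_callback( my_callback, &my_var );
+}
+```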
+
+In the broadest sense, “context” is just a way to think about code.
+You can even use it to think about code written by people who have never
+heard the term, or who would disagree with you about what it means.
+
+## “Context” as a tool of communication
+
+“%Context” can just be a word to understand code in your own head,
+but it can also be a term you use to explain your interfaces.
+Here is a version of the callback example that makes the context explicit:
+
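+```c
+#include <stdio.h>
+
+// hypothetical registration function, provided elsewhere:
+void register_callback( void (*callback)(void *), void *data );
+
+struct CallbackContext {
+  int counter;
+};
+
+void my_callback( void *ctx_ ) {
+  // ctx provides context for the callback function:
+  struct CallbackContext *ctx = (struct CallbackContext *)ctx_;
+  printf("Called %d time(s)\n", ++ctx->counter);
+}
+
+void init(void) {
+  // static storage, so the context outlives init():
+  static struct CallbackContext ctx;
+  ctx.counter = 0;
+  register_callback( my_callback, &ctx );
+}
+```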
+
+The difference here is subtle, but important.  If a piece of code
+*appears compatible with contexts*, then you are *allowed to think
+that way*, but if a piece of code *explicitly states it uses
+contexts*, then you are *required to follow that approach*.
+
+For example, imagine someone modified `MyStruct` in the earlier example
+to count several unrelated events across the whole program.  That would mean
+it contained information about multiple entities, so it would no longer be a context.
+But nobody ever *said* it was a context, so that isn't necessarily wrong.
+However, proposing the same change to the `CallbackContext` in the later example
+would violate a guarantee, and should be pointed out in a code review.
+
+@warning Guaranteeing to use contexts does not mean guaranteeing to use
+object-oriented programming.  For example, FFmpeg creates its contexts
+procedurally instead of with constructors.
+
+## Contexts in the real world
+
+To understand how contexts are used in the real world, it might be
+useful to compare [curl's MD5 hash context](https://github.com/curl/curl/blob/bbeeccdea8507ff50efca70a0b33d28aef720267/lib/curl_md5.h#L48)
+with @ref AVMD5 "FFmpeg's equivalent context".
+
+The [MD5 algorithm](https://en.wikipedia.org/wiki/MD5) produces
+a fixed-length digest from arbitrary-length data.  It does this by calculating
+the digest for a prefix of the data, then loading the next part and adding it
+to the previous digest, and so on.  Projects that use MD5 generally use some
+kind of context, so comparing them can reveal differences between projects.
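+
+To see why such algorithms need a context, here is a minimal sketch of a
+streaming hash.  It uses the much simpler FNV-1a hash in place of MD5, and
+every name in it (`ToyHashContext`, `toy_hash_init()` and so on) is
+hypothetical rather than part of curl or FFmpeg:
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+// hypothetical context: the running state that must survive between calls
+struct ToyHashContext {
+  uint32_t state;
+};
+
+void toy_hash_init( struct ToyHashContext *ctx ) {
+  ctx->state = 2166136261u; // FNV-1a 32-bit offset basis
+}
+
+// fold the next chunk of data into the running state
+void toy_hash_update( struct ToyHashContext *ctx, const uint8_t *data, size_t len ) {
+  for (size_t i = 0; i < len; i++) {
+    ctx->state ^= data[i];
+    ctx->state *= 16777619u; // FNV-1a 32-bit prime
+  }
+}
+
+// report the digest for everything seen so far
+uint32_t toy_hash_final( const struct ToyHashContext *ctx ) {
+  return ctx->state;
+}
+```
+
+Feeding `"hello "` and then `"world"` to `toy_hash_update()` gives the same
+digest as feeding `"hello world"` in one call, because the context carries
+the state from one call to the next.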
+
+```c
+// Curl's MD5 context looks like this:
+struct MD5_context {
+  const struct MD5_params *md5_hash;    /* Hash function definition */
+  void                  *md5_hashctx;   /* Hash function context */
+};
+
+// FFmpeg's MD5 context looks like this:
+typedef struct AVMD5 {
+    uint64_t len;
+    uint8_t  block[64];
+    uint32_t ABCD[4];
+} AVMD5;
+```
+
+Curl's struct name ends with `_context`, guaranteeing contexts are the correct
+interpretation.  FFmpeg's struct does not explicitly say it's a context, but
+@ref libavutil/md5.c "its functions do", so we can reasonably assume
+it's the intended interpretation.
+
+Curl's struct uses `void *md5_hashctx` to avoid guaranteeing
+implementation details in the public interface, whereas FFmpeg makes
+everything accessible.  This kind of data hiding is an advanced context-oriented
+convention, and is discussed below.  Using it in this case has strengths and
+weaknesses.  On one hand, it means changing the layout in a future version
+of curl won't break downstream programs that used that data.  On the other hand,
+the MD5 algorithm has been stable for 30 years, so it's arguably more important
+to let people dig in when debugging their own code.
+
+Curl's struct is declared as `struct <type> { ... }`, whereas FFmpeg uses
+`typedef struct <type> { ... } <type>`.  These conventions are used with both
+context and non-context structs, so don't say anything about contexts as such.
+Specifically, FFmpeg's convention is a workaround for an issue with C grammar:
+
+```c
+void my_function( ... ) {
+  int                my_var;        // good
+  MD5_context        my_curl_ctx;   // error: C needs you to explicitly say "struct"
+  struct MD5_context my_curl_ctx;   // good: added "struct"
+  AVMD5              my_ffmpeg_ctx; // good: typedefs avoid the need for "struct"
+}
+```
+
+Both MD5 implementations are long-tested, widely-used examples of contexts
+in the real world.  They show how contexts can solve the same problem
+in different ways.
+
+## FFmpeg's advanced context-oriented conventions
+
+Projects that make heavy use of contexts tend to develop conventions
+to make them more useful.  This section discusses conventions used in FFmpeg,
+some of which are shared with other projects, while others are unique to FFmpeg.
+
+### Naming: “Context” and “ctx”
+
+```c
+// Context struct names usually end with `Context`:
+typedef struct AVSomeContext {
+  ...
+} AVSomeContext;
+
+// Functions are usually named after their context,
+// context parameters usually come first and are often called `ctx`:
+void av_some_function( AVSomeContext *ctx, ... );
+```
+
+If an FFmpeg struct is intended for use as a context, its name usually
+makes that clear.  Exceptions to this rule include AVMD5 (discussed above),
+which is only identified as a context by the functions that use it.
+
+If a function is associated with a context, its name usually
+begins with some variant of the context name (e.g. av_md5_alloc()
+or avcodec_alloc_context3()).  Exceptions to this rule include
+@ref avformat.h "AVFormatContext's functions", many of which
+begin with just `av_`.
+
+If a function has a context parameter, it usually comes first and its name
+often contains `ctx`.  Exceptions include av_bsf_alloc(), which puts the
+context argument second to emphasise it's an out variable.
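+
+The out-variable convention might look like this in a hypothetical project.
+Nothing here is FFmpeg API: `ToyFilter`, `ToyFilterContext` and the
+`toy_filter_*()` functions are invented for illustration, and only the
+argument order mirrors av_bsf_alloc():
+
+```c
+#include <stdlib.h>
+
+typedef struct ToyFilter {
+  const char *name;
+} ToyFilter;
+
+typedef struct ToyFilterContext {
+  const ToyFilter *filter;
+} ToyFilterContext;
+
+// returns 0 on success or a negative value on failure;
+// the new context is written to *ctx, so the context argument comes last
+int toy_filter_alloc( const ToyFilter *filter, ToyFilterContext **ctx ) {
+  ToyFilterContext *new_ctx = calloc( 1, sizeof(*new_ctx) );
+  if (!new_ctx)
+    return -1;
+  new_ctx->filter = filter;
+  *ctx = new_ctx;
+  return 0;
+}
+
+// the context is an ordinary input here, so it comes first
+const char *toy_filter_name( const ToyFilterContext *ctx ) {
+  return ctx->filter->name;
+}
+
+// frees the context and clears the caller's pointer
+void toy_filter_free( ToyFilterContext **ctx ) {
+  free( *ctx );
+  *ctx = NULL;
+}
+```
+
+Returning a status code and writing the context through the last argument
+lets the caller tell an allocation failure apart from other errors.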
+
+### Data hiding: private contexts
+
+```c
+// Context structs often hide private context:
+struct AVSomeContext {
+  void *priv_data; // sometimes just called "internal"
+};
+```
+
+Contexts usually present a public interface, so changing a context's members
+forces everyone who uses the library to at least recompile their program,
+if not rewrite it to remain compatible.  Hiding information in a private context
+ensures it can be modified without affecting downstream software.
+
+Object-oriented programmers may be tempted to compare private contexts to
+*private class members*.  That's often accurate, but a private context can also
+be used like a *virtual function table* - a list of functions that are
+guaranteed to exist, but may be implemented differently for different
+sub-classes.  When thinking about private contexts, remember that FFmpeg
+isn't *large enough* to need some common OOP techniques, even though it's
+solving a problem that's *complex enough* to benefit from some rarer techniques.
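+
+A minimal single-file sketch of both uses follows.  All names are
+hypothetical, and a real library would keep the `ToyPrivate` definition in a
+source file that callers never see:
+
+```c
+#include <stdlib.h>
+
+// public part: callers may rely on these members
+typedef struct ToyContext {
+  int   volume;     // public, documented member
+  void *priv_data;  // private context, layout not guaranteed
+} ToyContext;
+
+// private part: can change between versions without breaking callers
+typedef struct ToyPrivate {
+  int (*process)( int sample, int volume ); // acts like a virtual function
+  int samples_seen;
+} ToyPrivate;
+
+static int process_scale( int sample, int volume ) { return sample * volume; }
+static int process_mute ( int sample, int volume ) { (void)sample; (void)volume; return 0; }
+
+// allocate both parts and choose the implementation once, up front
+ToyContext *toy_context_alloc( int muted ) {
+  ToyContext *ctx  = calloc( 1, sizeof(*ctx) );
+  ToyPrivate *priv = calloc( 1, sizeof(*priv) );
+  if (!ctx || !priv) {
+    free( ctx );
+    free( priv );
+    return NULL;
+  }
+  priv->process  = muted ? process_mute : process_scale;
+  ctx->volume    = 1;
+  ctx->priv_data = priv;
+  return ctx;
+}
+
+// callers use the public function, never the private members
+int toy_context_process( ToyContext *ctx, int sample ) {
+  ToyPrivate *priv = ctx->priv_data;
+  priv->samples_seen++;
+  return priv->process( sample, ctx->volume );
+}
+
+void toy_context_free( ToyContext *ctx ) {
+  if (ctx)
+    free( ctx->priv_data );
+  free( ctx );
+}
+```
+
+Callers can set `volume` directly, but the choice between `process_scale()`
+and `process_mute()` stays hidden behind `priv_data`.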
+
+### Manage lifetime: allocate, initialize and free
+
+```c
+void my_function( ... ) {
+
+    // Context structs are allocated then initialized with associated functions:
+
+    AVSomeContext *ctx = av_some_context_alloc( ... );
+
+    // ... configure ctx ...
+
+    av_some_context_init( ctx, ... );
+
+    // ... use ctx ...
+
+    // Context structs are freed with associated functions:
+
+    av_some_context_free( ctx );
+
+}
+```
+
+FFmpeg contexts go through the following stages of life:
+
+1. allocation (often a function that ends with `_alloc`)
+   * a range of memory is allocated for use by the structure
+   * memory is allocated on boundaries that improve caching
+   * memory is reset to zeroes, some internal structures may be initialized
+2. configuration (implemented by setting values directly on the object)
+   * no function for this - calling code populates the structure directly
+   * memory is populated with useful values
+   * simple contexts can skip this stage
+3. initialization (often a function that ends with `_init`)
+   * setup actions are performed based on the configuration (e.g. opening files)
+4. normal usage
+   * most functions are called in this stage
+   * documentation implies some members are now read-only (or not used at all)
+   * some contexts allow re-initialization
+5. closing (often a function that ends with `_close`)
+   * teardown actions are performed (e.g. closing files)
+6. deallocation (often a function that ends with `_free`)
+   * memory is returned to the pool of available memory
+
+This can mislead object-oriented programmers, who expect something more like:
+
+1. allocation (usually a `new` keyword)
+   * a range of memory is allocated for use by the structure
+   * memory *may* be reset (e.g. for security reasons)
+2. initialization (usually a constructor)
+   * memory is populated with useful values
+   * related setup actions are performed based on arguments (e.g. opening files)
+3. normal usage
+   * most functions are called in this stage
+   * compiler enforces that some members are read-only (or private)
+   * no going back to the previous stage
+4. finalization (usually a destructor)
+   * teardown actions are performed (e.g. closing files)
+5. deallocation (usually a `delete` keyword)
+   * memory is returned to the pool of available memory
+
+FFmpeg's allocation stage is broadly similar to OOP, but can do some higher-level
+operations.  For example, AVOptions-enabled structs (discussed below) contain an
+AVClass member that is set during allocation.
+
+FFmpeg's "configuration" and "initialization" stages combine to resemble OOP's
+"initialization" stage.  This can mislead object-oriented developers,
+who are used to doing both at once.  This means FFmpeg contexts don't have
+a direct equivalent of OOP constructors, as they would be doing
+two jobs in one function.
+
+FFmpeg's three-stage creation process is useful for complicated structures.
+For example, AVCodecContext contains many members that *can* be set before
+initialization, but in practice most programs set few if any of them.
+Implementing this with a constructor would involve a function with a list
+of arguments that was extremely long and changed whenever the struct was
+updated.  For contexts that don't need the extra flexibility, FFmpeg usually
+provides a combined allocator and initializer function.  For historical reasons,
+suffixes like `_alloc`, `_init`, `_alloc_context` and even `_open` can indicate
+the function does any combination of allocation and initialization.
+
+FFmpeg's "closing" stage is broadly similar to OOP's "finalization" stage,
+but some contexts allow re-initialization after finalization.  For example,
+SwrContext lets you call swr_close() then swr_init() to reuse a context.
+
+FFmpeg's "deallocation" stage is broadly similar to OOP, but can perform some
+higher-level functions (similar to the allocation stage).
+
+Very few contexts need the flexibility of separate "closing" and
+"deallocation" stages, so these are usually combined into a single function.
+Closing functions usually end with "_close", while deallocation
+functions usually end with "_free".
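+
+The FFmpeg-style stages can be sketched with a toy context.  This is only an
+illustration under assumed names (`ToyCounter` and the `toy_counter_*()`
+functions do not exist in FFmpeg), not a description of any real FFmpeg API:
+
+```c
+#include <stdlib.h>
+
+typedef struct ToyCounter {
+  int step;        // configuration: set directly before initialization
+  int value;       // state: only meaningful between init and close
+  int initialized;
+} ToyCounter;
+
+// allocation: zeroed memory plus default values
+ToyCounter *toy_counter_alloc( void ) {
+  ToyCounter *ctx = calloc( 1, sizeof(*ctx) );
+  if (ctx)
+    ctx->step = 1;
+  return ctx;
+}
+
+// initialization: setup based on the configuration
+int toy_counter_init( ToyCounter *ctx ) {
+  if (ctx->step <= 0)
+    return -1; // reject an unusable configuration
+  ctx->value       = 0;
+  ctx->initialized = 1;
+  return 0;
+}
+
+// normal usage: returns the next value, or -1 if not initialized
+int toy_counter_next( ToyCounter *ctx ) {
+  if (!ctx->initialized)
+    return -1;
+  ctx->value += ctx->step;
+  return ctx->value;
+}
+
+// closing: teardown, but the memory and configuration remain
+void toy_counter_close( ToyCounter *ctx ) {
+  ctx->initialized = 0;
+}
+
+// deallocation: closes if needed, then frees and clears the pointer
+void toy_counter_free( ToyCounter **ctx ) {
+  if (*ctx)
+    toy_counter_close( *ctx );
+  free( *ctx );
+  *ctx = NULL;
+}
+```
+
+A caller would allocate the context, set `ctx->step` directly, call
+`toy_counter_init()`, use the counter, and finally call `toy_counter_free()`.
+Calling `toy_counter_close()` followed by `toy_counter_init()` reuses the
+context, much like the SwrContext example above.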
+
+### Reflection: AVOptions-enabled structs
+
+Object-oriented programming puts more focus on data hiding than FFmpeg needs,
+but it also puts less focus on
+[reflection](https://en.wikipedia.org/wiki/Reflection_(computer_programming)).
+
+To understand FFmpeg's reflection requirements, run `ffmpeg -h full` on the
+command-line, then ask yourself how you would implement all those options
+with the POSIX [`getopt` function](https://en.wikipedia.org/wiki/Getopt).
+You can also ask the same question for any other programming languages you know.
+[Python's argparse module](https://docs.python.org/3/library/argparse.html)
+is a good example - its approach works well with far more complex programs
+than `getopt`, but would you like to maintain an argparse implementation
+with 15,000 options and growing?
+
+Most solutions assume you can just put all options in a single block,
+which is unworkable at FFmpeg's scale.  Instead, we split configuration
+across many *AVOptions-enabled structs*, which use the @ref avoptions
+"AVOptions API" to reflect information about their user-configurable members,
+including members in private contexts.
+
+An *AVOptions-enabled struct* is a struct that contains an AVClass element as
+its first member, and uses that element to provide access to instances of
+AVOption, each of which provides information about a single option.
+The AVClass can also include more @ref AVClass "AVClasses" for private contexts,
+making it possible to set options through the API that aren't
+accessible directly.
+
+AVOptions-accessible members of a context should be accessed through the
+AVOptions API whenever possible, even if they're not hidden away in a private
+context.  That ensures values are validated as they're set, and means you won't
+have to do as much work if a future version of FFmpeg changes the layout.
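+
+The idea can be illustrated with a toy option table.  This sketch is not the
+AVOptions API: `ToySettings`, `ToyOption`, `toy_find_option()` and
+`toy_set_int()` are hypothetical names, and real AVOptions support many more
+types and features:
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+typedef struct ToySettings {
+  int bitrate;
+  int threads;
+} ToySettings;
+
+// one entry describes one user-configurable member
+typedef struct ToyOption {
+  const char *name;
+  const char *help;
+  size_t      offset; // where the member lives inside the struct
+  int         min, max;
+} ToyOption;
+
+static const ToyOption toy_options[] = {
+  { "bitrate", "target bitrate in bits per second", offsetof(ToySettings, bitrate), 1, 1000000000 },
+  { "threads", "number of worker threads",          offsetof(ToySettings, threads), 1, 64 },
+  { NULL, NULL, 0, 0, 0 }
+};
+
+// look an option up by name, for help text or validation
+const ToyOption *toy_find_option( const char *name ) {
+  for (const ToyOption *o = toy_options; o->name; o++)
+    if (!strcmp( o->name, name ))
+      return o;
+  return NULL;
+}
+
+// set a member by name; returns 0 on success, -1 if the option is
+// unknown, or -2 if the value is out of range
+int toy_set_int( ToySettings *obj, const char *name, int value ) {
+  const ToyOption *o = toy_find_option( name );
+  if (!o)
+    return -1;
+  if (value < o->min || value > o->max)
+    return -2;
+  *(int *)((char *)obj + o->offset) = value;
+  return 0;
+}
+```
+
+Because one table drives lookup, validation and help text, a program can
+list every option without knowing the struct's members in advance.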
+
+AVClass was created very early in FFmpeg's history, long before AVOptions.
+Its name suggests some kind of relationship to an OOP
+base [class](https://en.wikipedia.org/wiki/Class_(computer_programming)),
+but the name has become less accurate as FFmpeg evolved, to the point where
+AVClass and AVOption are largely synonymous in modern usage.  The difference
+might still matter if you need to support old versions of FFmpeg,
+where you might find *AVClass context structures* (contain an AVClass element
+as their first member) that are not *AVOptions-enabled* (don't use that element
+to provide access to instances of AVOption).
+
+Object-oriented programmers may be tempted to compare @ref avoptions "AVOptions"
+to OOP getters and setters.  There is some overlap in functionality, but OOP
+getters and setters are usually specific to a single member and don't provide
+metadata about the member, whereas AVOptions has a single API that covers
+every option, and provides help text etc. as well.
+
+Object-oriented programmers may be tempted to compare AVOptions-accessible
+members of a public context to protected members of a class.  Both provide
+global access through an API, and unrestricted access for trusted friends.
+But this is just a happy accident, not a guarantee.
+
+## Final example: context for a codec
+
+AVCodecContext is an AVOptions-enabled struct that contains information
+about encoding or decoding one stream of data (e.g. the video in a movie).
+It's a good example of many of the issues above.
+
+The name "AVCodecContext" tells us this is a context.  Many of
+@ref libavcodec/avcodec.h "its functions" start with an `avctx` parameter,
+indicating this object provides context for that function.
+
+AVCodecContext::priv_data contains the codec-specific private context.
+AVCodecContext::internal is a second private context, used by general
+libavcodec functions rather than by any one codec.
+
+AVCodecContext is allocated with avcodec_alloc_context3(), initialized with
+avcodec_open2(), and freed with avcodec_free_context().  Most of its members
+are configured with the @ref avoptions "AVOptions API", but for example you
+can set AVCodecContext::opaque or AVCodecContext::draw_horiz_band() if your
+program happens to need them.
+
+AVCodecContext provides an abstract interface to many different *codecs*.
+Options supported by many codecs (e.g. "bitrate") are kept in AVCodecContext
+and reflected as AVOptions.  Options that are specific to one codec are
+stored in the private context, and reflected from there.
+
+To support a specific codec, AVCodecContext's private context is set to
+an encoder-specific data type.  For example, the video codec
+[H.264](https://en.wikipedia.org/wiki/Advanced_Video_Coding) is supported via
+[the x264 library](https://www.videolan.org/developers/x264.html), and
+implemented in X264Context.  Although included in the documentation, X264Context
+is not part of the public API.  Whereas there are strict rules about
+changing AVCodecContext, a version of FFmpeg could modify X264Context or
+replace it with another type altogether.  An adverse legal ruling or security
+problem could even force us to switch to a completely different library
+without a major version bump.
+
+The design of AVCodecContext provides several important guarantees:
+
+- lets you use the same interface for any codec
+- supports common encoder options like "bitrate" without duplicating code
+- supports encoder-specific options like "profile" without bulking out the public interface
+- reflects both types of options to users, with help text and detection of missing options
+- hides implementation details (e.g. its encoding buffer)
-- 
2.43.0