Overview
########

Main registry
=============

The class :class:`madam.core.Madam` manages the extensions that can be used to
process different file formats and provides convenience methods to read and
write files.  The simplest way to create a registry with default settings is:

.. code-block:: python

   from madam import Madam

   madam = Madam()

For scripts that do not need a custom configuration, a lazily-initialised
module-level singleton is also available:

.. code-block:: python

   import madam

   asset = madam.default_madam.read(open('photo.jpg', 'rb'))

Format-specific defaults (quality, codec options, etc.) can be passed as a
configuration dictionary to the constructor.  See :doc:`configuration` for
the full list of options.


Media assets
============

At the core of MADAM are **assets** in the form of :class:`madam.core.Asset`
objects.  An asset is an immutable value object that holds:

* **essence** — the raw media bytes, accessible as a file-like object via
  ``asset.essence``.
* **metadata** — a :class:`frozendict` of key/value pairs accessible as
  attributes (``asset.width``) *or* via the ``metadata`` dict.

Assets are typically created by calling :meth:`~madam.core.Madam.read`:

.. code-block:: python

   with open('photo.jpg', 'rb') as f:
       asset = madam.read(f)

   print(asset.mime_type)   # 'image/jpeg'
   print(asset.width)       # e.g. 4000
   print(asset.height)      # e.g. 3000
   print(asset.color_space) # 'RGB'

Because assets are immutable, every transformation returns a *new* asset
rather than modifying the original:

.. code-block:: python

   resized = make_thumbnail(asset)    # new Asset, 'asset' is unchanged
   assert asset.width == 4000         # original unaffected

Each asset also has a content-addressed identifier:

.. code-block:: python

   print(asset.content_id)
   # 'e3b0c44298fc1c149afb4c8996fb92427ae41e4649b934ca495991b7852b855'

Two assets with identical essence bytes always share the same ``content_id``,
making it suitable as an object-store key or a deduplication handle.


Processors
==========

The extensions used to read, process, and write file formats are called
**processors**.  There are two types:

Essence processors (or just processors)
    Represented by :class:`madam.core.Processor` objects.  They are
    responsible for reading and writing the raw media data and for providing
    operators that modify it.  One implementation is
    :class:`madam.image.PillowProcessor`.

Metadata processors
    Represented by :class:`madam.core.MetadataProcessor` objects.  They read
    and write metadata *only*, without touching the essence.  Examples include
    :class:`madam.exif.ExifMetadataProcessor` (EXIF in JPEG/WebP),
    :class:`madam.iptc.IPTCMetadataProcessor` (IPTC in JPEG), and
    :class:`madam.xmp.XMPMetadataProcessor` (XMP in JPEG).

You can retrieve the processor for a specific asset directly:

.. code-block:: python

   processor = madam.get_processor(asset)


Operators
=========

Essence processors provide **operators**: methods decorated with
:func:`~madam.core.operator` that are *configured first* and then *applied*
to one or many assets.  This two-step design lets you reuse a configured
callable across many assets without repeating the configuration:

.. code-block:: python

   from madam.image import ResizeMode

   processor = madam.get_processor(asset)

   # Step 1: configure — returns an Asset → Asset callable
   make_thumbnail = processor.resize(width=200, height=200, mode=ResizeMode.FIT)

   # Step 2: apply to any number of assets
   thumbnail_a = make_thumbnail(asset_a)
   thumbnail_b = make_thumbnail(asset_b)

Operators can be stored, passed to functions, and added to pipelines just
like any other callable.

.. note:: Operators can raise :class:`~madam.core.OperatorError` when
    something goes wrong.  See `Error handling`_ below for how to distinguish
    between retryable and permanent failures.

Example — image adjustments (obtain the processor first via ``get_processor``):

.. code-block:: python

   processor = madam.get_processor(asset)

   enhance = processor.adjust_brightness(factor=1.2)
   result = enhance(asset)

   add_vignette = processor.vignette(strength=0.4)
   result = add_vignette(result)

Example — format conversion:

.. code-block:: python

   to_webp = processor.convert(mime_type='image/webp')
   webp_asset = to_webp(asset)

   with open('output.webp', 'wb') as f:
       madam.write(webp_asset, f)


Pipelines
=========

The :class:`madam.core.Pipeline` class makes it easy to apply a sequence of
operators to one or many assets.

Linear pipeline
---------------

.. code-block:: python

   from madam.core import Pipeline
   from madam.image import ResizeMode

   processor = madam.get_processor(asset)

   portrait_pipeline = Pipeline()
   portrait_pipeline.add(processor.resize(width=300, height=300, mode=ResizeMode.FIT))
   portrait_pipeline.add(processor.sharpen(radius=2, percent=120))
   portrait_pipeline.add(processor.convert(mime_type='image/webp'))

   for processed in portrait_pipeline.process(*source_assets):
       with open(f'out_{processed.content_id}.webp', 'wb') as f:
           f.write(processed.essence.read())

Branching pipeline
------------------

:meth:`~madam.core.Pipeline.branch` fans each input asset out through
several independent sub-pipelines, yielding one output per sub-pipeline per
input:

.. code-block:: python

   thumb_pipe = Pipeline()
   thumb_pipe.add(processor.resize(width=150, height=150, mode=ResizeMode.FILL))

   preview_pipe = Pipeline()
   preview_pipe.add(processor.resize(width=1200, height=900, mode=ResizeMode.FIT))

   pipeline = Pipeline()
   pipeline.branch(thumb_pipe, preview_pipe)

   # For 10 source assets this yields 20 assets: thumbnail + preview for each.
   for asset in pipeline.process(*originals):
       ...

Conditional pipeline
--------------------

:meth:`~madam.core.Pipeline.when` applies one operator when a predicate holds,
and optionally another when it does not:

.. code-block:: python

   pipeline = Pipeline()
   pipeline.when(
       predicate=lambda a: a.width > 1920,
       then=processor.resize(width=1920, height=1080, mode=ResizeMode.FIT),
   )
   # Assets at or below 1920 px wide pass through unchanged.

   # With an else_ branch for format normalization:
   pipeline.when(
       predicate=lambda a: a.mime_type == 'image/png',
       then=processor.convert(mime_type='image/webp'),
       else_=processor.convert(mime_type='image/jpeg'),
   )


Metadata
========

:meth:`~madam.core.Madam.read` automatically extracts metadata from all
registered processors and makes it available directly on the returned asset.

Reading metadata
----------------

Metadata is grouped by processor format under top-level keys:

.. code-block:: python

   asset = madam.read(open('photo.jpg', 'rb'))

   # Format metadata set by the essence processor:
   print(asset.mime_type)    # 'image/jpeg'
   print(asset.width)        # 4000
   print(asset.height)       # 3000

   # EXIF metadata (if present):
   exif = asset.metadata.get('exif', {})
   print(exif.get('camera.manufacturer'))  # e.g. 'Canon'
   print(exif.get('camera.model'))         # e.g. 'EOS 5D Mark III'
   print(exif.get('focal_length'))         # e.g. 85.0
   print(exif.get('datetime_original'))    # datetime.datetime object

   # IPTC metadata (if present):
   iptc = asset.metadata.get('iptc', {})
   print(iptc.get('headline'))
   print(iptc.get('keywords'))    # list of strings

   # XMP metadata (if present):
   xmp = asset.metadata.get('xmp', {})
   print(xmp.get('title'))
   print(xmp.get('rights'))

   # Unified creation timestamp (EXIF → XMP → ffmetadata priority):
   print(asset.created_at)    # ISO 8601 string, e.g. '2024-06-15T10:30:00'

Audio and video metadata live under ``'video'`` and ``'audio'`` sub-keys:

.. code-block:: python

   video_asset = madam.read(open('video.mp4', 'rb'))
   print(video_asset.duration)           # seconds, e.g. 120.5
   print(video_asset.metadata['video'])  # {'codec': 'h264', 'bitrate': 4000, …}
   print(video_asset.metadata['audio'])  # {'codec': 'aac', 'sample_rate': 48000, …}

Writing metadata
----------------

Pass a metadata dict to :meth:`~madam.core.Madam.write`; the library
re-embeds metadata into the essence automatically:

.. code-block:: python

   from madam.exif import ExifMetadataProcessor

   exif_proc = ExifMetadataProcessor()

   # Read existing metadata
   with open('photo.jpg', 'rb') as f:
       metadata = exif_proc.read(f)
       f.seek(0)
       plain_essence = exif_proc.strip(f)

   # Add a description
   updated = dict(metadata)
   updated.setdefault('exif', {})['description'] = 'Sunset over the Alps'

   # Re-combine and write
   with open('photo.jpg', 'rb') as f_in, open('annotated.jpg', 'wb') as f_out:
       combined = exif_proc.combine(f_in, updated)
       f_out.write(combined.read())


Storage
=======

MADAM organises media assets using modular **storage backends**.  All backends
subclass :class:`madam.core.AssetStorage` and behave like Python dictionaries,
storing an asset together with its metadata and a set of tag strings.  The
basic store/retrieve pattern is:

.. code-block:: python

   # Store
   storage[asset_key] = (asset, {'portrait', 'holiday_2024'})

   # Retrieve
   asset, tags = storage[asset_key]

Three built-in backends are provided:

:class:`~madam.core.InMemoryStorage`
    Stores assets in a plain Python dictionary.  Thread-safe.  Data is lost
    when the process exits.

    .. code-block:: python

       from madam.core import InMemoryStorage

       storage = InMemoryStorage()
       storage['hero'] = (asset, {'homepage', 'featured'})
       hero, tags = storage['hero']

:class:`~madam.core.ShelveStorage`
    Persists assets to disk using the Python :mod:`shelve` module.

    .. code-block:: python

       from madam.core import ShelveStorage

       storage = ShelveStorage('/var/lib/madam/shelve')
       storage['hero'] = (asset, {'homepage'})

:class:`~madam.core.FileSystemAssetStorage`
    Stores each asset as two files: the essence bytes and a JSON metadata
    sidecar.  Writes are atomic (write-then-rename), making it safe for
    concurrent workers on shared file systems.

    .. code-block:: python

       from madam.core import FileSystemAssetStorage

       storage = FileSystemAssetStorage('/var/lib/madam/assets')
       storage['hero'] = (asset, {'homepage', 'featured'})

Filtering
---------

All backends support filtering by metadata values or by tags:

.. code-block:: python

   # Find all JPEG assets wider than 1000 px
   results = list(storage.filter(mime_type='image/jpeg', width=1000))

   # Find assets tagged with both 'homepage' and 'featured'
   results = list(storage.filter_by_tags({'homepage', 'featured'}))

:class:`~madam.core.InMemoryStorage` uses an inverted index for O(k) filter
performance (k = number of matching assets).  The disk-based backends use a
linear scan.


Error handling
==============

All operator failures raise exceptions from the
:class:`~madam.core.OperatorError` hierarchy:

.. code-block:: python

   from madam.core import OperatorError, TransientOperatorError, PermanentOperatorError

   try:
       result = operator(asset)
   except TransientOperatorError:
       # Temporary failure (e.g. out of memory, disk full) — safe to retry.
       queue.retry()
   except PermanentOperatorError:
       # Permanent failure (e.g. unsupported codec, corrupt input) — do not retry.
       queue.dead_letter()
   except OperatorError:
       # Generic failure — catch-all.
       log.error('Operator failed', exc_info=True)

:class:`~madam.core.UnsupportedFormatError` is a subclass of
:class:`~madam.core.PermanentOperatorError` and is raised when a file format
is not recognised or not supported by the available processors.


.. _FFmpeg: https://ffmpeg.org/
.. _Pillow: https://python-pillow.org/
.. _piexif: https://piexif.readthedocs.io/