From 98f97fbe1b7dc90f03999c7cd826bd1888ffbfe0 Mon Sep 17 00:00:00 2001
From: Sean Quah
Date: Tue, 23 Nov 2021 18:22:16 +0000
Subject: [PATCH 1/3] Update the media repository documentation

---
 changelog.d/11415.doc    |  1 +
 docs/media_repository.md | 87 +++++++++++++++++++++++++++++++---------
 2 files changed, 69 insertions(+), 19 deletions(-)
 create mode 100644 changelog.d/11415.doc

diff --git a/changelog.d/11415.doc b/changelog.d/11415.doc
new file mode 100644
index 000000000000..e405531867ed
--- /dev/null
+++ b/changelog.d/11415.doc
@@ -0,0 +1 @@
+Update the media repository documentation.
diff --git a/docs/media_repository.md b/docs/media_repository.md
index 99ee8f1ef7ff..ae2d46b5895c 100644
--- a/docs/media_repository.md
+++ b/docs/media_repository.md
@@ -2,29 +2,78 @@

 *Synapse implementation-specific details for the media repository*

-The media repository is where attachments and avatar photos are stored.
-It stores attachment content and thumbnails for media uploaded by local users.
-It caches attachment content and thumbnails for media uploaded by remote users.
+The media repository
+ * stores avatars, attachments and their thumbnails for media uploaded by local
+   users.
+ * caches avatars, attachments and their thumbnails for media uploaded by remote
+   users.
+ * caches resources and thumbnails used for URL previews.

-## Storage
+All media in Matrix can be identified by a unique
+[MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris),
+consisting of a server name and media ID:
+```
+mxc://<server-name>/<media-id>
+```

-Each item of media is assigned a `media_id` when it is uploaded.
-The `media_id` is a randomly chosen, URL safe 24 character string.
+## Local Media
+Synapse generates 24 character media IDs for content uploaded by local users.
+These media IDs consist of upper and lowercase letters and are case-sensitive.
+Other homeserver implementations may generate media IDs differently.

-Metadata such as the MIME type, upload time and length are stored in the
-sqlite3 database indexed by `media_id`.
+Local media is recorded in the `local_media_repository` table, which includes
+metadata such as MIME types, upload times and file sizes.
+Note that this table is shared by the URL cache, which has a different media ID
+scheme.

-Content is stored on the filesystem under a `"local_content"` directory.
+### Paths
+A file with media ID `aabbcccccccccccccccccccc` and its `128x96` `image/jpeg`
+thumbnail, created by scaling, would be stored at:
+```
+local_content/aa/bb/cccccccccccccccccccc
+local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
+```
+Older thumbnails may omit the thumbnailing method:
+```
+local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
+```

-Thumbnails are stored under a `"local_thumbnails"` directory.
+## Remote Media
+When media from a remote homeserver is requested from Synapse, it is assigned
+a local `filesystem_id`, with the same format as locally-generated media IDs,
+as described above.

-The item with `media_id` `"aabbccccccccdddddddddddd"` is stored under
-`"local_content/aa/bb/ccccccccdddddddddddd"`. Its thumbnail with width
-`128` and height `96` and type `"image/jpeg"` is stored under
-`"local_thumbnails/aa/bb/ccccccccdddddddddddd/128-96-image-jpeg"`
+A record of remote media is stored in the `remote_media_cache` table.

-Remote content is cached under `"remote_content"` directory. Each item of
-remote content is assigned a local `"filesystem_id"` to ensure that the
-directory structure `"remote_content/server_name/aa/bb/ccccccccdddddddddddd"`
-is appropriate. Thumbnails for remote content are stored under
-`"remote_thumbnail/server_name/..."`
+### Paths
+A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its
+`128x96` `image/jpeg` thumbnail, created by scaling, would be stored at:
+```
+remote_content/server_name/aa/bb/cccccccccccccccccccc
+remote_thumbnail/server_name/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
+```
+Older thumbnails may omit the thumbnailing method:
+```
+remote_thumbnail/server_name/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
+```
+
+Note that `remote_thumbnail/` does not have an `s`.
+
+## URL Previews
+When generating previews for URLs, Synapse may download and cache various
+resources, including images. These resources are assigned temporary media IDs
+of the form `yyyy-mm-dd_aaaaaaaaaaaaaaaa`, where `yyyy-mm-dd` is the current
+date and `aaaaaaaaaaaaaaaa` is a random sequence of 16 case-sensitive letters.
+
+The metadata for these cached resources is stored in the
+`local_media_repository` and `local_media_repository_url_cache` tables.
+
+Resources for URL previews are deleted after a few days.
+
+### Paths
+The file with media ID `yyyy-mm-dd_aaaaaaaaaaaaaaaa` and its `128x96`
+`image/jpeg` thumbnail, created by scaling, would be stored at:
+```
+url_cache/yyyy-mm-dd/aaaaaaaaaaaaaaaa
+url_cache_thumbnails/yyyy-mm-dd/aaaaaaaaaaaaaaaa/128-96-image-jpeg-scale
+```

From bf117b66cb68b088eb6348b41e49ca430d79e462 Mon Sep 17 00:00:00 2001
From: Sean Quah
Date: Tue, 23 Nov 2021 18:34:44 +0000
Subject: [PATCH 2/3] Fix typo

---
 docs/media_repository.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/media_repository.md b/docs/media_repository.md
index ae2d46b5895c..5b2cf94a68cd 100644
--- a/docs/media_repository.md
+++ b/docs/media_repository.md
@@ -49,12 +49,12 @@ A record of remote media is stored in the `remote_media_cache` table.
 A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its
 `128x96` `image/jpeg` thumbnail, created by scaling, would be stored at:
 ```
-remote_content/server_name/aa/bb/cccccccccccccccccccc
-remote_thumbnail/server_name/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
+remote_content/matrix.org/aa/bb/cccccccccccccccccccc
+remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
 ```
 Older thumbnails may omit the thumbnailing method:
 ```
-remote_thumbnail/server_name/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
+remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
 ```

 Note that `remote_thumbnail/` does not have an `s`.

From e6f1ce13dc3af7b99e2f0d97497a1e9f0f627c7d Mon Sep 17 00:00:00 2001
From: Sean Quah
Date: Wed, 24 Nov 2021 17:59:30 +0000
Subject: [PATCH 3/3] Address PR feedback

---
 docs/media_repository.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/docs/media_repository.md b/docs/media_repository.md
index 5b2cf94a68cd..ba17f8a856f1 100644
--- a/docs/media_repository.md
+++ b/docs/media_repository.md
@@ -7,7 +7,8 @@ The media repository
    users.
  * caches avatars, attachments and their thumbnails for media uploaded by remote
    users.
- * caches resources and thumbnails used for URL previews.
+ * caches resources and thumbnails used for
+   [URL previews](development/url_previews.md).

 All media in Matrix can be identified by a unique
 [MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris),
@@ -33,17 +34,15 @@ thumbnail, created by scaling, would be stored at:
 local_content/aa/bb/cccccccccccccccccccc
 local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
 ```
-Older thumbnails may omit the thumbnailing method:
-```
-local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
-```

 ## Remote Media
 When media from a remote homeserver is requested from Synapse, it is assigned
 a local `filesystem_id`, with the same format as locally-generated media IDs,
 as described above.

-A record of remote media is stored in the `remote_media_cache` table.
+A record of remote media is stored in the `remote_media_cache` table, which
+can be used to map remote MXC URIs (server names and media IDs) to local
+`filesystem_id`s.

 ### Paths
 A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its
@@ -60,6 +59,9 @@ remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
 Note that `remote_thumbnail/` does not have an `s`.

 ## URL Previews
+See [URL Previews](development/url_previews.md) for documentation on the URL preview
+process.
+
 When generating previews for URLs, Synapse may download and cache various
 resources, including images. These resources are assigned temporary media IDs
 of the form `yyyy-mm-dd_aaaaaaaaaaaaaaaa`, where `yyyy-mm-dd` is the current
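
The path layout documented in this patch series can be sketched in Python as below. This is an illustrative sketch only: the function names are hypothetical and are not Synapse's own API, and the layout is inferred solely from the example paths in the patched document (using the corrected `matrix.org` paths from patch 2/3). The thumbnail file name is assumed to be the width, height, MIME type (with `/` replaced by `-`) and thumbnailing method joined by hyphens, as in `128-96-image-jpeg-scale`.

```python
from pathlib import PurePosixPath


def _thumbnail_name(width: int, height: int, content_type: str, method: str) -> str:
    # e.g. (128, 96, "image/jpeg", "scale") -> "128-96-image-jpeg-scale"
    return f"{width}-{height}-{content_type.replace('/', '-')}-{method}"


def local_content_path(media_id: str) -> PurePosixPath:
    # Hypothetical helper: split the media ID as aa/bb/<remaining characters>.
    return PurePosixPath("local_content", media_id[:2], media_id[2:4], media_id[4:])


def local_thumbnail_path(media_id: str, width: int, height: int,
                         content_type: str, method: str) -> PurePosixPath:
    return PurePosixPath(
        "local_thumbnails", media_id[:2], media_id[2:4], media_id[4:],
        _thumbnail_name(width, height, content_type, method),
    )


def remote_content_path(server_name: str, filesystem_id: str) -> PurePosixPath:
    # Remote media is keyed by the origin server name plus the local filesystem_id.
    return PurePosixPath(
        "remote_content", server_name,
        filesystem_id[:2], filesystem_id[2:4], filesystem_id[4:],
    )


def remote_thumbnail_path(server_name: str, filesystem_id: str, width: int,
                          height: int, content_type: str, method: str) -> PurePosixPath:
    # Note the directory is "remote_thumbnail", without a trailing "s".
    return PurePosixPath(
        "remote_thumbnail", server_name,
        filesystem_id[:2], filesystem_id[2:4], filesystem_id[4:],
        _thumbnail_name(width, height, content_type, method),
    )


def url_cache_path(media_id: str) -> PurePosixPath:
    # URL cache media IDs have the form yyyy-mm-dd_<random letters>.
    date, _, rest = media_id.partition("_")
    return PurePosixPath("url_cache", date, rest)


def url_cache_thumbnail_path(media_id: str, width: int, height: int,
                             content_type: str, method: str) -> PurePosixPath:
    date, _, rest = media_id.partition("_")
    return PurePosixPath(
        "url_cache_thumbnails", date, rest,
        _thumbnail_name(width, height, content_type, method),
    )
```

`PurePosixPath` is used here so that the results are relative paths with forward slashes, matching the examples in the document, regardless of the platform the sketch runs on.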