Methodology

This section describes the process used to build the project database and generate the dynamic range analysis.

The main goal has been to keep the process simple, reproducible and robust enough for a heterogeneous music collection made up of different formats, origins and tagging styles.


1. Source files

The analysed collection consists mainly of audio files obtained from:

  • personal CD rips
  • SACD rips converted to DSF
  • special editions such as XRCD, SHM-CD and similar
  • files stored locally in a folder-structured library

The formats currently analysed are:

  • FLAC
  • DSF

2. Metadata extraction

For each track, metadata is extracted directly from the file, prioritising embedded tags whenever available.

The fields used include:

  • artist
  • album
  • track title
  • year
  • genre
  • format

Where metadata is incomplete or inconsistent, fallback rules are applied based on the filename or containing folder.
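As a rough illustration of those fallback rules, the sketch below derives artist, album and title from the path alone. The "Artist/Album/NN - Title.ext" folder layout is an assumption for the example, not necessarily the layout of the real library:

```python
from pathlib import Path

def fallback_metadata(path: Path) -> dict:
    """Derive artist/album/title from an 'Artist/Album/NN - Title.ext'
    folder layout when embedded tags are missing (hypothetical layout)."""
    stem = path.stem
    # Strip a leading track number such as "03 " or "03 - "
    if " - " in stem:
        title = stem.split(" - ", 1)[-1]
    else:
        title = stem.lstrip("0123456789 ").strip()
    return {
        "artist": path.parent.parent.name,
        "album": path.parent.name,
        "title": title or stem,
    }

meta = fallback_metadata(Path("Miles Davis/Kind of Blue/01 - So What.flac"))
```

In the real pipeline a helper like this would only run after the embedded tags have been checked and found missing or inconsistent.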


3. Dynamic range calculation

The DR (Dynamic Range) value is computed for each track by running an external analysis tool on the audio file.

The aim of this metric is to provide a quantitative approximation of the perceived dynamic range of a given mastering.

Important

The DR value:

  • does not replace listening
  • does not measure audio quality on its own
  • should be interpreted as a complementary indicator

That said, it is useful for detecting general trends and comparing different editions.
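To give an intuition for what this kind of metric captures, here is a deliberately simplified peak-to-RMS (crest factor) calculation. This is not the algorithm of the actual DR meter, which works on fixed-length blocks and averages the loudest portion of the track; it only shows the underlying idea of comparing peak level against average level:

```python
import math

def simple_dynamic_range_db(samples: list[float]) -> float:
    """Rough peak-to-RMS ratio in dB for a block of samples.
    Simplified illustration only, not the official DR meter algorithm."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

# A pure sine wave has a crest factor of sqrt(2), i.e. about 3.01 dB;
# heavily compressed masterings push this ratio down.
sine = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
crest = simple_dynamic_range_db(sine)
```

A squashed, "loudness-war" mastering drives the average level up towards the peaks, so this ratio shrinks; a dynamic mastering keeps it large.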


4. Cleaning and normalisation

One of the most important steps in the project is metadata normalisation, since a real-world collection typically contains many irregular cases.

Corrections applied may include:

  • removal of extra whitespace
  • cleaning of problematic characters
  • UTF-8 encoding correction
  • resolution of duplicate or contaminated fields
  • partial normalisation of artist / album / track
  • handling of non-homogeneous date formats

The goal is not absolute perfection, but a database consistent enough for statistical analysis and musical exploration.


5. Storage

The processed data is stored in a SQLite database, which allows:

  • fast queries
  • aggregations by artist, album or genre
  • statistics generation
  • export to other formats if needed

The SQLite database acts as the single source of truth for the project.
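A minimal sketch of that storage layer, using Python's built-in sqlite3 module. The table layout and sample rows are illustrative assumptions, not the project's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real project would use a file on disk
conn.execute("""
    CREATE TABLE tracks (
        artist TEXT, album TEXT, title TEXT,
        year INTEGER, genre TEXT, format TEXT, dr REAL
    )
""")
rows = [
    ("Artist A", "Album X", "Track 1", 1997, "Jazz", "FLAC", 13.0),
    ("Artist A", "Album X", "Track 2", 1997, "Jazz", "FLAC", 11.0),
    ("Artist B", "Album Y", "Track 1", 2005, "Rock", "DSF", 8.0),
]
conn.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Average DR per album: the kind of aggregation the analysis is built on
album_dr = conn.execute(
    "SELECT artist, album, ROUND(AVG(dr), 1) FROM tracks "
    "GROUP BY artist, album ORDER BY artist"
).fetchall()
```

Because everything lives in one SQLite file, per-artist, per-genre or per-format statistics are a single GROUP BY query away.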


6. Generating visualisations and conclusions

From the database, various artefacts are generated for the web:

  • aggregated tables
  • JSON files
  • interactive visualisations
  • analytical summaries

This provides a clear separation between:

  • the extraction and calculation layer
  • the presentation layer
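The boundary between those two layers can be illustrated with a tiny export step: the calculation side serialises aggregated rows to JSON, and the web side reads only those artefacts. The rows below are hypothetical stand-ins for query results from the database:

```python
import json

# Hypothetical aggregated rows, as they might come out of the SQLite queries
album_rows = [
    {"artist": "Artist A", "album": "Album X", "avg_dr": 12.0},
    {"artist": "Artist B", "album": "Album Y", "avg_dr": 8.0},
]

# The presentation layer consumes this file; it never touches the database.
payload = json.dumps(album_rows, ensure_ascii=False, indent=2)
```

This keeps the website static and fast, and means the extraction pipeline can be rerun or changed without touching the front end.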

7. Limitations

As with any project of this kind, some limitations are unavoidable:

  • inconsistent metadata across editions
  • differences between manual and automatic tagging
  • non-homogeneous repertoire across genres
  • possible presence of outliers or occasional errors

This analysis should therefore be understood as an exploratory support tool, not a definitive ranking.


General approach

The philosophy throughout the project has been:

prioritise a maintainable, understandable and practically useful solution over pursuing technical perfection that does not justify the effort.

That balance allows the database to remain alive, extensible and enjoyable.