Methodology

This section describes the process used to build the project database and generate the dynamic range analysis.

The main goal has been to keep the process simple, reproducible and robust enough for a heterogeneous music collection made up of different formats, origins and tagging styles.


1. Source files

The analysed collection consists mainly of audio files obtained from:

  • personal CD rips
  • SACD rips converted to DSF
  • special editions such as XRCD, SHM-CD and similar
  • files stored locally in a folder-structured library

The formats currently analysed are:

  • FLAC
  • DSF

2. Metadata extraction

For each track, metadata is extracted directly from the file, prioritising embedded tags whenever available.

The fields used include:

  • artist
  • album
  • track title
  • year
  • genre
  • format

Where metadata is incomplete or inconsistent, fallback rules are applied based on the filename or containing folder.
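As a rough illustration of those fallback rules, the sketch below derives artist, album and title from the path alone. The "Artist/Album/NN - Title.ext" folder layout is an assumption for the example, not necessarily the layout of the real library:

```python
from pathlib import Path

def fallback_metadata(path: Path) -> dict:
    """Derive artist/album/title from an 'Artist/Album/NN - Title.ext'
    folder layout when embedded tags are missing (hypothetical layout)."""
    stem = path.stem
    # Strip a leading track number such as "03 " or "03 - "
    if " - " in stem:
        title = stem.split(" - ", 1)[-1]
    else:
        title = stem.lstrip("0123456789 ").strip()
    return {
        "artist": path.parent.parent.name,
        "album": path.parent.name,
        "title": title or stem,
    }

meta = fallback_metadata(Path("Miles Davis/Kind of Blue/01 - So What.flac"))
```

In the real pipeline a helper like this would only run after the embedded tags have been checked and found missing or inconsistent.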


3. Dynamic range calculation

The DR (Dynamic Range) value is computed for each track by running an external analysis tool on the audio file.

The aim of this metric is to provide a quantitative approximation of the perceived dynamic range of a given mastering.

Important

The DR value:

  • does not replace listening
  • does not measure audio quality on its own
  • should be interpreted as a complementary indicator

That said, it is useful for detecting general trends and comparing different editions.
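To give an intuition for what this kind of metric captures, here is a deliberately simplified peak-to-RMS (crest factor) calculation. This is not the algorithm of the actual DR meter, which works on fixed-length blocks and averages the loudest portion of the track; it only shows the underlying idea of comparing peak level against average level:

```python
import math

def simple_dynamic_range_db(samples: list[float]) -> float:
    """Rough peak-to-RMS ratio in dB for a block of samples.
    Simplified illustration only, not the official DR meter algorithm."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

# A pure sine wave has a crest factor of sqrt(2), i.e. about 3.01 dB;
# heavily compressed masterings push this ratio down.
sine = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
crest = simple_dynamic_range_db(sine)
```

A squashed, "loudness-war" mastering drives the average level up towards the peaks, so this ratio shrinks; a dynamic mastering keeps it large.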


4. Cleaning and normalisation

One of the most important steps in the project is metadata normalisation, since a real-world collection typically contains many irregular cases.

Corrections applied may include:

  • removal of extra whitespace
  • cleaning of problematic characters
  • UTF-8 encoding correction
  • resolution of duplicate or contaminated fields
  • partial normalisation of artist / album / track
  • handling of non-homogeneous date formats

The goal is not absolute perfection, but a database consistent enough for statistical analysis and musical exploration.


5. Storage

The processed data is stored in a SQLite database, which allows:

  • fast queries
  • aggregations by artist, album or genre
  • statistics generation
  • export to other formats if needed

The SQLite database acts as the single source of truth for the project.
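A minimal sketch of that storage layer, using Python's built-in sqlite3 module. The table layout and sample rows are illustrative assumptions, not the project's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real project would use a file on disk
conn.execute("""
    CREATE TABLE tracks (
        artist TEXT, album TEXT, title TEXT,
        year INTEGER, genre TEXT, format TEXT, dr REAL
    )
""")
rows = [
    ("Artist A", "Album X", "Track 1", 1997, "Jazz", "FLAC", 13.0),
    ("Artist A", "Album X", "Track 2", 1997, "Jazz", "FLAC", 11.0),
    ("Artist B", "Album Y", "Track 1", 2005, "Rock", "DSF", 8.0),
]
conn.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Average DR per album: the kind of aggregation the analysis is built on
album_dr = conn.execute(
    "SELECT artist, album, ROUND(AVG(dr), 1) FROM tracks "
    "GROUP BY artist, album ORDER BY artist"
).fetchall()
```

Because everything lives in one SQLite file, per-artist, per-genre or per-format statistics are a single GROUP BY query away.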


6. Generating visualisations and conclusions

From the database, various artefacts are generated for the web:

  • aggregated tables
  • JSON files
  • interactive visualisations
  • analytical summaries

This provides a clear separation between:

  • the extraction and calculation layer
  • the presentation layer
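The boundary between those two layers can be illustrated with a tiny export step: the calculation side serialises aggregated rows to JSON, and the web side reads only those artefacts. The rows below are hypothetical stand-ins for query results from the database:

```python
import json

# Hypothetical aggregated rows, as they might come out of the SQLite queries
album_rows = [
    {"artist": "Artist A", "album": "Album X", "avg_dr": 12.0},
    {"artist": "Artist B", "album": "Album Y", "avg_dr": 8.0},
]

# The presentation layer consumes this file; it never touches the database.
payload = json.dumps(album_rows, ensure_ascii=False, indent=2)
```

This keeps the website static and fast, and means the extraction pipeline can be rerun or changed without touching the front end.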

7. Limitations

As with any project of this kind, some limitations are unavoidable:

  • inconsistent metadata across editions
  • differences between manual and automatic tagging
  • non-homogeneous repertoire across genres
  • possible presence of outliers or occasional errors

This analysis should therefore be understood as an exploratory support tool, not a definitive ranking.


General approach

The philosophy throughout the project has been:

prioritise a maintainable, understandable and practically useful solution over pursuing technical perfection that does not justify the effort.

That balance allows the database to remain alive, extensible and enjoyable.