Who gets documented and how: a technical glimpse into 118 published digital catalogues raisonnés

Jun 26

We inspected the technology behind 118 prominent, live digital catalogues raisonnés to identify digital strategies that the field actually uses. What we found was not only a story about software. It was a story about how the art market continues to influence which artists receive lasting scholarly infrastructure, and on what terms.

Eva Gonzalès, *La Plage de Dieppe: Vue prise du château*, c. 1871-72. Oil on canvas, 28.5 x 70 cm. Château-Musée de Dieppe; in the public domain.

This dataset contains 118 digital catalogues raisonnés. It was compiled from institutional knowledge, reference to ICRA and CRSA programs, and several aggregated lists available online. Entries were removed when they were neither associated with an institution nor authored by a professional art historian. It is by no means an exhaustive list, but it is representative of published projects.

Metrics at a glance

10% of the dataset documents female artists
8% of the dataset documents artists of color
59% of independent catalogues run on a purpose-built CR platform
22% of independent catalogues are custom-built
19% of independent catalogues run on an assembled solution
3% of custom and assembled catalogues use a modern JavaScript framework

“Independent catalogues” refers to the 86 catalogues in the dataset that do not run on a single, institutionally built infrastructure. The 32 catalogues in the dataset that live on shared institutional infrastructure are based in European museums.

What the numbers represent

How catalogue raisonné technology choices reflect market position

Of the 86 independent catalogues in the dataset, 59% use a platform specifically designed for catalogues raisonnés. 22% are custom-built. 19% run on assembled solutions combining a database, a content management system, and various plugins. An additional 32 catalogues exist within shared institutional infrastructure and are reported separately because they represent publicly funded digital preservation for national heritage artists, a different use case from the majority of publishing institutions in this dataset.

The custom tier is where the market logic is clearest. The 19 custom-built catalogues cluster at two extremes: major foundations, studios, and museums with the budget and institutional infrastructure to sustain bespoke software, and a few individual scholars who hand-coded their sites in HTML because no platform existed when their projects started. The middle tier, those with enough resources to publish but not enough to commission and maintain custom software, uses Artifact, Navigating.art, or a collection management system. The platform choice is, among other things, a statement about institutional scale.

Who does (and doesn’t) get documented: Women artists and artists of color in the catalogue raisonné dataset

Twelve of the 118 artists in the dataset are women, and only two of them are on custom-built solutions. Both figures reflect the broader historical underrepresentation of women in the canon of collected and commercially valued art. For women, reaching the custom tier requires institutional scale that remains rare. The purpose-built platforms are doing more demographic work than a summary of their client lists might suggest: they are the infrastructure that makes it possible to document artists whose commercial markets have not generated the endowments that fund custom builds.

Nine of the 118 artists in the dataset are identified as people of color, 8% of the total and lower than the already low figure for women. All three artists of African descent are on Artifact or Navigating.art. None are custom builds; none are on shared institutional infrastructure. The funding structures that produce custom builds have not historically been directed at their work, and the data reflects that directly.

The exception: Shared institutional infrastructure in Europe

The 32 catalogues on shared institutional infrastructure break the pattern in an instructive way. Eighty-four percent of these entries come from German-speaking countries, with the Dutch RKD accounting for the remaining 16%. This tier reaches 13% women artists, better than any market-facing platform in the dataset. It does so without trying: its selection criteria are archival and national rather than commercial, which produces a more representative selection of artists than any commercially oriented platform.

Technology and time: Why 97% of custom and assembled catalogues run on outdated frameworks

Among custom-built and assembled catalogues, only three use a current JavaScript framework. Everything else runs on stacks from the 2010s: WordPress, Drupal, TYPO3, Joomla, jQuery, and PHP. The catalogues that look functional today are largely running on infrastructure that will require complete rebuilds within a decade. The three catalogues using modern frameworks are all backed by major institutional or commercial budgets. The resources required to keep infrastructure current follow the same distribution as the resources required to build it.

What this data shows and what we are tracking next

The catalogue raisonné form has always reflected the audiences and markets it serves. What this dataset makes visible is that the reflection extends into infrastructure: which artists receive the most technically ambitious and independently sustained documentation is not a separate question from which artists the commercial market values most. The 10% and 8% figures are not anomalies in an otherwise neutral system. They are the market, legible at the level of server configurations and database choices.

This dataset will be updated annually. Changes in the technology distribution may be slow; the demographic figures are likely to be more sensitive to shifts in how estates, foundations, and platforms direct their work. A pulse survey of the estate and foundation community, planned for later this year, will add crucial data that fingerprinting cannot reach.

Dataset methodology and privacy

Identifying the software strategy behind each catalogue raisonné relied on a combination of methods, primarily on inspecting the digital fingerprints exposed by its website:

HTTP response headers — X-Powered-By, Server, X-Generator, and related fields often expose CMS or framework identity directly
HTML meta generators — WordPress, TYPO3, Joomla, and Drupal all inject <meta name="generator"> tags by default
JavaScript bundle paths — Next.js serves from /_next/static/, Nuxt from /_nuxt/, Drupal from /core/assets/vendor/, WordPress from /wp-content/plugins/ and /wp-includes/; these paths are reliably diagnostic
Network request patterns — for JavaScript-heavy sites that don't render useful source HTML, observing the actual asset URLs loaded at runtime reveals the underlying framework
Footer credits and plugin fingerprints — many catalogues display "Built with [Platform]" credits, or load plugins that confirm the CMS
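
As a rough illustration of the first three signals, here is a minimal Python sketch. It is not the tooling used for this dataset: the function name `fingerprint`, the signal labels, and the regular expression are assumptions made for the example. It works on response headers and HTML that have already been fetched, so it makes no network requests itself, and it only reports raw signals; deciding which category a catalogue belongs to still takes human judgment.

```python
import re
from typing import Mapping

# Asset-path prefixes named in the list above, mapped to the stack they suggest.
PATH_SIGNATURES = {
    "/_next/static/": "Next.js",
    "/_nuxt/": "Nuxt",
    "/core/assets/vendor/": "Drupal",
    "/wp-content/plugins/": "WordPress",
    "/wp-includes/": "WordPress",
}

# Matches <meta name="generator" content="...">.
# Simplifying assumption: the name attribute appears before content.
META_GENERATOR = re.compile(
    r'<meta[^>]*name=["\']generator["\'][^>]*content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)

# Header fields worth recording, compared case-insensitively.
HEADER_FIELDS = ("x-generator", "x-powered-by", "server")


def fingerprint(headers: Mapping[str, str], html: str) -> dict:
    """Collect raw fingerprint signals from one already-fetched page."""
    lowered = {key.lower(): value for key, value in headers.items()}
    signals = {field: lowered[field] for field in HEADER_FIELDS if field in lowered}

    match = META_GENERATOR.search(html)
    if match:
        signals["meta-generator"] = match.group(1)

    # A plain substring test is enough here: these prefixes rarely occur by accident.
    stacks = sorted({name for prefix, name in PATH_SIGNATURES.items() if prefix in html})
    if stacks:
        signals["asset-paths"] = stacks

    return signals
```

Called with a page's headers and source, the function returns only the signals it found, so an empty dictionary means the page needs the slower methods in the list: watching network requests at runtime or reading footer credits.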

This investigation yields enough information to place each catalogue raisonné into one of the three general categories. It does not reveal any private information about the publication or the artwork documented, nor does it reveal every piece of software used in the publication workflow. For example, a scholar may have begun by writing notes in a Word document, moved to a Google Sheet, and then migrated everything somewhere else. That part of the workflow is not visible in the published site's code.

Navigating.art https://navigating.art