Almost frictionless: Museum open access in practice in 2026

The past decade has changed what it means to have access to a museum collection. Major institutions, including the Metropolitan Museum of Art, the Art Institute of Chicago, and the Rijksmuseum, have released vast holdings under Creative Commons Zero licenses. In doing so, they've made high-resolution images and detailed metadata freely available to researchers with an internet connection, for any purpose, without fee or permission. For art historians, this represents a genuine shift: research that once required sustained physical proximity to a collection, or the resources to license reproductions, can now be retrieved, compared, and analyzed at a scale that was not previously possible.

The practical reality of that access is the subject of this article. Researchers who approach these collections by writing code to retrieve images, query metadata, or build datasets across institutions encounter a small but recurring set of technical patterns that the official documentation does not always fully address. This article examines those patterns through a specific lens: the open-access collections of institutions with significant Impressionist holdings. The Impressionist case is a useful one precisely because the works are distributed across institutions with genuinely different approaches to digital infrastructure. 

The details included here come from building a dataset of images of paintings by artists who showed at the eight independent exhibitions held in Paris between 1874 and 1886, commonly called the Impressionist exhibitions.

The difficulties encountered in that process recur throughout computational work with cultural heritage data. The goal here is to name them clearly, so that knowledge accumulated through hard experience becomes transferable, and the afternoon spent diagnosing an undocumented header requirement is one the next researcher does not have to spend.

Pierre-Auguste Renoir, Pêches, 1881. Oil on canvas, 53.3 × 64.8 cm. Image courtesy of The Metropolitan Museum of Art, New York (in the Public Domain). First exhibited at the 7me exposition des artistes indépendants, March 1882, no. 159 (as "Les pêches").

Discovery vs. access: The Metropolitan Museum of Art

One of the most fundamental distinctions when working with open collections is between access and discovery. Access refers to whether you can retrieve a work once you know what you are looking for; discovery is whether the system helps you find it in the first place. The two are often conflated in open access announcements, but they are built on different infrastructures and can fail independently.

The Metropolitan Museum's collection API handles both quite well, which makes it a useful baseline. It is openly documented, requires no authentication, and has been publicly available since 2018. A researcher can query it without registering for a key, without agreeing to terms beyond the CC0 license that governs the images themselves, and without any special configuration. Requests return clean JSON. Images resolve reliably. The open access policy is reflected accurately in the data: works flagged as open access are, in practice, open.

One limitation is worth knowing in advance, and it falls precisely on the discovery side of that distinction. The API's artist search does not always return results organized by the queried name. A search for "Monet" may surface works by other artists; a search for "Renoir" behaves similarly. This is not a flaw in the collection's accessibility but a quirk of how the search index is constructed. Researchers who arrive with a list of object IDs, or who retrieve full department listings and filter locally, will find the API entirely reliable. Those who expect artist-name search to function like a finding aid may need to adjust their approach. The Met's online collection is a practical way to identify object IDs before querying the API directly.
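The "retrieve, then filter locally" approach might look like the following minimal sketch. It assumes the Met's public object endpoint and the artistDisplayName and isPublicDomain fields in the returned JSON; the helper names are hypothetical, and the object ID in the usage note is only a placeholder.

```python
import json
import urllib.request

# Base URL of the Met's public collection API (no key required).
MET_BASE = "https://collectionapi.metmuseum.org/public/collection/v1"


def met_object_url(object_id: int) -> str:
    """Build the URL for a single object record."""
    return f"{MET_BASE}/objects/{object_id}"


def fetch_met_object(object_id: int) -> dict:
    """Fetch one object record as a dict (this makes a network request)."""
    with urllib.request.urlopen(met_object_url(object_id), timeout=30) as resp:
        return json.load(resp)


def by_artist(records, surname):
    """Keep open-access records whose artist field contains the surname.

    Filtering locally avoids depending on how the search index ranks names.
    """
    wanted = surname.casefold()
    return [
        r for r in records
        if r.get("isPublicDomain")
        and wanted in r.get("artistDisplayName", "").casefold()
    ]
```

In use, a researcher would call fetch_met_object for each ID gathered from the Met's online collection (for example, a placeholder ID such as 12345) and pass the resulting list to by_artist(records, "Monet").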

For most purposes, the Met's API is as close to frictionless as publicly available museum data gets. It is a reasonable model against which to measure the others.

Implicit requirements: The Art Institute of Chicago

One of the most common challenges in working with public APIs is the gap between what documentation covers and what implementation requires. Systems built for internal use before being opened externally often carry dependencies that internal users absorb through institutional knowledge but that outside researchers have no way to anticipate. The behavior is consistent and logical once understood; the difficulty is that understanding it requires experimentation that the documentation does not prompt.

The Art Institute of Chicago API is well-designed and generously documented. It supports complex queries, returns rich metadata, and covers a collection with significant holdings in Impressionism, Post-Impressionism, and American modernism. Two implicit requirements shape whether a researcher can retrieve images.

The first concerns the image endpoint itself. The AIC hosts images through a separate IIIF server, and requests to that server without a Referer header pointing to www.artic.edu return a 403 error. No documentation flags this requirement. A researcher building an image retrieval pipeline will encounter the error and need to diagnose it independently; adding the header resolves it immediately.

The second concerns field selection. The API supports a fields parameter that limits which metadata is returned, and by default it does not include image_id. Since image_id is necessary to construct the image URL, a query that omits the parameter will return records with no obvious path to the image.
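Both requirements can be handled when the requests are built. The sketch below is one possible way to do so, assuming the AIC's public search endpoint and its IIIF URL pattern; the function names, the artist_display field, and the default width of 843 pixels are illustrative choices rather than anything prescribed above.

```python
import urllib.parse
import urllib.request

AIC_SEARCH = "https://api.artic.edu/api/v1/artworks/search"
AIC_IIIF = "https://www.artic.edu/iiif/2"


def aic_search_url(query: str, limit: int = 10) -> str:
    """Build a search URL that explicitly asks for image_id."""
    params = {
        "q": query,
        "limit": limit,
        # Without image_id in this list there is no way to build an image URL.
        "fields": "id,title,artist_display,image_id",
    }
    return f"{AIC_SEARCH}?{urllib.parse.urlencode(params)}"


def aic_image_request(image_id: str, width: int = 843) -> urllib.request.Request:
    """Build an image request that carries the Referer header."""
    url = f"{AIC_IIIF}/{image_id}/full/{width},/0/default.jpg"
    # Requests without this header were answered with a 403.
    return urllib.request.Request(url, headers={"Referer": "https://www.artic.edu/"})
```

Passing the returned Request to urllib.request.urlopen performs the download; keeping the header in one helper means every image request in a pipeline carries it.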

The experience of getting nothing useful back (no image, and no error message that points toward the cause) is characteristic of this category across institutions and domains. Both issues are straightforward once known, and the AIC's collection is richly accessible once they are resolved. The documentation would benefit from a section devoted to image retrieval.

Documentation drift: The Cleveland Museum of Art

A related but distinct challenge arises when documentation and implementation are simply out of sync. APIs evolve incrementally; documentation, which by definition must follow API development, does not always keep pace.

The Cleveland Museum of Art has been among the more committed institutions in the open access movement, and its API reflects that commitment. The collection is queryable, well-organized, and returns useful metadata. One small discrepancy is worth knowing. The documentation describes the license field for open access works as returning the value "Open Access." The API actually returns "CC0." Any code written directly from the documentation to filter for open-access works will silently return nothing, because the string it checks for does not match the string the API provides. It is likely that "Open Access" was once correct and that the documentation has not been updated to match "CC0." The fix is trivial once the discrepancy is identified (a single string substitution), but nothing signals that it is needed. Code written in alignment with the documentation produces no errors and no results, which makes the discrepancy harder to find than a straightforward failure would be.
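One defensive pattern is to accept both strings and to warn when a filter matches nothing, so that a mismatch surfaces instead of passing silently. This is a minimal sketch; the field name share_license_status is an assumption about the record layout and should be checked against an actual response, and the helper names are hypothetical.

```python
import warnings

# Accept the documented value and the value the API was observed to return.
OPEN_ACCESS_VALUES = {"cc0", "open access"}


def is_open_access(record: dict, field: str = "share_license_status") -> bool:
    """Return True if the record's license field marks it as open access."""
    value = str(record.get(field) or "")
    return value.strip().casefold() in OPEN_ACCESS_VALUES


def filter_open_access(records, field: str = "share_license_status"):
    """Filter records, warning if nothing matches."""
    kept = [r for r in records if is_open_access(r, field)]
    if records and not kept:
        # Report the values actually seen, which is usually enough to spot drift.
        seen = sorted({str(r.get(field)) for r in records})
        warnings.warn(f"No open-access records matched; values seen: {seen}")
    return kept
```

The warning lists the license values actually present in the response, which turns an empty result into something a researcher can diagnose in a few seconds.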

Parameter inconsistency: The Rijksmuseum

APIs that expose multiple endpoints, or that have evolved across versions, sometimes carry inconsistencies in how parameters are named. A field called "artist" in one context may be called "creator" or "maker" in another. Documentation tends to describe intended behavior; actual behavior sometimes reflects earlier design decisions that were never fully reconciled.

The Rijksmuseum offers both an authenticated API and a keyless endpoint. The keyless endpoint is a useful starting point for researchers who want to explore the collection without registering, but its parameter behavior differs from what the documentation implies. Queries using "maker", "artist", or "principalMaker" to filter by artist return 400 errors. The parameter that works is "creator", and it is the only one that does. A researcher querying the keyless endpoint for works by a specific artist will need to know this before they begin.
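A small wrapper can encode this knowledge so that the rejected parameter names never reach the server. The sketch below is illustrative only: the search URL is an assumption about where the keyless endpoint lives and should be checked against the Rijksmuseum's current documentation, and the function name is hypothetical.

```python
import urllib.parse

# Assumed location of the keyless search endpoint; verify before use.
RIJKS_SEARCH = "https://data.rijksmuseum.nl/search/collection"

# Parameter names that returned 400 errors when used to filter by artist.
REJECTED_PARAMS = {"maker", "artist", "principalMaker"}


def rijks_search_url(**params) -> str:
    """Build a search URL, refusing parameter names known to fail."""
    bad = REJECTED_PARAMS.intersection(params)
    if bad:
        raise ValueError(
            f"Unsupported parameter(s) {sorted(bad)}; use 'creator' instead"
        )
    return f"{RIJKS_SEARCH}?{urllib.parse.urlencode(params)}"
```

Calling rijks_search_url(creator="Claude Monet") yields a URL ready to pass to urllib.request.urlopen, while rijks_search_url(maker="Claude Monet") fails locally with a message that names the working parameter.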

The authenticated API is more capable and better behaved. Researchers planning extended work with the Rijksmuseum collection are likely to find registration worthwhile.

Unexpected architecture: The Getty

A different challenge appears when an institution's programmatic access exists but follows a model outside the conventions most museum API documentation establishes. The Getty's collection is publicly accessible without any API key, but the path to that access differs from standard approaches. Requests to /search, /api/search, or similar endpoints return 404 errors, and the Getty's web collection search returns HTML from a JavaScript application rather than JSON. These are the natural first attempts, and none of them work. A researcher who arrives without having consulted the documentation will spend time ruling out problems that do not exist before finding the approach that works: a two-layer architecture.

The first layer is discovery via SPARQL. The Getty exposes a public endpoint at https://data.getty.edu/museum/collection/sparql. Researchers POST a SPARQL query with Content-Type: application/sparql-query and Accept: application/sparql-results+json headers. No registration or API key is required. A query filtering by artist name using FILTER(CONTAINS(LCASE(?creatorName), "monet")) returns object URIs in the form https://data.getty.edu/museum/collection/object/{uuid}. Searching by last name in lowercase catches variant spellings. Broad queries can time out; keeping queries focused on one artist at a time and using LIMIT makes the endpoint reliable.
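The discovery step can be sketched as follows. The endpoint, headers, and FILTER clause come from the description above; the graph pattern that links an object to its creator's name (CIDOC-CRM production and carried-out-by properties) is an assumption about how the Getty models its data and may need adjusting, and the function names are hypothetical.

```python
import json
import urllib.request

GETTY_SPARQL = "https://data.getty.edu/museum/collection/sparql"

# The triple patterns below are an assumed mapping, not a documented one.
QUERY_TEMPLATE = """\
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?object ?creatorName WHERE {{
  ?object crm:P108i_was_produced_by ?production .
  ?production crm:P14_carried_out_by ?creator .
  ?creator rdfs:label ?creatorName .
  FILTER(CONTAINS(LCASE(?creatorName), "{surname}"))
}}
LIMIT {limit}
"""


def build_query(surname: str, limit: int = 50) -> str:
    """Build a one-artist query with a lowercase surname and a LIMIT."""
    cleaned = surname.strip().lower()
    # Allow only letters, spaces, and hyphens so the name is safe to embed.
    if not cleaned.replace("-", "").replace(" ", "").isalpha():
        raise ValueError(f"Unexpected characters in surname: {surname!r}")
    return QUERY_TEMPLATE.format(surname=cleaned, limit=int(limit))


def sparql_request(query: str) -> urllib.request.Request:
    """Build the POST request with the two required headers."""
    return urllib.request.Request(
        GETTY_SPARQL,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(query: str) -> list:
    """Run the query (network request) and return the object URIs."""
    with urllib.request.urlopen(sparql_request(query), timeout=60) as resp:
        results = json.load(resp)
    return [row["object"]["value"] for row in results["results"]["bindings"]]
```

Running run_query(build_query("Monet")) keeps each request to one artist with a LIMIT, in line with the advice above about avoiding timeouts.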

The second layer is metadata and image retrieval via Linked Art JSON-LD. Each object URI from the SPARQL results can be dereferenced by sending a request with Accept: application/ld+json, which returns a full Linked Art 1.0 record. The representation field contains a ready-to-use IIIF image URL; the made_of field contains material classifications useful for filtering to oil paintings; produced_by contains attribution and dates; subject_to contains rights information, with CC0 for open access works. Image quality is high, and the IIIF endpoint supports arbitrary resolution.
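A hedged sketch of the second layer follows. It assumes the nesting used by the Linked Art 1.0 model, in which representation holds visual items whose digitally_shown_by entries carry access_point URLs, and in which made_of entries have a _label; both should be confirmed against a real record. The helper names are hypothetical.

```python
import json
import urllib.request


def linked_art_request(object_uri: str) -> urllib.request.Request:
    """Build a request that asks for the JSON-LD form of an object URI."""
    return urllib.request.Request(
        object_uri, headers={"Accept": "application/ld+json"}
    )


def fetch_record(object_uri: str) -> dict:
    """Dereference an object URI (network request) and parse the record."""
    with urllib.request.urlopen(linked_art_request(object_uri), timeout=60) as resp:
        return json.load(resp)


def image_urls(record: dict) -> list:
    """Collect image URLs found under the representation field."""
    urls = []
    for visual in record.get("representation", []):
        for digital in visual.get("digitally_shown_by", []):
            for point in digital.get("access_point", []):
                if "id" in point:
                    urls.append(point["id"])
    return urls


def is_oil_painting(record: dict) -> bool:
    """Check made_of labels for the word 'oil' (a simple heuristic)."""
    for material in record.get("made_of", []):
        if "oil" in material.get("_label", "").casefold().split():
            return True
    return False
```

Matching on the whole word "oil" rather than a substring avoids false positives on labels such as "gold foil"; a stricter pipeline might compare material identifiers instead of labels.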

A few practical notes worth knowing before starting: Getty returns artist names with parenthetical biographical information ("Claude Monet (French, 1840–1926)"), which requires regex cleaning before the names can be used for display or matching. Not all objects classified under a given artist will be oil paintings; the made_of field needs to be checked. The older Linked Art endpoints documented in many online posts and GitHub threads no longer function. Only the Linked Art 1.0 endpoints, stable since mid-2024, are currently reliable.
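The name cleaning can be done with a single regular expression. This is a minimal sketch that strips one trailing parenthetical; it assumes the biographical note always comes last in the string, as in the example above, and the function name is hypothetical.

```python
import re

# Matches a final "(...)" group plus any surrounding whitespace.
_TRAILING_PARENS = re.compile(r"\s*\([^()]*\)\s*$")


def clean_artist_name(raw: str) -> str:
    """Remove a trailing parenthetical such as '(French, 1840–1926)'."""
    return _TRAILING_PARENS.sub("", raw).strip()


print(clean_artist_name("Claude Monet (French, 1840–1926)"))  # Claude Monet
```

Names without a parenthetical pass through unchanged, so the function is safe to apply to every record.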

The Getty's image quality is excellent, and the collection is genuinely accessible once the architecture is understood.

Absence of programmatic access: WikiArt and the Musée d'Orsay

The final category requires the least diagnosis but the most adjustment: some collections simply do not offer programmatic access, for reasons that range from rights complexity to infrastructure priorities to institutional policy.

WikiArt is an aggregator rather than an institutional collection; its images are not uniformly in the public domain, and automated requests are actively blocked. Researchers looking for programmatic access to the works it indexes will find more reliable paths through the originating institutions directly. Although a helpful tool in many respects for computational and traditional art history research, WikiArt isn’t the right tool when building an image-based dataset.

The Musée d'Orsay presents a more nuanced situation. It holds one of the most significant Impressionist collections in the world, and its curatorial and conservation work is among the most serious anywhere. The institution has invested heavily in digitization, and its online collection is genuinely rich for browsing and research. What it does not currently offer is a public API or open image downloads. This reflects a broader pattern among French national institutions, where rights frameworks, digitization infrastructure, and public access policy have developed within a different legal and cultural context than in the United States or the Netherlands. The absence of programmatic access is not a failure of commitment to scholarship; it is the outcome of a different set of institutional, legal, and funding pressures. Researchers who want to engage seriously with the Orsay's holdings will find the collection site itself more useful than this article's silence on it might suggest.

Continuing expansion and clarification

The collections described in this article are, taken together, genuinely accessible. A researcher who wants to work programmatically with Impressionist holdings across multiple institutions can do so. The images are there, the metadata is there, and the rights status on the overwhelming majority of pre-twentieth-century works is unambiguous. None of the technical challenges described here is demanding: a researcher with basic familiarity with HTTP requests and JSON, and with SPARQL for the Getty, can navigate all of it.

The Musée d'Orsay's absence from this landscape is not a permanent condition. French institutions have been moving, if gradually, toward broader digital access, and the policy environment is not static. For now, the gap is a reminder that open access has developed unevenly across national contexts, and that the Impressionist record as it exists online reflects particular institutional histories as much as it reflects the works themselves.

The next chapter of open access in art history is less about rights and more about legibility: making the practical requirements of programmatic access as clear as the license terms. The institutions that have done the hard work of opening their collections are well-placed to lead that effort, and the researchers who benefit from it have reason to support them in doing so.

Kiersten Thamm

Dr. Thamm bridges art history and technology, researching their mutual influence and supporting historians using computational technology for new forms of knowledge production.
