SQR-115

The RSP Visit metadata enrichment service

Abstract

Design of the RSP Visit metadata enrichment service.

Overview

The Visit metadata enrichment service ingests a “visit ready” notification from the Prompt Publication Service (see DMTN-330), enriches the visit record with metadata from external systems, transforms selected fields into IVOA ObsCore-compatible columns, and upserts the final result into the “visit metadata table”.

The service is implemented as:

  • A FastAPI application that accepts inbound notifications and registers work.

  • A Safir/arq background worker that performs the asynchronous enrichment tasks.

  • A PostgreSQL database that stores the durable job state (arq itself uses Redis as its queue backend).

The service integrates with the following external systems:

  • The prompt Butler data repository from the Prompt Publication Service, for querying visit-related metadata from dataset types such as calexp.visitInfo, preliminary_visit_summary and *Metrics.

  • The EFD, for querying telemetry over the visit timespan via the InfluxDB API.

Note

The design of the visit metadata table has implications for which dataset types need to be included in the Prompt Publication Butler repository. See RFC-1134 and the Prompt Publication Service Roadmap.

The “visit metadata table” is the main output of the RSP Visit metadata enrichment service. It will contain one row per visit, with ObsCore-compatible and LSST-specific metadata.

The ObsCore-compatible metadata will enable interoperability with other IVOA-compliant services, while the LSST-specific metadata will provide additional context relevant to science users, such as environmental conditions, processing status and data quality.

The initial design of the “visit metadata table” is presented in Appendix A.

Note

Considerations for the “visit metadata table” implementation.

A VisitProcessingSummary table appears to be a good fit for the APDB schema, and for its public-facing counterpart, the PPDB. It would complement DetectorVisitProcessingSummary by providing visit-level metadata.

My initial impression is that the current APDB schema does not contain all of the columns needed to map to the ObsCore columns listed in Appendix A. Identifying and sourcing the required visit-level metadata is therefore a necessary first step before mapping to ObsCore-compatible columns.

For DP1, ObsCore was implemented as a table separate from the DP1 schema. Following that design, the visit metadata enrichment service could populate two tables: VisitProcessingSummary, containing visit-level metadata intended for science users, and ObsCore, containing the subset of columns needed for IVOA-compliant services.

ObsTAP is already implemented to provide access to the ObsCore table.

At the time of writing, I do not yet know how the ObsCore table was populated for DP1. It seems likely that we will still need a service to populate ObsCore incrementally as observations become available.

For summarized EFD telemetry over the visit timespan, we will likely need multiple tables unless we limit ourselves to a very small subset of EFD data. Splitting the telemetry across tables would avoid very wide tables, such as the ones used in the ConsDB Transformed EFD LSSTCam. It may make sense to treat this as a separate implementation, potentially handled by another service.

Workflow design

The main sequence of operations for enriching one visit is:

  1. Prompt Publication Service sends POST /visits/ready.

  2. FastAPI writes a pending job record in the database.

  3. FastAPI enqueues an arq job, passing the job id and visit id.

  4. Worker marks the job in_progress.

  5. Worker fetches EFD metadata for the visit timespan window.

  6. Worker fetches Butler metadata from the required dataset types.

  7. Worker aggregates and normalizes both sources.

  8. Worker upserts the final row into VisitProcessingSummary.

  9. Worker maps normalized metadata to ObsCore-compatible columns.

  10. Worker upserts the ObsCore-compatible metadata into the ObsCore table.

  11. Worker marks the job completed.

On transient failures, the worker records the error state and retries. On a permanent failure, or when the maximum number of retries is exceeded, the worker marks the job failed.
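The status transitions above (pending, in_progress, then completed or failed, with retries on transient errors) can be sketched as a small state machine. This is a minimal, dependency-free sketch: the `JobStatus` values mirror the states named in the workflow, while `TransientError`, `run_with_retries` and the `max_tries` default are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from enum import Enum


class JobStatus(str, Enum):
    """States recorded for each enrichment job."""

    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


class TransientError(Exception):
    """Hypothetical marker for retryable failures, e.g. an EFD query timeout."""


async def run_with_retries(
    work: Callable[[], Awaitable[None]],
    record: Callable[[JobStatus, str | None], None],
    max_tries: int = 3,
) -> JobStatus:
    """Run `work`, reporting every status change through `record`."""
    record(JobStatus.IN_PROGRESS, None)
    for attempt in range(1, max_tries + 1):
        try:
            await work()
        except TransientError as exc:
            # Keep the job in progress, note the error, and try again.
            record(JobStatus.IN_PROGRESS, f"attempt {attempt}: {exc}")
            continue
        except Exception as exc:
            # Any other exception is treated as a permanent failure.
            record(JobStatus.FAILED, str(exc))
            return JobStatus.FAILED
        record(JobStatus.COMPLETED, None)
        return JobStatus.COMPLETED
    # Reached only when every attempt raised TransientError.
    record(JobStatus.FAILED, f"max retries ({max_tries}) exceeded")
    return JobStatus.FAILED


if __name__ == "__main__":
    attempts: list[int] = []

    async def flaky() -> None:
        # Fails once with a transient error, then succeeds.
        attempts.append(1)
        if len(attempts) < 2:
            raise TransientError("EFD query timed out")

    final = asyncio.run(run_with_retries(flaky, lambda s, e: print(s.value, e)))
    print("final:", final.value)
```

In the real worker, arq's own retry support (raising its `Retry` exception, bounded by the worker's `max_tries` setting) could replace the explicit loop, with `record` writing to the job tracking table.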

Visit registration

The Prompt Publication Service sends a notification when a visit is ready for enrichment. The POST body contains the minimum visit registration payload:

{
   "band": "g",
   "instrument": "LSSTCam",
   "day_obs": 20260327,
   "physical_filter": "g_6",
   "visit": 20260327123456,
   "timespan": {
      "begin": "2026-03-27T08:15:10Z",
      "end": "2026-03-27T08:15:45Z"
   }
}
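In the FastAPI application this payload would most naturally be declared as a Pydantic model. To keep the sketch dependency-free, the version below uses standard-library dataclasses instead; the class names `VisitReady` and `Timespan` and the check that the timespan ends after it begins are assumptions, not part of the specified interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any


def _parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; 'Z' is rewritten for Python < 3.11."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp must include a UTC offset: {value!r}")
    return parsed


@dataclass(frozen=True)
class Timespan:
    begin: datetime
    end: datetime


@dataclass(frozen=True)
class VisitReady:
    """Minimum visit registration payload sent to POST /visits/ready."""

    band: str
    instrument: str
    day_obs: int
    physical_filter: str
    visit: int
    timespan: Timespan

    @classmethod
    def from_json(cls, body: dict[str, Any]) -> VisitReady:
        span = body["timespan"]
        timespan = Timespan(_parse_utc(span["begin"]), _parse_utc(span["end"]))
        if timespan.end <= timespan.begin:
            raise ValueError("timespan.end must be after timespan.begin")
        return cls(
            band=str(body["band"]),
            instrument=str(body["instrument"]),
            day_obs=int(body["day_obs"]),
            physical_filter=str(body["physical_filter"]),
            visit=int(body["visit"]),
            timespan=timespan,
        )
```

With FastAPI, declaring an equivalent Pydantic model as the endpoint's body parameter gives the same validation without hand-written parsing.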

Job granularity

For simplicity, use one parent job that coordinates enrichment for one visit.

One arq job per visit: enrich_visit(job_id, visit_id).

Inside the job, call internal async functions for:

  • fetch_efd_metadata(...)

  • fetch_butler_metadata(...)

  • build_obscore_record(...)

  • upsert_VisitProcessingSummary(...)

This keeps the state machine simple and lets each visit be enriched and upserted as a single unit of work; running the final upserts in one database transaction would make them atomic.
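A sketch of the parent job, assuming the internal functions listed above. The stand-ins only record what would be written, and their arguments and return values are assumptions; `upsert_obscore` is a hypothetical helper for workflow step 10, which the list above does not name. arq passes its worker context dict as the first argument of every task, so the real signature gains a leading `ctx`.

```python
from __future__ import annotations

import asyncio
from typing import Any

# Stand-ins for the internal functions. Real versions would call the EFD,
# the Butler repository and PostgreSQL; these just record writes in ctx.


async def fetch_efd_metadata(visit_id: int) -> dict[str, Any]:
    return {"efd_example_mean": 10.5}


async def fetch_butler_metadata(visit_id: int) -> dict[str, Any]:
    return {"psf_sigma_median": 2.0}


async def build_obscore_record(metadata: dict[str, Any]) -> dict[str, Any]:
    return {"obs_id": str(metadata["visit"])}


async def upsert_VisitProcessingSummary(ctx: dict[str, Any], row: dict[str, Any]) -> None:
    ctx.setdefault("written", []).append(("VisitProcessingSummary", row))


async def upsert_obscore(ctx: dict[str, Any], row: dict[str, Any]) -> None:
    # Hypothetical helper for workflow step 10.
    ctx.setdefault("written", []).append(("ObsCore", row))


async def enrich_visit(ctx: dict[str, Any], job_id: int, visit_id: int) -> dict[str, Any]:
    """arq task; job_id would be used to update the job tracking row (omitted)."""
    efd = await fetch_efd_metadata(visit_id)
    butler = await fetch_butler_metadata(visit_id)
    metadata = {"visit": visit_id, **efd, **butler}
    obscore = await build_obscore_record(metadata)
    # In the real service both upserts would share one PostgreSQL
    # transaction so that a visit is never left half-written.
    await upsert_VisitProcessingSummary(ctx, metadata)
    await upsert_obscore(ctx, obscore)
    return metadata


class WorkerSettings:
    """Discovered by the arq CLI (`arq module.WorkerSettings`); Redis settings omitted."""

    functions = [enrich_visit]


if __name__ == "__main__":
    print(asyncio.run(enrich_visit({}, 1, 20260327123456)))
```

The EFD and Butler fetches are independent, so they could also be awaited together with `asyncio.gather`; the sketch keeps them sequential to match the workflow steps.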

A future version could decompose the enrichment into separate queued jobs.

Job tracking table

Create a PostgreSQL table, for example enrichment_job, to track the durable state of the workflow.

External integrations

EFD metadata task

Query the InfluxDB API around the visit timespan and compute aggregate telemetry metadata.
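A minimal sketch of the EFD task, assuming the query window is the notified timespan optionally padded on both sides, and that aggregation happens server side with the InfluxQL `mean`, `min` and `max` functions. The padding, the function names, and any measurement or field names passed in are assumptions (EFD measurements are typically named like `lsst.sal.<CSC>.<topic>`). A client-side `summarize` helper covers the case where raw samples are fetched instead.

```python
from __future__ import annotations

import math
import statistics
from datetime import datetime, timedelta, timezone


def _rfc3339(t: datetime) -> str:
    """Format an aware datetime as the RFC 3339 UTC literal InfluxQL accepts."""
    return t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def build_efd_query(
    measurement: str,
    fields: list[str],
    begin: datetime,
    end: datetime,
    pad: timedelta = timedelta(0),
) -> str:
    """InfluxQL selecting mean/min/max of each field over the padded visit window."""
    aggregates = ", ".join(
        f'{fn}("{field}") AS "{field}_{fn}"'
        for field in fields
        for fn in ("mean", "min", "max")
    )
    return (
        f'SELECT {aggregates} FROM "{measurement}" '
        f"WHERE time >= '{_rfc3339(begin - pad)}' "
        f"AND time <= '{_rfc3339(end + pad)}'"
    )


def summarize(values: list[float]) -> dict[str, float] | None:
    """Client-side alternative: summarize raw samples, ignoring NaNs."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return None
    return {"mean": statistics.fmean(finite), "min": min(finite), "max": max(finite)}
```

The resulting string could be sent to an InfluxQL-capable endpoint, such as the InfluxDB 1.x `/query` HTTP API, or through a client library.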

Butler metadata task

Retrieve visit-related metadata from the Butler repository, specifically datasets such as:

  • calexp.visitInfo

  • preliminary_visit_image

  • preliminary_visit_summary

  • *Metrics
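A sketch of one reduction step, assuming `preliminary_visit_summary` is a per-detector catalog whose rows can be indexed by column name, and that `psfSigma` and `zeroPoint` are among its columns (assumed names, to be checked against the actual schema). The `butler` argument is duck-typed so the function can be exercised with a stand-in; with `lsst.daf.butler` it would be a `Butler` instance, whose `get` accepts data ID values as keyword arguments. Taking the median across detectors is a choice made here, not something the design specifies.

```python
from __future__ import annotations

import math
import statistics
from typing import Any


def fetch_butler_metadata(
    butler: Any,
    instrument: str,
    visit: int,
    collections: str | list[str],
    columns: tuple[str, ...] = ("psfSigma", "zeroPoint"),
) -> dict[str, float | None]:
    """Reduce the per-detector preliminary_visit_summary to visit-level medians."""
    summary = butler.get(
        "preliminary_visit_summary",
        instrument=instrument,
        visit=visit,
        collections=collections,
    )
    result: dict[str, float | None] = {}
    for column in columns:
        # Detectors without a usable measurement are assumed to carry NaN.
        values = [row[column] for row in summary if math.isfinite(row[column])]
        result[column] = statistics.median(values) if values else None
    return result
```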

ObsCore transformation

Map the normalized metadata to ObsCore-compatible columns.
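A minimal sketch of the mapping for a subset of the Appendix A columns. The input keys (such as `boresight_ra` and `timespan_begin`) are hypothetical names for the normalized metadata, and the function is kept synchronous because it does no I/O. Times are converted to MJD from UTC, ignoring leap seconds; whether the table should use TAI instead is the open question noted in Appendix A.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any

# MJD 0 is 1858-11-17T00:00 (here taken as UTC).
_MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def to_mjd(t: datetime) -> float:
    """Aware UTC datetime to MJD; leap seconds and the TAI offset are ignored."""
    return (t - _MJD_EPOCH).total_seconds() / 86400.0


def build_obscore_record(meta: dict[str, Any]) -> dict[str, Any]:
    """Map normalized visit metadata to a subset of ObsCore columns."""
    begin, end = meta["timespan_begin"], meta["timespan_end"]
    return {
        "obs_id": str(meta["visit"]),
        "dataproduct_type": "image",
        "instrument_name": meta["instrument"],
        "s_ra": meta["boresight_ra"],    # ICRS degrees
        "s_dec": meta["boresight_dec"],  # ICRS degrees
        "t_min": to_mjd(begin),
        "t_max": to_mjd(end),
        "lsst_band": meta["band"],
        "lsst_filter": meta["physical_filter"],
        # Not well defined for a visit as a whole (Appendix A).
        "s_xel1": None,
        "s_xel2": None,
    }
```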

Appendix A: Initial design of the Visit metadata table

ObsCore space/time/wavelength data:

  • Boresight (at center of LSSTCam): s_ra and s_dec, in ICRS degrees

  • Spatial coverage of the visit: ObsCore s_region, reporting a simplified 12-vertex polygon for the camera outline on the sky

  • Observation timing: t_min and t_max, in MJD (as specified by ObsCore; it is unclear whether the TAI time scale is standards-compliant, but it can be documented with TIMESYS)

  • Rubin-specific logical and physical filter identities, as lsst_band and lsst_filter

  • Structural metadata like s_xel1/2 can be nulled as they are not well-defined for visits as a whole

  • The instrument_name column should be included, with its standard value, if there is any chance that AuxTel data might also be reported this way someday
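For s_region, a widely used serialization is an STC-S polygon string. A sketch, assuming the 12 camera-outline vertices have already been computed as (RA, Dec) pairs in ICRS degrees; the function name and the six-decimal formatting are assumptions, and the vertex winding order, which consumers of s_region may care about, is not checked.

```python
from __future__ import annotations

from collections.abc import Sequence


def stcs_polygon(vertices: Sequence[tuple[float, float]], frame: str = "ICRS") -> str:
    """Format (ra, dec) vertices in degrees as an STC-S polygon string."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    # Wrap RA into [0, 360) so vertices near RA = 0 are written consistently.
    coords = " ".join(f"{ra % 360.0:.6f} {dec:.6f}" for ra, dec in vertices)
    return f"POLYGON {frame} {coords}"
```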

Data access information:

  • For the DataLink service, the access_url and access_format are essential; access_estsize can be nulled

Environmental information:

  • A subset of available EFD information should be provided

Processing status (from AP?):

  • A no-coverage / partial-coverage / full-coverage indicator for template coverage would be useful

  • Number of alerts issued
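A minimal sketch of the template coverage indicator, assuming coverage is summarized as the number of detectors in the visit that had a usable template. The counting rule, the enum and the function name are all hypothetical; a per-pixel coverage fraction with a threshold might turn out to be more useful.

```python
from enum import Enum


class TemplateCoverage(str, Enum):
    NONE = "none"
    PARTIAL = "partial"
    FULL = "full"


def classify_template_coverage(covered: int, total: int) -> TemplateCoverage:
    """Three-way indicator from per-detector template availability counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    if covered <= 0:
        return TemplateCoverage.NONE
    if covered >= total:
        return TemplateCoverage.FULL
    return TemplateCoverage.PARTIAL
```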

Data quality information (from AP?):

  • PSF size and, if available, zero point / grey extinction value

  • What else is immediately useful? Assume that we’ll provide links to other more detailed tables.