Transcript REST API Reference

This document is a work in progress. Meanwhile, see gRPC transcoding and consult the gRPC API reference.

Authorization

Calls to the Tiro services must be authorized with an API key or a JWT signed with the client's private key. The key or token is supplied in the HTTP header Authorization: Bearer ACCESS_TOKEN.

Contact tiro@tiro.is to gain access or request an access token.

Server host

https://ritari.talgreinir.is
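The authorization header and server host above can be combined into a request builder. The following is a minimal Python sketch that uses only the standard library. The helper names tiro_request and send are hypothetical and not part of any official client. The path and token are whatever the caller supplies.

```python
import json
import urllib.request

# Server host from this reference.
BASE_URL = "https://ritari.talgreinir.is"


def tiro_request(path, token, method="GET", body=None):
    """Build (but do not send) an authorized request to the REST API.

    `path` is e.g. "/v1alpha1/transcripts"; `body`, if given, is a dict
    serialized as JSON. Hypothetical helper, not an official client.
    """
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(tiro_request("/v1alpha1/transcripts", token)) lists transcripts. Here token is the API key or JWT.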

Submitting transcript jobs

Submit a new media file for transcription. The media must be supplied as a URL.

POST /v1alpha1/transcriptjob:submit

Request body fields (JSON)

See the gRPC reference for documentation on the fields.

Response fields (JSON)

See the gRPC reference for documentation on the fields.

Examples (cURL)

The following example submits an audio file by URL for transcription. When submitting a URL, the required fields are metadata.languageCode, metadata.fileType, useUri and uri. The generally available language codes for Icelandic are is-IS and is-IS-x-exp.

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TIRO_TOKEN" \
  https://ritari.talgreinir.is/v1alpha1/transcriptjob:submit -d@payload.json | jq

Where payload.json contains:

{
  "metadata": {
    "fileType": "AUDIO",
    "languageCode": "is-IS",
    "subject": "Test Spegillinn",
    "description": "Test description",
    "keywords": [
      "keyword1",
      "keyword2"
    ]
  },
  "useUri": true,
  "uri": "https://ruv-vod-app-dcp-v4.secure.footprint.net/opid/vefur/200826thingidamorgun.mp3"
}

This will return a TranscriptJob in the form:

{
  "name": "transcriptjob/ea893d3c-...",
  "startTime": "2020-08-26T20:18:29.017571Z",
  "transcriptMetadata": {
    "fileType": "AUDIO",
    "languageCode": "is-IS",
    "originalUri": "https://ruv-vod-app-dcp-v4.secure.footprint.net/opid/vefur/200826thingidamorgun.mp3",
    "subject": "Test Spegillinn",
    "description": "Test description",
    "keywords": [
      "keyword1",
      "keyword2"
    ]
  }
}
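The submission above can also be built in Python. This is a rough sketch that assumes the same payload shape as payload.json. submit_job_request is a hypothetical helper. It sets the four fields required for URL submission and passes any other metadata (subject, description, keywords) through unchanged. It only builds the request. Sending it with urllib.request.urlopen should return the TranscriptJob JSON shown above.

```python
import json
import urllib.request

SUBMIT_URL = "https://ritari.talgreinir.is/v1alpha1/transcriptjob:submit"


def submit_job_request(token, uri, language_code="is-IS", file_type="AUDIO", **extra_metadata):
    """Build a POST request that submits a media URL for transcription.

    Sets the required fields (metadata.languageCode, metadata.fileType,
    useUri, uri); extra keyword arguments such as subject or keywords are
    merged into `metadata` as-is. Hypothetical helper name.
    """
    payload = {
        "metadata": {"fileType": file_type, "languageCode": language_code, **extra_metadata},
        "useUri": True,
        "uri": uri,
    }
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```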

Transcript jobs for long media files can take a while to process. To check a job's status, send a GET request to /v1alpha1/ followed by the name returned in the response above:

curl -X GET \
  -H "Authorization: Bearer $TIRO_TOKEN" \
  https://ritari.talgreinir.is/v1alpha1/transcriptjob/ea893d3c-...

While the job is being processed (state PROCESSING), a response similar to the following is returned, where progressPercent indicates how much of the audio has been transcribed.

{
  "name": "transcriptjob/ea893d3c-...",
  "state": "PROCESSING",
  "progressPercent": 98,
  "startTime": "2020-08-26T20:18:29.017571Z",
  "lastUpdatedTime": "2020-08-26T20:21:49.869051Z",
  "transcriptMetadata": {
    "fileType": "AUDIO",
    "languageCode": "is-IS",
    "originalUri": "https://ruv-vod-app-dcp-v4.secure.footprint.net/opid/vefur/200826thingidamorgun.mp3",
    "subject": "Test Spegillinn",
    "description": "Test description",
    "keywords": [
      "keyword1",
      "keyword2"
    ]
  }
}

When the job has finished successfully (state SUCCESS), the response looks like the following example, with the transcript field populated. Its value is the transcript name used to retrieve the contents of the transcript.

{
  "name": "transcriptjob/ea893d3c-...",
  "state": "SUCCESS",
  "progressPercent": 100,
  "startTime": "2020-08-26T20:18:29.017571Z",
  "lastUpdatedTime": "2020-08-26T20:22:09.344261Z",
  "transcript": "transcripts/ea893d3c-...",
  "transcriptMetadata": {
    "fileType": "AUDIO",
    "languageCode": "is-IS",
    "originalUri": "https://ruv-vod-app-dcp-v4.secure.footprint.net/opid/vefur/200826thingidamorgun.mp3",
    "subject": "Test Spegillinn",
    "description": "Test description",
    "keywords": [
      "keyword1",
      "keyword2"
    ],
    "recordingDuration": "792.904s"
  }
}
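One way to wait for a job is to poll it until it reaches SUCCESS. The sketch below takes a caller-supplied fetch_job function, for example one that sends the GET request above and decodes the JSON. This keeps it independent of any HTTP client. Only PROCESSING and SUCCESS are documented here, so the following rules are assumptions:

* A missing state (as in the submit response) means the job is still running.
* Any other state means the job has failed.

Check the gRPC reference for the full list of job states. The function name and default intervals are hypothetical.

```python
import time

# Assumption: no "state" yet, or PROCESSING, means the job is still running.
IN_PROGRESS_STATES = {None, "PROCESSING"}


def wait_for_transcript(fetch_job, job_name, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Poll a transcript job until it succeeds and return the transcript name.

    `fetch_job(job_name)` must return the decoded JSON of
    GET /v1alpha1/<job_name>. Any state outside IN_PROGRESS_STATES other
    than SUCCESS raises RuntimeError; exceeding `timeout` raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_name)
        state = job.get("state")
        if state == "SUCCESS":
            return job["transcript"]
        if state not in IN_PROGRESS_STATES:
            raise RuntimeError(f"job {job_name} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_name} still {state} after {timeout}s")
        sleep(interval)
```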

Listing transcripts

List or query transcripts accessible to the authorized user.

GET /v1alpha1/transcripts

Query parameters

Parameter	Description
`pageSize`	Number of results returned per page
`pageToken`	The `nextPageToken` value from a previous response, used to retrieve the next page of results
`filter`	Filter by metadata attached to the transcripts. See filter description.

Filter description

Currently only two filters are available: filtering by subject (or title) and by keywords (or tags).

To filter by subject, specify the filter parameter as `metadata.subject CONTAINS "..."`.

To filter by keywords, specify the filter parameter as `metadata.keywords CONTAINS ["..."]`.
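Filter expressions must be percent-encoded when placed in the query string. This Python sketch builds the two documented expressions and encodes spaces as %20. That reproduces the encoded Kastljós filter used in the cURL example below. The helper names are hypothetical. Two points are assumptions:

* The keyword list is written in JSON list syntax. Behaviour with more than one keyword is not documented here.
* Double quotes inside a search string are not escaped, because the escaping rules are not described.

```python
import json
from urllib.parse import quote, urlencode


def subject_filter(text):
    """Filter on transcripts whose subject contains `text`."""
    # Assumption: `text` contains no double quotes (escaping is undocumented).
    return f'metadata.subject CONTAINS "{text}"'


def keywords_filter(keywords):
    """Filter on keywords, written as a JSON-style list, e.g. ["xyz"]."""
    return "metadata.keywords CONTAINS " + json.dumps(list(keywords), ensure_ascii=False)


def transcripts_query(filter_expr=None, page_size=None, page_token=None):
    """Build the query string for GET /v1alpha1/transcripts.

    quote_via=quote encodes spaces as %20 instead of urlencode's default "+".
    """
    params = {}
    if filter_expr is not None:
        params["filter"] = filter_expr
    if page_size is not None:
        params["pageSize"] = page_size
    if page_token:
        params["pageToken"] = page_token
    return urlencode(params, quote_via=quote)
```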

Examples (cURL)

List transcripts accessible to the authorized user (up to a server-specified default number of results):

curl -X GET -H "Content-Type: application/json" \
            -H "Authorization: Bearer $ACCESS_TOKEN" \
            https://ritari.talgreinir.is/v1alpha1/transcripts | jq

List transcripts that contain a specific string, Kastljós, in the subject:

curl -X GET -H "Content-Type: application/json" \
            -H "Authorization: Bearer $ACCESS_TOKEN" \
            'https://ritari.talgreinir.is/v1alpha1/transcripts?filter=metadata.subject%20CONTAINS%20%22Kastlj%C3%B3s%22' | jq

These requests return a response with the following structure:

{
  "transcripts": [
    {
      "name": "transcripts/...",
      "metadata": {
        "fileType": "VIDEO",
        "languageCode": "is-IS",
        "originalUri": "https://...",
        "subject": "...",
        "description": "",
        "keywords": [
          "xyz"
        ],
        "additionalMetadata": {
          "abc": "xyz"
        },
        "recordingDuration": "1463.382s",
        "waveformUri": "",
        "speakers": {}
      },
      "segments": [],
      "uri": "",
      "version": {
        "name": "...",
        "parent": "...",
        "creationTime": "2022-05-31T20:29:04.469478Z"
      }
    },
    ...
  ],
  "nextPageToken": "2"
}
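To collect more than one page, repeat the request with pageToken set to the previous response's nextPageToken. Stop when no token is returned. The following is a minimal sketch. It assumes a caller-supplied fetch_page function that performs the GET request for a given token and returns the decoded JSON. iter_transcripts is a hypothetical name. Treating a missing or empty nextPageToken as the last page is an assumption, based on the common convention for page-token pagination.

```python
def iter_transcripts(fetch_page, page_size=None):
    """Yield every transcript across all pages.

    `fetch_page(page_token, page_size)` must return the decoded JSON of
    GET /v1alpha1/transcripts for that page; page_token is None for the
    first page. Stops when nextPageToken is missing or empty.
    """
    page_token = None
    while True:
        page = fetch_page(page_token, page_size)
        yield from page.get("transcripts", [])
        page_token = page.get("nextPageToken")
        if not page_token:
            return
```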

Retrieve a transcript

Get the contents and metadata of a transcript identified by the name TRANSCRIPT_NAME, i.e. the contents of the name field described above.

GET /v1alpha1/TRANSCRIPT_NAME

This endpoint returns the metadata in the same structure as when listing transcripts, together with a populated segments field containing the time-aligned segments. The full text of the transcript is obtained by concatenating every word of every segment, in order.
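Each word value already carries the whitespace that follows it (see Create a transcript below), so the full text is a plain concatenation with no separator. Below is a small Python sketch over the decoded JSON response. transcript_text is a hypothetical helper name. It also tolerates segments or words arrays that are absent from the JSON.

```python
def transcript_text(transcript):
    """Concatenate every word of every segment, in order.

    Each "word" value already includes its trailing whitespace, so the
    words are joined without adding a separator.
    """
    return "".join(
        word.get("word", "")
        for segment in transcript.get("segments", [])
        for word in segment.get("words", [])
    )
```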

Examples (cURL)

Retrieve a transcript with the name transcripts/8657e641-...:

curl -X GET -H "Content-Type: application/json" \
            -H "Authorization: Bearer $ACCESS_TOKEN" \
            'https://ritari.talgreinir.is/v1alpha1/transcripts/8657e641-...' | jq

Example response:

{
  "name": "transcripts/8657e641-...",
  "metadata": {
    "fileType": "VIDEO",
    "languageCode": "is-IS",
    "originalUri": "...",
    "subject": "Example subject",
    "description": "",
    "keywords": [
      "examplekeyword"
    ],
    "additionalMetadata": {
      "xyz": "abc"
    },
    "recordingDuration": "1463.382s",
    "waveformUri": "...",
    "speakers": {}
  },
  "segments": [
    {
      "startTime": "18.415s",
      "endTime": "28.196s",
      "words": [
        {
          "startTime": "18.415s",
          "endTime": "18.625s",
          "word": "Gott "
        },
        {
          "startTime": "18.625s",
          "endTime": "18.924s",
          "word": "kvöld "
        },
        {
          "startTime": "18.926s",
          "endTime": "19.016s",
          "word": "og "
        },
        ...
      ],
      "speakerId": ""
    },
    ...,
    {
      "startTime": "1457.024s",
      "endTime": "1463.382s",
      "words": [
        {
          "startTime": "1457.024s",
          "endTime": "1457.114s",
          "word": "af "
        },
        {
          "startTime": "1457.114s",
          "endTime": "1457.294s",
          "word": "hverju "
        },
        ...
      ],
      "speakerId": ""
    }
  ],
  "uri": "https://...",
  "version": {
    "name": "435ae148-...",
    "parent": "7ba2d6f4-...",
    "creationTime": "2022-05-31T20:29:04.469478Z"
  }
}

Create a transcript

Create a user-created transcript from caller-supplied text and timestamps.

POST /v1alpha1/transcripts

The body of the request is a Transcript in the same format as returned when retrieving a transcript. Note that the word field of each word in a segment also includes any whitespace that should appear before the next word in the segment.
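Timestamps use the seconds-with-suffix form shown in the examples (e.g. "18.415s"), which matches the JSON encoding of a protobuf Duration. The sketch below builds one segment from (token, start, end) tuples given in seconds. The helper names are hypothetical. Two choices are assumptions, so adjust them if your text uses different spacing or segment boundaries:

* A single space is appended to each token.
* The segment spans from the first word's start to the last word's end.

```python
def format_duration(seconds):
    """Format seconds as a Duration string with millisecond precision."""
    return f"{seconds:.3f}s"


def make_segment(timed_words):
    """Build one segment from a non-empty list of (token, start, end) tuples.

    Assumptions: each token gets one trailing space, and the segment runs
    from the first word's start to the last word's end.
    """
    return {
        "startTime": format_duration(timed_words[0][1]),
        "endTime": format_duration(timed_words[-1][2]),
        "words": [
            {"startTime": format_duration(start), "endTime": format_duration(end), "word": token + " "}
            for token, start, end in timed_words
        ],
    }
```

A full payload would then be {"metadata": {...}, "segments": [make_segment(...), ...]}. Serialize it with json.dumps(payload, ensure_ascii=False).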

Examples (cURL)

curl -X POST -H "Content-Type: application/json" \
             -H "Authorization: Bearer $ACCESS_TOKEN" \
             'https://ritari.talgreinir.is/v1alpha1/transcripts' -d@payload.json | jq

where payload.json contains:

{
  "metadata": {
    "fileType": "AUDIO",
    "languageCode": "is-IS",
    "subject": "Example subject",
    "keywords": [
      "examplekeyword"
    ],
    "dictation": true
  },
  "segments": [
    {
      "startTime": "18.415s",
      "endTime": "28.196s",
      "words": [
        {
          "startTime": "18.415s",
          "endTime": "18.625s",
          "word": "Gott "
        },
        {
          "startTime": "18.625s",
          "endTime": "18.924s",
          "word": "kvöld "
        },
        {
          "startTime": "18.926s",
          "endTime": "19.016s",
          "word": "og "
        },
        ...
      ]
    },
    ...,
    {
      "startTime": "1457.024s",
      "endTime": "1463.382s",
      "words": [
        {
          "startTime": "1457.024s",
          "endTime": "1457.114s",
          "word": "af "
        },
        {
          "startTime": "1457.114s",
          "endTime": "1457.294s",
          "word": "hverju "
        },
        ...
      ]
    }
  ]
}

Example response:

{"name": "transcripts/8657e641-..."}

The returned name can be used to retrieve this transcript.

Update a transcript

Update the contents and/or metadata of a transcript identified by the name TRANSCRIPT_NAME, i.e. the contents of the name field described above.

PATCH /v1alpha1/TRANSCRIPT_NAME

The body of the request is a partial Transcript in the same format as returned when retrieving a transcript. As when creating a transcript, the word field of each word in a segment also includes any whitespace that should appear before the next word. The updatable fields are segments and metadata; the request only needs to include one of them. The response is the full updated Transcript.
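A partial update like this can be built in Python as follows. The sketch assumes the same Bearer authorization as the other examples. update_transcript_request is a hypothetical helper. It includes only the fields that are passed and refuses to build a request that contains neither.

```python
import json
import urllib.request


def update_transcript_request(token, transcript_name, segments=None, metadata=None):
    """Build a PATCH request for a transcript name such as "transcripts/...".

    Only the updatable fields that are passed end up in the body; at least
    one of segments or metadata must be given.
    """
    body = {}
    if segments is not None:
        body["segments"] = segments
    if metadata is not None:
        body["metadata"] = metadata
    if not body:
        raise ValueError("pass segments, metadata, or both")
    return urllib.request.Request(
        "https://ritari.talgreinir.is/v1alpha1/" + transcript_name,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PATCH",
    )
```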

Examples (cURL)

The following example updates only the segments, i.e. the content of the transcript:

curl -X PATCH -H "Content-Type: application/json" \
             -H "Authorization: Bearer $ACCESS_TOKEN" \
             'https://ritari.talgreinir.is/v1alpha1/transcripts/8657e641-...' -d@payload.json | jq

where payload.json contains:

{
  "segments": [
    {
      "startTime": "18.415s",
      "endTime": "28.196s",
      "words": [
        {
          "startTime": "18.415s",
          "endTime": "18.625s",
          "word": "Vont "
        },
        {
          "startTime": "18.625s",
          "endTime": "18.924s",
          "word": "kvöld "
        },
        {
          "startTime": "18.926s",
          "endTime": "19.016s",
          "word": "og "
        },
        ...
      ]
    },
    ...,
    {
      "startTime": "1457.024s",
      "endTime": "1463.382s",
      "words": [
        {
          "startTime": "1457.024s",
          "endTime": "1457.114s",
          "word": "af "
        },
        {
          "startTime": "1457.114s",
          "endTime": "1457.294s",
          "word": "hverju "
        },
        ...
      ]
    }
  ]
}

Example response:

{
  "name": "transcripts/8657e641-...",
  "metadata": {
    "fileType": "AUDIO",
    "languageCode": "is-IS",
    "dictation": true,
    "dataSource": "DATA_SOURCE_UNSPECIFIED",
    "subject": "Example subject",
    "description": "",
    "keywords": [
      "examplekeyword"
    ],
    "additionalMetadata": {},
    "recordingDuration": null,
    "originalCharLength": 0,
    "originalByteLength": 0,
    "waveformUri": "",
    "speakers": {}
  },
  "segments": [
    {
      "startTime": "18.415s",
      "endTime": "28.196s",
      "words": [
        {
          "startTime": "18.415s",
          "endTime": "18.625s",
          "word": "Vont "
        },
        {
          "startTime": "18.625s",
          "endTime": "18.924s",
          "word": "kvöld "
        },
        {
          "startTime": "18.926s",
          "endTime": "19.016s",
          "word": "og "
        },
        ...
      ]
    },
    ...,
    {
      "startTime": "1457.024s",
      "endTime": "1463.382s",
      "words": [
        {
          "startTime": "1457.024s",
          "endTime": "1457.114s",
          "word": "af "
        },
        {
          "startTime": "1457.114s",
          "endTime": "1457.294s",
          "word": "hverju "
        },
        ...
      ]
    }
  ],
  "uri": "",
  "version": {
    "name": "c77689ff-2ced-4fda-868c-3aab9a2b263a",
    "parent": "54ac33c9-81b1-4bc8-af47-b790fd5c7224",
    "creationTime": "2024-05-16T13:45:28.949372Z"
  }
}

Uploading an audio file for a user-created transcript

Generate an upload URL using:

POST /v1alpha1/initupload

Body fields (JSON)

Field	Description
`resourceName`	The name of the transcript (or other resource) for which to generate an upload URL

Response fields (JSON)

Field	Description
`gcsSignedUrl`	Temporary signed URL to which the file can be uploaded with a `PUT` request

Example (cURL)

Generate an upload URL:

curl -X POST -H "Content-Type: application/json" \
             -H "Authorization: Bearer $TIRO_TOKEN" \
             'https://ritari.talgreinir.is/v1alpha1/initupload' \
             -d '{"resourceName": "transcripts/8657e641-..."}'

This returns a response containing an upload URL:

{
  "gcsSignedUrl": "https://storage.googleapis.com/upload/storage/v1/b/talgreinir-is-transcript-assets/..."
}

The file can then be uploaded to that URL with:

curl -X PUT --data-binary @audio_file.wav \
     "https://storage.googleapis.com/upload/storage/v1/b/talgreinir-is-transcript-assets/..."
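The two upload steps can be chained as in the following Python sketch. The helper names are hypothetical. Following the cURL example, the PUT to the signed URL sends no Authorization header, because the signed URL itself grants temporary access. It also sets no explicit Content-Type. Like curl --data-binary, urllib then defaults to application/x-www-form-urlencoded.

```python
import json
import urllib.request


def init_upload_request(token, resource_name):
    """Build POST /v1alpha1/initupload for a transcript (or other resource) name."""
    return urllib.request.Request(
        "https://ritari.talgreinir.is/v1alpha1/initupload",
        data=json.dumps({"resourceName": resource_name}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def upload_request(signed_url, media_bytes):
    """Build the PUT of raw media bytes to the signed URL (no Authorization header)."""
    return urllib.request.Request(signed_url, data=media_bytes, method="PUT")


def upload_media(token, resource_name, path):
    """Get a signed URL for `resource_name`, upload the file at `path`, return the HTTP status."""
    with urllib.request.urlopen(init_upload_request(token, resource_name)) as response:
        signed_url = json.loads(response.read().decode("utf-8"))["gcsSignedUrl"]
    with open(path, "rb") as media:
        data = media.read()
    with urllib.request.urlopen(upload_request(signed_url, data)) as response:
        return response.status
```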

Once an audio (or video) file has been uploaded for a transcript, the uri field in the response when retrieving that transcript contains a temporary URL for the uploaded file.