Thoth
The Thoth Telescope downloads, transforms and loads publisher ONIX feeds from Thoth into BigQuery. ONIX is a standard format that book publishers use to share information about the books that they have published.
Thoth is a free, open metadata service that publishers can use to store their metadata. Thoth can provide metadata on request in a number of formats. The Thoth telescope uses the Thoth Export API to download metadata in an ONIX format. This API provides a snapshot of a specified publisher’s metadata at the time of the request. It requires the publisher’s ID as part of the URL; the ID can be found using the GraphiQL API.
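As a rough illustration of the download step, the sketch below builds an Export API URL from a publisher ID and saves the response to a file. The host name, the format specification string and the URL layout are assumptions made for this example and are not confirmed by this page; the function names are hypothetical and this is not the telescope's actual implementation. Check the Thoth Export API documentation for the exact values.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed values for illustration only; verify against the Thoth Export API docs.
EXPORT_HOST = "https://export.thoth.pub"
FORMAT_SPEC = "onix_3.0::oapen"


def thoth_export_url(publisher_id, host=EXPORT_HOST, format_spec=FORMAT_SPEC):
    """Build the (assumed) Export API URL for one publisher's ONIX snapshot."""
    safe_id = quote(publisher_id, safe="")  # percent-encode anything unusual in the ID
    return f"{host.rstrip('/')}/specifications/{format_spec}/publisher/{safe_id}"


def download_onix(publisher_id, out_path, timeout=60.0):
    """Download the publisher's current ONIX snapshot and write it to out_path."""
    url = thoth_export_url(publisher_id)
    with urlopen(url, timeout=timeout) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

Calling download_onix("<publisher-id>", "thoth_onix.xml") would then fetch a snapshot reflecting the publisher's metadata at the time of the request, which is why each run produces a complete copy rather than an incremental update.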
The Thoth telescope downloads the ONIX metadata files and then uses the ONIX parser, a Java command line tool, to transform the data into a format suitable for loading into BigQuery. This is nearly identical to the ONIX telescope’s data-transformation step. The transformed data is loaded into BigQuery, where it can be picked up and used by the ONIX Workflow.
The corresponding table in BigQuery is onix.onixYYYYMMDD.
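To make the table naming concrete, the following minimal sketch formats a date-sharded table name from a date. The helper name and its default arguments are hypothetical; only the onix.onixYYYYMMDD pattern comes from this page.

```python
from datetime import date


def sharded_table_id(shard_date, dataset="onix", table="onix"):
    """Return a table name of the form dataset.tableYYYYMMDD for the given date."""
    return f"{dataset}.{table}{shard_date.strftime('%Y%m%d')}"


# Example: the shard for 4 July 2022
print(sharded_table_id(date(2022, 7, 4)))
```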
Summary | |
---|---|
Average runtime | 5-10 mins |
Average download size | 1-10 MB |
Harvest Type | URL |
Harvest Frequency | Weekly |
Runs on remote worker | False |
Catchup missed runs | False |
Credentials Required | No |
Uses Telescope Template | None |
Each shard includes all data | Yes |
Configuration
The following settings need to be configured for the Thoth telescope.
Telescope API Instance
A Thoth Telescope API instance needs to be created. Unlike the ONIX telescope, it does not require any ‘extra’ fields.
Airflow Connections
The Thoth telescope does not require any Airflow connections to run, as the Thoth API is free to use and needs no credentials.