Custom Connector Profile

Author: Matillion
Date Posted: Jul 9, 2024
Last Modified: Jul 23, 2024

TCGA Genomic Data Commons Data Portal

Extract case, exposure, diagnosis and gene expression quantification files from The Cancer Genome Atlas (TCGA)

This Data Productivity Cloud Custom Connector extracts and loads open access data from the Genomic Data Commons Data Portal for analysis.

Extract from the Genomic Data Commons Data Portal

Authentication

No authentication is required for open access data.

Endpoints

  • cases - Retrieve the metadata associated with one or more cases, including all nested biospecimen entities
  • geq_files - The “search” part of the GDC API’s “Search and Retrieval” functionality for files

Parameters

The cases endpoint can be configured by setting query parameters:

  • from - the start point for paging. Leaving this at its default value of 0 is recommended
  • size - the number of records per page, default 1000
  • fields - a comma-separated list of fields to extract for every case
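To illustrate how these three query parameters fit together, here is a minimal Python sketch that builds a request URL for cases. It assumes the public GDC API base URL https://api.gdc.cancer.gov/cases and uses hypothetical helper names (build_cases_url, page_offsets); the connector's own internal configuration may differ.

```python
from urllib.parse import urlencode

# Assumption: the public GDC API cases endpoint; the connector may wrap it differently.
GDC_CASES_URL = "https://api.gdc.cancer.gov/cases"


def build_cases_url(fields, start=0, size=1000):
    """Build a cases URL from the three query parameters described above.

    fields: list of field names, joined into a comma-separated string
    start:  value for the `from` paging parameter (default 0)
    size:   number of records per page (default 1000)
    """
    params = {"from": start, "size": size, "fields": ",".join(fields)}
    return f"{GDC_CASES_URL}?{urlencode(params)}"


def page_offsets(total, size=1000):
    """Return the `from` values needed to page through `total` records."""
    return list(range(0, total, size))


if __name__ == "__main__":
    # Example field names; any valid case fields could be listed here.
    print(build_cases_url(["case_id", "submitter_id"]))
    print(page_offsets(2500))
```

The paging helper shows why the default of 0 is a sensible starting point: each subsequent page begins at a multiple of size.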

The geq_files endpoint is a “Search and Retrieval” request that returns a list of gene expression quantification file names; it does not download the files themselves. Users are expected to subsequently download the files one by one or in bulk using the GDC API’s file download functionality. Filters and field selection can be changed by editing the POST body as per the documentation.
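As a rough illustration of what editing the POST body can look like, the Python sketch below assembles a GDC-style search body that filters files by data type and, optionally, by project. The function names, the chosen fields, and the example project ID are assumptions for illustration; the body shipped with the connector may be structured differently.

```python
import json

# Assumption: public GDC API endpoints for file search and file download.
GDC_FILES_URL = "https://api.gdc.cancer.gov/files"
GDC_DATA_URL = "https://api.gdc.cancer.gov/data"


def build_geq_files_body(project_id=None, start=0, size=1000):
    """Build a search body for open access gene expression quantification files."""
    conditions = [
        {"op": "in", "content": {"field": "files.data_type",
                                 "value": ["Gene Expression Quantification"]}},
        {"op": "in", "content": {"field": "files.access", "value": ["open"]}},
    ]
    if project_id is not None:
        # Optional extra filter, e.g. a single TCGA project.
        conditions.append(
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": [project_id]}}
        )
    return {
        "filters": {"op": "and", "content": conditions},
        "fields": "file_id,file_name,cases.submitter_id",  # illustrative selection
        "format": "JSON",
        "from": start,
        "size": size,
    }


def download_url(file_id):
    """URL for downloading a single file by its ID in a follow-up step."""
    return f"{GDC_DATA_URL}/{file_id}"


if __name__ == "__main__":
    # "TCGA-BRCA" is only an example project ID.
    print(json.dumps(build_geq_files_body("TCGA-BRCA", size=10), indent=2))
```

The body would be sent as JSON in a POST request to the files endpoint, and each returned file_id could then be passed to download_url to fetch the file itself.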


Downloads

Licensed under: Matillion Free Subscription License

Installation Instructions

How to Install a Data Productivity Cloud Custom Connector