.. _Search_tutorial_label:

Search Tutorial
===============

1.1 Load packages
-----------------

.. code:: ipython3

    import sys
    import pandas as pd
    import ECAUGT
    import time
    import multiprocessing
    import numpy as np

1.2 Connect to server
---------------------

.. code:: ipython3

    # set connection parameters
    endpoint = "https://HCAd-Datasets.cn-beijing.ots.aliyuncs.com"
    access_id = "LTAI5t7t216W9amUD1crMVos"  # enter your ID and key
    access_key = "ZJPlUbpLCij5qUPjbsU8GnQHm97IxJ"
    instance_name = "HCAd-Datasets"
    table_name = 'HCA_d'

.. code:: ipython3

    # set up the client
    ECAUGT.Setup_Client(endpoint, access_id, access_key, instance_name, table_name)

.. parsed-literal::

    Connected to the server, find the table.
    HCA_d
    TableName: HCA_d
    PrimaryKey: [('cid', 'INTEGER')]
    Reserved read throughput: 0
    Reserved write throughput: 0
    Last increase throughput time: 1605795297
    Last decrease throughput time: None
    table options's time to live: -1
    table options's max version: 1
    table options's max_time_deviation: 86400

.. parsed-literal::

    0

1.3 Build index
---------------

Before searching, check whether the index has been built.

.. code:: ipython3

    ECAUGT.build_index()

.. parsed-literal::

    index already exist.

2. Search cells with metadata conditions
----------------------------------------

A condition is a structured string that combines one or more logical expressions. Each logical expression takes one of the following forms:

::

    field_name1 == value1, here '==' means equal
    field_name2 <> value2, here '<>' means unequal

Three operators combine or negate expressions:

::

    logical_expression1 && logical_expression2, here '&&' means AND operation
    logical_expression1 || logical_expression2, here '||' means OR operation
    ! logical_expression1, here '!' means NOT operation

Brackets can be used to group expressions, and operator precedence follows the usual convention. The parser also tolerates extra spaces, such as the trailing space in the query below.

.. code:: ipython3

    # get the primary keys of all matching cells
    rows_to_get = ECAUGT.search_metadata("organ == Lung && cell_type == T cell ")

.. parsed-literal::

    14894 cells found

The query matches 14894 cells, and the variable ``rows_to_get`` is a list containing their primary keys.

3. Download data
----------------

We first download three columns for the queried cells and return them as a pandas DataFrame; the primary key ``cid`` appears as the first (index) column of the result. For illustration, we download only the first 20 cells.

.. code:: ipython3

    # keep only the first 20 primary keys
    rows_to_get_2 = rows_to_get[0:20]

3.1 Download columns of interest
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    # download the selected columns as a pandas DataFrame,
    # using all but one CPU core (and at least one thread)
    ECAUGT.get_columnsbycell_para(rows_to_get=rows_to_get_2,
                                  cols_to_get=['cl_name', 'hcad_name', 'cell_type'],
                                  col_filter=None,
                                  do_transfer=True,
                                  thread_num=max(1, multiprocessing.cpu_count() - 1))


=======  =========  =======  =======================================
cid      cell_type  cl_name  uHAF_name
=======  =========  =======  =======================================
2000932  T cell     T cell   Lung-Connective tissue-T cell-CD3D CD8A
2000962  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2000971  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2000978  T cell     T cell   Lung-Connective tissue-T cell-CD3D CD8A
2000987  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2000994  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001027  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001030  T cell     T cell   Lung-Connective tissue-T cell-CD3D CD8A
2001031  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001032  T cell     T cell   Lung-Connective tissue-T cell-CD3D CD8A
2001038  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001050  T cell     T cell   Lung-Connective tissue-T cell-CD3D CD8A
2001059  T cell     T cell   Lung-Connective tissue-T cell-CD3D CD8A
2001065  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001086  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001091  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001099  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001105  T cell     T cell   Lung-Connective tissue-T cell-CD3D IL32
2001106  T cell     T cell   Lung-Connective tissue-T cell-CD3D CD8A
2001112  T cell     T cell   Lung-Connective tissue-T cell-CD3D CD8A
=======  =========  =======  =======================================
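
The condition string passed to ``ECAUGT.search_metadata`` in Section 2 can also be assembled from a Python dictionary, which is convenient when field values come from variables. The helper below, ``build_condition``, is a hypothetical sketch and not part of ECAUGT: it joins one expression per field with ``&&``, turns a list of accepted values into a bracketed ``||`` group, and writes excluded values with ``<>``. It assumes values can be written unquoted, as in the tutorial's own query, and the organ value ``Liver`` is purely illustrative.

.. code:: ipython3

    def build_condition(include, exclude=None):
        """Assemble a metadata condition string from field/value pairs.

        Hypothetical helper (not part of ECAUGT). Values are written
        unquoted, as in the tutorial's own query string.
        """
        parts = []
        for field, value in include.items():
            if isinstance(value, (list, tuple)):
                # several accepted values -> bracketed OR-group
                group = " || ".join(f"{field} == {v}" for v in value)
                parts.append(f"({group})")
            else:
                parts.append(f"{field} == {value}")
        for field, value in (exclude or {}).items():
            # excluded value -> 'unequal' expression
            parts.append(f"{field} <> {value}")
        # every expression must hold -> join with AND
        return " && ".join(parts)


    print(build_condition({"organ": "Lung", "cell_type": "T cell"}))
    # organ == Lung && cell_type == T cell
    print(build_condition({"organ": ["Lung", "Liver"]}, exclude={"cell_type": "T cell"}))
    # (organ == Lung || organ == Liver) && cell_type <> T cell

A string built this way can be passed straight to the search, for example ``ECAUGT.search_metadata(build_condition({"organ": "Lung", "cell_type": "T cell"}))``.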

=======  ========  ========  ==============  =======================================
cid      CD3D      PTPRC     donor_id        uHAF_name
=======  ========  ========  ==============  =======================================
2000962  2.598072  2.229140  343B            Lung-Connective tissue-T cell-CD3D IL32
2000987  2.138744  1.790511  343B            Lung-Connective tissue-T cell-CD3D IL32
2000994  3.055748  2.269341  343B            Lung-Connective tissue-T cell-CD3D IL32
2001099  2.682663  1.864017  343B            Lung-Connective tissue-T cell-CD3D IL32
2001112  2.417518  1.482966  343B            Lung-Connective tissue-T cell-CD3D CD8A
...      ...       ...       ...             ...
2115395  2.729593  2.729593  FetalLung1_12W  Lung-Connective tissue-T cell-CD3D CD1C
2115433  3.851911  3.179780  FetalLung1_12W  Lung-Connective tissue-T cell-CD3D CD1C
2115441  3.591656  2.546684  FetalLung1_12W  Lung-Connective tissue-T cell-CD3D CD1C
2115483  3.181991  3.181991  FetalLung1_12W  Lung-Connective tissue-T cell-CD3D CD1C
2115498  3.124087  3.124087  FetalLung1_12W  Lung-Connective tissue-T cell-CD3D CD1C
=======  ========  ========  ==============  =======================================

7403 rows × 4 columns
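
To make the grouping and precedence rules of the condition grammar from Section 2 concrete, the sketch below re-implements that grammar locally as a small recursive-descent parser over a pandas DataFrame like the tables above. ``filter_by_condition`` is a hypothetical helper, not part of ECAUGT, and the server-side parser may differ in details. It assumes that values contain none of the operator characters, compares every column as text, and gives ``!`` higher precedence than ``&&`` and ``&&`` higher precedence than ``||`` (the conventional order). The three example rows are copied from the table above.

.. code:: ipython3

    import re

    import pandas as pd

    # Operators of the condition grammar; the capturing group keeps them
    # as separate tokens when the string is split.
    _TOKEN = re.compile(r"(==|<>|&&|\|\||!|\(|\))")


    def filter_by_condition(df, condition):
        """Return the rows of ``df`` that satisfy a metadata condition string.

        Hypothetical local re-implementation, not part of ECAUGT. Assumes
        values contain no operator characters; columns are compared as text.
        """
        # split on operators, drop surrounding spaces and empty pieces
        tokens = [t.strip() for t in _TOKEN.split(condition) if t.strip()]
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def take():
            nonlocal pos
            if pos >= len(tokens):
                raise ValueError("unexpected end of condition")
            pos += 1
            return tokens[pos - 1]

        def expect(tok):
            if take() != tok:
                raise ValueError(f"expected {tok!r}")

        # Assumed precedence: '!' binds tightest, then '&&', then '||'.
        def parse_or():
            mask = parse_and()
            while peek() == "||":
                take()
                mask = mask | parse_and()
            return mask

        def parse_and():
            mask = parse_unary()
            while peek() == "&&":
                take()
                mask = mask & parse_unary()
            return mask

        def parse_unary():
            if peek() == "!":
                take()
                return ~parse_unary()
            if peek() == "(":
                take()
                mask = parse_or()
                expect(")")
                return mask
            # a single comparison: field ('==' | '<>') value
            field, op, value = take(), take(), take()
            column = df[field].astype(str)
            if op == "==":
                return column == value
            if op == "<>":
                return column != value
            raise ValueError(f"unknown comparison operator {op!r}")

        mask = parse_or()
        if peek() is not None:
            raise ValueError(f"unexpected token {peek()!r}")
        return df[mask]


    # three rows copied from the table above
    example = pd.DataFrame(
        {
            "donor_id": ["343B", "343B", "FetalLung1_12W"],
            "uHAF_name": [
                "Lung-Connective tissue-T cell-CD3D IL32",
                "Lung-Connective tissue-T cell-CD3D CD8A",
                "Lung-Connective tissue-T cell-CD3D CD1C",
            ],
        },
        index=pd.Index([2000962, 2001112, 2115395], name="cid"),
    )

    print(filter_by_condition(
        example, "donor_id == 343B && uHAF_name <> Lung-Connective tissue-T cell-CD3D CD8A"))

Because ``&&`` binds tighter than ``||`` in this sketch, ``a || b && c`` is read as ``a || (b && c)``. Adding brackets whenever a different grouping is intended keeps a query unambiguous regardless of how the server orders the operators.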