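Nucleotide sequence processing on Linux
- Extract gene information from a gene annotation file
Zea_mays.B73_RefGen_v4.42.gtf is the annotation file, in GTF format. The commands below use these tab-separated columns: $1 seqname (chromosome), $3 feature type, $4 start, $5 end, $9 attributes.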
- Extract position information for all genes
awk -F '\t' '$3=="gene"{print $1","$4","$5","$9}' Zea_mays.B73_RefGen_v4.42.gtf > B73-V442.position.txt
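The command above prints the whole attribute column ($9). If only the gene identifier is needed, it can be cut out with awk's match(). The following is a minimal sketch, assuming the attributes contain a `gene_id "..."` entry; the sample file, the source name `src`, and the gene names (geneA, geneB) are hypothetical and exist only for illustration.

```shell
# Build a tiny hypothetical GTF (tab-separated, 9 columns) for illustration only.
gtf=$(mktemp)
{
  printf '#!genome-build example\n'
  printf '3\tsrc\tgene\t100450300\t100452000\t.\t+\t.\tgene_id "geneA"; gene_biotype "protein_coding";\n'
  printf '3\tsrc\ttranscript\t100450300\t100452000\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA_T001";\n'
  printf '5\tsrc\tgene\t2000\t3500\t.\t-\t.\tgene_id "geneB"; gene_biotype "protein_coding";\n'
} > "$gtf"

# Keep gene rows only and print seqname,start,end,gene_id.
gene_ids=$(awk -F '\t' '$3 == "gene" && match($9, /gene_id "[^"]+"/) {
  # RSTART points at the g of gene_id; skip the 9-character prefix and drop the closing quote.
  id = substr($9, RSTART + 9, RLENGTH - 10)
  print $1 "," $4 "," $5 "," id
}' "$gtf")
printf '%s\n' "$gene_ids"
rm -f "$gtf"
```

On the real annotation file, the same awk program would be given Zea_mays.B73_RefGen_v4.42.gtf instead of the temporary file.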
- Extract position information for all genes on a specified chromosome
Extract chr5
awk -F '\t' '$3=="gene"&&$1=="5"{print $1","$4","$5","$9}' Zea_mays.B73_RefGen_v4.42.gtf > B73-V442.Chr5.csv
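A regular-expression test such as `$1~/5/` selects every seqname that merely contains the digit 5, so unplaced contigs or other sequences with a 5 in their name would be included too. The sketch below compares that with an exact string comparison; the seqname `ctg15`, the source name `src`, and the gene names are hypothetical, and the chromosome is passed in through an assumed variable named `target`.

```shell
# Hypothetical three-line GTF: chromosome 5, a contig whose name contains 5, chromosome 3.
gtf=$(mktemp)
{
  printf '5\tsrc\tgene\t2000\t3500\t.\t-\t.\tgene_id "geneB";\n'
  printf 'ctg15\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "geneC";\n'
  printf '3\tsrc\tgene\t500\t800\t.\t+\t.\tgene_id "geneD";\n'
} > "$gtf"

# Regex match: any seqname containing the character 5.
loose=$(awk -F '\t' '$3 == "gene" && $1 ~ /5/ { print $1 }' "$gtf")

# Exact match: concatenating "" forces a string comparison against the seqname.
exact=$(awk -F '\t' -v target=5 '$3 == "gene" && $1 == (target "") { print $1 }' "$gtf")

printf 'regex match:\n%s\nexact match:\n%s\n' "$loose" "$exact"
rm -f "$gtf"
```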
- Extract chr3 and chr5
awk -F '\t' '$3=="gene"&&($1=="3"||$1=="5"){print $1","$4","$5","$9}' Zea_mays.B73_RefGen_v4.42.gtf
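When more than a couple of chromosomes are wanted, listing them in a variable can be easier to maintain than a long condition. This is one possible sketch: the variable `chrs` and the array `keep` are names chosen here, and the sample rows (including the seqname `35`) are hypothetical.

```shell
# Hypothetical GTF with several seqnames; only 3 and 5 should be selected.
gtf=$(mktemp)
{
  printf '1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "geneX";\n'
  printf '3\tsrc\tgene\t500\t800\t.\t+\t.\tgene_id "geneD";\n'
  printf '35\tsrc\tgene\t10\t90\t.\t+\t.\tgene_id "geneY";\n'
  printf '5\tsrc\tgene\t2000\t3500\t.\t-\t.\tgene_id "geneB";\n'
  printf '5\tsrc\ttranscript\t2000\t3500\t.\t-\t.\tgene_id "geneB";\n'
} > "$gtf"

# Split the comma-separated list into a lookup table, then test membership with "in".
picked=$(awk -F '\t' -v chrs='3,5' '
  BEGIN { n = split(chrs, a, ","); for (i = 1; i <= n; i++) keep[a[i]] = 1 }
  $3 == "gene" && ($1 in keep) { print $1 "," $4 "," $5 }
' "$gtf")
printf '%s\n' "$picked"
rm -f "$gtf"
```

Because the lookup is by whole seqname, `35` is not selected even though it starts with 3 and contains 5.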
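- Extract all genes within a specified interval on a specified chromosome
On chromosome 3, list all genes, with their position information, that lie entirely inside the interval whose left boundary is 100450260 and right boundary is 112424507
awk -F '\t' '$3=="gene"&&$1=="3"&&$4>100450260&&$5<112424507{print $1","$4","$5","$9}' Zea_mays.B73_RefGen_v4.42.gtf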
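The condition above keeps only genes whose start and end both fall strictly inside the interval, so a gene that straddles either boundary is dropped. Whether that is desired depends on the analysis. The sketch below passes the chromosome and boundaries in as variables (`target`, `lo`, `hi` are names assumed here) and shows the strict version next to an overlap version; the sample genes are hypothetical.

```shell
# Hypothetical genes around the interval 100450260..112424507 on chromosome 3.
gtf=$(mktemp)
{
  printf '3\tsrc\tgene\t100450000\t100450500\t.\t+\t.\tgene_id "geneE";\n'  # straddles the left boundary
  printf '3\tsrc\tgene\t100450300\t100452000\t.\t+\t.\tgene_id "geneA";\n'  # fully inside
  printf '3\tsrc\tgene\t112424000\t112425000\t.\t-\t.\tgene_id "geneF";\n'  # straddles the right boundary
  printf '3\tsrc\tgene\t200\t300\t.\t+\t.\tgene_id "geneG";\n'              # far outside
  printf '5\tsrc\tgene\t100450300\t100452000\t.\t+\t.\tgene_id "geneH";\n'  # other chromosome
} > "$gtf"

# Strict containment, same logic as the one-liner above.
contained=$(awk -F '\t' -v target=3 -v lo=100450260 -v hi=112424507 '
  $3 == "gene" && $1 == (target "") && $4 > lo && $5 < hi { print $1 "," $4 "," $5 }
' "$gtf")

# Alternative (an assumption, not the rule used above): any gene overlapping the interval.
overlapping=$(awk -F '\t' -v target=3 -v lo=100450260 -v hi=112424507 '
  $3 == "gene" && $1 == (target "") && $5 >= lo && $4 <= hi { print $1 "," $4 "," $5 }
' "$gtf")

printf 'contained:\n%s\noverlapping:\n%s\n' "$contained" "$overlapping"
rm -f "$gtf"
```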