SRP133500|GSM3375379

Source: MeWo
Organism: H. sapiens
Protocol: 3'READS
Treatment: N/A
Read length: 76
Number of reads: 13,642,284

Versions:

2.0 | GRCh38.96

Reads longer than 15 nt after removing adapters: 10,665,339
Uniquely mapped reads: 6,872,119
Reads that contribute to 3' ends: 5,707,233
Criteria for including reads:
1. the read was mapped to a unique position in the genome
2. the last four nucleotides of the read were perfectly aligned to the genome
3. the last nucleotide of the read was not an adenine
4. the read length was at most 70
5. the read was composed of no more than 80.0% As
6. the read had at most 2 ambiguous nucleotides
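The six criteria above can be sketched as a single predicate. This is only an illustration, not the pipeline's actual code: the `Read` class and its field names are hypothetical, the read is assumed to be stored in sense orientation after adapter removal, and ambiguous nucleotides are assumed to be encoded as `N`.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical summary of one aligned read (field names are illustrative)."""
    sequence: str        # read sequence after adapter removal
    num_alignments: int  # number of genomic positions the read mapped to
    last4_matched: bool  # True if the last four nucleotides aligned perfectly


MAX_LENGTH = 70        # criterion 4
MAX_A_FRACTION = 0.80  # criterion 5
MAX_AMBIGUOUS = 2      # criterion 6


def contributes_to_3p_end(read: Read) -> bool:
    """Return True if the read passes all six inclusion criteria."""
    seq = read.sequence.upper()
    if not seq:
        return False
    return (
        read.num_alignments == 1                           # 1. unique mapping
        and read.last4_matched                             # 2. last four nt aligned
        and seq[-1] != "A"                                 # 3. last nt is not an A
        and len(seq) <= MAX_LENGTH                         # 4. length at most 70
        and seq.count("A") / len(seq) <= MAX_A_FRACTION    # 5. at most 80% As
        and seq.count("N") <= MAX_AMBIGUOUS                # 6. at most 2 ambiguous nt
    )
```

For example, `contributes_to_3p_end(Read("ACGTACGTACGTACGTACGC", 1, True))` passes every check, while the same sequence with `num_alignments=2` is rejected by criterion 1.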
Number of 3' ends: 371,522
Percentage of 3' ends flagged as internal priming: 21.19%
Percentage of reads contributing to 3' ends flagged as internal priming: 12.12%
Criteria for internal priming:
1. more than 7 As within 10 nt downstream of the mapped read
2. more than 6 consecutive As directly downstream of the mapped read
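A minimal sketch of the two internal-priming checks, under two assumptions that the text above does not state: a 3' end is flagged when either criterion holds, and the caller supplies the genomic sequence immediately downstream of the mapped read, already oriented on the read's strand. The function name is hypothetical.

```python
def is_internal_priming(downstream: str) -> bool:
    """Flag a 3' end as likely internal priming.

    `downstream` is the genomic sequence directly 3' of the mapped read,
    written 5'->3' on the strand of the read (an assumption of this sketch).
    """
    seq = downstream.upper()

    # Criterion 1: more than 7 As within the 10 nt downstream window.
    a_rich_window = seq[:10].count("A") > 7

    # Criterion 2: more than 6 consecutive As starting directly downstream.
    leading_a_run = len(seq) - len(seq.lstrip("A"))
    long_a_run = leading_a_run > 6

    # Combining the criteria with OR is an assumption, not stated in the source.
    return a_rich_window or long_a_run
```

The two checks are not redundant: `AAAAAAACGT` has a run of seven As but only seven As in the window, so only the second criterion fires, whereas `AAACAAAAAG` has eight As in the window but a leading run of just three.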

We follow the standard BED specification with 0-based coordinates and append extra columns to it.

The columns represent:

first - chromosome name

second and third - start and end positions of the poly(A) site cluster, respectively

fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based BED coordinates used in the second and third columns. Thus, to convert the position of the representative site to BED coordinates, subtract 1.

fifth - expression (tags per million, tpm) in this sample

sixth - strand on which the cluster is encoded

seventh - percentage of samples that support the particular cluster

eighth - number of different 3' end sequencing protocols that support the particular cluster

ninth - average expression (tags per million, tpm) across all samples

tenth - two-letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, within 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, within 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)

eleventh - information about the poly(A) signal(s) that are present upstream of the poly(A) site, including the motif, the location with respect to the cleavage site and the genomic coordinate
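The column layout above can be read with a short parser. The sketch below makes several assumptions not confirmed by the text: fields are tab-separated, the cluster ID joins its three parts with colons (for example `1:105:+`), and the seventh and eighth columns parse as a float and an integer. All class and function names are hypothetical, and the eleventh column is kept as raw text because its sub-format is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Annotation codes in order of decreasing priority, as listed for column ten.
ANNOTATION_PRIORITY = ["TE", "EX", "IN", "DS", "AE", "AI", "AU", "IG"]


@dataclass
class Cluster:
    chrom: str          # column 1
    start: int          # column 2, 0-based BED start
    end: int            # column 3, BED end
    cluster_id: str     # column 4
    tpm: float          # column 5, expression in this sample
    strand: str         # column 6
    pct_samples: float  # column 7
    num_protocols: int  # column 8
    avg_tpm: float      # column 9
    annotation: str     # column 10
    signals: str        # column 11, left unparsed


def parse_cluster_line(line: str) -> Cluster:
    """Parse one tab-separated line with the eleven columns described above."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}")
    return Cluster(f[0], int(f[1]), int(f[2]), f[3], float(f[4]), f[5],
                   float(f[6]), int(f[7]), float(f[8]), f[9], f[10])


def representative_site_bed(cluster_id: str) -> Tuple[str, int, str]:
    """Split an assumed 'chrom:pos:strand' ID and return a 0-based position."""
    chrom, pos, strand = cluster_id.rsplit(":", 2)
    return chrom, int(pos) - 1, strand  # 1-based ID position -> 0-based BED


def highest_priority(codes: Iterable[str]) -> str:
    """Pick the annotation code that ranks first in the priority order."""
    return min(codes, key=ANNOTATION_PRIORITY.index)
```

With an invented line such as `"1\t100\t110\t1:105:+\t2.5\t+\t40.0\t3\t1.25\tTE\tplaceholder"`, `parse_cluster_line` yields a `Cluster` whose `cluster_id` converts to `("1", 104, "+")` via `representative_site_bed`.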


The data were originally published in: Wang, R. et al. A compendium of conserved cleavage and polyadenylation events in mammalian genes. Genome Res 28(10), 1427-1441 (2018).