PolyASite - Exploring 3' end processing

PolyASite hosts the following poly(A) site annotations:

+ Caenorhabditis elegans: v2.0 (WBcel235)

Read more about the clustering process that yields our curated datasets of poly(A) sites.

A clustering procedure has been implemented to group together closely spaced poly(A) sites that most likely result from imprecision in cleavage or processing.

For this, individual cleavage positions in the genome are sorted, first by the number of samples in which a site has been identified and, for equal numbers of samples, by the total tags per million (TPM).

The list is then traversed from the most supported sites to the least supported ones, and the clustering procedure is applied. We have determined that the number of clusters decreases rapidly up to a distance of 12 nucleotides around the most used cluster representative. Thus, we constructed clusters by grouping sites with lower read support that were located from -12 to +12 nucleotides around sites with strong support.
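The ranking and the first clustering pass described above can be illustrated with a minimal Python sketch. It is not the PolyASite implementation: the `Site` record, the `cluster_sites` function and the example values are hypothetical, and the sketch covers only the +/-12 nucleotide grouping, not the internal priming filter or the later merging passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    chrom: str
    strand: str
    pos: int        # cleavage position (hypothetical example values below)
    n_samples: int  # number of samples in which the site was identified
    tpm: float      # total tags per million across samples


def cluster_sites(sites, max_dist=12):
    """Greedy pass: well-supported sites claim weaker neighbours within max_dist."""
    # Rank by sample support first, then by total TPM, both descending.
    ranked = sorted(sites, key=lambda s: (-s.n_samples, -s.tpm))
    assigned = [False] * len(ranked)
    clusters = []
    for i, rep in enumerate(ranked):
        if assigned[i]:
            continue  # already absorbed by a better-supported site
        assigned[i] = True
        members = [rep]
        for j in range(i + 1, len(ranked)):
            other = ranked[j]
            if (not assigned[j]
                    and other.chrom == rep.chrom
                    and other.strand == rep.strand
                    and abs(other.pos - rep.pos) <= max_dist):
                assigned[j] = True
                members.append(other)
        clusters.append((rep, members))
    return clusters


example_sites = [
    Site("chr1", "+", 100, 5, 10.0),
    Site("chr1", "+", 108, 2, 3.0),
    Site("chr1", "+", 113, 3, 1.0),
    Site("chr1", "+", 130, 1, 0.5),
    Site("chr1", "-", 101, 1, 9.0),
]
for rep, members in cluster_sites(example_sites):
    print(rep.chrom, rep.strand, rep.pos, [m.pos for m in members])
```

The quadratic inner loop keeps the sketch short; a genome-scale version would index sites by chromosome, strand and position instead.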

Clusters that were flagged as 'putative internal priming (IP) clusters' (because one of their poly(A) sites resides within an A-rich poly(A) signal) were retained if

a) they shared a poly(A) signal with a non-IP cluster downstream, in which case they were merged into the downstream cluster, or
b) their most downstream poly(A) signal was at least 15 nucleotides upstream of the most distal poly(A) site.

In another pass, sites that were located within 25 nucleotides of each other were clustered together. Finally, for clusters with no annotated poly(A) signals, a more permissive distance of 50 nucleotides was used in clustering.

Mus musculus: v2.0 (GRCm38.96)

Date of release: 20/04/2020

Total reads: 1,167,552,603

3'-end sequencing libraries: 178
See all contributing samples.

GSM1089085
GSM1089086
GSM1089087
GSM1089088
GSM1089089
GSM1089090
GSM1089091
GSM1089092
GSM1089093
GSM1089094
GSM1089095
GSM1089096
GSM1268946
GSM1268947
GSM1268948
GSM1268949
GSM1268950
GSM1268951
GSM1268952
GSM1268953
GSM1268954
GSM1268955
GSM1268956
GSM1268957
GSM1268958
GSM1327166
GSM1327167
GSM1327168
GSM1327169
GSM1480973
GSM1480974
GSM1480975
GSM1480976
GSM1480977
GSM1480978
GSM1480979
GSM1480980
GSM1518071
GSM1518072
GSM1518073
GSM1518074
GSM1518075
GSM1518076
GSM1518077
GSM1518078
GSM1518079
GSM1518080
GSM1518081
GSM1518082
GSM1518083
GSM1518084
GSM1518085
GSM1518086
GSM1518087
GSM1518088
GSM1518089
GSM1518090
GSM1518091
GSM1518092
GSM1518093
GSM1518094
GSM1518095
GSM1518096
GSM1518097
GSM1518098
GSM1518099
GSM1518100
GSM1518101
GSM1518102
GSM1518103
GSM1518104
GSM1518105
GSM1518106
GSM1518107
GSM1518108
GSM1518109
GSM1518110
GSM1518111
GSM1518112
GSM1518113
GSM1518114
GSM1586363
GSM1586364
GSM1586365
GSM1586366
GSM1586367
GSM1586368
GSM1865359
GSM1865360
GSM1865361
GSM1865362
GSM1865363
GSM1865364
GSM1865365
GSM1906926
GSM1906927
GSM1906928
GSM1906929
GSM1906930
GSM1906931
GSM2467568
GSM2467569
GSM2467570
GSM2467571
GSM2467572
GSM2467573
GSM2467574
GSM2467575
GSM2467576
GSM2467577
GSM2467578
GSM2467579
GSM2467580
GSM2467581
GSM2467582
GSM2717200
GSM2717201
GSM2717202
GSM2717203
GSM2717204
GSM2717205
GSM2717206
GSM2717207
GSM2717208
GSM2717209
GSM2717222
GSM2717223
GSM2717224
GSM2717225
GSM2717226
GSM2717227
GSM2717228
GSM2717229
GSM2901339
GSM2901340
GSM2901341
GSM2901342
GSM2901343
GSM2901344
GSM2901345
GSM2901346
GSM2901347
GSM2901348
GSM2901349
GSM2901350
GSM2901351
GSM3022814
GSM3022815
GSM3022816
GSM3022817
GSM3022818
GSM3022819
GSM3022820
GSM3022821
GSM3022822
GSM3022823
GSM3022824
GSM3022825
GSM3022826
GSM3022827
GSM3022828
GSM624687
GSM747481
GSM747482
GSM747483
GSM747484
GSM747485
SRX304982
SRX304983
SRX480169
SRX480179
SRX480205
SRX480212
SRX480221
SRX480227
SRX480229
SRX480250
SRX480287

Protocols: 3'READS, DRS, SAPAS, PAPERCLIP, 2P-Seq, PolyA-seq, PAS-Seq, A-seq, 3P-Seq

Number of poly(A) site clusters: 301,006

Percentage of clusters with poly(A) signal: 72
List of poly(A) signals.

Poly(A) signals are considered if they reside in a region from 60 nt upstream to 10 nt downstream of one of the poly(A) sites of a cluster:

AATAAA
ATTAAA
TATAAA
AGTAAA
AATACA
CATAAA
AATATA
GATAAA
AATGAA
AATAAT
AAGAAA
ACTAAA
AATAGA
ATTACA
AACAAA
ATTATA
AACAAG
AATAAG
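As an illustration of the window rule, the following Python sketch scans a sequence for the 18 hexamers listed above. It rests on assumptions not stated on this page: the sequence is given in transcript orientation, `site` is a 0-based index of the cleavage position, a hexamer must lie entirely inside the window, and the reported offset refers to the first nucleotide of the motif. The function name `find_signals` is hypothetical.

```python
POLYA_SIGNALS = frozenset([
    "AATAAA", "ATTAAA", "TATAAA", "AGTAAA", "AATACA", "CATAAA",
    "AATATA", "GATAAA", "AATGAA", "AATAAT", "AAGAAA", "ACTAAA",
    "AATAGA", "ATTACA", "AACAAA", "ATTATA", "AACAAG", "AATAAG",
])


def find_signals(seq, site, upstream=60, downstream=10):
    """Return (motif, offset) pairs for hexamers inside the search window.

    offset is the start of the motif relative to the site (negative = upstream).
    """
    seq = seq.upper()
    lo = max(0, site - upstream)
    hi = min(len(seq), site + downstream)
    hits = []
    # Assumption: the whole hexamer has to fit inside [lo, hi).
    for start in range(lo, hi - 5):
        hexamer = seq[start:start + 6]
        if hexamer in POLYA_SIGNALS:
            hits.append((hexamer, start - site))
    return hits


# Synthetic sequence with a single canonical signal 26 nt upstream of the site.
toy_seq = "C" * 40 + "AATAAA" + "C" * 20 + "G" * 20
print(find_signals(toy_seq, site=66))
```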

Cluster annotations

Terminal exon:	101,531
Other exon:	13,079
Intron:	58,184
Downstream of terminal exon:	14,657
Antisense exon:	4,386
Antisense intron:	34,671
Antisense upstream of a gene:	4,124
Intergenic:	70,374
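Each cluster appears to be assigned exactly one annotation category, so the eight counts should add up to the reported number of clusters. A short Python check, using only the figures listed above (the variable names are illustrative), confirms this and converts the counts into percentages:

```python
mouse_counts = {
    "Terminal exon": 101_531,
    "Other exon": 13_079,
    "Intron": 58_184,
    "Downstream of terminal exon": 14_657,
    "Antisense exon": 4_386,
    "Antisense intron": 34_671,
    "Antisense upstream of a gene": 4_124,
    "Intergenic": 70_374,
}

total = sum(mouse_counts.values())
# Share of each category, in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in mouse_counts.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```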

Add as custom track @ UCSC genome browser

Atlas BED file

We follow the standard BED specification with 0-based coordinates. Additionally, we appended extra column(s). For more information click here.

The columns represent:

first - chromosome name

second and third - start and end positions of the poly(A) site cluster, respectively

fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based BED coordinates used in the second and third columns. Thus, to convert the position of the representative site to BED coordinates, subtract 1.

fifth - average expression (tags per million, tpm) across all samples

sixth - strand on which the cluster is encoded

seventh - percentage of samples that support the particular cluster

eighth - number of different 3' end sequencing protocols that support the particular cluster

ninth - average expression (tags per million, tpm) across all samples

tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)

eleventh - information about the poly(A) signal(s) that are present upstream of the poly(A) site, including the motif, the location with respect to the cleavage site and the genomic coordinate
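The column layout above can be read with a few lines of Python. This is a sketch under assumptions: the cluster ID is taken to be colon-separated (`chromosome:position:strand`), which this page does not spell out, the poly(A) signal column is kept as an unparsed string, and the example line is made up for illustration rather than taken from the atlas. The names `Cluster`, `parse_bed_line` and `ANNOTATION_CODES` are hypothetical.

```python
from typing import NamedTuple

# Two-letter annotation codes, in order of decreasing priority.
ANNOTATION_CODES = {
    "TE": "terminal exon",
    "EX": "exonic",
    "IN": "intronic",
    "DS": "1,000 nt downstream of an annotated terminal exon",
    "AE": "anti-sense to an exon",
    "AI": "anti-sense to an intron",
    "AU": "1,000 nt upstream in anti-sense direction of a transcription start site",
    "IG": "intergenic",
}


class Cluster(NamedTuple):
    chrom: str
    start: int            # 0-based BED start
    end: int              # BED end
    cluster_id: str
    rep_site_0based: int  # representative site converted to BED coordinates
    avg_tpm: float
    strand: str
    pct_samples: float
    n_protocols: int
    annotation: str
    signals: str          # left unparsed in this sketch


def parse_bed_line(line):
    f = line.rstrip("\n").split("\t")
    # Assumed ID layout: chromosome:position:strand, with a 1-based position.
    _, rep_1based, _ = f[3].rsplit(":", 2)
    return Cluster(
        chrom=f[0], start=int(f[1]), end=int(f[2]), cluster_id=f[3],
        rep_site_0based=int(rep_1based) - 1,  # subtract 1 for BED coordinates
        avg_tpm=float(f[4]), strand=f[5], pct_samples=float(f[6]),
        n_protocols=int(f[7]), annotation=f[9], signals=f[10],
    )


# Made-up example line, not a real atlas record.
example_line = "1\t1001000\t1001020\t1:1001018:+\t2.5\t+\t40.0\t3\t2.5\tTE\tAATAAA"
cluster = parse_bed_line(example_line)
print(cluster.rep_site_0based, ANNOTATION_CODES[cluster.annotation])
```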

Alternatively, you can download our atlas with average TPMs for all contributing samples as a tab separated file:

Atlas with Samples TSV

For more information on columns click here.

The columns represent:

first - chromosome name

second and third - start and end positions of the poly(A) site cluster, respectively

fifth - average expression (tags per million, tpm) across all samples

sixth - strand on which the cluster is encoded

seventh - representative poly(A) site of the cluster

eighth - percentage of samples that support the particular cluster

ninth - number of different 3' end sequencing protocols that support the particular cluster

eleventh - gene symbols for annotated genes overlapping with the cluster

twelfth - Ensembl gene IDs for annotated genes overlapping with the cluster

thirteenth - poly(A) signals in the region of -60 to +10 nucleotides around the representative site of the cluster with relative (e.g. @-28) and absolute position on the chromosome (e.g. @1001018)

fourteenth onwards - Sample information, consisting of SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT
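The pipe-delimited sample information can be split into named fields. The Python sketch below assumes that this information sits in the header row of the TSV, from the fourteenth column onwards, and that none of the five fields contains a `|` character; neither point is confirmed on this page. The names `SampleInfo`, `parse_sample_label` and `sample_columns` are hypothetical, as are the placeholder values in the example.

```python
from typing import NamedTuple


class SampleInfo(NamedTuple):
    sample_id: str
    protocol: str
    source: str
    title: str
    treatment: str


def parse_sample_label(label):
    """Split one SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT label."""
    parts = label.split("|")
    if len(parts) != 5:
        raise ValueError(f"expected 5 pipe-separated fields, got {len(parts)}: {label!r}")
    return SampleInfo(*parts)


def sample_columns(header_line):
    """Return SampleInfo records for the fourteenth column onwards."""
    fields = header_line.rstrip("\n").split("\t")
    return [parse_sample_label(f) for f in fields[13:]]


# Placeholder header: 13 leading columns followed by two sample labels.
header = "\t".join(
    [f"col{i}" for i in range(1, 14)]
    + ["SAMPLE_A|PROTOCOL_X|source|title|treatment",
       "SAMPLE_B|PROTOCOL_Y|source|title|none"]
)
for info in sample_columns(header):
    print(info.sample_id, info.protocol)
```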

Homo sapiens: v2.0 (GRCh38.96)

Date of release: 21/04/2020

Total reads: 1,104,077,259

3'-end sequencing libraries: 221
See all contributing samples.

Protocols: 3'-Seq (Mayr), 3'READS, DRS, QuantSeq_REV, SAPAS, PAPERCLIP, PolyA-seq, PAS-Seq, A-seq, 3P-Seq

Number of poly(A) site clusters: 569,005

Percentage of clusters with poly(A) signal: 76
List of poly(A) signals.

Poly(A) signals are considered if they reside in a region from 60 nt upstream to 10 nt downstream of one of the poly(A) sites of a cluster:

AATAAA
ATTAAA
TATAAA
AGTAAA
AATACA
CATAAA
AATATA
GATAAA
AATGAA
AATAAT
AAGAAA
ACTAAA
AATAGA
ATTACA
AACAAA
ATTATA
AACAAG
AATAAG

Cluster annotations

Terminal exon:	143,658
Other exon:	21,804
Intron:	165,859
Downstream of terminal exon:	19,865
Antisense exon:	16,135
Antisense intron:	78,441
Antisense upstream of a gene:	4,353
Intergenic:	118,890

Add as custom track @ UCSC genome browser

Atlas BED file

We follow the standard BED specification with 0-based coordinates. Additionally, we appended extra column(s). For more information click here.

The columns represent:

first - chromosome name

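second and third - start and end positions of the poly(A) site cluster, respectively

fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based BED coordinates used in the second and third columns. Thus, to convert the position of the representative site to BED coordinates, subtract 1.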

fifth - average expression (tags per million, tpm) across all samples

sixth - strand on which the cluster is encoded

seventh - percentage of samples that support the particular cluster

eighth - number of different 3' end sequencing protocols that support the particular cluster

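ninth - average expression (tags per million, tpm) across all samples

tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)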

eleventh - information about the poly(A) signal(s) that are present upstream of the poly(A) site, including the motif, the location with respect to the cleavage site and the genomic coordinate

Alternatively, you can download our atlas with average TPMs for all contributing samples as a tab separated file:

Atlas with Samples TSV

For more information on columns click here.

The columns represent:

first - chromosome name

second and third - start and end positions of the poly(A) site cluster, respectively

fifth - average expression (tags per million, tpm) across all samples

sixth - strand on which the cluster is encoded

seventh - representative poly(A) site of the cluster

eighth - percentage of samples that support the particular cluster

ninth - number of different 3' end sequencing protocols that support the particular cluster

eleventh - gene symbols for annotated genes overlapping with the cluster

twelfth - Ensembl gene IDs for annotated genes overlapping with the cluster

thirteenth - poly(A) signals in the region of -60 to +10 nucleotides around the representative site of the cluster with relative (e.g. @-28) and absolute position on the chromosome (e.g. @1001018)

fourteenth onwards - Sample information, consisting of SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT

Caenorhabditis elegans: v2.0 (WBcel235)

Date of release: 17/04/2020

Total reads: 67,268,436

3'-end sequencing libraries: 22
See all contributing samples.

Protocols: PAT-seq, 3P-Seq

Number of poly(A) site clusters: 20,931

Percentage of clusters with poly(A) signal: 81
List of poly(A) signals.

Poly(A) signals are considered if they reside in a region from 60 nt upstream to 10 nt downstream of one of the poly(A) sites of a cluster:

AATAAA
ATTAAA
TATAAA
AGTAAA
AATACA
CATAAA
AATATA
GATAAA
AATGAA
AATAAT
AAGAAA
ACTAAA
AATAGA
ATTACA
AACAAA
ATTATA
AACAAG
AATAAG

Cluster annotations

Terminal exon:	13,885
Other exon:	660
Intron:	567
Downstream of terminal exon:	5,023
Antisense exon:	78
Antisense intron:	326
Antisense upstream of a gene:	85
Intergenic:	307

Add as custom track @ UCSC genome browser

Atlas BED file

We follow the standard BED specification with 0-based coordinates. Additionally, we appended extra column(s). For more information click here.

The columns represent:

first - chromosome name

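second and third - start and end positions of the poly(A) site cluster, respectively

fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based BED coordinates used in the second and third columns. Thus, to convert the position of the representative site to BED coordinates, subtract 1.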

fifth - average expression (tags per million, tpm) across all samples

sixth - strand on which the cluster is encoded

seventh - percentage of samples that support the particular cluster

eighth - number of different 3' end sequencing protocols that support the particular cluster

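ninth - average expression (tags per million, tpm) across all samples

tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)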

eleventh - information about the poly(A) signal(s) that are present upstream of the poly(A) site, including the motif, the location with respect to the cleavage site and the genomic coordinate

Alternatively, you can download our atlas with average TPMs for all contributing samples as a tab separated file:

Atlas with Samples TSV

For more information on columns click here.

The columns represent:

first - chromosome name

second and third - start and end positions of the poly(A) site cluster, respectively

fifth - average expression (tags per million, tpm) across all samples

sixth - strand on which the cluster is encoded

seventh - representative poly(A) site of the cluster

eighth - percentage of samples that support the particular cluster

ninth - number of different 3' end sequencing protocols that support the particular cluster

eleventh - gene symbols for annotated genes overlapping with the cluster

twelfth - Ensembl gene IDs for annotated genes overlapping with the cluster

thirteenth - poly(A) signals in the region of -60 to +10 nucleotides around the representative site of the cluster with relative (e.g. @-28) and absolute position on the chromosome (e.g. @1001018)

fourteenth onwards - Sample information, consisting of SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT

Want to use the newest version? Check out the single cell atlas.

Missing older versions of our atlas? Find it in our archive.