PolyASite hosts the following poly(A) site annotations:
Read more about the clustering process that yields our curated datasets of poly(A) sites.
A clustering procedure has been implemented to group closely spaced poly(A) sites that most likely result from imprecision in cleavage or processing.
For this, individual cleavage positions in the genome are sorted, first by the number of samples in which a site has been identified and, for equal numbers of samples, by the total tags per million (TPM).
The list is then traversed from most supported sites to least supported ones and the clustering procedure is applied. We have determined that the number of clusters decreases rapidly up to a distance of 12 nucleotides around the most used cluster representative. Thus, we constructed clusters of sites by grouping sites with lower read support that were located from -12 to +12 nucleotides around sites with strong support.
Clusters that were flagged as 'putative internal priming clusters' (because one of their poly(A) sites resides within an A-rich poly(A) signal) were retained if:
- a) they shared a poly(A) signal with a downstream cluster that was not flagged as internal priming (non-IP), in which case they were merged into the downstream cluster, or
- b) their most downstream poly(A) signal was at least 15 nucleotides upstream of the most distal poly(A) site.
In another pass, sites that were located within 25 nucleotides of each other were clustered together, and finally, for clusters with no annotated poly(A) signals a more permissive distance of 50 nucleotides was used in clustering.
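The first clustering pass described above can be illustrated with a minimal Python sketch. It is not the PolyASite implementation: the `Site` class, the `cluster_sites` function and the example values are hypothetical, and the sketch covers only the ranking and the grouping within 12 nucleotides, not the internal priming filter or the later 25 nt and 50 nt passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    pos: int        # cleavage position on one chromosome and strand
    n_samples: int  # number of samples in which the site was identified
    tpm: float      # total tags per million across samples


def cluster_sites(sites, max_dist=12):
    """Greedy first-pass clustering (illustrative sketch only).

    Sites are ranked by (n_samples, tpm), best first. Each still-unassigned
    site becomes a cluster representative and absorbs every unassigned,
    lower-ranked site within +/- max_dist nucleotides.
    """
    ranked = sorted(sites, key=lambda s: (s.n_samples, s.tpm), reverse=True)
    assigned = [False] * len(ranked)
    clusters = []
    for i, rep in enumerate(ranked):
        if assigned[i]:
            continue  # already absorbed by a better-supported site
        assigned[i] = True
        members = [rep]
        for j in range(i + 1, len(ranked)):
            if not assigned[j] and abs(ranked[j].pos - rep.pos) <= max_dist:
                assigned[j] = True
                members.append(ranked[j])
        clusters.append((rep, sorted(members, key=lambda s: s.pos)))
    return clusters


if __name__ == "__main__":
    # Hypothetical sites on a single chromosome and strand.
    example = [
        Site(100, 5, 20.0),
        Site(105, 2, 3.0),
        Site(112, 1, 1.0),
        Site(113, 3, 4.0),
        Site(130, 1, 0.5),
    ]
    for rep, members in cluster_sites(example):
        print(rep.pos, [m.pos for m in members])
```

In this example the site at position 112 lies exactly 12 nucleotides from the best-supported site at 100 and joins its cluster, whereas the site at 113 is 13 nucleotides away and becomes the representative of its own cluster.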
Mus musculus: v2.0 (GRCm38.96)
Total reads: 1,167,552,603
3'-end sequencing libraries: 178
See all contributing samples.
- GSM1089085
- GSM1089086
- GSM1089087
- GSM1089088
- GSM1089089
- GSM1089090
- GSM1089091
- GSM1089092
- GSM1089093
- GSM1089094
- GSM1089095
- GSM1089096
- GSM1268946
- GSM1268947
- GSM1268948
- GSM1268949
- GSM1268950
- GSM1268951
- GSM1268952
- GSM1268953
- GSM1268954
- GSM1268955
- GSM1268956
- GSM1268957
- GSM1268958
- GSM1327166
- GSM1327167
- GSM1327168
- GSM1327169
- GSM1480973
- GSM1480974
- GSM1480975
- GSM1480976
- GSM1480977
- GSM1480978
- GSM1480979
- GSM1480980
- GSM1518071
- GSM1518072
- GSM1518073
- GSM1518074
- GSM1518075
- GSM1518076
- GSM1518077
- GSM1518078
- GSM1518079
- GSM1518080
- GSM1518081
- GSM1518082
- GSM1518083
- GSM1518084
- GSM1518085
- GSM1518086
- GSM1518087
- GSM1518088
- GSM1518089
- GSM1518090
- GSM1518091
- GSM1518092
- GSM1518093
- GSM1518094
- GSM1518095
- GSM1518096
- GSM1518097
- GSM1518098
- GSM1518099
- GSM1518100
- GSM1518101
- GSM1518102
- GSM1518103
- GSM1518104
- GSM1518105
- GSM1518106
- GSM1518107
- GSM1518108
- GSM1518109
- GSM1518110
- GSM1518111
- GSM1518112
- GSM1518113
- GSM1518114
- GSM1586363
- GSM1586364
- GSM1586365
- GSM1586366
- GSM1586367
- GSM1586368
- GSM1865359
- GSM1865360
- GSM1865361
- GSM1865362
- GSM1865363
- GSM1865364
- GSM1865365
- GSM1906926
- GSM1906927
- GSM1906928
- GSM1906929
- GSM1906930
- GSM1906931
- GSM2467568
- GSM2467569
- GSM2467570
- GSM2467571
- GSM2467572
- GSM2467573
- GSM2467574
- GSM2467575
- GSM2467576
- GSM2467577
- GSM2467578
- GSM2467579
- GSM2467580
- GSM2467581
- GSM2467582
- GSM2717200
- GSM2717201
- GSM2717202
- GSM2717203
- GSM2717204
- GSM2717205
- GSM2717206
- GSM2717207
- GSM2717208
- GSM2717209
- GSM2717222
- GSM2717223
- GSM2717224
- GSM2717225
- GSM2717226
- GSM2717227
- GSM2717228
- GSM2717229
- GSM2901339
- GSM2901340
- GSM2901341
- GSM2901342
- GSM2901343
- GSM2901344
- GSM2901345
- GSM2901346
- GSM2901347
- GSM2901348
- GSM2901349
- GSM2901350
- GSM2901351
- GSM3022814
- GSM3022815
- GSM3022816
- GSM3022817
- GSM3022818
- GSM3022819
- GSM3022820
- GSM3022821
- GSM3022822
- GSM3022823
- GSM3022824
- GSM3022825
- GSM3022826
- GSM3022827
- GSM3022828
- GSM624687
- GSM747481
- GSM747482
- GSM747483
- GSM747484
- GSM747485
- SRX304982
- SRX304983
- SRX480169
- SRX480179
- SRX480205
- SRX480212
- SRX480221
- SRX480227
- SRX480229
- SRX480250
- SRX480287
Number of poly(A) site clusters: 301,006
Percentage of clusters with poly(A) signal: 72
Poly(A) signals are considered if they reside in the region from 60 nt upstream to 10 nt downstream of one of the poly(A) sites of a cluster.
List of poly(A) signals:
- AATAAA
- ATTAAA
- TATAAA
- AGTAAA
- AATACA
- CATAAA
- AATATA
- GATAAA
- AATGAA
- AATAAT
- AAGAAA
- ACTAAA
- AATAGA
- ATTACA
- AACAAA
- ATTATA
- AACAAG
- AATAAG
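The window rule for poly(A) signals can be sketched as follows. This is only an illustration: the function name `find_signals` is hypothetical, the sequence is assumed to be given in transcript (5' to 3') orientation, and two details that the page does not specify are assumed here, namely that a motif must lie entirely inside the window and that its position is reported as the offset of its first base relative to the poly(A) site.

```python
# The 18 hexamers listed above.
POLYA_SIGNALS = [
    "AATAAA", "ATTAAA", "TATAAA", "AGTAAA", "AATACA", "CATAAA",
    "AATATA", "GATAAA", "AATGAA", "AATAAT", "AAGAAA", "ACTAAA",
    "AATAGA", "ATTACA", "AACAAA", "ATTATA", "AACAAG", "AATAAG",
]


def find_signals(seq, site_index, upstream=60, downstream=10):
    """Return (motif, relative_position) pairs found around a poly(A) site.

    seq        -- sequence in transcript orientation (5' -> 3')
    site_index -- 0-based index of the poly(A) site within seq
    ASSUMPTION: the motif must lie fully inside the window and the relative
    position is the offset of the motif's first base from the site.
    """
    start = max(0, site_index - upstream)
    end = min(len(seq), site_index + downstream)
    window = seq[start:end].upper()
    hits = []
    for motif in POLYA_SIGNALS:
        pos = window.find(motif)
        while pos != -1:
            hits.append((motif, start + pos - site_index))
            pos = window.find(motif, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical sequence: one AATAAA placed 26 nt upstream of the site.
    seq = "C" * 40 + "AATAAA" + "C" * 20 + "G" + "C" * 20
    print(find_signals(seq, site_index=66))
```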
Cluster annotations
| Annotation | Number of clusters |
| --- | --- |
| Terminal exon | 101,531 |
| Other exon | 13,079 |
| Intron | 58,184 |
| Downstream of terminal exon | 14,657 |
| Antisense exon | 4,386 |
| Antisense intron | 34,671 |
| Antisense upstream of a gene | 4,124 |
| Intergenic | 70,374 |
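As a quick consistency check, the eight Mus musculus annotation counts can be summed and compared with the reported 301,006 clusters, and converted to percentage shares. The short sketch below uses only the numbers given above; the variable names are illustrative.

```python
# Mus musculus cluster counts per annotation category, as listed above.
annotation_counts = {
    "Terminal exon": 101_531,
    "Other exon": 13_079,
    "Intron": 58_184,
    "Downstream of terminal exon": 14_657,
    "Antisense exon": 4_386,
    "Antisense intron": 34_671,
    "Antisense upstream of a gene": 4_124,
    "Intergenic": 70_374,
}

total = sum(annotation_counts.values())

# Share of each category as a percentage of all clusters.
shares = {name: 100 * count / total for name, count in annotation_counts.items()}

if __name__ == "__main__":
    print(f"total clusters: {total:,}")
    for name, share in shares.items():
        print(f"{name}: {share:.1f}%")
```

The sum equals the reported total of 301,006 clusters, so every cluster carries exactly one annotation category.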
Add as custom track @ UCSC genome browser
We follow the standard BED specification with 0-based coordinates and append extra columns.
The columns represent:
first - chromosome name
second and third - start and end positions of the poly(A) site cluster, respectively
fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based bed coordinates used in the second and third columns. Thus, to convert the position of the representative site to bed coordinates, subtract 1.
fifth - average expression (tags per million, tpm) across all samples
sixth - strand on which the cluster is encoded
seventh - percentage of samples that support the particular cluster
eighth - number of different 3' end sequencing protocols that support the particular cluster
ninth - average expression (tags per million, tpm) across all samples
tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)
eleventh - information about the poly(A) signal(s) that are present upstream of the poly(A) site, including the motif, the location with respect to the cleavage site and the genomic coordinate
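The column layout above, including the conversion of the 1-based representative site in the cluster ID to 0-based BED coordinates, can be sketched in Python. The function name `parse_bed_line`, the dictionary keys and the example line are hypothetical. The sketch also assumes that the three parts of the cluster ID are separated by colons, which the column description does not state, and it leaves the eleventh column unparsed because its exact format is not given here.

```python
def parse_bed_line(line):
    """Split one line of the extended BED file into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(fields)}")

    # ASSUMPTION: the cluster ID looks like "chrom:position:strand".
    # rsplit keeps chromosome names that themselves contain ':' intact.
    _id_chrom, rep_pos_1based, _id_strand = fields[3].rsplit(":", 2)

    return {
        "chrom": fields[0],
        "start": int(fields[1]),            # 0-based, BED convention
        "end": int(fields[2]),
        "cluster_id": fields[3],
        "avg_tpm": float(fields[4]),
        "strand": fields[5],
        "pct_samples": float(fields[6]),
        "n_protocols": int(fields[7]),
        "avg_tpm_col9": float(fields[8]),   # described like column five
        "annotation": fields[9],            # TE, EX, IN, DS, AE, AI, AU or IG
        "signals": fields[10],              # left unparsed in this sketch
        # The ID uses 1-based coordinates; subtract 1 for BED coordinates.
        "rep_site_0based": int(rep_pos_1based) - 1,
    }


if __name__ == "__main__":
    # Hypothetical line, not taken from the atlas.
    example = "1\t1000\t1010\t1:1005:+\t2.5\t+\t30.0\t3\t2.5\tTE\tAATAAA@-21@984"
    record = parse_bed_line(example)
    print(record["cluster_id"], record["rep_site_0based"])
```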
Alternatively, you can download our atlas with average TPMs for all contributing samples as a tab-separated file.
The columns represent:
first - chromosome name
second and third - start and end positions of the poly(A) site cluster, respectively
fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based bed coordinates used in the second and third columns. Thus, to convert the position of the representative site to bed coordinates, subtract 1.
fifth - average expression (tags per million, tpm) across all samples
sixth - strand on which the cluster is encoded
seventh - representative poly(A) site of the cluster
eighth - percentage of samples that support the particular cluster
ninth - number of different 3' end sequencing protocols that support the particular cluster
tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)
eleventh - gene symbols for annotated genes overlapping with the cluster
twelfth - Ensembl gene IDs for annotated genes overlapping with the cluster
thirteenth - poly(A) signals in the region of -60 to +10 nucleotides around the representative site of the cluster with relative (e.g. @-28) and absolute position on the chromosome (e.g. @1001018)
fourteenth onwards - Sample information, consisting of SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT
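The sample information attached to the fourteenth and later columns can be split into its five parts with a short Python sketch. The names `SampleInfo`, `parse_sample_info` and `sample_columns` are hypothetical, and the sketch assumes that the `SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT` descriptors appear in a header line of the tab-separated file, which should be checked against the downloaded file.

```python
from typing import NamedTuple


class SampleInfo(NamedTuple):
    sample_id: str
    protocol: str
    source: str
    title: str
    treatment: str


def parse_sample_info(descriptor):
    """Split 'SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT' into named parts."""
    parts = descriptor.strip().split("|", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 '|'-separated parts: {descriptor!r}")
    return SampleInfo(*parts)


def sample_columns(header_line):
    """Return SampleInfo for every column from the fourteenth onwards.

    ASSUMPTION: the descriptors are stored in the header line of the file.
    """
    columns = header_line.rstrip("\n").split("\t")
    return [parse_sample_info(col) for col in columns[13:]]


if __name__ == "__main__":
    # Hypothetical header: 13 annotation columns followed by two samples.
    header = "\t".join(
        [f"col{i}" for i in range(1, 14)]
        + ["S1|protoA|liver|example title|none", "S2|protoB|brain|other|drug"]
    )
    for info in sample_columns(header):
        print(info.sample_id, info.protocol, info.source)
```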
Homo sapiens: v2.0 (GRCh38.96)
Total reads: 1,104,077,259
3'-end sequencing libraries: 221
See all contributing samples.
- GSM1614163
- GSM1614164
- GSM1614165
- GSM1614166
- GSM1614171
- GSM1614172
- GSM1614173
- GSM1614174
- GSM1614175
- GSM1614176
- GSM1857615
- GSM1857616
- GSM1857617
- GSM1857618
- GSM1857619
- GSM1857620
- GSM1857621
- GSM1857622
- GSM1857623
- GSM1857624
- GSM1857625
- GSM1857626
- GSM1846072
- GSM1846073
- GSM1846071
- GSM3375431
- GSM3375390
- GSM3375391
- GSM3375392
- GSM3375393
- GSM3375361
- GSM3375362
- GSM3375363
- GSM3375364
- GSM3375367
- GSM3375368
- GSM3375369
- GSM3375370
- GSM3375371
- GSM3375372
- GSM3375373
- GSM3375374
- GSM3375375
- GSM3375394
- GSM3375395
- GSM3375396
- GSM3375421
- GSM3375429
- GSM3375430
- GSM3375366
- GSM3375382
- GSM3375365
- GSM3375386
- GSM3375423
- GSM3375424
- GSM3375425
- GSM3375426
- GSM3375427
- GSM3375428
- GSM3375376
- GSM3375377
- GSM3375378
- GSM3375379
- GSM3375380
- GSM3375381
- GSM3375404
- GSM3375405
- GSM3375406
- GSM3375407
- GSM3375408
- GSM3375413
- GSM3375414
- GSM3375419
- GSM3375420
- SAMEA4444114
- SAMEA4444115
- SAMEA4444116
- SAMEA4444117
- SAMEA4444118
- SAMEA4444119
- SAMEA4444120
- SAMEA4444121
- SAMEA4444122
- SAMEA4444123
- SAMEA4444124
- SAMEA4444125
- SAMEA4444126
- SAMEA4444127
- SAMEA4444128
- SAMEA4444129
- SAMEA4444130
- SAMEA4444131
- SAMEA4444132
- SAMEA4444133
- SAMEA4444134
- SAMEA4444135
- SAMEA4444136
- SAMEA4444137
- SAMEA4444113
- SAMEA4444112
- SAMEA4444111
- GSM3028273
- GSM3028274
- GSM3028275
- GSM3028276
- GSM3028277
- GSM3028278
- GSM3028279
- GSM3028280
- GSM3028281
- GSM3028282
- GSM3028283
- GSM3028284
- GSM3028285
- GSM3028286
- GSM3028287
- GSM3028288
- GSM3028289
- GSM3028290
- GSM3028291
- GSM3028292
- GSM3028293
- GSM3028294
- GSM3028295
- GSM3028296
- GSM3028297
- GSM3028298
- GSM3028299
- GSM3028300
- GSM3028301
- GSM3039795
- GSM3039796
- GSM3039797
- GSM3039798
- GSM3039799
- GSM3039800
- GSM3039801
- GSM3039802
- GSM3039803
- GSM3039804
- GSM3039805
- GSM3039806
- GSM3039807
- SRX351949
- SRX351950
- SRX351952
- SRX351953
- SRX359328
- SRX359329
- SRX359330
- SRX359331
- SRX359332
- SRX359333
- SRX359334
- SRX359335
- SRX359336
- SRX359337
- SRX359339
- SRX359340
- SRX359341
- GSM909242
- GSM909243
- GSM909244
- GSM909245
- GSM986133
- GSM986134
- GSM986135
- GSM986136
- GSM986137
- GSM986138
- SRX275752
- SRX275753
- SRX275806
- SRX275827
- GSM1003590
- GSM1003591
- GSM1003592
- GSM747470
- GSM747471
- GSM747472
- GSM747473
- GSM747474
- GSM747475
- GSM747476
- GSM747477
- GSM747479
- GSM747480
- GSM624686
- SRX026582
- SRX026583
- SRX026584
- SRX388391
- GSM1268942
- GSM1268943
- GSM1268944
- GSM1268945
- GSM1366428
- GSM1366429
- GSM1366430
- SRX517313
- SRX517314
- SRX517315
- SRX517316
- SRX517317
- SRX517318
- SRX517319
- SRX517320
- SRX517321
- SRX517322
- SRX517323
- SRX517324
- SRX517325
- SRX517326
- SRX517327
- SRX517328
- SRX517329
- SRX517330
- SRX517331
- SRX517332
- SRX517333
- SRX517334
Number of poly(A) site clusters: 569,005
Percentage of clusters with poly(A) signal: 76
Poly(A) signals are considered if they reside in the region from 60 nt upstream to 10 nt downstream of one of the poly(A) sites of a cluster.
List of poly(A) signals:
- AATAAA
- ATTAAA
- TATAAA
- AGTAAA
- AATACA
- CATAAA
- AATATA
- GATAAA
- AATGAA
- AATAAT
- AAGAAA
- ACTAAA
- AATAGA
- ATTACA
- AACAAA
- ATTATA
- AACAAG
- AATAAG
Cluster annotations
| Annotation | Number of clusters |
| --- | --- |
| Terminal exon | 143,658 |
| Other exon | 21,804 |
| Intron | 165,859 |
| Downstream of terminal exon | 19,865 |
| Antisense exon | 16,135 |
| Antisense intron | 78,441 |
| Antisense upstream of a gene | 4,353 |
| Intergenic | 118,890 |
Add as custom track @ UCSC genome browser
We follow the standard BED specification with 0-based coordinates and append extra columns.
The columns represent:
first - chromosome name
second and third - start and end positions of the poly(A) site cluster, respectively
fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based bed coordinates used in the second and third columns. Thus, to convert the position of the representative site to bed coordinates, subtract 1.
fifth - average expression (tags per million, tpm) across all samples
sixth - strand on which the cluster is encoded
seventh - percentage of samples that support the particular cluster
eighth - number of different 3' end sequencing protocols that support the particular cluster
ninth - average expression (tags per million, tpm) across all samples
tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)
eleventh - information about the poly(A) signal(s) that are present upstream of the poly(A) site, including the motif, the location with respect to the cleavage site and the genomic coordinate
Alternatively, you can download our atlas with average TPMs for all contributing samples as a tab-separated file.
The columns represent:
first - chromosome name
second and third - start and end positions of the poly(A) site cluster, respectively
fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based bed coordinates used in the second and third columns. Thus, to convert the position of the representative site to bed coordinates, subtract 1.
fifth - average expression (tags per million, tpm) across all samples
sixth - strand on which the cluster is encoded
seventh - representative poly(A) site of the cluster
eighth - percentage of samples that support the particular cluster
ninth - number of different 3' end sequencing protocols that support the particular cluster
tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)
eleventh - gene symbols for annotated genes overlapping with the cluster
twelfth - Ensembl gene IDs for annotated genes overlapping with the cluster
thirteenth - poly(A) signals in the region of -60 to +10 nucleotides around the representative site of the cluster with relative (e.g. @-28) and absolute position on the chromosome (e.g. @1001018)
fourteenth onwards - Sample information, consisting of SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT
Caenorhabditis elegans: v2.0 (WBcel235)
Total reads: 67,268,436
3'-end sequencing libraries: 22
See all contributing samples.
Number of poly(A) site clusters: 20,931
Percentage of clusters with poly(A) signal: 81
Poly(A) signals are considered if they reside in the region from 60 nt upstream to 10 nt downstream of one of the poly(A) sites of a cluster.
List of poly(A) signals:
- AATAAA
- ATTAAA
- TATAAA
- AGTAAA
- AATACA
- CATAAA
- AATATA
- GATAAA
- AATGAA
- AATAAT
- AAGAAA
- ACTAAA
- AATAGA
- ATTACA
- AACAAA
- ATTATA
- AACAAG
- AATAAG
Cluster annotations
| Annotation | Number of clusters |
| --- | --- |
| Terminal exon | 13,885 |
| Other exon | 660 |
| Intron | 567 |
| Downstream of terminal exon | 5,023 |
| Antisense exon | 78 |
| Antisense intron | 326 |
| Antisense upstream of a gene | 85 |
| Intergenic | 307 |
Add as custom track @ UCSC genome browser
We follow the standard BED specification with 0-based coordinates and append extra columns.
The columns represent:
first - chromosome name
second and third - start and end positions of the poly(A) site cluster, respectively
fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based bed coordinates used in the second and third columns. Thus, to convert the position of the representative site to bed coordinates, subtract 1.
fifth - average expression (tags per million, tpm) across all samples
sixth - strand on which the cluster is encoded
seventh - percentage of samples that support the particular cluster
eighth - number of different 3' end sequencing protocols that support the particular cluster
ninth - average expression (tags per million, tpm) across all samples
tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)
eleventh - information about the poly(A) signal(s) that are present upstream of the poly(A) site, including the motif, the location with respect to the cleavage site and the genomic coordinate
Alternatively, you can download our atlas with average TPMs for all contributing samples as a tab-separated file.
The columns represent:
first - chromosome name
second and third - start and end positions of the poly(A) site cluster, respectively
fourth - unique cluster ID, composed of the chromosome name, the representative poly(A) site of the cluster and the strand. Note that this ID format is inspired by UCSC's position format, which uses 1-based coordinates instead of the 0-based bed coordinates used in the second and third columns. Thus, to convert the position of the representative site to bed coordinates, subtract 1.
fifth - average expression (tags per million, tpm) across all samples
sixth - strand on which the cluster is encoded
seventh - representative poly(A) site of the cluster
eighth - percentage of samples that support the particular cluster
ninth - number of different 3' end sequencing protocols that support the particular cluster
tenth - two letter code for the cluster annotation (in order of decreasing priority: TE, terminal exon; EX, exonic; IN, intronic; DS, 1,000 nt downstream of an annotated terminal exon; AE, anti-sense to an exon; AI, anti-sense to an intron; AU, 1,000 nt upstream in anti-sense direction of a transcription start site; IG, intergenic)
eleventh - gene symbols for annotated genes overlapping with the cluster
twelfth - Ensembl gene IDs for annotated genes overlapping with the cluster
thirteenth - poly(A) signals in the region of -60 to +10 nucleotides around the representative site of the cluster with relative (e.g. @-28) and absolute position on the chromosome (e.g. @1001018)
fourteenth onwards - Sample information, consisting of SAMPLE_ID|PROTOCOL|SOURCE|TITLE|TREATMENT
Want to use the newest version? Check out the single cell atlas.
Missing older versions of our atlas? Find them in our archive.