GenBank accession YP_009614899.1 [GenBank]
Protein name long tail fiber protein distal subunit
RBP type Evidence Probability
TF UniProt/TrEMBL 1,00
TSP DepoScope 1,00
TSP RBPdetect 0,83
TF RBPdetect2 0,92
Protein sequence
MATLKQIQFKRSKTAGARPAASVLAEGELAINLKDRTIFTKDDSGNIIDLGIAKGGQIDGNVTIDGTLRVNGPINNFGNFSTSGTITASNIVSATVFRSTSGSFYTRAINDTANAHLWFENTNGSERGVLYSKPQSENEGVITMRVRQGTAAGAQNSEFHFISTDGGIFQARKLTALTSISTPTISVNLINHDSKAFGQYDSQSLVQYVYPGTGETNGVNYLRKVRAKSGGTIYHEICTAQTGLADEVSWWTGNIPLFKLYGIRNDGRMIIRNSLAIGTVTEGFPSSDYGNTGVMGDRYLALGDNASGLKWVRTGVIDLMANGISAASIINNGINSTKKLMVGYSRDSGSTWDFPNNNQAMFTARTDVDGNNNGDGQTHIGYSSNSKMYHYFRGTGRMSVSMGDGLLVEPGILDIKTGSNSLNLRADGTVASTQEVKLNNGLFFNSSSSNAGLKFGASSSINGTKTIQWNAGTRPGQNKSYVTMKAWGNAFEDNTNKNRETVFELGDGQGWHFYSQRVAPAAGSTAGSVEFAMAGSLLTSGSITTRSSLNADNGLSVNGQAKFGGTADVLRIWNAEYGAIFRRSESNLYIIPTPKDAGESGGISNLRPLIIELNTGTVKMSHDVHLGDSGSGTGLLQVSNSLKTIKMICPVTINERNAALTLDSPSSSSANYLQGSKAGTRSWYVGLGGAGNDLSLYSQSYGHGLVISDNFTSITKPLKVGNAQLGTDGNITNGSGNFVNLNTTLNRKVNSGFITYGATSGWYKFATVTMPQSTSTVFFKIVGGSGFNSGLFTQCNIAEIVLRTSNERPSDLNAVLYTRTTGMYSAAFRNIAVNNVSGNTYDIYVYAGTYCNQLVCEWSCTENATVSVIGINSSTQSPVDALPDTAVDGQVANVLNNLVDRGKVKRYEADSEIAINSQTGIRIRSNADKTGSVATMLRNDGGSFYILFTDKNATDGAATVNGDWNSKRPFAINLTTGEVMMNNGIAVRSAALFYNSINVKDNGSINFDKSGANPRNMRIFHAGDGTRGNRIEIADETNYIAYFQKLPGGQNQVSFNSATLSGLTQCNQFGVNTTNALGGNSITFGDTDTGIKQNGDGLLDIYANNVQVFRFQNGDLYSYKNINAPNVYIRSDIRLKSNFKPIENALDKVEKLNGVIYDKAEYIGGEAIETEAGIVAQTLQDVLPEAVRETEDSKGNKILTVSSQAQIALLVEAVKTLSSRVKELESKLM
Physico-chemical properties
protein length: 1229 AA
molecular weight: 131546,85480 Da
isoelectric point: 8,14141
aromaticity: 0,08218
hydropathy: -0,28910
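
The record does not say how aromaticity and hydropathy were computed. A minimal sketch, assuming the conventional definitions: aromaticity as the fraction of Phe, Trp and Tyr residues, and hydropathy as the mean Kyte-Doolittle value per residue (GRAVY). The function names `aromaticity` and `gravy` are illustrative and are not taken from the database.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = frozenset("FWY")


def aromaticity(seq: str) -> float:
    """Fraction of aromatic residues (F, W, Y); assumed definition."""
    return sum(aa in AROMATIC for aa in seq) / len(seq)


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy per residue; assumed definition."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# First ten residues of the protein sequence in this record.
fragment = "MATLKQIQFK"
print(round(aromaticity(fragment), 5))
print(round(gravy(fragment), 5))
```

Passing the full 1229-residue sequence instead of the fragment gives values that can be compared with the ones listed, provided the database uses the same definitions.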

Domains

Domains [InterPro]
DC_1954 STR 1–1228
G3DSA:6.20.80.10 STR 657–716
IPR048388 ATT 660–732
Architecture
STR 1-659 | ATT 660-732 | STR 733-1228
Legend: ATT STR RBD CBM LEC ENZ CHP LNK TAS TTP UNK Unmapped

Tail Spike Domain Segmentation

This protein has been segmented into three structural domains: N-terminal, central domain, and C-terminal.

Domain Start End Length (AA) Confidence
N-terminal 1 215 215 0,1563
Central domain 216 420 206 0,4544
C-terminal 421 1229 808 0,7290
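
The record does not state its coordinate convention. A minimal sketch, assuming Start and End are 1-based and inclusive, recomputes each length as `end - start + 1` and compares it with the Length column. Under that assumption two of the listed lengths differ by one residue from the recomputed ones, although both sets sum to 1229, so the table may use a different boundary convention. The helper name `inclusive_length` is illustrative.

```python
# Rows copied from the segmentation table: (domain, start, end, listed length).
DOMAINS = [
    ("N-terminal", 1, 215, 215),
    ("Central domain", 216, 420, 206),
    ("C-terminal", 421, 1229, 808),
]


def inclusive_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval (assumed convention)."""
    return end - start + 1


for name, start, end, listed in DOMAINS:
    computed = inclusive_length(start, end)
    status = "matches" if computed == listed else "differs"
    print(f"{name}: computed {computed}, listed {listed} ({status})")

# Both the recomputed and the listed lengths cover the whole 1229-residue protein.
print(sum(inclusive_length(s, e) for _, s, e, _ in DOMAINS))
print(sum(listed for *_, listed in DOMAINS))
```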
3D Structure with Domain Coloring

The structure is colored according to the domain segmentation: N-terminal (blue), Central (green), C-terminal (pink).

Domain Coloring
N-terminal 1-215
Central 216-420
C-terminal 421-1229

Taxonomy

  Name Taxonomy ID Lineage
Phage Shigella phage Sf22 [NCBI] 2024320 Uroviricota > Caudoviricetes > Pantevenvirales > Tevenvirinae > Tequatrovirus
Host No host information

Coding sequence (CDS)

GenBank protein accession YP_009614899.1 [NCBI]
GenBank nucleotide accession NC_042039 [NCBI]
CDS location range 98688 -> 102377, strand +
CDS
ATGGCTACTTTAAAACAAATACAATTTAAAAGAAGCAAAACCGCAGGAGCACGTCCTGCTGCTTCAGTATTAGCCGAAGGTGAATTGGCTATAAACTTAAAAGATAGAACAATTTTTACTAAAGATGATTCAGGAAATATCATTGATCTAGGTATTGCTAAAGGCGGTCAAATTGATGGAAATGTAACTATTGATGGAACTTTACGCGTCAATGGACCAATAAACAACTTTGGAAATTTTTCTACAAGTGGTACTATTACGGCAAGTAATATTGTTTCTGCTACAGTATTCAGATCAACATCCGGTTCATTTTATACAAGAGCAATAAATGATACTGCAAATGCCCATCTTTGGTTTGAAAATACGAATGGATCGGAGAGAGGGGTTTTATATTCAAAACCACAGTCTGAAAATGAAGGCGTAATTACAATGCGCGTTCGCCAAGGAACCGCAGCAGGGGCTCAAAATTCAGAATTTCATTTCATTTCTACTGATGGCGGTATCTTCCAGGCACGCAAATTAACTGCTTTAACTTCTATTAGTACACCAACTATTAGTGTTAATTTAATTAATCATGATTCTAAAGCCTTTGGACAATACGATTCACAATCATTAGTTCAGTATGTTTATCCTGGAACCGGTGAAACAAATGGTGTAAACTATCTTCGTAAAGTTCGAGCTAAATCAGGTGGAACTATTTACCATGAAATTTGTACAGCTCAGACTGGGCTAGCTGATGAAGTTTCTTGGTGGACTGGTAATATACCGTTATTTAAACTATACGGTATTCGTAACGACGGTAGAATGATTATTCGCAATAGCTTGGCCATTGGTACTGTGACGGAAGGGTTTCCGTCAAGCGATTATGGAAATACTGGGGTAATGGGAGATAGGTATTTAGCTCTTGGCGATAATGCTTCTGGACTTAAATGGGTTCGTACTGGCGTTATTGACTTAATGGCTAACGGTATTTCTGCGGCGTCTATTATTAATAATGGTATTAACAGTACTAAAAAATTAATGGTTGGTTATAGTCGAGATTCTGGTTCTACTTGGGATTTCCCAAATAACAACCAAGCAATGTTTACTGCGCGTACCGATGTTGATGGTAATAATAACGGTGATGGTCAAACTCATATCGGTTATAGCAGTAATTCTAAAATGTATCACTATTTCCGTGGTACAGGTCGTATGTCCGTTAGTATGGGCGATGGACTTCTTGTTGAGCCTGGCATTTTAGATATTAAAACTGGTTCAAATTCATTAAATTTGCGTGCTGATGGTACTGTTGCATCTACTCAAGAAGTTAAGCTTAATAATGGGTTATTCTTTAATAGTAGTTCTTCTAACGCCGGTCTTAAATTTGGCGCTTCTTCTTCTATAAATGGAACTAAGACAATACAATGGAACGCAGGCACCCGCCCGGGTCAGAACAAAAGCTATGTGACCATGAAAGCATGGGGCAATGCATTTGAGGATAATACTAATAAAAACAGAGAAACTGTATTTGAATTAGGCGATGGACAGGGTTGGCATTTTTATTCACAGCGCGTTGCTCCTGCAGCAGGTTCTACTGCCGGTTCTGTTGAATTTGCGATGGCCGGTAGTTTATTAACTTCTGGTTCTATTACAACAAGATCTTCCCTGAATGCTGATAATGGATTGTCTGTAAATGGACAAGCTAAATTTGGTGGAACGGCTGATGTATTAAGAATTTGGAATGCTGAATACGGTGCTATTTTCCGTCGCTCAGAAAGTAATCTTTATATTATACCGACTCCTAAAGATGCTGGAGAATCGGGCGGTATAAGTAATCTTAGACCATTGATAATAGAACTGAACACTGGCACAGTTAAAATGTCGCATGATGTTCATTTAGGAGATTCTGGATCTGGTACAGGACTTTTACAAGTAAGTAATAGTCTTAAAACTATTAAAATGATATGTCCAGTAACTATTAATGAACGCAATGCAGCGCTTACCCTGGATTCTCCTTCATC
TTCTTCTGCTAATTATTTACAGGGTTCTAAAGCTGGAACTAGATCATGGTACGTTGGTCTTGGCGGCGCTGGAAATGATTTATCTCTTTATAGCCAATCTTATGGACATGGTCTTGTTATAAGTGATAATTTCACGTCAATCACTAAGCCTCTTAAAGTCGGCAATGCCCAATTAGGAACTGACGGTAATATTACCAATGGTTCAGGAAACTTTGTCAACTTAAATACCACGTTAAATCGTAAAGTTAATTCTGGATTTATTACTTATGGAGCAACCTCTGGATGGTATAAGTTTGCAACAGTAACAATGCCACAATCCACTTCGACGGTCTTCTTTAAAATAGTTGGAGGTTCTGGATTTAATAGCGGATTATTCACACAATGTAATATTGCTGAAATTGTTTTACGTACTAGTAATGAAAGACCTTCTGACTTAAATGCTGTATTATACACAAGAACAACTGGAATGTATAGCGCAGCATTTAGAAATATTGCAGTTAACAATGTCTCTGGAAATACATATGACATTTATGTTTATGCTGGAACATATTGTAATCAACTAGTTTGTGAATGGTCATGTACCGAAAATGCTACTGTTAGCGTTATTGGTATTAACTCATCTACCCAATCACCTGTGGATGCTCTTCCAGATACAGCAGTTGATGGGCAAGTTGCTAATGTTCTTAATAACTTGGTTGATCGTGGTAAAGTTAAGCGTTATGAAGCCGATTCTGAAATAGCTATTAATAGCCAAACCGGTATTCGTATCAGAAGCAATGCCGATAAAACTGGTTCTGTGGCTACAATGTTACGAAATGACGGTGGCAGTTTTTATATTCTGTTTACAGATAAAAATGCCACTGATGGCGCAGCAACTGTTAATGGTGATTGGAATAGTAAACGTCCTTTCGCAATTAACTTAACAACCGGCGAAGTGATGATGAATAACGGCATAGCTGTTCGCAGCGCTGCTTTATTCTATAATAGCATAAACGTCAAAGATAATGGTTCTATTAACTTTGATAAGTCCGGCGCTAACCCGAGAAACATGCGTATATTCCATGCAGGTGATGGCACTCGTGGTAATCGCATTGAAATTGCTGATGAAACAAACTATATTGCTTACTTCCAAAAATTACCTGGTGGTCAAAATCAGGTTTCATTCAATAGTGCTACACTTTCTGGATTAACACAGTGTAATCAATTTGGTGTTAACACAACAAACGCACTTGGTGGAAACAGTATAACATTTGGTGATACTGATACTGGTATTAAGCAAAATGGCGATGGATTATTAGACATATATGCGAACAACGTGCAAGTGTTCCGTTTCCAAAATGGTGATTTGTACTCATATAAAAATATAAATGCTCCAAACGTTTATATTCGTTCTGATATTCGTTTAAAATCTAACTTCAAACCTATCGAAAATGCACTTGATAAAGTTGAAAAACTCAATGGTGTCATTTATGATAAAGCTGAATACATCGGTGGAGAGGCAATTGAAACTGAAGCGGGTATTGTAGCTCAAACGTTACAAGACGTTTTACCAGAAGCCGTCCGTGAAACAGAAGACAGCAAGGGTAATAAAATACTCACTGTTTCTTCTCAAGCCCAGATTGCTCTTCTGGTTGAAGCTGTGAAAACGCTTTCTTCTCGTGTAAAAGAACTTGAATCTAAACTTATGTAA
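
The CDS coordinates can be cross-checked against the protein length with simple arithmetic. A minimal sketch, assuming the range is 1-based and inclusive and that the CDS includes its stop codon (the sequence above ends in TAA); the variable names are illustrative.

```python
# Values copied from this record.
cds_start, cds_end = 98688, 102377
protein_length = 1229

# Inclusive span of the CDS on the genome, in nucleotides.
cds_nt = cds_end - cds_start + 1

# A complete CDS should be a whole number of codons.
codons, remainder = divmod(cds_nt, 3)

# One codon is the stop codon, so coding codons = total codons - 1.
coding_codons = codons - 1

print(cds_nt, codons, remainder)          # 3690 1230 0
print(coding_codons == protein_length)    # True
```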

Genome Context

Tertiary structure

PDB ID 85edce6fe6d4b118e82afef0413b46c4d6693eef118c644ca87470e62e708d0b
Source ESMFold
Method ESMFold
Resolution 0,5249
Oligomeric State monomer
Model Confidence
Very high pLDDT > 90
High 90 > pLDDT > 70
Low 70 > pLDDT > 50
Very low pLDDT < 50
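
The confidence bands above can be expressed as a small lookup. A minimal sketch: the legend uses strict inequalities on both sides and so leaves the values 90, 70 and 50 unassigned. Placing each of them in the lower band is an assumption made here, and `plddt_band` is an illustrative name.

```python
def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT score (0-100) to the legend's confidence band.

    Boundary values (90, 70, 50) fall into the lower band; this is an
    assumption, since the legend does not assign them.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score > 90:
        return "Very high"
    if score > 70:
        return "High"
    if score > 50:
        return "Low"
    return "Very low"


for value in (95.2, 80.0, 61.5, 42.0):
    print(value, plddt_band(value))
```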

Literature

Title The isolation and characterization of 16 novel Shigella-infecting phages from the environment
Authors Doore, S.M., Schrad, J.R., Dover, J.A. and Parent, K.N.
Date 2020-09-29
PMID not listed
Source GenBank