Mouse aggrecan, a large cartilage proteoglycan: protein sequence, gene structure and promoter sequence

Abstract
Seven genomic clones for mouse aggrecan core protein have been isolated, together including 3 kb of 5′-flanking and 7 kb of 3′-flanking sequence. All exon sequences and their intron boundary sequences in these clones were identified and mapped by DNA sequencing. The gene spans at least 61 kb and contains 18 exons. Exon 1 encodes 5′-untranslated sequence and exon 2 contains the translation start codon (methionine). The coding sequence is 6545 bp for a 2132-amino-acid protein with calculated M(r) = 259,131 including an 18-amino-acid signal peptide. There is a strong correlation between structural domains and exons. Notably, the chondroitin sulphate domain, consisting of 1161 amino acids, is encoded by a single exon of 3.6 kb. Although link protein has similar structural domains and subdomains, the sequence identity and the organization of the exons encoding subdomains B and B′ of the G1 and G2 domains revealed a strong similarity of mouse aggrecan to both human versican and rat neurocan. Primer extension analysis identified four closely spaced transcription start sites. The promoter sequence showed a high G/C content (65%) and contained several consensus binding motifs for transcription factors, including Sp-1 and the glucocorticoid receptor. Stretches of the promoter sequence are similar to the promoter regions of both the type-II collagen and link protein genes. These sequences may be important for cartilage gene expression.