Posted: Tue Jun 05, 2007 12:00 pm
by phipunk
deagol wrote:
phipunk wrote: NICE! Did you try BLASTing it? There are 991 triples by my count (2973 total bases), so we're going to have to winnow it down considerably to get the 48 proteins we're looking for.
I did, but I have no idea what I'm doing or what I'm looking for, and there are so many ways to search.

http://130.14.29.110/BLAST/

I clicked on "nucleotide blast" and I think I looked for human DNA and it found something near the end of the sequence that matched a bit of chromosome 5, I think. No idea what's up with that.

Searched again now against the mouse genome, and found a big chunk matching chromosome 12 of mice.
Did you try aligning it with the recently sequenced Phoenix genome? ;)

Posted: Tue Jun 05, 2007 12:07 pm
by phipunk
deagol wrote: Decrypt again with key: "To understand the future, look to the past."
Ah, now I get it. "The key to knowledge is understanding" --> "To understand the future, look to the past". Although it should probably be "One of the keys to knowledge is understanding; the other one is gibberish."

Posted: Tue Jun 05, 2007 12:13 pm
by deagol
Ok I think I found the right place to cut it. On a hunch, I decided to search for our own GAATTC sequence from the rings. Just a simple text search, not any of those fancy tools (BLAST or NEBcutter) which make me feel like I'm in front of a 747 instrument panel. So anyways, there are exactly two instances of our plasmid marker, so I'm selecting the DNA sequence that's flanked by them.

Code: Select all

GAATTC
TACTGCTCTGTTACTCTTTGCTACACTGACATGTTTGTGAGTGAAACA
CAAATCTGGCCTACATGCACATCCAAAGACACAGTAGGTCTAAACACA
AAGTTAGTGTTTGTACAGTCACATCGTTTCTCATTGTCTCGTCATTAC
GAATTC
This is exactly 144 bases, just what we needed. The amino acid sequence:

Code: Select all

Y C S V T L C Y T D Met F V S E T Q I W P T C T S K D T V G L N T K L V F V Q S H R F S L S R H Y
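For anyone who wants to repeat the cut and the translation without the fancy tools, here is a minimal Python sketch. It assumes the standard genetic code and a plain text search for the two GAATTC markers; the function names `between_markers` and `translate` are made up for illustration and are not part of any of our tools.

```python
# Standard genetic code, laid out in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def between_markers(dna, marker="GAATTC"):
    """Return the stretch of DNA between the first two copies of the marker."""
    first = dna.find(marker)
    second = dna.find(marker, first + len(marker)) if first >= 0 else -1
    if second < 0:
        raise ValueError("need two copies of the marker")
    return dna[first + len(marker):second]


def translate(dna):
    """Translate DNA into one-letter amino acid codes, three bases at a time."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# The fragment posted above, markers included.
fragment = (
    "GAATTC"
    "TACTGCTCTGTTACTCTTTGCTACACTGACATGTTTGTGAGTGAAACA"
    "CAAATCTGGCCTACATGCACATCCAAAGACACAGTAGGTCTAAACACA"
    "AAGTTAGTGTTTGTACAGTCACATCGTTTCTCATTGTCTCGTCATTAC"
    "GAATTC"
)
inner = between_markers(fragment)
print(len(inner))        # 144
print(translate(inner))  # YCSVTLCYTDMFVSETQIWPTCTSKDTVGLNTKLVFVQSHRFSLSRHY
```

The output uses M for methionine where the list above shows Met; otherwise the 48 letters are the same.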
So far I did the first 16 decoding numbers (there must be an automated way of obtaining the full amino acid names so I can plug them into our facilityj_decoder, but I haven't found one, so I'm stuck building the list by hand), and the resulting string is:

Code: Select all

BFGBGFBGEAEABGCB
        -+-+
Now this doesn't look as noisy. I've marked the letters that came from those suspicious (-0) (+0) (-0) (+0) shifts which remind me of the OOOOOOOCOOOO puzzle (*gulp*).

Posted: Tue Jun 05, 2007 12:15 pm
by deagol
phipunk wrote: Did you try aligning it with the recently sequenced Phoenix genome? ;)
Exactly what I did after getting lost in BLAST. Did you try it?

Posted: Tue Jun 05, 2007 12:27 pm
by Aja
deagol wrote: Ok I think I found the right place to cut it. On a hunch, I decided to search for our own GAATTC sequence from the rings. Just a simple text search, not any of those fancy tools (BLAST or NEBcutter) which make me feel like I'm in front of a 747 instrument panel. So anyways, there are exactly two instances of our plasmid marker, so I'm selecting the DNA sequence that's flanked by them.

Code: Select all

GAATTC
TACTGCTCTGTTACTCTTTGCTACACTGACATGTTTGTGAGTGAAACA
CAAATCTGGCCTACATGCACATCCAAAGACACAGTAGGTCTAAACACA
AAGTTAGTGTTTGTACAGTCACATCGTTTCTCATTGTCTCGTCATTAC
GAATTC
This is exactly 144 bases, just what we needed. The amino acid sequence:

Code: Select all

Y C S V T L C Y T D Met F V S E T Q I W P T C T S K D T V G L N T K L V F V Q S H R F S L S R H Y
So far I did the first 16 decoding numbers (there must be an automated way of obtaining the full amino acid names so I can plug them into our facilityj_decoder, but I haven't found one, so I'm stuck building the list by hand), and the resulting string is:

Code: Select all

BFGBGFBGEAEABGCB
        -+-+
Now this doesn't look as noisy. I've marked the letters that came from those suspicious (-0) (+0) (-0) (+0) shifts which remind me of the OOOOOOOCOOOO puzzle (*gulp*).
I did all 48 and as long as I didn't make any mistakes, you get:

Code: Select all

BFGBGFBGEAEABGCBEFBGCBEBBFGAEABGFBGEBEFBGCBGFBGD
        -+-+               +-      + -  +

Posted: Tue Jun 05, 2007 1:04 pm
by deagol
Aja wrote: I did all 48 and as long as I didn't make any mistakes, you get:

Code: Select all

BFGBGFBGEAEABGCBEFBGCBEBBFGAEABGFBGEBEFBGCBGFBGD
        -+-+               +-      + -  +
Thanks Aja! I did it all just in case and got the same. Now what?

Posted: Tue Jun 05, 2007 1:16 pm
by ignatzmouse
Boy that was fast, I blinked and I missed it...
deagol wrote:
Aja wrote: I did all 48 and as long as I didn't make any mistakes, you get:

Code: Select all

BFGBGFBGEAEABGCBEFBGCBEBBFGAEABGFBGEBEFBGCBGFBGD
        -+-+               +-      + -  +
Thanks Aja! I did it all just in case and got the same. Now what?
A-G == musical notation? Or would that be Maddison Atkins?

Posted: Tue Jun 05, 2007 1:20 pm
by phipunk
ignatzmouse wrote: Boy that was fast, I blinked and I missed it...
deagol wrote:
Aja wrote: I did all 48 and as long as I didn't make any mistakes, you get:

Code: Select all

BFGBGFBGEAEABGCBEFBGCBEBBFGAEABGFBGEBEFBGCBGFBGD
        -+-+               +-      + -  +
Thanks Aja! I did it all just in case and got the same. Now what?
A-G == musical notation? Or would that be Maddison Atkins?
Yes, I think so! And +/- indicates sharps and flats!

Posted: Tue Jun 05, 2007 1:20 pm
by phipunk
deagol wrote:
phipunk wrote: Did you try aligning it with the recently sequenced Phoenix genome? ;)
Exactly what I did after getting lost in BLAST. Did you try it?
Yep. The chromosomes it aligned with were all numbered 1 through 8, so I subtracted 1 from everything and converted from octal to ascii, giving me a perl script that outputs the very same sequence that you and Aja decoded!

Seriously though, great work on the slicing and dicing. I can't believe how fast this is going forward.

Posted: Tue Jun 05, 2007 2:00 pm
by deagol
ignatzmouse wrote: Boy that was fast, I blinked and I missed it...
Haha yeah, for a change I went to sleep early last night, so I missed the first part. Hey you'd probably enjoy the aha! moment with the frequency crack. Here, I'll show you the way:

1. Go here
2. In the box, paste the gobbledygook from Lum's post and hit the "Break" button.
3. Notice the controls to increase the key size (period). It starts at a single letter but obviously you should increase it. Do that and switch between the different letters in the key (position), and notice how the frequencies of the highlighted letters change. You can slide the histogram left or right to try to match the English histogram in blue, but don't waste too much time trying this since this message isn't in English.
4. Now keep increasing the key length and try not to yawn at the boring, flat histograms. Keep going... yeah, I was about to give up as well when, out of nowhere... ba-boom! There it is! For a second, it feels like you made a mistake or something. Of course you realize what's going on right away, and you shift to match what you're thinking. Yes, it fits!
5. When you try the next letter, don't be disheartened by what happens. Keep sliding and matching as best you can. Later you'll get another perfect match, and in the end you'll understand what the fortunate glitch was. Keep in mind that this tool only lets you crack keys up to 20 characters long, and the key we're looking for is 34 characters long. It was a close call: the glitch only shows up once you reach the 17-character (half-length) pseudo-key.

If this sounds too boring you're free to ignore it. I realize I'm a bit obsessed with this little tool. I really like it though.
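If you'd rather see numbers than histograms, the "flat versus peaked" effect can be sketched in a few lines of Python. This is not how the tool works internally (I don't know that); it is a swapped-in standard technique, the index of coincidence, averaged over the columns you get for each candidate period. The sample text, the key, and the function names are all made up for the demonstration.

```python
from collections import Counter


def only_letters(text):
    """Keep A-Z only, upper-cased."""
    return [ch for ch in text.upper() if "A" <= ch <= "Z"]


def index_of_coincidence(letters):
    """Chance that two letters drawn from the sample are the same."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_ic(ciphertext, period):
    """Split the text into `period` columns and average their indices."""
    letters = only_letters(ciphertext)
    columns = [letters[i::period] for i in range(period)]
    return sum(index_of_coincidence(col) for col in columns) / period


def vigenere_encrypt(plaintext, key):
    """Plain Vigenere, used here only to build a demo ciphertext."""
    letters = only_letters(plaintext)
    shifts = [ord(k) - ord("A") for k in only_letters(key)]
    return "".join(
        chr((ord(ch) - ord("A") + shifts[i % len(shifts)]) % 26 + ord("A"))
        for i, ch in enumerate(letters)
    )


# Hypothetical demo: made-up English text under a made-up 5-letter key.
sample = ("the quick brown fox jumps over the lazy dog while the other "
          "animals in the forest watch and wonder what all the fuss is about ") * 6
demo = vigenere_encrypt(sample, "LEMON")
for period in range(1, 11):
    print(period, round(average_ic(demo, period), 3))
```

Wrong periods should give low, similar values (the boring flat histograms), and the right period, or a multiple of it, should stand out. Like the slider, this only tells you where to look; you still have to work out the shift for each key position.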
ignatzmouse wrote:

Code: Select all

BFGBGFBGEAEABGCBEFBGCBEBBFGAEABGFBGEBEFBGCBGFBGD
        -+-+               +-      + -  +
A-G == musical notation? Or would that be Maddison Atkins?
Ooh good thinking! Well, who says one J can't pick up a trick from another J... ;) I'll check it out, but first: lunch! =P~

Posted: Tue Jun 05, 2007 2:33 pm
by ignatzmouse
deagol wrote:
ignatzmouse wrote: Boy that was fast, I blinked and I missed it...
Haha yeah, for a change I went to sleep early last night, so I missed the first part. Hey you'd probably enjoy the aha! moment with the frequency crack. Here, I'll show you the way:
OK, that's quite a neat tool. Pity it's limited to key size 20, can't have everything.
deagol wrote:
ignatzmouse wrote:

Code: Select all

BFGBGFBGEAEABGCBEFBGCBEBBFGAEABGFBGEBEFBGCBGFBGD
        -+-+               +-      + -  +
A-G == musical notation? Or would that be Maddison Atkins?
Ooh good thinking! Well, who says one J can't pick up a trick from another J... ;) I'll check it out, but first: lunch! =P~
This doesn't smell right to me (musical notation being a bit far from the hard biochem we were dealing with a couple of months back) but you never know.

Posted: Tue Jun 05, 2007 2:35 pm
by phipunk
phipunk wrote:
ignatzmouse wrote: A-G == musical notation? Or would that be Maddison Atkins?
Yes, I think so! And +/- indicates sharps and flats!
On the other hand, that would yield both E-flat and E-sharp, the latter of which is better known as F. Perhaps I jumped the gun on that interpretation of + and -.
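To spell out the arithmetic behind that doubt, here is a small Python sketch. It assumes the usual twelve-semitone scale and reads + as "raise one semitone" and - as "lower one semitone", which is only my guess at what the marks might mean; `pitch_class` is a made-up helper.

```python
# Semitones above C for the natural notes.
NATURAL = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_class(note):
    """Map a note such as 'E', 'E+' or 'E-' to a number from 0 to 11."""
    letter, marks = note[0], note[1:]
    shift = marks.count("+") - marks.count("-")
    return (NATURAL[letter] + shift) % 12


print(pitch_class("E-"))                      # 3, E-flat
print(pitch_class("E+") == pitch_class("F"))  # True, E-sharp sounds the same as F
```

So under this reading an E+ would carry nothing that a plain F couldn't, which is why the sharps-and-flats idea looks shaky.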

Posted: Tue Jun 05, 2007 2:38 pm
by ignatzmouse
Oh, and here's a perl script I wrote which automates the codon/shift decryption. Command line only I'm afraid -- it takes the DNA sequence as the first argument, and the shifts as the second.

Code: Select all

#! /usr/bin/perl

$codons = shift();
$shifts = shift();

print "codons: $codons\n";
print "shifts: $shifts\n";

%names = (
  "TTT", "PHENYLALANINE",
  "TTC", "PHENYLALANINE",
  "TTA", "LEUCINE",
  "TTG", "LEUCINE",
  "TCT", "SERINE",
  "TCC", "SERINE",
  "TCA", "SERINE",
  "TCG", "SERINE",
  "TAT", "TYROSINE",
  "TAC", "TYROSINE",
  "TAA", "OCHRE",
  "TAG", "AMBER",
  "TGT", "CYSTEINE",
  "TGC", "CYSTEINE",
  "TGA", "OPAL",
  "TGG", "TRYPTOPHAN",
  "CTT", "LEUCINE",
  "CTC", "LEUCINE",
  "CTA", "LEUCINE",
  "CTG", "LEUCINE",
  "CCT", "PROLINE",
  "CCC", "PROLINE",
  "CCA", "PROLINE",
  "CCG", "PROLINE",
  "CAT", "HISTIDINE",
  "CAC", "HISTIDINE",
  "CAA", "GLUTAMINE",
  "CAG", "GLUTAMINE",
  "CGT", "ARGININE",
  "CGC", "ARGININE",
  "CGA", "ARGININE",
  "CGG", "ARGININE",
  "ATT", "ISOLEUCINE",
  "ATC", "ISOLEUCINE",
  "ATA", "ISOLEUCINE",
  "ATG", "METHIONINE",
  "ACT", "THREONINE",
  "ACC", "THREONINE",
  "ACA", "THREONINE",
  "ACG", "THREONINE",
  "AAT", "ASPARAGINE",
  "AAC", "ASPARAGINE",
  "AAA", "LYSINE",
  "AAG", "LYSINE",
  "AGT", "SERINE",
  "AGC", "SERINE",
  "AGA", "ARGININE",
  "AGG", "ARGININE",
  "GTT", "VALINE",
  "GTC", "VALINE",
  "GTA", "VALINE",
  "GTG", "VALINE",
  "GCT", "ALANINE",
  "GCC", "ALANINE",
  "GCA", "ALANINE",
  "GCG", "ALANINE",
  "GAT", "ASPARTICACID",
  "GAC", "ASPARTICACID",
  "GAA", "GLUTAMICACID",
  "GAG", "GLUTAMICACID",
  "GGT", "GLYCINE",
  "GGC", "GLYCINE",
  "GGA", "GLYCINE",
  "GGG", "GLYCINE"
  );

$output = "";

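# Consume one codon (three non-space characters) per pass through the loop.
while($codons =~ s/\s*(\S\S\S)\s*//) {
    $codon = uc($1);    # accept lower-case input as well
    die "No such codon: $codon\n" unless $name = $names{$codon};
    # Consume the matching shift, written as index(offset), e.g. 3(+2) or 5(-0).
    die "No such shift for codon $codon\n" unless $shifts =~ s/\s*(\d+)\((\+?-?\d+)\)\s*//;
    $len = length($name);
    $index = $1;
    $offset = $2;
    # Take the index-th letter of the name (1-based, wrapping around its length)...
    $char = substr($name,($index - 1) % $len,1);
    # ...then shift it through the alphabet by the offset, wrapping past Z or A.
    $result = chr(((ord($char) - ord("A") + $offset) % 26) + ord("A"));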
    print "codon: $codon\n";
    print "name: $name\n";
    print "offset: $offset\n";
    print "index: $index\n";
    print "len: $len\n";
    print "char: $char\n";
    print "result: $result\n";
    $output .= $result;
}

print "output: $output\n";
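For anyone without perl handy, here is a rough Python sketch of the same idea, with a tiny worked example. It assumes the shifts are written as index(offset), which is what the regular expression in the script expects. The two shifts in the example are made up purely to show the mechanics; they are not the real puzzle values.

```python
import re

# Standard genetic code in T, C, A, G order, plus the names used in the perl table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FULL_NAMES = {
    "A": "ALANINE", "R": "ARGININE", "N": "ASPARAGINE", "D": "ASPARTICACID",
    "C": "CYSTEINE", "Q": "GLUTAMINE", "E": "GLUTAMICACID", "G": "GLYCINE",
    "H": "HISTIDINE", "I": "ISOLEUCINE", "L": "LEUCINE", "K": "LYSINE",
    "M": "METHIONINE", "F": "PHENYLALANINE", "P": "PROLINE", "S": "SERINE",
    "T": "THREONINE", "W": "TRYPTOPHAN", "Y": "TYROSINE", "V": "VALINE",
}
STOP_NAMES = {"TAA": "OCHRE", "TAG": "AMBER", "TGA": "OPAL"}


def codon_name(codon):
    """Full name for one codon, matching the names in the perl hash."""
    if codon in STOP_NAMES:
        return STOP_NAMES[codon]
    i, j, k = (BASES.index(base) for base in codon)
    return FULL_NAMES[AMINO[16 * i + 4 * j + k]]


def decode(dna, shifts):
    """Pick the index-th letter of each codon's name, then shift it by the offset."""
    dna = "".join(dna.split()).upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    pairs = re.findall(r"(\d+)\(([+-]?\d+)\)", shifts)
    if len(pairs) < len(codons):
        raise ValueError("fewer shifts than codons")
    out = []
    for codon, (index, offset) in zip(codons, pairs):
        name = codon_name(codon)
        char = name[(int(index) - 1) % len(name)]  # 1-based, wraps around
        out.append(chr((ord(char) - ord("A") + int(offset)) % 26 + ord("A")))
    return "".join(out)


# Made-up shifts: TAC is TYROSINE, letter 2 is Y, minus 1 gives X;
# TGC is CYSTEINE, letter 9 wraps to letter 1 (C), plus 3 gives F.
print(decode("TACTGC", "2(-1) 9(+3)"))  # XF
```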

Posted: Tue Jun 05, 2007 3:20 pm
by deagol
phipunk wrote: In the meantime, perhaps there's a clue right under our noses
The Key to Knowledge is Understanding
Where did this quote come from? I can't find it in any of the profiles or archived correspondence.

Posted: Tue Jun 05, 2007 3:25 pm
by Aja
It was the title of the second Craigslist posting (the one with all the gobbledygook), which unfortunately has since been tagged for removal.