From technology to politics to video games; these are the random thoughts of a geek with too much time on his hands
It's not a bug! It's unintended irony!
Published on May 18, 2006 By Zoomba In Windows Software
Here's a really funny way to break Notepad that a coworker showed me this morning. I bet this is one of those jokes that's been around for ages, but this was the first I ever heard of it, so it's new to me.

This actually works. It will not crash your computer; it just breaks Notepad in that it causes it to display very oddly. No permanent damage comes of the following steps.

Here's how to do it:
1. Open up Notepad (not Wordpad, not Word or any other word processor)
2. Type in this sentence exactly (without quotes): "this app can break"
3. Save the file to your hard drive.
4. Close Notepad
5. Open the saved file by double clicking it.

Instead of seeing your sentence, you should see a series of squares. For whatever reason, Notepad can't figure out what to do with that series of characters and breaks.
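If you would rather script the steps than click through them, here is a minimal Python sketch. The file name break.txt and the helper write_test_file are just illustrative choices of mine, not anything Notepad requires. It writes exactly the 18 characters with no trailing newline, which is what Notepad saves when you type the sentence and do not press Enter.

```python
import os
import tempfile


def write_test_file(directory):
    """Write the 18-byte test phrase to break.txt (hypothetical name)."""
    path = os.path.join(directory, "break.txt")
    # Binary mode so nothing is added: no BOM, no trailing newline.
    with open(path, "wb") as f:
        f.write(b"this app can break")
    return path


if __name__ == "__main__":
    # Write into a fresh temporary directory and report where it went.
    print(write_test_file(tempfile.mkdtemp()))
```

Double-clicking the resulting file on an affected version of Windows should show the same squares as the manual steps.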

Again, it doesn't crash the app or anything, it's just a funny little twist of fate/unintended feature.

Comments (Page 2)
on May 21, 2006
ummmm, it's a windows app so it's already broken
on May 22, 2006
Calculator isn't broken AFAIK...
on May 22, 2006

Calculator isn't broken AFAIK...

Is it based on the Pentium 90?

on May 22, 2006
on May 23, 2006
Is it based on the Pentium 90?


2+2=5 for sufficiently large values of 2
on Jun 14, 2006
If anyone is interested in what it is doing... I saved a copy of the file after opening the corrupted version. If you open it in a hex editor, you will see that two bytes have been prefixed to the file: FF FE. The FF FE characters at the start of the file signify that the file is a UTF-16 little-endian encoded file.

http://en.wikipedia.org/wiki/UTF-16

UTF-16 uses 2 bytes per character here, instead of 1 byte like ASCII. The phrase "this app can break" is 18 characters long (including the spaces), so its 18 bytes get read as 9 two-byte characters, which explains why people are seeing 9 Asian or 9 block characters.

That doesn't explain why it happens, but it does explain what is happening.
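
The 18-to-9 arithmetic can be checked with a short Python sketch. It assumes the saved file holds only the 18 ASCII bytes of the phrase, and it uses Python's built-in utf-16-le codec as a stand-in for whatever Notepad does internally, which is an assumption on my part.

```python
phrase = "this app can break"

# What "ANSI" saving puts on disk: one byte per character.
raw = phrase.encode("ascii")

# The same bytes read as UTF-16 little-endian: two bytes per character.
misread = raw.decode("utf-16-le")

print(len(raw))      # number of bytes on disk
print(len(misread))  # number of characters after the misread

# Show the code point each pair of bytes turns into.
print([f"U+{ord(ch):04X}" for ch in misread])
```

Every pair lands in the CJK Unified Ideographs range (U+4E00 to U+9FFF), which would account for the Asian characters, or for blocks when no suitable font is installed.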
on Jun 14, 2006
All y'all crack me up. Trying to figure out if it's Chinese, Japanese, or "Simplified Chinese".

But did any of you try simply changing the letters? Such as:

"this app can brake", or
"this cat can split", or
"xxxx xxx xxx xxxxx", or
"abcd efg hij klmno"?

They all work, and produce different combinations of your "Chinese characters". My guess is, it's a bitshifting bug (or possibly an egg) which is simply a bitshift of the original characters.
on Jun 14, 2006
Now can anyone create an 18 character phrase that means something both in English and Chinese?
on Jun 14, 2006
I've been learning Chinese for 9.5 years and Japanese for 3.5 years. The characters are CJK (Chinese/Japanese/Korean) characters, but have no meaning when read as a phrase in Chinese or Japanese. Individual characters hold meaning in Chinese and Japanese; the fourth character means "ash" and the last "joy". CJK characters ultimately originate from Chinese, and all the characters probably have meanings, at least in the past.

In short, the bunch of characters are random CJK characters.
on Jun 14, 2006
Synfin80 is correct, this is a UTF related bug. I, however, do not see any "FEFF" characters, and that would seem to be the real issue to me; UTF encoded file without the BOM.

If I view the raw data of the file created in the "save" step, I get:

$ hexdump break.txt
0000000 6874 7369 6120 7070 6320 6e61 6220 6572
0000010 6b61

Translate those hex values into ascii characters, and you get:

htsia ppc nab erka

Or, as he said, little-endian two-byte groups of the value we started with.
(That means, look, transpose each pair of characters, it's the original!)
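
A small Python sketch of that transposition, for anyone who wants to check it without hexdump. It assumes the file holds just the 18 ASCII bytes of the phrase; the variable names are mine.

```python
data = b"this app can break"

# Group the bytes into 16-bit little-endian words, the way hexdump's
# default output does on a little-endian machine.
words = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
hex_words = " ".join(f"{w:04x}" for w in words)

# Read each word back high byte first, which transposes every pair.
swapped = "".join(chr(w >> 8) + chr(w & 0xFF) for w in words)

print(hex_words)
print(swapped)
```

The first line printed matches the hex values in the dump, and the second is the transposed string.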

Something is treating it as runs of 16-bit little-endian values, but without the bit (the BOM) that would make it a valid UTF encoded file.

This also happens with "this abc xyz break" and "efgh abc xyz break". But it seems, only when you type it in first thing after starting the program.

I don't know how, but at
on Jun 15, 2006
after a few tries, i got the following pattern:

xxxx xxxxxxx yyy....

the positions of the 2 blanks mattered, just letters -- no digits or other symbols, and as long as the total number of characters is even then Notepad will break.

looks like a bug rather than an egg. but i'm curious what's the logic behind this...
hmmm, what if some of the x's are replaced with real Unicode characters?...
on Jun 15, 2006
nope, can't insert Unicode into the file.
if you do, Notepad saves the file in Unicode format (even if you specify ASCII), which doubles the filesize and tags FFFE at the beginning.
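
Here is a rough Python sketch of what that looks like in bytes. It only models the on-disk layout (BOM plus two bytes per character) using Python's own codecs; it is not Notepad's code, and the helper has_utf16le_bom is a name I made up.

```python
import codecs

text = "this app can break"

# "ANSI" save: one byte per character, nothing in front.
ansi_bytes = text.encode("ascii")

# "Unicode" save: FF FE marker, then two bytes per character.
unicode_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def has_utf16le_bom(data):
    """Return True if the data starts with the FF FE byte order mark."""
    return data.startswith(codecs.BOM_UTF16_LE)


print(len(ansi_bytes), has_utf16le_bom(ansi_bytes))
print(len(unicode_bytes), has_utf16le_bom(unicode_bytes))
print(unicode_bytes[:2].hex())
```

With the marker in place there is nothing to guess when the file is opened; the trouble only starts when the marker is missing.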
on Jun 16, 2006
this is what we call BUG.

try adding a carriage return at the end of the line.
notepad will then properly handle it...
on Jun 17, 2006
Not just Notepad -- saving and opening the same string in Metapad produces the same results.

Looking at the hex, the bytes seem to be saved fine. Changing anything other than the letters in "break" (including adding or deleting any character), or substituting any character outside the lower-case alphabet (61 to 7A) for any of those last five letters, makes it display properly, again in both of the -pads I tried it in.

e.g.

Displays properly:
this app can breaka
this app can brea
this app can breaK
qhis app can break
this app canbreak
this app can break[carriage return]

Still broken:
this app can qwert

My guess is that this collection of characters is somehow interpreted as malformed UTF-16 by whatever common sense is applied to distinguishing between the various character encodings by text editors -- it's a bug, but on a much more abstract scale than Notepad being a shitty app.
on Jun 21, 2006
It's not a bug in the core logic or in the translator, nor is it an easter egg; it's simply a matter of what Notepad's defaults are. When you save a file in Notepad, the default is "ANSI" (it's in a dropdown with other choices). When you open a file in Notepad (File->Open), you specify the file format in addition to the file path; the default is "Unicode" (UTF-16), with "ANSI" and "UTF-8" as other choices. When you double-click a file, all Notepad has is the file path. It wasn't given the format, so it just uses the default Unicode when auto-detecting fails. Auto-detecting fails when CRLF is missing under all possible attempted formats. If you hit a newline after the sentence, the file would open correctly under the ANSI format.

If the default format to open were the same as the default format to save, this wouldn't confuse end-users.

No, I have never worked for Microsoft. I used to work for Oracle, and I have seen similar auto-detection failures when trying to simplify the user experience by not asking for the values of unknown variables.

Cheers,
thoreaulylazy