Crushing the head of the BOM marker monster
Written by erik
Monday, 29 June 2009 09:31

The tell-tale sign of BOM marker infection is the appearance of the characters ï»¿ in the browser at the start of one or more web pages. This is the notorious BOM marker, that is, the BOM marker monster. It may show up in just one page, in many pages, or even in all pages of your website. When you open the site's files in a text editor, however, you will usually not see anything. Therefore, it is usually not possible to simply edit the file and get rid of it.

The BOM (byte order mark) is the Unicode character U+FEFF, which utf-8 encodes as the three bytes EF BB BF, and which gets saved, usually by accident, at the start of your text files. The BOM marker is rarely or never stored in database tables. The database management system will very often enforce an encoding on text fields, and prevent the BOM marker from being stored in, or returned from, the database. Therefore, the BOM marker will be hidden in .html files or in script files such as .js and .php files, in template files and in other text files.
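Since most editors do not display the marker, it can help to look at the raw bytes of a suspect file. The following is a minimal sketch, assuming the marker sits at the very start of the file as the bytes EF BB BF and that head, od and tr are available; the function name has_bom is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): succeeds when the
# file named by its first argument starts with the bytes EF BB BF.
has_bom() {
    # Dump the first three bytes as hex, then strip od's padding.
    first=$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "efbbbf" ]
}

# Usage: pass a file name as the first argument of this snippet.
if [ -n "${1:-}" ]; then
    if has_bom "$1"; then
        echo "$1: BOM marker present"
    else
        echo "$1: no BOM marker"
    fi
fi
```

This only inspects the first three bytes, so it says nothing about markers buried deeper in a file.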

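Unicode text is stored in multibyte encodings. Whenever a value takes multiple bytes, the endianness problem pops up. Do we store the least significant byte first (small-to-big), that is, little endianness, or the most significant byte first (big-to-small), that is, big endianness? Each computer architecture must make that choice. Strictly speaking, this choice only affects encodings built from 16-bit or 32-bit units, such as utf-16 and utf-32. Utf-8 is defined as a fixed sequence of single bytes, so for utf-8 files the marker serves only as a signature that identifies the encoding.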

As you can imagine, some architectures chose little endianness where other architectures chose big endianness. This is not a problem as long as computer architectures with different endianness do not need to communicate multibyte encodings with each other.

The PC (x86) architecture, and therefore most desktops, laptops, and medium-size servers, are little endian. The PowerPC architecture and the IBM mainframe architecture are big endian.
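To see which choice a given machine made, one can ask od to interpret two known bytes as a single 16-bit number. This is a small sketch, assuming an od that supports the -t u2 output format (as in GNU coreutils); the function name host_endianness is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: report the byte order of the machine it runs on.
host_endianness() {
    # Feed the bytes 01 00 to od and let it read them as one 2-byte
    # unsigned integer in the machine's native byte order.
    value=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
    case "$value" in
        1)   echo "little endian" ;;  # 01 is the low-order byte
        256) echo "big endian" ;;     # 01 is the high-order byte
        *)   echo "unknown" ;;
    esac
}

host_endianness
```

On a little endian machine the two bytes read as 1; on a big endian machine they read as 256.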

There are, however, situations in which endianness must be communicated between two machines. For example, multibyte encoded files that travel from a PC to a mainframe, and the other way around, may need to indicate their endianness. But then again, in most cases of machine-to-machine communication, there is no need to tell the other side what the endianness of the file is. Most text files floating on the internet use single-byte encodings or utf-8, in which the byte order is fixed.

The Unicode Consortium explains in its FAQ when to use, and when not to use, the BOM marker for utf-8 files.

The use of BOM markers is therefore legitimate. However, the popular browsers (Firefox, Internet Explorer) treat the BOM marker as just another character, and therefore just show the ugly sequence ï»¿ at the start of the page.

While most other PC-based text editors will be sensible enough not to save the BOM marker at the beginning of a utf-8 file, Microsoft Notepad and Microsoft Visual Studio will insist on doing this anyway.

Therefore, generally speaking, the BOM marker disease is caused by editing text files with a Microsoft text editor. If anybody edits any of your site's files using a Microsoft text-editing product, your text files will be infected, and your site will start showing ugly characters at the start of some or all of its pages.

One way to detect the presence of the BOM marker in the files of a folder tree is to use the shell command grep:

grep -rl $'\xEF\xBB\xBF' <folder>

An entire article is available that elaborates in depth on this method of finding and listing the infected files, as well as on cleaning them manually.
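For a single infected file, the marker can also be stripped with a small filter instead of an editor. The sketch below is only an illustration: the function name strip_bom and the file names in the usage note are made up, and it assumes that the marker forms the first three bytes of the input.

```shell
#!/bin/sh
# Hypothetical filter: copy standard input to standard output, dropping
# a BOM marker only when it forms the first three bytes of the input.
strip_bom() {
    # Build the three marker bytes EF BB BF from their octal codes.
    bom=$(printf '\357\273\277')
    # In the C locale sed works byte by byte; the address "1" and the
    # anchor "^" limit the substitution to the start of the first line.
    LC_ALL=C sed "1s/^${bom}//"
}
```

It could be used as strip_bom < page.html > page.clean.html, after which the cleaned copy replaces the original once you have checked it.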

For large repositories of text files, it can be a tedious job to open and save each file individually. It can also be done automatically. The sed command line utility can detect the BOM marker in a file, and output an equivalent line without this marker present. Recently, I have successfully used the following script to clean an entire repository of files from BOM marker infections. It will look for BOM marker infections in .php, .ini, .html, and .js files, and remove them automatically:

#!/bin/bash
#The cleanBOM.sh bash script
#written by erik
#12 Jun 2009
#Licensed under the GPL.
#
#Look up the files in a folder that contain the BOM marker
#and are php,ini,html,or js files
#and remove the BOM marker from the file

# -- first command line argument must be a folder name
folder="$1"
if [ ! -d "$folder" ]; then
    echo "usage: $0 <folder>" >&2
    exit 1
fi

#expand the char codes in the regex to their corresponding characters
regex1=$'\xEF\xBB\xBF'

#select only php, ini, html, and js files
regex2='\.\(php\|ini\|html\|js\)$'

#list the files under $folder that contain the BOM marker
#and that have one of the selected extensions
list_infected() {
    grep -rl -- "$regex1" "$folder" | grep -- "$regex2"
}

echo "The following files contain the BOM marker:"
list_infected

echo "Cleaning up:"
list_infected | \
    while IFS= read -r f; do
        #back up file; only if not yet backed up
        backup="$f.backup"
        if [ ! -e "$backup" ]; then
            cp "$f" "$backup"
        fi
        # the copy operation above could have failed anyway (permissions)
        # don't proceed if there is no backup file
        if [ -e "$backup" ]; then
            #delete the original file
            rm -f "$f"
            #output the backup file without the BOM
            #and write it to the original file location
            #(the \xHH escapes are interpreted by GNU sed)
            sed "s/\xEF\xBB\xBF//" "$backup" > "$f"
        fi
    done

echo "The following files still contain the BOM marker:"
list_infected

Save the script to a text file called cleanBOM.sh. Change the permissions for the script to make it executable:

$ chmod a+x cleanBOM.sh

Next, invoke the script at the command line, with the folder to clean as its first argument:

$ ./cleanBOM.sh myfolder

The script will remove the BOM marker from the text files in the folder and its subfolders, and save the original files with the extension .backup.
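Because every cleaned file keeps a .backup copy next to it, the two can be compared mechanically. The following sketch assumes that the marker was only present at the very start of each file, so that a cleaned file should equal its backup minus the first three bytes; the function names verify_cleaned and report_suspects are invented for this example and are not part of cleanBOM.sh.

```shell
#!/bin/sh
# Hypothetical check: succeeds when the cleaned file $1 equals its
# backup "$1.backup" with the first three bytes removed.
verify_cleaned() {
    # tail -c +4 prints the backup starting at its fourth byte;
    # cmp -s compares that stream (stdin, "-") with the cleaned file.
    tail -c +4 -- "$1.backup" | cmp -s - "$1"
}

# Hypothetical report: list cleaned files under folder $1 that differ
# from their backup by more than the leading three bytes.
report_suspects() {
    find "$1" -type f -name '*.backup' | while IFS= read -r b; do
        f=${b%.backup}
        verify_cleaned "$f" || echo "check by hand: $f"
    done
}
```

Calling report_suspects myfolder prints nothing when every cleaned file matches its backup. A reported file is not necessarily broken; it may simply have contained the marker somewhere other than at its start.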

Now you should check the results of the cleanup in the browser. If all BOM markers have successfully been removed, you can now remove the backup files with the following script:

#!/bin/bash
#The cleanBACKUPS.sh bash script
#written by erik
#29 Jun 2009
#Licensed under the GPL.
#
#Remove backup files created by the cleanBOM.sh script
# -- first command line argument must be a folder name
folder="$1"

echo "removing .backup files."
find "$folder" -type f -name '*.backup' | \
    while IFS= read -r f; do
        echo "removing $f"
        #quote the name so that paths with spaces are handled correctly
        rm -f -- "$f"
    done

Note that you MUST remove the backup files after successfully cleaning the folder tree. This is not optional! The next time you do a cleanup, the cleanBOM.sh script will find the old backup files and think that the backup has already been made. From there, it will delete your new file and restore the old backup file in its place. This is not what you want. Therefore, clean up the backups after you've successfully removed the BOM marker infection.

Save the script to a text file called cleanBACKUPS.sh. Change the permissions for the script to make it executable:

$ chmod a+x cleanBACKUPS.sh

Next, invoke the script at the command line, with the folder to clean as its first argument:

$ ./cleanBACKUPS.sh myfolder

The script will remove the backup files in the folder and its subfolders.

If your site regularly gets infected with BOM markers, for example, because novice staff violates the work instructions and site management guidelines by accidentally using Microsoft text-editing software, you may want to schedule the script to run on a regular basis, using an entry in the crontab for your site's system user.
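A scheduled run has to chain both scripts, because backups left behind by one run would be restored over newer files by the next run. The sketch below only prints a candidate crontab line; the function name cron_entry, the nightly 03:00 schedule and the paths are all assumptions made for the illustration, and the paths are assumed to contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that runs cleanBOM.sh and
# then cleanBACKUPS.sh every night at 03:00.
# $1 = folder that holds the two scripts, $2 = site folder to clean.
cron_entry() {
    # The two commands are joined with ";" rather than "&&", because
    # cleanBOM.sh ends with a grep that reports failure when no
    # infected files remain, which is the good case.
    printf '0 3 * * * %s/cleanBOM.sh %s; %s/cleanBACKUPS.sh %s\n' \
        "$1" "$2" "$1" "$2"
}

# Example with made-up paths.
cron_entry /home/siteuser/bin /home/siteuser/public_html
```

The printed line can be pasted into the editor opened by crontab -e for the site's system user. Keep in mind that an unattended run skips the check in the browser, so it removes the backups whether or not the cleanup went well.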

 

