Tuesday, October 27, 2009

 

Windows XP Blank Screen After XP Logo (Dell Latitude D830)

I thought I would record the following experience in case it helps others who encounter the same problem. I have a Dell Latitude D830 laptop and generally leave it in standby mode at the end of the day. This uses little power and makes the slow Windows XP reboot unnecessary. However, last week I was away from the office, and the machine, left in standby mode and detached from the power supply, apparently ran out of battery power. When I came to power up the machine, rather than springing back from standby mode, it began a normal reboot sequence. However, the ending of this reboot sequence was far from normal. After the Windows XP logo appeared with its blue progress bar, the screen suddenly went completely black.

Very strange and unsatisfactory behavior! Googling through the various sites discussing this topic, I quickly came to appreciate that there are a multitude of ways in which this tedious effect can be achieved. Rebooting in safe mode or VGA mode worked fine, but any attempt to get to the normal screen resolution resulted in the blank screen after the Windows XP logo part of the reboot sequence.

To solve the problem I tried rebooting in safe mode and reloading graphics drivers as originally downloaded from Dell. This did not solve the problem. I then tried just using the machine with remote desktop, after booting it into VGA mode. This worked well, but wasn't quite what I was used to, and certainly did not constitute a fix, and so I Googled further. I tried 'repairing' the operating system using a Windows XP disk. (That was a mistake - see below). Eventually, I ran into a site which described how someone logged onto the machine, although the screen was black. So, having nothing to lose, I tried that. I typed username - and tab - and password a few times. Something in the blankness was happening, as there were some typical Windows XP clonking sounds emitted from the machine. Then I must have hit the correct sequence, as the display suddenly lit up, and carried on working as though nothing had ever happened.

Of course, the Windows XP repair attempt had the effect of removing all the vital operating system updates that had been put on the machine. So, a large Windows update was then required to restore SP3 and all the other updates thought to be vital to my security.

So, the conclusions are firstly that Windows XP and standby mode can be fragile under some rare circumstances. Secondly, curing tedious Windows XP start up problems can be mysterious. I speculate that my habit of putting the machine into standby mode may be quite risky when perilously short of disk space. Probably at some point the standby mode program decides that the battery will expire soon and decides to write everything out to disk, but without checking properly that there is enough disk space to save everything. Then, somehow or other, some special state for the graphics card of blankness (because the lid is closed) is saved, and the machine crashes. This state will be overwritten the next time someone logs on successfully. However, the next logon is effectively prohibited by the fact that the screen is blank.

Hence, users are forced into the rather tedious process of either wiping the machine and starting again, or buying another machine and making another donation to Microsoft in the process for another, equally buggy, operating system.

But - if you have a blank screening Windows XP laptop, that almost boots but not quite, try typing in your credentials at what should be the log in screen. You may find that this clears the problem - and although it is not as satisfying as repairing the operating system, replacing graphics drivers, or buying a new machine, it may just get you going again.

Tuesday, October 13, 2009

 

Extracting the Audio Portion of MP4 to MP3

If you are ever in need of just the audio portion of an .mp4 file, here are two commands and a loop to put the audio into an .mp3 file.

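#!/bin/sh

# Convert the audio track of every .mp4 in the current directory to .mp3
for file in *.mp4
do
    echo "$file"
    # dump the decoded audio to a temporary WAV file
    "/cygdrive/c/Program Files/MPlayer/mplayer.exe" \
        -ao pcm:file=tmp.wav "$file"
    # encode the WAV file as MP3, named after the original file
    lame -h tmp.wav "$(basename "$file" .mp4).mp3"
done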


(You may need to customize the path to mplayer.exe. I used an explicit path as I was on Windows and needed to pull in a specific cygwin dll for this executable. You probably won't have this issue.)

Monday, September 21, 2009

 

Problems with a 4 GB Kingston Data Traveler

I have been having tedious problems with a 4 GB Kingston Data Traveler. I couldn't find much information on the web about these problems - so I thought I would describe what I have seen here. If anyone has a solution please let me know in comments. If you are having similar problems, perhaps you can learn from what I have seen (and can avoid wasting your time trying to get the drive to work).

I bought a 4 GB Kingston Data Traveler for about 12 dollars from Fry's Electronics several months ago. The drive never worked properly - files were getting lost and corrupted from the file system. So, I sent it back to Kingston for replacement. The replacement drive showed up last week. Soon after I plugged that into a Windows XP machine, I found that the complete USB controller for that machine had shut down. That is a tedious event. You need to power down the machine completely, in order to power down the internal USB hub, in order to get everything to reset. I looked around on the web endlessly and made various attempts to change drivers and so on, but to no avail. So, I am inclined to give up on this drive. It just isn't worth the time and effort trying to make it work as advertised. Every other USB drive and device I have ever used with this machine has worked perfectly. The USB drive in question is a 4 GB Kingston Data Traveler USB stick. The computer concerned is a Windows XP/SP3 Dell Latitude D610.

If you know how to get the Data Traveler USB drive to behave properly - or recognize the symptoms, please let me know!

Sunday, July 12, 2009

 

Simple, Battery-Free, LED Flasher



How about making an electronic device which does something (slightly) useful, doesn't require an external power supply, and might last longer than you? I found this proposition enticing, and so, inspired by Kevin Horton's Infini-Flasher, I had a go at creating my own version of Kevin's cunning device.

I firstly put together the components using a simple breadboard and found that I needed to make some changes to the component values listed in Kevin's design. I suspect that this is simply a result of using different transistor types. The changes simplified Kevin's circuit slightly: I omitted a pair of diodes, which are only needed if the supply voltage is high, and changed the ordering of the NPN and PNP transistors. I also found that it was important to use a suitable value for the limiting resistor between the super capacitor and the flasher circuit. I started out with 100k here, as in Kevin's circuit, but found that the circuit would not start to flash, as the leakage through the flasher circuit would not allow the voltage across it to reach a high enough value. Clearly the circuit is a little sensitive to component tolerances. I recommend that you put together your circuit using a breadboard first, then solder everything together once you are sure that the component values are satisfactory.

The flasher circuit I ended up with is very similar to that listed here in the Talking Electronics site, with the main difference being that the flasher drives a LED through the induced voltage in an inductor, rather than directly from the power supply (which is generally too low a voltage to light a LED). This type of flasher circuit works well for this application, as we want a very short duration of current usage, so that the relatively small amount of charge available from the capacitor is made to last as long as possible. As the simulation of the circuit shows (when I get around to adding it), the on condition for the output transistor is very brief. This provides a brief spike of power to the inductor and in turn the LED.
As human eyes are very sensitive to short pulses of light (through having evolved to avoid the glinting teeth of sabre tooth tigers no doubt) this provides the most electrically economical means to light the LED. Even if the pulses were longer, the human eye would not appreciate the large expenditure of power that much.

I will include a table indicating the specifications of each of the components in due course, should you want to reproduce the circuit. Here is the circuit diagram. Be warned that this battery free flasher is fascinating! The super capacitor (1F on the diagram) charges up from the solar cells in about 30 minutes under a lamp, or quite happily during the hours of daylight on a desk. When the voltage supplied to the flasher circuit reaches about 1.5 volts, the LED starts flashing. The current consumption is around 10 microamps, so the charge in the super capacitor lasts a long time, certainly more than a typical night time. So far my version has been flashing away happily on my desk for about a month. It should last as long as the electrolytic capacitors that it contains - that should be at least 20 years - perhaps 40 if I am lucky! (I will keep this blog post updated - if I can).
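As a rough check on the "more than a typical night time" claim, here is a small sketch of the arithmetic. It assumes the current stays constant at 10 microamps and that the flasher keeps working while the capacitor voltage falls by 0.5 volts. That 0.5 volt figure is purely my assumption for illustration (I have not measured the cut-off voltage), and runtime_hours is just a name made up for this sketch.

```shell
#!/bin/sh

# runtime_hours CAPACITANCE_F VOLTAGE_DROP_V CURRENT_A
# Time for a capacitor to lose VOLTAGE_DROP_V at a constant current:
# t = C * dV / I, converted from seconds to hours.
runtime_hours() {
    awk -v c="$1" -v dv="$2" -v i="$3" \
        'BEGIN { printf "%.1f\n", c * dv / i / 3600 }'
}

# 1 F capacitor, assumed 0.5 V usable drop, 10 microamps
runtime_hours 1 0.5 0.00001
```

Under those assumptions the charge lasts about 14 hours, which is at least consistent with the flasher surviving the night. A larger usable voltage drop would stretch this further.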

Monday, July 06, 2009

 

Automating Tweets for Twitter With A Script

Are you interested in hooking up your computer's CPU temperature monitor to Twitter? How about monitoring the progress of your automatic builds from anywhere in the world, without needing to set up a web server, or worry about firewall security?
Simple monitoring tasks such as these can be easily accomplished using Twitter. You can also (should you want to) automate your personal updates to your followers. However, be aware that your followers will quickly tire of your Tweets if they are too routine or 'bot' like.
Here is a simple example script to get you started:
#!/bin/sh
user="YourTwitterUserName"
pass="YourTwitterPassword"
while true
do
    # pick one line of msgs.txt at random
    status=`awk 'BEGIN{srand()}{array[NR]=$0}END{print array[int(rand()*NR+1)]}' msgs.txt`
    # let curl URL-encode the message and post it as the new status
    curl --basic --user "$user:$pass" --data-urlencode \
        "status=$status" \
        "http://twitter.com/statuses/update.json" > /dev/null
    # wait between 29 and 1228 seconds before the next update
    DELAY=`awk 'BEGIN{srand();print 29+int(rand()*1200)}'`
    echo "We will be sleeping for" $DELAY "seconds"
    sleep $DELAY
done

This reads the msgs.txt file, selects a line from this file at random, and updates your Twitter status accordingly. The script then pauses for a random delay of between 29 seconds and about 20 minutes, and repeats, forever (or until interrupted). With a suitably creative set of work, or study, related lines in your msgs.txt file this might help convince some of your colleagues (or parents) that you are a dedicated person while you surf the interweb.

Saturday, February 28, 2009

 

Making Files Accessible - Easily

Today I was contemplating setting up Samba on Cygwin on Windows XP. The idea was that, with a machine running such a setup somewhere on the network, I would be able to easily share files without having to engage with Windows XP or Windows Vista security.

Well - if you Google the subject - you find that it can be done. But it is clearly quite a lot of effort to set up. Additionally, not everything works correctly. Some forms of file access do not survive the round trip from Windows to GNU to Windows unscathed.

However, before I got too far into this project I ran into something neater and more minimalistic, which is almost as good. What is that thing? Running a simple web-server under Cygwin so that other computers on the network can browse that machine's files using Firefox, Internet Explorer, Safari, or your favorite web browser. How does this work? A simple one liner is sufficient - because the Python world have been to some effort to make everyone's lives easier.

python -m SimpleHTTPServer

Just type the python command above. Then elsewhere on the network open http://1.2.3.4:8000 and you will be able to browse the files and directories beneath the directory in which the python command was executed. (The 1.2.3.4 is the IP address of the server machine - you can find this by typing 'ping hostname'). This means read only access on the remote machine - however this involves no set up whatsoever. Depending on what you have been doing with Python on the machine, you may get a prompt about opening the firewall for this program. However, that is the extent of the setup. Very nice!
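One thing to watch: on Python 3 installations the same server lives in a module called http.server rather than SimpleHTTPServer. Here is a small sketch of a wrapper that picks the right module name. The function names server_module and serve_here are made up for this example, and it assumes that a command called 'python' is on your path.

```shell
#!/bin/sh

# server_module MAJOR_VERSION
# Prints the standard-library module that serves the current directory.
server_module() {
    case "$1" in
        2) echo "SimpleHTTPServer" ;;
        *) echo "http.server" ;;
    esac
}

# serve_here [PORT]
# Shares the current directory read-only (8000 is the default port).
serve_here() {
    major=$(python -c 'import sys; print(sys.version_info[0])')
    python -m "$(server_module "$major")" "${1:-8000}"
}
```

Typing 'serve_here 9000' in the directory you want to share would then serve it on port 9000, whichever Python happens to be installed.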

Inspired by this - I Googled again. This time for a web server written in Bash. Sure enough there is a nice version out there, written (I think) by Piotr Gabryjeluk. This is a little cooler than the Python web server, because you can play with the moving parts, and see what is happening. It all happens in less than 100 lines of script. This script expects an argument of the port that will be used for communication. So, on your server machine, you would issue a command like:

./web.sh 9092

...and on the client machine you would browse http://1.2.3.4:9092. (As before 1.2.3.4 is the IP address of your server machine). This is very convenient, and I think, rather secure. You can see precisely which files are being served by your server (from the output on standard out). You can kill the server when you have had enough fun and your machine's configuration will be completely unaffected. The script relies on 'nc' or 'nc.exe' to do the communication, so you may get a prompt from your firewall about this program. Just add it to the list of trusted programs, and all will be fine.

I made a small tweak to the original script - so that I could browse images and text files without downloading them first. This is also basically described in the comments on Piotr's page (I think I added the \n which seemed to be necessary), but if you are interested here is the tweak to the function called serve_file.

function serve_file {
    echo 'HTTP/1.1 200 OK'
    local file="`fix_path "$1"`"
    debug INFO serving file "$file"
    # file -b omits the file name, so names containing spaces are safe
    local mime="`file -b -i "$file"`"
    debug INFO "$mime"
    printf 'Content-type: %s\n' "$mime"
    echo
    cat "$file"
}

These commands - the completely minimalist Python command or the Bash script - enable you to share files easily on your home computer network. It may be my imagination - but I am certainly impressed by the performance of the Bash version. It makes browsing directories graphically very nice because it is so responsive.

Tuesday, February 24, 2009

 

A Bash Script for Directory Merging (dirmer.sh)

When working on multiple machines, using external drives, and being constrained for disk space, it is all too easy to create cloned directory trees, which are similar but not identical to one another. Looking at the various directories cloned on my hard drive, I decided to create a simple script for merging directories, which is appended. Now, this is very simple, and be warned, it comes with no guarantees expressed or implied, and has minimal error checking. However, I find it useful and thought I would post it in case anyone else were interested.

Here is how it works...

1. Two arguments, the source directory, and the target directory are passed to the awk program
2. The awk program cd's to the source and target directories and builds associative arrays keyed on file names for the file's timestamp and file type (either file or directory)
3. Each file in the source directory is checked in the target directory.
4. If the same file name exists in the target, the checksums of each file are compared, if the files are identical, a command to delete the source file is stored
5. If the files are not identical, a warning is emitted and the file is left in place in the source directory for further investigation
6. If the source file or directory does not exist in the target it is moved to the target directory, again by storing the appropriate command
7. The user is shown the list of commands that the script has decided are required and asked if these should be executed
8. If requested, the merge commands are executed

The effect is that identical files are deleted in the source (you already have them in the target after all). Files that are unique are moved to the target. Any files that are in conflict are left in place to be reconciled by hand.
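To make the three outcomes concrete, here is a minimal sketch of the decision for a single pair of files. This is not the script itself (which builds its command list in awk); merge_action is a name invented for this illustration, and it only reports what would happen without touching anything.

```shell
#!/bin/sh

# merge_action SOURCE_FILE TARGET_FILE
# Prints the action a merge would take for one file; it changes nothing.
merge_action() {
    src=$1
    tgt=$2
    if [ ! -e "$tgt" ]; then
        echo "move"        # unique to the source: move it to the target
    elif [ "$(cksum < "$src")" = "$(cksum < "$tgt")" ]; then
        echo "delete"      # identical copy already exists in the target
    else
        echo "conflict"    # same name, different content: leave for review
    fi
}
```

For example, 'merge_action old/notes.txt new/notes.txt' prints 'delete' if the two copies have the same checksum and size, and 'conflict' if they do not.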

As mentioned above - the script is experimental and crude and contains minimal error checking - please leave comments if you have suggestions for possible improvements and please check carefully prior to any use (which will be at your own risk, as always).


#!/bin/sh

awk '
BEGIN{
dir1="\"" ARGV[1] "\"/"
dir2="\"" ARGV[2] "\"/"

readdir(dir1, lista, typea)
readdir(dir2, listb, typeb)

for(filea in lista){
if(filea in listb){
if(typea[filea] == "f"){
if(ckfile(dir1 "\"" filea "\"") != ckfile(dir2 "\"" filea "\"")){
print "# " dir1 "\"" filea "\"" " " dir2 "\"" filea "\""
print "# WARNING FILES DIFFER - CONTINUING - YOU NEED TO CHECK WHY!"
} else {
com[++ncom]="# files match " dir1 "\"" filea "\"" " " \
dir2 "\"" filea "\""
com[++ncom]="/usr/bin/rm " dir1 "\"" filea "\""
}
}
}else{
if(typea[filea] == "d" ){
dcom[++ndcom]="# directory needs to be created in the target"
dcom[++ndcom]="mkdir -p " substr(dir2,1,length(dir2)-2) \
substr(filea,2) "\""
} else {
com[++ncom]="# file needs to be moved to the target"
com[++ncom]="mv " substr(dir1,1,length(dir1)-2) substr(filea,2) \
"\"" " " substr(dir2,1,length(dir2)-2) substr(filea,2) "\""
}
}
}

if(!ncom && !ndcom){
print "No updates required"
exit
}
print "The following commands are needed to merge directories:"
for(i=1;i<=ndcom;i++){
print dcom[i]
}
for(i=1;i<=ncom;i++){
print com[i]
}
print "Do you want to execute these commands?"
getline ans < "/dev/tty"
if( ans == "y" || ans == "Y"){
for(i=1;i<=ndcom;i++){
print "Executing: " dcom[i]
system(escapefilename(dcom[i]))
}
for(i=1;i<=ncom;i++){
print "Executing: " com[i]
system(escapefilename(com[i]))
}
}
}
function ckfile(filename, cmd)
{
if (length(ck[filename])==0){
cmd="cksum " filename
cmd | getline ckout
close(cmd)
split(ckout, array," ")
ck[filename]=array[1]
}
return ck[filename]
}
function escapefilename(name){
# awk passes strings by value, so the escaped copy must be returned
gsub("\\$", "\\$", name) # deal with dollars in filename
gsub("`", "\\`", name) # and backticks (both are live inside double quotes)
return name
}
function readdir(dir, list, type, timestamp, ftype, name){
cmd="cd " dir ";find . -printf \"%T@\\t%y\\t%p\\n\""
print "Building list of files in: " dir
while (cmd | getline > 0){
timestamp=$1
ftype=$2
$1=$2=""
name=substr($0,3)
list[name]=int(timestamp)
type[name]=ftype
}
close(cmd)
}' "$1" "$2"

Thursday, February 05, 2009

 

Creating A Subset of a PDF Document

If you have a PDF file and want to send only a portion to a friend or colleague, what do you do? With pdftk you can easily create subsets of the pages in a PDF. For example, if you want to drop 5 pages of preamble in a document that you need to send to your boss, you can do that with:

pdftk A=LongDocument.pdf cat A6-end output ShortDocument.pdf

Tuesday, February 03, 2009

 

Extracting the Audio Portion of an FLV File to MP3 (Again)

Here is another ffmpeg recipe. This time extracting the audio portion of a flash, or .flv, video file.

ffmpeg -i example.flv -ab 56 -ar 22050 -b 500 example.wav
lame --preset standard example.wav example.mp3

You can also go directly to .mp3 from .flv using ffmpeg with a command like this.

ffmpeg -i example.flv -sameq example.mp3

But using lame seemed to give better results, as far as I could tell. If you have additional information on maintaining quality when carrying out such transformations please leave a comment.
 

An Improved ffmpeg Recipe For Concatenating Video Files

Here is an improved recipe for concatenating mpg and other video format files.

ffmpeg -i a.mpg -s 480x360 -maxrate 2500k -bufsize 4000k -b 700k -ar 44100 i1.mpg
ffmpeg -i b.mpg -s 480x360 -maxrate 2500k -bufsize 4000k -b 700k -ar 44100 i2.mpg
ffmpeg -i c.mpg -s 480x360 -maxrate 2500k -bufsize 4000k -b 700k -ar 44100 i3.mpg
cat i1.mpg i2.mpg i3.mpg > joined.mpg
ffmpeg -i joined.mpg -sameq combined.mpg

This seemed to work better than simply using 'sameq' the last time that I needed to join some video segments.

Monday, February 02, 2009

 

Bash Directory Synchronization 2.0 (ds.sh)

Here is a simple Bash script which illustrates the use of find and awk in determining which files to update when synchronizing two directories. As noted in the previous post on the subject (where there is a bash shell script to synchronize directories which does the recursion and file time stamp checking), you can use the rsync command to carry out this task for you. However, rsync is a little too efficient and terse, and an open script, even if it is mainly awk, allows you to understand precisely what is about to happen to your files. The script works by using 'find' to gather data about the directories being synchronized. A list of synchronization commands (simple cp's or rm's) is presented to the user based on this analysis, and the user can then decide whether to execute the commands or not. In my tests the analysis of 1.3 GB of files on hard drive and on USB took around 3 seconds - so the speed of this script is not far from the speed of rsync itself.

The usage reporting and error checking are minimal. The arguments are:

ds.sh directory1 directory2 time-window (seconds)

(and all arguments are compulsory). So, for example, you might type:

./ds.sh usb_documents drive_documents 2

to synchronize your USB drive documents with your hard drive documents.

This script is experimental - please feel free to use it - at your own risk. If you have questions or comments please let me know.
#!/bin/sh

awk '
BEGIN{
timewindow=ARGV[3]
print "The time window is: " timewindow
dir1="\"" ARGV[1] "\"/"
dir2="\"" ARGV[2] "\"/"

readdir(dir1, lista, typea)
readdir(dir2, listb, typeb)

for(filea in lista){
if(filea in listb){
if(typea[filea] == "f"){
timediff=lista[filea]-listb[filea]
if(timediff > timewindow){
com[++ncom]="# file in source directory newer than target"
com[++ncom]="cp -a " dir1 "\"" filea "\"" " " dir2 "\"" filea "\""
}
if(timediff < -timewindow){
print "# WARNING NEWER FILE IN TARGET DIRECTORY"
print "# files concerned are: "
print "# " dir1 "\"" filea "\"" " " dir2 "\"" filea "\""
}
}
}else{
if(typea[filea] == "d" ){
dcom[++ndcom]="# directory needs to be created in the target"
dcom[++ndcom]="mkdir -p " substr(dir2,1,length(dir2)-2) \
substr(filea,2) "\""
} else {
com[++ncom]="# file needs to be copied to the target"
com[++ncom]="cp -a " substr(dir1,1,length(dir1)-2) substr(filea,2) \
"\"" " " substr(dir2,1,length(dir2)-2) substr(filea,2) "\""
}
}
}
for(fileb in listb){
if(!(fileb in lista)){
com[++ncom]="# need to remove file in target not in source"
com[++ncom]="rm -f " dir2 "\"" fileb "\""
}
}
if(!ncom && !ndcom){
print "No updates required"
exit
}
print "The following commands are needed to synchronize directories:"
for(i=1;i<=ndcom;i++){
print dcom[i]
}
for(i=1;i<=ncom;i++){
print com[i]
}
print "Do you want to execute these commands?"
getline ans < "/dev/tty"
if( ans == "y" || ans == "Y"){
for(i=1;i<=ndcom;i++){
print "Executing: " dcom[i]
system(escapefilename(dcom[i]))
}
for(i=1;i<=ncom;i++){
print "Executing: " com[i]
system(escapefilename(com[i]))
}
}
}
function escapefilename(name){
# awk passes strings by value, so the escaped copy must be returned
gsub("\\$", "\\$", name) # deal with dollars in filename
gsub("`", "\\`", name) # and backticks (both are live inside double quotes)
return name
}
function readdir(dir, list, type, timestamp, ftype, name){
cmd="cd " dir ";find . -printf \"%T@\\t%y\\t%p\\n\""
print "Building list of files in: " dir
while (cmd | getline > 0){
timestamp=$1
ftype=$2
$1=$2=""
name=substr($0,3)
list[name]=int(timestamp)
type[name]=ftype
}
close(cmd)
}' "$1" "$2" "$3"


Sunday, February 01, 2009

 

Synchronizing Directories And Files With A USB Drive

If you are interested in this script, there is an arguably better version, which uses find, instead of recursion in bash, to traverse the directories. It is much faster. (You can obtain a copy here, Bash script to synchronize directories).

Every time that I check on the price of USB drives, it seems that the amount of storage that one can buy for $10 has doubled! Moore's Law seems to be a little accelerated for USB drives...

At this rate, in two years (say 2011) 256 GB USB drives will cost $10.

So, like many people, I store more and more information on USB drives.

And, like many people, I then rapidly run into the problem of keeping directory trees synchronized. It is actually a difficult problem, because although you know from the file timestamps which files are the latest files on the USB, you do not necessarily know the history of the files and directories. So if a file is deleted on the USB but still exists on the hard drive, what do you do? You either remove the file on the hard drive, or create the file on the USB, but knowing which action is the correct one is difficult. As you can create, modify, and remove files using a variety of programs, capturing the history necessary to synchronize two directory trees is difficult too. One solution might be to intercept all the OS calls to the file systems involved, but that seems to be a lot of work.

There are a variety of programs which set out to provide directory synchronization. Two of the most well known are rsync and unison. They are both well worth a look. Rsync in particular is very effective. However, for the application of keeping my USB drive and hard drive in synchronization, I wanted something which I could adjust a little more than rsync, and so I have been using the script below. This is still rather experimental, and comes with no guarantees whatsoever. If you try it, you need to take appropriate precautions for yourself.

The script allows the user to enter a 'modification window' in seconds. This allows latitude in the assessment of the file timestamps that are used in deciding whether to update files in the target directory. This is needed because a 'FAT' USB drive stores file timestamps at a lower resolution than either Windows or Linux typical file systems. For a FAT device you will probably want to supply '-t 2' to ensure that you don't end up copying lots of files in either direction when the files are actually supposed to have the same timestamp.
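The effect of the modification window can be sketched with plain timestamps (in seconds). This is only an illustration of the comparison, with sync_action as an invented name, not a piece lifted from the script.

```shell
#!/bin/sh

# sync_action SOURCE_MTIME TARGET_MTIME WINDOW
# Decides what to do with one file from its two modification times.
sync_action() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -gt "$3" ]; then
        echo "copy"    # source is newer by more than the window
    elif [ "$diff" -lt "$(( 0 - $3 ))" ]; then
        echo "warn"    # target is newer by more than the window
    else
        echo "skip"    # within the window: treat as the same timestamp
    fi
}
```

With a window of 2, timestamps that differ by a second (as can happen after a round trip through a FAT drive) give 'skip'. With a window of 0 the same pair would be flagged as a newer file on one side or the other.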

As mentioned, this script is still experimental. I use it with a directory of around 1.5 GB of files, which I synchronize between two computers and a USB drive. The performance is the primary concern, although it is certainly usable. The shell script uses 'stat' (a lot) to obtain information on the modification timestamps of the files that it needs to compare. I have been considering replacing this with a single find command to obtain this information in one shot upfront (the command will be something like 'find . -printf "%p\t%T@\n"'). Perhaps this will be the subject of a future script.

If you have any comments or questions, please let me know.

#!/bin/bash

# comparefiles either compares two files and returns (if in compare mode)
# or determines whether to update the target file and carries out the update

comparefiles () {

FILE1="$1/$3"
FILE2="$2/$3"

if [ $COMPARE = "Y" ]; then
if [ ! -f "$FILE2" ] ; then
echo "dirsync: warning $FILE2 does not exist"
NDIFF=`expr $NDIFF + 1`
else
diff "$FILE1" "$FILE2" > /dev/null
if [ $? != 0 ]
then
echo "dirsync: $FILE1 $FILE2 differ"
NDIFF=`expr $NDIFF + 1`
fi
fi
return
fi

if [ ! -f "$FILE2" ] ; then
NCOPY=`expr $NCOPY + 1`
if [ $DRYRUN = "Y" ]; then
echo "dirsync: need to /bin/cp -a -i $1/$ITEM $2/$ITEM"
else
echo "dirsync: copying new item $1/$ITEM to $2/$ITEM"
/bin/cp -a -i "$1"/"$ITEM" "$2"/"$ITEM"
fi
return
fi

FILETIME1=`stat -c'%Y' "$FILE1"`
FILETIME2=`stat -c'%Y' "$FILE2"`
TIMEDIFF=`expr $FILETIME1 - $FILETIME2`
NEGTIMEWINDOW=`expr -$TIMEWINDOW`
if [ $TIMEDIFF -gt $TIMEWINDOW ]; then
echo "dirsync: (t=$TIMEDIFF) copying file $1/$ITEM to $2/$ITEM"
echo "dirsync: $FILE1: `stat -c'%s %y' "$FILE1"`"
echo "dirsync: $FILE2: `stat -c'%s %y' "$FILE2"`"
NCOPY=`expr $NCOPY + 1`
if [ $DRYRUN = "Y" ]; then
echo "dirsync: need to chmod u+w $2/$ITEM"
echo "dirsync: /bin/cp -a $1/$ITEM $2/$ITEM"
else
chmod u+w "$2"/"$ITEM"
/bin/cp -a "$1"/"$ITEM" "$2"/"$ITEM"
fi
elif [ $TIMEDIFF -lt $NEGTIMEWINDOW ]; then
echo "dirsync: warning newer file in target TIMEDIFF: " $TIMEDIFF
echo "dirsync: $FILE1: `stat -c'%s %y' "$FILE1"`"
echo "dirsync: $FILE2: `stat -c'%s %y' "$FILE2"`"
echo "dirsync: diffing files"
diff "$FILE1" "$FILE2" > /dev/null
if [ $? == 0 ]; then
echo "dirsync: the files are the same - update target"
echo "dirsync: requires /bin/cp -a $1/$ITEM $2/$ITEM"
NCOPY=`expr $NCOPY + 1`
if [ $DRYRUN != "Y" ]; then
/bin/cp -a "$1"/"$ITEM" "$2"/"$ITEM"
fi
else
echo "dirsync: files differ"
echo "dirsync: requires /bin/cp -a -i $1/$ITEM $2/$ITEM"
NCOPY=`expr $NCOPY + 1`
if [ $DRYRUN != "Y" ]; then
/bin/cp -a -i "$1"/"$ITEM" "$2"/"$ITEM"
fi
fi
fi
}

searchdir () {

if [ $COMPARE = "Y" ]; then
echo "dirsync: comparing $1 and $2"
fi

if [ ! -d "$2" ]; then
if [ $DRYRUN = "Y" ]; then
echo "dirsync: need to mkdir $2"
else
mkdir "$2"
fi
fi

for ITEM in "$1"/*
do
ITEM=`basename "$ITEM"`
if [ -h "$1"/"$ITEM" ]; then
echo "dirsync: $1/$ITEM is a link and links are not handled"
elif [ -f "$1"/"$ITEM" ]; then
comparefiles "$1" "$2" "$ITEM"
NFILE=`expr $NFILE + 1`
elif [ -d "$1"/"$ITEM" ]; then
searchdir "$1"/"$ITEM" "$2"/"$ITEM"
NDIRS=`expr $NDIRS + 1`
fi
done
for ITEM in "$2"/*; do
ITEM=`basename "$ITEM"`
# the check on the existence of the second item handles the wild card
if [ ! -e "$1"/"$ITEM" -a -e "$2/$ITEM" ]; then
if [ -d "$2/$ITEM" ]; then
echo "dirsync: directory $2/$ITEM does not exist in $1"
else
echo "dirsync: File $2/$ITEM does not exist in $1"
fi
if [ $DRYRUN = "Y" ]; then
echo "dirsync: need to rm -ri $2/$ITEM (if -f is set)"
fi
if [ $CLEANUP = "Y" ]; then
echo "dirsync: rm -ri $2/$ITEM"
rm -ri "$2"/"$ITEM"
fi
NDELE=`expr $NDELE + 1`
fi
done
}

NDIRS=0
NFILE=0
NCOPY=0
NDELE=0
NDIFF=0
TIMEWINDOW="0"
CLEANUP="N"
DRYRUN="N"
COMPARE="N"

if [ "$#" -lt 2 ]; then
echo "Usage: dirsync source target [-f | -d | -k] [-t offset]"
echo "-f = force removal of files deleted in source (cleanup)"
echo "-d = dry run"
echo "-t offset = offset in seconds to apply to timestamps (for FAT)"
echo "-k = comparison"
exit
fi

# the first two arguments are directories

while [ $# -gt 0 ]; do
if [ "$1" = "-t" ]; then
TIMEWINDOW="$2"
shift; shift; continue
elif [ "$1" = "-d" ]; then
DRYRUN="Y"
shift; continue
elif [ "$1" = "-f" ]; then
CLEANUP="Y"
shift; continue
elif [ "$1" = "-k" ]; then
COMPARE="Y"
shift; continue
else # target directories are stored here
if [ -z "$SRC" ]; then
SRC="$1"
else
TRG="$1"
fi
shift; continue
fi
done

if [ -z "$SRC" -o -z "$TRG" ]; then
echo "Target directories not supplied"
exit 1
fi

if [ ! -d "$SRC" -o ! -d "$TRG" ]; then
echo "Either $SRC or $TRG is not a directory, stopping"
exit 1
fi

if [ $COMPARE = "Y" -a $CLEANUP = "Y" ]; then
echo "Compare (-k) not permitted with cleanup (-f)"
exit 1
fi

if [ $COMPARE = "Y" -a $DRYRUN = "Y" ]; then
echo "Compare (-k) not permitted with dryrun (-d)"
exit 1
fi

if [ $DRYRUN = "Y" -a $CLEANUP = "Y" ]; then
echo "Dryrun (-d) not permitted with cleanup (-f)"
exit 1
fi

searchdir "$SRC" "$TRG"

echo ""
echo "dirsync: number of directories searched = $NDIRS"
echo "dirsync: number of files checked = $NFILE"

if [ $DRYRUN = "Y" ]; then
echo "dirsync: number of files to be copied = $NCOPY"
echo "dirsync: number of items to be deleted = $NDELE"
elif [ $COMPARE != "Y" ]; then
echo "dirsync: number of files copied = $NCOPY"
fi
if [ $CLEANUP = "Y" ]; then
echo "dirsync: number of items deleted = $NDELE"
fi
if [ $COMPARE = "Y" ]; then
echo "dirsync: number of files that differ $NDIFF"
fi


Tuesday, January 20, 2009

 

Opening a PDF File - With a Forgotten Password

Wow, I needed to open a PDF file today. This particular file was password protected years ago and, of course, the password had been forgotten. 'Well, while I am thinking, I might as well try a quick Google on the subject of lost PDF passwords'. This rapidly took me to the SourceForge page for 'pdfcrack', and just as rapidly I had downloaded and built pdfcrack on Cygwin. So far so good, no surprises. However, the real shock was in how fast 'pdfcrack' determined the password. It took just a few seconds - I hadn't even started to read the instructions and I had the password.

Pdfcrack was fast because the password was only 4 characters long, and pdfcrack was able to work through trial passwords rapidly (about 36,000 password attempts per second on this particular machine, a simple laptop).

So, be warned: passwords for PDF files need to be quite long to turn 'a few seconds' into 'a few days', and so keep the files secure (or, if you forget the password, make it unrecoverable).

Say 36,000 passwords per second is the standard speed for pdfcrack. How many letters (or characters) do you need in your password to make a PDF file generally safe against a brute force attack for 24 hours? Well, there are 24x60x60=86,400 seconds in 24 hours, and in this time pdfcrack can try 36,000x86,400=3,110,400,000 passwords (over three billion passwords). Say there are 60 characters that can be used in each position of the password; then you will need at least a 6 character password for reasonable PDF file security, because 60^5 = 777,600,000 < 3,110,400,000 but 60^6 = 46,656,000,000 > 3,110,400,000.

Assuming, of course, that your password is not something that can be found in a dictionary. Most password recovery programs, like pdfcrack, can make use of a supplied dictionary of common passwords. So you absolutely need to avoid common words in passwords to ensure security, even if they are longer than 6 characters.
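The 24 hour arithmetic above can be checked with a few lines of shell. This is only a sketch: the 36,000 guesses per second and the 60 character alphabet are the figures assumed in the text, and the function name min_password_length is made up for illustration.

```sh
#!/bin/sh
# Smallest password length whose search space exceeds the number of
# guesses an attacker can make in the given time. All inputs are assumptions.
min_password_length () {
    rate=$1        # guesses per second (36000 in the text)
    seconds=$2     # time budget in seconds (86400 = 24 hours)
    charset=$3     # characters allowed in each position (60 in the text)
    budget=$((rate * seconds))
    length=0
    space=1
    while [ "$space" -le "$budget" ]
    do
        length=$((length + 1))
        space=$((space * charset))
    done
    echo "$length"
}

min_password_length 36000 86400 60     # prints 6
```

Changing the second argument shows how quickly the required length grows with the time you want the file to stay safe.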

Wednesday, December 17, 2008

 

How to Merge and Join PDF documents

Have you ever wanted to merge several PDF files to create a single PDF file? Well, if you work with electronic documents you probably will. There is a very capable command line tool to carry out this and a range of other PDF related operations: it is called pdftk. You can obtain a copy of pdftk here: http://www.accesspdf.com/pdftk/ and if you are interested in merging, or taking apart, PDF documents, this is the tool that you need.

For example, if you want to concatenate three PDFs into a single PDF file, here is the command:

pdftk A=a.pdf B=b.pdf C=c.pdf cat A B C output output.pdf


I use pdftk on Windows XP via Cygwin on a regular basis.

Sunday, December 14, 2008

 

Using a KWorld Dvd Maker USB Device to Capture Video

I recently wanted to convert the video in an old Sony Handycam 8 to a digital format. I looked around at the various options, and did not find any particularly clear instructions on the web. So, I thought I would record my mileage here, in case it is of help to anyone looking for information on this subject.

There are a variety of video capture USB devices on the market. The prices vary widely, as do their reviews. I could not find any clear information on a suitable device to capture the video from an old, not very high quality, home video camera. So, I selected a KWorld DVD Maker USB device, bought from Fry's Electronics.

I installed the driver on Windows XP (I do not believe that the KWorld device supports Mac or Linux boxes). Then, after being baffled by what to do next for a while, I eventually realized that Windows Movie Maker now presented a capture device of the KWorld DVD maker. When this was being fed by the Sony Handycam, there was an input stream to be captured.

Using the device it is straightforward to convert the video signal to WMV format. This enabled me to save the contents of the cassette in the Handycam, which was the object of the exercise. Actually, the Handycam needed to be encouraged during the course of proceedings with an enthusiastic 'tap'. (According to the web this is not uncommon for the Handycam.)

Once the WMV file had been saved, I converted appropriate portions to MP4 format. The MP4 format files were then safely installed on an iPhone, where the effect has been well received.

For the record, here is the command line needed for the WMV to MP4 conversion - carefully transcribed from the ffmpeg options which the excellent Floola program uses for the iPod.

ffmpeg -i input.wmv -y -s 320x240 -vcodec mpeg4 -flags +aic+mv4+trell \
-mbd 2 -cmp 2 -subcmp 2 -g 250 -maxrate 2500k -bufsize 4000k -b 700k -acodec \
libfaac -ar 44100 -ab 64k -title "HomeVideo" -author "" -comment "" output.mp4

Thursday, August 07, 2008

 

A data CD archive

I have many CDs with who knows what wonderful data stored on them. Most of it is valueless backup data, of course. All those old CDs on the shelf were irritating me recently and so I determined that I needed to do something constructive with them. I guess that this is not a unique problem and so here is what I came up with.
Firstly, I needed to recognize that hard disk space is now relatively inexpensive. So I created images of all the CDs, simply in a flat directory called CD-ARCHIVE with the first CD called CD0001, the second CD0002 and so on. I made the copies using cp -a d:/* . or the equivalent command (my CD is labeled d:/). I ended up with 50 or so CD00nn directories and I marked each CD with the appropriate identifier CD00nn, so that if I mess up what is on the hard disk it can be recreated from the CD.

Then I needed to think about how to handle the resulting archive. The first step was a simple indexing by file name. This I achieved with 'find -ls > listing.txt', which ran for a while in the CD-ARCHIVE directory and created a complete listing of all the files that are stored in the archive. I then wrote a little script to help me find files quickly using 'grep -i $1 listing.txt' as its guts. This rapidly allows me to identify files of interest.
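For anyone who wants to try the same thing, here is a minimal sketch of what such a file finding script might look like. The function name findfile and the optional second argument are my own assumptions; only the 'grep -i' on the listing comes from the description above.

```sh
#!/bin/sh
# findfile: case-insensitive search of an index made with 'find -ls'.
# The listing defaults to listing.txt in the current directory (an assumption).
findfile () {
    pattern=$1
    listing=${2:-listing.txt}
    if [ -z "$pattern" ]; then
        echo "Usage: findfile pattern [listing]" >&2
        return 1
    fi
    grep -i -- "$pattern" "$listing"
}
```

After building the index in the CD-ARCHIVE directory, something like 'findfile invoice' would list every archived file with 'invoice' in its name or path, whatever the case.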

The next step was an aggressive removal of duplicate files. I did this with an execution of the filetidy.sh script and the script which it outputs to remove absolutely identical files. (I say aggressive because this script does not make intelligent decisions - if it finds two identical files it removes one of them and moves on). Then I made another index using 'find -ls > listing2.txt' and another version of the file finding script which references the second listing.
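The filetidy.sh script is not reproduced here, but the idea of spotting identical files can be sketched with standard tools. The following is a stand-in technique, not that script: it compares the checksum and size reported by cksum and only reports candidates, deleting nothing. The function name list_duplicates is hypothetical, and file names containing newlines are not handled.

```sh
#!/bin/sh
# Report files under a directory that have the same CRC and size as an
# earlier file. Nothing is deleted. This is NOT the filetidy.sh script.
list_duplicates () {
    find "$1" -type f -exec cksum {} + | sort | awk '
    {
        key = $1 " " $2                   # CRC and byte count
        name = $0
        sub(/^[0-9]+ [0-9]+ /, "", name)  # strip them, keep the file name
        if (key in first)
            print name " duplicates " first[key]
        else
            first[key] = name
    }'
}
```

A matching checksum and size is a strong hint rather than proof, so it is worth confirming a pair with 'cmp' before removing anything.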

There was an amazing amount of duplication in all those old files. Some 3 gigabytes of duplication to be precise out of only 20 gigabytes of files. But now I have a readily indexed collection of backup material to sort through when I need to find an old file - and no CD changing.

Every now and again this repository of my electronic data is really useful. My next project will be processing my old floppies in the same way. Then I am going to relegate the old media to long term storage off site and (hopefully) never worry about physical media again!

Tuesday, July 01, 2008

 

What Really Slows Windows Down

With Windows you tend to add more and more programs and services during the lifetime of the computer, as you try to get more and more pieces of hardware and assorted technologies up and running on the system. Each new layer of software encrustation tends to stay - because after a while you can no longer tell if you actually need a given program. Virus scanners, firewalls, printer monitors, software that looks after your sister's scanner(s), it all adds up - and after a while you notice that the machine is not as fast as it used to be. Well, to add a little science to the exploration of what is causing the most significant problems here is a great link: http://www.thepcspy.com/read/what_really_slows_windows_down. If you follow the link you will find a detailed analysis of different virus scanning programs and their effects on boot time, computational performance and disk access times. The results are very interesting and the methodology can be applied to any programs or processes that you might find questionable on your own machine. Of particular interest is a simple timing program which checks the arithmetic performance of your CPU and times disk access on your machine. This is a recommended article for anyone with an interest in keeping their machines running at a reasonable rate.

Monday, June 30, 2008

 

How to check links - using a simple bash script

If your web site has a page of external links it is useful to have an automated method to check that the links are valid, because you cannot tell when a link will change or be taken over by another site. The alternative to automation is a tedious session of pointing, clicking and 'back'-ing. Of course, there are some heavy duty link checking programs which can automate the task for you - but even they tend to be a little tedious to use, meaning that your link checking might never get done, and your site would start to exhibit decaying links, a sure sign of neglect and carelessness. The following script assumes that you have a page called 'links.htm' which contains your external links. It then processes the strings which begin with http within this page - to pull down just a single page from each site, which it stores as www.site.com.tmp in the current directory. If a site is offline, or a page has moved on a given site, then you will see evidence of these facts in the output of the script. Rather than use 'lynx' or 'curl' to download the target page, the script uses 'telnet' and individual requests to the http server. This is done because it is a lot more educational than simply using 'lynx' or 'curl' and having some understanding of http is a good thing! The script took some inspiration from ancient variants which have existed on the web for more than 15 years - but is modified to not use temporary files. If you have any problems with it - please don't hesitate to let me know. At some point, I may put into the script a retry if the current sleep times prove to be ineffective in producing reliable page downloads. However, in its current form the script seems to work fine for my links.htm page - it gives me the confidence to extend the page knowing that I will be able to keep it up to date despite the ever changing web. Here is the script.

#!/bin/sh

grep href links.htm \
| sed 's/^.*href="//' \
| sed 's/".*$//' \
| grep -v "\.\." | while read siteline
do
nohttp=`echo $siteline|sed 's|http://||g'`
site=`echo $nohttp|sed 's|/.*$||g'`
item=`echo $nohttp|sed 's|/| |' | awk '{ print "/"$2 }'`
echo Checking $site $item
(echo "open $site 80"; sleep 3; echo "GET $item HTTP/1.0"; \
echo -n "User-Agent: Mozilla/5.0 "; \
echo -n "(Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4)"; \
echo " Gecko/20070515 Firefox/2.0.0.4"; \
echo "Host: $site"; echo; echo; sleep 5) | telnet > telnet.tmp
grep "404" telnet.tmp | grep "Not Found"
cp telnet.tmp $site.tmp
echo "This site produced" `wc -c telnet.tmp | awk '{print $1}'` "bytes"
done
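To see what the script feeds to telnet, it helps to run the URL splitting steps on their own. The sketch below reuses the same sed and awk commands; the function name split_url and the example.com address are made up for illustration.

```sh
#!/bin/sh
# Split an http URL into host and path, using the same sed/awk steps as
# the link checking script. The example URL is hypothetical.
split_url () {
    nohttp=$(echo "$1" | sed 's|http://||g')
    site=$(echo "$nohttp" | sed 's|/.*$||g')
    item=$(echo "$nohttp" | sed 's|/| |' | awk '{ print "/"$2 }')
    echo "$site $item"
}

split_url "http://www.example.com/docs/index.htm"
# prints: www.example.com /docs/index.htm
```

The first word becomes the 'open $site 80' line and the second the 'GET $item HTTP/1.0' request. A URL with no path at all yields '/', so the site's front page is requested.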

Friday, June 13, 2008

 

Backing up a USB to a Gmail account

A few days ago, I found that my trusted 1 GB USB drive was exhibiting apparent corruption. Some of the directories gave listings in which the file names were garbled. Fortunately I think that the files in these directories weren't mission critical. However, I became concerned that valuable files that were in the root of the USB drive might disappear too. I was on the road - so my backup options were limited. However, the following commands rapidly salvaged, compressed, encrypted, and stored on gmail what was left on my drive. These commands might be useful to you, if you find yourself in similar circumstances. And, if you know what causes USB directory listing corruption, and how to avoid it, and/or recover from it, please let me know!

# Commands to extract data from a USB drive and copy to gmail for storage
# Firstly, copy what you can from your USB drive to a directory
# called (in this case) 'save'
mkdir save
cd save
cp -r e:/stuff .

# create a tar file of that saved directory
cd ..
tar -cvzf save.tgz save

# encrypt the tgz file
ccrypt -e < save.tgz > save.tgz.backup

# split the tgz file into ~ 10 MB chunks
split -b10000000 save.tgz.backup

# now mail the chunks to your gmail account

for file in x*
do
echo $file
echo $file | mutt -a $file -s $file youremailaddress@gmail.com
done

# To reassemble the information on the USB drive you will need to download
# each of the chunks, reassemble them into the encrypted .tgz file, with:

cat x* > temp.tgz.backup

# then decrypt them with:

ccrypt -d < temp.tgz.backup > temp.tgz

# and finally un-tar the data

tar xvof ./temp.tgz
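Before trusting the chunks (and before mailing them), it may be worth checking that they really do reassemble into the encrypted file. A minimal sketch, assuming the chunks sit in the current directory with split's default 'x' prefix; the function name verify_chunks is hypothetical.

```sh
#!/bin/sh
# Check that chunks made by split reassemble into the original file.
# Arguments: the original file and the chunk name prefix.
verify_chunks () {
    original=$1
    prefix=$2
    before=$(cksum < "$original")
    after=$(cat "$prefix"* | cksum)
    if [ "$before" = "$after" ]; then
        echo "chunks match $original"
    else
        echo "chunks do NOT match $original" >&2
        return 1
    fi
}
```

For the commands above this would be run as 'verify_chunks save.tgz.backup x' straight after the split, and again on the downloaded chunks if a copy of the original checksum was kept.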

Tuesday, January 01, 2008

 

A simple script for GTD actions

Almost everyone has by now run into the Getting Things Done (GTD) philosophy of David Allen. But just in case you have managed to avoid GTD, it is the latest in a long line of time management and organizational schemes promoted and popularized by the ever productive, communicative and entrepreneurial North Americans. One of the vices of GTD (and Franklin-Covey, Day Runner, Filofax and friends) is the cargo cult/fashion angle, whereby tools and labeling machines become the essential focus of the philosophy. With GTD, which is very popular in computer literate circles, there are a variety of software products designed to support the GTD prescriptions. I have tried a few of these and generally they are overly complicated and not particularly efficient - the learning curve is steep and the return on the investment doesn't seem to warrant that investment. The Geek community as a whole has encountered this problem and has responded with the concept that an ASCII to-do list was the way to solve the problem. However, the Geekification drive rapidly morphed this into the todo.txt/todo.sh shell script espoused by LifeHacker. I have tried this script too - but found that the script was so complicated that its performance was poor. However, recently I found a script which is simple enough to be fast and easy to learn - and yet performs well and is therefore very useful. It is called gtd.sh. Thank you to Tammy Cravit for publishing the script. I have made one or two minor modifications for my own use - and I have been happily using it for a couple of months. If you are interested in GTD and feel that todo.txt/todo.sh may be a little too complex - try Tammy's gtd.sh script.

Sunday, December 30, 2007

 

Using reTune and rebuild_db on your iPod and iPod Shuffle

As mentioned in an earlier post, I don't want to use iTunes with my iPod shuffle or iPod proper. Fortunately, Martin Fiedler has created two solutions which make the iPod technology independent of iTunes. For the shuffle - follow the simple instructions here: http://shuffle-db.sourceforge.net and then charging your shuffle is as simple as copying mp3 files to the shuffle and then running ./rebuild_db.py in its root directory. Your mp3 files are kept wherever you want - no mysterious transformations to names or locations are required. For the third generation iPod that I possess, I can't use quite the same approach. iTunes transforms the name and location of your mp3 files and you need to use reTune. You can get a copy of reTune here: http://retune.sourceforge.net - and once you have it you can copy .mp3 files to your iPod and then make them available for playback by typing ./retune.py in the root directory. When you want to change file positions or add or remove files from the iPod - you need to reverse the Apple mandated name and location transformation by typing ./retune.py again, and then you can manage your music files. A big 'Thank You' to Martin Fiedler for creating these very useful tools!
 

Converting .m4a files to .mp3 format

If you have used iTunes in default mode to create a library of music for your iPod you may need to convert the resulting .m4a files to .mp3 format. MPlayer can do this for you - here are the necessary commands, in the form of a simple shell script:

#!/bin/sh
for FILE in *.m4a
do
mplayer.exe -vc null -vo null -ao pcm:fast:file=temp.wav "$FILE"
lame -h -V2 --vbr-new temp.wav "$(basename "$FILE" .m4a).mp3"
done

You will be left with a temp.wav file in the current directory - delete it (and the .m4a files, if you want) after the conversion.

Saturday, December 29, 2007

 

Converting CDs to MP3 format using cd-paranoia

Recently, I have been making use of a third generation, 15 GB, iPod. Accordingly the problem of filling the iPod with .mp3 files emerges - and the standard solution for this is iTunes. Although many people rave about iTunes - I found it a very confusing program - I was never sure what was going to happen when a 'synchronization' occurred. I don't have much local disk space on the machine that could run iTunes and as iTunes keeps a complete copy of what is on your iPod - this seemed to be an inefficient consumer of space (after all, the CDs themselves form an efficient backup of what is stored on your iPod). Additionally iTunes is a little too obsessed with its online big brother(s) at Apple in the form of adverts and update information. So, what do you do if you don't want to use iTunes? Firstly, use cd-paranoia to extract .wav files from your CDs. This is a very simple operation:

cd-paranoia -B

Then convert the .wav files to .mp3 using lame. The command for this is simply:

lame -h input.wav output.mp3

Wrapping the lame command in a simple for loop makes the process painless. The resulting .mp3 files do not possess ID3 tags - but you can rapidly add them with another loop, using the mp3info program, for example:

for FILE in *.mp3
do
echo $FILE
mp3info -t "$(basename "$FILE" .mp3)" -l AlbumTitle -a "Artist A" "$FILE"
done
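The for loop around lame mentioned above might look something like the following. It is a sketch rather than the exact loop I ran: the function names are invented, and it assumes the .wav files sit in the current directory, as they do after a cd-paranoia -B run.

```sh
#!/bin/sh
# Name of the mp3 file that corresponds to a wav file.
mp3_name () {
    echo "$(basename "$1" .wav).mp3"
}

# Convert every .wav file in the current directory with lame.
wav_to_mp3 () {
    for FILE in *.wav
    do
        [ -f "$FILE" ] || continue      # no .wav files: the glob stays literal
        lame -h "$FILE" "$(mp3_name "$FILE")"
    done
}
```

Calling wav_to_mp3 in the directory of extracted tracks leaves one .mp3 beside each .wav; the quoting keeps file names with spaces intact.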

Once you have converted the CDs to .mp3 format, you can copy them to your iPod, and delete them from your hard drive. You will need to convince the iPod to allow you to play the tunes without using iTunes. There are several programs that accomplish this by building the necessary description files for the software on the iPod. I have been using 'retune' (see a coming post) and it works very well indeed.

Sunday, December 02, 2007

 

A little recursion

Here is a short script which goes through a directory structure and operates on mp3 or MP3 files. It uses standard bash methods - and I am posting it here in case you are interested - of course you can achieve the same effect with
find . -name "*.[mM][pP]3" -print

but this script is more fun - and substantially slower - giving you time to think about other things - which is often useful(!) The script prints out the names of the files it encounters - you could have it do something else - like report checksums (cksum) or modification times (ls -lt). The necessary lines for these operations have been commented out in the script. You can use this script as a template to carry out other file based scripted operations. Also note that it won't find files with the following cases in their extensions: mP3 or Mp3.


#!/bin/bash
scandir () {
for item in *
do
if [ -f "$item" ] ; then
curdir=`pwd`
nfile=`expr $nfile + 1`
EXT=`echo "$item" | rev | cut -c 1-3 | rev`
if [ "$EXT" = "mp3" -o "$EXT" = "MP3" ]
then
echo "$curdir"/"$item" | cut -c${ld1}-
# ls -lt "$curdir"/"$item"
# cksum "$curdir"/"$item"
fi
elif [ -d "$item" ] ; then
cd "$item"
scandir
ndirs=`expr $ndirs + 1`
cd ..
fi
done
}
startdir=`pwd`
ld1=`echo $startdir | wc -c`
ld1=`expr $ld1 + 1`
echo "Initial directory = $startdir"
ndirs=0
nfile=0
scandir
echo "Total directories searched = $ndirs"
echo "Total files = $nfile"
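If the mixed case extensions (mP3, Mp3) matter to you, one possible replacement for the EXT comparison is a case pattern that uses the same glob as the find command. This is a sketch only, and the function name is_mp3 is my own.

```sh
#!/bin/sh
# Succeeds when the file name ends in .mp3 in any mix of upper and lower case.
is_mp3 () {
    case "$1" in
        *.[mM][pP]3) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside scandir the test would then read 'if is_mp3 "$item"'. Note that, unlike the three character comparison, this insists on the dot, so a file simply named 'xmp3' is no longer matched.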

Sunday, August 12, 2007

 

How to make a continuous loop photo VCD

Do you want to make a VCD that will display photographs on your DVD player? The following recipe is not elaborate. It doesn't deal with a sound track or with transition effects between images. However, if you want to share photographs using a DVD player and a television, this method is effective, and it is the simplest method that I could find. If you have suggestions for improvements, please let me know!

First, prepare a set of jpg files. It is convenient to do this with sequentially named files - because each jpg file will be turned into an mpg file, and the mpg files will be included in the VCD. Here is a short script which will name your jpg files IMG0001.jpg, IMG0002.jpg, etc. This script renames your existing jpg files in this directory (so work with copies of your photographs).

#!/bin/sh
# count starts at 0 so that the first file is named IMG0001.jpg
count=0
BASE=IMG
for file in `ls -d *.jpg | sort -r`
do
echo $file
count=`expr $count + 1`
PADDED=`printf "%04d" $count`
name=$BASE$PADDED".jpg"
mv -i "$file" "$name"
done

Having named the jpg files in a convenient way - we then need to ensure that they are appropriately scaled. Again a short script comes in handy.

#!/bin/sh
for file in *.jpg
do
echo $file
jpegtopnm $file | pnmscale -xysize 768 576 > tmp.pnm
pnmtojpeg tmp.pnm > `basename $file .jpg`.scaled.jpg
done

Now your jpg files are appropriately named, and appropriately sized, the next step is to create a matching set of short mpg files. To do this we will make use of ffmpeg. Here is the script to achieve this.

#!/bin/sh
for file in IMG*.scaled.jpg
do
echo $file
count=0
while true
do
count=`expr $count + 1`
if [ $count -eq 21 ]
then
break
fi
cp $file tmp$count.jpg
done
ffmpeg -f image2 -i tmp%d.jpg -target \
ntsc-vcd `basename $file .scaled.jpg`.mpg
done

Now you need to create the 'selection' segments for the xml file that we will use to create the VCD image. Another script is needed. Note that here the number of images is 19 in the example that I used, and this means that there is an '18' in the script - you will probably need to change this for your files.

#!/bin/sh
#NB one less than the number of mpg's to be
#included as the vcd xml is zero based
max=18
count=0
while true
do
thissec=`printf "%03d" $count`
countp1=`expr $count + 1`
nextlid=`printf "%03d" $countp1`
sequeid=`printf "%02d" $count`
if [ $count -eq $max ]
then
nextlid="000"
fi
cat <<!
<selection id="lid-$thissec">
<next ref="lid-$nextlid"/>
<timeout ref="lid-$nextlid"/>
<wait>5</wait>
<play-item ref="sequence-$sequeid"/>
</selection>
!
count=`expr $count + 1`
if [ $count -gt $max ]
then
break
fi
done

This script will write to standard out a revised set of <selection> items to put into the xml file needed to create the VCD. Execute the command, capture its output to a file, and edit this file into the videocd.xml file created with the following command.

vcdxgen -t vcd2 IMG0002.mpg IMG0003.mpg
(include all the mpg files on the command line).
Edit the selection sections into the resulting videocd.xml file - in the place of the playlist sections. Now you can create the VCD image and cue file with:

vcdxbuild videocd.xml

and burn the resulting image with

cdrdao write --device 1,0,0 --driver generic-mmc-raw \
--force --speed 4 videocd.cue

As noted in the introduction, this is not a fancy approach - but it makes nice simple photograph VCDs. In the future I may create a single script which does the work - and improve its error checking.

Sunday, August 05, 2007

 

How to make a continuous loop VCD

As discussed in other notes here, the VCD format allows you to play videos and movies on your DVD player using normal CD-R media. One thing that you might want to do is create a continuous loop VCD - you can use that to give yourself a constant background of favorite material from YouTube, create a looping product demonstration for a show or for a store, or even turn your television into a fishtank - using a movie or two of aquatic life. The VCD format can accommodate menus and considerable complexity. All the information that you need to understand VCD menus is here on PCByPaul and here (in detail). However, if you want to create a continuous loop playing video using VCD - you just need to do the following. Start with two flash movies, say video1.flv and video2.flv. First, create the necessary VCD format mpg files.

ffmpeg.exe -i video1.flv -target ntsc-vcd -ac 2 video1.mpg
ffmpeg.exe -i video2.flv -target ntsc-vcd -ac 2 video2.mpg

Now you need to create a template xml file to control vcdimager's production of your movie. You use

vcdxgen -t vcd2 video1.mpg video2.mpg

This command creates a file called videocd.xml. You need to change this slightly to create the looping effect. To do this, remove the <playlist> portions in the pbc section in the file. Replace these with <selection> items, as follows:

<selection id="lid-000">
<next ref="lid-001"/>
<timeout ref="lid-001"/>
<wait>1</wait>
<play-item ref="sequence-00"/>
</selection>
<selection id="lid-001">
<next ref="lid-000"/>
<timeout ref="lid-000"/>
<wait>1</wait>
<play-item ref="sequence-01"/>
</selection>

Take a look at the changes and you will see how the looping effect is achieved. When the first section times out, it moves to 'lid-001' (the next section). When the second section times out, it moves to 'lid-000' (the first section) - and so on.
Once you have updated videocd.xml, you can use it to create the image to be burned to cd with:

vcdxbuild videocd.xml

And then finally burn the VCD with:

cdrdao write --device 1,0,0 --driver generic-mmc-raw \
--force --speed 4 videocd.cue

Now you can create some nice endless, atmospheric video backgrounds - and display them on your large screen television. If you need more than 2 movies - just create the appropriate number of <selection> sections and make sure that the last one times out to the first.
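For more than a handful of movies, typing the <selection> sections by hand gets tedious. Here is a small sketch that prints them for any number of movies, with the last one pointing back to the first. The function name make_selections and its arguments are my own invention; the XML it prints follows the pattern shown above.

```sh
#!/bin/sh
# Print <selection> sections for N movies; the last one loops to the first.
# The optional second argument is the <wait> value (default 1).
make_selections () {
    total=$1
    pause=${2:-1}
    count=0
    while [ "$count" -lt "$total" ]
    do
        next=$(( (count + 1) % total ))
        printf '<selection id="lid-%03d">\n' "$count"
        printf '<next ref="lid-%03d"/>\n' "$next"
        printf '<timeout ref="lid-%03d"/>\n' "$next"
        printf '<wait>%s</wait>\n' "$pause"
        printf '<play-item ref="sequence-%02d"/>\n' "$count"
        printf '</selection>\n'
        count=$((count + 1))
    done
}
```

Running 'make_selections 2' reproduces the two sections listed above, and 'make_selections 5 > selections.xml' gives five sections ready to paste into videocd.xml.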


Sunday, July 29, 2007

 

How to split and join mp3 files

I recently needed to extract the important portion of one mp3 file. I also had the opposite problem of joining two related mp3 files. Here is how these operations can be achieved. Firstly, to extract the important portion of a given mp3 file you can use the following command:

ffmpeg -i input.mp3 -ss "0:35:26" -t 724 output.mp3

This starts the extraction at 35 minutes 26 seconds into the mp3 file called input.mp3. The extraction then runs for a duration of 724 seconds, and the output is stored in output.mp3.
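If you know the start and end times of the portion you want, rather than its length, the value for -t is simple arithmetic. A small sketch, with made-up function names; the end time of 0:47:30 is an assumption chosen to match the 724 seconds in the example.

```sh
#!/bin/sh
# Convert h:mm:ss to a number of seconds.
to_seconds () {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds between a start and an end time, for use with ffmpeg's -t option.
duration () {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

duration 0:35:26 0:47:30     # prints 724
```

Using awk for the conversion sidesteps the shell's habit of treating numbers with a leading zero, such as 08, as octal.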

To combine two mp3 files, you can simply concatenate the individual mp3 files:

cat file1.mp3 file2.mp3 > file3.mp3

Sunday, July 22, 2007

 

Converting a talk show mp3 file to reduce its size

Say you want to reduce the size of a talk show mp3 file to maximize the number of mp3 files that you can load on your player. How do you do that? Here is a lame command:

lame -b 40 -m m --resample 22.05 -S KentBeck_Large.mp3 KentBeck.mp3

This reduces the bit rate to 40 kbps (-b 40), sets the mode to mono (-m m), resamples at a frequency of 22.05 kHz (--resample 22.05), and doesn't print anything to the screen (-S). For the 36 megabyte file in this example - this saves 13.75 megabytes - and does not appreciably change the quality of the recording.
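To get a rough idea of what a given bit rate will cost in space before converting, you can multiply it out. This sketch assumes a constant bit rate and ignores headers and tags, so treat the answer as an estimate; the function name mp3_size_kb is hypothetical.

```sh
#!/bin/sh
# Approximate size in kilobytes (1000 bytes) of a constant bit rate mp3:
# kilobits per second x seconds / 8. Headers and tags are ignored.
mp3_size_kb () {
    echo $(( $1 * $2 / 8 ))
}

mp3_size_kb 40 3600     # one hour at 40 kbps, prints 18000
```

So an hour of talk at 40 kbps comes to roughly 18 megabytes.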


Sunday, July 15, 2007

 

More information on grabbing an rtsp file as an mp3 file

Again I faced the question of off-line mp3 listening. Specifically, how to listen to an online news item from NPR - when I am constantly interrupted when I am online and would prefer something that I can listen to on my mp3 player or iPod shuffle. As the rtsp post indicates, the main problem is determining the rtsp address of the media source. Here is how to determine this for the NPR site.
1) Start up a browser listening to the interview or show that you would like to listen to off-line. (I used Firefox). This will spawn a 'player' browser window with the embedded control for the Windows Media Player or Real player within it.
2) Use the 'Launch Standalone Player' to fire up the Real player in isolation (switch to Real player if necessary).
3) This has the effect of downloading a 'rpm' file to your desktop - this file contains a url like this:
http://www.npr.org/templates/dmg/dmg_em.p....

4) Actually, this string is (deliberately?) very long and complex - and you need to turn this into an rtsp address. Do this using
wget -O output.txt "http://www.npr.org/templates/dmg/dmg_em.p...."

I did this by just editing the rpm file so that it contained the wget command, then typing
source filename.rpm
Note that some additional quotes needed to be added to the http address.
5) This wget operation is fast - and you will find in the output.txt file an rtsp address that can be fed into the instructions for capturing an rtsp stream as an mp3 file.


Sunday, July 08, 2007

 

Making an mp3 file of an rtsp stream

I found myself wanting to make a permanent copy of a talk given on the web using an rtsp address. You might want to do this to be able to share the talk with a friend or colleague - or to view later offline from the web. Here is how this objective can be achieved. The first step is to discover the rtsp address of the file. This is a little painful - but should always be possible. What worked for me was to launch the talk in a browser (which on windows fired up the Real Player embedded in the browser). This gave me an option to launch the Real Player browser, and this in turn gave me the option to share the presentation with a friend using email. I took this option - and finally ended up with an email message which revealed the rtsp://... information required to capture the file. This looks like an http:// address - but to grab the file you need a program that can talk rtsp:// - so wget etc. cannot be used. MPlayer does have the requisite functionality though; here is the command line to capture the file:

mplayer -noframedrop -dumpfile example.rm \
-dumpstream rtsp://"long complicated address/filename.rm"

Then you need to convert the captured example.rm file to wav format:

mplayer -ao pcm example.rm

Initially that failed with 'Cannot find codec for audio format 0x72706973' for me - that was a little tedious - but it was cured by returning to the official Mplayer site and just putting the sipr*.dll (from the codecs zip file) in the Mplayer directory. Finally,
 
mplayer -ao pcm example.rm

created 'audiodump.wav', which lame was able to turn into an mp3 file as required:

lame -h audiodump.wav example.mp3

So, it takes a few steps - but eventually you get the offline usable mp3 file.
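The steps above can be strung together in a small script. What follows is a sketch under a few assumptions: the function names and the DRYRUN switch are my own additions, and the mplayer and lame options are exactly the ones used above. With DRYRUN=Y the commands are only printed, which is a cheap way to check them before a long capture.

```sh
#!/bin/sh
# Capture an rtsp stream and convert it to mp3, following the steps above.
# Set DRYRUN=Y to print the commands instead of running them.
run () {
    if [ "$DRYRUN" = "Y" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rtsp_to_mp3 () {
    url=$1      # the rtsp:// address discovered earlier
    name=$2     # base name for the output files (your choice)
    run mplayer -noframedrop -dumpfile "$name.rm" -dumpstream "$url" &&
    run mplayer -ao pcm "$name.rm" &&
    run lame -h audiodump.wav "$name.mp3"
}
```

Each step only runs if the previous one succeeded, so a failed capture does not leave you converting a stale audiodump.wav.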


Saturday, June 16, 2007

 

How to create a VCD

Almost all DVD players can also play VCD format disks. VCD format disks are made with normal CD media and so are less expensive than DVD disks. VCD cannot store as much information as DVD (which translates to shorter movie lengths). But if you are making DVDs which do not run for hours - you can probably use VCD instead of DVD and save money. Here are the commands necessary to create a VCD disk from a flash (flv) file. We will take the example flash file 'example.flv'. First you need to convert your flash file to mpg, and at the same time we will also use ffmpeg to convert the flash file to the standard required by VCD (using the -target ntsc-vcd option). I found that it was also necessary to make sure that there were two audio channels (the -ac 2 option).

ffmpeg.exe -i example.flv -target ntsc-vcd -ac 2 temp.mpg

The next step is to create the VCD image. The tool to use for this is "vcdimager". The following command line takes the temp.mpg file and converts it to a disk image/cue file pair that can be used to burn the VCD disk.

vcdimager -t vcd2 -l "Example" -c vcd.cue -b vcd.bin temp.mpg

(If you have multiple mpg files to add to the disk just replace 'temp.mpg' with 'temp1.mpg temp2.mpg temp3.mpg' etc.) For the burn operation, use cdrdao. Here is the command line that I used on Cygwin on Windows XP.

cdrdao write --device 1,0,0 --driver generic-mmc-raw \
--force --speed 4 vcd.cue

Getting cdrdao working on Cygwin took a minor amount of effort - I will post a note on how to do that in the near future. On Linux you should have no problems whatsoever. If you follow these directions, you'll be able to save and share VCDs of anything that starts life as a flash movie file (flv) - or any other format that ffmpeg can deal with.


Monday, June 04, 2007

 

Using GNU screen on Cygwin

GNU Screen

Screen lets you work with multiple text based sessions in the linux and unix world. Cygwin doesn't yet have an official version - but screen-3.9.15 has been patched for Cygwin by Emilio Lopes - and I have started to use it. You can obtain the binaries from Emilio's web page here. There are other tabbed terminal options out there - but if you can learn the shortcuts - screen gives greater efficiency - so here are my notes/tutorial on the elementary use of screen.
1. Obtain and install the screen binary from the link above
2. Startup screen, by typing "screen bash --login -i"
3. Create a new session, by typing "screen bash --login -i" (ctrl-a ctrl-c is another method to create a new screen session - but I could not get this to give me my usual environment and aliases)
4. Get a little output going in the new session, e.g. "while true; do date; sleep 2; done"
5. Switch to the first session: ctrl-a ctrl-spacebar
6. Do a little work (start to think about setting up the man pages for screen)
7. Switch back to the 'output' session: ctrl-a ctrl-spacebar
8. And observe that output has been going on in that session behind the scenes
9. To cut and paste between sessions use: ctrl-a [ (to enter selection mode); navigate the window with the 'vi' direction keys h,j,k,l, marking the start of the text with a spacebar press, navigating further, and marking the end with an 'enter'. Then go to the desired window for your paste operation and type ctrl-a ]

(You will want to create an alias for "screen bash --login -i", of course.)
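As a sketch of that shortcut: the name sc is my own choice (an assumption, pick anything you like), and it is written as a shell function rather than an alias so that it behaves the same in scripts. Put it in your ~/.bashrc.

```shell
# Hypothetical shortcut: start screen (or, inside screen, a new window)
# with a full login shell. The name "sc" is arbitrary.
sc() {
    screen bash --login -i
}
```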

TabbedCygwin

An alternative to GNU screen is TabbedCygwin, written by Klaus Novere. TabbedCygwin provides a .Net application which hosts rxvt. You can get the Windows installer here: http://klaus.novere.com. Once you have installed TabbedCygwin, set the Cygwin location (typically C:\Cygwin) and the rxvt command line under 'Extras/Options' and you are off and running. The rxvt command arguments I used were: "-sr -sl 10000 -rv -fn 'Courier New-18' -e bash --login -i". You can then create multiple terminal sessions easily and you have a graphical interface to switch between them. The result is convenient - I wrote this text in such a TabbedCygwin hosted terminal. I have found some problems - at some point TabbedCygwin forgot its default screen dimensions and I had to track down the appropriate part of the Windows registry to resolve the problem - but it is a neat solution - and it is easy to use.


Friday, May 25, 2007

 

How to add a disk to a linux box

I recently had occasion to add a disk to a virtual machine running Fedora Core 5. Without much system administration experience, and with an inclination emboldened by the fact that the machine was running on VMWare on a Windows XP machine, I found the operation straightforward. Here are the necessary steps:
1. Add the hardware to the machine (virtually in my case, using the VMWare UI)
2. Use fdisk -l to establish the name of the hard drive that you have hooked up to the machine. The answer in my case was /dev/sda. The drive name information will be distinguished by the fact that the target drive is reported as not having a valid partition table
3. Use fdisk /dev/sda to partition the new disk. This involves typing 'm' to get a help listing, 'n' to add a new partition, 'p' to select a primary partition, '1' to specify the partition number, accepting the default first and last cylinders, and 'w' to write the partition information to the hard disk.
4. Use mkfs.ext3 /dev/sda1 to write a file system to the new partition
5. All that then remains is to mount the partition so that it is accessible to the OS: first create a mount point with mkdir /mnt/disk2, then issue a mount command to hook the partition to the mount point: mount -t ext3 /dev/sda1 /mnt/disk2
Then you can go on to work with your disk space hungry software.
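
The steps above can be sketched as a small script. It only prints the commands for review rather than running them, because pointing mkfs at the wrong device destroys data. The function name add_disk_plan is hypothetical, and the sketch assumes the file system goes on the first partition of the new disk (for example /dev/sda1 on /dev/sda); check the fdisk -l output before running anything it prints.

```shell
#!/bin/sh
# add_disk_plan: print (not run) the commands for adding a disk.
# Hypothetical helper; assumes the file system goes on partition 1 of the disk.
add_disk_plan() {
    disk=${1:-/dev/sda}          # device reported by fdisk -l
    mountpoint=${2:-/mnt/disk2}  # where the new space should appear
    part="${disk}1"              # assumption: first primary partition

    echo "fdisk -l"
    echo "fdisk $disk    # interactive: n, p, 1, accept defaults, w"
    echo "mkfs.ext3 $part"
    echo "mkdir -p $mountpoint"
    echo "mount -t ext3 $part $mountpoint"
}
```

Running "add_disk_plan /dev/sda /mnt/disk2" lists the commands in order, so you can paste them one at a time as root.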


Saturday, May 19, 2007

 

Dealing with malware and 100% CPU svchost.exe

I had to spend some time getting a Windows XP box to behave correctly recently. The machine had been hooked up to the network with relaxed Internet Explorer ActiveX security settings - and was subsequently the proud possessor of malware. The effect of this intrusion was an occasional Internet Explorer redirect to a dubious website or two - typical ones were: www.jack9.com & www.maniatv.com (don't visit these sites - I haven't linked them deliberately). Getting rid of the infection took some doing (and I am not yet completely sure that it is all cleaned up). Here are the things that I learnt.

If you are in a malware affected state on Windows XP - a 'bad' dll can be linked into your Windows Explorer as a 'Browser Helper Object' (BHO) and this can then redirect your browser or fire up your browser to a given (target) web site. The target web site might be one that the malware writers want to increase traffic to, or might be a site which attempts to sell you software to clean up your computer - an internet protection racket!

In order to find out what was going on - I had to hook up depends.exe (http://www.dependencywalker.com) (that is a good link!) to explorer.exe. The way to do that is to firstly start up depends.exe, then kill explorer.exe in the Task Manager - then use File/Open in depends.exe to open C:\WINDOWS\explorer.exe. Once explorer.exe is loaded in depends.exe you can then start profiling the Windows Explorer (under the Profile menu). Then you can see all the things that your Explorer is up to (the explorer.exe is firmly embedded into the Windows XP operating system). This revealed that there was a dll apparently up to no good - every second or so it was loaded into the explorer and then it was unloaded. The dll was listed as C:\WINDOWS\system32\awvtt.dll. Some googling on this revealed the deletion procedure - and because the dll had become attached to explorer.exe on start up this involved creating a batch file that deletes the files and fixes registry entries before explorer starts up - because once explorer is started - the dlls can't be deleted as they are embedded in the running Windows XP processes. The role of depends.exe here was in determining what the problem dll was - virus scans do not provide pinpoint information (and they take a long time to complete) - moreover, in this case the scanners had a hard time identifying the problem.

The next problem was svchost.exe - for a while this process had been going wild at start up on this machine, occupying 100% of the CPU. Googling this indicated that this actually wasn't suspicious behavior - it was a Microsoft bug. So, I downloaded and applied the patch - and so far so good - no further svchost.exe strange behavior. The Microsoft Knowledge Base article to look for here is KB927891 - the issues listed aren't too explicit about svchost.exe - but after applying the patch the machine hasn't had that problem.

So far the machine has been behaving better after these changes - but this experience has led me to worry about the Windows XP operating system - it is so easy to breach the OS and do nasty things to the machine and the users of the machine. Furthermore, it is difficult to get the machine back to normal after the bad software has hit. This comes about because:
1) People need to install software that interacts with their browsers
2) Microsoft needed to embed the browser in the OS to defeat the browser threat 10 years ago
3) People need to make money one way or another - and hence scareware comes into existence
The next step will be Microsoft making the world more secure with Vista. It is inevitable - so you might as well get with the program!

And as a final activity in this project - I have installed VMWare running Ubuntu on the Windows XP machine. This seems to be a lot safer - browsers writing to system binary areas is less of a problem in the Linux world. For a great tutorial on VMWare on Windows XP, see http://spyware-free.us/tutorials/vmware/

Sunday, April 22, 2007

 

Reporting duplicated file names

Here is a very simple script to report files which have the same name in a directory tree. There is no provision for spaces in file names (or other associated niceties like ampersands or dollars in file names). However, as a simple tool, to check on possible duplicates which may differ in their contents but be similar in name, it is useful.

#!/bin/sh

find -type f | \
awk '{print $1,$1}' | \
sed 's# \..*/# #' | \
sort -k2 | \
awk '{if(prev == $2){print pl; print $0;print""}pl=$0; prev = $2}' | \
awk '{print $2,$1}'

Here is how it works. First "find -type f" finds files and reports their complete names including their paths. Then the awk fragment "awk '{print $1,$1}'" duplicates the file name output and the sed command "sed 's# \..*/# #'" takes the second field - and removes everything before the final '/' (this is the base file name). Then we sort "sort -k2" based on the base file name (-k2), and the longer awk fragment "awk '{if(prev == $2){print pl; print $0;print""}pl=$0; prev = $2}'" prints just those files which are duplicated. Finally we switch the fields back with awk "awk '{print $2,$1}'" - so that the output shows the duplicate file name followed by the complete pathway to the possibly duplicated files.
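
If your file names do contain spaces, a variant along the following lines may serve. It is a sketch under assumptions: the function name dupnames is mine, it groups on the text after the last '/' using awk's field separator rather than the sed rewrite above, and it still does not cope with newlines in file names.

```shell
#!/bin/sh
# dupnames: report base file names that occur more than once under a directory.
# Hypothetical variant of the script above; tolerates spaces in names.
dupnames() {
    find "${1:-.}" -type f | awk -F/ '
        {
            name = $NF                           # text after the final "/"
            count[name]++
            paths[name] = paths[name] "\n  " $0  # remember every full path
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    print name paths[name] "\n"
        }'
}
```

Each duplicated name is printed on its own line, followed by the indented paths where it was found. The order of the groups is whatever awk's array traversal happens to give.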


Saturday, March 24, 2007

 

How to save a set of files for transfer

Say you want to save all your graphics files from one computer and transfer them to another computer - what is the best way of going about this? There are a variety of ways to achieve the objective. If you are well organized you can just transfer one directory from one machine to the other. However, frequently we spread files around on a machine (or our various programs do) and what is first needed is a search to find the important information - then a rescue to collect it all in one place - and finally a transfer to its new location. Here is how you can go about collecting and saving all the jpg files on a machine. Firstly - although you may have a Windows machine - make sure that you have access to Cygwin so that you can use the general Linux command line options and commands that Cygwin makes available. Once that is in place the steps are:

  1. Find the files
  2. Clean up the names
  3. Make the files list into a tar archive
  4. Transfer the archive to the other machine (using ftp)
  5. Extract the files (using tar xvzf filename.tgz)

How to find the files - the find command is the appropriate tool


find . -name "*.jpg" -print

Resulting in:


$ find . -name "*.jpg" -print
./My Music/AlbumArtSmall.jpg
./My Music/Folder.jpg
./My Pictures/DSC00388.jpg
./My Pictures/MyPicture.jpg
./My Pictures/UpgradeDialog.jpg

However, there are also files which have the extension .JPG and there might be files with .jpeg and any capitalization combination between these choices. So use the following find command:


find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) -print

This might seem complex, but the segments in square brackets like [jJ] enable the find command to select files with any possible capitalization pattern of jpg or jpeg as an extension - and print out the path of the file. The output of the command is now:


$ find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) -print
./a.jpEg
./My Music/AlbumArtSmall.jpg
./My Music/Folder.jpg
./My Pictures/DSC00388.jpg
./My Pictures/IMG_0175.JPG
./My Pictures/IMG_0176.JPG
./My Pictures/IMG_0180.JPG
./My Pictures/MyPicture.jpg
./My Pictures/New Folder/IMG_0391.JPG
./My Pictures/New Folder/IMG_0396.JPG
./My Pictures/New Folder/IMG_0398.JPG

So, finding the files is no longer a problem. However, they must be saved in a suitable archive - so that they can be transferred together. There are several possible commands (zip is one possible choice). But let's use the simple tar command. To glue the find output and the tar command together use the xargs command. This takes a list, typically of files as provided by find, and passes them as arguments to the command specified as its first argument. As xargs uses spaces to delimit its own arguments - it is necessary to make sure that any spaces in file names are appropriately escaped. Hence some sed is required. The short sed script is shown below - it has the effect of escaping any non-alphabetic or non-numeric character in the file name. This is a good remedy for the various other characters which may be inserted in Windows file names that on occasion can confuse the Cygwin command line (like ampersands and dollars, for instance). The sed command says 'for characters which are not alphanumeric, replace them with the character itself with a backslash prepended to the character'. (There are a total of 5 backslashes in the sed script - to escape the backslashes from the shell and to account for the backslash-1 nomenclature that sed uses to refer to the matched token, the non-alphanumeric character.) So the command to output cleaned up file names now looks like this:


$ find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) \
-print | sed -r "s/([^a-zA-Z0-9])/\\\\\1/g"
\.\/a\.jpEg
\.\/My\ Music\/AlbumArtSmall\.jpg
\.\/My\ Music\/Folder\.jpg
\.\/My\ Pictures\/DSC00388\.jpg
\.\/My\ Pictures\/IMG\_0175\.JPG
\.\/My\ Pictures\/IMG\_0176\.JPG
\.\/My\ Pictures\/IMG\_0178\.JPG

As you can see from the output - the term 'cleaned up' is used loosely. However, the good news is that xargs can deal with this output easily. The command to hook up tar to this output is "xargs tar -rvf jpg.tar" which says: from the stream of files supplied to xargs, provide them as arguments to tar, which in append mode (-r) adds them to the tar archive (-f) jpg.tar. Append mode is used rather than create mode (-c) because xargs may run tar more than once when the list of files is long, and each run must add to the archive rather than overwrite it. The (-v) option makes tar run in verbose mode so that you can see what it is doing. Here is the command now:

$ find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) \
-print | sed -r "s/([^a-zA-Z0-9])/\\\\\1/g" | xargs tar -rvf jpg.tar
./a.jpEg
./My Music/AlbumArtSmall.jpg
./My Music/Folder.jpg
./My Pictures/DSC00388.jpg
./My Pictures/IMG_0175.JPG
./My Pictures/IMG_0176.JPG
./My Pictures/IMG_0178.JPG
./My Pictures/IMG_0179.JPG
./My Pictures/IMG_0180.JPG
./My Pictures/MyPicture.jpg
./My Pictures/New Folder/IMG_0391.JPG

Now all the jpg files are safely contained within the archive jpg.tar - and this file can be transferred to another computer. The files can then be extracted using "tar -xvf jpg.tar"
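
An alternative that avoids the sed escaping altogether is to have find and xargs pass NUL-terminated names. This is a sketch, assuming GNU find, xargs and tar (as supplied by Cygwin and most Linux systems), since -print0 and -0 are extensions; the function name collect_jpgs is hypothetical.

```shell
#!/bin/sh
# collect_jpgs: archive every jpg/jpeg file under a directory into a tar file.
# Hypothetical helper; relies on the GNU extensions find -print0 and xargs -0.
collect_jpgs() {
    archive=$1
    dir=${2:-.}

    if [ -z "$archive" ]; then
        echo "usage: collect_jpgs archive.tar [directory]" >&2
        return 1
    fi

    # Make the archive path absolute, because we change directory below.
    case $archive in
        /*) ;;
        *)  archive=$PWD/$archive ;;
    esac

    rm -f "$archive"    # start from a fresh archive, since tar -r appends

    (
        cd "$dir" || exit 1
        find . -type f \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) -print0 |
            xargs -0 tar -rvf "$archive"
    )
}
```

For example, "collect_jpgs jpg.tar ." builds jpg.tar from the current directory. Because the names are separated by NUL characters, spaces, ampersands and dollars need no escaping.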

Saturday, March 03, 2007

 

Making the most of YouTube

So when YouTube came out and attracted so much attention we were amazed. So much great content was suddenly available - The Jam on the Marc Bolan show, any number of Blackadder episodes and the Animaniacs - all present and available for use online, without advertising breaks! Online was the only problem because not everything is connected to the net all the time and the devices best able to deal with video content, apart from computers, like iPods, PSPs and televisions were not able to benefit from all that content availability. So, you might ask, given YouTube, is it possible to make use of this content offline?

The answer is yes - absolutely! YouTube is effectively storing the video information and transferring it to your browser for viewing. The video content can be readily transformed and used in a variety of ways that build on the basic mechanism that YouTube uses to publish the information to you in the first place. If you look around on your machine while viewing a video you will find that the video is transferred to your machine - to an obscure folder. For example, on Windows using Firefox, you will find the flash video in a location such as this: Documents and Settings\username\LocalSettings\ApplicationData\Mozilla\Firefox\Profiles\54mhz57r.default\Cache\2F82C2A7d01 (of course, this location will vary depending on the browser that you are using, the operating system of your computer, the video that you are viewing, and your user name). Furthermore, if you look at the source of the html of a page which displays a YouTube video you can decode the location of the flash video that YouTube is serving to your client browser to download to this folder. Sites like savetube and keepvid take a look at the html source of a given video and decode the download location for you.

So, you can readily get flash video copies from YouTube - they are downloaded when you watch a video behind the scenes and sites like savetube and keepvid make it easy to go from a video url to the download location (so that you can download without having to view first). Once you have saved a flash video to your disk - what can you do next? Your options are:

The disadvantage of flash video is that it cannot be used by the Windows Media Player and a variety of other quality video players which are available on Windows and other platforms. Additionally there are few editing options available - the format is a little too proprietary to be a useful format to save. Flash is very compressed - and this is its main advantage - it allows YouTube to reduce disk space requirements on its servers as it saves those terabytes of everybody's video data. What can you convert flash video to? Among your options are:


I have collected technical information on how to make these transformations on this site. I will add to the information here - so check back if you are interested in making the most of YouTube and online video in general. You will see that these methods rely on free open source tools - these can be used on Linux and with Cygwin on Windows with ease.


Saturday, January 13, 2007

 

Removing duplicated files - keeping your files tidy with filetidy

Frequently you find that files have been duplicated on your various machines. This happens when trees of files are moved from machine to machine and work begins to diverge within these trees. Rather than manually reconcile such work activities (which is slow and difficult) - it is often useful to rapidly find duplicated files - information on file duplicates gives you a sense of directories or folders that can be deleted - or you can simply remove the duplicates automatically. Here is a script, called filetidy, that makes use of find, sort, cksum and awk to automate the analysis of duplicate files.

It works in the following manner: a long listing of file information is created using find - this information is sorted, and then files which are the same size (and therefore could be duplicates), are tested for similarity using cksum. The output is in the form of a list of diff commands and a list of commented 'rm' commands. You can use the output to confirm that files are indeed duplicates - and then once you have decided which files to retain - to delete the duplicates.



#!/bin/sh
# 1. find files only and report long listing
# 2. sort based on size field
# 3. process same size files using awk and cksum
# 4. output a script which diffs files for confirmation and
# can delete files with editing

find -type f -ls | sort -k7 | \
awk 'BEGIN{
prevsize=-1
ncount=0
}
function ckfile(filename, cmd)
{
if (length(ck[filename])==0){
cmd="cksum " filename
cmd | getline ckout
close(cmd)
split(ckout, array," ")
ck[filename]=array[1]
}
return ck[filename]
}
{
filesize = $7
for(i=1;i<=10;i++){ # remove all fields except the filename
$(i)="";
}
file = $0
gsub("\\$", "\\$", file) # deal with dollars in filename
gsub("\\(", "\\(", file) # and parentheses
gsub("\\)", "\\)", file)
if(match(file,"&")) next; # avoid files with ampersands
if(match(file,"\047")) next; # avoid files with apostrophes
sub("^[ \t]*", "", file) # remove leading white space
ncount++
filelistsize[ncount]=filesize
filelistname[ncount]=file
}
END{
i=1
while(i<=ncount){
filelistname[i]
j = i+1
while( filelistsize[j] == filelistsize[i] && j <= ncount ){
if ( ckfile(filelistname[i]) == ckfile(filelistname[j]) ) {
if ( ck[filelistname[i]] != oldck ) {
if ( first == 1 ) print ""
oldck = ck[filelistname[i]]
first = 1
}
if( !visited[filelistname[j]] ){
visited[filelistname[j]]++
fn = filelistname[j]
print "diff " filelistname[j] " " \
filelistname[i] " # " filelistsize[j]
print "#if [ $? == 0 ] ; then rm -f " fn "; fi"
}
}
j++
}
i++
}
}'

Usage is typically:


filetidy.sh | tee tmp.txt

Examine tmp.txt to confirm that the duplicates identified make sense, then uncomment the 'rm' commands for the files you want to delete, and remove the duplicated files by sourcing the edited file.


source tmp.txt
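
For comparison, here is a much shorter sketch of the same idea that checksums every file rather than only those of equal size. It is slower on large trees and is an illustration, not a replacement for filetidy: the function name dupfiles is hypothetical, it only lists the duplicates and deletes nothing, and like the script above it assumes file names without newlines.

```shell
#!/bin/sh
# dupfiles: list groups of files that share the same cksum CRC and size.
# Hypothetical helper; prints file names only, deletes nothing.
dupfiles() {
    # cksum prints "CRC SIZE NAME"; a byte-wise sort puts equal CRC/SIZE pairs together.
    find "${1:-.}" -type f -exec cksum {} + | LC_ALL=C sort | awk '
        {
            key = $1 " " $2                      # CRC and byte count
            name = $0
            sub(/^[0-9]+ [0-9]+ /, "", name)     # keep only the file name
            if (key == prevkey) {
                if (!ingroup) { print prevname; ingroup = 1 }
                print name
            } else {
                if (ingroup) print ""            # blank line between groups
                ingroup = 0
            }
            prevkey = key
            prevname = name
        }'
}
```

A matching CRC and size is strong evidence but not proof, so it is still worth confirming with diff before removing anything.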


Saturday, January 06, 2007

 

Getting Cygwin configured properly on a laptop

A note on how to configure Cygwin for maximum usability on a laptop. A quick look at many sites on web development, such as ZZTools, provides ample evidence of the utility of using the command line and Linux or Unix utilities to maintain your site. Yet many people must make use of Windows too for a variety of reasons. A popular solution to this dilemma - which is often better than maintaining two machines, dual booting or using virtual machines - is Cygwin. Cygwin allows you to use familiar and convenient command line utilities on Windows. The preceding link describes the installation procedure - every Windows based web developer should install Cygwin! A minor tribulation with the default installation will be the bash shell which Cygwin provides - which by default is hosted in Windows' cmd.exe and has fairly odd and unhelpful cut-and-paste functionality - among other problems. Fortunately you can resolve this by locating your cygwin.bat file, generally in c:\cygwin\, and updating it to be as follows:


@echo off
C:
chdir C:\Cygwin\bin
rxvt -sl 10000 -rv -fn 'Courier New-18' -e bash --login -i
pause

(You will want to copy cygwin.bat to a backup copy before making your edits.) This will have the effect of launching rxvt instead of cmd.exe when you fire up a Cygwin shell. The command line options provide for 10,000 lines of scrollback history, reverse video display (i.e. a black background), a large fixed width font and the execution of bash in the resulting window, in full startup and interactive mode. Of course, as you now have Cygwin installed, you can obtain additional detail with man rxvt and man bash, and tune the options to match your needs. With rxvt you copy text by simply highlighting it, and you paste by either using the middle mouse button - or if you have a touchpad laptop - using shift left-mouse button. The net result is a substantial improvement over cmd.exe.


Sunday, December 31, 2006

 

How to monitor blog site statistics

What is the best way to monitor the status of your site(s)? Your own analysis of server logs, one of the variety of online analytical tools out there, or http://www.google.com/analytics? All are effective and simple to use - though you do have to use a UI or bookmark a page and execute multiple mouse clicks to extract the current status. Is there a simpler method? If you are willing to use the command line there is. For example, if you have submitted your site to Blog Top Sites, you can extract a command line report of the current status using lynx and sed, for example:


#!/bin/sh
lynx -dump http://www.blogtopsites.com/sitedetails_16988.html \
| sed -n '/Current Rank/,/Hits Out/p'

If you put these lines into a short script file, called say wstatus, and make wstatus executable then you will find the following output on execution of the script:


[11][FreewareList.net] Current Rank: 3 ([12]Computers)
URL: [13]http://FreewareList.net
Join Date: August 14, 2005
Site Description: Download Latest Softwares and Games. Update Daily !
Visit http://freewarelist.net
Date Joined: Mar 17, 2006

Statistics

Unique Visitors Today: 2,500
Page Views Today: 5,931
Unique Visitors this Week: 2,500
Page Views this Week: 5,931
Unique Visitors this Month: 96,555
Page Views this Month: 264,005
Total Unique Visitors: 181,490
Total Page Views: 487,793
Total Hits Out: 1,780

(I have taken the freewarelist.net site as an example here). The lynx command provides an ASCII dump of the statistics of the site, as reported by Blog Top Sites, and the sed command thins this down to the pertinent details. Now you can get a status report just by typing wstatus - the results are rapidly reported - and you can carry on with your normal tasks with minimal interruption. If you have too great an urge to monitor - and monitoring can be addictive - create a cron job which executes the wstatus command and emails the output to you at a defined frequency.
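
A cron entry along the following lines would do that. This is a sketch with assumptions: the script path /home/me/bin/wstatus, the address me@example.com and the 8 am schedule are placeholders, and it presumes a working mail command on the machine. The small extract_stats function just isolates the sed range filter used above so that it can be tried on any saved lynx dump.

```shell
#!/bin/sh
# extract_stats: keep only the lines from "Current Rank" through "Hits Out".
# Same sed range as in wstatus; reads standard input.
extract_stats() {
    sed -n '/Current Rank/,/Hits Out/p'
}

# Hypothetical crontab line (install with "crontab -e");
# mails the report every day at 08:00:
# 0 8 * * * /home/me/bin/wstatus | mail -s "Site status" me@example.com
```

To try the filter without hitting the site, save a dump once with "lynx -dump URL > dump.txt" and run "extract_stats < dump.txt".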

Thursday, December 28, 2006

 

Using CVS to track file changes

The article Using RCS to track file changes shows you how to work with single files using RCS. This is minimalist source control - but it is a huge advance over tracking changes manually. If you have many files and many directories - handling individual file check-ins will be tedious with RCS. A better tool for this task is CVS - but CVS can be more intimidating than RCS for casual use. This article provides an illustration of simple CVS use - if you have not used CVS - and you have a large collection of files that you would like to put under source control, this article will get you started. I have kept the formatting compact - with comments beginning with '#' in the midst of the commands.


#First let's create an example directory
#with two files in two directories
mkdir example
cd example
mkdir a
mkdir b
touch a/a.txt
date > b/b.txt

#An ls -aR will show you the structure of the example directory
ls -aR

#Now create a local cvs repository
cvs -d ~/cvsexample init
#And import the current directory into it, calling the project
#in the repository 'example'
cvs -d ~/cvsexample import -m "" example example initial
cd ..

#Move the example directory out of the way
mv example example-old
mkdir tmp
cd tmp
cvs -d ~/cvsexample co example
#The cvs controlled example directory is now in ~/tmp/example

#From now on we work in this directory - let's do some example work
cd example/
ls
echo "Another line" >> b/b.txt
cvs diff

#You will be prompted for a commit message
cvs commit
cvs log b/b.txt
echo "Yet another line" >> b/b.txt
#You will again be prompted for a commit message
cvs commit
cvs diff -r1.1 -r1.2 b/b.txt
#And so on....you edit, commit, diff and track your work using cvs
#e.g. to see all the changes to b/b.txt
cvs log b/b.txt

As you can see - this is straightforward - and keeping annotated versions of your work under CVS will soon become second nature. You can add more sophistication as your projects get larger and the number of team members working on the files increases - but for simple personal projects the commands in this article will pay dividends. An excellent, free, online resource of detailed CVS information is available in Open Source Development with CVS, 3rd Edition by Karl Fogel and Moshe Bar.

Wednesday, December 27, 2006

 

Using RCS to track file changes

Keeping track of your edits to important text files? Don't litter your directories with file.old, file.old-version-1, file.bak and similar untidy file names. Instead, make use of 'RCS' - you only need a few simple commands - and all your changes are tracked carefully, and optionally with descriptive comments. Here's how:

Create a directory called 'RCS'


mkdir RCS

'Check in' your file to the RCS directory


ci -l filename.txt

Enter a descriptive comment, terminated with a '.' on a line by itself. If you don't want to add a comment just type '.'.

Make some edits to your file, and then check it in again


ci -l filename.txt

Again, enter some more descriptive comments, terminated with a '.' on a line by itself

Then if you want to see the history of the file, type:


rlog filename.txt

If you want to see what you have changed recently, and not checked in:


rcsdiff filename.txt

And if you want to compare two specific versions of the file:


rcsdiff -r1.1 -r1.2 filename.txt

Finally, for reference


man rcsintro

This gives a large amount of additional information, but the small number of commands described here are very powerful for tracking your local edits to important files - and are highly recommended!

Sunday, November 05, 2006

 

Using sed to process html files

sed is a wonderful tool! There are books, man-pages and generally good resources for sed users online. However, given how useful the tool is - it is hard to master - so I thought I would provide just a little information here.
Say you want to extract the ascii information in an html file for additional processing - how should you do that? Many programs can input html (e.g. aspell used elsewhere here) - and sometimes use the html tags to set font and formatting information. But what if you just want to count characters or words - how do you proceed? The first step is a quick google to 'sed one liners', and there one finds that the command line required is:

sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

or to do this with a specific file

sed -e :a -e 's/<[^>]*>//g;/</N;//ba' filename.html > filename.txt

But how does this command work?

Well, it builds a sed program from two pieces (-e) of sed input. The first, :a, sets the branch label 'a' at the beginning of the sed program. The second says - if you find a left angle bracket that is followed by any number of characters which are not a right angle bracket ([^>]*), and then a right angle bracket, then globally replace the whole match with nothing (//g). This takes care of <tag> html tags on one line - but what of tags which span lines? Well, any line that still contains a left angle bracket will hit the /</N command, which appends the next line to the sed pattern space, and then the (//ba) - where the empty pattern // reuses the last regular expression, /</ - will branch back to the beginning of the sed script (remember the 'a' label?) to continue the search for the tag to replace with nothing. Simple, elegant and compact!
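
To see the multi-line case in action, the one-liner can be wrapped in a function and fed a tag that is split across two lines. This is a small sketch; the function name strip_tags is my own, and the behavior of [^>] matching a newline in the pattern space is as found in GNU sed.

```shell
#!/bin/sh
# strip_tags: remove html tags, including tags that span lines.
# Hypothetical wrapper around the sed one-liner; reads files or standard input.
strip_tags() {
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba' "$@"
}
```

For example, "printf '<p>Hello <b>world</b></p>\n<a\nhref=\"x\">link</a>\n' | strip_tags" exercises both the single-line and the spanning case, and "strip_tags filename.html | wc -w" gives the word count mentioned earlier.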

Don't forget to check out http://sed.sourceforge.net/.

Saturday, October 21, 2006

 

To Do List Pro versus todo.txt

I happened on the following to do list manager recently: To Do List Pro. I have been using it for the last week or so - and it works very well. It has some non-standard interface elements and behavior, it has to be said, but it does the job: it keeps track of what has to be done and when, and records automatically when things were done. It has an installer which worked fine on Windows XP (I believe that the program was written a while ago), and it can export to Excel format. This tool appears to be more efficient than others that I have tried - and it is integrated with the Windows/Office world that people typically have to work with - so I recommend that you give it a try (the price is right - free!). As someone interested in GTD - I have of course tried todo.txt (who hasn't?) and I prefer To Do List Pro - for one thing todo.txt is ...slow... and, behind the scenes, an unwieldy shell script (which normally I would like but for some reason it seems wrong to manage your todo list with a shell script - an alias to edit a text file maybe - but a multi-hundred line shell script to edit a text file - sounds a little too geeky).

Saturday, September 30, 2006

 

Technorati rank


I just took a quick look at Technorati and the rank of this site. It ranks at 1,164,857 (currently). Wow! Technorati have indexed a huge number of blogs. I thought that I would post the current ranking and then report changes - in either direction. The screen capture evidence follows...


 

How to check blog spelling

Want to run a spell check on your blog or site? Perhaps not every day - but occasionally this can be helpful to remove basic errors. You can combine a wget download with a find command and aspell to get a list of your typos. Here's how:



#!/bin/sh
#check spelling on zztools
wget --mirror -p --html-extension --convert-links \
-P ./ http://zztools.blogspot.com
find zztools.blogspot.com -name "*.html" |
while read -r file
do
echo "$file"
aspell -l -H < "$file" | sort -u
done | less

Put the commands above into a file, then execute the file at the command line, and you will run a spell check of zztools.blogspot.com - add an argument and some error checking to check your own sites.
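
As a sketch of what "an argument and some error checking" might look like: the function names spellcheck_site and site_host are hypothetical, the wget and aspell options are the ones used above, and the sketch assumes that wget names its mirror directory after the host part of the URL.

```shell
#!/bin/sh
# spellcheck_site: mirror a site with wget and list aspell's unknown words per page.
# Hypothetical generalisation of the script above.

# Reduce a URL to its host name, which wget uses for the mirror directory.
site_host() {
    printf '%s\n' "$1" | sed -e 's#^[a-zA-Z]*://##' -e 's#/.*$##'
}

spellcheck_site() {
    if [ $# -ne 1 ]; then
        echo "usage: spellcheck_site URL" >&2
        return 2
    fi
    for tool in wget aspell; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "spellcheck_site: $tool not found" >&2
            return 1
        }
    done

    url=$1
    host=$(site_host "$url")

    wget --mirror -p --html-extension --convert-links -P ./ "$url"
    # wget can report errors for a few missing pages yet still fetch the rest,
    # so check for the mirror directory rather than the exit status.
    if [ ! -d "$host" ]; then
        echo "spellcheck_site: nothing downloaded for $url" >&2
        return 1
    fi

    find "$host" -name "*.html" |
    while read -r file
    do
        echo "$file"
        aspell -l -H < "$file" | sort -u
    done
}
```

Typical use would be "spellcheck_site http://zztools.blogspot.com | less".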

Wednesday, September 27, 2006

 

File hosting - box.com and other solutions

There are many online storage solutions ramping up at present. (And as we all know this has been a constant state of affairs for a while on the web). Ever interested in efficient and low cost computing solutions, I took a moment to look at the options yesterday. Here are the findings:
Box.net, Inc. - this is the favorite presently. The interface is clean and modern without being annoying. The price is right (up to 1 GB free after sign-up) and there is no installation on your computer. This is the solution that I am going to move forward with. There is a 10 MB limit on file uploads - this will need some work to deal with. However, for simplicity and price - it looks like box.com is the winner.
www.drivehq.com - 1 GB free after sign-up and 50 MB uploads with the free account. An advantage is the fact that DriveHQ supports an ftp interface. You need to be able to put your ftp client into passive mode. One problem here is that downloads are limited to 80 MB per month.
www.filewire.com - a generally suspect looking site! I tried registering and regretted it when nothing happened with my registration.
www.xdrive.com - has potential - but googling revealed many complaints about access times and reliability - so I didn't go forward with this one either.
www.mozy.com - looked interesting - but wanted to take over my computer with an extensive installation which included a service monitoring the machine. There was also a slightly ominous threat that a full backup would take 21 hours (for around 1 GB of backup - which Mozy had deduced I needed). This seemed to be overkill for my needs and so I rapidly uninstalled. I got the general impression that the typical home user of Windows XP might do ok with Mozy, but I didn't test it further.

Monday, September 25, 2006

 

Blog post values

Steve Pavlina has an interesting post on his site
How Much Is a Blog Post Worth? Would You Believe $2400 Each?

Of course, Steve's calculation contains the significant assumption that the post in question is made on Steve's site, www.stevepavlina.com, and that the post will continue to earn at Steve's monthly average value for a ten-year lifetime. To counter the first assumption, note that new blogs are currently starting at a rate of 75,000 per day (or more), and each of these blogs presumably has at least one post. If Steve's figures were realistic for the masses, the first posts of new blogs alone would create 75,000 x $2400 = $180 million of blog posting assets every day. This is not the case - as one can tell by any random walk in the blogosphere.

The second assumption of a ten-year life span for each post is also interesting. Will the audience for personal development material have been saturated by the site sooner than ten years? Will the presence of more and more posts on the site diminish or magnify the effect of the other posts, reducing or increasing the value of every post? It is hard to guess, but I suspect that Steve's posts are each carefully adjusted in content to maximize the interest that they receive. Steve's posts are word rich and keep the reader engaged, and they offer get-rich titles, which undoubtedly improves the monetary performance of his site. People will stay, link, bookmark and return, if only because there is the fear that the wealth of material cannot be assimilated in the typical 10 minute stay, and that also surely increases the value of his posts.

Sunday, September 17, 2006

 

Making a copy of a web site

wget - that is the command for this task. Here is an example for this site:

wget --mirror -w 1 -p --html-extension \
--convert-links -P ./ \
http://zztools.blogspot.com

The options are:
'--mirror' get a copy of everything on the site
'-w 1' wait 1 second between retrievals - this is optional and reduces the load on the site
'-p' get the prerequisites for the page - this means that linked graphics files, for example, will be downloaded too.
'--html-extension' add an .html extension to downloaded HTML pages which don't already have one
'--convert-links' make links in retrieved documents point to their local versions
'-P ./' put the retrieved files into the current working directory
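
If you want to keep more than one copy over time, one possible approach is to point '-P' at a dated directory. The sketch below is only an illustration: snapshot_dir, mirror_snapshot and the mirror-YYYY-MM-DD naming scheme are assumptions, not part of wget.

```shell
#!/bin/sh
# Sketch only: keep dated copies of a site side by side.
# snapshot_dir and mirror_snapshot are illustrative names.

# Build a directory name such as ./mirror-2006-09-17 from a date string.
snapshot_dir() {
    printf './mirror-%s\n' "$1"
}

# Mirror the URL given as $1 into a directory named after today's date.
mirror_snapshot() {
    if [ $# -ne 1 ]; then
        echo "usage: mirror_snapshot URL" >&2
        return 2
    fi
    dest=$(snapshot_dir "$(date +%Y-%m-%d)")
    wget --mirror -w 1 -p --html-extension \
        --convert-links -P "$dest" "$1"
}
```

Calling mirror_snapshot http://zztools.blogspot.com would then place the copy under a directory named for the current date instead of the current working directory.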

Sunday, September 10, 2006

 

How to make a DVD from a set of mpg files

Once you have collected a set of mpg files, converted from flv files from YouTube and similar places, how do you make your own DVD, or make a DVD to give to a friend or relative?


Here we are talking about a normal DVD that you can play on a normal DVD player, not just on your computer, which can be very convenient. Typically TV screens give nicer images than computer screens, and they are designed to be viewed from many angles. It has to be said that the quality of the video is limited by the quality of the input flv (flash) file. Typically this is not as good as broadcast video, so be prepared for that.


To make a DVD proceed as follows:


ffmpeg -i normal.mpg -target ntsc-dvd dvdmpg.mpg

This converts your normal mpg into an mpg suitable for inclusion in a DVD. The ntsc-dvd target suits North America and Japan. If you are in Europe or another PAL region, use pal-dvd instead. The conversion takes a little while, so you will probably want to leave it running while you do other things.


The next step involves creating the directory structure and file contents of your DVD. The DVD format is quite fussy, and DVDs which do not have the correct format are not recognized by normal domestic players. So, use the 'dvdauthor' command to create your DVD layout. This is simple: if you follow these instructions you will get a plain, continuously viewable DVD without menus etc. If you explore the documentation for dvdauthor you can create more sophisticated DVDs too.


The first step is to create a little xml file which describes your DVD.


This is what the file (dvdauthor.xml) looks like:



<dvdauthor>
<vmgm />
<titleset>
<titles>
<pgc>
<vob file="temp1.mpg" chapters="0" />
<vob file="temp2.mpg" />
<vob file="temp3.mpg" />
<vob file="temp4.mpg" />
<vob file="temp5.mpg" />
<vob file="temp6.mpg" />
<vob file="temp7.mpg" />
<vob file="temp8.mpg" />
</pgc>
</titles>
</titleset>
</dvdauthor>
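
With more than a handful of clips, typing the vob lines by hand gets tedious. As a hedged sketch, a small shell function could print the same layout for whatever files it is given. The name make_dvdauthor_xml is made up for illustration, the chapters attribute is left out, and file names are assumed not to contain characters such as & or " that would need escaping in xml.

```shell
#!/bin/sh
# Sketch only: print a dvdauthor xml description for the mpg files
# given as arguments. make_dvdauthor_xml is an illustrative name.
# Assumes file names contain no characters that need xml escaping.
make_dvdauthor_xml() {
    echo '<dvdauthor>'
    echo '<vmgm />'
    echo '<titleset>'
    echo '<titles>'
    echo '<pgc>'
    # One vob line per file, in the order given on the command line
    for f in "$@"
    do
        printf '<vob file="%s" />\n' "$f"
    done
    echo '</pgc>'
    echo '</titles>'
    echo '</titleset>'
    echo '</dvdauthor>'
}
```

For example: make_dvdauthor_xml temp1.mpg temp2.mpg temp3.mpg > dvdauthor.xml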

Then use the xml file above to create the directory structure for the DVD:


dvdauthor -o example -x dvdauthor.xml

Then use the mkisofs command to create the .iso to burn your DVD:


mkisofs -dvd-video -o example.iso example

You will be left with an iso image file of your DVD that you can then burn to a disc using your preferred DVD burning software.
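
The dvdauthor and mkisofs steps above can be chained in one place. The following is a minimal sketch, assuming dvdauthor.xml already sits in the current directory. The names run and make_dvd, and the DRY_RUN switch for previewing the commands, are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: the dvdauthor and mkisofs steps chained together.
# Set DRY_RUN=1 to print the commands instead of running them.
# run and make_dvd are illustrative names.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_dvd NAME: expects dvdauthor.xml in the current directory and
# writes the DVD directory NAME and the image NAME.iso.
make_dvd() {
    if [ $# -ne 1 ]; then
        echo "usage: make_dvd NAME" >&2
        return 2
    fi
    # Stop before building the image if the authoring step fails
    run dvdauthor -o "$1" -x dvdauthor.xml || return 1
    run mkisofs -dvd-video -o "$1.iso" "$1"
}
```

Running make_dvd example repeats the two commands shown earlier, and setting DRY_RUN=1 first lets you check them before committing to a long run.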

Thursday, September 07, 2006

 

How to save a flash movie from YouTube

Just use the http://keepvid.com/ site to save a flash file locally on your hard drive for a given video on YouTube (or on the many other sites that provide video content in flash format).

Wednesday, September 06, 2006

 

How to convert a flash movie into mpeg format

If you download a flash movie from YouTube or Google videos - often it is convenient to convert it to mpeg format. On Linux and Windows (under Cygwin) this can be achieved using ffmpeg, for example:

ffmpeg -i input.flv -ab 56 -ar 22050 -b 500 -s 320x240 output.mpg
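
To convert a whole directory of downloads in one go, a loop along the following lines may help. This is a sketch under assumptions: mpg_name and convert_all_flv are made-up names, and the bitrates are written as -b:v 500k and -b:a 56k because recent ffmpeg builds read bare numbers as bits per second rather than kilobits.

```shell
#!/bin/sh
# Sketch only: convert every .flv file in a directory to .mpg.
# mpg_name and convert_all_flv are illustrative names.

# Swap a trailing .flv for .mpg: clip.flv becomes clip.mpg
mpg_name() {
    printf '%s\n' "${1%.flv}.mpg"
}

# convert_all_flv [DIR]: DIR defaults to the current directory.
convert_all_flv() {
    dir=${1:-.}
    for f in "$dir"/*.flv
    do
        # If nothing matches, the pattern is left unexpanded, so skip it
        [ -e "$f" ] || continue
        # Same settings as the single-file command, with explicit units
        ffmpeg -i "$f" -b:a 56k -ar 22050 -b:v 500k -s 320x240 \
            "$(mpg_name "$f")"
    done
}
```

Each output file lands next to its input, so clip.flv produces clip.mpg in the same directory.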
 

How to make an mp3 file from an mpg file

Sometimes it is useful to extract an mp3 file from an mpeg or mpg file. This is a task that is handled efficiently using ffmpeg and lame. These tools are available for Linux and for Windows via Cygwin.

Here is an example:


ffmpeg -i example.mpg example.wav
lame -h example.wav example.mp3
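
The two commands leave an intermediate wav file behind, which can be large. As a hedged sketch, they could be wrapped so that the wav is removed afterwards. The names mp3_name and mpg_to_mp3 are made up for illustration, and the input is assumed to end in .mpg.

```shell
#!/bin/sh
# Sketch only: extract an mp3 from one mpg file and remove the
# intermediate wav. mp3_name and mpg_to_mp3 are illustrative names.

# Swap a trailing .mpg for .mp3: example.mpg becomes example.mp3
mp3_name() {
    printf '%s\n' "${1%.mpg}.mp3"
}

mpg_to_mp3() {
    if [ $# -ne 1 ] || [ ! -f "$1" ]; then
        echo "usage: mpg_to_mp3 FILE.mpg" >&2
        return 2
    fi
    wav="${1%.mpg}.wav"
    # Step 1: decode the audio track to wav
    ffmpeg -i "$1" "$wav" || return 1
    # Step 2: encode the wav to mp3, remembering whether lame succeeded
    lame -h "$wav" "$(mp3_name "$1")"
    status=$?
    rm -f "$wav"
    return "$status"
}
```

Calling mpg_to_mp3 example.mpg should leave example.mp3 beside the original and nothing else.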

Tuesday, September 05, 2006

 

Welcome!


ZZTools - a place to record useful ways of doing things with computers and computing. All comments welcome.