Friday, August 24, 2012

 

Optimizing a PDF from a Scanned Paper for Text

I scanned an old paper recently - and was left with a huge PDF. The PDF was storing each page as a color image, and allowed far more intensity variation per pixel than the text needed - hence its size. Text can be stored more efficiently than that!

So, here is a short script which extracts the pages of a PDF, reduces them to black and white, and reconstructs the PDF. This produced about a factor of ten reduction in size for me, and also improved the legibility as the text is now high contrast black on white.

Yes, the script is crude, and it contains a useless use of cat. Clean up and optimization are left as exercises for the reader...

#!/bin/sh

i=0
while [ $i -lt 27 ]
do
  i=`expr $i + 1`
  echo $i
  d=`echo $i | awk '{printf "%02d",$1}'`
  echo $d
  pdftk A=paper.pdf cat A$i output page$d.pdf
  pdftoppm page$d.pdf -gray eh
  cat eh-000001.pgm | pnmquant 2 | pgmtopgm | \
  pamditherbw -threshold |  pnmtops -nocenter -imagewidth=8.5 > tmp.ps
  ps2pdf -dPDFSETTINGS=/ebook tmp.ps
  mv tmp.pdf newpage$d.pdf
done

pdftk newpage*.pdf cat output newcombined.pdf
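Incidentally, if you want to avoid hard-coding the page count (27 above), pdftk can report it for you. Here is a rough, unpolished sketch - it assumes the 'NumberOfPages' line that pdftk dump_data normally prints:

pages=`pdftk paper.pdf dump_data | awk '/NumberOfPages/{print $2}'`
echo "Processing $pages pages"

...and then use $pages in place of 27 in the while loop.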

Monday, June 25, 2012

 

American Airlines Fail Again


Here are the facts:
1) Weather delayed the inbound flight for AA 1339 from JFK on 25-Jun-2012.
2) The plane was then boarded and had a departure slot (passengers organized the boarding).
3) The crew then announced that it had used up its hours and passengers had to leave the plane.
4) A new crew was then assembled and the plane eventually took off five hours late.
5) On arrival in Chicago I was told that the entire delay was weather related and I would have to pay for a hotel (for a few hours).

Well - I suppose this means that the end is nearer for American than I had realized. I'll never fly them again.
Throughout the inefficient ordeal that I was subjected to, I spoke to many American Airlines employees. The Chicago booking agent was particularly obnoxious! The rest were just incompetent, and keen to finish their shifts. The only person I spoke to who was remotely helpful was working in the baggage area in Chicago airport. He was ex-military and said that he was sick of working for the jobsworths and wanted to leave as soon as possible.
Not exactly a ringing endorsement from someone prepared to risk life and limb for America.
Somehow it does not seem right that an airline that bears the name 'American' should be such an advert for gross inefficiency and incompetence.

Update 6/27/2012:
I submitted a complaint to American Airlines online. We'll see what becomes of that. If it is anything other than the standard automated response (already received, along with the helpful 'do not reply to this message'), I'll report it here.

Thursday, June 21, 2012

 

More from the Microsoft Clown Troupe: KB2709162, KB2686828, KB2685939, KB890830, KB2656369

Well, thanks Microsoft - I'm on the road trying to do some work and the silly update icon lights up. So I carefully check what you think I need to be updated on. I determine that the disk space requirements of KB2709162, KB2686828, KB2685939, KB890830 and KB2656369, which you, Microsoft, say that I need, are quite modest. So I start the installation. Big mistake. Several hundred megabytes of disk space later, I find that the installer has hung and I'm out of disk space. I wasn't doing anything else on the machine - this is just the normal Microsoft inability to test anything before spamming the world with it. What a bunch of morons - sitting on a huge monopoly, raking in the cash, and occasionally destroying many working hours for presumably millions of victims around the world. The end for Microsoft cannot come soon enough with this level of poor attention to detail. This is what the install screen said just before it caused my system to seize up for lack of disk space. Now I have to kill everything and find out what those Microsoft morons have done to my hard disk so that I can get back to work. If anyone wants to start a class-action lawsuit - drop me a line!
Initializing installation... done!
Installing Security Update for Windows XP (KB2709162) (update 1 of 9)... done!
Installing Security Update for Microsoft .NET Framework 2.0 SP2 on Windows Server 2003 and Windows XP x86 (KB2686828) (update 2 of 9)... failed!
Installing Microsoft Browser Choice Screen Update for EEA Users of Windows XP  (KB976002) (update 3 of 9)... done!
Installing Security Update for Windows XP (KB2685939) (update 4 of 9)... done!
Installing Cumulative Security Update for Internet Explorer 8 for Windows XP (KB2699988) (update 5 of 9)... done!
Installing Windows Malicious Software Removal Tool - June 2012 (KB890830) (update 6 of 9)... done!
Installing Security Update for Microsoft .NET Framework 2.0 SP2 on Windows Server 2003 and Windows XP x86 (KB2656369) (update 7 of 9)... 
Canceling updates... 

Tuesday, May 22, 2012

 

Microsoft Fail Again KB2518864 KB2572073 KB2633880

Microsoft updates are irritating at the best of times. Today I have my XP machines constantly moaning about 'essential .NET updates' to prevent unauthorized intruder attacks. So I allow the updates to proceed. But then the little update icon stays present and prompts about the same updates. So I follow the update procedure again and AGAIN. Eventually I observe that Windows Update is doing the same thing over and over again, applying exactly the same update, with no errors logged anywhere, but still prompting for the same update on completion. Pathetic - someone at Microsoft forgot to bump a version number or flag with these updates, and then someone else forgot to do any testing... and the rest of the world then gets to spend millions of hours wondering why they pay Microsoft so much money to waste their time.

The updates concerned are these: KB2518864 KB2572073 KB2633880. When you find yourself in this update infinite loop - take a moment to consider whether now is the time to upgrade to Ubuntu.

I'll update this post when Microsoft get their act in gear...

...which they may have done as of 23-May 2012, because the irritating constant prompting for updates has gone away. As someone said in comments, there is additional information here: http://social.technet.microsoft.com/Forums/en-US/smallbusinessserver/thread/2f0bbb7d-fc28-4c32-bf63-54cf5a6615d2. However, having flipped through that lengthy thread, I have no idea what, if anything, Microsoft did to fix their problem. Almost certainly they have kludged their way around the problem - because the flag just went away on my machine without any action on my part. I do wish that Microsoft would explain what the problem was, what they did to fix it, and whether or not they are going to improve their processes to avoid such problems in the future. Well, we can but dream, can't we?

Ubuntu is looking good by the way!

Wednesday, May 02, 2012

 

Limiting Rsync Bandwidth Usage

If you want to back up a directory to a remote computer - but don't want to drag down your network's bandwidth - you can use the --bwlimit option (the value is in kilobytes per second)...
cd directory
rsync --rsh='ssh -p 12345' --bwlimit 20 -avrzogtp . username@remote.servername.com:/users/username/directory | tee -a backup.txt

Wednesday, February 01, 2012

 

Monitor an Office with a MacBook

I was interested in monitoring a room recently, and decided to use the built-in camera in a relatively elderly MacBook for this purpose. The procedure is extremely simple:
1. Display a video image from the camera on the screen using a suitable program
2. Run the script below, which saves a complete screen capture, compares it with the previous capture, and saves the image if there are significant differences
3. Turn off the display with F1
For additional impact, I also arranged for the MacBook to mail me updates - so if there is motion in the monitored area I receive an email notification. Because the entire screen is captured, each image is date stamped. (The email messages are also given the current date and time as a subject line.) Here is the script:

#!/bin/sh

# crude motion detection via periodic screen captures of the camera preview
# (assumes the script is run from /Users/username/Desktop/pictures)

THRESHOLD=30   # PSNR in dB below which the two captures count as different
DELAY=5        # seconds between captures

count=0
while true
do
  # grab the whole screen (the camera preview program should be displayed on it)
  /usr/sbin/screencapture -m -tjpg -x /Users/username/Desktop/pictures/new.jpg
  if [ ! -f old.jpg ]
  then
    cp new.jpg old.jpg
  fi
  # convert both captures to PPM and compare them with pnmpsnr
  /usr/local/bin/djpeg -ppm new.jpg > new.ppm
  /usr/local/bin/djpeg -ppm old.jpg > old.ppm
  ~/bin/pnmpsnr new.ppm old.ppm > yup 2>&1
  Y=`awk '/Y color/{print int($5)}' yup`
  echo "Y is $Y"
  if [ $Y -lt $THRESHOLD ]
  then
    # significant change - keep the image and mail it out
    count=`expr $count + 1`
    name=`echo $count | awk '{printf "%06d.jpg", $1}'`
    echo "Saving: " $name
    cp new.jpg $name
    uuencode $name $name > t
    SUBJECT=`date`
    EMAIL="xyz@abchlk.com"
    /usr/bin/mail -s "$SUBJECT" "$EMAIL" < t
  else
    echo "No significant change in the images"
  fi
  cp new.jpg old.jpg
  sleep $DELAY
done

I drew inspiration for this script from David Bowler's page here. Thank you, David. There are a few things that you need to do to get this to work correctly. Firstly, you need to be able to compile djpeg and netpbm. Apple make building open source software a massive pain (I wonder why, largest company in the world?). Anyway, these things can be done fairly easily. I'll add the necessary links below.

Xcode is needed, just for the gcc compiler. How Apple get away with making an open source compiler their own, and making it so difficult to install I don't know. But fortunately, they have paid off the right politicians, so no anti-trust cases for Apple. I used the (only) 800M download here to get the MacBook up to speed with gcc.

djpeg is needed to turn the jpg screen capture into a ppm file. Just grab the source from here, and configure and make it using your newly acquired gcc compiler.

netpbm is needed for the utility which compares two ppm files, called pnmpsnr. I grabbed the source from here, and built it using make and gcc as usual. The build was not entirely error free (!) but pnmpsnr was built just fine. This is just a program that reads and processes ASCII files, so it isn't exactly the most demanding software engineering activity.

Wednesday, November 23, 2011

 

Nice PDF Files from Ascii

If you want to convert ASCII text into PDF - and you want to do it with a little style - use the following:

a2ps --media=Letter -o - example.txt | ps2pdf - example.pdf

Drop the --media=Letter option (or use --media=A4) if you want an A4 PDF file.
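a2ps has plenty of other knobs. Something along these lines should give you a single column with no fancy page headers - treat it as a sketch, as I haven't double-checked these particular switches for your version of a2ps:

a2ps -1 -B --media=Letter -o - example.txt | ps2pdf - example.pdf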

Wednesday, November 09, 2011

 

Monitor a Woot-Off! Bash Script

Here is a simple script to monitor the Woot site, when a 'Woot-Off' is underway. The script can easily be customized to perform a similar function for any desired site. I use it under Cygwin on Windows, and when the Woot site is updated, a 'bell' is sounded, so that if I'm by this machine, I can take a look to see if the new item is the elusive 'bag of crap'! (So far no success - you need to be fast or lucky).

The script waits 30 seconds plus a random number of seconds between 0 and 60 prior to each visit to the Woot site. The sound creation step is customized for Cygwin; if you are on Linux or a Mac, just use:
echo -en "\007"
(or something similar).



#!/bin/sh

while true
do
wget -O - "http://www.woot.com/" | \
grep "meta property=\"og:title\" content=" \
> tmp.txt
diff tmp.txt tmpold.txt > /dev/null
ret=$?
if [ $ret -eq 0 ]
then
echo "NO CHANGES"
echo "Current item"
cat tmp.txt
else
echo "CHANGES!"
cat `cygpath -W`/Media/ding.wav > /dev/dsp
cp tmp.txt tmpold.txt
fi
SLEEP=`awk 'BEGIN{srand(); print int(rand()*60+30)}'`
echo "Sleeping for " $SLEEP " seconds"
sleep $SLEEP
done

Friday, May 20, 2011

 

A Subversion (SVN) Tutorial in a Bash Script

A simple bash script which creates a subversion (svn) repository and conducts some very simple operations on that repository. You can use this script to learn how subversion works, and how to work with subversion. The script will create an svn directory in your home directory, plus trunk and branch (called 2xxx-01-01) working directories in the current directory. These can optionally be removed at the end of the script.


#!/bin/bash

REPODIR=$HOME/svn
REPO=file://$REPODIR

function svn_cmd() {
echo "Comment: " $1
if [ -z "$3" ]
then
echo "Executing: " $2
$2
else
echo $1 > tmp.txt
echo "Executing: " $2 --file tmp.txt
$2 --file tmp.txt
rm -f tmp.txt
fi
echo ""
}

if [ -d $REPODIR ]
then
echo "Repository alread exists, skipping creation"
echo "(delete $HOME/svn if you want to start from scratch)"
else
svn_cmd "Creating repository" "svnadmin create --fs-type fsfs $HOME/svn"
fi

# create, remove, and create the trunk
svn_cmd "What does the repository contain?" "svn ls $REPO"
svn_cmd "Creating trunk dir" "svn mkdir $REPO/trunk" cmt
svn_cmd "What does the repository contain?" "svn ls $REPO"
svn_cmd "Removing trunk dir" "svn rm $REPO/trunk" cmt
svn_cmd "What does the repository contain?" "svn ls $REPO"
svn_cmd "Creating trunk dir" "svn mkdir $REPO/trunk" cmt
svn_cmd "Checkout trunk" "svn co $REPO/trunk"

cd trunk
echo "line 1" > test.txt

# add a file, and a branch
svn_cmd "Adding test.txt" "svn add test.txt"
svn_cmd "Initial commit" "svn commit" cmt
svn_cmd "Creating branches dir" "svn mkdir $REPO/branches" cmt
svn_cmd "Create a branch" "svn copy $REPO/trunk $REPO/branches/2xxx-01-01" cmt

cd ..

# change the file on the branch
svn_cmd "Checkout branch" "svn co $REPO/branches/2xxx-01-01"

cd 2xxx-01-01
echo "line 2" >> test.txt

svn_cmd "Commit on the branch" "svn commit" cmt
svn_cmd "Checking branch history" "svn log -v test.txt"

cd ../trunk

# check the history
svn_cmd "Checking trunk history" "svn log -v test.txt"
svn_cmd "Diff trunk and branch" "svn diff $REPO/trunk $REPO/branches/2xxx-01-01"

# remove the trunk and branches directories
svn_cmd "Removing trunk dir" "svn rm $REPO/trunk" cmt
svn_cmd "Removing branches dir" "svn rm $REPO/branches" cmt
svn_cmd "What does the repository contain?" "svn ls $REPO"

cd ..

echo "Enter y to: rm -rf svn trunk 2xxx-01-01"
read a
if [ "$a" = "y" ]; then
rm -rf $HOME/svn trunk 2xxx-01-01
fi
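If you want to take the tutorial one step further, a natural next exercise is merging the branch change back into the trunk. You would need to splice something like the following in before the trunk and branches directories are removed - a rough sketch only, assuming a subversion release with merge tracking (1.5 or later), and reusing the svn_cmd helper above:

cd trunk
svn_cmd "Update the trunk working copy" "svn update"
svn_cmd "Merge the branch into trunk" "svn merge $REPO/branches/2xxx-01-01"
svn_cmd "Commit the merge" "svn commit" cmt
cd ..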

Tuesday, March 22, 2011

 

Script To Monitor Disk Space Use

Here is a simple script to monitor disk space usage in real time. I put this together because I am frequently frustrated to find that Windows (often during updates) or some other badly organized program has decided to use more disk space than my lowly laptop can provide. The "df -h ." command provides the necessary information, and it is a simple matter to put this command into a loop. For additional convenience, the script handles reporting to the screen with only carriage returns (no line feeds), so you won't chew up the window's entire contents with a long list of disk space information, no matter how long you run the script. Here is the typical output:

ctrl-c to stop
Filesystem Size Used Avail Use% Mounted on Time
C:/cygwin 75G 71G 4.3G 95% / Tue Mar 22 15:28:29 PDT 2011

and the script itself:

#!/bin/sh

#monitor disk space usage

HEADING=`df -h . | head -1`
echo "ctrl-c to stop"
echo "$HEADING Time "

while true
do
CURRENT=`df -h . | tail -1`
CURRENT=$CURRENT" "`date`
echo -e -n "$CURRENT\r"
sleep 3
done

Now I can keep an eye on what Windows is subjecting me to, in approximately real time.

Tuesday, March 08, 2011

 

BBC Radio 4 and KIRN 670 am on an iPhone

I do not know much about iPhones, I have to confess. However, I recently wanted to arrange for KIRN 670 am (Persian radio from Los Angeles) and Radio 4 to be available on an iPhone. How should one proceed? Here is what I did.

First, install the free FStream app for the iPhone. This seems to be a good, free, stream player. It also has record capabilities. Install FStream as you would any other app on the iPhone.

Second, set up the channels that you want. In FStream, go to Favorites, click Edit, and click 'Add new webradio'. At this point, you need to provide a name for the channel that you are about to create. This is just some text; you can enter whatever you would like, e.g. Radio 4 or KIRN. Next you need to supply the URL of the channel. This is more difficult to determine. I found the Radio 4 URL by Googling (it is: http://bbc.co.uk/radio/listen/live/r4.asx) and the KIRN URL by using 'netstat -an' in Cygwin while the KIRN 670 am browser player was running. (The KIRN URL is: http://174.36.167.220:9000). Enter the appropriate URL for your radio channel (or stream, as it should probably be called) and save your new favorite.
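For reference, this is roughly how the netstat trick works: start the browser player, then look at the established connections; the streaming server tends to stand out on an unusual port (9000 in the KIRN case). Something like:

netstat -an | grep ESTABLISHED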

Then, under Play, click on the favorite you just entered, and you should find yourself listening to your newly entered channel. If you get the URL wrong for your channel, the error message won't be very clear. You'll see FStream trying to connect, but then the message 'Disconnected' will appear. If you see this, go back and check the URL in Favorites; you need to get every character exactly right.

I have not checked the bill recently - so be careful with respect to your data usage. But this should let you use your iPhone as a radio receiver for BBC Radio 4 and KIRN 670 - what more could you possibly want?

Sunday, August 01, 2010

 

Fixing Non-Printing Characters in Cygwin Man Pages

After upgrading to Cygwin 1.7 (which fixes some annoying socket problems for me) one residual problem was the display of certain characters in man pages. As this problem seemed to affect the emboldening of options like '-d', it made man pages virtually unusable. So, I have now found, somewhat empirically, that this can be fixed by adding the following lines to your ~/.profile file:
LANG=ISO-8859-1
export LANG

If you make this update, and open a new window, you should find that man pages return to their former, readable, glory.

Saturday, July 24, 2010

 

NVidia Not Microsoft (This Time!)

I was tortured today by a movie that I was trying to include in a PowerPoint 2007 presentation. This particular movie had a white background and looked just fine in the Windows Media Player. And it looked just fine when first inserted into PowerPoint 2007, but when clicked, it turned into a black rectangle. So, as one does, I Googled. I found that Microsoft, with all its extraordinary resources, does not know how to handle movie file names over 128 characters. So, I fiddled around with my directory structure, and suddenly I could actually see the movie. Better - but still tediously broken as the white background suddenly became grey when the movie was clicked.

How frustrating. There are various solutions on the Web. I upgraded the Windows Media Player - that did nothing. I switched to using a VLC embedded active X control. That crashed PowerPoint and I lost some edits. Pretty typical mileage for those of us that have to suffer Microsoft's programming excellence. The visual effect was so bad, that I decided to try harder. I investigated changing the gamma and contrast of the movie. Very tedious - but no success. Then I decided to investigate my graphics card settings. I switched off hardware acceleration entirely - and the problem was gone. Now that is an easy solution - just turn off hardware acceleration before giving the presentation - I should be able to manage that.

I am annoyed that I wasted so much time finding this small tweak - and hope that if you have the problem "white becomes gray" with an inserted video in a PowerPoint presentation - you'll try this solution early.

I still hate PowerPoint 2007 - but on this occasion I must admit that most of the fault seems to lie with NVidia and their irritating attempts to improve the performance of their cheap graphics chipset in my Dell Latitude 830. (And yes, before you ask, I have upgraded the driver).

Mind you - it would be good if Microsoft hadn't made PowerPoint 2007 completely awful with their silly ribbon bar fiasco, and wouldn't it be great if, instead of messing up the interface, they had made it possible to use a file path of over 128 characters? But if you have the "white becomes gray" problem - just try switching off hardware acceleration before you try anything else.

Tuesday, June 29, 2010

 

Archiving recently modified files

If you have recently added to your mp3 or jpg library and want to backup just new additions, here is the command to create a tar archive of files modified in the last 2 days.
find . -mtime -2 -print | sed -r "s/([^a-zA-Z0-9])/\\\\\1/g" | xargs tar -rvf new.tar
The find command finds the files, the sed command escapes any characters that are not alphanumeric, and the xargs command fires up tar commands which append to the tar archive called 'new.tar'. The find option '-mtime -2' selects files that have been modified in the last two days.
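If you have GNU find and tar (Cygwin and most Linux distributions do), a null-separated pipeline avoids the escaping gymnastics entirely, and creates the archive in one go - a sketch:

find . -mtime -2 -type f -print0 | tar --null -T - -cvf new.tar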

Wednesday, June 16, 2010

 

Copying A Set of Files from one Machine to Another

How do you set about saving your many and varied jpg and mp3 files from an old machine?

There are a variety of possible strategies. Assuming we neglect those that presume that you have your files in a carefully organized directory structure, or a nice secure backup, here is a pragmatic approach.

First - make sure that you are using Linux, Mac OS X, or Cygwin on Windows. Then use find to collect a list of the files and their complete files names that you wish to save. For example, if you wanted to save all files with the suffix 'mp3', you would use:

find . -name "*.[mM][pP]3" -print > mp3files.txt

That will give you a list of mp3 files in the text file 'mp3files.txt'. Then go over to the machine that you wish to copy the files to, and use a command like this to collect the files on the other machine:

ssh User@1.2.3.4 "cd /cygdrive/c; tar -T mp3files.txt -cvf -" | tar xvof -

This command executes an ssh command on the machine that has the files, goes to the root directory, then tars the files to standard output, making use of the mp3files.txt list. Meanwhile, back on the receiving machine, tar reads standard input and extracts the files. Hence you create a faithful copy of the files and directory structure on the receiving machine.

Why didn't I use rsync, you might inquire. Well - I tried, and I found that rsync, on this particular version of Cygwin, had a habit of hanging. Meanwhile, tar and ssh do the job just fine.

This method has advantages too. For example, if you want, you can remove files that you do not want to copy from mp3files.txt prior to doing the copy. So you have a high degree of control over what gets copied and what gets left behind.
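For example, to exclude a (hypothetical) Podcasts directory before running the transfer, trim the list first:

grep -v "/Podcasts/" mp3files.txt > mp3files.trimmed.txt
mv mp3files.trimmed.txt mp3files.txt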

Thursday, May 20, 2010

 

Script to Convert WMA Files to MP3 Format

I changed work computers recently and on my new computer I needed to convert 16 CDs to .mp3 format. I didn't want to install new programs in the process; I just wanted to painlessly carry out the conversion. Putting the first CD into the drive led to Windows Media Player being launched and offering to 'Copy from CD' - so I accepted that offer. After a little whirring, what I ended up with was a new 'Unknown Artist', and the content of the CDs as '.wma' files under "My Documents\My Music". Well, not exactly perfect, because I prefer .mp3 files (these are the most likely to work in any given mp3 player). Looking around at the options in the Windows Media Player interface indicated that Microsoft were being typically unhelpful in not allowing users to store their music in formats other than .wma. However, using the various tools added to this machine's Cygwin to deal with video, etc., I found it was easy to carry out the conversion. The conversion is a little slow, so rather than navigate all those silly directory names (containing irritating spaces...) I wrote a short script which traverses the "My Music" folder and converts .wma files to their more useful .mp3 cousins. Here is the script.

#!/bin/sh

scandir () {
for item in *
do
if [ -f "$item" ] ; then
curdir=`pwd`
nfile=`expr $nfile + 1`
EXT=`echo "$item" | rev | cut -c 1-3 | rev`
if [ $EXT = "wma" ]
then
ld2=`echo "$curdir"/"$item" | wc -c`
ld2=`expr $ld2 - 5`
echo "Operating on:"
echo "$curdir"/"$item"
NEWNAME=`echo "$curdir"/"$item" | cut -c-${ld2}`.mp3
echo "Will create:"
echo $NEWNAME
if [ -f "$NEWNAME" ]
then
echo "$NEWNAME exists - no action taken"
else
ffmpeg -i "$curdir"/"$item" tmp.wav
lame -h tmp.wav "$NEWNAME"
rm -f tmp.wav
fi
fi
elif [ -d "$item" ] ; then
cd "$item"
scandir
ndirs=`expr $ndirs + 1`
cd ..
fi
done
}
startdir=`pwd`
ld1=`echo $startdir | wc -c`
ld1=`expr $ld1 + 1`
echo "Initial directory = $startdir"
ndirs=0
nfile=0
scandir
echo "Total directories searched = $ndirs"
echo "Total files = $nfile"

Tuesday, May 18, 2010

 

Diffing Directories Using SSH

I found myself needing to check the contents of directories across a network today. Finding myself inhibited by the complexity of rsync's arguments, I quickly hooked up the following simple script. If you give the script the user and address of the remote machine (argument 1), and the absolute path on the remote machine (argument 2) to the directory you want to compare with the current working directory, the script lists the files in the remote directory and compares that list with the current working directory. Used in this mode, the script provides a quick sanity check that the directories are indeed equivalent. If you want a more careful check, supply a third argument, and the script uses cksum to compare every file's checksum. I kept the error checking deliberately light - in the interests of expediency, and as an exercise for anyone who wants to improve the script.

#!/bin/sh

#provide a simple, remote directory diff

echo "arg1 is the username and port": $1
echo "arg2 is the remote directory, complete path": $2

#a third argument means use cksums

if [ $# -eq 2 ]
then
ssh $1 "cd $2; ls -1a" > /usr/tmp/dir1.txt
ls -1a > /usr/tmp/dir2.txt
else
ssh $1 "cd $2; find . -exec cksum {} \\;" | sort -k3 > /usr/tmp/dir1.txt
find . -exec cksum {} \; | sort -k3 > /usr/tmp/dir2.txt
fi
diff /usr/tmp/dir1.txt /usr/tmp/dir2.txt
ret=$?
if [ $ret -eq 0 ]
then
echo "The directories are identical"
fi
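Assuming the script is saved as dirdiff.sh (the name, address and paths below are placeholders), usage looks like this:

# quick listing-only comparison of a remote directory with the current directory
./dirdiff.sh "-p 12345 user@192.168.1.10" /home/user/photos
# thorough (slower) comparison using cksum on every file
./dirdiff.sh "-p 12345 user@192.168.1.10" /home/user/photos cksum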

Wednesday, April 21, 2010

 

rsync with a Non-Standard Port

Well, I found this slightly unintuitive. If you want rsync to use a specific non-standard port, and you don't have the rsync daemon running, you need to force ssh to use this port, like so:

rsync -ar --partial --progress --rsh='ssh -p 12345' username@xyz.abc.com:/path/to/your/files .

(or you are going to get errors referring to port 22 despite your best efforts to avoid this port with --port=12345).

Friday, April 02, 2010

 

Robust File Transfers With Rsync

Here is a useful command which allows you to copy a file from a server to your machine or vice versa. Instead of using scp it uses rsync. The advantage of rsync is that if your connection breaks during the transfer, and you are left with a partial file, when you restart the command, rsync will pick up from where it left off. And this, of course, will save you time.

rsync --partial --progress --rsh=ssh user@yourserver.xyz.com:/home/path/to/file.zip .

Saturday, March 27, 2010

 

Upgrading From Cygwin 1.5 to Cygwin 1.7.2-2

I had an interesting experience recently - my trusty Dell Latitude D610 developed a motherboard problem - and I needed to rapidly transfer my work to a Dell Latitude D830 running Windows XP.

Now, I had installed Cygwin 1.5 on the D830 a year or so ago - and had seen that it had various socket related problems. I had got around to fixing such issues for rsync by rebuilding the executable with 'socket pairs' (whatever they are) turned off. However, many other similar problems existed, such as lock-ups with web servers running under Cygwin, and even intermittent hangs with wget. While this wasn't my main machine I hadn't needed to solve them - but now I did.

Looking at the Cygwin site indicated that an upgrade to Cygwin 1.7 would be the best solution.

So, I loosed off the Cygwin setup.exe program - as usual with some trepidation. This did its normal slow and mysterious thing, complained about various files being locked (I had left some Cygwin services running - oops), and eventually left me with the usual Cygwin shortcut on my desktop. (Additionally, there were three annoying desktop icons for a program called Singular, something mathematical, and they were rapidly deleted - who on earth thought that every Cygwin installation should have those icons dumped on the desktop!)

However, there was a problem. This did not seem to be caused by file mounts, as advertised by the Cygwin site. Instead, starting a Cygwin shell resulted in a hung rxvt window. Poking around on the web provided no clues. However, I empirically found that changing my Cygwin.bat file to the following caused normal service to return - the necessary change from version 1.5 was the inclusion of the complete path for the bash shell.

@echo off
C:
chdir C:\cygwin\bin
rxvt +rv -sr -sl 10000 -bg White -fg Black -fn 'Courier New-18' -e /usr/bin/bash --login -i
pause


And with that, everything was back to normal. And so far, no tedious socket related hangs have occurred.

Friday, December 25, 2009

 

Did Keith Briffa Leak the Climategate Documents?


I had a comment on one of my posts on the climategate email archive, saying that the messages weren't leaked, they were hacked. If you were the person that left that message, apologies - it looks as though I accidentally deleted it.

Here's my view - and this is very much my view.



I do not think that climategate was caused by hackers, or involved theft. Here's why.

The chances of teams of hackers monitoring the email to and from the CRU for 12 years are very low. During that time, the nature of the CRU servers undoubtedly changed several times; the 'hackers' would have had to gain access to each new technology and network setup, and would have had to invest vast amounts of time in assembling a coherent storyline in the emails. How could anyone have intercepted all of that information, all the junk mail, all of the routine academic work, and then processed that into a coherent story, specially timed to coincide with a climate meeting in Copenhagen? Simply using up that amount of network bandwidth alone would have led to detection years ago.

If the hackers were supposed to have downloaded a vast amount of email and cherry picked the resulting set in a short period of time, how can one explain the more than a decade of emails which are in the archive? No University IT department would keep decades of emails on its server, and to achieve this feat, the hackers would have needed to gain access to large numbers of machines, probably many laptops, kept in different locations and used by different people. No, this too is highly improbable.

And if the 'hackers' wanted to change the course of history, the time to do that was in the 1990s, when showing the 'hide the decline' plot and the 'hockey stick' to be the frauds that they were would have stood a chance of altering major climate legislation. By the time of the Copenhagen meeting, the laws were all in place, and the majority of politicians and voters believed that CO2 taxes were vital to avoid imminent disaster.

So, no, the hacker scenario is an obvious smokescreen. Similar to the comments seen in the media about the 'trick' being an intellectual device, and 'hiding the decline' being taken out of context.

By the way, if you take two diverging quantities, and replace one with the other, then plot the result against time, it is not surprising that the original divergence has disappeared. The 'hide the decline' act was so grossly unscientific and cynically manipulative that every scientist in the world should have, like George Monbiot, called for the resignation of the scientist involved.

But as it turned out, most scientists did not read those emails, as the media did not report them. So, the climatologist strategy of saying 'nothing to worry about here - just move on' appears to be working just fine.

But returning to climategate itself. I think that the leak scenario is far more likely. With a leak the long time line is explained. An insider decided to collect some information to protect himself or herself should things get too bad, or his or her conscience become too taxed. That person was copied on a large amount of the email. That person also had access to other people's email, through loosely protected email servers over time. This was internal access within the CRU itself, not hacking.

Finally the source of the leak had the necessary experience to determine which elements of the internal CRU dialog should be released. So that person was not an IT person or an administrative member of staff, that person knew the details of the climatology in question.

If you look at the messages, it is clear that Keith Briffa is the majority recipient or sender of the messages. Keith Briffa does not engage in any of the 'unfortunate' behavior patterns documented in the messages. In fact, Keith is shown being shoved along in conducting his research to achieve the (pre-)ordained answer.

So I think that the emails were leaked by Keith Briffa. As he was a CRU official, and the emails were the communications of public servants, conducting research using public data, the release of the messages was not theft or hacking but the action of a whistleblower.

It will be interesting to see what emerges from the investigation which is ongoing at the University of East Anglia.

There will surely be a strong temptation for the University to continue with the 'nothing to see, please move along' messaging which is currently being used.

But, just as with Watergate, at some point the source of the information will be revealed. It will be interesting to see whether my guess has any validity!

Monday, December 14, 2009

 

Climategate-UK Science Community Intersection

In the light of the Climategate leak, I was intrigued to read the following statement from UK scientists:

From: http://www.timesonline.co.uk/tol/news/environment/article6950783.ece

We, members of the UK science community, have the utmost confidence in the observational evidence for global warming and the scientific basis for concluding that it is due primarily to human activities. The evidence and the science are deep and extensive. They come from decades of painstaking and meticulous research, by many thousands of scientists across the world who adhere to the highest levels of professional integrity. That research has been subject to peer review and publication, providing traceability of the evidence and support for the scientific method. The science of climate change draws on fundamental research from an increasing number of disciplines, many of which are represented here. As professional scientists, from students to senior professors, we uphold the findings of the IPCC Fourth Assessment Report, which concludes that "Warming of the climate system is unequivocal" and that "Most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations".

So, I wondered, how many of the 1702 scientist signers are referred to in the CRU archive?

I wrote a little script to do the analysis; it just reads the names from the list of signers, and uses grep to find exact matches in the CRU email folder. Here is the script:


awk -F'\t' '{sub("Dr ", "");sub("Dr. ", "");sub("Prof ", ""); print $1;}' \
uk_names.csv | while read FULLNAME
do
FULLNAME=`echo $FULLNAME | sed 's/,//g'`
echo -n $FULLNAME
grep "$FULLNAME" *.txt > /dev/null
ret=$?
if [ $ret -eq 0 ]
then
FILENAMES=`grep -l "$FULLNAME" *.txt`
echo -e -n "," $FILENAMES
else
echo -e -n ", no exact match"
fi
SURNAME=`echo $FULLNAME | awk '{print $(NF)}'`
grep "$SURNAME" *.txt > /dev/null
ret=$?
if [ $ret -eq 0 ]
then
FILENAMES=`grep -l -i "$SURNAME" *.txt`
echo -e -n ",?" $FILENAMES
fi
echo ""
done

The script assumes that you have a file called 'uk_names.csv' containing the names of the scientists in the same directory as the mail messages.

The conclusion is that at least 59 of the signers, mainly senior people judging by their titles, are referenced in the CRU climategate emails. These are just the perfect matches to people's full names; there are likely many more matches if you consider different ways of handling initials, etc.

Perhaps the instigator of this 'science by consensus' message sent out a request to fellow leaders, asking for as many names as possible to be rounded up to sign.

What should one conclude from this? This group of respected scientists see no cause for scientific concern. Indeed, they are absolutely confident in the scientific integrity of peer review, etc. Well, to paraphrase Mandy Rice-Davies, 'They would be, wouldn't they'!

For reference I have posted the output of the script here: http://www.scribd.com/doc/24090185/UkNamesCRUMailIntersection

And here are some of the names on the list of signers and some of the files from the CRU leak which contain their names:


Julia Slingo, 1217431501.txt
John Mitchell, 0925507395.txt 0998926751.txt 1031923640.txt 1157473748.txt 1167928837.txt 1168022320.txt 1206628118.txt 1237480766.txt 1255095172.txt 1255100876.txt
Pete Smith, 0942953601.txt 0984799044.txt
John Waterhouse, 1106934832.txt 1111085657.txt
Gareth Jones, 0919310505.txt 1031923640.txt
Martin Widmann, 0994187098.txt
Jo House, 0984799044.txt
Colin Prentice, 0848695896.txt 0939437868.txt 0942953601.txt 0984799044.txt
Paul Valdes, 0906136579.txt 0912095517.txt 0929392417.txt 1106346062.txt
Eric W Wolff, 1137184681.txt 1239572061.txt 1240254197.txt 1240398230.txt
Andy McLeod, 1038859764.txt
Gabi Hegerl, 1036182485.txt 1061298033.txt 1061625894.txt 1067194064.txt 1067450707.txt 1092167224.txt 1109684442.txt 1123163394.txt 1123514677.txt 1141393414.txt 1154697504.txt 1155497558.txt 1155832288.txt 1158680269.txt 1158770262.txt 1200059003.txt 1200090166.txt 1217431501.txt 1219844013.txt 1224035484.txt 1252672219.txt
Sandy Tudhope, 1106946949.txt 1212686327.txt 1258039134.txt
Simon Tett, 0845217169.txt 0906042912.txt 0906136579.txt 0919310505.txt 0929392417.txt 1001695888.txt 1053457075.txt 1059664704.txt 1059674663.txt 1060021835.txt 1106934832.txt 1107191864.txt 1124994521.txt 1139323214.txt 1151577820.txt
Peter Cox, 0906136579.txt 0925507395.txt 1217431501.txt 1236958090.txt
Chris Turney, 1236958090.txt
Richard Jones, 0968705882.txt 0968774000.txt 0968941827.txt
Sir John Houghton, 0845217169.txt 0900972000.txt 0929985154.txt
Stephen Sitch, 0942953601.txt 0984799044.txt
Cath Senior, 1217431501.txt
David Parker, 0929985154.txt 1097159316.txt 1101999700.txt 1103583356.txt 1103647149.txt 1113941558.txt 1168288278.txt 1177158252.txt 1177163150.txt 1177423054.txt 1177534709.txt 1182346299.txt 1184779319.txt 1206549942.txt 1233245601.txt 1233249393.txt 1234277656.txt 1234302123.txt 1249503274.txt
David Sexton, 1176746137.txt 1182179459.txt 1199466465.txt
Gareth Jones, 0919310505.txt 1031923640.txt
Peter Stott, 0919310505.txt 1031923640.txt 1200059003.txt 1200090166.txt 1207158227.txt 1219844013.txt
Vicky Pope, 1182179459.txt
James Murphy, 1217431501.txt
Keith Williams, 1217431501.txt
Olivier Boucher, 1217431501.txt
Peter Thorne, 1094483447.txt 1112622624.txt 1113941558.txt 1191550129.txt 1196877845.txt 1196882357.txt 1199286511.txt 1199994210.txt 1200003656.txt 1200010023.txt 1200112408.txt 1200162026.txt 1203631942.txt 1209143958.txt 1211911286.txt 1212009927.txt 1212026314.txt 1212067640.txt 1212088415.txt 1222901025.txt 1231257056.txt 1234277656.txt 1234302123.txt 1242132884.txt 1242136391.txt 1245773909.txt 1258053464.txt
Philip Brohan, 1060021835.txt 1146252894.txt 1151577820.txt 1153424011.txt 1236958090.txt
Chris Folland, 0925829267.txt 0926087421.txt 0929985154.txt 0969308584.txt 0970664328.txt 0990718506.txt 1101999700.txt 1103647149.txt 1111417712.txt 1167752455.txt 1167754725.txt 1167928837.txt 1168022320.txt 1168356704.txt 1199984805.txt 1207158227.txt 1226500291.txt 1231166089.txt 1231190304.txt 1231254297.txt 1231279297.txt 1231350711.txt 1254751382.txt
Roger Saunders, 1234277656.txt 1234302123.txt
Simon Brown, 0990718506.txt 1065128595.txt
Tim Johns, 1231166089.txt 1231190304.txt 1231254297.txt 1231279297.txt 1231350711.txt
Craig Wallace, 0925823304.txt
John Shepherd, 0930934311.txt 0937153268.txt 0951431850.txt 0959187643.txt
Jim Hall, 1208278112.txt 1211040378.txt 1211215007.txt 1211225754.txt 1211491089.txt 1211816659.txt 1219078495.txt
Mark New, 1035838207.txt 1208278112.txt 1211040378.txt 1211215007.txt 1211225754.txt 1219078495.txt
Myles Allen, 0919310505.txt 0970664328.txt 1008167369.txt 1018045075.txt 1018889093.txt 1018893474.txt 1123163394.txt 1139323214.txt 1154697504.txt 1163715685.txt 1163771694.txt 1164120712.txt 1199286511.txt 1200059003.txt 1200090166.txt 1217431501.txt 1219844013.txt 1224035484.txt 1252672219.txt 1255318331.txt 1255352257.txt 1255352444.txt 1255496484.txt 1255523796.txt 1255530325.txt 1255532032.txt 1255550975.txt 1255553034.txt 1255558867.txt
William Ingram, 0925507395.txt
Peter Thorne, 1094483447.txt 1112622624.txt 1113941558.txt 1191550129.txt 1196877845.txt 1196882357.txt 1199286511.txt 1199994210.txt 1200003656.txt 1200010023.txt 1200112408.txt 1200162026.txt 1203631942.txt 1209143958.txt 1211911286.txt 1212009927.txt 1212026314.txt 1212067640.txt 1212088415.txt 1222901025.txt 1231257056.txt 1234277656.txt 1234302123.txt 1242132884.txt 1242136391.txt 1245773909.txt 1258053464.txt
Maria Noguer, 0900972000.txt 0962366892.txt
Jonathan Gregory, 0908385907.txt 1086904814.txt 1217431501.txt
Nigel Arnell, 0937153268.txt 0998926751.txt
Paul Hardaker, 1233586975.txt
Martin Juckes, 1123163394.txt 1154697504.txt 1163715685.txt 1163771694.txt 1164059987.txt 1164120712.txt 1183753398.txt
Tom Webb, 1167752455.txt 1167754725.txt 1168022320.txt
Ian Woodward, 0848695896.txt 0878654527.txt 0942953601.txt 0984799044.txt
David Webb, 1086904814.txt
Rob Wilson, 1053610494.txt 1098388401.txt 1106934832.txt 1121869083.txt 1128000000.txt 1133366680.txt 1140554230.txt 1141068509.txt 1163715685.txt 1163771694.txt 1258039134.txt
Davies Siwan, 1106934832.txt
Roger Street, 1182179459.txt
Chronis Tzedakis, 1115843111.txt 1115887684.txt
Andrew Manning, 1254832684.txt
Anthony Foot, 1208278112.txt 1211040378.txt 1211215007.txt 1211225754.txt
Clare Goodess, 1038353689.txt 1057944829.txt 1087589697.txt 1182179459.txt 1208278112.txt 1211040378.txt 1211215007.txt 1211225754.txt 1211462932.txt 1211491089.txt 1211816659.txt 1219078495.txt 1221742524.txt 1223915581.txt
Tom Melvin, 1103828684.txt 1132094873.txt 1141267802.txt 1146252894.txt 1183499559.txt 1236958090.txt 1253561029.txt 1254230232.txt 1254323180.txt 1254345174.txt 1256353124.txt
Rachel Warren, 1182179459.txt
Simon Busby, 1221742524.txt

Wednesday, December 09, 2009

 

Deleting Irritating Files on Windows XP

Having taken a look at the CRU FOI2009.zip archive, I found two irritating files that I could not delete. These were flxlist. and sfwxlist. in the FOIA/documents/yamal folder. These have something to do with paleoclimatology - a field which most of the interweb now knows well.

Windows XP happily gave me a baffling 'Cannot delete file: Cannot read from source file or disk.' message box. (As helpful as usual).

The solution to this problem is to prepend \\?\ to the full path of the file that you are about to delete (using cmd.exe, of course; there is no known solution from the graphical interface). E.g.

del "\\?\C:\Documents and Settings\User\Desktop\FOIA\documents\yamal\flxlist."

The magic \\?\ prefix switches off some form of Windows XP file name sanity checking, allowing you to delete files which Windows XP thinks do not have valid names (although, of course, Windows XP did allow the creation of the files in the first place). Problem solved!
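The same prefix reportedly works for stubborn directories via rd, again from cmd.exe - treat this as an untested sketch and double check the path before pressing return, since it removes the whole directory:

rd /s /q "\\?\C:\Documents and Settings\User\Desktop\FOIA\documents\yamal"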

Tuesday, October 27, 2009

 

Windows XP Blank Screen After XP Logo (Dell Latitude D830)

I thought I would record the following experience in case it helps others who encounter the same problem. I have a Dell Latitude D830 laptop and generally leave it in standby mode at the end of the day. This uses little power and makes the slow Windows XP reboot unnecessary. However, last week I was away from the office, and the machine, in standby mode and detached from the power supply, apparently ran out of battery power. When I came to power up the machine, rather than springing back from standby mode, it began a normal reboot sequence. However, this reboot sequence was far from normal in its ending. After the Windows XP logo appeared with its blue progress bar, the screen suddenly went completely black.

Very strange and unsatisfactory behavior! Googling through the various sites discussing this topic, I quickly came to appreciate that there are a multitude of ways in which this tedious effect can be achieved. Rebooting in safe mode or VGA mode worked fine, but any attempt to get to the normal screen resolution resulted in the blank screen after the Windows XP logo part of the reboot sequence.

To solve the problem I tried rebooting in safe mode and reloading the graphics drivers as originally downloaded from Dell. This did not solve the problem. I then tried just using the machine with remote desktop, after booting it into VGA mode. This worked well, but wasn't quite what I was used to, and certainly did not constitute a fix, and so I Googled further. I tried 'repairing' the operating system using a Windows XP disk. (That was a mistake - see below.) Eventually, I ran into a site which described how someone logged onto the machine, although the screen was black. So - having nothing to lose, I tried that. I typed username - and tab - and password a few times. Something in the blankness was happening, as there were some typical Windows XP clonking sounds emitted from the machine. Then I must have hit the correct sequence, as the display suddenly lit up, and carried on working as though nothing had ever happened.

Of course, the Windows XP repair attempt had the effect of removing all the vital operating system updates that had been put on the machine. So, a large Windows update was then required to restore SP3 and all the other updates thought to be vital to my security.

So, the conclusions are firstly that Windows XP and standby mode can be fragile under some rare circumstances. Secondly, curing tedious Windows XP start-up problems can be mysterious. I speculate that my habit of putting the machine into standby mode may be quite risky when perilously short of disk space. Probably at some point the standby mode program decides that the battery will expire soon and decides to write everything out to disk, but without checking properly that there is enough disk space to save everything. Then, somehow or other, a special 'blank' state for the graphics card (because the lid is closed) is saved, and the machine crashes. This state will be overwritten the next time someone logs on successfully. However, the next logon is effectively prohibited by the fact that the screen is blank.

Hence, users are forced into the rather tedious process of either wiping the machine and starting again, or buying another machine - making another donation to Microsoft in the process for another, equally buggy, operating system.

But - if you have a blank screening Windows XP laptop, that almost boots but not quite, try typing in your credentials at what should be the log in screen. You may find that this clears the problem - and although it is not as satisfying as repairing the operating system, replacing graphics drivers, or buying a new machine, it may just get you going again.

Tuesday, October 13, 2009

 

Extracting the Audio Portion of MP4 to MP3

If you are ever in need of just the audio portion of an .mp4 file, here are two commands and a loop to put the audio into an .mp3 file.

#!/bin/sh

for file in *.mp4
do
echo $file
"/cygdrive/c/Program Files/MPlayer/mplayer.exe" -ao pcm $file \
-ao pcm:file=tmp.wav
lame -h tmp.wav `basename $file .mp4`.mp3
done


(You may need to customize the path to mplayer.exe. I used an explicit path as I was on Windows and needed to pull in a specific cygwin dll for this executable. You probably won't have this issue.)
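If you happen to have ffmpeg installed instead of mplayer, and it was built with the libmp3lame encoder, the extraction can probably be done in one step per file. A sketch, under that assumption:

#!/bin/sh

for file in *.mp4
do
  base=`basename "$file" .mp4`
  # -vn drops the video stream; the audio is re-encoded to mp3
  ffmpeg -i "$file" -vn -acodec libmp3lame "$base.mp3"
done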

Monday, September 21, 2009

 

Problems with a 4 GB Kingston Data Traveler

I have been having tedious problems with a 4 GB Kingston Data Traveler. I couldn't find much information on the web about these problems - so I thought I would describe what I have seen here. If anyone has a solution please let me know in comments. If you are having similar problems, perhaps you can learn from what I have seen (and can avoid wasting your time trying to get the drive to work).

I bought a 4 GB Kingston Data Traveler for about 12 dollars from Fry's Electronics several months ago. The drive never worked properly - files were getting lost and corrupted on the file system. So, I sent it back to Kingston for replacement. The replacement drive showed up last week. Soon after I plugged that into a Windows XP machine, I found that the complete USB controller for that machine had shut down. That is a tedious event. You need to power down the machine completely, in order to power down the internal USB hub, in order to get everything to reset. I looked around on the web endlessly, and made various attempts to change drivers and so on. But to no avail. So, I am inclined to give up with this drive. It just isn't worth the time and effort trying to make it work as advertised. Every other USB drive and device I have ever used with this machine has worked perfectly. The USB drive in question is a 4 GB Kingston Data Traveler USB stick. The computer concerned is a Windows XP/SP3 Dell Latitude D610.

If you know how to get the Data Traveler USB drive to behave properly - or recognize the symptoms, please let me know!

Sunday, July 12, 2009

 

Simple, Battery-Free, LED Flasher



How about making an electronic device which does something (slightly) useful, doesn't require an external power supply, and might last longer than you? I found this proposition enticing, and so, inspired by Kevin Horton's Infini-Flasher, I had a go at creating my own version of Kevin's cunning device.

I firstly put together the components using a simple breadboard, and found that I needed to make some changes to the component values listed in Kevin's design. I suspect that this is simply a result of using different transistor types. The changes simplified Kevin's circuit slightly: I omitted a pair of diodes, which are only needed if the supply voltage is high, and changed the ordering of the NPN and PNP transistors. I also found that it was important to use a suitable value for the limiting resistor, between the super capacitor and the flasher circuit. I started out with 100k here, as in Kevin's circuit, but found that the circuit would not start to flash, as the leakage through the flasher circuit would not allow the voltage across it to reach a high enough value to actually start flashing. Clearly the circuit is a little sensitive to component tolerances. I recommend that you put together your circuit using a breadboard first, then solder everything together once you are sure that the component values are satisfactory.

The flasher circuit I ended up with is very similar to that listed here on the Talking Electronics site, with the main difference being that the flasher drives a LED through the induced voltage in an inductor, rather than directly from the power supply (which is generally too low a voltage to light a LED). This type of flasher circuit works well for this application, as we want a very short duration of current usage, so that the relatively small amount of charge available from the capacitor is made to last as long as possible. The simulation of the circuit shows (when I get around to adding it) that the 'on' condition for the output transistor is very brief. This provides a brief spike of power to the inductor and in turn the LED. As human eyes are very sensitive to short pulses of light (through having evolved to avoid the glinting teeth of sabre tooth tigers, no doubt) this provides the most electrically economical means to light the LED. Even if the pulses were longer, the human eye would not appreciate the larger expenditure of power that much.

I will include a table indicating the specifications of each of the components in due course, should you want to reproduce the circuit. Here is the circuit diagram. Be warned that this battery-free flasher is fascinating! The super capacitor (1F on the diagram) charges up from the solar cells in about 30 minutes under a lamp, or quite happily during the hours of daylight on a desk. When the voltage supplied to the flasher circuit reaches about 1.5 volts, the LED starts flashing. The current consumption is around 10 microamps, so the charge in the super capacitor lasts a long time, certainly more than a typical night. So far my version has been flashing away happily on my desk for about a month. It should last as long as the electrolytic capacitors that it contains - that should be at least 20 years - perhaps 40 if I am lucky! (I will keep this blog post updated - if I can.)

Monday, July 06, 2009

 

Automating Tweets for Twitter With A Script

Are you interested in hooking up your computer's CPU temperature monitor to Twitter? How about monitoring the progress of your automatic builds from anywhere in the world, without needing to set up a web server, or worry about firewall security?
Simple monitoring tasks such as these can be easily accomplished using Twitter. You can also (should you want to) automate your personal updates to your followers. However, be aware that your followers will quickly tire of your Tweets if they are too routine or 'bot' like.
Here is a simple example script to get you started:
#!/bin/sh
user="YourTwitterUserName"
pass="YourTwitterPassword"
while true
do
status=`awk 'BEGIN{srand()}{array[NR]=$0}END{print array[int(rand()*NR+1)]}' msgs.txt`
status=`echo $status | tr ' ' '+'`
curl --basic --user "$user:$pass" --data-ascii \
"status=$status" \
"http://twitter.com/statuses/update.json" > /dev/null
DELAY=`awk 'BEGIN{srand();print 29+int(rand()*1200)}'`
echo "We will be sleeping for" $DELAY "seconds"
sleep $DELAY
done

This reads the msgs.txt file, selects a line from this file at random, and updates your Twitter status accordingly. The script then pauses for a suitable delay of up to about 20 minutes, and repeats, forever (or until interrupted). With a suitably creative set of work- or study-related lines in your msgs.txt file, this might help convince some of your colleagues (or parents) that you are a dedicated person while you surf the interweb.

Saturday, February 28, 2009

 

Making Files Accessible - Easily

Today I was contemplating setting up Samba on Cygwin on Windows XP. The idea was that, with a machine running such a setup somewhere on the network, I would be able to share files easily without having to engage with Windows XP or Windows Vista security.

Well - if you Google the subject - you find that it can be done. But it is clearly quite a lot of effort to set up. Additionally, not everything works correctly. Some forms of file access do not survive the round trip from Windows to Gnu to Windows unscathed.

However, before I got too far into this project I ran into something neater and more minimalistic, which is almost as good. What is that thing? Running a simple web server under Cygwin so that other computers on the network can browse that machine's files using Firefox, Internet Explorer, Safari, or your favorite web browser. How does this work? A simple one-liner is sufficient - because the Python world has gone to some effort to make everyone's lives easier.

python -m SimpleHTTPServer

Just type the python command above. Then, elsewhere on the network, open http://1.2.3.4:8000 and you will be able to browse the files and directories beneath the directory in which the python command was executed. (The 1.2.3.4 is the IP address of the server machine - you can find this by typing 'ping hostname'.) This means read-only access on the remote machine - however, this involves no setup whatsoever. Depending on what you have been doing with Python on the machine, you may get a prompt about opening the firewall for this program. However, that is the extent of the setup. Very nice!
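(On systems where Python 3 is the default, the module was renamed; the equivalent command, serving on port 8000 as before, is:

python3 -m http.server 8000

Everything else works the same way.)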

Inspired by this - I Googled again. This time for a web server written in Bash. Sure enough there is a nice version out there, written, (I think), by Piotr Gabryjeluk. This is a little cooler than the Python web server, because you can play with the moving parts, and see what is happening. It all happens in less than 100 lines of script. This script expects an argument of the port that will be used for communication. So, on your server machine, you would issue a command like:

./web.sh 9092

...and on the client machine you would browse http://1.2.3.4:9092. (As before 1.2.3.4 is the IP address of your server machine). This is very convenient, and I think, rather secure. You can see precisely which files are being served by your server (from the output on standard out). You can kill the server when you have had enough fun and your machine's configuration will be completely unaffected. The script relies on 'nc' or 'nc.exe' to do the communication, so you may get a prompt from your firewall about this program. Just add it to the list of trusted programs, and all will be fine.

I made a small tweak to the original script - so that I could browse images and text files without downloading them first. This is also basically described in the comments on Piotr's page (I think I added the \n which seemed to be necessary), but if you are interested here is the tweak to the function called serve_file.

function serve_file {
  echo 'HTTP/1.1 200 OK'
  local file="`fix_path "$1"`"
  debug INFO serving file "$file"
  debug INFO "`file -i "$file" | cut -d " " -f 2-`"
  printf 'Content-type: %s\n' "`file -i "$file" | cut -d " " -f 2-`"
  echo
  cat "$file"
}
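The Content-type value comes straight from file -i, with the cut stripping off the leading file name field. On my Linux machine the output looks roughly like this (the exact wording varies between versions of file):

file -i holiday.jpg
holiday.jpg: image/jpeg; charset=binary

so the header sent to the browser ends up as 'Content-type: image/jpeg; charset=binary', which is enough for the browser to display the file in place rather than offering it for download.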

Either of these - the completely minimalist Python command or the Bash script - lets you share files easily on your home computer network. It may be my imagination, but I am certainly impressed by the performance of the Bash version; it is so responsive that browsing directories graphically is very pleasant.

Tuesday, February 24, 2009

 

A Bash Script for Directory Merging (dirmer.sh)

When working on multiple machines, using external drives, and being constrained for disk space, it is all too easy to create cloned directory trees which are similar, but not identical, to one another. Looking at the various directories cloned on my hard drive, I decided to create a simple script for merging directories, which is appended. Be warned: this is very simple, comes with no guarantees, express or implied, and has minimal error checking. However, I find it useful and thought I would post it in case anyone else is interested.

Here is how it works...

1. Two arguments, the source directory and the target directory, are passed to the awk program
2. The awk program cd's to the source and target directories and builds associative arrays, keyed on file name, holding each file's timestamp and type (file or directory)
3. Each file in the source directory is checked against the target directory
4. If the same file name exists in the target, the checksums of the two files are compared; if the files are identical, a command to delete the source file is stored
5. If the files are not identical, a warning is emitted and the file is left in place in the source directory for further investigation
6. If the source file does not exist in the target, it is moved to the target (for a directory, the corresponding directory is created in the target), again by storing the appropriate command
7. The user is shown the list of commands that the script has decided are required and asked whether they should be executed
8. If requested, the merge commands are executed

The effect is that identical files are deleted from the source (you already have them in the target, after all), files that exist only in the source are moved to the target, and any files that are in conflict are left in place to be reconciled by hand.

As mentioned above, the script is experimental and crude, with minimal error checking. Please leave comments if you have suggestions for improvements, and please check it carefully prior to any use (which is at your own risk, as always).
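Assuming you save the script as dirmer.sh and make it executable, a typical invocation (with invented directory names) looks like this:

chmod +x dirmer.sh
./dirmer.sh old_photos_copy photos

The script then lists the rm, mv, and mkdir commands it believes are needed to fold old_photos_copy into photos, and only runs them if you answer y.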


#!/bin/sh

awk '
BEGIN{
dir1="\"" ARGV[1] "\"/"
dir2="\"" ARGV[2] "\"/"

readdir(dir1, lista, typea)
readdir(dir2, listb, typeb)

for(filea in lista){
if(filea in listb){
if(typea[filea] == "f"){
if(ckfile(dir1 "\"" filea "\"") != ckfile(dir2 "\"" filea "\"")){
print "# " dir1 "\"" filea "\"" " " dir2 "\"" filea "\""
print "# WARNING FILES DIFFER - CONTINUING - YOU NEED TO CHECK WHY!"
} else {
com[++ncom]="# files match " dir1 "\"" filea "\"" " " \
dir2 "\"" filea "\""
com[++ncom]="/usr/bin/rm " dir1 "\"" filea "\""
}
}
}else{
if(typea[filea] == "d" ){
dcom[++ndcom]="# directory needs to be created in the target"
dcom[++ndcom]="mkdir -p " substr(dir2,1,length(dir2)-2) \
substr(filea,2) "\""
} else {
com[++ncom]="# file needs to be moved to the target"
com[++ncom]="mv " substr(dir1,1,length(dir1)-2) substr(filea,2) \
"\"" " " substr(dir2,1,length(dir2)-2) substr(filea,2) "\""
}
}
}

if(!ncom && !ndcom){
print "No updates required"
exit
}
print "The following commands are needed to merge directories:"
for(i=1;i<=ndcom;i++){
print dcom[i]
}
for(i=1;i<=ncom;i++){
print com[i]
}
print "Do you want to execute these commands?"
getline ans < "/dev/tty"
if( ans == "y" || ans == "Y"){
for(i=1;i<=ndcom;i++){
print "Executing: " dcom[i]
dcom[i]=escapefilename(dcom[i])
system(dcom[i])
close (dcom[i])
}
for(i=1;i<=ncom;i++){
print "Executing: " com[i]
com[i]=escapefilename(com[i])
system(com[i])
close (com[i])
}
}
}
function ckfile(filename, cmd)
{
if (length(ck[filename])==0){
cmd="cksum " filename
cmd | getline ckout
close(cmd)
split(ckout, array," ")
ck[filename]=array[1]
}
return ck[filename]
}
# awk passes scalars by value, so the escaped string must be returned
# and assigned by the caller. The file names in the generated commands
# are already wrapped in double quotes, so only $ (which the shell would
# otherwise expand inside those quotes) needs escaping.
function escapefilename(name){
gsub(/\$/, "\\\\$", name) # deal with dollars in filenames
return name
}
function readdir(dir, list, type, timestamp, ftype, name){
cmd="cd " dir ";find . -printf \"%T@\\t%y\\t%p\\n\""
print "Building list of files in: " dir
while (cmd | getline > 0){
timestamp=$1
ftype=$2
$1=$2=""
name=substr($0,3)
list[name]=int(timestamp)
type[name]=ftype
}
close(cmd)
}' "$1" "$2"

Thursday, February 05, 2009

 

Creating A Subset of a PDF Document

If you have a PDF file and want to send only a portion to a friend or colleague, what do you do? With pdftk you can easily create subsets of the pages in a PDF. For example, if you want to drop 5 pages of preamble in a document that you need to send to your boss, you can do that with:

pdftk A=LongDocument.pdf cat A6-end output ShortDocument.pdf
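In the same way you can pull out an arbitrary range of pages - something like this (file names invented) extracts pages 10 to 20 into a new document:

pdftk A=LongDocument.pdf cat A10-20 output Excerpt.pdf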

Tuesday, February 03, 2009

 

Extracting the Audio Portion of an FLV File to MP3 (Again)

Here is another ffmpeg recipe - this time extracting the audio portion of a flash (.flv) video file.

ffmpeg -i example.flv -ab 56 -ar 22050 -b 500 example.wav
lame --preset standard example.wav example.mp3

You can also go directly to .mp3 from .flv using ffmpeg with a command like this.

ffmpeg -i example.flv -sameq example.mp3

But using lame seemed to give better results, as far as I could tell. If you have additional information on maintaining quality when carrying out such transformations please leave a comment.
 

An Improved ffmpeg Recipe For Concatenating Video Files

Here is an improved recipe for concatenating mpg and other video format files.

ffmpeg -i a.mpg -s 480x360 -maxrate 2500k -bufsize 4000k -b 700k -ar 44100 i1.mpg
ffmpeg -i b.mpg -s 480x360 -maxrate 2500k -bufsize 4000k -b 700k -ar 44100 i2.mpg
ffmpeg -i c.mpg -s 480x360 -maxrate 2500k -bufsize 4000k -b 700k -ar 44100 i3.mpg
cat i1.mpg i2.mpg i3.mpg > all.mpg
ffmpeg -i all.mpg -sameq combined.mpg

This seemed to work better than simply using 'sameq' the last time that I needed to join some video segments.

Monday, February 02, 2009

 

Bash Directory Synchronization 2.0 (ds.sh)

Here is a simple Bash script which illustrates the use of find and awk in determining which files to update when synchronizing two directories. As noted in the previous post on the subject (where there is a bash shell script to synchronize directories which does the recursion and file time stamp checking), you can use the rsync command to carry out this task for you. However, rsync is a little too efficient and terse, and an open script, even if it is mainly awk, allows you to understand precisely what is about to happen to your files. The script works by using 'find' to gather data about the directories being synchronized. A list of synchronization commands (simple cp's or rm's) are presented to the user based on this analysis, and the user can then decide whether to execute the commands or not. In my tests the analysis of 1.3 GB of files on hard drive and on USB took around 3 seconds - so the speed of this script is not far from the speed of rsync itself.

The usage reporting and error checking are minimal; the arguments are:

ds.sh directory1 directory2 time-window (seconds)

(and all arguments are compulsory). So, for example, you might type:

./ds.sh usb_documents drive_documents 2

to synchronize your USB drive documents with your hard drive documents.

This script is experimental - please feel free to use it, at your own risk. If you have questions or comments please let me know.

#!/bin/sh

awk '
BEGIN{
timewindow=ARGV[3]
print "The time window is: " timewindow
dir1="\"" ARGV[1] "\"/"
dir2="\"" ARGV[2] "\"/"

readdir(dir1, lista, typea)
readdir(dir2, listb, typeb)

for(filea in lista){
if(filea in listb){
if(typea[filea] == "f"){
timediff=lista[filea]-listb[filea]
if(timediff > timewindow){
com[++ncom]="# file in source directory newer than target"
com[++ncom]="cp -a " dir1 "\"" filea "\"" " " dir2 "\"" filea "\""
}
if(timediff < -timewindow){
print "# WARNING NEWER FILE IN TARGET DIRECTORY"
print "# files concerned are: "
print "# " dir1 "\"" filea "\"" " " dir2 "\"" filea "\""
}
}
}else{
if(typea[filea] == "d" ){
dcom[++ndcom]="# directory needs to be created in the target"
dcom[++ndcom]="mkdir -p " substr(dir2,1,length(dir2)-2) \
substr(filea,2) "\""
} else {
com[++ncom]="# file needs to be copied to the target"
com[++ncom]="cp -a " substr(dir1,1,length(dir1)-2) substr(filea,2) \
"\"" " " substr(dir2,1,length(dir2)-2) substr(filea,2) "\""
}
}
}
for(fileb in listb){
if(!(fileb in lista)){
com[++ncom]="# need to remove file in target not in source"
com[++ncom]="rm -f " dir2 "\"" fileb "\""
}
}
if(!ncom){
print "No updates required"
exit
}
print "The following commands are needed to synchronize directories:"
for(i=1;i<=ndcom;i++){
print dcom[i]
}
for(i=1;i<=ncom;i++){
print com[i]
}
print "Do you want to execute these commands?"
getline ans < "/dev/tty"
if( ans == "y" || ans == "Y"){
for(i=1;i<=ndcom;i++){
print "Executing: " dcom[i]
dcom[i]=escapefilename(dcom[i])
system(dcom[i])
close (dcom[i])
}
for(i=1;i<=ncom;i++){
print "Executing: " com[i]
com[i]=escapefilename(com[i])
system(com[i])
close (com[i])
}
}
}
# awk passes scalars by value, so the escaped string must be returned
# and assigned by the caller. The file names in the generated commands
# are already wrapped in double quotes, so only $ (which the shell would
# otherwise expand inside those quotes) needs escaping.
function escapefilename(name){
gsub(/\$/, "\\\\$", name) # deal with dollars in filenames
return name
}
function readdir(dir, list, type, timestamp, ftype, name){
cmd="cd " dir ";find . -printf \"%T@\\t%y\\t%p\\n\""
print "Building list of files in: " dir
while (cmd | getline > 0){
timestamp=$1
ftype=$2
$1=$2=""
name=substr($0,3)
list[name]=int(timestamp)
type[name]=ftype
}
close(cmd)
}' "$1" "$2" "$3"


Sunday, February 01, 2009

 

Synchronizing Directories And Files With A USB Drive

If you are interested in this script, there is an arguably better version, which uses find, instead of recursion in bash, to traverse the directories. It is much faster. (You can obtain a copy here, Bash script to synchronize directories).

Every time that I check on the price of USB drives, it seems that the amount of storage that one can buy for $10 has doubled! Moore's Law seems to be a little accelerated for USB drives...

At this rate, in two years (say 2011) 256 GB USB drives will cost $10.

So, like many people, I store more and more information on USB drives.

And, like many people, I then rapidly run into the problem of keeping directory trees synchronized. It is actually a difficult problem: although the file timestamps tell you which files on the USB drive are the most recent, you do not necessarily know the history of the files and directories. So if a file is deleted on the USB but still exists on the hard drive, what do you do? You either remove the file on the hard drive, or create the file on the USB, but knowing which action is correct is difficult. Because you can create, modify, and remove files using a variety of programs, capturing the history necessary to synchronize two directory trees is difficult too. One solution might be to intercept all the OS calls to the file systems involved, but that seems to be a lot of work.

There are a variety of programs which set out to provide directory synchronization. Two of the most well known are rsync and unison. They are both well worth a look. Rsync in particular is very effective. However, for the application of keeping my USB drive and hard drive in synchronization, I wanted something which I could adjust a little more than rsync, and so I have been using the script below. This is still rather experimental, and comes with no guarantees whatsoever. If you try it, you need to take appropriate precautions for yourself.

The script allows the user to enter a 'modification window' in seconds. This gives some latitude in the comparison of the file timestamps that are used to decide whether to update files in the target directory. It is needed because a FAT USB drive stores file timestamps at a lower resolution (two seconds) than typical Windows or Linux file systems. For a FAT device you will probably want to supply '-t 2' to ensure that you don't end up copying lots of files in either direction when the files are actually supposed to have the same timestamp.
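You can see the rounding for yourself by comparing the modification times of the same file on the two file systems - a command along these lines (paths invented) prints both, and the copy on the FAT drive will typically show an even number of seconds:

stat -c '%y %n' ~/documents/notes.txt /media/usb/documents/notes.txt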

As mentioned, this script is still experimental. I use it with a directory of around 1.5 GB of files, which I synchronize between two computers and a USB drive. Performance is the main concern, although it is certainly usable. The shell script uses 'stat' (a lot) to obtain the modification timestamps of the files that it needs to compare. I have been considering replacing this with a single find command to obtain this information in one shot up front (the command would be something like 'find . -printf "%p\t%T@\n"'). Perhaps this will be the subject of a future script.

If you have any comments or questions, please let me know.
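Assuming you have saved the script as dirsync.sh, a cautious first pass against a FAT USB stick (paths invented) would be a dry run with the two-second window, followed by the real thing once the proposed copies look sensible:

./dirsync.sh ~/documents /media/usb/documents -d -t 2
./dirsync.sh ~/documents /media/usb/documents -t 2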

#!/bin/bash

# comparefiles either compares two files and returns (if in compare mode)
# or determines whether to update the target file and carries out the update

comparefiles () {

FILE1="$1/$3"
FILE2="$2/$3"

if [ $COMPARE = "Y" ]; then
if [ ! -f "$FILE2" ] ; then
echo "dirsync: warning $FILE2 does not exist"
NDIFF=`expr $NDIFF + 1`
else
diff "$FILE1" "$FILE2" > /dev/null
if [ $? != 0 ]
then
echo "dirsync: $FILE1 $FILE2 differ"
NDIFF=`expr $NDIFF + 1`
fi
fi
return
fi

if [ ! -f "$FILE2" ] ; then
NCOPY=`expr $NCOPY + 1`
if [ $DRYRUN = "Y" ]; then
echo "dirsync: need to /bin/cp -a -i $1/$ITEM $2/$ITEM"
else
echo "dirsync: copying new item $1/$ITEM to $2/$ITEM"
/bin/cp -a -i "$1"/"$ITEM" "$2"/"$ITEM"
fi
return
fi

FILETIME1=`stat -c'%Y' "$FILE1"`
FILETIME2=`stat -c'%Y' "$FILE2"`
TIMEDIFF=`expr $FILETIME1 - $FILETIME2`
NEGTIMEWINDOW=`expr -$TIMEWINDOW`
if [ $TIMEDIFF -gt $TIMEWINDOW ]; then
echo "dirsync: (t=$TIMEDIFF) copying file $1/$ITEM to $2/$ITEM"
echo "dirsync: $FILE1: `stat -c'%s %y' "$FILE1"`"
echo "dirsync: $FILE2: `stat -c'%s %y' "$FILE2"`"
NCOPY=`expr $NCOPY + 1`
if [ $DRYRUN = "Y" ]; then
echo "dirsync: need to chmod u+w $2/$ITEM"
echo "dirsync: /bin/cp -a $1/$ITEM $2/$ITEM"
else
chmod u+w "$2"/"$ITEM"
/bin/cp -a "$1"/"$ITEM" "$2"/"$ITEM"
fi
elif [ $TIMEDIFF -lt $NEGTIMEWINDOW ]; then
echo "dirsync: warning newer file in target TIMEDIFF: " $TIMEDIFF
echo "dirsync: $FILE1: `stat -c'%s %y' "$FILE1"`"
echo "dirsync: $FILE2: `stat -c'%s %y' "$FILE2"`"
echo "dirsync: diffing files"
diff "$FILE1" "$FILE2" > /dev/null
if [ $? == 0 ]; then
echo "dirsync: the files are the same - update target"
echo "dirsync: requires /bin/cp -a $1/$ITEM $2/$ITEM"
NCOPY=`expr $NCOPY + 1`
if [ $DRYRUN != "Y" ]; then
/bin/cp -a "$1"/"$ITEM" "$2"/"$ITEM"
fi
else
echo "dirsync: files differ"
echo "dirsync: requires /bin/cp -a -i $1/$ITEM $2/$ITEM"
NCOPY=`expr $NCOPY + 1`
if [ $DRYRUN != "Y" ]; then
/bin/cp -a -i "$1"/"$ITEM" "$2"/"$ITEM"
fi
fi
fi
}

searchdir () {

if [ $COMPARE = "Y" ]; then
echo "dirsync: comparing $1 and $2"
fi

if [ ! -d "$2" ]; then
if [ $DRYRUN = "Y" ]; then
echo "dirsync: need to mkdir $2"
else
mkdir "$2"
fi
fi

for ITEM in "$1"/*
do
ITEM=`basename "$ITEM"`
if [ -h "$1"/"$ITEM" ]; then
echo "dirsync: $1/$ITEM is a link and links are not handled"
elif [ -f "$1"/"$ITEM" ]; then
comparefiles "$1" "$2" "$ITEM"
NFILE=`expr $NFILE + 1`
elif [ -d "$1"/"$ITEM" ]; then
searchdir "$1"/"$ITEM" "$2"/"$ITEM"
NDIRS=`expr $NDIRS + 1`
fi
done
for ITEM in "$2"/*; do
ITEM=`basename "$ITEM"`
# the check on the existence of the second item handles the wild card
if [ ! -e "$1"/"$ITEM" -a -e "$2/$ITEM" ]; then
if [ -d "$2/$ITEM" ]; then
echo "dirsync: directory $2/$ITEM does not exist in $1"
else
echo "dirsync: File $2/$ITEM does not exist in $1"
fi
if [ $DRYRUN = "Y" ]; then
echo "dirsync: need to rm -ri $2/$ITEM (if -f is set)"
fi
if [ $CLEANUP = "Y" ]; then
echo "dirsync: rm -ri $2/$ITEM"
rm -ri "$2"/"$ITEM"
fi
NDELE=`expr $NDELE + 1`
fi
done
}

NDIRS=0
NFILE=0
NCOPY=0
NDELE=0
NDIFF=0
TIMEWINDOW="0"
CLEANUP="N"
DRYRUN="N"
COMPARE="N"

if [ "$#" -lt 2 ]; then
echo "Usage: dirsync source target [-f | -d | -k] [-t offset]"
echo "-f = force removal of files deleted in source (cleanup)"
echo "-d = dry run"
echo "-t offset = offset in seconds to apply to timestamps (for FAT)"
echo "-k = comparison"
exit
fi

# the first two arguments are directories

while [ $# -gt 0 ]; do
if [ "$1" = "-t" ]; then
TIMEWINDOW="$2"
shift; shift; continue
elif [ "$1" = "-d" ]; then
DRYRUN="Y"
shift; continue
elif [ "$1" = "-f" ]; then
CLEANUP="Y"
shift; continue
elif [ "$1" = "-k" ]; then
COMPARE="Y"
shift; continue
else # target directories are stored here
if [ -z "$SRC" ]; then
SRC="$1"
else
TRG="$1"
fi
shift; continue
fi
done

if [ -z "$SRC" -o -z "$TRG" ]; then
echo "Target directories not supplied"
exit 1
fi

if [ ! -d "$SRC" -o ! -d "$TRG" ]; then
echo "Either $SRC or $TRG is not a directory, stopping"
exit 1
fi

if [ $COMPARE = "Y" -a $CLEANUP = "Y" ]; then
echo "Compare (-k) not permitted with cleanup (-f)"
exit 1
fi

if [ $COMPARE = "Y" -a $DRYRUN = "Y" ]; then
echo "Compare (-k) not permitted with dryrun (-d)"
exit 1
fi

if [ $DRYRUN = "Y" -a $CLEANUP = "Y" ]; then
echo "Dryrun (-d) not permitted with cleanup (-f)"
exit 1
fi

searchdir "$SRC" "$TRG"

echo ""
echo "dirsync: number of directories searched = $NDIRS"
echo "dirsync: number of files checked = $NFILE"

if [ $DRYRUN = "Y" ]; then
echo "dirsync: number of files to be copied = $NCOPY"
echo "dirsync: number of items to be deleted = $NDELE"
elif [ $COMPARE != "Y" ]; then
echo "dirsync: number of files copied = $NCOPY"
fi
if [ $CLEANUP = "Y" ]; then
echo "dirsync: number of items deleted = $NDELE"
fi
if [ $COMPARE = "Y" ]; then
echo "dirsync: number of files that differ $NDIFF"
fi

