Category Archives: Technological Tantrums

Things that cause us to shout at inanimate objects.

Taking a Butcher’s at Fonts

Fonts are fun.  When we (Lazy Bee Scripts) publish a stage play or pantomime, we pick a title font that says something about the content.  Usually it’s a feeling conveyed by the typeface (from chilling to frivolous); occasionally it’s suggested by the font name – so I have used a font named Gaslight for plays set in the late Victorian era.

So far, so good.  However, we distribute a lot of scripts as Word files and that brings some additional problems.  Obviously we don’t expect our customers to have the same fonts installed on their computers as we do, but Word provides a means of embedding TrueType fonts into the Word file, so that they are available to readers of that file who don’t have the font installed.  There are two options for doing this: embed the whole font file, or embed only the characters used in the file (which Word recommends as “best for reducing file size”).  We take the latter option.

When the customer opens the file, if they don’t have the font installed, they may see a message about restricted fonts.  Sometimes we use fonts that (for copyright reasons) are not freely distributable; however, if embedded, they will still show up on-screen and may be printed.  What the customer can’t do is edit a document containing a restricted font.  That’s fine.  Most of our customers do not need to edit our scripts.  (Not least because of the general point that you should not change a copyrighted work without the permission of the copyright holder.)  However, there are occasionally good reasons for editing (embedding lighting cues for a specific production, for example).  In that case, if the customer saves an editable version, they will lose the embedded font and Word will substitute its default system font (which probably won’t look anything like our chosen font).

The next problem was pointed out to us by author Tim Cole when we sent him a copy of his script Butchers.  When he took a butcher’s *, he noticed a spurious square at the end of the title.

Usually, spurious squares are an indication that the creator of a font has not implemented some characters (typically punctuation marks).  In this case, the square appeared at the end of the line, in the position of the paragraph return.  Even more bizarrely, making the paragraph marks visible displayed the pilcrow (the printers’ end-of-paragraph mark, ¶) in the chosen font.  Somehow, when the pilcrow was supposed to be invisible, Word was trying to display a character that wasn’t in the embedded character set.

If you paid close attention to my second paragraph, you may have identified the obvious way around the problem: instead of embedding just the used characters, why not embed the whole font?  You’re right.  I tried that, and indeed the invisible character that isn’t embedded in the “used characters” is part of the whole font set, so if the whole font is embedded, the problem disappears.  Unfortunately, there’s a cost to doing that.  Remember that recommendation from Microsoft that embedding just the used characters is “best for reducing file size”.  Embedding a whole font in a test document grew it from 98 kB to over 1.7 MB – so one invisible character cost me 1.6 MB of storage space.  The problem affects every font that isn’t installed on the reader’s computer and, since we don’t know what our customers have installed, applying the fix to all scripts would cost us 10 GB of on-line storage.  (You can argue that this is not very much by modern standards; however, Lazy Bee Scripts is a small publisher with small storage requirements compared to, say, YouTube.)

I found a different solution, which was to replace every return character in a script title with a return in the document’s default font.  Quite tricky to automate (and, because of thousands of scripts, it needed to be automated), but it removes the spurious square at no cost to the file size.
All this may well be a feature of the latest version of Word.  I had not come across it before, but then I don’t keep archived copies of different versions of Word just to test for Microsoft’s problems.
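For anyone facing the same problem, here is a minimal sketch of the idea in Python (not our production code). A .docx file is a ZIP archive whose main text lives in word/document.xml, and in that XML the formatting of a paragraph’s return (the pilcrow) is held separately from the text, in the run properties (w:rPr) inside the paragraph properties (w:pPr). The sketch assumes, purely for illustration, that title paragraphs use a paragraph style with the ID "Title" and that "Times New Roman" stands in for the document’s default font; adjust both to suit.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used by word/document.xml in a .docx
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the usual w: prefix on output


def w(name: str) -> str:
    """Qualify a tag or attribute name with the w: namespace."""
    return f"{{{W_NS}}}{name}"


def reset_paragraph_marks(document_xml: str, style_id: str, font: str) -> str:
    """Set the paragraph-mark font of every paragraph in `style_id` to `font`.

    Only the paragraph mark's run properties (w:pPr/w:rPr) change; the runs
    that hold the visible title text keep their own fonts.
    """
    root = ET.fromstring(document_xml)
    for para in root.iter(w("p")):
        ppr = para.find(w("pPr"))
        if ppr is None:
            continue
        pstyle = ppr.find(w("pStyle"))
        if pstyle is None or pstyle.get(w("val")) != style_id:
            continue
        rpr = ppr.find(w("rPr"))
        if rpr is None:
            # The schema puts w:rPr before w:sectPr and w:pPrChange in w:pPr.
            rpr = ET.Element(w("rPr"))
            later = [i for i, child in enumerate(ppr)
                     if child.tag in (w("sectPr"), w("pPrChange"))]
            ppr.insert(later[0] if later else len(ppr), rpr)
        fonts = rpr.find(w("rFonts"))
        if fonts is None:
            fonts = ET.Element(w("rFonts"))
            # Schema order: track-change marks and w:rStyle precede w:rFonts.
            before = {w(t) for t in ("ins", "del", "moveFrom", "moveTo", "rStyle")}
            rpr.insert(sum(1 for child in rpr if child.tag in before), fonts)
        # Theme-font attributes take precedence over named fonts, so drop them.
        for attr in ("asciiTheme", "hAnsiTheme", "eastAsiaTheme", "cstheme"):
            fonts.attrib.pop(w(attr), None)
        for attr in ("ascii", "hAnsi", "eastAsia", "cs"):
            fonts.set(w(attr), font)
    return ET.tostring(root, encoding="unicode")
```

Unzipping the .docx, rewriting word/document.xml and zipping it back up is left out to keep the sketch short. The visible title keeps its fancy font because only the paragraph mark’s properties are touched.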



*     “Butcher’s hook”, rhyming slang for look.  Not many people see butcher’s hooks any more, but they were very useful to my grandfather.

Microsoft Causes Inflation

This simple trick will cut your Word document down to size.

The Problem

Word documents can suffer from bloat. An author tried to submit a 32-page Word file to us. The formatting was straightforward, but the file was a whopping 2.4 MB. Our on-line systems rejected the file (because why would we want a file that big?). The author saved it as a Rich Text Format (RTF) file of 600 kB and uploaded it. We imported it into Word and saved it as a .docx and, lo and behold, it was back to nearly 3 MB: over 2 MB bigger without the addition of so much as a single comma. We reviewed the file and added some simple mark-up, and it blew up to well over 3 MB.
After that, I applied this trick to the script and got it down to less than 90 kB, well over an order of magnitude smaller. (And with five minutes’ work, it went to 80 kB.) So what’s going on?

The Cause

The frank answer is that I’m not sure. I know some of the causes, but Word is a complex tool, so attributing anything to a single cause is dubious, and I’m trying to approach this as a user, not as a product tester. Broadly, there are two issues. Firstly, Word tracks changes. Even if you declare an edition to be final, and stop tracking changes, Word seems not to discard the change data. It’s still hanging around somewhere, even though it’s not used. Secondly, if your document is edited by multiple users (or one user on several computers), it picks up template information from each instance without discarding the previous information, so it keeps adding unused data to the document. (There may also be an issue with using different versions of Word to edit one document, and certainly further issues with editing documents in a mix of Word and other word-processors.)

The Solution

The solution is to leave all the rubbish behind:-
Create a new blank document (preferably using a clean template that includes just the Styles you need). Now open your bloated document. Select all the contents (either with the mouse or with a shortcut: Ctrl-A on a Windows computer). Click Copy (Ctrl-C). Switch to your new blank document and Paste (Ctrl-V). Then Save. The new version will have left most of the dross behind and kept your text, your formatting and not a lot else.

(There is a minor additional tweak: that process will copy over all the Styles from the source document, including ones that are not actually in use. You can reduce the file size a little more by deleting unused Styles.)
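Before deleting anything, it can help to know which Styles really are unused. Here is a minimal sketch in Python that compares the style IDs defined in a .docx file’s word/styles.xml with those referred to in word/document.xml. It is a report, not a deletion tool, and it makes simplifying assumptions: it only looks at the main text (headers, footers and footnotes live in other parts of the file), and it treats default styles, and anything a used style is based on or linked to, as in use.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used by word/styles.xml and word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(name: str) -> str:
    """Qualify a tag or attribute name with the w: namespace."""
    return f"{{{W_NS}}}{name}"


def unused_style_ids(styles_xml: str, document_xml: str) -> set:
    """Return IDs of styles defined in styles.xml but not used by the main text."""
    styles = ET.fromstring(styles_xml)
    defined, defaults, links = set(), set(), {}
    for style in styles.iter(w("style")):
        sid = style.get(w("styleId"))
        defined.add(sid)
        if style.get(w("default")) in ("1", "true", "on"):
            defaults.add(sid)
        # Styles this one points to: its parent, its follow-on and linked style.
        links[sid] = {el.get(w("val"))
                      for tag in ("basedOn", "next", "link")
                      for el in style.findall(w(tag))}

    # Styles the main text refers to directly (paragraph, character, table).
    doc = ET.fromstring(document_xml)
    used = {el.get(w("val"))
            for tag in ("pStyle", "rStyle", "tblStyle")
            for el in doc.iter(w(tag))}

    # Walk the chains so that styles a used style depends on count as used too.
    reached, stack = set(), list(used | defaults)
    while stack:
        sid = stack.pop()
        if sid not in reached:
            reached.add(sid)
            stack.extend(links.get(sid, ()))
    return defined - reached
```

Deleting the reported Styles through Word’s own Styles panel is safer than editing the XML by hand.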

Postscript – What If That Wasn’t My Problem

The other major cause of Word Bloat is embedded images. If you need pictures, you need them, but consider cropping and shrinking to a size appropriate for your purpose before you embed images in your document.
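If you’re not sure which problem you have, a .docx file is a ZIP archive of XML parts plus any embedded media (pictures sit under word/media/), so listing the parts by size shows where the weight is. This is a minimal sketch using only the Python standard library; the function name is our own invention.

```python
import zipfile


def largest_parts(source, top: int = 10):
    """Return (part name, uncompressed bytes, stored bytes) for the biggest parts.

    `source` may be a path to a .docx file or an open binary file object.
    Parts are ranked by their stored (compressed) size, since that is what
    makes the file itself large.
    """
    with zipfile.ZipFile(source) as zf:
        parts = [(info.filename, info.file_size, info.compress_size)
                 for info in zf.infolist()]
    parts.sort(key=lambda part: part[2], reverse=True)
    return parts[:top]
```

For example, largest_parts("bloated.docx") returns the ten biggest parts. If word/media/ dominates, shrink the pictures; if the XML parts dominate, the copy-and-paste trick is the one to try.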

The Horns of a Dilemma

Detail from Judgement by Jacob de Backer (16th century)

Once again, I find myself drawn to bad language.  As usual, the cause is e-mail or, rather, e-mail filtering; a recent customer newsletter was rejected by a small number of (school) e-mail systems on the grounds of profanity.  It is not my intention to write offensive newsletters (they are mainly about new publications), so the compilation strategy is to avoid swearing.  In cases where words only have vulgar meanings, this is easy.  It gets harder, as I have mentioned before, where words have multiple meanings dependent on context.  Filtering is not good at context.  I am returning to this topic because the offending word was an odd one.  I think the cause of the problem was the title of David Pemberton’s Dance with the Devil.  Why is the devil banned from my communications?  The question is whether or not “devil” constitutes profanity.

That may seem obvious.  You could argue that the devil, being in opposition to God, is by definition profane.  However, that which is profane is not necessarily profanity.  (Profane means ‘not sacred’ whereas profanity is swearing or other language that should be avoided in polite society.)  It might also be argued that ‘devil’ is a religious concept: a personification of evil.  But if you go to the source material, you will find relatively little about the devil in the Christian bible – mainly the temptation of Christ (by Satan) as described in three of the gospels, and various instances of “casting out devils” (describing demonic possession).  This should not be such a surprise: Christianity is monotheistic, believing in one omnipotent god; any elevation of the devil beyond the occasional anthropomorphic personification of evil would be to recreate a dualistic system along the lines of Manichaeism (which held that the universe was a perpetual struggle between equal opposing forces of good and evil).  So where do we get the notion of the devil as a consistent figure – the one with the horns and goat’s feet?  Largely through a combination of later Christian mythologizing and mediaeval art.  The former is a matter of joining biblical dots (notably from the books of Ezekiel, Isaiah and Revelation) to create a more coherent whole than appears in any of the sources.  The latter is a matter of laziness.  In Anna Karenina, when Tolstoy said “All happy families are alike; each unhappy family is unhappy in its own way”, he was talking lazy rubbish.  All happy families are different, but it is much easier – more dramatic – to describe the myriad ways people make each other miserable than it is to depict happiness.  Similarly, depicting the tortures of hell and the attendant demons is far easier than a dull depiction of the tranquillity of heaven.

So what we have is the over-elaboration of a metaphor.  Does that constitute profanity?  I don’t think so.  You can’t discuss the religious concept unless you name it.  I suppose that there is an argument to be made that representations of the devil (such as the 16th century one by Jacob de Backer shown here) constitute profanity, but it’s a pretty abstruse argument.  Then we have the original source of my problem: ‘dance with the devil’ is a metaphor, not a literal depiction or instruction.  Old Harry appears in similar expressions like ‘devil in the detail’ and nobody takes those as literal or offensive.  (At least, I don’t know of anybody who does.  Would anyone care to speak, for example, for the Plymouth Brethren in this respect?  I pick on them as a group who take such things very seriously and much more prescriptively than most of society.)

So are there any instances where use of ‘devil’ constitutes profanity?  Well yes.  You can call someone a devil offensively.  You can also tell them to go to the devil.  These days those uses constitute a vanishingly small minority when compared to legitimate religious use and common metaphor.  So filtering out e-mails that contain the word devil is every bit as lazy as the mediaeval depictions of the tortures of hell.

The Sharks Are Circling

We are under attack.
There is an increasing volume of spam, mostly aimed at business e-mail addresses, carrying a malicious payload via an attached file.  The attachments contain some executable element (usually a macro that runs when the file is opened).  The worst of the malicious payloads are ransomware – hijacking the computer and locking the user out pending payment of a ransom.

We have four lines of defence.  The first is e-mail filtering.  It isn’t very good.
I have just completed my tax return on Her Majesty’s Revenue and Customs web site.  At the end of the process, HMRC sent me a confirmation e-mail, essentially just giving me a reference number, with a link to the HMRC web site.  That confirmation e-mail was filtered out as junk, whereas the filtering was perfectly happy to let through an e-mail with this header:

[Spam example 1]

Or a similar one, in which HMRC appear to have contracted their services overseas:

[Spam example 2]

Automated filtering suffers from both false positives and false negatives. The second line of defence is the user, who has to cope with messages like:

[Spam example 3]

That e-mail address is more plausible than the HMRC spoofs but bears no relation to the person’s name or the supposed company.  It is part of the bombardment of quasi-business e-mails, most of which have attachments disguised as financial instruments – invoices, statements of account and the like.  The following is a better example; it spoofs a sender e-mail address consistently and the body of the e-mail takes the Ian Fleming approach, disguising the big lie in plausible levels of detail.  (In this case, its biggest failing was that it was sent to a non-existent address and was therefore swept into our junk mail dungeon.)

[Spam example 4]

In theory, there are two levels of security beyond the inbox that might still save us from the worst of the scams, but I never want to put those to the test – and there is something simple that business people can do to defeat the scammers.
The assumption made by the scammers is that the e-mail is coming into a busy financial office.  The e-mail doesn’t contain enough information for the transaction to be recognisable and therefore the recipient will open the attachment to find out what it’s about.  The e-mail is written as though there is a prior history, but that history is never specified.
All that is needed to defeat this – to prove that a business e-mail is genuine – is to have some common verifiable evidence of history in the body of the e-mail so that the provenance can be checked without opening the attachment.

So, if you send out e-mails with, for example, remittance advice notes attached, then make sure your subject line or e-mail body contains a verifiable reference to a purchase order or invoice number.
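The same check can be partly automated on the receiving side. Here is a minimal Python sketch, assuming (purely for illustration) that your purchase orders look like PO- followed by five digits and that you keep a set of the references you have actually issued. It reports which of those references appear in the subject line or plain-text body, without touching the attachment. An empty result doesn’t prove that an e-mail is malicious, but it is a good reason not to open the attachment.

```python
import re
from email.message import EmailMessage

# Hypothetical reference format: "PO-" followed by five digits.
REFERENCE_PATTERN = r"\bPO-\d{5}\b"


def known_references(msg: EmailMessage, issued: set,
                     pattern: str = REFERENCE_PATTERN) -> set:
    """Return references we actually issued that the subject or body quotes.

    Only the Subject header and the plain-text body are read; attachments
    are never opened.
    """
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    text = f"{msg.get('Subject', '')}\n{body}"
    return set(re.findall(pattern, text)) & issued
```

Messages read from a file or mailbox should be parsed with email.policy.default so that they come back as EmailMessage objects with get_body() available.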

The Invention of Paul Roostercroft

(And why you can’t tell a Scottish head teacher that a child has been naughty.)
Cockcroft - a smallholding for farming chickens.
Cockcroft does not mean Chicken Ranch

Paul Roostercroft came about through a collision of two problems.  As mentioned previously (We Will Hide Your Stuff), BT Business has a novel filtering system that hides e-mails that it regards as spam.  No customer notification – they don’t even tell you that this filter exists unless you ask the right question – just hiding.  In theory – the theory expounded by the helpful BT second-line support guy who gave me access to the hidden system – this junk mail filter uses a learning algorithm.  That means that if you tell it that something isn’t spam, it is supposed to look at future mail for similar characteristics, and, on that basis, decide that the new mail isn’t spam either.  It doesn’t work.  No matter how many times I tell it that I want to receive the regular bulletins from the Ordnance Survey (I like maps), it decides they are junk, whereas it lets through plenty of advertising e-mails to which I’ve never subscribed.

Similarly with Paul’s e-mail.  Paul is a playwright whose e-mail I wish to receive.  BT wishes to prevent that.  The only reason I can see for BT’s objection is that he has the venerable Anglo Saxon surname of Cockcroft.  I assume that BT thinks that this name will offend my delicate sensibilities.  No matter how many times I tell BT’s system that I want his e-mails, they still get trapped in the hidden junk folder.

That brings me on to the other problem (Things You Can’t Say).  If BT thinks Cockcroft will frighten the horses, I can expect the same treatment from other e-mail systems.  How am I supposed to talk about Paul’s plays in our e-mail newsletter?  My solution was euphemism – specifically borrowing the American euphemism for a male chicken.

I thought that the inclusion of Paul Roostercroft had been successful in rendering my e-mails filter-proof until I received a “bounce” message that stated:

“A mail from you to [the head teacher of a Scottish primary school] was stopped and quarantined because it contains objectionable content in line 40”

I thought that this might have been caused by “Puss-in-Boots”, but no.  As far as I can see from scrutinising the e-mail, the naughty word in line 40 was, in fact, “naughty”.
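I don’t know how BT’s filter or the Scottish school’s filter actually works, so this is speculation, but a filter that looks for a blocked word anywhere inside the text (a substring match) would explain the Cockcroft problem: the surname contains a blocked word. This minimal Python sketch, with a one-word hypothetical block list, shows the difference between that and matching whole words only. Whole-word matching would spare Paul, but, as “naughty” shows, it does nothing about words that are innocent in context.

```python
import re

BLOCKED = {"cock"}  # hypothetical block list, lower case


def substring_hits(text: str) -> set:
    """Naive filter: a blocked word found anywhere, even inside another word."""
    lowered = text.lower()
    return {word for word in BLOCKED if word in lowered}


def whole_word_hits(text: str) -> set:
    """Stricter filter: a blocked word only when it stands on its own."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return BLOCKED & words
```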

You paid Who? For What?

There are chat rooms and forums for British expatriates working in the United States (and elsewhere).  They all include the question: “What’s the [local] equivalent of BACS?”

BACS is a brilliant idea, but it has yet to reach America.  “Bankers’ Automated Clearing Services” allows direct transfer between UK bank accounts using the recipient’s account number and the branch identifier (sort code).  It’s monodirectional – you push money from your account to someone else’s, but, even though you know the account number, you can’t pull money in the opposite direction.  In its latest incarnation, it is generally very fast.  For the banks, it’s cheaper to operate than cheques (the customer and the computer do all the work).  For the customer, it is (generally) more convenient and secure – I for one have never had an electron lost in the mail.

Are you waiting for something?  Have you taken a breath in anticipation?  Okay, here it comes.  However…

The two identifier fields (account number and branch code) have a fixed format.  They are a prescribed part of the protocol.  There are also two text fields: one for the benefit of the sender, to identify the recipient, the other for the benefit of the recipient, to identify the payment (or the reason for it).  Both fields are free-form text and both give hassle.  This is at the nuisance level – the benefits far outweigh the niggles – but as a frequent user, I feel the frustration and the need to grizzle!

We would like our customers to use the second field to enter our order reference number.  That number includes an underscore character, which is fine for some banks, but others block it.  There are excellent reasons for “sanitising” customer input, and blocking some characters; however, I have never come across a good reason for blocking an underscore.  (The “recipient” field also gets sanitised.  My bank doesn’t like dots.  It will cope with “Mr A Smith” but not “Mr A. Smith”.)
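We can’t change what the banks accept, but we can be tolerant about what arrives. One approach, sketched minimally in Python, is to compare references after discarding everything except letters and digits, so a reference that has lost its underscore (or had it swapped for a space or hyphen) still matches the order. The “LB_1234” reference format is made up for illustration.

```python
import re


def normalise_reference(ref: str) -> str:
    """Keep only letters and digits, upper-cased, so 'lb_1234' matches 'LB 1234'."""
    return re.sub(r"[^A-Za-z0-9]", "", ref).upper()


def match_order(payment_ref: str, open_orders: dict):
    """Return the order whose normalised reference matches, or None."""
    index = {normalise_reference(ref): order for ref, order in open_orders.items()}
    return index.get(normalise_reference(payment_ref))
```

The catch is that normalisation can make two different references look alike (LB_12_34 and LB_1_234 both become LB1234), so the scheme only works if order numbers are designed not to collide once punctuation is ignored.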

Furthermore, some banks make it hard to change the “reason” field once it has been set up.  Thus we get returning customers who appear to be paying for the same order multiple times.

The field that gives me the greatest problem is the “recipient” field.  My bank encourages me to use that field to enter the recipient’s name – and logically that would be the name that appears on their bank account.  However, the bank offers me a fixed length field that is insufficient for the purpose.  I have a long list of authors who receive their royalties by BACS, but how can I be sure I’m paying the right person?  If the field cuts off at 15 characters, how am I supposed to distinguish between Christopher McPherson and Christopher McPhee?
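The clash is easy to check for in advance. Here is a minimal Python sketch, taking the 15-character limit from the example above (your bank’s limit may differ). It groups payee names by how they look once truncated and reports any group with more than one member, so those payees can be given distinguishing labels before they are set up.

```python
from collections import defaultdict


def truncation_clashes(names, width: int = 15) -> dict:
    """Map each truncated label to the full names that share it, if more than one."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:width]].append(name)
    return {label: full for label, full in groups.items() if len(full) > 1}
```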

Why does Microsoft make things up?

An exercise for word-processing obsessives

This is a feature of Word 2007 and Word 2010, but not (pre-ribbon) Word 2002.
Try the following steps.

  • Start a new document in Word 2007 or Word 2010.
  • Write a short sentence or headline.
  • Select your text, then change the font to your favourite fancy font, increase the font size and make it italic.
  • Select the text, then click on the expander in the bottom right-hand corner of the Styles box on the Home tab of the ribbon.  (That launches the pop-up Styles panel.)
  • At the bottom of the Styles panel, click on the New Style icon.  This should create a new style from your fancy text, and prompt you to give it a name.  Let’s call this style “Wanted”.  Click OK to create it.
  • The name of your new style should now appear in the Styles panel.
  • From the “Options…” link at the bottom of the Styles panel, under the “Select Formatting to Show As Styles” heading, select “paragraph level formatting”.  (That determines what shows up in your Styles panel.)
  • Now go back to the short sentence that you’ve created in your “Wanted” style.  Put the cursor somewhere in the middle of that sentence and press Ctrl-Return.  That inserts a page break.

Did you spot what that last operation did?  In addition to the page break, it added something to your Styles panel.

What it added depends on which version of Word you’re using (and possibly the phase of the moon).  In Word 2010 it usually adds a new style called “After: <something descriptive of paragraph formatting>”.  In Word 2007 it adds a new style that describes details of the “Wanted” style.

Is this necessary?
To find out, use the Styles panel to select all instances of the new (“Unwanted”) style and then apply the “Wanted” style to them.  Aside from the demise of the Unwanted style, nothing else happens in the document.  The Unwanted style was unnecessary.

Why does this matter?
Well, the point of Styles is to keep control of your document – to ensure that everything that should have the same format does have the same format.  To ensure that if you want to change the way particular parts of the document look, you can change the style – one style, one change – and the change will be applied consistently throughout the document.  By spewing out unnecessary styles, Microsoft makes it harder to format documents consistently.

Things You Can’t Say

Warning: this post contains words that are forbidden in Derby.

I sent an e-mail about a school play script to a customer at a school in Derby.  I received an automated reply that said:-

Offensive Words Lexicon Found the expression “bottomless” 1 times, at 2 points each, for an expression score of 2 points.
Total Message Score: 2 points.
The e-mail has been blocked and has not been delivered.

Now, I recognise that in some contexts, the word bottomless can have connotations of immorality, but in this case, the context was the title of Raymond Blakesley’s school play “Santa Claus and the Bottomless Sack”.  E-mail filtering systems are good with words, but very bad with context.  Unfortunately, context is important.  In describing a play to a school, I can’t say that the adult roles are written to be performed by children, as “adult” has been hijacked to mean “pornographic”.  Instead, I have to use the childish expression “grown up”.  Even worse, I can’t say that a play is written for teenagers as “teen” is blocked because it is used to mean “nubile” (though not in the sense of “marriageable”, unless marriageable is a euphemism).
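The automated reply suggests how such a filter might score a message: count each listed expression, multiply by its points, add up, and block when the total reaches a threshold. Here is a minimal Python sketch under that assumption; the one-entry lexicon and the 2-point threshold are guesses built from the Derby message, not the council’s real settings. Nothing in it looks at the surrounding words, which is exactly how a play title ends up blocked.

```python
import re

# Hypothetical lexicon (lower-case expression -> points per occurrence)
# and blocking threshold, reconstructed from the Derby message.
LEXICON = {"bottomless": 2}
THRESHOLD = 2


def score_message(text: str, lexicon: dict) -> tuple:
    """Return (total score, {expression: (times found, expression score)})."""
    lowered = text.lower()
    total, report = 0, {}
    for expression, points in lexicon.items():
        # Whole-word matches only; the words around a match are never consulted.
        pattern = r"\b" + re.escape(expression) + r"\b"
        count = len(re.findall(pattern, lowered))
        if count:
            report[expression] = (count, count * points)
            total += count * points
    return total, report


def is_blocked(text: str) -> bool:
    """Block the message when its total score reaches the threshold."""
    total, _ = score_message(text, LEXICON)
    return total >= THRESHOLD
```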

The final insult from the automated message from Derby was the footnote.  It said:

The views expressed in this email are personal and may not necessarily reflect those of Derby City Council

So the things I am not allowed to say are dictated by the personal opinions of an automaton.

We Will Hide Your Stuff

Some time ago, I heard a bit of a radio series in which the heroine found herself in a weird parallel version of London.  (I was fairly sure that this was “Undone”, written by Ben Moor – I got so obsessed that I confirmed this with the writer! – and therefore the protagonist’s name was Edna.)  Edna came across a business called “We Will Hide Your Stuff”.  She was so puzzled by this that she phoned them up to find out what it was about.  The conversation went:
“You know your stuff?”
“We hide it.”

This seems to be an approach taken in a joint effort by BT and Microsoft.  For their business customers, BT provide e-mail via Microsoft Exchange Server.  So far, so good.  However, the e-mail that reaches the user is pre-filtered for spam.  You would think that this means that it goes into a spam folder, and so it does, but that spam folder is not visible to the user.  There is a spam folder visible to the user, but nothing goes there, because anything that Microsoft Exchange thinks is spam gets trapped by the hidden pre-filtering system.  This is filtering by algorithm, and, of course, it is imperfect.  It traps some genuine spam, but it also lets some through.  Crucially, it traps some genuine business communications.
BT can give their customers access to this pre-filtering spam folder, but they don’t do so automatically.  In order to get access, you need to prove to BT that there is some e-mail that you have not received.  You would think that they would see a flaw in this approach.