In the examples below, I'll assume that string is a shell variable that contains some character string. It might be as short as a single character, or it might be the contents of a whole document.
newstring=${string^^}    # the string, converted to UPPER CASE
newstring=${string,,}    # the string, converted to lower case

(You can also convert just the first letter by using a single ^ or , .) Notice that the original variable, string, is not changed.
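For instance, with a couple of made-up variables:

greeting="hello, world"
echo "${greeting^}"    # Hello, world   (upper-case just the first letter)
shout="HELLO, WORLD"
echo "${shout,}"       # hELLO, WORLD   (lower-case just the first letter)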
Normally, we think of doing this by using the tr command:
newstring=`echo "$string" | tr '[a-z]' '[A-Z]'`
newstring=`echo "$string" | tr '[A-Z]' '[a-z]'`

Of course, that involves spawning a new process. Actually, as the man page for tr tells you, this isn't optimal; depending on your locale setting, you might get unexpected results. It's safer to say
newstring=`echo "$string" | tr '[:lower:]' '[:upper:]'`
newstring=`echo "$string" | tr '[:upper:]' '[:lower:]'`

Using tr is certainly more readable; but it also takes a lot longer to type. How about execution time?
str1=X
i=0
time ( while [ $i -lt 100000 ]
       do
           let i++
       done )

On my machine — currently, a 3 GHz (6000 bogomips) dual-core Pentium box — that takes about 1.57 seconds. That's the bash overhead for running the useless loop. Nearly all of that is “user” time; the “sys” time is only a few dozen milliseconds.
Now let's add the line
str2=${str1^^}

to the loop, just after the let statement. The execution time jumps to about 2.3 seconds; so executing the added line 100,000 times took about 0.7 second. That's about 7 microseconds per execution.
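Just to be explicit about what's being timed, the loop now reads:

i=0
time ( while [ $i -lt 100000 ]
       do
           let i++
           str2=${str1^^}
       done )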
Now, let's try putting the line
str2=`echo "$str1" | tr '[:lower:]' '[:upper:]'`

inside the loop instead. The execution time is now a whopping 1m 33s of real time — but only 3 seconds of user and 7 sec of system time! Apparently, the system gives both bash and tr a thousand one-millisecond time-slices a second, and then takes a vacation until the next round millisecond comes up.
If we try to even things up a bit by making the initial string longer, we find practically the same times for the version using tr, but about 0.2 second longer than before for the all-shell version, if the string to convert is "Hello, world!". Clearly, we need a really big string to measure bash's speed accurately.
So let's initialize the original string with the line
str1=`cat /usr/share/dict/american-english`

which is a text file of 931708 characters. For this big file, a single cycle through the loop is enough: it takes bash about 45.7 seconds, all but a few milliseconds of which is “user” time. On the other hand, the tr version takes only 0.24 seconds to process the big text file.
Clearly, there's a trade-off here that depends on the size of the string to be converted. Evidently, the context switch required to invoke tr is the bottleneck when the string is short; but tr is so much more efficient than bash in converting big strings that it's faster when the string exceeds a few thousand characters. I find my machine takes about 1.55 milliseconds to process a string about 4100 characters long, regardless of which method is used. (About a quarter of a millisecond is used by the system when tr is invoked; presumably, that's the time required to set up the pipeline and make the context switch.)
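If you really wanted the best of both worlds, you could wrap the two methods in a little helper that picks one by the length of the string. The function name and the 4000-character threshold below are just placeholders, based on the rough crossover measured above:

# A sketch of a length-based dispatcher; to_upper and the 4000-character
# threshold are invented here, based on the rough crossover measured above.
to_upper () {
    local s=$1
    if [ ${#s} -lt 4000 ]
    then
        printf '%s\n' "${s^^}"                            # short string: stay in bash
    else
        printf '%s\n' "$s" | tr '[:lower:]' '[:upper:]'   # long string: hand it to tr
    fi
}

upper=`to_upper "$string"`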
newstring=${oldstring/pattern/replacement}

Notice that there is no trailing slash, as in sed or vi: the closing brace terminates the substitution string. (Be careful to use braces and not parentheses here.)
The catch is that only shell-type patterns (like those used in pathname expansion) can be used, not the “extended” regular expressions recognized by sed. Also, only a single replacement normally occurs; but you can simulate a “global” replacement by using two slashes before the pattern:
newstring=${oldstring//pattern/replacement}

A handy use for this trick is in sanitizing user input. For example, you might want to convert a filename into a form that's safe to use as (part of) a shell-variable name: filenames can contain hyphens and other special characters that are not allowed in variable names, which can contain only letters, digits, and underscores. So, to clean up a dirty string:
clean=${dirty//[-+=.,]/_}

If we had set dirty='a,b.c=d-e+f', the line above converts the dangerous characters to underscores, forming the clean string a_b_c_d_e_f, which can be used safely in a shell script.
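And here's a sketch of how the cleaned-up name might then be used to build a variable name; the count_ prefix is just an invention for the example:

dirty='a,b.c=d-e+f'
clean=${dirty//[-+=.,]/_}
declare "count_$clean=1"     # assigns the variable count_a_b_c_d_e_f
echo "$count_a_b_c_d_e_f"    # 1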
And you can omit the replacement string, thereby deleting the offensive parts entirely. So, for example,
cleaned=${dirty//[-+=.,]}

is equivalent to
cleaned=`echo $dirty | sed -e 's/[-+=.,]//g'`

or
cleaned=`echo $dirty | tr -d '+=.,-'`

where we have to put the hyphen last so tr won't think it's an option.
Be careful: sed and tr allow the use of ranges like 'A-Z' and '0-9'; in a bash pattern, the meaning of such a range depends on your locale, so it's safer to enumerate the characters, or to use character classes like [:upper:] or [:digit:] within the brackets that define the pattern list.
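For example, to keep only the digits of a string (the variable names here are arbitrary), you can negate a character class inside the brackets:

string='call 555-0100, ext. 42'
digits=${string//[![:digit:]]/}    # delete every non-digit character
echo "$digits"                     # 555010042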
You can even force the pattern to appear at the beginning or the end of the string being edited, by prefixing pattern with # (for the start) or % (for the end).
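For instance (the filename here is invented):

file=draft-notes.txt
echo "${file/#draft-/final-}"    # final-notes.txt  (replace only at the start)
echo "${file/%.txt/.md}"         # draft-notes.md   (replace only at the end)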
dirpath=${path%/*}

extracts the part of the path variable before the last slash; and
base=${path##*/}

yields the part after the last slash. CAUTION: Notice that in the ## form the asterisk comes before the slash, but in the % form it comes after it.
That's because
${varname#pattern}

trims the shortest prefix from the contents of the shell variable varname that matches the shell-pattern pattern; and
${varname##pattern}

trims the longest prefix that matches the pattern from the contents of the shell variable. Likewise,
${varname%pattern}

trims the shortest suffix from the contents of the shell variable varname that matches the shell-pattern pattern; and
${varname%%pattern}

trims the longest suffix that matches the pattern from the contents of the shell variable. You can see that the general rule here is: a single # or % to match the shortest part; or a double ## or %% to match the longest part.
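Here's the whole family at work on a made-up pathname:

path=/usr/local/share/doc/README.txt
echo "${path#*/}"     # usr/local/share/doc/README.txt  (shortest prefix matching */)
echo "${path##*/}"    # README.txt                      (longest prefix matching */)
echo "${path%/*}"     # /usr/local/share/doc            (shortest suffix matching /*)
echo "${path%%/*}"    # (empty string)                  (longest suffix matching /*)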
But be careful. If you just feed a bare filename instead of a pathname to dirname, you get just a dot [.]; but if there are no slashes in the variable you process with the hack above, you get the filename back, unaltered: because there were no slashes in it, nothing got removed. So this trick isn't a complete replacement for dirname.
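To make the difference concrete (the filename is arbitrary):

name=report.txt
echo `dirname "$name"`    # .           (dirname supplies the dot)
echo "${name%/*}"         # report.txt  (no slash, so nothing is trimmed)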
Another use of basename is to remove a suffix from a filename. We often need to do this in shell scripts when we want to generate an output file with the same basename but a different extension from an input file. For example, to convert file.old to file.new, you could use
newname=`basename $oldname .old`.new

so that, if you had set oldname to file.old , newname would be set to file.new . But it's faster to say
newname=${oldname%.old}.new

(Notice that we have to use the % operation here, even though the generic replacement for basename given above uses the ## operation. That's because we're trimming off a suffix rather than a prefix, in this case.) If you didn't know the old file extension, you could still replace it by saying
newname=${oldname%.*}.new
This way of trimming off a prefix or a suffix is also useful for separating numbers that contain a decimal point between the integer and fractional parts. For example, if we set DECIMAL=123.4567, we can get the part before the decimal as
INTEGER=${DECIMAL%.*}

and the digits of the fraction as
FRACT=${DECIMAL#*.}
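Putting those pieces together as a runnable snippet:

DECIMAL=123.4567
INTEGER=${DECIMAL%.*}    # 123
FRACT=${DECIMAL#*.}      # 4567
echo "$INTEGER and $FRACT"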
let sum=5+2

will store 7 in sum. Of course, the operands on the right side can just as well be shell variables; so, if x and y are numerical, you could say
let sum=x+y

which is both more compact and faster than
sum=`expr $x + $y`
If you want to space the expression out for better readability, you can say
let "sum = x + y"and bash will do the right thing. (You have to use quotes so that let has just a single argument. If you don't like the quotes, you can say
sum=$(( x + y ))

but then you can't have spaces around the = sign.)
This way of doing arithmetic is a lot more readable than using expr — especially when you're doing multiplications, because expr has to have its arguments separated by whitespace, so the asterisk [*] has to be quoted:
product=`expr $x \* $y`

Yuck. Pretty ugly, compared to
let "product = x * y"
Finally, when you need to increment a counter, you can say
let i++

or
let j+=2

which is cleaner, faster, and more readable than invoking expr.
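As a small, self-contained illustration (the variable names are arbitrary):

i=0
j=0
while [ $i -lt 5 ]
do
    let i++      # count by ones
    let j+=2     # count by twos
done
echo "$i $j"     # prints: 5 10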
Bash can also extract substrings directly, using the form ${string:offset:length}; you can pick out the 2 characters starting at offset 5 with

${string:5:2}

for example.
This can save a lot of work when parsing replies to shell-script questions. If the shell script asks a yes/no question, you only need to check the first letter of the reply. Then
init=${string:0:1}

is what you want to test. (This gives you 1 character, starting at position 0 — in other words, the first character of the string.)
If the “offset” parameter is −1, the substring begins at the last character of the string; so
last=${string: -1:1}

gives you just the last character. (Note the space that's needed to separate the colon from the minus sign; this is required to avoid confusion with the colon-minus sequence used in specifying a default value.)
To get the last 2 characters, you should specify
last2=${string: -2:2}

Note that
penult=${string: -2:1}

gives you the next-to-last character.
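For instance, with the same "Hello, world!" string used earlier:

string="Hello, world!"
echo "${string:0:1}"      # H  (the first character)
echo "${string: -1:1}"    # !  (the last character)
echo "${string: -2:1}"    # d  (the next-to-last character)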
To count the characters in a file, you don't want to say

size=`wc -c somefile`

because this captures the user-friendly repetition of the filename in the output. Instead, you want to re-direct the input to wc:
size=`wc -c < somefile`
But if the operand is already in a shell variable, you certainly don't want to do this:
size=`echo -n "$string" | wc -c`

— particularly if the string is short — because bash can do the job itself:
size=${#string}
It's even possible to make bash fake wc -w , if you don't mind sacrificing the positional parameters:
set -- $string    # the "--" guards against a string that starts with a hyphen
nwords=$#
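For example (with a throw-away string; remember that this overwrites the positional parameters $1, $2, and so on):

string="the quick brown fox"
set -- $string
nwords=$#
echo $nwords    # 4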
Copyright © 2011 – 2012, 2020, 2023, 2024 Andrew T. Young