awk编程参考
模式Patterns
AWK支持以下模式:
- BEGIN {statements}
- END {statements}
- expression {statements}
- /regular expression/ {statements}
- compound pattern {statements}
- pattern1,pattern2 {statements}
其中:
- BEGIN、END和范围模式不能其他模式组合;
- 简单模式可通过&&, ||以及!进行组合;
- 仅BEGIN和END模式必须要有对应动作;
- 支持的比较运算符有:<, <=, ==, !=, >, >=, ~, !~;
字符串匹配模式
- /regex/:当前行是否有子串匹配regex;
- expression ~ /regex/:expression是否有子串匹配regex;
- expression !~ /regex/:expression是否有子串不匹配regex;
动作Actions
AWK支持以下动作:
- 包含常量、变量、赋值以及函数调用的表达式
- print语句
- printf调用
- if (expression) statement
- if (expression) statement else statement
- while (expresssion) statement
- do statement while (expression)
- break, continue, next, exit, exit expression
- {statements}
AWK内置变量
- ARGC: 参数个数
- ARGV: 参数列表数组
- FILENAME: 当前输入文件名
- FNR: 当前输入文件记录数
- FS: 输入字段分隔符,默认为空格
- NF: 当前行字段数
- NR: 当前已读记录行数
- OFMT: 数字输出格式,默认为%.6g
- OFS: 输出字段分隔符,默认为空格
- ORS: 输出记录分隔符,默认为换行符
- RLENGTH: 字符串匹配函数成功匹配长度
- RS: 输入记录分隔符,默认为换行符
- RSTART: 字符串匹配函数成功匹配开始位置
- SUBSEP: subscript分隔符
AWK支持的运算符
- 赋值运算符:=, +=, -=, *=, /=, %=, ^=
- 条件运算符:?:
- 逻辑运算符:||, &&, !
- 正则匹配符:~, !~
- 关系运算符:<, <=, ==, !=, >, >=
- 字符串连接:无显式符号
- 算术运算符:+, -, *, /, %, ^
- 一元运算符:+, -
- 自增减运算:++, –
- 组合运算符:()
内置数学函数
- atan2(y,x)
- cos(x)
- exp(x)
- int(x)
- log(x)
- rand()
- sin(x)
- sqrt(x)
- srand()
内置字符串函数
- gsub(r,s): 将$0中所有r替换成s,返回替换次数。
- gbus(r,s,t): 将t中所有r替换成s,返回替换次数。
- index(s,t): 返回s中t第一次出现的位置,如不存在则为0。
- length(s): 返回s长度。
- match(s,r): 测试s是否包含匹配r的子串,返回0或匹配位置,并设置RSTART和RLENGTH变量。
- split(s,a): 将s按FS分隔符分隔成数组,放到a中,返回数组大小。
- split(s,a,fs): 将s按fs分隔符分隔成数组,放到a中,返回数组大小。
- sprintf(fmt, exprlist): 格式化字符串。
- sub(r,s): 将$0中第一次匹配r的内容替换成s。
- sub(r,s,t): substitute s for the leftmost longest substring of t matched by r.
- substr(s,p): return suffix of s starting at position p.
- substr(s,p,n): return substring of s of length n staring at potision p.
流程控制语句
- {statements}: 语句块
- if (expression) statement
- if (expression) statement1 else statement2
- while (expresssion) statement
- for (expression1; expression2; expression3) statement
- for (variable in array) statement
- do statement while (expression)
- break
- continue
- next
- exit
- exit expression
输出语句
- print expression1, expression2,…
- print expression1, expression2,… >filename
- print expression1, expression2,… >>filename
- print expression1, expression2,… | command
- printf(format, expression1, expression2,…)
- printf(format, expression2, expression2,…) >filename
- printf(format, expression2, expression2,…) >>filename
- printf(format, expression2, expression2,…) | command
- close(filename), close(command)
- system(command)
例子
$3 > 15 { emp = emp + 1 }
END { print emp, "employees worked more than 15 hours" }
{ pay = pay + $2 * $3 }
END { print NR, "employees"
print "total pay is", pay
print "average pay is", pay/NR
}
$2 > maxrate { maxrate = $2; maxemp = $1 }
END { print "highest hourly rate:", maxrate, "for", maxemp }
{ nc = nc + length($0) + 1
nw = nw + NF
}
END { print NR, "lines,", nw, "words," nc, "characters" }
{ for (i = 1; i <= $3; i = i + 1)
printf("\t%.2f\n", $1 * (1 + $2) ^ i)
}
{ line[NR] = $0 }
END { for (i = NR; i > 0; i = i - 1)
print line[i]
}
BEGIN { FS = "\t"
printf("%10s %6s %5s %s\n\n", "COUNTRY", "AREA", "POP", "CONTINENT")
}
{ printf("%10s %6d %5d %s\n", $1, $2, $3, $4)
area = area + $2
pop = pop + $3
}
END { printf("\n%10s %6d %5d\n", "TOTAL", area, pop) }
BEGIN {
sign = "[+|-]?"
decimal = "[0-9]+[.]?[0-9]*"
fraction = "[.][0-9]+"
exponent = "([eE]" sign "[0-9]+)?"
number = "^" sign "(" decimal "|" fraction ")" exponent "$"
}
$0 ~ number
/Asia/ { pop["Asia"] += $3 }
/Europe/ { pop["Europe"] += $3 }
END { print "Asian population is", pop["Asia"], "million."
print "Europe population is", pop["Europe"], "million."
}
BEGIN { FS = "\t" }
{ pop[$4] += $3 }
END { for (name in pop) print name, pop[name] }
BEGIN {
srand()
for (i = 0; i < 150; i++) x[int(101*rand()/10)] += 1
}
END {
for (i = 0; i < 10; i++)
printf(" %2d - %2d: %3d %s\n", 10*i, 10*i+9, x[i], rep(x[i], "*"))
printf(" 100: %3d %s\n", x[10], rep(x[10], "*"))
}
function rep(n, s, t) {
while (n-- > 0) t = t s
return t
}