sed-br · Mailing list about SED and Regular Expressions

Messages 197 - 226 of 5040

#197 From: Eliphas Levy Theodoro <eliphas@...>
Date: Thu, May 25, 2000 1:46 am
Subject: Re: insert a line

Mauricio Teixeira, @ 12:01:

> The number in the "numero: 8979" line ranges from 1 to 90,000, so
> we could search for the numbers and insert a blank line
> immediately after them... (is that how it works?)

you're thinking regularly! but we can be more specific and look for
'a line starting with "numero: " followed by one or more digits'.

$ sed '/^numero: [0-9]\+/{N;s/\(\n\)/\1\1/;}' arquivo

/^numero: [0-9]\+/      find a line starting with 'numero: ' followed by one or more digits
{                       starts a block of commands for the matched line
N                       appends the next line to this one
s/                      substitute...
\(                      opens a group (buffer) for later reference
\n                      newline
\)                      closes the group
/                       with...
\1\1                    newline newline (group 1, twice)
/                       end.
}                       closes the block of commands.
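
To see the N trick at work on a small input, here is a minimal sketch. The sample lines are made up for illustration, and `[0-9][0-9]*` stands in for `[0-9]\+` (a GNU extension) on the assumption that the command may also be run on other seds.

```shell
# Sample input (hypothetical), shaped like the file described in the thread.
input='textoaaa
numero: 8979
textoggg
numero: 89765
textozzz'

# On each "numero:" line, N appends the next line to the pattern space,
# then the captured newline between the two lines is written back twice.
result=$(printf '%s\n' "$input" |
  sed '/^numero: [0-9][0-9]*/{N;s/\(\n\)/\1\1/;}')

printf '%s\n' "$result"
```

One side effect worth knowing: the line pulled in by N is never tested against the address, so if two "numero:" lines come one right after the other, the second one gets no blank line of its own.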

> Is there a way to do this? (where can I find information on regular
> expressions for things like inserting or deleting a line? I don't
> know which characters represent these commands).

'man sed'. it's all in there.

also see http://www.conectiva.com.br/~aurelio/sed

sed you later,
--
eliphas

#198 From: # aurelio marinho jargas <aurelio@...>
Date: Thu, May 25, 2000 3:43 am
Subject: Re: insert a line

hi all,


@ 24/5, Eliphas Levy Theodoro:
> Mauricio Teixeira, @ 12:01:
> > The number in the "numero: 8979" line ranges from 1 to 90,000, so
> > we could search for the numbers and insert a blank line
> > immediately after them... (is that how it works?)
>
> you're thinking regularly! but we can be more specific and look for
> 'a line starting with "numero: " followed by one or more digits'.
>
> $ sed '/^numero: [0-9]\+/{N;s/\(\n\)/\1\1/;}' arquivo
>
> /^numero: [0-9]\+/      find a line starting with 'numero: ' followed by one or more digits
> {                       starts a block of commands for the matched line
> N                       appends the next line to this one
> s/                      substitute...
> \(                      opens a group (buffer) for later reference
> \n                      newline
> \)                      closes the group
> /                       with...
> \1\1                    newline newline (group 1, twice)
> /                       end.
> }                       closes the block of commands.


OR

sed 's/^numero: [0-9]\+$/&\
/' arquivo

just "escape" the line break. it's ugly, but it works.
if only the damned \n worked there...

the & refers to the text matched by the first part of the s command,
that is, the whole line with the number.
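
A minimal sketch of the escaped line break, on a made-up three-line input. As before, `[0-9][0-9]*` replaces `[0-9]\+` as a portability assumption. GNU sed releases newer than the ones in use at the time of this thread accept `\n` in the replacement as an extension, but the backslash followed by a real line break is the form that POSIX describes.

```shell
# Hypothetical input: one "numero:" line between two text lines.
result=$(printf 'textoccc\nnumero: 8979\ntextoggg\n' |
  sed 's/^numero: [0-9][0-9]*$/&\
/')

# & puts the matched line back; the escaped line break adds the blank line.
printf '%s\n' "$result"
```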



> also see http://www.conectiva.com.br/~aurelio/sed

that one's a bit of a mess, a real patchwork quilt, but it has a few
interesting bits...


--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#199 From: # aurelio marinho jargas <aurelio@...>
Date: Thu, May 25, 2000 3:52 am
Subject: Re: insert a line

@ 24/5, Mauricio Teixeira:
> Is there a way to do this? (where can I find information on regular
> expressions for things like inserting or deleting a line? I don't
> know which characters represent these commands).

oops, only now did I see the end of the message.
with REs alone, there is no specific way to say "delete line X".

but with sed+REs there is.

as eliphas already said, "man sed" has it all, but I admit the
information is so terse and technical that it ends up conveying
almost nothing tangible.

the sed command to delete lines is d, for Delete.
the sed command to insert lines is i, to Insert a line
BEFORE the current line, and a, to Append a line AFTER the current
line.


so:

sed d         deletes all lines
sed 1,5d      deletes the first 5 lines
sed /linux/d  deletes all lines containing the word linux

the same goes for the insertion commands.

sed '/<html>/a\
<body>'

which, on finding the start of an html file, opens the BODY
section right below it.
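
The three commands can be tried on a tiny made-up input. This is only a sketch: the sample text and the `<hr>` line are assumptions for illustration.

```shell
# Hypothetical three-line input.
page='<html>
linux is here
hello'

# d: delete every line containing "linux".
deleted=$(printf '%s\n' "$page" | sed '/linux/d')

# a: append a line AFTER each line matching <html>.
appended=$(printf '%s\n' "$page" | sed '/<html>/a\
<body>')

# i: insert a line BEFORE each line matching hello.
inserted=$(printf '%s\n' "$page" | sed '/hello/i\
<hr>')

printf '%s\n---\n%s\n---\n%s\n' "$deleted" "$appended" "$inserted"
```

The script is quoted here because an unquoted `<html>` would be read by the shell as a redirection.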



--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#200 From: Eliphas Levy Theodoro <eliphas@...>
Date: Thu, May 25, 2000 3:55 am
Subject: Re: insert a line

# aurelio marinho jargas, @ 00:43:

> hi all,

hey ':)

> @ 24/5, Eliphas Levy Theodoro:
> > $ sed '/^numero: [0-9]\+/{N;s/\(\n\)/\1\1/;}' arquivo
>
> OR
>
> sed 's/^numero: [0-9]\+$/&\
> /' arquivo
>
> just "escape" the line break. it's ugly, but it works.
> if only the damned \n worked there...

only if you copy it into a buffer, as I did up there. ':)
but then using N becomes mandatory, otherwise there is no \n to reference.

> the & refers to the text matched by the first part of the s command,
> that is, the whole line with the number.

sed -e '/numero: [0-9]\+/a \' -e '' arquivo

find the line, then append ('a \') a line containing nothing.

the two '-e' options are needed so that sed thinks it is reading the
script line by line. each -e is one line.

--
eliphas

#201 From: Junior <extacy@...>
Date: Tue, May 30, 2000 6:39 pm
Subject: Hey folks!

I know I've asked this before and Aurélio answered me, but
thanks to a crash on my machine I lost everything :(
'Some things only microsoft does for you', and this is one of them, hehehe
but let's get to the point!
I needed an RE for dates. can anyone give me a hand? :)

[]'s


--
/*
If it happens once, it's a bug.
If it happens twice, it's a feature.
If it happens more than twice, it's windows.
*/

#202 From: # aurelio marinho jargas <aurelio@...>
Date: Tue, May 30, 2000 7:25 pm
Subject: Re: Hey folks!

@ 30/5, Junior:

> I know I've asked this before and Aurélio answered me, but
> thanks to a crash on my machine I lost everything :(
> 'Some things only microsoft does for you', and this is one of them, hehehe
> but let's get to the point!
> I needed an RE for dates. can anyone give me a hand? :)

the list keeps an archive, so if the answer I gave you back then
still works for you, look for it there:

http://www.egroups.com/messages/sed-br


--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#203 From: # aurelio marinho jargas <aurelio@...>
Date: Fri, Jun 2, 2000 2:39 am
Subject: using the line break as a delimiter

I just saw this in a sed script and I swear it had never crossed
my mind...

the substitution command s accepts any character as its
delimiter, so:

s/minha palavra palha/Outra PALAVRA mala/
s,minha palavra palha,Outra PALAVRA mala,
s|minha palavra palha|Outra PALAVRA mala|
s#minha palavra palha#Outra PALAVRA mala#

are all equivalent, as is

s
minha palavra palha
Outra PALAVRA mala


using the line break as the delimiter. much more visual, isn't it?
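
A short sketch of why alternative delimiters are handy, using a made-up path rewrite. It sticks to `,`, `|` and `#`: the POSIX description of the s command excludes backslash and newline as delimiters, so the line-break form may not be accepted by every sed.

```shell
# With / as the delimiter, every slash in the pattern and in the
# replacement has to be escaped.
with_slash=$(printf '/usr/local/bin\n' | sed 's/\/usr\/local/\/opt/')

# The same substitution with other delimiters needs no escaping.
with_comma=$(printf '/usr/local/bin\n' | sed 's,/usr/local,/opt,')
with_pipe=$(printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|')
with_hash=$(printf '/usr/local/bin\n' | sed 's#/usr/local#/opt#')

printf '%s\n' "$with_slash" "$with_comma" "$with_pipe" "$with_hash"
```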


--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#204 From: "Alexandre Biancalana" <ale@...>
Date: Mon, Jun 5, 2000 8:38 pm
Subject: Re: insert a line

Hi Mauricio....

I think you could use something like:

  sed s/([1-90000])/$1\n/g

[ ]''s
Alexandre


----- Original Message -----
From: Mauricio Teixeira <msteixeira@...>
To: <sed-br@egroups.com>
Sent: Wednesday, May 24, 2000 9:01 AM
Subject: [sed-br] insert a line


> Hi everyone,
>
> If anyone can help, I'd appreciate it. I have a file like
> this:
>
> textoaaaa
> textobbb
> textoccc
> numero: 8979
> textoggg
> textouiuu
> textotttt
> numero: 89765
>
> and so on, and I would like to add a blank line, so that it
> ends up like this:
>
> textoaaaa
> textobbb
> textoccc
> numero: 8979
>
> textoggg
> textouiuu
> textotttt
> numero: 89765
>
> The number in the "numero: 8979" line ranges from 1 to 90,000, so
> we could search for the numbers and insert a blank line
> immediately after them... (is that how it works?)
> Is there a way to do this? (where can I find information on regular
> expressions for things like inserting or deleting a line? I don't
> know which characters represent these commands).
>
> Regards,
> Mauricio

#205 From: # aurelio marinho jargas <aurelio@...>
Date: Wed, Jun 7, 2000 10:46 pm
Subject: Re: insert a line

@ 5/6, Alexandre Biancalana:
> Hi Mauricio....
> I think you could use something like:
>
>  sed s/([1-90000])/$1\n/g

unfortunately not, alexandre &:(
there are a few details that I think you mixed up with another
tool.

sed does not interpret \n in the second part of the s command (the
text that will replace the old one)

(by the way, this is a long-standing request from sed users...)


and the range [1-90000] does not mean what you intend, because a []
class matches only a single character (in this case, 1 through 9 or 0)

also, the () need to be escaped in sed in order to group

and to reference the grouped content you use \1, not $1 (as
in perl)


other than that, it works &:)
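
The points about character classes and groups can be checked with a few one-liners. A minimal sketch, with made-up inputs:

```shell
# A bracket class matches ONE character: [1-90000] is the set 1-9 plus 0,
# so every digit is replaced individually.
single=$(printf 'numero: 8979\n' | sed 's/[1-90000]/X/g')

# Escaped parentheses group; \1 (not $1) refers to the group.
grouped=$(printf 'numero: 8979\n' | sed 's/\([0-9][0-9]*\)/<\1>/')

# Unescaped parentheses are literal characters in sed's basic REs.
literal=$(printf 'f(x)\n' | sed 's/(x)/[x]/')

printf '%s\n' "$single" "$grouped" "$literal"
```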



--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#206 From: "Victor Apocalypse Rodrigues" <zedek@...>
Date: Wed, Jun 7, 2000 11:09 pm
Subject: Re: insert a line

So it would be something like:

sed s/.{1,90000}/\1^M/g
or  s/.{1,90000}/\\1^M/g (I don't know how many times the \ has to be escaped)

     ?? =)
     The ^M is typed with CTRL+V followed by ENTER.

     Cheers!

Victor Apocalypse Rodrigues
Portal - Matrix Internet S.A.



#207 From: # aurelio marinho jargas <aurelio@...>
Date: Wed, Jun 7, 2000 11:34 pm
Subject: Re: insert a line

@ 7/6, Victor Apocalypse Rodrigues:
>     So it would be something like:
>
> sed s/.{1,90000}/\1^M/g
> or  s/.{1,90000}/\\1^M/g (I don't know how many times the \ has to be escaped)
>
>     ?? =)
>     The ^M is typed with CTRL+V followed by ENTER.


unfortunately ^M doesn't work in sed either... &:(
(it works in vi)

you have to escape the line break, like this:
sed 's/isso/aquilo\
/'


as for escaping the \1, it depends on the command interpreter
(shell) or on the programming language... in sed, both in its own
scripts and when run at the prompt, a plain \1 works

in other languages, where the RE is pre-processed and treated as
a string, it has to be \\1 (as in php).

but when in doubt, try one, and if that fails add another...


the construct .{1,90000} is well-formed, but it is not quite what
he wanted, because it denotes repetition. what he wanted was a number
between 0 and 90000. compare:

.{1,90000}   any character (.), 1 to 90000 times!
[0-9]{1,5}   a digit ([0-9]), 1 to 5 times, that is,
                a number of up to 5 digits


always remembering that sed is the king of escapes: [0-9]\{1,5\}
&:)

and finally, \1 refers to the content matched by the first opening
parenthesis. hey! but you didn't use any parentheses, so in this case
\1 has nothing to refer to.

it would look more or less like this:

sed 's/\([0-9]\{1,5\}\)/\1\
/'

yes, the () must be escaped too... (didn't I say it was the king? &:) )
to understand the RE better, just strip the escapes:

sed 's/([0-9]{1,5})/\1\
/'

that is, replace a number of up to 5 digits with itself plus a line
break.


this already works, but it breaks the line at every number it finds in
the text, and the original request was to break only at the line:

numero: 64764

so we just add the 'numero: ' part at the start, to substitute only
at the right place:

sed 's/(numero: [0-9]{1,5})/\1\
/'

phew! I think that's it &:)

but remember that to actually run it you have to escape
(damned escapes) the () and the {}

if you didn't understand, or if I got something wrong, just say so, my friend.


note: there must be half a dozen different ways to solve this
problem. this one is nice because it shows how to use registers
(parentheses)
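
Putting the pieces of this message together, the fully escaped command can be sketched as follows. The input is made up for illustration and includes a stray number on the first line to show that only the "numero:" line is affected.

```shell
# Hypothetical input: a stray number on the first line, then the target line.
input='item 42 of the list
numero: 64764
textoggg'

# Escaped form of s/(numero: [0-9]{1,5})/\1<line break>/ from the message:
# group the match, write it back with \1, then add an escaped line break.
result=$(printf '%s\n' "$input" | sed 's/\(numero: [0-9]\{1,5\}\)/\1\
/')

printf '%s\n' "$result"
```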




--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#208 From: "Victor Apocalypse Rodrigues" <zedek@...>
Date: Thu, Jun 8, 2000 12:02 am
Subject: Re: insert a line

----- Original Message -----
From: "# aurelio marinho jargas" <aurelio@...>
To: <sed-br@egroups.com>
Sent: Wednesday, June 07, 2000 8:34 PM
Subject: Re: [sed-br] insert a line


> unfortunately ^M doesn't work in sed either... &:(
> (it works in vi)
>
> you have to escape the line break, like this:
> sed 's/isso/aquilo\
> /'
>

     Well, at least something works, but it isn't intuitive at all... =)

>
> as for escaping the \1, it depends on the command interpreter
> (shell) or on the programming language... in sed, both in its own
> scripts and when run at the prompt, a plain \1 works
>
> in other languages, where the RE is pre-processed and treated as
> a string, it has to be \\1 (as in php).
>
> but when in doubt, try one, and if that fails add another...
>
>
> the construct .{1,90000} is well-formed, but it is not quite what
> he wanted, because it denotes repetition. what he wanted was a number
> between 0 and 90000. compare:
>
> .{1,90000}   any character (.), 1 to 90000 times!
> [0-9]{1,5}   a digit ([0-9]), 1 to 5 times, that is,
>                a number of up to 5 digits
>

     Well, that RE accepts numbers up to 99999, and if he only wants up to
90000 it could be like this:

     ([0-8][0-9]{1,4})|(90000)

     That way it accepts numbers from 0 to 89999, or 90000 =)

>
> sed 's/(numero: [0-9]{1,5})/\1\
> /'
>

     Now it would be:

sed 's/(numero: ([0-8][0-9]{1,4})|(90000))/\1\
/'

> phew! I think that's it &:)
>
> but remember that to actually run it you have to escape
> (damned escapes) the () and the {}
>

     Like this:

sed 's/\(numero: \([0-8][0-9]\{1,4\}\)|\(90000\)\)/\1\
/'

     ?? =)


> if you didn't understand, or if I got something wrong, just say so, my friend.
>
>

     Got it so far. Thanks!

Victor Apocalypse Rodrigues
Portal - Matrix Internet S.A.

#209 From: # aurelio marinho jargas <aurelio@...>
Date: Thu, Jun 8, 2000 1:12 am
Subject: Re: insert a line

@ 7/6, Victor Apocalypse Rodrigues:
> > unfortunately ^M doesn't work in sed either... &:(
> > (it works in vi)
> > you have to escape the line break, like this:
> > sed 's/isso/aquilo\
> > /'
>
>     Well, at least something works, but it isn't intuitive at all... =)

yeah, it took me quite a while to figure that out too...


> > the construct .{1,90000} is well-formed, but it is not quite what
> > he wanted, because it denotes repetition. what he wanted was a number
> > between 0 and 90000. compare:
> >
> > .{1,90000}   any character (.), 1 to 90000 times!
> > [0-9]{1,5}   a digit ([0-9]), 1 to 5 times, that is,
> >                a number of up to 5 digits
>
>     Well, that RE accepts numbers up to 99999, and if he only wants up to
> 90000 it could be like this:
>
>     ([0-8][0-9]{1,4})|(90000)
>     That way it accepts numbers from 0 to 89999, or 90000 =)

right! just one detail: the inner parentheses are not
needed:

      ([0-8][0-9]{1,4}|90000)
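
A hedged aside on the range pattern: `[0-8][0-9]{1,4}` needs at least two digits and a first digit from 0 to 8, so values such as 5 or 9500 fall outside it even though they are below 90000. The sketch below is not from the thread. It uses `grep -Ex` (extended RE, whole-line match) and a hypothetical helper named `in_range` to try one possible pattern for 1 to 90000.

```shell
# Hypothetical helper: succeeds when $1 is an integer from 1 to 90000.
#   [1-9][0-9]{0,3}  ->      1 ..  9999
#   [1-8][0-9]{4}    ->  10000 .. 89999
#   90000            ->  90000
in_range() {
  printf '%s\n' "$1" | grep -Eqx '[1-9][0-9]{0,3}|[1-8][0-9]{4}|90000'
}

for n in 5 9500 89999 90000 90001 99999; do
  if in_range "$n"; then
    echo "$n: in range"
  else
    echo "$n: out of range"
  fi
done
```

For the original task this level of precision is optional, since every number in the file is already known to be within the range.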


> > but remember that to actually run it you have to escape
> > (damned escapes) the () and the {}
>
>     Like this:
> sed 's/\(numero: \([0-8][0-9]\{1,4\}\)|\(90000\)\)/\1\
> /'

right! again, the inner parentheses can go. both versions
work; the only difference is that keeping the parentheses
opens a third register that isn't needed.

oh! did I mention that the vertical bar | must be escaped too? &:)

sed 's/\(numero: \([0-8][0-9]\{1,4\}\|90000\)\)/\1\
/'


>     Got it so far. Thanks!

cool!

so, to improve it even further: since we are putting the _whole_
first part of the s command into a register:

's/\(expression\)/\1\
/'

there is the & marker which, when placed in the second part of the s
command, stands for everything matched by the first part (which is
exactly what those parentheses were for), so:

's/\(expression\)/\1\
/'

is the same as:

's/expression/&\
/'

without the parentheses, which makes the expression shorter and
allocates less memory (not that it matters much for small
texts)

which gives:

sed 's/numero: \([0-8][0-9]\{1,4\}\|90000\)/&\
/'
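
The equivalence between `\1` around the whole pattern and `&` can be checked side by side. This is a small sketch on made-up input; it uses the simpler `[0-9]\{1,5\}` pattern from earlier in the thread rather than the `\|` alternation, which is a GNU extension in basic REs.

```shell
# Hypothetical two-line input.
input='numero: 8979
textoggg'

# Whole pattern captured in a group and written back with \1 ...
with_group=$(printf '%s\n' "$input" | sed 's/\(numero: [0-9]\{1,5\}\)/\1\
/')

# ... versus no group at all, with & standing for the whole match.
with_amp=$(printf '%s\n' "$input" | sed 's/numero: [0-9]\{1,5\}/&\
/')

[ "$with_group" = "$with_amp" ] && echo "same output"
```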



--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#210 From: Eliphas Levy Theodoro <eliphas@...>
Date: Thu, Jun 8, 2000 1:23 am
Subject: Re: insert a line

# aurelio marinho jargas, @ 22:12:

> without the parentheses, which makes the expression shorter and
> allocates less memory (not that it matters much for small
> texts)
>
> which gives:
>
> sed 's/numero: \([0-8][0-9]\{1,4\}\|90000\)/&\
> /'

I still prefer the hac^Woption of searching for the line to change. that
already spares sed from searching every line of the text for the expression. and the
kludge of joining the next line and copying the \n in the middle of it twice
is nice too, since it can be done in a single line (and it came out shorter) ':)

sed '/^numero: [0-9]\+/{N;s/\(\n\)/\1\1/;}'

--
eliphas

I don't suffer from insanity, I enjoy every minute of it.

#211 From: # aurelio marinho jargas <aurelio@...>
Date: Thu, Jun 8, 2000 3:07 am
Subject: Re: insert a line

@ 7/6, Eliphas Levy Theodoro:
> # aurelio marinho jargas, @ 22:12:
> > without the parentheses, which makes the expression shorter and
> > allocates less memory (not that it matters much for small
> > texts)
> >
> > which gives:
> >
> > sed 's/numero: \([0-8][0-9]\{1,4\}\|90000\)/&\
> > /'
>
> I still prefer the hac^Woption of searching for the line to change. that

it's no hack at all, it really is better that way &:)


> already spares sed from searching every line of the text for the expression. and the
> kludge of joining the next line and copying the \n in the middle of it twice
> is nice too, since it can be done in a single line (and it came out shorter) ':)
>
> sed '/^numero: [0-9]\+/{N;s/\(\n\)/\1\1/;}'

but using N is overkill...
doing it in two lines as before is faster, or else
use 'a' as you had already done earlier:

sed -e '/^numero: [0-9]\+/a \' -e ''

--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#212 From: Eliphas Levy Theodoro <eliphas@...>
Date: Thu, Jun 8, 2000 3:23 am
Subject: Re: insert a line

# aurelio marinho jargas, @ 00:07:

> @ 7/6, Eliphas Levy Theodoro:
> > # aurelio marinho jargas, @ 22:12:
> > > without the parentheses, which makes the expression shorter and
> > > allocates less memory (not that it matters much for small
> > > texts)
> > >
> > > which gives:
> > >
> > > sed 's/numero: \([0-8][0-9]\{1,4\}\|90000\)/&\
> > > /'
> >
> > I still prefer the hac^Woption of searching for the line to change. that
>
> it's no hack at all, it really is better that way &:)

the hack is further down ':)

> > already spares sed from searching every line of the text for the expression. and the
> > kludge of joining the next line and copying the \n in the middle of it twice
> > is nice too, since it can be done in a single line (and it came out shorter) ':)
> >
> > sed '/^numero: [0-9]\+/{N;s/\(\n\)/\1\1/;}'
>
> but using N is overkill...

didn't I say it was a kludge? ':P

> doing it in two lines as before is faster, or else
> use 'a' as you had already done earlier:
>
> sed -e '/^numero: [0-9]\+/a \' -e ''

wow, what a perfect RE! who wrote it? '8-)

--
eliphas

I don't suffer from insanity, I enjoy every minute of it.

#213 From: # aurelio marinho jargas <aurelio@...>
Date: Fri, Jun 16, 2000 10:43 pm
Subject: sed filter to remove the egroups ads

the pattern of the egroups ad, which comes at the end of every
message, is:

------------- (72 hyphens)
some text
some text
...
http://click.egroups.com
some text
...
------------- (72 hyphens)


I wrote a sed filter to rip this ad out of the
message:

sed '/^\(> \)*-\{72\}$/{N;:l;/-\{72\}$/bs;N;bl;:s;s%^.*\n\(> \)*http://click\.egroups\.com.*%%;}'


this filter also catches text quoted with '> ':

> > > ------------- (72 hyphens)
> > > some text
> > > some text
> > > ...
> > > http://click.egroups.com
> > > some text
> > > ...
> > > ------------- (72 hyphens)


the part that does this is '\(> \)*'



for those who use procmail, just put this in .procmailrc:

:0 fhw
* Delivered-To:.*@egroups.com
| sed '/^\(> \)*-\{72\}$/{N;:l;/-\{72\}$/bs;N;bl;:s;s%^.*\n\(> \)*http://click\.egroups\.com.*%%;}'


the filter's tactic is to use constructs similar to the goto of
certain programming languages.

I created the labels
:l and :s (loop and the s command)


/^\(> \)*-\{72\}$/       looks for the first line of the ad block,
{
N                        appends the next line
:l                       label l
/-\{72\}$/bs             if this is the last line of the ad block, jump to :s
N                        if the previous command did not jump, append again
bl                       jump to l (this restarts the loop)
:s                       label s

s%^.*\n\(> \)*http://click\.egroups\.com.*%%
                          if the string http://click.egroups.com is present,
                          delete the whole ad block
}
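
The loop can be tried on a made-up message. In this sketch the script is written one command per line, because POSIX ends a label at the line break and not every sed accepts the `:l;` spelling of the one-liner. The function name `strip_ad` and the sample text are assumptions for illustration.

```shell
# Hypothetical wrapper around the filter, one sed command per line.
strip_ad() {
  sed '/^\(> \)*-\{72\}$/{
N
:l
/-\{72\}$/bs
N
bl
:s
s%^.*\n\(> \)*http://click\.egroups\.com.*%%
}'
}

# Build a 72-hyphen separator line.
dashes=$(printf '%72s' '' | tr ' ' '-')

# A block that contains the egroups URL is collapsed to one empty line.
cleaned=$(printf '%s\n' 'hello' "$dashes" 'some ad text' \
  'http://click.egroups.com/1/4054/0/' "$dashes" 'bye' | strip_ad)

# A block between the same separators but without the URL is kept as is.
kept=$(printf '%s\n' "$dashes" 'not an ad' "$dashes" | strip_ad)

printf '%s\n' "$cleaned"
```

The substitution relies on `.` matching the line breaks held in the pattern space, which is how GNU sed is documented to behave.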


--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#214 From: "Ademar de Souza Reis Jr." <adesr@...>
Date: Mon, Jun 19, 2000 3:45 am
Subject: Re: sed filter to remove the egroups ads

On 16/06/00 at 19:43, # aurelio marinho jargas wrote:

>
> for those who use procmail, just put this in .procmailrc:
>
> :0 fhw
> * Delivered-To:.*@egroups.com
> | sed '/^\(> \)*-\{72\}$/{N;:l;/-\{72\}$/bs;N;bl;:s;s%^.*\n\(> \)*http://click\.egroups\.com.*%%;}'

That "h" cannot be in the recipe's flags.

h == header, and in this case we are filtering the body of the message.

[]'s
    - Ademar

--
================================================
Ademar de Souza Reis Jr. - ademar@...
http://www.inf.ufpr.br/~asr98
Computer Science student / PET scholarship holder - UFPR
Registered Linux User #71790
Curitiba - PR - Brasil

-- Win2k: "It's not so much that it's only 65,000 bugs, it's just that they
stopped at 65,535 to prevent an overflow."

#215 From: # aurelio marinho jargas <aurelio@...>
Date: Wed, Jun 21, 2000 10:06 pm
Subject: Re: sed filter to remove the egroups ads

@ 19/6, Ademar de Souza Reis Jr.:
> On 16/06/00 at 19:43, # aurelio marinho jargas wrote:
> > for those who use procmail, just put this in .procmailrc:
> >
> > :0 fhw
> > * Delivered-To:.*@egroups.com
> > | sed '/^\(> \)*-\{72\}$/{N;:l;/-\{72\}$/bs;N;bl;:s;s%^.*\n\(> \)*http://click\.egroups\.com.*%%;}'
>
> That "h" cannot be in the recipe's flags.
>
> h == header, and in this case we are filtering the body of the message.


true, s/h/b/  &:)

--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq

#216 From: "alexandro@..." <alexandro@...>
Date: Mon, Jun 26, 2000 4:59 pm
Subject: something more

some information about sed...
------------------------------------------------------------------------
HANDY ONE-LINERS FOR SED (Unix stream editor)                May 26, 1999
compiled by Eric Pement <epement@...>             version 4.8
Latest version of this file is usually at:
    http://www.cornerstonemag.com/sed/sed1line.txt
    http://seders.icheme.org/tutorials/sedtut_9.txt

FILE SPACING:

  # double space a file
  sed G

  # triple space a file
  sed 'G;G'

  # undo double-spacing (assumes even-numbered lines are always blank)
  sed 'n;d'
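
A quick round trip for the spacing one-liners, as a sketch on a made-up three-line input: `G` appends the (empty) hold space to every line, and `n;d` then deletes every second line.

```shell
# Double-space three lines, then count the resulting lines.
doubled_count=$(printf 'a\nb\nc\n' | sed G | sed -n '$=')

# n;d drops every second line, undoing the double spacing.
restored=$(printf 'a\nb\nc\n' | sed G | sed 'n;d')

echo "lines after doubling: $doubled_count"
printf '%s\n' "$restored"
```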

NUMBERING:

  # number each line of a file (simple left alignment). Using a tab (see
  # note on '\t' at end of file) instead of space will preserve margins.
  sed = filename | sed 'N;s/\n/\t/'

  # number each line of a file (number on left, right-aligned)
  sed = filename | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'

  # number each line of file, but only print numbers if line is not blank
  sed '/./=' filename | sed '/./N; s/\n/ /'

  # count lines (emulates "wc -l")
  sed -n '$='
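
The numbering one-liners can be sketched the same way on made-up input. A space is used as the separator here instead of `\t`, since the handling of `\t` differs between sed versions.

```shell
# "sed =" prints each line number on its own line before the line itself;
# the second sed joins each number/line pair with a space.
numbered=$(printf 'alpha\nbeta\n' | sed = | sed 'N;s/\n/ /')

# Count lines, like "wc -l".
count=$(printf 'alpha\nbeta\ngamma\n' | sed -n '$=')

printf '%s\n' "$numbered"
echo "count: $count"
```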

TEXT CONVERSION AND SUBSTITUTION:

  # IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
  sed 's/.$//'               # assumes that all lines end with CR/LF
  sed 's/^M$//'              # in bash/tcsh, press Ctrl-V then Ctrl-M
  sed 's/\x0D$//'            # sed v1.5 only

  # IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
  sed "s/$/`echo -e \\\r`/"            # command line under ksh
  sed 's/$'"/`echo \\\r`/"             # command line under bash
  sed "s/$/`echo \\\r`/"               # command line under zsh

  # IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format
  sed "s/$//"                          # method 1
  sed -n p                             # method 2

  # IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
  # Cannot be done with DOS versions of sed. Use "tr" instead.
  tr -d \r <infile >outfile            # GNU tr version 1.22 or higher

  # delete leading whitespace (spaces, tabs) from front of each line
  # aligns all text flush left
  sed 's/^[ \t]*//'                    # see note on '\t' at end of file

  # delete trailing whitespace (spaces, tabs) from end of each line
  sed 's/[ \t]*$//'                    # see note on '\t' at end of file

  # delete BOTH leading and trailing whitespace from each line
  sed 's/^[ \t]*//;s/[ \t]*$//'

  # insert 5 blank spaces at beginning of each line (make page offset)
  sed 's/^/     /'

  # align all text flush right on a 79-column width
  sed -e :a -e 's/^.\{1,78\}$/ &/;ta'  # set at 78 plus 1 space

  # center all text in the middle of 79-column width. In method 1,
  # spaces at the beginning of the line are significant, and trailing
  # spaces are appended at the end of the line. In method 2, spaces at
  # the beginning of the line are discarded in centering the line, and
  # no trailing spaces appear at the end of lines.
  sed  -e :a -e 's/^.\{1,77\}$/ & /;ta'                     # method 1
  sed  -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/'  # method 2

  # substitute (find & replace) "foo" with "bar" on each line
  sed 's/foo/bar/'             # replaces only 1st instance in a line
  sed 's/foo/bar/4'            # replaces only 4th instance in a line
  sed 's/foo/bar/g'            # replaces ALL instances in a line

  # substitute "foo" with "bar" ONLY for lines which contain "baz"
  sed '/baz/s/foo/bar/g'

  # substitute "foo" with "bar" EXCEPT for lines which contain "baz"
  sed '/baz/!s/foo/bar/g'

  # change "scarlet" or "ruby" or "puce" to "red"
  sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'   # most seds
  gsed 's/scarlet\|ruby\|puce/red/g'                # GNU sed only

  # reverse order of lines (emulates "tac")
  # bug/feature in HHsed v1.5 causes blank lines to be deleted
  sed '1!G;h;$!d'               # method 1
  sed -n '1!G;h;$p'             # method 2

  # reverse each character on the line (emulates "rev")
  sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'

  # join pairs of lines side-by-side (like "paste")
  sed 'N;s/\n/ /'

  # if a line ends with a backslash, append the next line to it
  sed -e :a -e '/\\$/N; s/\\\n//; ta'

  # if a line begins with an equal sign, append it to the previous line
  # and replace the "=" with a single space
  sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'

  # add commas to numeric strings, changing "1234567" to "1,234,567"
  gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta'                     # GNU sed
  sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'  # other seds

  # add commas to numbers with decimal points and minus signs (GNU sed)
  gsed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'
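
Several scripts in this section rely on the same loop idiom: a label
(:a), a substitution, and "ta", which branches back to the label only
if the substitution changed something. The sketch below runs two of
them on made-up input to show the loop terminating by itself.

```shell
# Join lines that end in a backslash. /\\$/N pulls in the next line,
# s/\\\n// removes the backslash-newline pair, and "ta" jumps back to
# label "a" only if that substitution succeeded.
joined=$(printf 'foo \\\nbar\nbaz\n' | sed -e :a -e '/\\$/N; s/\\\n//; ta')

# Commify (portable version): each pass inserts one comma before the
# rightmost group of three digits that is still preceded by a digit;
# the loop ends when the pattern no longer matches.
commified=$(echo 1234567 | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')

printf '%s\n' "$joined"
echo "$commified"
```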

SELECTIVE PRINTING OF CERTAIN LINES:

  # print first 10 lines of file (emulates behavior of "head")
  sed 10q

  # print first line of file (emulates "head -1")
  sed q

  # print the last 10 lines of a file (emulates "tail")
  sed -e :a -e '$q;N;11,$D;ba'

  # print the last 2 lines of a file (emulates "tail -2")
  sed '$!N;$!D'

  # print the last line of a file (emulates "tail -1")
  sed '$!d'                    # method 1
  sed -n '$p'                  # method 2

  # print only lines which match regular expression (emulates "grep")
  sed -n '/regexp/p'           # method 1
  sed '/regexp/!d'             # method 2

  # print only lines which do NOT match regexp (emulates "grep -v")
  sed -n '/regexp/!p'          # method 1, corresponds to above
  sed '/regexp/d'              # method 2, simpler syntax

  # print 1 line of context before and after regexp, with line number
  # indicating where the regexp occurred (similar to "grep -A1 -B1")
  sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h

  # grep for AAA and BBB and CCC (in any order)
  sed '/AAA/!d; /BBB/!d; /CCC/!d'

  # grep for AAA and BBB and CCC (in that order)
  sed '/AAA.*BBB.*CCC/!d'

  # grep for AAA or BBB or CCC (emulates "egrep")
  sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d    # most seds
  gsed '/AAA\|BBB\|CCC/!d'                        # GNU sed only

  # print paragraph if it contains AAA (blank lines separate paragraphs)
  # HHsed v1.5 must insert a 'G;' after 'x;' in the next 3 scripts below
  sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'

  # print paragraph if it contains AAA and BBB and CCC (in any order)
  sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'

  # print paragraph if it contains AAA or BBB or CCC
  sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
  gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d'         # GNU sed only

  # print only lines of 65 characters or longer
  sed -n '/^.\{65\}/p'

  # print only lines of less than 65 characters
  sed -n '/^.\{65\}/!p'        # method 1, corresponds to above
  sed '/^.\{65\}/d'            # method 2, simpler syntax

  # print section of file from regular expression to end of file
  sed -n '/regexp/,$p'

  # print section of file based on line numbers (lines 8-12, inclusive)
  sed -n '8,12p'               # method 1
  sed '8,12!d'                 # method 2

  # print line number 52
  sed -n '52p'                 # method 1
  sed '52!d'                   # method 2
  sed '52q;d'                  # method 3, efficient on large files

  # beginning at line 3, print every 7th line
  gsed -n '3~7p'               # GNU sed only

  # print section of file between two regular expressions (inclusive)
  sed -n '/Iowa/,/Montana/p'             # case sensitive
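
The paragraph scripts above are the least obvious in this section, so
here is a sketch of the first one on a made-up two-paragraph input.
Only the paragraph containing AAA should survive.

```shell
# Hypothetical two-paragraph input, separated by a blank line.
text=$(printf 'p1 line\nAAA here\n\np2 line\nno match')

# /./{H;$!d;} appends every non-blank line to the hold space and
# deletes it, unless it is the last line of input. On a blank line
# (or on the last line) x swaps the collected paragraph into the
# pattern space, and /AAA/!d discards it unless it contains AAA.
hits=$(printf '%s\n' "$text" | sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;')

printf '%s\n' "$hits"
```

Because H always adds a newline before the appended line, the printed
paragraph starts with an empty line.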

SELECTIVE DELETION OF CERTAIN LINES:

  # print all of file EXCEPT section between 2 regular expressions
  sed '/Iowa/,/Montana/d'

  # delete duplicate, consecutive lines from a file (emulates "uniq").
  # First line in a set of duplicate lines is kept, rest are deleted.
  sed '$!N; /^\(.*\)\n\1$/!P; D'

  # delete duplicate, nonconsecutive lines from a file. Beware not to
  # overflow the buffer size of the hold space, or else use GNU sed.
  sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'

  # delete the first 10 lines of a file
  sed '1,10d'

  # delete the last 10 lines of a file
  sed -e :a -e 'N;2,10ba' -e 'P;D'      # method 1
  sed -n -e :a -e '1,10!{P;N;D;};N;ba'  # method 2

  # delete every 8th line
  gsed '0~8d'                           # GNU sed only

  # delete ALL blank lines from a file (same as "grep '.' ")
  sed '/^$/d'                           # method 1
  sed '/./!d'                           # method 2

  # delete all CONSECUTIVE blank lines from file except the first; also
  # deletes all blank lines from top and end of file (emulates "cat -s")
  sed '/./,/^$/!d'          # method 1, allows 0 blanks at top, 1 at EOF
  sed '/^$/N;/\n$/D'        # method 2, allows 1 blank at top, 0 at EOF

  # delete all CONSECUTIVE blank lines from file except the first 2:
  sed '/^$/N;/\n$/N;//D'

  # delete all leading blank lines at top of file
  sed '/./,$!d'

  # delete all trailing blank lines at end of file
  sed -e :a -e '/^\n*$/N;/\n$/ba'

  # delete the last line of each paragraph
  sed -n '/^$/{p;h;};/./{x;/./p;}'
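
Two of the deletion scripts above, tried on made-up input. This is a
sketch to show the expected shape of the output, not a test suite.

```shell
# "uniq" emulation: $!N joins the current line with the next one,
# P prints the first line of the pair unless both halves are equal,
# and D drops the first line and restarts with the remainder.
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')

# "cat -s" emulation, method 1: keep only lines inside a range that
# starts at a non-blank line and ends at the first blank line after it.
squeezed=$(printf '\n\na\n\n\nb\n' | sed '/./,/^$/!d')

printf '%s\n' "$deduped"
printf '%s\n' "$squeezed"
```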

SPECIAL APPLICATIONS:

  # remove nroff overstrikes (char, backspace) from man pages. The 'echo'
  # command may need an -e switch if you use Unix System V or bash shell.
  sed "s/.`echo \\\b`//g"    # double quotes required for Unix environment
  sed 's/.^H//g'             # in bash/tcsh, press Ctrl-V and then Ctrl-H
  sed 's/.\x08//g'           # hex expression for sed v1.5

  # get Usenet/e-mail message header
  sed '/^$/q'                # deletes everything after first blank line

  # get Usenet/e-mail message body
  sed '1,/^$/d'              # deletes everything up to first blank line

  # get Subject header, but remove initial "Subject: " portion
  sed '/^Subject: */!d; s///;q'

  # get return address header
  sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

  # parse out the address proper. Pulls out the e-mail address by itself
  # from the 1-line return address header (see preceding script)
  sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'

  # add a leading angle bracket and space to each line (quote a message)
  sed 's/^/> /'

  # delete leading angle bracket & space from each line (unquote a message)
  sed 's/^> //'

  # remove most HTML tags (accommodates multiple-line tags)
  sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

  # extract multi-part uuencoded binaries, removing extraneous header
  # info, so that only the uuencoded portion remains. Files passed to
  # sed must be passed in the proper order. Version 1 can be entered
  # from the command line; version 2 can be made into an executable
  # Unix shell script. (Modified from a script by Rahul Dhesi.)
  sed '/^end/,/^begin/d' file1 file2 ... fileX | uudecode   # vers. 1
  sed '/^end/,/^begin/d' "$@" | uudecode                    # vers. 2

  # zip up each .TXT file individually, deleting the source file and
  # setting the name of each .ZIP file to the basename of the .TXT file
  # (under DOS: the "dir /b" switch returns bare filenames in all caps).
  echo @echo off >zipup.bat
  dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat
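
The mail-handling scripts above can be tried without a real mailbox.
The message below is invented for the purpose (the name, address and
text are assumptions), and the sketch simply chains the scripts as the
comments describe.

```shell
# Hypothetical message; header and body are separated by a blank line.
msg=$(printf 'From: Alice <alice@example.org>\nSubject: hello world\n\nbody text')

# Header: print up to and including the first blank line, then quit.
header=$(printf '%s\n' "$msg" | sed '/^$/q')

# Body: delete everything up to the first blank line.
body=$(printf '%s\n' "$msg" | sed '1,/^$/d')

# Subject text: the empty regex in s/// reuses /^Subject: */.
subject=$(printf '%s\n' "$msg" | sed '/^Subject: */!d; s///;q')

# Return address header, then the bare address from that one line.
addr=$(printf '%s\n' "$msg" \
  | sed '/^Reply-To:/q; /^From:/h; /./d;g;q' \
  | sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//')

printf '%s\n' "$header" "$body" "$subject" "$addr"
```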

TYPICAL USE: Sed takes one or more editing commands and applies all of
them, in sequence, to each line of input. After all the commands have
been applied to the first input line, that line is output and a second
input line is taken for processing, and the cycle repeats. The
preceding examples assume that input comes from the standard input
device (i.e., the console; normally this will be piped input). One or
more filenames can be appended to the command line if the input does
not come from stdin. Output is sent to stdout (the screen). Thus:

  cat filename | sed '10q'        # uses piped input
  sed '10q' filename              # same effect, avoids a useless "cat"
  sed '10q' filename > newfile    # redirects output to disk

For additional syntax instructions, including the way to apply editing
commands from a disk file instead of the command line, consult "sed &
awk, 2nd Edition," by Dale Dougherty and Arnold Robbins (O'Reilly,
1997; http://www.ora.com), "UNIX Text Processing," by Dale Dougherty
and Tim O'Reilly (Hayden Books, 1987) or the tutorials by Mike Arst
distributed in U-SEDIT2.ZIP (many sites). To fully exploit the power
of sed, one must understand "regular expressions." For this, see
"Mastering Regular Expressions" by Jeffrey Friedl (O'Reilly, 1997).
The manual ("man") pages on Unix systems may be helpful (try "man
sed", "man regexp", or the subsection on regular expressions in "man
ed"), but man pages are notoriously difficult. They are not written to
teach sed use or regexps to first-time users, but as a reference text
for those already acquainted with these tools.

QUOTING SYNTAX: The preceding examples use single quotes ('...')
instead of double quotes ("...") to enclose editing commands, since
sed is typically used on a Unix platform. Single quotes prevent the
Unix shell from interpreting the dollar sign ($) and backquotes
(`...`), which are expanded by the shell if they are enclosed in
double quotes. Users of the "csh" shell and derivatives will also need
to quote the exclamation mark (!) with the backslash (i.e., \!) to
properly run the examples listed above, even within single quotes.
Versions of sed written for DOS invariably require double quotes
("...") instead of single quotes to enclose editing commands.
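
The difference between the two quoting styles is easy to see with a
shell variable. A minimal sketch for a Bourne-type shell; the variable
name "new" is made up for the example.

```shell
new=bar

# Single quotes: the shell passes $new through untouched, so sed
# inserts those four literal characters.
literal=$(echo foo | sed 's/foo/$new/')

# Double quotes: the shell expands $new before sed ever runs.
expanded=$(echo foo | sed "s/foo/$new/")

# Inside double quotes, a $ meant for sed (here the last-line address)
# has to be escaped with a backslash.
last=$(printf 'a\nb\n' | sed -n "\$p")

printf '%s\n' "$literal" "$expanded" "$last"
```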

USE OF '\t' IN SED SCRIPTS: For clarity in documentation, we have used
the expression '\t' to indicate a tab character (0x09) in the scripts.
However, most versions of sed do not recognize the '\t' abbreviation,
so when typing these scripts from the command line, you should press
the TAB key instead. '\t' is supported as a regular expression
metacharacter in awk, perl, and in a few implementations of sed.
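
In a shell script, where pressing the TAB key is awkward, one common
workaround is to let printf produce the tab and splice it into a
double-quoted sed script. A sketch, assuming a Bourne-type shell; the
variable name "tab" is arbitrary.

```shell
# printf understands '\t' even when sed does not.
tab=$(printf '\t')

# Strip leading and trailing spaces/tabs. The script is in double
# quotes so that $tab expands; the sed end-of-line anchor is written
# as \$ to keep the shell away from it.
trimmed=$(printf '\t  padded text \t\n' | sed "s/^[ $tab]*//;s/[ $tab]*\$//")

printf '[%s]\n' "$trimmed"
```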

VERSIONS OF SED: Versions of sed do differ, and some slight syntax
variation is to be expected. In particular, most do not support the
use of labels (:name) or branch instructions (b,t) within editing
commands, except at the end of those commands. We have used the syntax
which will be portable to most users of sed, even though the popular
GNU versions of sed allow a more succinct syntax. When the reader sees
a fairly long command such as this:

    sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

it is heartening to know that GNU sed will let you reduce it to:

    sed '/AAA/b;/BBB/b;/CCC/b;d'      # or even
    sed '/AAA\|BBB\|CCC/b;d'

In addition, remember that while many versions of sed accept a command
like "/one/ s/RE1/RE2/", some do NOT allow "/one/! s/RE1/RE2/", which
contains space before the 's'. Omit the space when typing the command.

OPTIMIZING FOR SPEED: If execution speed needs to be increased (due to
large input files or slow processors or hard disks), substitution will
be executed more quickly if the "find" expression is specified before
giving the "s/.../.../" instruction. Thus:

    sed 's/foo/bar/g' filename         # standard replace command
    sed '/foo/ s/foo/bar/g' filename   # executes more quickly
    sed '/foo/ s//bar/g' filename      # shorthand sed syntax

On line selection or deletion in which you only need to output lines
from the first part of the file, a "quit" command (q) in the script
will drastically reduce processing time for large files. Thus:

    sed -n '45,50p' filename           # print line nos. 45-50 of a file
    sed -n '51q;45,50p' filename       # same, but executes much faster
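
The faster forms are meant to produce exactly the same output as the
plain ones. The sketch below checks that on a small made-up input
(twenty numbered lines, with the range scaled down to lines 4-6); it
says nothing about timing.

```shell
# Twenty numbered lines as stand-in input (an assumption).
lines=$(i=1; while [ "$i" -le 20 ]; do echo "$i"; i=$((i + 1)); done)

# Without q, sed reads all 20 lines; with '7q' it stops at line 7.
plain=$(printf '%s\n' "$lines" | sed -n '4,6p')
early=$(printf '%s\n' "$lines" | sed -n '7q;4,6p')

# The empty pattern in s//bar/g reuses the /foo/ address regex.
short=$(echo 'foo and foo' | sed '/foo/ s//bar/g')

[ "$plain" = "$early" ] && echo "same output"
echo "$short"
```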

If you have any additional scripts to contribute or if you find errors
in this document, please send e-mail to the compiler. Indicate the
version of sed you used, the operating system it was compiled for, and
the nature of the problem. Various scripts in this file were written
or contributed by:

  Al Aab <af137@...>   # "seders" list moderator
  Yiorgos Adamopoulos <adamo@...>
  Dale Dougherty <dale@...>     # author of "sed & awk"
  Carlos Duarte <cdua@...>    # author of "do it with sed"
  Eric Pement <epement@...>  # author of this document
  Ken Pizzini <ken@...>          # author of GNU sed v3.02
  S.G.Ravenhall <S.G.Ravenhall@...> # great de-html script
  Greg Ubben <gsu@...>      # many contributions & much help
-------------------------------------------------------------------------


From af137@...  Fri Feb 28 11:31:42 1997
Date: Fri, 28 Feb 1997 04:26:06 -0500 (EST)
From: Al Aab <af137@...>
Subject: dc.sed (fwd)
Message-ID: <Pine.3.89.9702280439.B15924-0100000@queen>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


here is the biggest greg ubben masterpiece, yet :
a sed script that not only does arithmetic,
but clones a unix rpn calculator, dc.

as i noted, eons ago, sed can do the 4 r's :
reading, writing, arithmetic and recursion.
there is a posted sed web-script to solve the classic towers of hanoi.
if you cannot find it on the web, email me.
=-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
al aab, seders moderator                                      sed u soon
                it is not zat we do not see the  s o l u t i o n
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

---------- Forwarded message ----------
Date: Thu, 27 Feb 1997 14:15:15 -0500 (EST)
From: Greg Ubben <gsu@...>
To: af137@...
Subject: dc.sed

Seders,

     Here is the big sed project I've been working on for fun -- a complete
implementation of the UNIX dc command!  If you've never used it before,
dc is an arbitrary precision reverse-polish (stacking) "desk" calculator.
Besides supporting calculations on very long numbers, it also handles
non-decimal input and output bases (even negative and fractional bases),
and has a set of stackable registers in which can be stored numbers,
strings, or executable macros.  For details on operating dc, see the dc
manual page on any UNIX system or on the web (one URL you can use is
http://www.delorie.com/gnu/docs/bc/dc.1.html).  But here's a short demo:

$ ./dc.sed                # start running dc.sed interactively
10k                       # set the scale (fractional digits) to 10
_375 2.5 .716 +/p         # compute and print -375 / (2.5 + .716)
                          # hit return again to see result
sa                        # store the last result (still on stack) in register a
la d *p                   # load register a, duplicate, square, and print
                          # hit return again to see result
vp                        # now take square root and print
q                         # we're done - quit (hit return twice)

     Note that because sed reads ahead a line, you have to hit RETURN twice
to see the effects of your last command.  Of course, the script is limited
by the size of your sed's buffer.  This is about 4000 characters on SunOS
-- enough for you to sling around 500-digit numbers if there's not much on
the stack or in registers, and you don't mind waiting a few hours for your
answer.  As far as I know, dc.sed is actually less buggy than the UNIX and
GNU versions, barring the memory and speed limitations.  Let me know if you
find any bugs.

     This script is actually surprisingly fast using SunOS 4.1.4 /bin/sed.
On my Sun host, it calculates 10k2vp (sqrt(2) to 10 places) in just a
couple seconds.  However GNU sed 2.05 (compiled with cc -O2) was 168 times
slower.  (304 times slower with cc -g)  In real time, I watched a 1-hour
TV show waiting for GNU sed to finish the calculation.  I'd be interested
in hearing experiences running it on other sed implementations.  It should
give any sed a good work-out, and would make a good test to profile GNU
sed against to find the bottlenecks.

     Another limitation you may run into is the number of sed commands.
I had to really squeeze to get this script to fit into the 199-command
limit on SunOS.  If your sed has lower limits, you may be out of luck.

     What few explanatory comments there were have been removed from this
copy of the script just to make it more obfuscated and interesting.  :-)
I may post a better-commented version later if there's interest.  If you
want to try to tinker, here's an exercise:  see if you can add a "pi"
command that will push the number 3.14159265 on the stack.  As an advanced
(difficult) exercise for hard-core hackers, try implementing associative
arrays per my comments in the script.  In any case, enjoy, and never again
let it be said that sed can't do arithmetic.

Next time:  perl.sed  (whfg xvqqvat)

Greg

------------------------------------------------------------------------
#!/bin/sed -f
#  dc.sed - an arbitrary precision RPN calculator
#  Created by Greg Ubben <gsu@...> early 1995, late 1996
#
#  Dedicated to MAC's memory of the IBM 1620 ("CADET") computer.
#  @(#)GSU dc.sed 1.0 27-Feb-1997 [non-explanatory]
#
#  Examples:
# sqrt(2) to 10 digits: echo "10k 2vp" | dc.sed
# 20 factorial:  echo "[d1-d1<!*]s! 20l!xp" | dc.sed
# sin(ln(7)):  echo "s(l(7))" | bc -c /usr/lib/lib.b | dc.sed
# hex to base 60:  echo "60o16i 6B407.CAFE p" | dc.sed
#
#  To debug or analyze, give the dc Y command as input or add it to
#  embedded dc routines, or add the sed p command to the beginning of
#  the main loop or at various points in the low-level sed routines.
#  If you need to allow [|~] characters in the input, filter this
#  script through "tr '|~' '\36\37'" first.
#
#  Not implemented: ! \
#  But implemented: K Y t # !< !> != fractional-bases
#  SunOS limits: 199/199 commands (though could pack in 10-20 more)
#  Limitations:  scale <= 999; |obase| >= 1; input digits in [0..F]
#  Completed:  1am Feb 4, 1997

s/^/|P|K0|I10|O10|?~/

: next
s/|?./|?/
s/|?#[  -}]*/|?/
/|?!*[lLsS;:<>=]\{0,1\}$/N
/|?!*[-+*/%^<>=]/b binop
/^|.*|?[dpPfQXZvxkiosStT;:]/b binop
/|?[_0-9A-F.]/b number
/|?\[/b string
/|?l/b load
/|?L/b Load
/|?[sS]/b save
/|?c/ s/[^|]*//
/|?d/ s/[^~]*~/&&/
/|?f/ s//&[pSbz0<aLb]dSaxsaLa/
/|?x/ s/\([^~]*~\)\(.*|?x\)~*/\2\1/
/|?[KIO]/ s/.*|\([KIO]\)\([^|]*\).*|?\1/\2~&/
/|?T/ s/\.*0*~/~/
#  a slow, non-stackable array implementation in dc, just for completeness
#  A fast, stackable, associative array implementation could be done in sed
#  (format: {key}value{key}value...), but would be longer, like load & save.
/|?;/ s/|?;\([^{}]\)/|?~[s}s{L{s}q]S}[S}l\1L}1-d0>}s\1L\1l{xS\1]dS{xL}/
/|?:/ s/|?:\([^{}]\)/|?~[s}L{s}L{s}L}s\1q]S}S}S{[L}1-d0>}S}l\1s\1L\1l{xS\1]dS{x/
/|?[ ~ cdfxKIOT]/b next
/|?\n/b next
/|?[pP]/b print
/|?k/ s/^\([0-9]\{1,3\}\)\([.~].*|K\)[^|]*/\2\1/
/|?i/ s/^\(-\{0,1\}[0-9]*\.\{0,1\}[0-9]\{1,\}\)\(~.*|I\)[^|]*/\2\1/
/|?o/ s/^\(-\{0,1\}[1-9][0-9]*\.\{0,1\}[0-9]*\)\(~.*|O\)[^|]*/\2\1/
/|?[kio]/b pop
/|?t/b trunc
/|??/b input
/|?Q/b break
/|?q/b quit
h
/|?[XZz]/b count
/|?v/b sqrt
s/.*|?\([^Y]\).*/\1 is unimplemented/
s/\n/\\n/g
l
g
b next

: print
/^-\{0,1\}[0-9]*\.\{0,1\}[0-9]\{1,\}~.*|?p/!b Print
/|O10|/b Print

#  Print a number in a non-decimal output base.  Uses registers a,b,c,d.
#  Handles fractional output bases (O<-1 or O>=1), unlike other dc's.
#  Converts the fraction correctly on negative output bases, unlike
#  UNIX dc.  Also scales the fraction more accurately than UNIX dc.
#
s,|?p,&KSa0kd[[-]Psa0la-]Sad0>a[0P]sad0=a[A*2+]saOtd0>a1-ZSd[[[[ ]P]sclb1\
!=cSbLdlbtZ[[[-]P0lb-sb]sclb0>c1+]sclb0!<c[0P1+dld>c]scdld>cscSdLbP]q]Sb\
[t[1P1-d0<c]scd0<c]ScO_1>bO1!<cO[16]<bOX0<b[[q]sc[dSbdA>c[A]sbdA=c[B]sbd\
B=c[C]sbdC=c[D]sbdD=c[E]sbdE=c[F]sb]xscLbP]~Sd[dtdZOZ+k1O/Tdsb[.5]*[.1]O\
X^*dZkdXK-1+ktsc0kdSb-[Lbdlb*lc+tdSbO*-lb0!=aldx]dsaxLbsb]sad1!>a[[.]POX\
+sb1[SbO*dtdldx-LbO*dZlb!<a]dsax]sadXd0<asbsasaLasbLbscLcsdLdsdLdLak[]pP,
b next

: Print
/|?p/s/[^~]*/&\
~&/
s/\(.*|P\)\([^|]*\)/\
\2\1/
s/\([^~]*\)\n\([^~]*\)\(.*|P\)/\1\3\2/
h
s/~.*//
/./{ s/.//; p; }
#  Just s/.//p would work if we knew we were running under the -n option.
#  Using l vs p would kind of do \ continuations, but would break strings.
g

: pop
s/[^~]*~//
b next

: load
s/\(.*|?.\)\(.\)/\20~\1/
s/^\(.\)0\(.*|r\1\([^~|]*\)~\)/\1\3\2/
s/.//
b next

: Load
s/\(.*|?.\)\(.\)/\2\1/
s/^\(.\)\(.*|r\1\)\([^~|]*~\)/|\3\2/
/^|/!i\
register empty
s/.//
b next

: save
s/\(.*|?.\)\(.\)/\2\1/
/^\(.\).*|r\1/ !s/\(.\).*|/&r\1|/
/|?S/ s/\(.\).*|r\1/&~/
s/\(.\)\([^~]*~\)\(.*|r\1\)[^~|]*~\{0,1\}/\3\2/
b next

: quit
t quit
s/|?[^~]*~[^~]*~/|?q/
t next
#  Really should be using the -n option to avoid printing a final newline.
s/.*|P\([^|]*\).*/\1/
q

: break
s/[0-9]*/&;987654321009;/
: break1
s/^\([^;]*\)\([1-9]\)\(0*\)\([^1]*\2\(.\)[^;]*\3\(9*\).*|?.\)[^~]*~/\1\5\6\4/
t break1
b pop

: input
N
s/|??\(.*\)\(\n.*\)/|?\2~\1/
b next

: count
/|?Z/ s/~.*//
/^-\{0,1\}[0-9]*\.\{0,1\}[0-9]\{1,\}$/ s/[-.0]*\([^.]*\)\.*/\1/
/|?X/ s/-*[0-9A-F]*\.*\([0-9A-F]*\).*/\1/
s/|.*//
/~/ s/[^~]//g

s/./a/g
: count1
	 s/a\{10\}/b/g
	 s/b*a*/&a9876543210;/
	 s/a.\{9\}\(.\).*;/\1/
	 y/b/a/
/a/b count1
G
/|?z/ s/\n/&~/
s/\n[^~]*//
b next

: trunc
#  for efficiency, doesn't pad with 0s, so 10k 2 5/ returns just .40
#  The X* here and in a couple other places works around a SunOS 4.x sed bug.
s/\([^.~]*\.*\)\(.*|K\([^|]*\)\)/\3;9876543210009909:\1,\2/
: trunc1
	 s/^\([^;]*\)\([1-9]\)\(0*\)\([^1]*\2\(.\)[^:]*X*\3\(9*\)[^,]*\),\([0-9]\)/\1\5\6\4\7,/
t trunc1
s/[^:]*:\([^,]*\)[^~]*/\1/
b normal

: number
s/\(.*|?\)\(_\{0,1\}[0-9A-F]*\.\{0,1\}[0-9A-F]*\)/\2~\1~/
s/^_/-/
/^[^A-F~]*~.*|I10|/b normal
/^[-0.]*~/b normal
s:\([^.~]*\)\.*\([^~]*\):[Ilb^lbk/,\1\2~0A1B2C3D4E5F1=11223344556677889900;.\2:
: digit
     s/^\([^,]*\),\(-*\)\([0-F]\)\([^;]*\(.\)\3[^1;]*\(1*\)\)/I*+\1\2\6\5~,\2\4/
t digit
s:...\([^/]*.\)\([^,]*\)[^.]*\(.*|?.\):\2\3KSb[99]k\1]SaSaXSbLalb0<aLakLbktLbk:
b next

: string
/|?[^]]*$/N
s/\(|?[^]]*\)\[\([^]]*\)]/\1|{\2|}/
/|?\[/b string
s/\(.*|?\)|{\(.*\)|}/\2~\1[/
s/|{/[/g
s/|}/]/g
b next

: binop
/^[^~|]*~[^|]/ !i\
stack empty
//!b next
/^-\{0,1\}[0-9]*\.\{0,1\}[0-9]\{1,\}~/ !s/[^~]*\(.*|?!*[^!=<>]\)/0\1/
/^[^~]*~-\{0,1\}[0-9]*\.\{0,1\}[0-9]\{1,\}~/ !s/~[^~]*\(.*|?!*[^!=<>]\)/~0\1/
h
/|?\*/b mul
/|?\//b div
/|?%/b rem
/|?^/b exp

/|?[+-]/ s/^\(-*\)\([^~]*~\)\(-*\)\([^~]*~\).*|?\(-\{0,1\}\).*/\2\4s\3o\1\3\5/
s/\([^.~]*\)\([^~]*~[^.~]*\)\(.*\)/<\1,\2,\3|=-~.0,123456789<></
/^<\([^,]*,[^~]*\)\.*0*~\1\.*0*~/ s/</=/
: cmp1
	 s/^\(<[^,]*\)\([0-9]\),\([^,]*\)\([0-9]\),/\1,\2\3,\4/
t cmp1
/^<\([^~]*\)\([^~]\)[^~]*~\1\(.\).*|=.*\3.*\2/ s/</>/
/|?/{
	 s/^\([<>]\)\(-[^~]*~-.*\1\)\(.\)/\3\2/
	 s/^\(.\)\(.*|?!*\)\1/\2!\1/
	 s/|?![^!]\(.\)/&l\1x/
	 s/[^~]*~[^~]*~\(.*|?\)!*.\(.*\)|=.*/\1\2/
	 b next
}
s/\(-*\)\1|=.*/;9876543210;9876543210/
/o-/ s/;9876543210/;0123456789/
s/^>\([^~]*~\)\([^~]*~\)s\(-*\)\(-*o\3\(-*\)\)/>\2\1s\5\4/

s/,\([0-9]*\)\.*\([^,]*\),\([0-9]*\)\.*\([0-9]*\)/\1,\2\3.,\4;0/
: right1
	 s/,\([0-9]\)\([^,]*\),;*\([0-9]\)\([0-9]*\);*0*/\1,\2\3,\4;0/
t right1
s/.\([^,]*\),~\(.*\);0~s\(-*\)o-*/\1~\30\2~/

: addsub1
	 s/\(.\{0,1\}\)\(~[^,]*\)\([0-9]\)\(\.*\),\([^;]*\)\(;\([^;]*\(\3[^;]*\)\).*X*\1\(.*\)\)/\2,\4\5\9\8\7\6/
	 s/,\([^~]*~\).\{10\}\(.\)[^;]\{0,9\}\([^;]\{0,1\}\)[^;]*/,\2\1\3/
	 #  could be done in one s/// if we could have >9 back-refs...
/^~.*~;/!b addsub1

: endbin
s/.\([^,]*\),\([0-9.]*\).*/\1\2/
G
s/\n[^~]*~[^~]*//

: normal
s/^\(-*\)0*\([0-9.]*[0-9]\)[^~]*/\1\2/
s/^[^1-9~]*~/0~/
b next

: mul
s/\(-*\)\([0-9]*\)\.*\([0-9]*\)~\(-*\)\([0-9]*\)\.*\([0-9]*\).*|K\([^|]*\).*/\1\4\2\5.!\3\6,|\2<\3~\5>\6:\7;9876543210009909/

: mul1
     s/![0-9]\([^<]*\)<\([0-9]\{0,1\}\)\([^>]*\)>\([0-9]\{0,1\}\)/0!\1\2<\3\4>/
     /![0-9]/ s/\(:[^;]*\)\([1-9]\)\(0*\)\([^0]*\2\(.\).*X*\3\(9*\)\)/\1\5\6\4/
/<~[^>]*>:0*;/!t mul1

s/\(-*\)\1\([^>]*\).*/;\2^>:9876543210aaaaaaaaa/

: mul2
     s/\([0-9]~*\)^/^\1/
     s/<\([0-9]*\)\(.*[~^]\)\([0-9]*\)>/\1<\2>\3/

     : mul3
	 s/>\([0-9]\)\(.*\1.\{9\}\(a*\)\)/\1>\2;9\38\37\36\35\34\33\32\31\30/
	 s/\(;[^<]*\)\([0-9]\)<\([^;]*\).*\2[0-9]*\(.*\)/\4\1<\2\3/
	 s/a[0-9]/a/g
	 s/a\{10\}/b/g
	 s/b\{10\}/c/g
     /|0*[1-9][^>]*>0*[1-9]/b mul3

     s/;/a9876543210;/
     s/a.\{9\}\(.\)[^;]*\([^,]*\)[0-9]\([.!]*\),/\2,\1\3/
     y/cb/ba/
/|<^/!b mul2
b endbin

: div
#  CDDET
/^[-.0]*[1-9]/ !i\
divide by 0
//!b pop
s/\(-*\)\([0-9]*\)\.*\([^~]*~-*\)\([0-9]*\)\.*\([^~]*\)/\2.\3\1;0\4.\5;0/
: div1
	 s/^\.0\([^.]*\)\.;*\([0-9]\)\([0-9]*\);*0*/.\1\2.\3;0/
	 s/^\([^.]*\)\([0-9]\)\.\([^;]*;\)0*\([0-9]*\)\([0-9]\)\./\1.\2\30\4.\5/
t div1
s/~\(-*\)\1\(-*\);0*\([^;]*[0-9]\)[^~]*/~123456789743222111~\2\3/
s/\(.\(.\)[^~]*\)[^9]*\2.\{8\}\(.\)[^~]*/\3~\1/
s,|?.,&SaSadSaKdlaZ+LaX-1+[sb1]Sbd1>bkLatsbLa[dSa2lbla*-*dLa!=a]dSaxsakLasbLb*t,
b next

: rem
s,|?%,&Sadla/LaKSa[999]k*Lak-,
b next

: exp
#  This decimal method is just a little faster than the binary method done
#  totally in dc:  1LaKLb [kdSb*LbK]Sb [[.5]*d0ktdSa<bkd*KLad1<a]Sa d1<a kk*
/^[^~]*\./i\
fraction in exponent ignored
s,[^-0-9].*,;9d**dd*8*d*d7dd**d*6d**d5d*d*4*d3d*2lbd**1lb*0,
: exp1
	 s/\([0-9]\);\(.*\1\([d*]*\)[^l]*\([^*]*\)\(\**\)\)/;dd*d**d*\4\3\5\2/
t exp1
G
s,-*.\{9\}\([^9]*\)[^0]*0.\(.*|?.\),\2~saSaKdsaLb0kLbkK*+k1\1LaktsbkLax,
s,|?.,&SadSbdXSaZla-SbKLaLadSb[0Lb-d1lb-*d+K+0kkSb[1Lb/]q]Sa0>a[dk]sadK<a[Lb],
b next

: sqrt
#  first square root using sed:  8k2v at 1:30am Dec 17, 1996
/^-/i\
square root of negative number
/^[-0]/b next
s/~.*//
/^\./ s/0\([0-9]\)/\1/g
/^\./ !s/[0-9][0-9]/7/g
G
s/\n/~/
s,|?.,&KSbSb[dk]SadXdK<asadlb/lb+[.5]*[sbdlb/lb+[.5]*dlb>a]dsaxsasaLbsaLatLbk,
b next

#  END OF GSU dc.sed


      this is a beta graph for sed.  a state diagram, if u will.
      Sun Dec 21 97   AM 11 43 48

      sed commands:
      r w s / g p l a i c w d
                P           D
      b t
      MODIFIERS (ADJECTIVES/ADVERBS)                             _______
      !                                                        /  output \
      g                                                       |    file   |
      \{m,n\}                                                  \ ________/
      \(\)                                                          ^
                                                                    ^
                                                                    ^
                                                                    w
  SEPARATORS : ;  -e                                                ^
  CONTROL FLOW MODS:  b t d D                                       ^
  ADDRESS:  // ,                                                    ^
  LIMITS :  |(\) \{\}                                               ^
  UNSPECIFIED BUFFERS : \1 \2 ... \9                                ^
    _____________                                              _____^________
  /              |                                           /  STANDARD     |
 |   SCRIPT      |                                          |    inPUT       |
 |_______________|                                          |________________|
        .                                                           .
        .                                                           .
        .                                                           .
      a i c                                                        p P l
        .                                                           .
        .                                                           .
        .       __________________            __________________    .
        .      |                  |---h H--->|                  |   .
        .      |   PATTERN        |          |       HOLD       |   .
        .      |    SPACE         |<----x--->|       SPACE      |   .
        .      |                  |          |                  |   .
        .      |                  |<--g G----|                  |   .
        .      |__________________|          |__________________|   .
        .                                                           .
        .                                                           .
        .                                                           .
        .                                                           .
        .                                                           .
  ______V___________________________________________________________V_____
 |                                                                        |
 |                                                                       /
 |                               STANDARD   outPUT                    /
 |                                                                  /
 |                                                                 |
 |_________________________________________________________________|
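
The h, H, g, G and x arrows between the pattern space and the hold
space in the diagram can be watched in action with two short scripts.
This is only a sketch on made-up input; the first script is the "tac"
one-liner from the list earlier in this message.

```shell
# Reverse line order: G appends the hold space to the current line,
# h copies the result back into the hold space, and $!d suppresses
# output until the last line, which by then holds everything reversed.
reversed=$(printf '1\n2\n3\n' | sed '1!G;h;$!d')

# Print the line before each match: h saves every line at the end of
# its cycle, so on a match x swaps in the previous line for printing
# and a second x swaps the current line back.
before=$(printf 'a\ntarget\nb\n' | sed -n -e '/target/{x;p;x;}' -e h)

printf '%s\n' "$reversed"
printf '%s\n' "$before"
```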

Archive-name: editor-faq/sed
Posting-Frequency: bimonthly
Last-modified: 1999/07/18
Version: 011
URL: http://www.cornerstonemag.com/sed/sedfaq.html
Maintainer: Eric Pement <epement@...>


                              THE SED FAQ

                   Frequently Asked Questions about
                        sed, the stream editor

CONTENTS:

1. GENERAL INFORMATION
1.1. Introduction - How this FAQ is organized
1.2. Latest version of the sed FAQ
1.3. FAQ revision information
1.4. How do I add a question/answer to the sed FAQ?
1.5. FAQ abbreviations
1.6. Credits and acknowledgements
1.7. Standard disclaimers

2. BASIC SED
2.1. What is sed?
2.2. What versions of sed are there, and where can I get them?

2.2.1. Free versions

  2.2.1.1. Unix platforms
  2.2.1.2. OS/2
  2.2.1.3. Microsoft Windows (3.1, NT, Win95)
  2.2.1.4. MS-DOS
  2.2.1.5. CP/M

2.2.2. Shareware and Commercial versions

  2.2.2.1. Unix platforms
  2.2.2.2. OS/2
  2.2.2.3. Windows NT, Windows 95
  2.2.2.4. MS-DOS

2.3. Where can I learn to use sed?

  2.3.1. Books
  2.3.2. Mailing list
  2.3.3. Tutorials, electronic text
  2.3.4. General web and ftp sites

3. TECHNICAL
3.1. More detailed explanation of basic sed
3.2. Common one-line sed scripts. How do I . . . ?

       - double/triple-space a file?
       - convert DOS/Unix newlines?
       - delete leading/trailing spaces?
       - do substitutions on all/certain lines?
       - delete consecutive blank lines?
       - delete blank lines at the top/end of the file?

3.3. Addressing and address ranges
3.4. [reserved]
3.5. [reserved]
3.6. [reserved]
3.7. GNU/POSIX extensions to regular expressions

4. EXAMPLES
4.1. How do I perform a case-insensitive search?
4.2. How do I make changes in only part of a file?
4.3. How do I change only the first occurrence of a pattern?
4.4. How do I make substitutions in every file in a directory, or in a
      complete directory tree?

  4.4.1 - Perl solution
  4.4.2 - Unix solution
  4.4.3 - DOS solution

4.5. How do I parse a comma-delimited data file?
4.6. How do I insert a newline into the RHS of a substitution?
4.7. How do I represent control-codes or non-printable characters?
4.8. How do I read environment variables with sed?

  4.8.1. - on Unix platforms
  4.8.2. - on MS-DOS or 4DOS platforms

4.9. How do I export or pass variables back into the environment?

  4.9.1. - on Unix platforms
  4.9.2. - on MS-DOS or 4DOS platforms

4.10. How do I handle shell quoting in sed?
4.11. How do I delete a block of text if the block contains a certain
       regular expression?
4.12. How do I locate/print a paragraph of text if the paragraph
       contains a certain regular expression?
4.13. How do I delete a block of _specific_ consecutive lines?
4.14. How do I read (insert/add) a file at the top of a textfile?
4.15. How do I address all the lines between RE1 and RE2, excluding
       the lines themselves?
4.16. How do I put "/some/path/here" into the LHS of a substitution?
4.17. How do I replace "C:\SOME\DOS\PATH" in a substitution?                  |
4.18. How do I convert files with toggle characters, like +this+, to          |
       look like [i]this[/i]?                                                  |
4.19. How do I delete only the first occurrence of a pattern?                 |
4.20. How do I commify a string of numbers?                                   |

5. WHY ISN'T THIS WORKING?
5.1. Why don't my variables like $var get expanded in my sed script?
5.2. I'm using 'p' to print, but I have duplicate lines sometimes.
5.3. Why does my DOS version of sed process a file part-way through
      and then quit?
5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
      stingy pattern matching")
5.5. What is CSDPMI*B.ZIP and why do I need it?
5.6. Where are the man pages for GNU sed?
5.7. How do I tell what version of sed I am using?
5.8. Does sed issue an exit code?
5.9. The 'r' command isn't inserting the file into the text.
5.10. Why can't I match 2 or more lines using the \n command?

6. OTHER ISSUES
6.1. I have a problem that stumps me. Where can I get help?
6.2. How does sed compare with awk, perl, and other utilities?
6.3. When should I use sed?
6.4. When should I NOT use sed?
6.5. When should I ignore sed and use Awk or Perl instead?
6.6. Known limitations among sed versions
6.7. Known bugs among sed versions
6.8. Known incompatibilities between sed versions

  6.8.1. Issuing commands from the command line
  6.8.2. Using comments (prefixed by the '#' sign)
  6.8.3. Special syntax in REs
  6.8.4. Word boundaries
  6.8.5. Range addressing with GNU sed and HHsed

------------------------------

1. GENERAL INFORMATION

1.1. Introduction - How this FAQ is organized

    This FAQ is organized to answer common (and some uncommon)
    questions about sed, quickly. If you see a term or abbreviation in
    the examples that seems unclear, see if the term is defined in             |
    section 1.5. If not, write us and we'll try to clarify it for the          |
    next version of the FAQ.                                                   |

1.2. Latest version of the sed FAQ

    The newest version of the sed FAQ is usually here:

       http://www.cornerstonemag.com/sed/sedfaq.html
       http://www.cornerstonemag.com/sed/sedfaq.txt
       http://www.dbnet.ece.ntua.gr/~george/sed/sedfaq.html
       http://www.dbnet.ece.ntua.gr/~george/sed/sedfaq.txt                     |
       http://www.ptug.org/sed/sedfaq.html
       http://seders.icheme.org/tutorials/sedfaq.html
       http://www.faqs.org/faqs/editor-faq/sed
       ftp://rtfm.mit.edu/pub/faqs/editor-faq/sed

1.3. FAQ revision information

    Changes to this FAQ since the last version are indicated by a
    vertical bar (|) placed in column 78 of the affected lines. To
    remove the vertical bars (use double quotes for MS-DOS):

       sed 's/  *|$//' sedfaq.txt > sedfaq2.txt
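
    As a quick check of that command, here is a minimal sketch run on
    two made-up sample lines (the sample text and variable names are
    ours, not part of the FAQ file):

```shell
# Two sample lines: one carries a trailing revision bar, one does not.
sample='changed line          |
unchanged line'

# Remove one or more spaces followed by a bar at the end of the line.
stripped=$(printf '%s\n' "$sample" | sed 's/  *|$//')
printf '%s\n' "$stripped"
```

    Lines without a trailing bar pass through unchanged, so the command
    is safe to run on the whole file.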

    In the HTML version, vertical bars do not appear. New or altered
    portions of the FAQ are indicated by printing in dark blue type.

    In the text version, words needing emphasis may be surrounded by
    the underscore '_' or the asterisk '*'. In the HTML version, these
    are changed to italics and boldface, respectively.

1.4. How do I add a question/answer to the sed FAQ?

    Word your question succinctly and clearly, and e-mail it to Al Aab
    <af137@...> for posting on the seders mailing
    list; send a cc: to <epement@...>. We will discuss the
    proposed question/answer on the sed mailing list, and if there is
    some agreement, your contribution will be included in the next
    edition of the sed FAQ.

1.5. FAQ abbreviations:

    files = one or more filenames, separated by whitespace
    RE  = Regular Expressions supported by sed
    LHS = the left-hand side ("find" part) of "s/find/repl/" command
    RHS = the right-hand side ("replace" part) of "s/find/repl/" cmd.

    files: "files" will be our shorthand for one or more filenames,
    which are entered as arguments on the command line. The names may
    include any wildcards your shell understands (such as ``zork*'' or
    ``Aug[4-9].let'').  Sed will process each filename passed to it by
    the shell.

    RE: For the syntax of Basic Regular Expressions (BREs), type "man
    ed" and read the documentation for regular expressions. A technical
    description of BREs from the Single UNIX Specification, Version 2,
    by The Open Group (joint committee on Unix) is available online at
    <http://www.rdg.opengroup.org/onlinepubs/7908799/xbd/re.html#tag_007_003>.
    Sed normally supports BREs plus '\n' to match a newline in the
    pattern space and '\xREx' as equivalent to '/RE/', where 'x' is any
    character other than another backslash.
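
    The two additions just mentioned can be tried as follows. This is
    only a sketch; the sample paths and words are invented, and '%' is
    an arbitrary choice of delimiter:

```shell
# Print only lines containing /usr/local/, using '%' as the RE delimiter
# so the slashes in the path need no escaping.
paths=$(printf '%s\n' /usr/bin/sed /usr/local/bin/sed /opt/sed |
        sed -n '\%/usr/local/%p')
printf '%s\n' "$paths"

# N appends the next input line to the pattern space, so '\n' in the RE
# can match the embedded newline; here each pair of lines is joined.
joined=$(printf '%s\n' one two three four | sed 'N;s/\n/ /')
printf '%s\n' "$joined"
```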

    Some versions of sed support supersets of BREs, or "extended
    regular expressions", which offer additional metacharacters for
    increased flexibility. For additional information on extended REs
    in GNU sed, see sections 3.7 ("GNU/POSIX extensions to regular
    expressions") and 6.8.3 ("Special syntax in REs"), below.

    LHS: In sed, the LHS may be a string literal (e.g., "foo") or any
    valid regular expression supported by your version of sed. Some
    versions of sed support things like \t for TAB, \r for carriage
    return, \xNN for direct entry of hex codes, etc. Other versions of
    sed do not support this syntax.

    RHS: The right-hand side (the replacement part in s/find/replace/)
    is almost always a string literal, with no interpolation of the
    metacharacters (.), (^), ($), ([), or \(...\) -- with the following
    exceptions:  \1 through \9 are replaced by the corresponding group,
    if grouping \(...\) was used in the LHS.  If no grouping was used
    in the LHS, some versions of sed replace \1 through \9 with literal
    digits, while others reject the script with an error. '&'
    is replaced by the entire expression matched on the LHS. To enter a
    literal ampersand or backslash in the RHS, type '\&' or '\\'.
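
    A short sketch of these RHS rules, using invented sample strings
    (the variable names are ours):

```shell
# \1 and \2 refer to the two \(...\) groups on the LHS; swap them.
swapped=$(echo 'Pement, Eric' | sed 's/\(.*\), \(.*\)/\2 \1/')

# & stands for the whole match; wrap each run of digits in brackets.
bracketed=$(echo 'rooms 12 and 345' | sed 's/[0-9][0-9]*/[&]/g')

# \& is a literal ampersand on the RHS.
literal=$(echo 'sed and awk' | sed 's/ and / \& /')

printf '%s\n' "$swapped" "$bracketed" "$literal"
```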

1.6. Credits and acknowledgements

    Many of the ideas for this faq were taken from the Awk FAQ
       http://www.faqs.org/faqs/computer-lang/awk/faq/
       ftp://rtfm.mit.edu/pub/usenet/comp.lang.awk/faq

    and from the Perl FAQ
       http://www.perl.com/perl/FAQ
       http://www.perl.com/CPAN/doc/FAQs/FAQ/html/index.html
       ftp://ftp.cdrom.com/pub/perl/CPAN/doc/FAQs/FAQ

    The following individuals have contributed significantly to this
    document, and have provided input and wording suggestions for
    questions, answers, and script examples. Credit goes to these
    contributors (in alphabetical order by last name):

       Al Aab <af137@freenet*toronto*on*ca>
       Yiorgos Adamopoulos <adamo@softlab*ece*ntua*gr>
       Walter Briscoe <walter@wbriscoe*demon*co*uk>
       Jim Dennis <jadestar@rahul*net>
       Carlos Duarte <cdua@algos*inesc*pt>
       Otavio Exel <oexel@economatica*com*br>
       Mark Katz <mark@ispc001*demon*co*uk>
       Eric Pement <epement@jpusa*chi*il*us>
       Greg Pfeiffer <gpfeiffe@yahoo*com>                                      |
       Ken Pizzini <ken@halcyon*com>
       Niall Smart <nialls@euristix*ie>
       Simon Taylor <staylor@unisolve*com*au>
       Greg Ubben <gsu@romulus*ncsc*mil>

    Note: Periods (.) are replaced with asterisks (*) to foil e-mail
    harvesting and spam-bots.

1.7. Standard disclaimers

    While a serious attempt has been made to ensure the accuracy of the
    information presented herein, the contributors and maintainers of
    this document do not claim the absence of errors and make no
    warranties on the information provided. If you notice any errors or
    ambiguous wording, please notify the FAQ maintainer so it can be
    fixed for the next edition.

------------------------------

2. BASIC SED

2.1. What is sed?

    "sed" stands for Stream EDitor. Sed is a non-interactive editor,
    written by the late Lee E. McMahon in 1973 or 1974. A brief history
    of sed's origins may be found in an early history of the Unix
    tools, at <http://www.columbia.edu/~rh120/ch106.x09>.

    Instead of the user altering a file interactively by moving the
    cursor on the screen (as with WordPerfect), the user sends a
    script of editing instructions to sed, plus the name of the file to
    edit (or the text to be edited may come as output from a pipe). In
    this sense, sed works like a filter -- deleting, inserting and
    changing characters, words, and lines of text. Its range of
    activity goes from small, simple changes to very complex ones.

    Sed reads its input from stdin (Unix shorthand for "standard
    input," i.e., the console) or from files (or both), and sends the
    results to stdout ("standard output," normally the console or
    screen). Most people use sed first for its substitution features.
    Sed is often used as a find-and-replace tool.

       sed 's/Glenn/Harold/g' oldfile >newfile

    will replace every occurrence of "Glenn" with the word "Harold",
    wherever it occurs in the file. The "find" portion is a regular
    expression ("RE"), which can be a simple word or may contain
    special characters to allow greater flexibility (for example, to
    prevent "Glenn" from also matching "Glennon").
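
    One way to do that with plain BREs, sketched here on made-up
    sample text, is to require a non-letter (or the end of the line)
    after the name. This is only one possible approach, and it does
    not check what comes before "Glenn":

```shell
names='Glenn met Glennon.
Ask Glenn'

# First command: "Glenn" followed by a non-letter, which is kept via \1.
# Second command: "Glenn" at the very end of the line.
fixed=$(printf '%s\n' "$names" |
        sed 's/Glenn\([^A-Za-z]\)/Harold\1/g; s/Glenn$/Harold/')
printf '%s\n' "$fixed"
```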

    My very first use of sed was to add 8 spaces to the left side of a
    file, so when I printed it, the printing wouldn't begin at the
    absolute left edge of a piece of paper.

       sed 's/^/        /' myfile >newfile   # my first sed script
       sed 's/^/        /' myfile | lp       # my next sed script

    Then I learned that sed could display only one paragraph of a file,
    beginning at the phrase "and where it came" and ending at the
    phrase "for all people". My script looked like this:

       sed -n '/and where it came/,/for all people/p' myfile

    Sed's normal behavior is to print (i.e., display or show on screen)
    the entire file, including the parts that haven't been altered,
    unless you use the -n switch. The "-n" stands for "no output". This
    switch is almost always used in conjunction with a 'p' command
    somewhere, which says to print only the sections of the file that
    have been specified. Together, the -n switch and the 'p' command
    allow selected parts of a file to be printed (i.e., sent to the
    console).
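
    The difference is easy to see on a three-line sample (the sample
    text and variable names are ours, chosen for illustration):

```shell
input='alpha
beta
gamma'

# Without -n: every line is printed once by default, and the matching
# line is printed a second time by 'p'.
with_default=$(printf '%s\n' "$input" | sed '/beta/p')

# With -n: default output is suppressed, so only the 'p' output remains.
only_match=$(printf '%s\n' "$input" | sed -n '/beta/p')

printf '%s\n--\n%s\n' "$with_default" "$only_match"
```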

    Next, I found that sed could show me only (say) lines 12-18 of a
    file and not show me the rest. This was very handy when I needed to
    review only part of a long file and I didn't want to alter it.

       sed -n 12,18p myfile   # the 'p' stands for print

    Likewise, sed could show me everything else BUT those particular
    lines, without physically changing the file on the disk:

       sed 12,18d myfile      # the 'd' stands for delete
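
    The same pair of commands on a smaller scale, using lines 2-3 of a
    five-line sample instead of lines 12-18 (a sketch with invented
    input):

```shell
lines=$(printf '%s\n' one two three four five)

# -n plus 'p' shows only the addressed lines.
shown=$(printf '%s\n' "$lines" | sed -n '2,3p')

# 'd' without -n shows everything except the addressed lines.
rest=$(printf '%s\n' "$lines" | sed '2,3d')

printf '%s\n--\n%s\n' "$shown" "$rest"
```

    In both cases the input itself is untouched; only what sed writes
    to standard output differs.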

    Sed could also double-space my single-spaced file when it came time
    to print it:

       sed G myfile >newfile

    If you have many editing commands (for deleting, adding,
    substituting, etc.) which might take up several lines, those
    commands can be put into a separate file and all of the commands in
    the file applied to the file being edited:

       sed -f script.sed myfile  # 'script.sed' is the file of commands
                                 # 'myfile' is the file being changed
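
    A minimal sketch of a script file in use. The file contents are
    invented for illustration, and we assume a system that provides
    'mktemp -d' for a scratch directory:

```shell
# Scratch directory so the sketch leaves nothing behind (assumes mktemp).
tmpdir=$(mktemp -d)

# The file of commands: one sed command per line.
cat > "$tmpdir/script.sed" <<'EOF'
# delete blank lines, then make two substitutions
/^$/d
s/colour/color/g
s/^/> /
EOF

# The file being changed.
printf 'colour chart\n\nmore colour\n' > "$tmpdir/myfile"

result=$(sed -f "$tmpdir/script.sed" "$tmpdir/myfile")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

    Each command in the script is applied in order to every input
    line, just as if the commands had been given on the command line.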

    It is not our intention to convert this FAQ file into a full-blown
    sed tutorial (for good tutorials, see section 2.3). Rather, we hope        |
    this gives the complete novice a few ideas of how sed can be used.

2.2. What versions of sed are there, and where can I get them?

2.2.1. Free versions

    Note: "Free" does not mean "public domain". "Free" doesn't mean you
    can sell it, put your name on it, or get the source code. "Free"
    just means you don't have to pay money for it.

2.2.1.1. Unix platforms

    GNU sed v3.02
    This is the latest official version of GNU sed
       ftp://ftp.gnu.org/pub/gnu/sed/sed-3.02.tar.gz

    GNU sed v3.02a
    The a, i, and c commands can now accept a string after them. Line
    ranges are extended with forms such as /RE/,+5 (next 5 lines) or
    /RE/,~5 (till the next line which is a multiple of 5). NULs are
    permitted in regexes in sed scripts, '\n' is permitted on the RHS,
    and there are other changes. Technically this is still an alpha
    release, but no problems have been noted with this version in the
    past 7 months.
       ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02a.tar.gz

    GNU sed v2.05
    This version is superseded by v3.02 and 3.02a, above

    GNU mirror sites. A list of mirror sites is at:
       http://www.ensta.fr/internet/unix/GNU-archives.html

    Precompiled versions:

    GNU sed v3.02-1
    source code and binaries for Debian Linux
       http://www.debian.org/Packages/unstable/base/sed.html

    GNU sed v2.05-12
    source code and binaries for Debian Linux (Note: the code for gsed
    3.02 is much better despite the name "unstable" in the pathname.)
       http://www.debian.org/Packages/stable/base/sed.html

    The 4.4BSD version of sed is available from any 4.4BSD-Lite2 mirror
    site:
       ftp://ftp.ntua.gr/pub/bsd/4.4BSD/usr/src/usr.bin/sed/

    For some time, the GNU project <http://www.gnu.org> used Eric S.
    Raymond's version of sed (ESR sed v1.1), but eventually dropped it
    because it had too many built-in limits. In 1991 Howard Helman
    modified the GNU/ESR sed and produced a flexible version of sed
    v1.5 available at several sites (Helman's version permitted things
    like \<...\> to delimit word boundaries, \xHH to enter hex code and
    \n to indicate newlines in the replace string). This version did
    not catch on with the GNU project and their version of sed has
    moved in a similar but different direction.

    sed v1.3, by Eric Steven Raymond (released 4 June 1998)
       http://earthspace.net/~esr/sed-1.3.tar.gz

    Eric Raymond <esr@...> wrote one of the earliest
    versions of sed. On his website <http://www.tuxedo.org/~esr/> which
    also distributes many freeware utilities he has written or worked
    on, he describes sed v1.1 this way:

    "This is the fast, small sed originally distributed in the GNU
    toolkit and still distributed with Minix. The GNU people ditched it
    when they built their own sed around an enhanced regex package --
    but it's still better for some uses (in particular, faster and less
    memory-intensive)." (Version 1.3 fixes an unidentified bug and adds
    the L command to hexdump the current pattern space.)

2.2.1.2. OS/2

    GNU sed v1.06
       http://oak.oakland.edu/pub/os2/editors/sed106.zip

    GNU sed v2.05 (requires 'emxrt.zip', below)
       http://oak.oakland.edu/pub/os2/editors/gnused.zip
       http://oak.oakland.edu/pub/os2/emx09c/emxrt.zip

    GNU sed v3.0
    Note: version 3.0 was withdrawn due to numerous bugs, and as soon
    as someone gives us a URL for version 3.02 or higher compiled for
    OS/2, we will remove this entry. User beware!
       ftp://hobbes.nmsu.edu/pub/os2/apps/editors/gnused.zip                   |

2.2.1.3. Microsoft Windows (3.1, NT, Win95)

    GNU sed v3.02
    32-bit binaries and source, using DJGPP compiler. Requires 80386 SX
    or better. Also requires 3 CWS*.EXE extenders if run under MS-DOS.
    See section 5.5 ("What is CSDPMI*B.ZIP and why do I need it?"),
    below. This version will run under Windows or under MS-DOS.

    The binary archive (sed302b.zip) contains 2 executables, sed.exe
    and gsed.exe.  sed.exe was compiled with the DJGPP regex library,
    which is POSIX.2-compliant and usually runs faster. gsed.exe was
    compiled with the GNU regex library, which runs slower and is only
    almost POSIX.2-compliant, but has a richer set of regexes and
    will run faster on certain complex regexes which cause the DJGPP
    sed.exe to run extremely slowly.
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed302b.zip
       ftp://ftp.cdrom.com/.27/simtelnet/gnu/djgpp/v2gnu/sed302b.zip
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed302s.zip
       ftp://ftp.cdrom.com/.27/simtelnet/gnu/djgpp/v2gnu/sed302s.zip

    GNU sed v2.05
    32-bit binaries, no docs. Requires 80386 DX (SX will not run) and
    must be run in a DOS window or in a full screen DOS session under
    Microsoft Windows. Will not run in MS-DOS mode (outside Win/Win95).
    We recommend using GNU sed v3.02 (above) instead.
       http://www.simtel.net/pub/simtelnet/win95/prog/gsed205b.zip
       ftp://ftp.cdrom.com/.27/simtelnet/win95/prog/gsed205b.zip

    GNU sed v1.03
    modified by Frank Whaley.
       ftp://ftp.itribe.net/pub/virtunix/gnused.zip

    Again, we recommend avoiding any versions of GNU sed other than the
    current version 3.02 or 3.02a. However, this version appears to be
    built on gsed v1.03 beta as a base and then augmented further. The
    authors did not give this sed its own version number or name. Gsed
    v1.03 is offered in the "Virtually UN*X" set of Win32 utilities at
    <http://www.itribe.net/virtunix/>. It supports Win 95/98/NT long
    filenames, and runs in a DOS session or DOS window under Microsoft
    Windows, but does not run in DOS mode. This version of sed supports
    hex, decimal, binary, and octal representation in expressions.

    The Cygwin toolkit:
       http://sourceware.cygnus.com/cygwin/

    Formerly known as "GNU-Win32 tools." According to their home page,
    "The Cygwin tools are Win32 ports of the popular GNU development
    tools for Windows NT, 95 and 98. They function through the use of
    the Cygwin library which provides a UNIX-like API on top of the
    Win32 API." The version of sed used is GNU sed v3.02.

    Minimalist GNU-Win32 (Mingw32):
       ftp://agnes.dida.physik.uni-essen.de/home/janjaap/mingw32/binaries/sed-2.05.zip
       http://agnes.dida.physik.uni-essen.de/~janjaap/mingw32/download.html

    According to their home page, "The Minimalist GNU-Win32 Package (or
    Mingw32) is simply a set of header files and initialization code
    which allows a GNU compiler to link programs with one of the C
    run-time libraries provided by Microsoft. By default it uses
    CRTDLL, which is built into all Win32 operating systems." The
    download page says Mingw32 programs "behave like you would expect
    from a Windows application. They support drive letters, for
    example. A side effect of using CRTDLL is that Mingw32 is
    thread-safe, while Cygwin32 is not." The version of sed used is GNU
    sed v2.05.

    U/WIN:
       http://www.research.att.com/sw/tools/uwin/

    U/WIN is a suite of Unix utilities created for WinNT and Win95
    systems. It is owned by AT&T, created by David Korn (author of the
    Unix Korn shell), and is freely distributed provided you sign a
    licensing agreement. U/WIN operates best with the NTFS (WinNT file
    system) but will run in degraded mode with the FAT file system and
    in further degraded mode under Win95. The complete set of utilities
    and development tools takes up about 20 megs of disk space. Sed is
    not available as a separate file for download, but comes with the
    suite.

    sed v1.5 (a/k/a HHsed), by Howard Helman
    Compiled with Mingw32 for 32-bit environments described above. This
    version should support Win95 long filenames.
       http://www.dbnet.ece.ntua.gr/~george/sed/sed15.exe
       http://www.cornerstonemag.com/sed/sed15exe.zip                          |

2.2.1.4. MS-DOS

    sed v1.5 (a/k/a HHsed), by Howard Helman
    uncompiled source code (Turbo C)
       http://filepile.com/nc/dd?sed15.zip+mega2
       ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15.zip
       ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15.zip
       ftp://oak.oakland.edu/pub/simtelnet/msdos/txtutl/sed15.zip
       ftp://uiarchive.uiuc.edu/pub/systems/pc/simtelnet/msdos/txtutl/sed15.zip

    DOS executable and documentation
       http://filepile.com/nc/dd?sed15x.zip+mega2
       ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15x.zip
       ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15x.zip
       ftp://oak.oakland.edu/pub/simtelnet/msdos/txtutl/sed15x.zip
       ftp://uiarchive.uiuc.edu/pub/systems/pc/simtelnet/msdos/txtutl/sed15x.zip

    sedmod v1.0, by Hern Chen
       http://www.ptug.org/sed/SEDMOD10.ZIP
       http://www.cornerstonemag.com/sed/sedmod10.zip
       ftp://garbo.uwasa.fi/pc/unix/sedmod10.zip
       CompuServe DTPFORUM, "PC DTP Tools" library, file SEDMOD.ZIP

    GNU sed v3.02
    See section 2.2.1.3 ("Microsoft Windows"), above.

    GNU sed v2.05
    Does not run under MS-DOS.

    GNU sed v1.18
    32-bit binaries and source, using DJGPP compiler. Requires 80386 SX
    or better. Also requires 3 CWS*.EXE extenders on the path. See
    section 5.5 ("What is CSDPMI*B.ZIP and why do I need it?"), below.
    We recommend using GNU sed v3.02 (above) instead.
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip

    GNU sed v1.06
    16-bit binaries and source. Should run under any MS-DOS system.
       http://www.simtel.net/pub/simtelnet/gnu/gnuish/sed106.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/gnuish/sed106.zip

2.2.1.5. CP/M

    ssed v2.2, by Chuck A. Forsberg
       http://oak.oakland.edu/pub/cpm/txtutl/ssed22.lbr

    Written for CP/M, ssed (for "small/stupid stream editor") supports
    only the a(ppend), c(hange), d(elete) and i(nsert) options, and
    apparently doesn't support regular expressions. It does have a -u
    option to "unsqueeze" compressed files and was used mainly in
    conjunction with dif.com for source code maintenance.

    change, by Michael M. Rubenstein
       http://oak.oakland.edu/pub/cpm/txtutl/ttools.lbr

    Rubenstein probably felt that "sed" was an obscure name, so he
    renamed it CHANGE.COM (the TTOOLS.LBR archive member CHANGE.CZM is
    a "crunched" file). Unlike ssed, change supports full RE's except
    for grouping and backreferences, and its only function is for
    global substitution.

2.2.2. Shareware and Commercial versions

2.2.2.1. Unix platforms

       ** Information needed **

2.2.2.2. OS/2

       None known

2.2.2.3. Windows NT, Windows 95

    Interix:                                                                   |
       http://www.interix.com                                                  |

    Interix (formerly known as OpenNT) is advertised as "a complete            |
    UNIX system environment running natively on Microsoft Windows NT",         |
    and is licensed and supported by Softway Systems. It offers over           |
    200 Unix utilities, and supports Unix shells, sockets, networking,         |
    and more. A single-user edition runs about $200. A free demo or            |
    evaluation copy will run for 31 days and then quit; to continue            |
    using it, you must purchase the commercial version.                        |


    UnixDos:
       http://www.unixdos.com

    UnixDos is a suite of 82 Unix utilities ported over to the Windows
    environments. There are 16-bit versions for Win 3.1 and 32-bit
    versions for WinNT/Win95. It is distributed as uncrippled shareware
    for the first 30 days. After the test period, the utilities will
    not run and you must pay the registration fee of $50.

    Their version of sed supports "\n" in the RHS of expressions, and
    increases the length of input lines to 10,000 characters. By
    special arrangement with the owners, persons who want a licensed
    version of sed *only* (without the other utilities) may pay a
    license fee of $10.

2.2.2.4. MS-DOS

    MKS (Mortice Kern Systems) Toolkit
       http://www.mks.com

    Sed comes bundled with the MKS Toolkit, which is distributed only
    as commercial software; it is not available separately.

    Thompson Automation Software
       http://www.teleport.com/~thompson/

    The Thompson Toolkit contains over 100 familiar Unix utilities,
    including a version of the Unix Korn shell. It runs under MS-DOS,
    OS/2, Win 3.0/3.1, Win95, and WinNT. Sed is one of the utilities,
    though Thompson is better known for its version of awk for DOS,
    TAWK. The toolkit runs about $150; sed is not available separately.

2.3. Where can I learn to use sed?

2.3.1. Books

    _Sed & Awk, 2d edition_, by Dale Dougherty & Arnold Robbins
    (Sebastopol, Calif: O'Reilly and Associates, 1997)
    ISBN 1-56592-225-5
       http://www.oreilly.com/catalog/sed2/noframes.html

    About 40 percent of this book is devoted to sed, and maybe 50
    percent is devoted to awk. The other 10 percent is given to regular
    expressions and concepts which are common to both tools. If you
    prefer hard copy, this is definitely the best single place to learn
    to use sed, including its advanced features.

    The first edition is also very useful. Several typos crept into the
    first printing of the first edition (though if you follow the
    tutorials closely, you'll recognize them right away). A list of
    errors from the first printing of _sed & awk_ is available at
    <http://www.cs.colostate.edu/~dzubera/sedawk.txt> (most of these
    were corrected in subsequent printings). The second edition tells
    how POSIX standards have affected these tools and covers the
    popular GNU versions of sed and awk. Price is about (US) $30.00.

    -----

    _Mastering Regular Expressions_, by Jeffrey E. F. Friedl
    (Sebastopol, Calif: O'Reilly and Associates, 1997)
    ISBN 1-56592-257-3
       http://www.oreilly.com/catalog/regex/
       http://enterprise.ic.gc.ca/~jfriedl/regex/index.html

    Knowing how to use "regular expressions" is essential to effective
    use of most Unix tools. This book focuses on how regular
    expressions can be best implemented in utilities such as perl, vi,
    emacs, and awk, but also touches on sed. Friedl's home page
    (above) gives links to other sites which help students learn to
    master regular expressions. His site also gives a Perl script for
    determining a syntactically valid e-mail address, using regexes:
       http://enterprise.ic.gc.ca/~jfriedl/regex/email-opt.pl

    -----

    _Awk und Sed_, by Helmut Herold. (Bonn: Addison-Wesley, 1994)
    ISBN 3-89319-685-4
    VVA-Nr. 563-00685-8

    The text of this book is in German. Now out of print.

    -----

    _Linux-Unix-Profitools: awk, sed, lex, yacc und make_, by Helmut
    Herold. (Bonn: Addison-Wesley, 1998)
    ISBN 3-8273-1448-8

       http://www.addison-wesley.de:80/katalog/item.ppml?id=00262

    The text of this book is in German. (Comments from German-speaking
    reviewers appreciated!)

2.3.2. Mailing list

    The informal "seders" mailing list.  To join, send e-mail to

       af137@... (Al Aab)

    with a brief description of your interest.  Average mail volume
    is 15-25 messages per week. No digest form is available (yet).

2.3.3. Tutorials, electronic text

    The original users manual for sed, by Lee E. McMahon, from the
    7th edition UNIX Manual (1978), with the classic "Kubla Khan"
    example and tutorial, in formatted text format:
       http://www.urc.bl.ac.yu/manuals/progunix/sed.txt
       http://www.softlab.ntua.gr/unix/docs/sed.txt

    The source code to the preceding manual. Use "troff -ms sed" to
    print this file properly:
       http://plan9.bell-labs.com/7thEdMan/vol2/sed
       http://cm.bell-labs.com/7thEdMan/vol2/sed

    "Do It With Sed", by Carlos Duarte
       http://www.dbnet.ece.ntua.gr/~george/sed/sedtut_1.html
       http://seders.icheme.org/tutorials/sedtut_1.txt

    U-SEDIT2.ZIP, by Mike Arst (16 June 1990)
       http://wuarchive.wustl.edu/systems/ibmpc/garbo.uwasa.fi/editor/u-sedit2.zip
       ftp://ftp.cs.umu.se/pub/pc/u-sedit2.zip
       ftp://ftp.uni-stuttgart.de/pub/systems/msdos/util/unixlike/u-sedit2.zip
       ftp://sunsite.icm.edu.pl/vol/d2/garbo/pc/editor/u-sedit2.zip
       ftp://ftp.sogang.ac.kr/.1/msdos_garbo/editor/u-sedit2.zip

    U-SEDIT3.ZIP, by Mike Arst (24 Jan. 1992)
       http://www.cornerstonemag.com/sed/u-sedit3.zip
       CompuServe DTPFORUM, "PC DTP Utilities" library, file SEDDOC.ZIP

    sed-tutorial, by Felix von Leitner
       http://www.math.fu-berlin.de/~leitner/sed/tutorial.html

    "Manipulating text with sed," chapter 14 of the SCO OpenServer
    "Operating System Users Guide"
       http://dontask.caltech.edu:457/cgi-bin/printchapter/OSUserG/BOOKCHAPTER-14.html
       http://www.multisoft.it:457/OSUserG/_Manipulating_text_with_sed.html

    "Combining the Bourne-shell, sed and awk in the UNIX environment
    for language analysis," by Lothar M. Schmitt and Kiel T.
    Christianson. This basic tutorial on the Bourne shell, sed and awk
    downloads as a 71-page PostScript file (compressed to 290K with
    gzip). You may need to navigate down from the root to get the file.
       ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.tar.gz
       available upon request from Lothar Schmitt <lothar@...>

2.3.4. General web and ftp sites

    http://seders.icheme.org                      # Seders Grab Bag
    http://www.cis.nctu.edu.tw/~gis84806/sed/     # Yao-Jen Chang
    http://www.math.fu-berlin.de/~guckes/sed/     # Sven Guckes
    http://www.math.fu-berlin.de/~leitner/sed/    # Felix von Leitner
    http://www.dbnet.ece.ntua.gr/~george/sed/     # Yiorgos Adamopoulos
    http://www.cornerstonemag.com/sed/            # Eric Pement

    http://spacsun.rice.edu/FAQ/sed.html

    ftp://algos.inesc.pt/pub/users/cdua/scripts/sed (Carlos Duarte)
    ftp://algos.inesc.pt/pub/users/cdua/scripts/sh  (sed & shell script)

    "Handy One-Liners For Sed", compiled by Eric Pement. A large list
    of 1-line sed commands which can be executed from the command line.
    http://www.cornerstonemag.com/sed/sed1line.txt                             |
    http://seders.icheme.org/tutorials/sedtut_9.txt                            |
    http://www.dbnet.ece.ntua.gr/~george/sed/1liners.html

    The Single UNIX Specification, Version 2 (technical man page)
    http://www.rdg.opengroup.org/onlinepubs/7908799/xcu/sed.html

    AltaVista: Advanced Query "sed script"
    http://www.altavista.digital.com/cgi-bin/query?pg=aq&text=yes&what=web&kl=en&q=%22sed+script%22&r=sed&d0=2%2FSep%2F97Mar%2F86&d1=&act=search

    Getting started with sed
    http://ftp.uni-klu.ac.at/sed/sed.html

    Comments in sed
    http://www.bluesky.com.au:457/OSUserG/_Comments_in_sed.html

    "Using sed"
    http://www.multisoft.it:457/OSUserG/_Using_sed_main.html

    masm to gas converter
    http://www.delorie.com/djgpp/faq/converting/asm2s-sed.html

    HotBot results: "sed script" (101+)
   
http://www.hotbot.com/IU0WscUF5E02D2EA1554B98A996AAEA614A1E63E/?act.next=Next&MT\
=%22sed%20script%22&RG=NA&DC=100&_v=2

    mail2html.zip
    http://hiwaay.net/~crispen/src/mail2html.zip

    customize VIM to aid writing sed scripts
    http://www.fys.uio.no/~hakonrk/vim/syntax/sed.vim

------------------------------

3. TECHNICAL

3.1. More detailed explanation of basic sed

    Sed takes a script of editing commands and applies each command, in
    order, to each line of input. After all the commands have been
    applied to the first line of input, that line is output. A second
    input line is taken for processing, and the cycle repeats. Sed
    scripts can address a single line by line number or by matching a
    /RE pattern/ on the line. An exclamation mark '!' after a regex
    ('/RE/!') or line number will select all lines that do NOT match
    that address. Sed can also address a range of lines in the same
    manner, using a comma to separate the 2 addresses.

       $d               # delete the last line of the file
       /[0-9]\{3\}/p    # print lines with 3 consecutive digits
       5!s/ham/cheese/  # except on line 5, replace 'ham' with 'cheese'
       /awk/!s/aaa/bb/  # unless 'awk' is found, replace 'aaa' with 'bb'
       17,/foo/d        # delete all lines from line 17 up to 'foo'

    Following an address or address range, sed accepts curly braces
    '{...}' so several commands may be applied to that line or to the
    lines matched by the address range. On the command line, semicolons
    ';' separate each instruction and must precede the closing brace.

       sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}' file
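    To see the grouped commands at work, the same script can be fed a
    couple of lines. The input text here is invented for the sketch.
    Note that the longest word is replaced first, so "yours" is never
    touched by the later 's/your/my/g'.

```shell
# Sample input is made up for this illustration.
out=$(printf '%s\n' 'Owner: you said your car is yours' 'Guest: you keep yours' |
      sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}')
printf '%s\n' "$out"
```

    Only the line containing "Owner:" is changed; the "Guest:" line
    passes through untouched.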

    Range addresses operate differently depending on which version of
    sed is used (see section 6.8.5, below). For further information on
    using sed, consult the references in section 2.3, above. The online
    manual ("man pages") on Unix/Linux systems may be helpful (try "man
    sed"), but man pages are notoriously obscure for first-time users.

3.2. Common one-line sed scripts

    A separate document of over 70 handy "one-line" sed commands is
    available at <http://www.cornerstonemag.com/sed/sed1line.txt>. Here
    are fourteen of the most common sed commands for one-line use.
    MS-DOS users should replace single quotes ('...') with double
    quotes ("...") in these examples. A specific filename ("file")
    usually follows the script, though the input may also come via
    piping ("sort somefile | sed 'somescript'").

    # 1. Double space a file
    sed G file

    # 2. Triple space a file
    sed 'G;G' file

    # 3. Under UNIX: convert DOS newlines (CR/LF) to Unix format
    sed 's/.$//' file    # assumes that all lines end with CR/LF
    sed 's/^M$//' file   # in bash/tcsh, press Ctrl-V then Ctrl-M

    # 4. Under DOS: convert Unix newlines (LF) to DOS format
    sed 's/$//' file                     # method 1
    sed -n p file                        # method 2

    # 5. Delete leading whitespace (spaces/tabs) from front of each line
    # (this aligns all text flush left). '^t' represents a true tab
    # character. Under bash or tcsh, press Ctrl-V then Ctrl-I.
    sed 's/^[ ^t]*//' file

    # 6. Delete trailing whitespace (spaces/tabs) from end of each line
    sed 's/[ ^t]*$//' file               # see note on '^t', above

    # 7. Delete BOTH leading and trailing whitespace from each line
    sed 's/^[ ^t]*//;s/[ ^t]*$//' file   # see note on '^t', above

    # 8. Substitute "foo" with "bar" on each line
    sed 's/foo/bar/' file        # replaces only 1st instance in a line
    sed 's/foo/bar/4' file       # replaces only 4th instance in a line
    sed 's/foo/bar/g' file       # replaces ALL instances within a line

    # 9. Substitute "foo" with "bar" ONLY for lines which contain "baz"
    sed '/baz/s/foo/bar/g' file

    # 10. Delete all CONSECUTIVE blank lines from file except the first.
    # This method also deletes all blank lines from top and end of file.
    # (emulates "cat -s")
    sed '/./,/^$/!d' file       # this allows 0 blanks at top, 1 at EOF
    sed '/^$/N;/\n$/D' file     # this allows 1 blank at top, 0 at EOF

    # 11. Delete all leading blank lines at top of file (only).
    sed '/./,$!d' file

    # 12. Delete all trailing blank lines at end of file (only).
    sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' file

    # 13. If a line ends with a backslash, join the next line to it.
    sed -e :a -e '/\\$/N; s/\\\n//; ta' file

    # 14. If a line begins with an equal sign, append it to the
    # previous line (and replace the "=" with a single space).
    sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
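    As a rough check of one-liners 13 and 14, the sketch below runs
    them on a few invented input lines. The variable names are ours,
    not part of the one-liners.

```shell
# Input lines are invented for this sketch.
# 13: join a line ending in a backslash with the line that follows it
joined=$(printf 'one \\\ntwo \\\nthree\nfour\n' |
         sed -e :a -e '/\\$/N; s/\\\n//; ta')

# 14: append a line starting with "=" to the previous line
appended=$(printf 'alpha\n=beta\ngamma\n' |
           sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D')

printf '%s\n' "$joined" "$appended"
```

    In the first case the three continued lines collapse into one; in
    the second, "=beta" is attached to "alpha" with a single space.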

3.3. Addressing and address ranges

    Sed commands may have an optional "address" or "address range"
    prefix. If there is no address or address range given, then the
    command is applied to all the lines of the input file or text
    stream. Three commands cannot take an address prefix:

    - labels, used to branch or jump within the script
    - the close brace, '}', which ends the '{' "command"
    - the '#' comment character, also technically a "command"

    An address can be a line number (such as 1, 5, 37, etc.), a regular
    expression (written in the form /RE/ or \xREx where 'x' is any
    character other than '\' and RE is the regular expression), or the
    dollar sign ($), representing the last line of the file. An
    exclamation mark (!) after an address or address range will apply
    the command to every line EXCEPT the ones named by the address. A
    null regex ("//") will be replaced by the last regex which was
    used. Also, some seds do not support \xREx as regex delimiters.

       5d               # delete line 5 only
       5!d              # delete every line except line 5
       /RE/s/LHS/RHS/g  # substitute only if RE occurs on the line
       /^$/b label      # if the line is blank, branch to ':label'
       /./!b label      # ... another way to write the same command
       \%.%!b label     # ... yet another way to write this command
       $!N              # on all lines but the last, get the Next line

    Note that an embedded newline can be represented in an address by
    the symbol \n, but this syntax is needed only if the script puts 2
    or more lines into the pattern space via the N, G, or other
    commands. The \n symbol does *not* match the newline at an
    end-of-line because when sed reads each line into the pattern space
    for processing, it strips off the trailing newline, processes the
    line, and adds a newline back when printing the line to standard
    output. To match the end-of-line, use the '$' metacharacter, as
    follows:

       /tape$/       # matches the word 'tape' at the end of a line
       /tape$deck/   # matches the word 'tape$deck' with a literal '$'
       /tape\ndeck/  # matches 'tape' and 'deck' with a newline between
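    The difference can be checked with a two-line input (invented for
    this sketch). Without N, each line is tested on its own and '\n'
    never matches; once N has pulled the second line into the pattern
    space, the embedded newline is there to be matched.

```shell
# Each line is examined separately, so the address never matches.
single=$(printf 'tape\ndeck\n' | sed -n '/tape\ndeck/p')

# N joins both lines in the pattern space with an embedded newline.
joined=$(printf 'tape\ndeck\n' | sed -n 'N;/tape\ndeck/p')

printf 'without N: [%s]\nwith N: [%s]\n' "$single" "$joined"
```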

    The following sed commands usually accept *only* a single address.
    All other commands (except labels, '}', and '#') accept both single
    addresses and address ranges.

       =       print to stdout the line number of the current line
       a       after printing the current line, append "text" to stdout
       i       before printing the current line, insert "text" to stdout
       q       quit after the current line is matched
       r file  prints contents of "file" to stdout after line is matched

    Note that we said "usually." If you need to apply the '=', 'a',
    'i', or 'r' commands to each and every line within an address
    range, this behavior can be coerced by the use of braces. Thus,
    "1,9=" is an invalid command, but "1,9{=;}" will print each line
    number followed by its line for the first 9 lines (and then print
    the rest of the file normally).
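    A scaled-down version of the braces trick, using a range of 2
    lines on a 3-line input (both chosen only for illustration):

```shell
# Number the first two lines only; the third is printed as-is.
out=$(printf 'a\nb\nc\n' | sed '1,2{=;}')
printf '%s\n' "$out"
```

    Each of the first two lines is preceded by its line number on a
    line of its own.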

    Address ranges occur in the form

       <address1>,<address2>    or    <address1>,<address2>!

    where the address can be a line number or a standard /regex/.
    <address2> can also be a dollar sign, indicating the end of file.
    Under HHsed and gsed302a, <address2> may also be a notation of the
    form +num, indicating the next _num_ lines after <address1> is
    matched.

    Address ranges are:

    (1) Inclusive. The range "/From here/,/eternity/" matches all the
    lines containing "From here" up to and including the line
    containing "eternity". It will not stop on the line just prior to
    "eternity". (If you don't like this, see section 4.15.)

    (2) Plenary. They always match full lines, not just parts of lines.
    In other words, a command to change or delete an address range will
    change or delete whole lines; it won't stop in the middle of a
    line.

    (3) Multilinear. Address ranges normally match 2 lines or more. The
    second address will never match the same line the first address
    did; therefore a valid address range always spans at least two
    lines, with these exceptions which match only one line:

    - if the first address matches the last line of the file
    - if using the syntax "/RE/,3" and /RE/ occurs only once in the
      file at line 3 or below
    - if using HHsed v1.5. See section 6.8.5.

    (4) Minimalist. In address ranges with /regex/ as <address2>, the
    range "/foo/,/bar/" will stop at the first "bar" it finds, provided
    that "bar" occurs on a line below "foo". If the word "bar" occurs
    on several lines below the word "foo", the range will match all the
    lines from the first "foo" up to the first "bar". It will not
    continue hopping ahead to find more "bar"s. In other words, unlike
    regular expressions, address ranges are not "greedy."

    (5) Repeating. An address range will try to match more than one
    block of lines in a file. However, the blocks cannot nest. In
    addition, a second match will not "take" the last line of the
    previous block.  For example, given the following text,

       start
       stop  start
       stop

    the sed command '/start/,/stop/d' will only delete the first two
    lines. It will not delete all 3 lines.
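    The three-line text above can be fed to sed directly to confirm
    this:

```shell
# The second line closes the first block; its "start" does not open
# a new block, so the final "stop" line survives.
out=$(printf 'start\nstop  start\nstop\n' | sed '/start/,/stop/d')
printf '%s\n' "$out"
```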

    (6) Relentless. If the address range finds a "start" match but
    doesn't find a "stop", it will match every line from "start" to the
    end of the file. Thus, beware of the following behaviors:

       /RE1/,/RE2/  # if /RE2/ is not found, matches from /RE1/ to the
                    # end-of-file

       20,/RE/      # if /RE/ is not found, matches from line 20 to the
                    # end-of-file

       /RE/,30      # if /RE/ occurs any time after line 30, each
                    # occurrence will be matched in HHsed, sedmod, and
                    # gsed302. GNU sed v2.05 and 1.18 will match from
                    # the 2nd occurrence of /RE/ to the end-of-file.

    If these behaviors seem strange, remember that they occur because
    sed does not look "ahead" in the file. Doing so would stop sed from
    being a stream editor and have adverse effects on its efficiency.
    If these behaviors are undesirable, they can be circumvented or
    corrected by the use of nested testing within braces. The following
    scripts work under GNU sed 3.02:

       # Execute your_commands on range "/RE1/,/RE2/", but if /RE2/ is
       # not found, do nothing.
       /RE1/{:a;N;/RE2/!ba;your_commands;}

       # Execute your_commands on range "20,/RE/", but if /RE/ is not
       # found, do nothing.
       20{:a;N;/RE/!ba;your_commands;}

    As a side note, once we've used N to "slurp" lines together to test
    for the ending expression, the pattern space will have gathered
    many lines (possibly thousands) together and concatenated them as a
    single expression, with the \n sequence marking line breaks. The
    REs *within* the pattern space may have to be modified (e.g., you
    must write '/\nStart/' instead of '/^Start/' and '/[^\n]*/' instead
    of '/.*/') and other standard sed commands will be unavailable or
    difficult to use.
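    A minimal sketch of the gathering technique follows. The markers
    BEGIN and END stand in for RE1 and RE2, and joining the gathered
    lines with spaces stands in for "your_commands"; all three are
    assumptions made for the example. The script is split across
    several -e options so that the label and the closing brace each
    end a script fragment, which more seds accept than the one-line
    form.

```shell
# Gather lines from BEGIN through END, then act on the whole block.
out=$(printf 'a\nBEGIN\nx\nEND\nb\n' |
      sed -e '/BEGIN/{' -e ':a' -e 'N' -e '/END/!ba' \
          -e 's/\n/ /g' -e '}')
printf '%s\n' "$out"
```

    Because the block sits in the pattern space as one string, the
    's/\n/ /g' sees the embedded newlines. What happens to the
    gathered lines when END never appears depends on how a given sed
    handles N at end-of-file, so test that case on your own version.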

       # Execute your_commands on range "/RE/,30", but if /RE/ occurs
       # on line 31 or later, do not match it.
       1,30{/RE/,$ your_commands;}
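    The same nesting, scaled down to "1,3" on a five-line input so the
    effect is easy to see. The input and the 's/^/> /' command are
    placeholders chosen for this sketch.

```shell
# The inner range is only evaluated on lines 1-3, so the RE on
# line 5 is never matched.
out=$(printf 'a\nRE\nb\nc\nRE\n' |
      sed -e '1,3{' -e '/RE/,$s/^/> /' -e '}')
printf '%s\n' "$out"
```

    Lines 2 and 3 are marked; lines 4 and 5 are printed unchanged even
    though line 5 contains RE.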

    For related suggestions on using address ranges, see sections 4.2,
    4.15, and 4.19 of this FAQ. Note that HHsed contains a bug or
    nonstandard feature in how it implements address ranges; also, GNU
    sed 3.02a supports a zero (0) in addressing. For more details, see
    section 6.8.5 ("Range addressing with GNU sed and HHsed").

3.4. [reserved]

3.5. [reserved]

3.6. [reserved]

3.7. GNU/POSIX extensions to regular expressions

    GNU sed supports "character classes" in addition to regular
    character sets, such as [0-9A-F]. Like regular character sets,
    character classes represent any single character within a set.

    "Character classes are a new feature introduced in the POSIX
    standard. A character class is a special notation for describing
    lists of characters that have a specific attribute, but where the
    actual characters themselves can vary from country to country
    and/or from character set to character set. For example, the notion
    of what is an alphabetic character differs in the USA and in
    France." [quoted from the docs for GNU awk v3.0.3]

    Though character classes don't generally conserve space on the
    line, they help make scripts portable for international use. The
    equivalent character sets *for U.S. users* follow:

       [[:alnum:]]  - [A-Za-z0-9]     Alphanumeric characters
       [[:alpha:]]  - [A-Za-z]        Alphabetic characters
       [[:blank:]]  - [ \x09]         Space or tab characters only
       [[:cntrl:]]  - [\x00-\x1F\x7F] Control characters
       [[:digit:]]  - [0-9]           Numeric characters
       [[:graph:]]  - [!-~]           Printable and visible characters
       [[:lower:]]  - [a-z]           Lower-case alphabetic characters
       [[:print:]]  - [ -~]           Printable (non-Control) characters
       [[:punct:]]  - [!-/:-@[-`{-~]  Punctuation characters
       [[:space:]]  - [ \t\n\r\f\v]   All whitespace chars
       [[:upper:]]  - [A-Z]           Upper-case alphabetic characters
       [[:xdigit:]] - [0-9a-fA-F]     Hexadecimal digit characters

    Note that [[:graph:]] does not match the space " ", but [[:print:]]
    does. Some character classes may (or may not) match characters in
    the high ASCII range (ASCII 128-255 or 0x80-0xFF), depending on
    which C library was used to compile sed. For non-English languages,
    [[:alpha:]] and other classes may also match high ASCII characters.
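    A short sketch of character classes in use, assuming a sed that
    supports them (GNU sed does). The input line is invented.

```shell
# Strip trailing whitespace, then mask every digit with '#'.
out=$(printf 'id 4711 \t \n' |
      sed 's/[[:space:]]*$//; s/[[:digit:]]/#/g')
printf '[%s]\n' "$out"
```

    Note the doubled brackets: the class name '[:space:]' must itself
    appear inside a bracket expression.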

------------------------------

4. EXAMPLES

4.1. How do I perform a case-insensitive search?

    Use GNU sed v3.02 with the I flag ("/regex/I" or "s/LHS/RHS/I").
    Or use sedmod with the -i switch on the command line. With other
    versions of sed this is not easy to do, so some people use GNU awk
    (gawk), mawk, or perl instead. In gawk, use "BEGIN {IGNORECASE=1}"
    (IGNORECASE is a gawk extension; in mawk, test tolower($0) against
    a lower-case regex), and in perl, "/regex/i". For sed, here are
    three solutions:

    Solution 1: convert everything to upper case and search normally

       # sed script, solution 1
       h;          # copy the original line to the hold space
                   # convert the pattern space to solid caps
       y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
                   # now we can search for the word "CARLOS"
       /CARLOS/ {
            # add or insert lines. Note: "s/.../.../" will not work
            # here because we are searching a modified pattern
            # space and are not printing the pattern space.
       }
       x;          # get back the original pattern space
                   # the original pattern space will be printed
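    One way to fill in the braces is sketched below: it prints, in
    their original spelling, only the lines that contain "carlos" in
    any mix of upper and lower case. The use of -n with 'x;p' is our
    choice for the illustration, and the input is invented.

```shell
out=$(printf 'Carlos Duarte\nnobody here\nmail to carLOS\n' |
      sed -n '
# keep a copy of the original line in the hold space
h
# fold the pattern space to upper case
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
# on a match, swap the original back in and print it
/CARLOS/{
x
p
}')
printf '%s\n' "$out"
```

    The upper-cased copy is left in the hold space after the swap,
    which is harmless here because 'h' overwrites it on the next line.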

    Solution 2: search for both cases

    Often, proper names will occur in all lower-case ("unix"), with an
    initial capital letter ("Unix"), or in solid caps ("UNIX"). There
    may be no need to search for every possibility.

       /UNIX/b match
       /[Uu]nix/b match

    Solution 3: search for all possible cases

       # If all else fails, search for any possible combination
       /[Cc][Aa][Rr][Ll][Oo][Ss]/...

    Bear in mind that as the pattern grows longer, this solution can
    become an order of magnitude slower than Solution 1, at least with
    some implementations of sed.

4.2. How do I make changes in only part of a file?

    Select parts of a file for changing by naming a range of lines
    either by number (e.g., lines 1-20), by RE (between the words "foo"
    and "bar"), or by some combination of the two. For multiple
    changes, put the substitution command between braces {...}.

       # replace only between lines 1 and 20
       1,20 s/Johnson/White/g

       # replace everywhere EXCEPT between lines 1 and 20
       1,20 !s/Johnson/White/g

       # replace only between words "foo" and "bar"
       /foo/,/bar/ { s/Johnson/White/g; s/Smith/Wesson/g; }

       # replace only from the words "ENDNOTES:" to the end of file
       /ENDNOTES:/,$ { s/Schaff/Herzog/g; s/Kraft/Ebbing/g; }

    For technical details on using address ranges, see section 3.3
    ("Addressing and Address ranges").

4.3. How do I change only the first occurrence of a pattern?

    To replace the regex "LHS" with "RHS", do this:

       gsed '0,/LHS/s//RHS/'                       # GNU sed 3.02a
       sed -e '1s/LHS/RHS/;t' -e '1,/LHS/s//RHS/'  # other seds

    If you know the pattern *won't* occur on the first line, omit the
    first -e and the statement following it.
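    The portable form can be tried on two small inputs, one with the
    pattern on line 1 and one without. The function name
    first_occurrence is only a wrapper invented for this sketch.

```shell
# Hypothetical wrapper around the two-part script for "other seds".
first_occurrence() {
    sed -e '1s/LHS/RHS/;t' -e '1,/LHS/s//RHS/'
}

on_line_1=$(printf 'LHS 1\nLHS 2\nLHS 3\n' | first_occurrence)
later=$(printf 'x\nLHS 2\nLHS 3\n' | first_occurrence)
printf '%s\n--\n%s\n' "$on_line_1" "$later"
```

    In both cases only the first "LHS" should be changed. The 't'
    after the first substitution skips the range command when line 1
    has already been handled.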

4.4. How do I make substitutions in every file in a directory, or in a
      complete directory tree?

4.4.1. - Perl solution

    (Yes, we know this is a FAQ file for sed, not perl, but the
    solution is so simple that it has to be noted. Also, perl and
    sed share a very similar syntax here.)

       perl -pi.bak -e 's|foo|bar|g' filelist

    For each file in the filelist, perl renames the source file to
    "filename.bak"; the modified file gets the original filename.
    Change '-pi.bak' to '-pi' if you don't need backup copies. (Note
    the use of s||| instead of s/// here, and in the scripts below.
    The vertical bars in the 's' command let you replace '/some/path'
    with '/another/path', accommodating slashes in the LHS and RHS.)

4.4.2. - Unix solution

    For all files in a single directory, assuming they end with *.txt
    and you have no files named "[anything].txt.bak" already, use a
    shell script:

       #! /bin/sh
       # Source files are saved as "filename.txt.bak" in case of error
       # The '&&' after cp is an additional safety feature
       # Variables are quoted so filenames containing spaces still work
       for file in *.txt
       do
          cp "$file" "$file.bak" &&
          sed 's|foo|bar|g' "$file.bak" >"$file"
       done

    To do an entire directory tree, use the Unix utility find, like so
    (thanks to Jim Dennis <jadestar@...> for this script):

       #! /bin/sh
       # filename: replaceall
       find . -type f -name '*.txt' -print | while read i
       do
          sed 's|foo|bar|g' "$i" > "$i.tmp" && mv "$i.tmp" "$i"
       done

    This previous shell script recurses through the directory tree,
    finding only files in the directory (not symbolic links, which will
    be encountered by the shell command "for file in *.txt", above). To
    preserve file permissions and make backup copies, use the 2-line cp
    routine of the earlier script instead of "sed ... && mv ...". By
    replacing the sed command 's|foo|bar|g' with something like

       sed "s|$1|$2|g" "${i}.bak" > "$i"

    using double quotes instead of single quotes, the user can also
    employ positional parameters on the shell script command tail, thus
    reusing the script from time to time. For example,

       replaceall East West

    would modify all your *.txt files in the current directory tree.
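    Putting the pieces together, the sketch below wraps the find loop
    and the cp backup routine in a shell function and tries it on a
    throwaway directory. The function layout, the third argument (the
    directory to start from) and the use of mktemp are assumptions
    made for this example. Since $1 and $2 are pasted into the sed
    script as-is, a '|' or a regex metacharacter in either argument
    will be interpreted by sed.

```shell
# Hypothetical helper: $1 = regex, $2 = replacement, $3 = top directory
replaceall() {
    find "$3" -type f -name '*.txt' -print | while IFS= read -r f
    do
        # keep a backup, then rewrite the original from it
        cp "$f" "$f.bak" &&
        sed "s|$1|$2|g" "$f.bak" >"$f"
    done
}

# Scratch files, invented for the demonstration
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
printf 'East side\n' >"$demo_dir/a.txt"
printf 'Far East\n'  >"$demo_dir/sub/b.txt"

replaceall East West "$demo_dir"
cat "$demo_dir/a.txt" "$demo_dir/sub/b.txt"
```

    The originals are preserved as a.txt.bak and sub/b.txt.bak.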

4.4.3. - DOS solution:

    MS-DOS users should use two batch files like this:

       @echo off
       :: MS-DOS filename: REPLACE.BAT
       ::
       :: Create a destination directory to put the new files.
       :: Note: The next command will fail under Novell NetWare
       :: below version 4.10 unless "SHOW DOTS=ON" is active.
       if not exist .\NEWFILES\NUL mkdir NEWFILES
       for %%f in (*.txt) do CALL REPL_2.BAT %%f
       echo Done!!
       :: =======End of the first batch file====

       @echo off
       :: MS-DOS filename: REPL_2.BAT
       ::
       sed "s/foo/bar/g" %1 > NEWFILES\%1
       :: =======End of the second batch file===

    When finished, the current directory contains all the original
    files, and the newly-created NEWFILES subdirectory contains the
    modified *.TXT files. Do not attempt a command like

       for %%f in (*.txt) do sed "s/foo/bar/g" %%f >NEWFILES\%%f

    under any version of MS-DOS because the output filename will be
    created as a literal '%f' in the NEWFILES directory before the
    %%f is expanded to become each filename in (*.txt). This occurs
    because MS-DOS creates output filenames via redirection commands
    before it expands "for..in..do" variables.

    To recurse through an entire directory tree in MS-DOS requires a
    batch file more complex than we have room to describe. Examine the
    file SWEEP.BAT in Timo Salmi's great archive of batch tricks,
    TSBAT58.ZIP, located at <ftp://garbo.uwasa.fi/pc/ts/tsbat58.zip>,
    or get an external program designed for directory recursion. Here
    are some recommended programs for directory recursion:
       http://www.geocities.com/SiliconValley/Lakes/1888/forall.zip
       http://www.geocities.com/SiliconValley/Lakes/2414/fortn711.zip
       http://garbo.uwasa.fi/pc/filefind/target15.zip

4.5. How do I parse a comma-delimited data file?

    Comma-delimited data files can come in several forms, requiring
    increasing levels of complexity in parsing and handling:

    (a) No quotes, no internal commas

       1001,John Smith,PO Box 123,Chicago,IL,60699
       1002,Mary Jones,320 Main,Denver,CO,84100

    (b) Like (a), with quotes around each field

       "1003","John Smith","PO Box 123","Chicago","IL","60699"
       "1004","Mary Jones","320 Main","Denver","CO","84100"

    (c) Like (b), with embedded commas

       "1005","Tom Hall, Jr.","61 Ash Ct.","Niles","OH","44446"
       "1006","Bob Davis","429 Pine, Apt. 5","Boston","MA","02128"

    (d) Like (c), with embedded commas and quotes

       "1007","Sue "Red" Smith","19 Main","Troy","MI","48055"
       "1008","Joe "Hey, guy!" Hall","POB 44","Reno","NV","89504"

    In each example above, we have 6 fields and 5 commas which function
    as field separators. Case (c) is a very typical form of these data
    files, with double quotes used to enclose each field and to protect
    internal commas (such as "Tom Hall, Jr.") from interpretation as
    field separators. However, many times the data may include both
    embedded quotation marks as well as embedded commas, as seen by
    case (d), above.

    Before handling a comma-delimited data file, make sure that you
    fully understand its format and check the integrity of the data.
    Does each line contain the same number of fields? Should certain
    fields be composed only of numbers or of two-letter state
    abbreviations in all caps? Sed (or awk or perl) should be used to
    validate the integrity of the data file before you attempt to alter
    it or extract particular fields from the file.
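    As one possible integrity check for case (a), the sketch below
    prints every record that does not consist of exactly six
    comma-separated fields (five separating commas, as in the first
    sample record). The second input line is deliberately malformed
    and was invented for the test.

```shell
# Print only the records whose field count is wrong.
bad=$(printf '%s\n' '1001,John Smith,PO Box 123,Chicago,IL,60699' \
                    '1002,Mary Jones,Denver,CO' |
      sed -n '/^[^,]*\(,[^,]*\)\{5\}$/!p')
printf '%s\n' "$bad"
```

    An empty result means every line has the expected number of
    fields. This check does not apply to cases (c) and (d), where
    commas may occur inside a field.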

    After ensuring that each line has a valid number of fields, use sed
    to locate and modify individual fields, using the \(...\) grouping
    command where needed.

    In case (a):

       sed 's/^[^,]*,[^,]*,[^,]*,[^,]*,/.../'
               ^     ^     ^
               |     |     |_ 3rd field
               |     |_______ 2nd field
               |_____________ 1st field

       # Unix script to delete the second field for case (a)
       sed 's/^\([^,]*\),[^,]*,/\1,,/' file

       # Unix script to change field 1 to 9999 for case (a)
       sed 's/^[^,]*,/9999,/' file

    In cases (b) and (c):

       sed 's/^"[^"]*","[^"]*","[^"]*","[^"]*",/.../'
                1st--   2nd--   3rd--   4th--

       # Unix script to delete the second field for case (c)
       sed 's/^\("[^"]*"\),"[^"]*",/\1,"",/' file

       # Unix script to change field 1 to 9999 for case (c)
       sed 's/^"[^"]*",/"9999",/' file

    In case (d):

    One way to parse such files is to replace the 3-character field
    separator "," with an unused character like the tab or vertical
    bar. (Technically, the field separator is only the comma while the
    fields are surrounded by "double quotes", but the net _effect_ is
    that fields are separated by quote-comma-quote, with quote
    characters added to the beginning and end of each record.) Search
    your datafile _first_ to make sure that your character appears
    nowhere in it!

       sed -n '/|/p' file        # search for any instance of '|'
       # if it's not found, we can use the '|' to separate fields

    Then replace the 3-character field separator and parse as before:

       # sed script to delete the second field for case (d)
       s/","/|/g;                  # global change of "," to bar
       s/^\([^|]*\)|[^|]*|/\1||/;  # delete 2nd field
       s/|/","/g;                  # global change of bar back to ","

       # sed script to change field 1 to 9999 for case (d)
       # Remember to accommodate leading and trailing quote marks
       s/","/|/g;
       s/^[^|]*|/"9999|/;
       s/|/","/g;
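    Run as a one-liner on a sample record from case (d), the
    delete-second-field script behaves as follows. The three commands
    are simply joined with semicolons; note the '*' after '[^|]',
    without which the second field would only match if it were a
    single character.

```shell
# Swap "," for |, empty the second field, then swap back.
out=$(printf '%s\n' '"1007","Sue "Red" Smith","19 Main","Troy","MI","48055"' |
      sed 's/","/|/g; s/^\([^|]*\)|[^|]*|/\1||/; s/|/","/g')
printf '%s\n' "$out"
```

    The embedded quotes around "Red" cause no trouble because they are
    never adjacent to a comma.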

    Note that this technique works only if _each_ and _every_ field is
    surrounded with double quotes, including empty fields. If your
    datafile does not look like case (d), above, or if it omits quote
    marks around empty fields or numeric values, then the complexity of
    the script would probably not be worth the effort to write it in
    sed. For such a case, you should use perl. This question is
    addressed in the Perl FAQ, at question 4.28: "How can I split a
    [character] delimited string except when inside [character]?"

4.6. How do I insert a newline into the RHS of a substitution?

    Five versions of sed permit '\n' to be typed directly into the RHS,
    which is then converted to a newline on output: HHsed (aka sed15),
    sedmod, gsed103 (with the -x switch), gsed302a, and UnixDOS sed.
    The _easiest_ solution is to use one of these versions.

    For other versions of sed, try one of the following:

    (a) Insert an unused character and pipe the output through tr:

       echo twolines | sed 's/two/& new=/' | tr "=" "\n"   # produces
       two new
       lines

    (b) Use two backslashes (\\) from the shell prompt. Using bash:

       [bash-prompt]$ echo twolines | sed "s/two/& new\\
       >/"
       two new
       lines
       [bash-prompt]$

    (c) Write a multi-line script and use the backslash (\) in the
    middle of the "replace" portion:

       sed -f newline.sed files

       # newline.sed
       s/twolines/two new\
       lines/g

    Some versions of sed may not need the trailing backslash. If so,
    remove it.
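    The backslash-newline form also works inside single quotes on the
    command line, which avoids the doubled backslash of method (b).
    A minimal sketch using the same "twolines" input:

```shell
# The backslash at the end of the first line escapes the newline,
# which then becomes part of the replacement text.
out=$(echo twolines | sed 's/two/& new\
/')
printf '%s\n' "$out"
```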

    (d) Use the "G" command:

    G appends a newline, plus the contents of the hold space to the end
    of the pattern space. If the hold space is empty, a newline is
    appended anyway. The newline is stored in the pattern space as "\n"
    where it can be addressed by grouping "\(...\)" and moved in the
    RHS. Thus, to change the "twolines" example used earlier, the
    following script will work:

       sed '/twolines/{G;s/\(two\)\(lines\)\(\n\)/\1\3\2/;}'
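    Applied to the one-word input used earlier, and assuming the hold
    space is empty, the script can be checked like this:

```shell
# G supplies the newline; the s command moves it between the words.
out=$(echo twolines |
      sed '/twolines/{G;s/\(two\)\(lines\)\(\n\)/\1\3\2/;}')
printf '%s\n' "$out"
```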

    (e) Inserting full lines, not breaking lines up:

    If one is not *changing* lines but only inserting complete lines
    before or after a pattern, the procedure is much easier. Use the
    "i" (insert) or "a" (append) command, making the alterations by an
    external script. To insert "This line is new" BEFORE each line
    matching a regex:

       /RE/i This line is new               # HHsed, sedmod, gsed 3.02a
       /RE/{x;s/.*/This line is new/;G;}    # other seds

    To append "This line is new" AFTER each line matching a regex:

       /RE/a This line is new               # HHsed, sedmod, gsed 3.02a
       /RE/{G;s/$/This line is new/;}       # other seds

    To append 2 blank lines after each line matching a regex:

       /RE/{G;G;}                    # assumes the hold space is empty

    To replace each line matching a regex with 5 blank lines:

       /RE/{s/.*//;G;G;G;G;}         # assumes the hold space is empty
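    Two of these can be tried from the shell as follows. The first
    command uses the traditional two-line spelling of "i" (a backslash
    after the command, with the text on the next line), which we use
    here instead of the one-line form shown above because it is the
    form described by POSIX. The input lines are invented.

```shell
# Insert a full line BEFORE each line matching RE.
inserted=$(printf 'a\nRE\nb\n' | sed '/RE/i\
This line is new')

# Append 2 blank lines AFTER each match (hold space must be empty).
spaced=$(printf 'RE\nb\n' | sed '/RE/{G;G;}')

printf '%s\n--\n%s\n' "$inserted" "$spaced"
```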

    (f) Use the "y///" command if possible:

    On some Unix versions of sed (not GNU sed!), though the s///
    command won't accept '\n' in the RHS, the y/// command does. If
    your Unix sed supports it, a newline after "aaa" can be inserted
    this way (which is not portable to GNU sed or other seds):

       s/aaa/&~/; y/~/\n/;    # assuming no other '~' is on the line!

4.7. How do I represent control-codes or nonprintable characters?

    For HHsed v1.5 by Howard Helman, hex codes can be represented
    on either the LHS or the RHS by the syntax \xNN, where "NN" are
    two valid hex numbers. (GNU sed does not support hex or octal
    notation.)

    Be forewarned that sed is not intended to process binary or object
    code, and also that files which contain nulls (0x00) will usually
    generate errors in most versions of sed (GNU sed 3.02a is an
    exception; it allows nulls in the input files and also in regexes).

    On Unix platforms, the 'echo' command may allow insertion of octal
    or hex values, e.g., `echo "\0nnn"` or `echo -n "\0nnn"`. The echo
    command may also support syntax like '\\b' or '\\t' for backspace
    or tab characters. Check the man pages to see what syntax your
    version of echo supports. Some versions support the following:

       # replace 0x1A (32 octal) with ASCII letters
       sed 's/'`echo "\032"`'/Ctrl-Z/g'

       # note the 3 backslashes in the command below
       sed "s/.`echo \\\b`//g"
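    Where echo's handling of escapes is uncertain, the printf utility
    is a possible substitute, since it interprets octal escapes of the
    form '\032' in its format string. The sketch below is our
    variation on the first example, not part of the original recipe;
    the variable name ctrl_z is made up.

```shell
# Build the control character once, then splice it into the script.
ctrl_z=$(printf '\032')
out=$(printf 'A\032B\n' | sed "s/$ctrl_z/Ctrl-Z/g")
printf '%s\n' "$out"
```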

4.8. How do I read environment variables with sed?

4.8.1. - on Unix platforms

    In Unix, environment variables are words which begin with a dollar
    sign, such as $TERM, $HOME, $user, or $path.  In sed, the dollar
    sign is used to indicate the last line of the input file, the end
    of a line (in the LHS), or a literal symbol (in the RHS). Sed
    cannot access variables directly, so one must pay attention to
    shell quoting requirements to expand the variables properly.

    To ALLOW the Unix shell to interpret the dollar sign (replacing it
    with an environment variable), put the script in double quotes:

       sed "s/_terminal-type_/$TERM/g" input.file >output.file

    To PREVENT the Unix shell from interpreting the dollar sign
    (letting sed define its meaning), put the script in single quotes:

       sed 's/.$//' DOS.file >Unix.file

    To use BOTH Unix $environment_vars and sed /end-of-line$/ pattern
    matching, use single quotes to bracket the sed part 'like so', then
    follow immediately with double quotes "$HERE" when you want the
    shell to substitute the variable, and resume with single quotes
    again where 'sed should set the meaning'. There must be NO SPACE
    between the closing single quotes and the opening double quotes. To
    demonstrate with the example two sentences above:

       sed 'like so'"$HERE"'sed should set the meaning'  # rough idea
       sed "s/$user"'$/root/' input.file >output.file    # sample use

    In the sample use above, we search for the user's name (which is
    stored as an environment variable) when it occurs at the end of the
    line ($), and we substitute the word "root" in all these occasions.
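
    The mixed quoting can be tried on sample data. A minimal sketch,
    assuming a hypothetical login name "alice" held in $user and two
    made-up input lines:

```shell
# Hypothetical login name; the text above uses the csh-style $user.
user=alice

# "s/$user" is expanded by the shell; '$/root/' reaches sed untouched.
result=$(printf '%s\n' 'owner: alice' 'alice owns this' |
    sed "s/$user"'$/root/')
printf '%s\n' "$result"
# owner: root
# alice owns this
```

    Only the first line changes, because the name is at the end of the
    line there and in the middle of the line in the second.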

    In writing shell scripts, we likewise begin with single quote marks
    ('), close them upon encountering the variable, enclose the
    variable name in double quotes ("), and resume with single quotes,
    closing them at the end of the sed script.  Example:

       #! /bin/sh
       # lower to upper, that could be changed
       FROM='abcdefgh'
       TO='ABCDEFGH'
       ... misc commands that pipe data into a longer sed script.
       sed '
       ...
       # do the conversion
       y/'"$FROM"'/'"$TO"'/
       # some more commands go here . . .
       # last line is a single quote mark
       '

    Thus, each character listed in $FROM is replaced by the
    corresponding character in $TO, and the single quotes are used to
    glue the multiple lines together in the script. (See also section
    4.10, "How do I handle Unix shell quoting in sed?")
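
    A cut-down version of the script above can be run directly. This is
    a sketch with a made-up input line; only the y/// command is kept:

```shell
FROM='abcdefgh'
TO='ABCDEFGH'

# Close the single quotes, insert "$FROM" and "$TO", then reopen them.
result=$(echo 'a bad cabbage, he said' | sed 'y/'"$FROM"'/'"$TO"'/')
printf '%s\n' "$result"     # A BAD CABBAGE, HE sAiD
```

    The letters "s" and "i" are untouched because they do not appear in
    $FROM.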

4.8.2. - on MS-DOS and 4DOS platforms

    Under 4DOS and MS-DOS version 7.0 (Win95) or 7.10 (Win95 OSR2),
    environment variables can be accessed from the command prompt.
    Under MS-DOS 6.22 and below, environment variables can only be
    accessed from within batch files. Environment variables should be
    enclosed between percent signs and are case-insensitive; i.e.,
    %USER% or %user% will display the USER variable. To generate a true
    percent sign, just enter it twice.

    DOS versions of sed require that sed scripts be enclosed by double
    quote marks "..." (not single quotes!) if the script contains
    embedded tabs, spaces, redirection arrows or the vertical bar. In
    fact, if the input for sed comes from piping, a sed script should
    not contain a vertical bar, even if it is protected by double
    quotes (this seems to be a bug in the normal MS-DOS syntax). Thus,

       echo blurk | sed "s/^/ |foo /"     # will cause an error
       sed "s/^/ |foo /" blurk.txt        # will work as expected

    Using DOS environment variables which contain DOS path statements
    (such as a TMP variable set to "C:\TEMP") within sed scripts is
    discouraged because sed will interpret the backslash '\' as a
    metacharacter to "quote" the next character, not as a normal
    symbol. Thus,

       sed "s/^/%TMP% /" somefile.txt

    will not prefix each line with (say) "C:\TEMP ", but will prefix
    each line with "C:TEMP "; sed will discard the backslash, which is
    probably not what you want. Other variables such as %PATH% and
    %COMSPEC% will also lose the backslash within sed scripts.

    Environment variables which do not use backslashes are usually
    workable. Thus, all the following should work without difficulty,
    if they are invoked from within DOS batch files:

       sed "s/=username=/%USER%/g" somefile.txt
       echo %FILENAME% | sed "s/\.TXT/.BAK/"
       grep -Ei "%string%" somefile.txt | sed "s/^/  /"

    while from either the DOS prompt or from within a batch file,

       sed "s/%%/ percent/g" input.fil >output.fil

    will replace each percent symbol in a file with " percent" (adding
    the leading space for readability).

4.9. How do I export or pass variables back into the environment?

4.9.1. - on Unix platforms

    Suppose that line #1, word #2 of the file 'terminals' contains a
    value to be put in your TERM environment variable. Sed cannot
    export variables directly to the shell, but it can pass strings to
    shell commands. To set a variable in the Bourne shell:

       TERM=`sed 's/^[^ ][^ ]* \([^ ][^ ]*\).*/\1/;q' terminals`;
       export TERM

    If the second word were "Wyse50", this would send the shell command
    "TERM=Wyse50".

4.9.2. - on MS-DOS or 4DOS platforms

    Sed cannot directly manipulate the environment. Under DOS, only
    batch files (.BAT) can do this, using the SET instruction, since
    they are run directly by the command shell. Under 4DOS, special
    4DOS commands (such as ESET) can also alter the environment.

    Under DOS or 4DOS, sed can select a word and pass it to the SET
    command. Suppose you want the 1st word of the 2nd line of MY.DAT
    put into an environment variable named %PHONE%. You might do this:

       @echo off
       sed -n "2 s/^\([^ ][^ ]*\) .*/SET PHONE=\1/;3q" MY.DAT > GO_.BAT
       call GO_.BAT
       echo The environment variable for PHONE is %PHONE%
       :: cleanup
       del GO_.BAT

    The sed script assumes that the first character on the 2nd line is
    not a space and uses grouping \(...\) to save the first string of
    non-space characters as \1 for the RHS. In writing any batch files,
    make sure that output filenames such as GO_.BAT don't overwrite
    preexisting files of the same name.

4.10. How do I handle Unix shell quoting in sed?

    To embed a literal single quote (') in a script, use (a) or (b):

    (a) If possible, put the script in double quotes:

       sed "s/cannot/can't/g" file

    (b) If the script must use single quotes, then close-single-quote
    the script just before the SPECIAL single quote, prefix the single
    quote with a backslash, and use a 2nd pair of single quotes to
    finish marking the script. Thus:

       sed 's/cannot$/can'\''t/g' file

    Though this looks hard to read, it breaks down to 3 parts:

       's/cannot$/can'   \'   't/g'
       ---------------   --   -----

    To embed a literal double quote (") in a script, use (a) or (b):

    (a) If possible, put the script in single quotes. You don't need to
    prefix the double quotes with anything. Thus:

       sed 's/14"/fourteen inches/g' file

    (b) If the script must use double quotes, then prefix the SPECIAL
    double quote with a backslash (\). Thus,

       sed "s/$length\"/$length inches/g" file

    To embed a literal backslash (\) into a script, enter it twice:

       sed 's/C:\\DOS/D:\\DOS/g' config.sys
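
    The three quoting cases can be checked against short test strings.
    A minimal sketch; the input strings and the variable names r1, r2
    and r3 are made up for illustration:

```shell
length=14

# Single quote inside a single-quoted script: close, \', reopen.
r1=$(printf '%s\n' 'I cannot' | sed 's/cannot$/can'\''t/g')

# Double quote inside a double-quoted script: prefix it with a backslash.
r2=$(printf '%s\n' 'a 14" monitor' | sed "s/$length\"/$length inches/g")

# Literal backslash: enter it twice on both sides of the substitution.
r3=$(printf '%s\n' 'C:\DOS\EDIT.COM' | sed 's/C:\\DOS/D:\\DOS/g')

printf '%s\n' "$r1" "$r2" "$r3"
# I can't
# a 14 inches monitor
# D:\DOS\EDIT.COM
```

    printf is used instead of echo here because some versions of echo
    interpret backslashes in their arguments.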

4.11. How do I delete a block of text if the block contains a certain
       regular expression?

    Suppose the beginning of the block is indicated by 'BLOCK_TOP' and
    the end of the block is indicated by 'BLOCK_END'. If the expression
    'regex' appears anywhere within the block, the entire block should
    be deleted. This script can be modified to match different types
    of block markers; it deletes the entire line containing the string
    'BLOCK_TOP' but preserves the rest of the line after 'BLOCK_END'.
    Written by Russell Davies <c9415019@...>:

       :t
       /BLOCK_TOP/,/BLOCK_END/ {
          /BLOCK_END/!  { N; b t; }
          /regex/s/^.*BLOCK_END//
       }
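
    To see what the script does, it can be wrapped in a shell function
    and fed a small test file. This is a sketch, not part of the
    original script: the function name and the sample lines are
    hypothetical, and the 'N' and 'b' commands are placed on separate
    lines for seds that do not accept ';' after a label.

```shell
delete_block_with_regex() {
    sed '
:t
/BLOCK_TOP/,/BLOCK_END/ {
   /BLOCK_END/!{
      N
      b t
   }
   /regex/s/^.*BLOCK_END//
}'
}

printf '%s\n' 'keep 1' 'BLOCK_TOP' 'alpha' 'regex here' \
    'BLOCK_END tail' 'keep 2' 'BLOCK_TOP' 'beta' 'BLOCK_END' 'keep 3' |
    delete_block_with_regex
```

    With GNU sed the first block is reduced to " tail" (the text that
    followed BLOCK_END), while the second block, which does not contain
    'regex', is printed unchanged.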

4.12. How do I locate/print a paragraph of text if the paragraph
       contains a certain regular expression?

    Assume that paragraphs are separated by blank lines. For regexes
    that are single terms, use the following script:

       sed -e '/./{H;$!d;}' -e 'x;/regex/!d'

    To print paragraphs only if they contain 3 specific regular
    expressions (RE1, RE2, and RE3), in any order in the paragraph:

       sed -e '/./{H;$!d;}' -e 'x;/RE1/!d;/RE2/!d;/RE3/!d'

    With this solution and the preceding one, if the paragraphs are
    excessively long (more than 4k in length), you may overflow sed's
    internal buffers. If using HHsed, you must add a "G;" command
    immediately after the "x;" in the scripts above to defeat a bug
    in HHsed (see section 6.7.D(4), below, for a description).
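
    The single-regex script can be wrapped in a small function for
    testing. A minimal sketch: the function name, the sample text and
    the pattern "cherry" are all hypothetical, and the regex must not
    contain a slash.

```shell
print_matching_paragraphs() {
    # $1 is the regex; the quoting follows the pattern in section 4.8.1.
    sed -e '/./{H;$!d;}' -e 'x;/'"$1"'/!d'
}

printf '%s\n' 'apple pie' 'cherry tart' '' 'plain bread' '' 'cherry cake' |
    print_matching_paragraphs cherry
```

    Each paragraph that is printed begins with an empty line, because
    the 'H' command puts a newline in front of every line it appends to
    the hold space.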

4.13. How do I delete a block of _specific_ consecutive lines?

    If the block of lines always looks like this (with '^' and '$'
    representing the beginning and end of line, respectively):

       ^able$
       ^baker$
       ^charlie$
       ^delta$

    and if there is never any deviation from this format (e.g., "able"
    *always* is followed by "baker", etc.), this will work fine:

       sed '/^able$/,/^delta$/d' files      # most seds
       sed '/^able$/,+3d' files             # HHsed, sedmod, gsed 3.02a

    However, if the top line sometimes appears alone or is followed by
    other lines, if the block may have additional lines in the middle,
    or if a partial block could possibly occur somewhere in the file, a
    more explicit script is needed.

    The following scripts show how to delete blocks of specific
    consecutive lines. Only an exact match of the block is deleted, and
    partial matches of the block are left alone.

       # sed script to delete 2 consecutive lines: /^RE1\nRE2$/
       $b
       /^RE1$/ {
         $!N
         /^RE1\nRE2$/d
         P;D
       }
       #---end of script---


       # sed script to delete 3 consecutive lines. (This script
       # fails under GNU sed earlier than version 3.02.)
       : more
       $!N
       s/\n/&/2;
       t enough
       $!b more
       : enough
       /^RE1\nRE2\nRE3$/d
       P;D
       #---end of script---

    For example, to delete a block of 5 consecutive lines, the previous
    script must be altered in only two places:

    (1) Change the 2 in "s/\n/&/2;" to a 4 (the trailing semicolon is
    needed to work around a bug in HHsed v1.5).

    (2) Change the regex line to "/^RE1\nRE2\nRE3\nRE4\nRE5$/d",
    modifying the expression as needed.

    Suppose we want to delete a block of two blank lines followed by
    the word "foo" followed by another blank line (4 lines in all).
    Other blank lines and other instances of "foo" should be left
    alone. After changing the '2' to a '3' (always one number less than
    the total number of lines), the regex line would look like this:
    "/^\n\nfoo\n$/d". (Thanks to Greg Ubben for this script.)

    As an alternative for older versions of GNU sed, the following
    script will delete 4 consecutive lines:

       # sed script to delete 4 consecutive lines (gsed-2.05 and below)
       /^RE1$/!b
       $!N
       $!N
       :a
       $b
       N
       /^RE1\nRE2\nRE3\nRE4$/d
       P
       s/^.*\n\(.*\n.*\n.*\)$/\1/
       ba
       #---end of script---

    Its drawback is that it must be modified in 3 places instead of 2
    to adapt it for more lines, and as additional lines are added, the
    's' command is forced to work harder to match the regexes. On the
    other hand, it avoids a problem with gsed-2.05 and shows another
    way to solve the problem of deleting consecutive lines.
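
    The 3-line script shown earlier in this section can be tried with
    the able/baker/charlie block as RE1, RE2 and RE3. This is a sketch:
    the function name and the test input are made up, and the labels
    are written without a space after the colon.

```shell
delete_able_baker_charlie() {
    sed '
:more
$!N
s/\n/&/2
t enough
$!b more
:enough
/^able\nbaker\ncharlie$/d
P
D'
}

printf '%s\n' 'able' 'baker' 'delta' 'able' 'baker' 'charlie' 'end' |
    delete_able_baker_charlie
# able
# baker
# delta
# end
```

    The partial block (able, baker, delta) is left alone; only the
    exact three-line match is deleted.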

4.14. How do I read (insert/add) a file at the top of a textfile?

    Given a textfile, file1, one may wish to prepend or insert an
    external file, fileT, to the top of it before processing the file.
    Normally, this should be done from the Unix or DOS shell before
    passing file1 on to sed (MS-DOS 5.0 or lower needs 3 commands to do
    this; for DOS 6.0 or higher, the MOVE command is available):

       copy fileT+file1 temp                   # MS-DOS command 1
       echo Y | copy temp file1                # MS-DOS command 2
       del temp                                # MS-DOS command 3
       cat fileT file1 >temp; mv temp file1    # Unix commands

    However, if inserting the file must be done from within sed, there
    is a way. The expected sed command "1 r fileT" will not work; it
    first prints line 1 and then inserts fileT between lines 1 and 2.
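    The following sed script solves this problem, although there must
    be at least 2 lines in file1 for the script to work properly. The
    filename after 'r' must be the last thing on its line (see section
    5.9), so the script takes three lines:

       1{ h; r fileT
       D; }
       2{ x; G; }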
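
    The technique can be tried in a scratch directory. A minimal
    sketch, assuming the 'mktemp' utility is available; the file
    contents are made up, and each command is passed as its own -e
    argument so that the filename after 'r' ends its line:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'header line' >"$workdir/fileT"
printf '%s\n' 'first' 'second' 'third' >"$workdir/file1"

# Line 1 is held back while fileT is queued; line 2 brings it back.
result=$(sed -e '1{' -e h -e "r $workdir/fileT" -e D -e '}' \
             -e '2{' -e x -e G -e '}' "$workdir/file1")
rm -r "$workdir"

printf '%s\n' "$result"
# header line
# first
# second
# third
```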

4.15. How do I address all the lines between RE1 and RE2, excluding
       the lines themselves?

    Normally, to address the lines between two regular expressions, RE1
    and RE2, one would do this: '/RE1/,/RE2/{commands;}'. Excluding
    those lines takes an extra step. To put 2 arrows before each line
    between RE1 and RE2, except for those lines:

       sed '1,/RE1/!{ /RE2/,/RE1/!s/^/>>/; }' input.fil

    The preceding script, though short, may be difficult to follow. It
    also requires that /RE1/ cannot occur on the first line of the
    input file. The following script, though it's not a one-liner, is
    easier to read and it permits /RE1/ to appear on the first line:

       /RE1/,/RE2/{
         /RE1/b
         /RE2/b
         s/^/>>/
       }

    Contents of input.fil:         Output of sed script:
       aaa                           aaa
       bbb                           bbb
       RE1                           RE1
       aaa                           >>aaa
       bbb                           >>bbb
       ccc                           >>ccc
       RE2                           RE2
       end                           end

4.16. How do I put "/some/path/here" into the LHS of a substitution?

    Technically, the normal meaning of the slash can be disabled by
    prefixing it with a backslash. Thus,

       sed 's/\/some\/path\/here/\/a\/new\/path/g' files

    But this is hard to read and write. There is a better solution.
    The s/// substitution command allows '/' to be replaced by any
    other character (including spaces or alphanumerics). Thus,

       sed 's|/some/path/here|/a/new/path|g' files

    and if you are using variable names in a Unix shell script,

       sed "s|$OLDPATH|$NEWPATH|g" oldfile >newfile
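
    A short run shows the variable form at work. This is a sketch with
    hypothetical paths; it assumes that neither variable contains the
    '|' delimiter or any regex metacharacter that needs quoting:

```shell
OLDPATH=/usr/local/bin
NEWPATH=/opt/tools/bin

result=$(echo 'PATH=/usr/local/bin:/bin' | sed "s|$OLDPATH|$NEWPATH|g")
printf '%s\n' "$result"     # PATH=/opt/tools/bin:/bin
```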

4.17. How do I replace "C:\SOME\DOS\PATH" in a substitution?

    For MS-DOS users, every backslash must be doubled. Thus, to replace
    "C:\SOME\DOS\PATH" with "D:\MY\NEW\PATH" --

       sed "s|C:\\SOME\\DOS\\PATH|D:\\MY\\NEW\\PATH|g" infile >outfile

    Remember that DOS pathnames are not case sensitive and can appear
    in upper or lower case in the input file. If this concerns you, use
    gsed v3.02 with the "i" flag or sedmod with the -i switch to ignore
    case on the LHS:

       @echo off
       :: sample MS-DOS batch file to alter path statements
       set old=C:\\SOME\\DOS\\PATH
       set new=D:\\MY\\NEW\\PATH
       gsed "s|%old%|%new%|gi" infile >outfile
       :: or
       ::     sedmod -i "s|%old%|%new%|g" infile >outfile
       set old=
       set new=

    Also, remember that under Win95 long filenames may be stored in two
    formats: e.g., as "C:\Program Files" or as "C:\PROGRA~1".

4.18. How do I convert files with toggle characters, like +this+, to
       look like [i]this[/i]?

    Input files, especially message-oriented text files, often contain
    toggle characters for emphasis, like ~this~, *this*, or =this=. Sed
    can make the same input pattern produce alternating output each
    time it is encountered. Typical needs might be to generate HTML
    codes or print codes for boldface, italic, or underscore. This
    script accommodates multiple occurrences of the toggle pattern on
    the same line, as well as cases where the pattern starts on one
    line and finishes several lines later, even at the end of the file:

       # sed script to convert +this+ to [i]this[/i]
       :a
       /+/{ x;        # If "+" is found, switch hold and pattern space
         /^ON/{       # If "ON" is in the (former) hold space, then ..
           s///;      # .. delete it
           x;         # .. switch hold space and pattern space back
           s|+|[/i]|; # .. turn the next "+" into "[/i]"
           ba;        # .. jump back to label :a and start over
         }
       s/^/ON/;       # Else, "ON" was not in the hold space; create it
       x;             # Switch hold space and pattern space
       s|+|[i]|;      # Turn the first "+" into "[i]"
       ba;            # Branch to label :a to find another pattern
       }
       #---end of script---

    This script uses the hold space to create a "flag" to indicate
    whether the toggle is ON or not. We have added remarks to
    illustrate the script logic, but in most versions of sed remarks
    are not permitted after 'b'ranch commands or labels.

    If you are sure that the +toggle+ characters never cross line
    boundaries (i.e., never begin on one line and end on another), this
    script can be reduced to one line:

       s|+\([^+][^+]*\)+|[i]\1[/i]|g

    If your toggle pattern contains regex metacharacters (such as * and
    +, in the case of HHsed), remember to quote them with backslashes.
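
    The multi-line toggle script can be exercised with a toggle that
    opens on one line and closes on the next. This is a sketch: the
    function name and the input are hypothetical, the remarks are
    removed, and the empty regex "s///" is spelled out as "s/^ON//" so
    the example does not depend on how a given sed reuses the last
    regex.

```shell
toggle_italics() {
    sed '
:a
/+/{
  x
  /^ON/{
    s/^ON//
    x
    s|+|[/i]|
    ba
  }
  s/^/ON/
  x
  s|+|[i]|
  ba
}'
}

printf '%s\n' 'a +b+ c +d' 'e+ f' | toggle_italics
# a [i]b[/i] c [i]d
# e[/i] f
```

    The "ON" flag survives in the hold space from the first line to the
    second, which is why the "+" on the second line becomes "[/i]".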

4.19. How do I delete only the first occurrence of a pattern?

    To delete only the first line that contains the pattern RE, where
    "RE" is any regular expression, but leave all other lines
    containing RE alone, do this:

       gsed '0,/RE/{//d}' file                     # GNU sed 3.02a
       sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}' file     # other seds

    And if you *know* the pattern will not occur on line 1 and you
    don't use GNU sed, this will work:

       sed '1,/RE/{/RE/d;}' file
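
    The hold-space version for other seds can be checked on a few
    lines. A minimal sketch with made-up input, using the literal
    string "RE" as the pattern:

```shell
# The first RE line plants a "Y" flag in the hold space and is deleted;
# later RE lines find the flag and are printed.
result=$(printf '%s\n' 'one' 'RE first' 'two' 'RE second' |
    sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}')
printf '%s\n' "$result"
# one
# two
# RE second
```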

4.20. How do I commify a string of numbers?

    Use the simplest script necessary to accomplish your task. As
    variations of the line increase, the sed script must become more
    complex to handle additional conditions. Whole numbers are
    simplest, followed by decimal formats, followed by embedded words.

    Case 1: simple strings of whole numbers separated by spaces or
    commas, with an optional negative sign. To convert this:

       4381, -1222333, and 70000: - 44555666 1234567890 words
       56890  -234567, and 89222  -999777  345888777666 chars

    to this:

       4,381, -1,222,333, and 70,000: - 44,555,666 1,234,567,890 words
       56,890  -234,567, and 89,222  -999,777  345,888,777,666 chars

    use one of these one-liners:

       sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'                      # GNU sed
       sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'  # other seds

    Case 2: strings of numbers which may have an embedded decimal
    point, separated by spaces or commas, with an optional negative
    sign. To change this:

       4381,  -6555.1212 and 70000,  7.18281828  44906982.071902
       56890   -2345.7778 and 8.0000:  -49000000 -1234567.89012

    to this:

       4,381,  -6,555.1212 and 70,000,  7.18281828  44,906,982.071902
       56,890   -2,345.7778 and 8.0000:  -49,000,000 -1,234,567.89012

    use the following command for GNU sed:

       sed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'

    and for other versions of sed:

       sed -f case2.sed files

       # case2.sed
       s/^/ /;                 # add space to start of line
       :a
       s/\( [-0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/g
       ta
       s/ //;                  # remove space from start of line
       #---end of script---
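
    The Case 1 one-liner for other seds can be wrapped in a function
    and tried on a short string. This is a sketch; the function name
    'commify' and the input are hypothetical, and it handles whole
    numbers only:

```shell
commify() {
    # Each pass inserts one comma, working from the right; 't' loops
    # until no run of four or more digits is left.
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

echo '1234567 and 70000' | commify     # 1,234,567 and 70,000
```

    Numbers with a decimal point need the Case 2 scripts, since this
    one would also put commas into the digits after the point.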

------------------------------

5. WHY ISN'T THIS WORKING?

5.1. Why don't my variables like $var get expanded in my sed script?

    Because your sed script uses 'single quotes' instead of "double
    quotes". Unix shells never expand $variables in single quotes.

    This is probably the most frequently-asked sed question. For more
    info on using variables, see section 4.8.
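
    The difference is easy to see side by side. A minimal sketch, with
    a hypothetical variable named var:

```shell
var=world

# Single quotes: the shell leaves $var alone, so sed inserts it literally.
literal=$(echo hello | sed 's/hello/$var/')

# Double quotes: the shell expands $var before sed sees the script.
expanded=$(echo hello | sed "s/hello/$var/")

printf '%s\n' "$literal" "$expanded"
# $var
# world
```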

5.2. I'm using 'p' to print, but I have duplicate lines sometimes.

    Sed prints the entire file by default, so the 'p' command might
    cause the duplicate lines. If you want the whole file printed,
    try removing the 'p' from commands like 's/foo/bar/p'. If you want
    part of the file printed, run your sed script with the -n flag to
    suppress normal output, and rewrite the script to get all output
    from the 'p' command.

    If you're still getting duplicate lines, you are probably finding
    several matches for the same line. Suppose you want to print lines
    with the words "Peter" or "James" or "John", but not the same line
    twice. The following command will fail:

       sed -n '/Peter/p; /James/p; /John/p' files

    Since all 3 commands of the script are executed for each line,
    you'll get extra lines. A better way is to use the 'd' (delete) or
    'b' (branch) commands, like so (with GNU sed):

       sed '/Peter/b; /James/b; /John/b; d' files          # one way
       sed -n '/Peter/{p;d;};/James/{p;d;};/John/p' files  # a 2nd way
       sed -n '/Peter/{p;b;};/James/{p;b;};/John/p' files  # a 3rd way
       sed '/Peter\|James\|John/!d' files                  # best way :-)

    On standard seds, these must be broken down with -e commands:

       sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d files
       sed -n -e '/Peter/{p;d;}' -e '/James/{p;d;}' -e '/John/p' files

    The 3rd line would require too many -e commands to fit on one line,
    since standard versions of sed require an -e command after each 'b'
    and also after each closing brace '}'.
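
    The duplicate output, and one way to avoid it, can be reproduced
    with three sample lines. This is a sketch; the helper function and
    its data are made up, and the commands are split across -e options
    so that standard seds accept them:

```shell
sample_names() { printf '%s\n' 'Peter and John' 'Andrew' 'James'; }

# Three independent p commands: a line matching two names prints twice.
dup=$(sample_names | sed -n -e '/Peter/p' -e '/James/p' -e '/John/p')

# Branching to the end of the script prints each matching line once.
once=$(sample_names | sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d)

printf '%s\n' "$dup" '--' "$once"
# Peter and John
# Peter and John
# James
# --
# Peter and John
# James
```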

5.3. Why does my DOS version of sed process a file part-way through
      and then quit?

    First, look for errors in the script. Have you used the -n switch
    without telling sed to print anything to the console?  Have you
    read the docs to your version of sed to see if it has switches or a
    syntax you may have misused? If you are sure your sed script is
    valid, a probable cause is an end-of-file (EOF) marker embedded in
    the file. An EOF marker (a/k/a SUB) is a Control-Z character, with
    the value of 1A hex, 26 decimal, or 032 octal. As soon as any DOS
    version of sed encounters a Ctrl-Z character, sed stops processing.

    To locate the EOF character, use Vern Buerg's shareware file viewer
    LIST.COM <http://www.buerg.com/list.html>. In text mode, look for a
    right-arrow symbol; in hex mode (Alt-H), look for a 1A code. With
    Unix utilities ported to DOS, use 'od' (octal dump) to display
    hexcodes in your file, and then use sed to locate the offending
    character:

       od -txC badfile.txt | sed -n "/ 1a /p; / 1a$/p"

    Then edit the input file to remove the offending character(s).

    If you would rather NOT edit the input file, there is still a fix.
    It requires the DJGPP 32-bit port of 'tr', the Unix translate
    program, ver 1.22. This version is included as one of the GNU text
    utilities, available at
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/txt122b.zip
    It is important to get the DJGPP version of 'tr' because other
    versions ported to DOS will stop processing when they encounter the
    EOF character. Use the -d (delete) command:

       tr -d \32 < badfile.txt | sed -f myscript.sed

5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
      stingy pattern matching")

    The two most common causes for this problem are: (1) misusing the
    '.' metacharacter, and (2) misusing the '*' metacharacter. The RE
    '.*' is designed to be "greedy" (i.e., matching as many characters
    as possible). However, sometimes users need an expression which is
    "stingy," matching the shortest possible string.

    (1) On single-line patterns, the '.' metacharacter matches any
    single character on the line. ('.' cannot match the newline at the
    end of the line because the newline is removed when the line is put
    into the pattern space; sed adds a newline automatically when the
    pattern space is printed.) On multi-line patterns obtained with the
    'N' or 'G' commands, '.' _will_ match a newline in the middle of the
    pattern space. If there are 3 lines in the pattern space, "s/.*//"
    will delete all 3 lines, not just the first one (leaving 1 blank
    line, since the trailing newline is added to the output).

    Normal misuse of '.' occurs in trying to match a word or bounded
    field, and forgetting that '.' will also cross the field limits.
    Suppose you want to delete the first word in braces:

       echo {one} {two} {three} | sed 's/{.*}/{}/'       # fails
       echo {one} {two} {three} | sed 's/{[^}]*}/{}/'    # succeeds

    's/{.*}/{}/' is not the solution, since the regex '.' will match
    any character, including the close braces. Replace the '.' with
    '[^}]', which signifies a negated character set '[^...]' containing
    anything other than a right brace. FWIW, we know that 's/{one}/{}/'
    would also solve our question, but we're trying to illustrate the
    use of the negated character set: [^anything-but-this].

    A negated character set should be used for matching words between
    quote marks, for fields separated by commas, etc. See also section
    4.5 ("How do I parse a comma-delimited data file?"), above.

    (2) The '*' metacharacter represents zero or more instances of the
    previous expression. The '*' metacharacter looks for the leftmost
    possible match first and will match zero characters. Thus,

       echo foo | sed 's/o*/EEE/'

    will generate 'EEEfoo', not 'fEEE' as one might expect. This is
    because /o*/ matches the null string at the beginning of the word.

    After finding the leftmost possible match, the '*' is GREEDY; it
    always tries to match the longest possible string. When two or
    three instances of '.*' occur in the same RE, the leftmost instance
    will grab the most characters. Consider this example, which uses
    grouping '\(...\)' to save patterns:

       echo bar bat bay bet bit | sed 's/^.*\(b.*\)/\1/'

    What will be displayed is 'bit', never anything longer, because
    the leftmost '.*' took the longest possible match. Remember this
    rule: "leftmost match, longest possible string, zero also matches."

5.5. What is CSDPMI*B.ZIP and why do I need it?

    If you boot to MS-DOS instead of Windows and try to use GNU sed
    v1.18 or 3.02, you may encounter the following error message:

       no DPMI - Get csdpmi*b.zip

    "DPMI" stands for DOS Protected Mode Interface; it's basically a
    means of running DOS in Protected Mode (as opposed to Real Mode),
    which allows programs to share resources in extended memory without
    conflicting with one another. Running HIMEM.SYS and EMM386.EXE is
    not enough. The "CSDPMI*B.ZIP" refers to files written by Charles
    Sandmann to provide DPMI services for 32-bit computers (e.g.,
    386SX, 386DX, 486SX). Download this file:

       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2misc/csdpmi4b.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi4b.zip

    and extract CWSDPMI.EXE, CWSDPR0.EXE and CWSPARAM.EXE from the ZIP
    file. Put all 3 CWS*.EXE files in the same directory as GSED.EXE
    and you're all set. There are DOC files enclosed, but they're
    nearly incomprehensible for the average computer user. (Another
    case of user-vicious documentation.)

    If you're running Windows and you normally use a DOS session to run
    GNU sed (i.e., you get to a DOS prompt with a resizable window or
    you press Alt-Enter to switch to full-screen mode), you don't need
    the CWS*.EXE files at all, since Windows uses DPMI already.

5.6. Where are the man pages for GNU sed?

    Prior to GNU sed v3.02, there weren't any. Until recently, man
    pages distributed with gsed were borrowed from old sources or from
    other compilations. None of them were "official." Even the man and
    info pages distributed with gsed 3.02 are incomplete. For example,
    they omit the special regexes that GNU sed recognizes but most
    other seds do not. See section 6.8.3 ("Special syntax in REs"),
    below.

5.7. How do I tell what version of sed I am using?

    Try entering "sed" all by itself on the command line, followed by
    no arguments or parameters.  Also, try "sed --version".  In a
    pinch, you can also try this:

       strings sed | grep -i ver

    Your version of 'strings' must be a version of the Unix utility of
    this name. It should not be the DOS utility STRINGS.COM by Douglas
    Boling.

5.8. Does sed issue an exit code?

    Most versions of sed do not, but check the documentation that came
    with whichever version you are using. GNU sed issues an exit code
    of 0 if the program terminated normally, 1 if there were errors in
    the script, and 2 if there were errors during script execution.

5.9. The 'r' command isn't inserting the file into the text.

    On most versions of sed (except HHsed and gsed-3.02), the 'r'
    (read) and 'w' (write) commands must be followed by exactly one
    space, then the filename, and then terminated by a newline. Any
    additional characters before or after the filename are interpreted
    as being part of the filename. Thus "/RE/r  insert.me" would try to
    locate a file called ' insert.me' (note the leading space!). If the
    file was not found, sed says nothing -- not even an error message.

    When sed scripts are used on the command line, every 'r' and 'w'
    must be the last command in that part of the script. Thus,

       sed -e '/regex/{r insert.file;d;}' source         # will fail
       sed -e '/regex/{r insert.file' -e 'd;}' source    # will succeed
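
    The working form can be tried in a scratch directory. A minimal
    sketch, assuming the 'mktemp' utility is available; the file
    contents and input lines are made up:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'inserted text' >"$workdir/insert.file"

# The filename ends the first -e part; 'd' and '}' go in the second.
result=$(printf '%s\n' 'before' 'regex marker' 'after' |
    sed -e "/regex/{r $workdir/insert.file" -e 'd;}')
rm -r "$workdir"

printf '%s\n' "$result"
# before
# inserted text
# after
```

    The matching line is deleted and the contents of the file appear in
    its place.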

5.10. Why can't I match 2 or more lines using the \n character?

    This is a constant FAQ for new sed users. Sed normally processes
    only one line at a time, so the newline is never "located" under
    typical use. The default behavior of sed is to read one line into
    the "pattern space", then to print, change, or delete that line,
    and then to get the next line from the input file (or from the
    standard input stream, if you're using sed at the end of a pipe).

    To match two or more consecutive lines, you must either use address
    ranges (see section 3.3, above), which match all the lines between
    two specified addresses, or you must explicitly add the next line to
    the pattern space by using the 'N' command (or something
    functionally similar).

    Address ranges match each line in the range one at a time, so you
    still can't use the \n there. However, an address range _can_ match
    a block of consecutive lines, so it may be that you don't need the
    \n to find what you're looking for. The 'N' command, on the other
    hand, will gather multiple lines into the pattern space at once,
    and this is where the \n character can be used.

    An example of using 'N' to accumulate and delete a block of 2 or
    more lines appears in section 4.13 ("How do I delete a block of
    _specific_ consecutive lines?"). This example can be modified by
    changing the delete command to something else, like 'p' (print),
    'i' (insert), 'c' (change), 'a' (append), or 's' (substitute).
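
    As a minimal sketch of the 'N' approach (the sample text and the
    helper name join_pair are invented for illustration), the script
    below pulls the next line into the pattern space so that the regex
    can see the embedded newline:

```shell
# join_pair: hypothetical helper name, not part of any sed distribution.
# When a line ends in "foo", append the next line to the pattern space
# ($!N skips the N on the last line) and, if that next line starts with
# "bar", replace the embedded newline with a space.
join_pair() {
    sed '/foo$/{
$!N
s/foo\nbar/foo bar/
}'
}

printf 'one foo\nbar two\nthree\n' | join_pair
# prints:
#   one foo bar two
#   three
```

    Without the N, the same s/foo\nbar/.../ would never match, because
    the pattern space would hold only one line at a time.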

------------------------------

6. OTHER ISSUES

6.1. I have a certain problem that stumps me. Where can I get help?

    Newsgroups:

       - alt.comp.editors.batch  (best choice)
       - comp.editors
       - comp.unix.questions
       - comp.unix.shell

    Send e-mail to:  Al Aab <af137@...>

    Your question will be posted on the "seders" mailing list, where
    many sed users will be able to see your question. If you do not
    want to subscribe to the list but do want a direct e-mail reply to
    your question, please indicate this somewhere in your message.

6.2. How does sed compare with awk, perl, and other utilities?

    Awk is a much richer language with many features of a programming
    language, including variable names, math functions, arrays, system
    calls, etc. Its command structure is similar to sed:

       address { command(s) }

    which means that for each line or range of lines that matches the
    address, execute the command(s). In both sed and awk, an address
    can be a line number or a RE somewhere on the line, or both.

    In program size, awk is 3-10 times larger than sed. Awk has most
    of the functions of sed, but not all. Notably, sed supports
    backreferences (\1, \2, ...) to previous expressions, and awk does
    not have any comparable function or syntax.
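
    To illustrate the backreference syntax mentioned above, here is a
    small sketch (the helper name swap_words and the sample input are
    assumptions made for the example) that swaps two words:

```shell
# swap_words: hypothetical helper name used only for this example.
# \1 and \2 recall the text matched by the first and second \( ... \)
# groups on the left-hand side of the substitution.
swap_words() {
    sed 's/^\([^ ]*\) \([^ ]*\)$/\2 \1/'
}

echo 'hello world' | swap_words
# prints: world hello
```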

    Perl is a general-purpose programming language, with many features
    beyond text processing and interprocess communication, taking it
    well past awk or other scripting languages. Perl supports every
    feature sed does and has its own set of extended regular
    expressions, which give it extensive power in pattern matching and
    processing. (Note: the standard perl distribution comes with 's2p',
    a perl script which translates sed scripts into equivalent perl
    scripts.) Like sed and awk, perl scripts do not need to be compiled
    into binary code. Like sed, perl can also run many useful
    "one-liners" from the command line, though with greater
    flexibility; see question 4.3 ("How do I make substitutions in
    every file in a directory, or in a complete directory tree?").

    On the other hand, the current version of perl is from 8 to 35
    times larger than sed in its executables alone (perl's library
    modules and allied files not included!). Further, for most simple
    tasks such as substitution, sed executes more quickly than either
    perl or awk. All these utilities serve to process input text,
    transforming it to meet our needs . . . or our arbitrary whims.

6.3. When should I use sed?

    When you need a small, fast program to modify words, lines, or
    blocks of lines in a textfile.

6.4. When should I NOT use sed?

    You should not use sed when you have "dedicated" tools which can do
    the job faster or with an easier syntax. Do not use sed when you
    only want to:

    - delete individual characters. Instead of "s/[abcd]//g", use

         tr -d "[a-d]"

    - squeeze sequential characters. Instead of "s/ee*/e/g", use

         tr -s "{character-set}"

    - change individual characters. Instead of "y/abcdef/ABCDEF/", use

         tr "[a-f]" "[A-F]"

    - print individual lines, based on patterns within the line itself.
      Instead, use "grep".

    - print blocks of lines, with 1 or more lines of context above
      and/or below a specific regular expression. Instead, use the GNU
      version of grep as follows:

         grep -A{number} -B{number}

    - remove individual lines, based on patterns within the line
      itself. Instead, use "grep -v".

    - print line numbers.  Instead, use "nl" or "cat -n".

    - reformat lines or paragraphs. Instead, use "fold", "fmt" or "par".

    Though sed can perfectly emulate certain functions of cat, grep,
    nl, rev, sort, tac, tail, tr, uniq, and other utilities, producing
    identical output, the native utilities are usually optimized to do
    the job more quickly than sed.
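
    A short side-by-side sketch of two of the cases listed above,
    using invented sample input. Note one assumption: the tr call
    lists the characters without brackets, because POSIX and GNU tr
    treat "[" and "]" in a plain set as literal characters (some older
    System V versions of tr required the brackets).

```shell
# Delete individual characters: sed versus tr.
printf 'abcdefg bead\n' | sed 's/[abcd]//g'    # prints: efg e
printf 'abcdefg bead\n' | tr -d 'abcd'         # prints: efg e

# Remove lines matching a pattern: sed versus grep -v.
printf 'keep\ndrop me\nkeep too\n' | sed '/drop/d'
printf 'keep\ndrop me\nkeep too\n' | grep -v 'drop'
# both print:
#   keep
#   keep too
```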

6.5. When should I ignore sed and use Awk or Perl instead?

    If you can write the same script in Awk or Perl and do it in less
    time, then use Perl or Awk. There's no reason to spend an hour
    writing and debugging a sed script if you can do it in Perl in 10
    minutes (assuming that you know Perl already) and if the processing
    time or memory use is not a factor. Don't hunt pheasants with a .22
    if you have a shotgun at your side . . . unless you simply enjoy
    the challenge!

    Specifically, if you need to:

    - heavily comment what your scripts do. Use GNU sed, awk, or perl.
    - do case insensitive searching. Use gsed302, sedmod, awk or perl.
    - count fields (words) in a line. Use awk.
    - count lines in a block or objects in a file. Use awk.
    - check lengths of strings or do math operations. Use awk or perl.
    - handle very long lines or need very large buffers. Use gsed or perl.
    - handle binary data (control characters). Use perl (binmode).
    - loop through an array or list. Use awk or perl.
    - test for file existence, filesize, or fileage. Use perl or shell.
    - treat each paragraph as a line. Use awk.
    - indicate /alternate|options/ in regexes. Use gsed, awk or perl.
    - use syntax like \xNN to match hex codes. Use perl.
    - use (nested (regexes)) with backreferences. Use perl.

    Perl lovers: I know that perl can do everything awk can do, but
    please don't write me to complain. Why heft a shotgun when a .45
    will do? As we all know, "There is more than one way to do it."

6.6. Known limitations among sed versions

    The limits below apply to the distributed versions; the source code
    for most free versions of sed can be modified and recompiled.
    The term "no limit" when used below means there is no "fixed"
    limit. Limits are actually determined by one's hardware, memory,
    operating system, and which C library is used to compile sed.

6.6.1. Maximum line length

       GNU sed 3.02: no limit
       GNU sed 2.05: no limit
       sedmod 1.0:   4096 bytes
       HHsed:        4000 bytes

6.6.2. Maximum size for all buffers (pattern space + hold space)

       GNU sed 3.02: no limit
       GNU sed 2.05: no limit
       sedmod 1.0:   4096 bytes
       HHsed:        4000 bytes

6.6.3. Maximum number of files that can be read with 'r' command

       GNU sed 3.02: no limit
       GNU sed 2.05: total no. of r and w commands may not exceed 32
       sedmod 1.0:   total no. of r and w commands may not exceed 20

6.6.4. Maximum number of files that can be written with 'w' command

       GNU sed 3.02: no limit (but typical Unix is 253)
       GNU sed 2.05: total no. of r and w commands may not exceed 32
       sedmod 1.0:   10
       HHsed:        10

6.6.5. Limits on length of label names

       BSD sed:      8 characters
       GNU sed 3.02: no limit
       GNU sed 2.05: no limit
       HHsed:        no limit

6.6.6. Limits on length of write-file names

       BSD sed:      40 characters
       GNU sed 3.02: no limit
       GNU sed 2.05: no limit
       HHsed:        no limit

6.6.7. Limits on branch/jump commands

       HHsed:        50

    As a practical consequence, this means that HHsed will not read
    more than 50 lines into the pattern space via an N command, even if
    the pattern space is only a few hundred bytes in size. HHsed exits
    with an error message, "infinite branch loop at line {nn}".

6.7. Known bugs among sed versions

A. GNU sed v3.02, 3.02a

    (1) Affects only v3.02 binaries compiled with DJGPP for MS-DOS and
    MS-Windows: 'l' (list) command does not display a lone carriage
    return (0x0D, ^M) embedded in a line.

B. GNU sed v2.05

    (1) If a number follows the substitute command (e.g., s/f/F/10) and
    the number exceeds the possible matches on the pattern space, the
    command 't label' _always_ jumps to the specified label. 't' should
    jump only if the substitution was successful (or returned "true").

    (2) 'l' (list) command does not convert the following characters to
    hex values, but passes them through unchanged: 0xF7, 0xFB, 0xFC,
    0xFD, 0xFE.

    (3) A range address like "/foo/,14d" should delete every line from
    the first occurrence of "foo" until line 14, inclusive, and then if
    /foo/ occurs thereafter, delete only those lines. In gsed 2.05, if
    a second "foo" occurs in the file, that line and everything to the
    end of file will be deleted (since gsed is looking for line 14 to
    occur again!).

    (4) The regex /\'/ is not interpreted as an apostrophe or a single
    quote mark, as it should be. Instead, it is interpreted as $,
    representing the end-of-line! This can be proven by these tests:

       echo hello | gsed "/\'/d"        # entire line is deleted!
       echo hello | gsed "s/\'/YYY/"    # 'YYY' appended to string

    (5) Multiple occurrences of the 'w' command fail, as shown here,
    given that both "aaa" and "bbb" occur within the file:

       gsed -e "/aaa/w FILE" -e "/bbb/w FILE" input.txt

C. GNU sed v1.18

    (1) same as #1 for GNU sed v2.05, above.

    (2) The following command will lock the computer under Win95. Echos
    is an echo command that does not issue a trailing newline:

       echos any_word | gsed "s/[ ]*$//"

    (3) same as #3 for GNU sed v2.05, above.

D. GNU sed v1.03 (by Frank Whaley)

    (1) The \w and \W escape sequences both match only nonword
    characters. \w is misdefined and should match word characters.

    (2) The underscore is defined as a nonword character; it should be
    defined as a word character.

    (3) same as #3 for GNU sed v2.05, above.

E. HHsed v1.5 (by Howard Helman)

    (1) If a number follows the substitute command (e.g., s/foo/bar/2),
    in a sed script entered from the command line, two semicolons must
    follow the number, or they must be separated by an -e switch.
    Normally, only 1 semicolon is needed to separate commands.

       echo bit bet | HHsed "s/b/n/2;;s/b/B/"          # solution 1
       echo bit bet | HHsed -e "s/b/n/2" -e "s/b/B/"   # solution 2

    (2) If the substitute command is followed by a number and a "p"
    flag, when the -n switch is used, the "p" flag must occur first.

       echo aaa | HHsed -n "s/./B/3p"    # bug! nothing prints
       echo aaa | HHsed -n "s/./B/p3"    # prints "aaB" as expected

    (3) The following commands will cause HHsed to lock the computer
    under MS-DOS or Win95. Note that they occur because of malformed
    regular expressions which will match no characters.

       sed -n "p;s/\<//g;" file
       sed -n "p;s/[char-set]*//g;" file

    (4) The range command '/RE1/,/RE2/' in HHsed will match one line if
    both regexes occur on the same line (see section 6.8.5, below).
    Though this could be construed as a feature, it should probably be
    considered a bug since its operation differs from every other
    version of sed. For example, '/----/,/----/{s/^/>>/;}' should put
    two angle brackets ">>" before every line which is sandwiched
    between a row of 4 or more hyphens. With HHsed, this command will
    only prefix the hyphens themselves with the angle brackets.

    (5) If the hold space is empty, the H command copies the pattern
    space to the hold space but fails to prepend a leading newline. The
    H command is supposed to add a newline, followed by the contents of
    the pattern space, to the hold space at all times. A workaround is
    "{G;s/^\(.*\)\(\n\)$/\2\1/;H;s/\n$//;}", but it requires knowing
    that the hold space is empty and using the command only once.
    Another alternative is to use the G or the A command alone at key
    points in the script.

    (6) If grouping is followed by an '*' or '+' operator, HHsed does
    not match the pattern, but issues no warning. See below:

       echo aaa | HHsed "/\(a\)*/d"      # nothing is deleted
       echo aaa | HHsed "/\(a\)+/d"      # nothing is deleted
       echo aaa | HHsed "s/\(a\)*/\1B/"  # nothing is changed
       echo aaa | HHsed "s/\(a\)+/\1B/"  # nothing is changed

    (7) If grouping is followed by an interval expression, HHsed halts
    with the error message "garbled command", in all of the following
    examples:

       echo aaa | HHsed "/\(a\)\{3\}/d"
       echo aaa | HHsed "/\(a\)\{1,5\}/d"
       echo aaa | HHsed "s/\(a\)\{3\}/\1B/"

    (8) In interval expressions, 0 is not supported. E.g., \{0,3\}

F. sedmod v1.0 (by Hern Chen)

    Technically, the following are limits (or features?) of sedmod, not
    bugs, since the docs for sedmod do not claim to support these
    missing features.

    (1) sedmod does not support standard range arguments \{...\}
    present in nearly all versions of sed.

    (2) If grouping is followed by an '*' or '+' operator, sedmod gives
    a "garbled command" message. However, if the grouped expressions
    are string literals with no metacharacters, a partial workaround
    can be done like so:

       \(string\)\1*    # matches 1 or more instances of 'string'
       \(string\)\1+    # matches 2 or more instances of 'string'

    (3) sedmod does not support a numeric argument after the s///
    command, as in 's/a/b/3', present in nearly all versions of sed.

    The following are bugs in sedmod v1.0:

    (4) When the -i (ignore case) switch is used, the '/regex/d'
    command is not properly obeyed. Sedmod may miss one or more lines
    matching the expression, regardless of where they occur in the
    script. Workaround: use "/regex/{d;}" instead.

G. HP-UX sed

    (1) Versions of HP-UX sed up to and including version 10.20 are
    buggy. According to the README file, which comes with the GNU cc
    at <ftp://ftp.ntua.gr/pub/gnu/sed-2.05.bin.README>:

    "When building gcc on a hppa*-*-hpux10 platform, the `fixincludes'
    step (which involves running a sed script) fails because of a bug
    in the vendor's implementation of sed.  Currently the only known
    workaround is to install GNU sed before building gcc.  The file
    sed-2.05.bin.hpux10 is a precompiled binary for that platform."

H. SunOS 4.1 sed

    (1) Bug occurs in RE pattern matching when a non-null '[char-set]*'
    is followed by a null '\NUM' pattern recall, illustrated here and
    reported by Greg Ubben:

       s/\(a\)\(b*\)cd\1[0-9]*\2foo/bar/  # between '[0-9]*' and '\2'
       s/\(a\{0,1\}\).\{0,1\}\1/bar/      # between '.\{0,1\}' and '\1'

    Workaround: add a do-nothing 'X*' expression which will not match
    any characters on the line between the two components. E.g.,

       s/\(a\)\(b*\)cd\1[0-9]*X*\2foo/bar/
       s/\(a\{0,1\}\).\{0,1\}X*\1/bar/

I. SunOS 5.6 sed

    (1) If grouping is followed by an asterisk, SunOS sed does not match
    the null string, which it should do. The following command:

       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

    should transform "foo" to "goo" under normal versions of sed.

J. Ultrix 4.3 sed

    (1) If grouping is followed by an asterisk, Ultrix sed replies with
    "command garbled", as shown in the following example:

       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

    (2) If grouping is followed by a numeric operator such as \{0,9\},
    Ultrix sed does not find the match.

K. Digital Unix sed

    (1) The following comes from the man pages for sed distributed with
    new, 1998 versions of Digital Unix (reformatted to fit our
    margins):

    [Digital]  The h subcommand for sed does not work properly.  When
    you use the  h subcommand to place text into the hold area, only
    the last line of the specified text is saved.  You can use the H
    subcommand to append text to the hold area. The H subcommand and
    all others dealing with the hold area work correctly.

    (2) "$d" command issues an error message, "cannot parse".  Reported
    by Carlos Duarte on 8 June 1998.

6.8. Known incompatibilities between sed versions

6.8.1. Issuing commands from the command line

    Most versions of sed permit multiple commands to be issued on the
    command line, separated by a semicolon (;). Thus,

       sed 'G;G' file

    should triple-space a file. However, certain commands REQUIRE
    separate expressions on the command line. These include:

       - all labels (':a', ':more', etc.)
       - all branching instructions ('b', 't')
       - commands to read and write files ('r' and 'w')
       - any closing brace, '}'

    If these commands are used, they must be the LAST commands of an
    expression. Subsequent commands must use another expression
    (another -e switch plus arguments).  E.g.,

       sed  -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' files

    GNU sed and HHsed v1.5 allow these commands to be followed by a
    semicolon, and the previous script can be written like this:

       sed  ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/' files

    Versions differ in implementing the 'a' (append), 'c' (change), and
    'i' (insert) commands:

       hhsed "/foo/i New text here"            # either HHsed or sedmod
       gsed -e "/foo/i\\" -e "New text here"   # GNU sed
       sed1 -e "/foo/i" -e "New text here"     # one version of sed
       sed2 "/foo/i\ New text here"            # another version
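
    For comparison, a sketch of the form specified by POSIX, in which
    the backslash after the command letter is followed by a real
    newline and the text sits on the next line. It is expected to work
    in GNU sed and in most Unix versions of sed; the helper name
    insert_before_foo and the sample input are invented.

```shell
# insert_before_foo: hypothetical helper name for this example.
# The 'i' command is followed by a backslash and a literal newline;
# the text to insert is on the following line.
insert_before_foo() {
    sed '/foo/i\
New text here'
}

printf 'first\nfoo\nlast\n' | insert_before_foo
# prints:
#   first
#   New text here
#   foo
#   last
```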

6.8.2. Using comments (prefixed by the '#' sign)

    Most versions of sed permit comments to appear in sed scripts only
    on the first line of the script. Comments on line 2 or thereafter
    are not recognized and will generate an error like "unrecognized
    command" or "command [bad-line-here] has trailing garbage".

    GNU sed, HHsed, sedmod, and HP-UX sed permit comments to appear on
    any line of the script, except after labels and branching commands
    (b,t), *provided* that a semicolon (;) occurs after the command
    itself. This syntax makes sed similar to awk and perl, which use a
    similar commenting structure in their scripts.  Thus,

       # GNU style sed script
       $!N;                        # except for last line, get next line
       s/^\([0-9]\{5\}\).*\n\1.*//;    # if first 5 digits of each line
                                       # match, delete BOTH lines.
       t skip
       P;                              # print 1st line only if no match
       :skip
       D;                    # delete 1st line of pattern space and loop
       #---end of script---

    is a valid script for GNU sed and Helman's sed, but is unrecognized
    for most other versions of sed.

6.8.3. Special syntax in REs

A. GNU sed v2.05 and 3.02

    BEGIN~STEP selection: GNU sed can select a series of lines in the
    form M~N, where M and N are integers (with gsed v2.05, M must be
    less than N). Beginning at line M (M may equal 0), every Nth line
    is selected. Thus,

       gsed '1~3d' file    # delete every 3rd line, starting with line 1
                           # deletes lines 1, 4, 7, 10, 13, 16, ...

       gsed -n '2~5p' file # print every 5th line, starting with line 2
                           # prints lines 2, 7, 12, 17, 22, 27, ...

    With gsed v3.02, M may be any valid line number. With gsed v2.05,
    if M is greater than or equal to N (the STEP value), nothing will
    be selected, except in one pointless case, 0~0, which selects every
    line.
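
    The two commands above can be checked quickly with generated
    input. This sketch assumes that "sed" on the system is a recent
    GNU sed (the M~N address is a GNU extension) and that the "seq"
    utility is available:

```shell
# GNU sed only: delete lines 1, 4, 7, 10 from the numbers 1..10.
seq 1 10 | sed '1~3d' | tr '\n' ' '
# prints: 2 3 5 6 8 9

# GNU sed only: print every 5th line of 1..30, starting with line 2.
seq 1 30 | sed -n '2~5p' | tr '\n' ' '
# prints: 2 7 12 17 22 27
```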

    The following expressions can be used for /RE/ addresses or in the
    LHS of a substitution:

       \`  - matches the beginning of the pattern space (same as "^")
       \'  - matches the end of the pattern space (same as "$")
       \?  - 0 or 1 occurrences of previous character: same as \{0,1\}
       \+  - 1 or more occurrences of previous character: same as \{1,\}
       \|  - matches the string on either side, e.g., foo\|bar
       \b  - boundary between word and nonword chars (reversible)
       \B  - boundary between 2 word or between 2 nonword chars
       \n  - embedded newline (usable after N, G, or similar commands)
       \w  - any word character: [A-Za-z0-9_]
       \W  - any nonword char: [^A-Za-z0-9_]
       \<  - boundary between nonword and word character
       \>  - boundary between word and nonword character

    On \b, \B, \<, and \>, see section 6.8.4 ("Word boundaries"),
    below.

    Note that gsed does not have any syntax for designating characters
    in octal or hex notation. Traditionally, \ooo or \hh or \xhh have
    been used by the GNU project to do this, but they are not (yet)
    implemented in gsed. Note that GNU sed also supports "character
    classes", a POSIX extension to regexes, described in section 3.7,
    above.

B. GNU sed v1.03 (by Frank Whaley)

    When used with the -x (extended) switch on the command line, or
    when '#x' occurs as the first line of a script, Whaley's gsed103
    supports the following expressions in both the LHS and RHS of a
    substitution:

       \|      matches the expression on either side
       ?       0 or 1 occurrences of previous RE: same as \{0,1\}
       +       1 or more occurrences of previous RE: same as \{1,\}
       \a      audible beep (Ctrl-G, 0x07)
       \b      backspace (Ctrl-H, 0x08)
       \bBBB   binary char, where BBB are 1-8 binary digits, [0-1]
       \dDDD   decimal char, where DDD are 1-3 decimal digits, [0-9]
       \f      formfeed (Ctrl-L, 0x0C)
       \n      newline (Ctrl-J, 0x0A)
       \oOOO   octal char, where OOO are 1-3 octal digits, [0-7]
       \r      carriage-return (Ctrl-M, 0x0D)
       \t      tab (Ctrl-I, 0x09)
       \v      vertical tab (Ctrl-K, 0x0B)
       \xXX    hex char, where XX are 1-2 hex digits, [0-9A-F]

    In normal mode, with or without the -x switch, the following escape
    sequences are also supported in regex addressing or in the LHS of a
    substitution:

       \`      matches beginning of pattern space: same as /^/
       \'      matches end of pattern space: same as /$/
       \B      boundary between 2 word or 2 nonword characters
       \w      any nonword character [*BUG!* should be a word char]
       \W      any nonword character: same as /[^A-Za-z0-9]/
       \<      boundary between nonword and word char
       \>      boundary between word and nonword char

C. HHsed v1.5 (by Howard Helman)

    The following expressions can be used for /RE/ addresses or in the
    LHS and RHS of a substitution:

       +    - 1 or more occurrences of previous RE: same as \{1,\}
       \a   - bell         (ASCII 07, 0x07)
       \b   - backspace    (ASCII 08, 0x08)
       \e   - escape       (ASCII 27, 0x1B)
       \f   - formfeed     (ASCII 12, 0x0C)
       \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
       \r   - return       (ASCII 13, 0x0D)
       \t   - tab          (ASCII 09, 0x09)
       \v   - vertical tab (ASCII 11, 0x0B)
       \xhh - the ASCII character corresponding to 2 hex digits hh.
       \<   - boundary between nonword and word character
       \>   - boundary between word and nonword character

D. sedmod v1.0 (by Hern Chen)

    The following expressions can be used for /RE/ addresses or in the
    LHS of a substitution:

       +    - 1 or more occurrences of previous RE: same as \{1,\}
       \a   - any alphanumeric: same as [a-zA-Z0-9]
       \A   - 1 or more alphas: same as \a+
       \d   - any digit: same as [0-9]
       \D   - 1 or more digits: same as \d+
       \h   - any hex digit: same as [0-9a-fA-F]
       \H   - 1 or more hexdigits: same as \h+
       \l   - any letter: same as [A-Za-z]
       \L   - 1 or more letters: same as \l+
       \n   - newline      (read as 2 bytes, 0D 0A or ^M^J, in DOS)
       \s   - any whitespace character: space, tab, or vertical tab
       \S   - 1 or more whitespace chars: same as \s+
       \t   - tab          (ASCII 09, 0x09)
       \<   - boundary between nonword and word character
       \>   - boundary between word and nonword character

    The following expressions can be used in the RHS of a substitution.
    "Elements" refer to \1 .. \9, &, $0, or $1 .. $9:

       &    - insert regexp defined on LHS
       \e   - end case conversion of next element
       \E   - end case conversion of remaining elements
       \l   - change next element to lower case
       \L   - change remaining elements to lower case
       \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
       \t   - tab          (ASCII 09, 0x09)
       \u   - change next element to upper case
       \U   - change remaining elements to upper case
       $0   - insert pattern space BEFORE the substitution
       $1-$9 - match Nth word on the pattern space

E. UnixDos sed

    The following expressions can be used in text, LHS, and RHS:

       \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)

6.8.4. Word boundaries

    GNU sed, HHsed, and sedmod use certain symbols to define the
    boundary between a "word character" and a nonword character. A word
    character fits the regex "[A-Za-z0-9_]". Note: a word character
    includes the underscore "_" but not the hyphen, probably because
    the underscore is permissible as a label in sed and in other
    scripting languages. (In gsed103, a word character did NOT include
    the underscore; it included alphanumerics only.)

    These symbols include '\<' and '\>' (gsed, HHsed, sedmod) and '\b'
    and '\B' (gsed only). Note that the boundary symbols do not
    represent a character, but a position on the line. Word boundaries
    are used with literal characters or character sets to let you match
    (and delete or alter) whole words without affecting the spaces or
    punctuation marks outside of those words. They can only be used in
    a "/pattern/" address or in the LHS of a 's/LHS/RHS/' command. The
    following table shows how these symbols may be used in HHsed and
    GNU sed. Sedmod matches the syntax of HHsed.

       Match position      Possible word boundaries        HHsed   GNU sed
       --------------------------------------------------------------------
       start of word       [nonword char]^[word char]       \<     \< or \b
       end of word         [word char]^[nonword char]       \>     \> or \b
       middle of word      [word char]^[word char]         none       \B
       outside of word     [nonword char]^[nonword char]   none       \B
       --------------------------------------------------------------------
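
    A brief sketch of these symbols in use, assuming a recent GNU sed
    (the sample words are invented for the example):

```shell
# \< and \> anchor the match to whole words only.
echo 'cat concat cats cat' | sed 's/\<cat\>/dog/g'
# prints: dog concat cats dog

# \b can stand in for either \< or \> in GNU sed.
echo 'cat concat cats cat' | sed 's/\bcat\b/dog/g'
# prints: dog concat cats dog

# \B matches only inside a word, so the standalone "cat" is untouched.
echo 'cat concat' | sed 's/\Bcat/DOG/'
# prints: cat conDOG
```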

6.8.5. Range addressing with GNU sed and HHsed

    When addressing a range of lines, as in the following example to
    delete all lines between /RE1/ and /RE2/,

       sed '/RE1/,/RE2/d' file

    if /RE1/ and /RE2/ both occur on the *same* line, HHsed will delete
    that single line and then look forward in the file for the next
    occurrence of /RE1/ to attempt the deletion. GNU sed will match the
    first line containing /RE1/ but will look forward to the next and
    succeeding lines to match /RE2/. If /RE1/ and /RE2/ cannot be found
    on two different lines, nothing will be deleted.

    GNU sed v2.05 has a bug in range addressing (see section 6.7:B(3),
    above). This was fixed in gsed v3.02.

    GNU sed v3.02a supports 0 in range addressing, which means that the
    range "0,/RE/" will match every line from the top of the file to
    the first line containing /RE/, inclusive, and if /RE/ occurs on
    the first line of the file, only line 1 will be matched.
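
    The difference between "1,/RE/" and "0,/RE/" can be seen in this
    sketch, which assumes a GNU sed recent enough to accept line 0 in
    a range (the input is invented):

```shell
# With 1,/foo/ the range opens on line 1 and the end regex is only
# tested from line 2 onward, so lines 1-3 are deleted.
printf 'foo\nbar\nfoo\nbaz\n' | sed '1,/foo/d'
# prints:
#   baz

# With 0,/foo/ (GNU extension) the end regex may match on line 1,
# so only line 1 is deleted.
printf 'foo\nbar\nfoo\nbaz\n' | sed '0,/foo/d'
# prints:
#   bar
#   foo
#   baz
```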

[end-of-file]

------------------------------------------------------------------------------
                          Alexandro Obirajara Renaud
------------------------------------------------------------------------------
ISTM do Brasil    -   http://www.istm.com.br
"Centro de Suporte Certificado Conectiva-LINUX"
"Centro de Administração de Ambientes Tecnológicos-LINUX"
ISTM do Brasil  ISTM Company  ISTM Security Co  ISTM Linux Center Free
------------------------------------------------------------------------------
                   -----------[  I Love Linux !!  ]-------------
------------------------------------------------------------------------------

#217 From: # aurelio marinho jargas <aurelio@...>
Date: Thu, 6 Jul 2000 10:18 pm
Subject: unescape.sed and news in sed
 
hi all,

anyone who deals with CGIs and programs that interact with the
browser knows that in those annoying forms the user submits, the
data is encoded into one long string, as defined in RFC 2396,
turning "strange" characters into their respective hexadecimal
codes, in the format %hh

afterwards, your little program that handles this string needs to
take the encoded information and turn it back into text so that
it can be manipulated.

well, languages like PHP do this conversion automatically for
you. perl has a module for it. in bash you have to do it by
hand &:)

ever since I started using linux, I have carried along an
inherited unescape executable that did this dirty work for my
little bash programs. now the damn executable has stopped working
and, since I don't have the sources, there is no way for (someone
other than me) to fix it.

this was bound to happen one day so, of course, I had to write an
unescape.sed to solve the problem.

<attached below>

it is just a series of s/// commands making up the conversion
table, in an order that doesn't break things, of course.
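
Why the order matters can be sketched with a cut-down table. This is only an illustration: the helper name unescape_min and the sample string are invented, and only three of the rules from the attached script are reproduced.

```shell
# unescape_min: hypothetical, reduced version of the conversion table.
# 1. "+" means space in form data, so it is converted first, before
#    any %2B has been turned into a literal "+".
# 2. %25 (the percent sign itself) is converted last, so that an
#    encoded "%2B" written as %252B is not decoded twice.
unescape_min() {
    sed -e 's/+/ /g' \
        -e 's/%2B/+/g' \
        -e 's/%25/%/g'
}

echo 'a%2Bb+c%252B' | unescape_min
# prints: a+b c%2B
```

If the %25 rule ran first, %252B would become %2B and then "+", which is not what the sender encoded.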

oh! in the new sed (3.02.80) you can already use \t, \n, ..., \a:

The s/// command now understands the following escapes
(in both halves):
     \a  an "alert" (BEL)
     \f  a form-feed
     \n  a newline
     \r  a carriage-return
     \t  a horizontal tab
     \v  a vertical tab
     \oNNN   a character with the octal value NNN
     \dNNN   a character with the decimal value NNN
     \xNN    a character with the hexadecimal value NN

ok ok, sed must have been the only program in the world that
still didn't have these escapes available... (but \a is just
silly &:)

and there's more! sed now has error messages in Portuguese.
(everything got muuuuuuuch easier &:) the (mis)maintainer of the
po file is moi, so let me know about any problems.

if your sed is < 3.02.80, either upgrade it (recommended) or look
in the code for what to uncomment to make it work.


--
s/:(/:)/;s/:(/:|/;s/:(/>(/,http://www.conectiva.com.br/~aurelio
${linux/mouse/},ctrl+a],http://www.brasmidia.com/dumbs,<esc>:wq
#!/bin/sed -f
# unescape.sed - translates hexadecimal escapes to ascii
#
#   useful for turning the strings that browsers make out of
#   form contents into human-readable text
#
# IMPORTANT! \n \r \t: sed >= 3.02.80
#
# 20000706 <aurelio@...> ** 1st version

# must come first so it is not confused with literal + signs
s/+/ /g

# line break (lynx %0d, netscape %0D)
### next 2 lines: only for sed >= 3.02.80
s/%0[Dd]%0[Aa]/\n/g
s/%09/\t/g
### next 3 lines: sed < 3.02.80
#s/%0[Dd]%0[Aa]/\
#/g
#s/%09/ /g


# standard hex->ascii substitutions
s/%21/!/g
s/%22/"/g
s/%23/#/g
s/%24/$/g
s/%26/\&/g
s/%27/'/g
s/%28/(/g
s/%29/)/g
s/%2B/+/g
s/%2C/,/g
s/%2F/\//g
s/%3A/:/g
s/%3B/;/g
s/%3C/</g
s/%3D/=/g
s/%3E/>/g
s/%3F/?/g
s/%5B/[/g
s/%5C/\\/g
s/%5D/]/g
s/%5E/^/g
s/%60/`/g
s/%7B/{/g
s/%7C/|/g
s/%7D/}/g
s/%7E/~/g
s/%A1/¡/g
s/%A2/¢/g
s/%A3/£/g
s/%A4/¤/g
s/%A5/¥/g
s/%A6/¦/g
s/%A7/§/g
s/%A8/¨/g
s/%A9/©/g
s/%AA/ª/g
s/%AB/«/g
s/%AC/¬/g
s/%AD/­/g
s/%AE/®/g
s/%AF/¯/g
s/%B0/°/g
s/%B1/±/g
s/%B2/²/g
s/%B3/³/g
s/%B4/´/g
s/%B5/µ/g
s/%B6/¶/g
s/%B7/·/g
s/%B8/¸/g
s/%B9/¹/g
s/%BA/º/g
s/%BB/»/g
s/%BC/¼/g
s/%BD/½/g
s/%BE/¾/g
s/%BF/¿/g
s/%C0/À/g
s/%C1/Á/g
s/%C2/Â/g
s/%C3/Ã/g
s/%C4/Ä/g
s/%C5/Å/g
s/%C6/Æ/g
s/%C7/Ç/g
s/%C8/È/g
s/%C9/É/g
s/%CA/Ê/g
s/%CB/Ë/g
s/%CC/Ì/g
s/%CD/Í/g
s/%CE/Î/g
s/%CF/Ï/g
s/%D0/Ð/g
s/%D1/Ñ/g
s/%D2/Ò/g
s/%D3/Ó/g
s/%D4/Ô/g
s/%D5/Õ/g
s/%D6/Ö/g
s/%D7/×/g
s/%D8/Ø/g
s/%D9/Ù/g
s/%DA/Ú/g
s/%DB/Û/g
s/%DC/Ü/g
s/%DD/Ý/g
s/%DE/Þ/g
s/%DF/ß/g
s/%E0/à/g
s/%E1/á/g
s/%E2/â/g
s/%E3/ã/g
s/%E4/ä/g
s/%E5/å/g
s/%E6/æ/g
s/%E7/ç/g
s/%E8/è/g
s/%E9/é/g
s/%EA/ê/g
s/%EB/ë/g
s/%EC/ì/g
s/%ED/í/g
s/%EE/î/g
s/%EF/ï/g
s/%F0/ð/g
s/%F1/ñ/g
s/%F2/ò/g
s/%F3/ó/g
s/%F4/ô/g
s/%F5/õ/g
s/%F6/ö/g
s/%F7/÷/g
s/%F8/ø/g
s/%F9/ù/g
s/%FA/ú/g
s/%FB/û/g
s/%FC/ü/g
s/%FD/ý/g
s/%FE/þ/g
s/%FF/ÿ/g

# must come last so it does not mess up the hexadecimals
s/%25/%/g
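
A minimal sketch of why the first and last substitutions have to stay where they are. `unescape_min` is a hypothetical, cut-down version of the script above that keeps only four of its commands; the sample input is made up. The form value `%2521` is the encoded form of the literal text `%21`: decoding `%25` last leaves it as `%21`, whereas decoding it first would wrongly turn it into `!`. Likewise, `+` must become a space before `%2B` is turned into a real `+`.

```shell
# unescape_min: hypothetical subset of unescape.sed, same command order
unescape_min() {
    sed -e 's/+/ /g' \
        -e 's/%2B/+/g' \
        -e 's/%21/!/g' \
        -e 's/%25/%/g'
}

# sample form data (invented): "a b+c%21" as a browser would encode it
printf 'a+b%%2Bc%%2521\n' | unescape_min
# prints: a b+c%21
```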

#218 From: # aurelio marinho jargas <aurelio@...>
Date: Sat, Jul 15, 2000 6:34 am
Subject: text justifier in sed
 
hi all,

friday night, no show or night out today, so idleness made me
productive &:)

with nothing better to do, i wrote a little thing i had been
wanting for a long time: a text justifier.

it's not perfect, but it works. it takes a text already wrapped at
the column limit and just adds a few blank spaces here and there,
from left to right, word by word, until the line reaches the
maximum number of columns (65 here, which is what i use in pine).

to use it, just:

prompt$ chmod +x justifica.sed
prompt$ ./justifica.sed arquivo.txt > arquivo-justificado.txt

to change the maximum number of columns, replace every 65 in the
script with the number you want.

to use it in vim, select the text in visual mode and run
:'<,'>!justifica.sed

ah! the original of this message was justified by it &:)

--
s/:(/>(/×^a]×http://www.verde666.org×^[:wq
#!/bin/sed -f
# justify.sed
#
# it  gets  a text already wrapped on the desired number of columns
# and  adds  extra  white spaces, from left to right, word by word,
# to  justify  all  the lines. there is a maximum of 5 spaces to be
# inserted  between  the  words. if this limit is reached, the line
# is  not  justified  (come  on,  more  than  5 is horrible). empty
# lines  are  ignored. btw, these comments were justified with this
# script &:)
#
# 20000715 <aurelio@...>

# we'll only justify lines with less than 65 chars
/^.\{65\}/!{

   # cleaning extra spaces of the line
   s/^ \+//
   s/ \+/ /g
   s/ \+$//

   # don't try to justify blank lines
   /^$/b

   # backup of the line
   h

   # spaces -> pattern
   # convert series of spaces to an internal pattern `n
   :s2p
   s/     /`5/g
   s/    /`4/g
   s/   /`3/g
   s/  /`2/g
   s/ /`1/g
   t 1space
   b

   # pattern -> spaces
   # restore the spaces converted to the internal pattern `n
   :p2s
   s/`5/     /g
   s/`4/    /g
   s/`3/   /g
   s/`2/  /g
   s/`1/ /g
   t check
   b

   # check if we've reached our right limit
   # if not, continue adding spaces
   :check
   /^.\{65\}/!b s2p
   b

   # here's the "magic":
   # add 1 space to the first occurrence of the smallest internal
   # pattern found.
   # this way, the extra spaces are always added from left to right,
   # always balanced, one by one.
   # right after the substitution, we'll restore the spaces and
   # test if our limit was reached.
   :1space
   s/`1/`2/ ; t p2s
   s/`2/`3/ ; t p2s
   s/`3/`4/ ; t p2s
   s/`4/`5/ ; t p2s

   # we don't want more than 5 spaces between words, so if every
   # gap has already grown to `5, let's restore the original line
   /`5/x

}
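
A sketch of the "replace every 65" advice as a shell function, so the width becomes an argument instead of an edit to the script. The function name `justify` and its argument are assumptions for illustration; the sed commands are the ones from the script above, and GNU sed is assumed because of `\+`. Lines containing a literal backquote followed by a digit would confuse it, as they would the original.

```shell
# justify WIDTH: hypothetical wrapper around the script above
justify() {
    width=${1:-65}
    sed '
/^.\{'"$width"'\}/!{
   s/^ \+//
   s/ \+/ /g
   s/ \+$//
   /^$/b
   h
   :s2p
   s/     /`5/g
   s/    /`4/g
   s/   /`3/g
   s/  /`2/g
   s/ /`1/g
   t 1space
   b
   :p2s
   s/`5/     /g
   s/`4/    /g
   s/`3/   /g
   s/`2/  /g
   s/`1/ /g
   t check
   b
   :check
   /^.\{'"$width"'\}/!b s2p
   b
   :1space
   s/`1/`2/ ; t p2s
   s/`2/`3/ ; t p2s
   s/`3/`4/ ; t p2s
   s/`4/`5/ ; t p2s
   /`5/x
}'
}

# two gaps, two missing columns: each gap gets one extra space
printf 'a b c\n' | justify 7
# one gap cannot stretch to 20 columns: the line is left alone
printf 'a b\n' | justify 20
```

The second call shows the escape hatch: once the only gap has reached 5 spaces and the line is still too short, the saved copy of the line is restored.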

#219 From: # aurelio marinho jargas <aurelio@...>
Date: Tue, Jul 25, 2000 11:11 pm
Subject: wrap text at column n
 
hi marcos,

i'm replying with a copy to the sed list; if you'd like to
join, you're welcome.


@ 23/7, Marcos Fernando de Souza:
> I saw your tip about SED and would like to know how to
> solve another problem. I need to format an ASCII file
> with completely random line lengths into a file where
> every line has 25 columns.....

could you give more details, or an example?
if a line has more than 25 columns, should the rest be discarded
or moved to the next line?
if the current line has fewer than 25 columns, should part of the
line below be joined to it to make up the 25 columns, or should
it be padded with blanks?

anyway, even without knowing your exact problem, i suggest you
take a look at the fmt program, which is in the textutils
package.

example:

wrap /etc/fstab at 25 columns
# fmt -25 /etc/fstab

same as above, without joining with the line below:
# fmt -25 -s /etc/fstab


or, if the idea is to join with the line below and ignore the
notion of "words", ALWAYS breaking the line at column 25 even in
the middle of a word, use sed:

sed '1{:a;$!N;s/\n//;ta;s/.\{25\}/&\n/g;}' /etc/fstab

note that the \n in the replacement, right near the end, only
works in GNU sed 3.02.80 or newer; with another version you have
to use an escaped literal line break (making it 2 lines):

sed '1{:a;$!N;s/\n//;ta;s/.\{25\}/&\
/g;}' /etc/fstab
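
A small sketch of the same one-liner with the column as a parameter. The function name `wrap_forced` and its argument are assumptions for illustration; it uses the escaped literal line break, so it does not depend on `\n` in the replacement, but the `:a;$!N;...` one-line label syntax still assumes GNU sed.

```shell
# wrap_forced COLS: hypothetical wrapper, joins all lines and cuts every COLS chars
wrap_forced() {
    cols=${1:-25}
    sed '1{:a;$!N;s/\n//;ta;s/.\{'"$cols"'\}/&\
/g;}'
}

# "abcdefghij" + "klm" are glued together, then cut every 5 characters
printf 'abcdefghij\nklm\n' | wrap_forced 5
```

Note that the lines are glued with nothing in between, so the last word of one line and the first word of the next run together; that is the stated trade-off of ignoring words.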

if that's not what you wanted, describe your problem in more
detail. thanks!


--
s/:(/>(/×^a]×http://www.verde666.org×^[:wq

#220 From: # aurelio marinho jargas <aurelio@...>
Date: Wed, Jul 26, 2000 2:25 am
Subject: the annoying signature has changed
 
they changed the egroups signature so that they can automatically
strip earlier signatures from a message (wow, what progress),
which makes it a breeze to delete, since the lines carry
markers.

update your .procmailrc

:0 fhbw
* Delivered-To:.*@egroups.com
| sed '/^\(> \)*-\{68\}<e|-$/,/^\(> \)*-\{68\}|e>-$/d'
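
A sketch for trying the filter outside procmail. The marker shape (68 dashes followed by `<e|-` to open and `|e>-` to close, optionally behind `> ` quote prefixes) is taken from the expression above; the function name `strip_egroups_sig` and the sample message are invented for illustration.

```shell
# strip_egroups_sig: hypothetical wrapper around the sed filter above
strip_egroups_sig() {
    sed '/^\(> \)*-\{68\}<e|-$/,/^\(> \)*-\{68\}|e>-$/d'
}

# build the 68-dash run used by both markers
dashes=$(printf '%68s' '' | tr ' ' '-')

# invented sample: body text, a marked signature block, more body text
printf 'hello\n%s<e|-\nsome ad text\n%s|e>-\nbye\n' "$dashes" "$dashes" |
    strip_egroups_sig
```

Only the lines outside the two markers should survive; the markers themselves are deleted along with everything between them.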



--
s/:(/>(/×^a]×http://www.verde666.org×^[:wq

#221 From: # aurelio marinho jargas <aurelio@...>
Date: Wed, Jul 26, 2000 3:36 am
Subject: Re: wrap text at column n
 
replying to myself: new version, with comments.
note the pseudo-internationalization of the comments &:)


#!/bin/sed -f
# wrap-forced.sed - wrap lines at column n
#
# acts like fmt, but ignores the 'word' context,
# wrapping the line exactly at the specified column
#
# pt_BR comments:
# funciona como o fmt, mas ignora o contexto de 'palavra'
# quebrando a linha exatamente na coluna especificada
#
# c1: na primeira linha do texto...
# c2: laço para colocar todas as linhas em 1 linha apenas
# c3: isto é para eliminar espaços em branco repetidos
#     você pode comentá-lo se não quiser alterá-los
# c4: dica: tire o espaço da 2ª parte do comando s para
#     apagar todos os espaços (parece arte ascii &:) )
# c5: aqui é quem quebra a linha na coluna especificada
#     mude o 25 para o número que você quiser
#     o gnu-sed >= 3.02.80 é necessário por causa do \n
#
# 20000726 <aurelio@...>

#c1: at the first line of the text...
1{

   #c2: loop to put all the lines of the text on one single line
   :a
   $!N
   s/\n//
   ta

   #c3:
   # this is to squeeze blanks
   # you can comment it if you want blanks untouched
   #c4:
   # tip: remove the space in the replacement part to get a result with
   # NO spaces at all (looks like ascii art &:) )
   s/[[:blank:]]\+/ /g

   #c5:
   # here is the guy who breaks the line at the specified column
   # just change the 25 to whatever column you like
   # gnu-sed >= 3.02.80 required because of the \n
   s/.\{25\}/&\n/g
}




@ 25/7, # aurelio marinho jargas:
> @ 23/7, Marcos Fernando de Souza:
> > solve another problem. I need to format an ASCII file
> > with completely random line lengths into a file where
> > every line has 25 columns.....
>
> or, if the idea is to join with the line below and ignore the
> notion of "words", ALWAYS breaking the line at column 25 even in
> the middle of a word, use sed:
>
> sed '1{:a;$!N;s/\n//;ta;s/.\{25\}/&\n/g;}' /etc/fstab
>
> note that the \n in the replacement, right near the end, only
> works in GNU sed 3.02.80 or newer; with another version you have
> to use an escaped literal line break (making it 2 lines):
>
> sed '1{:a;$!N;s/\n//;ta;s/.\{25\}/&\
> /g;}' /etc/fstab


--
s/:(/>(/×^a]×http://www.verde666.org×^[:wq

#222 From: "Manuel Lemos" <mlemos@...>
Date: Wed, Jul 26, 2000 4:18 am
Subject: Regular expressions in Internet Explorer's Javascript
 
Hi,

I know this list is about sed, but since Revista do Linux recommends
this list for discussing regular-expression questions, let's see whether
the experts on duty can solve this puzzle.

I developed a PHP class for generating and validating HTML form fields.
The class validates the field values on the server side using the class's
own PHP code, and on the client side it uses Javascript generated together
with the HTML of the form that holds the fields to validate.  For anyone
interested, the class code is available free of charge here:

http://phpclasses.UpperDesign.com/browse.html/package/1

The class supports several kinds of validation, including one based on
regular expressions defined by the programmer.  Everything works well except
for a few problems in Internet Explorer 4.  Personally I don't use Internet
Explorer, or even Windows, but I need to solve these problems so that these
validations work in all the browsers used by whoever visits the pages where
my forms will appear.

One of the cases is validating an address entered in a TEXTAREA field
(1 or more lines).  Following the Javascript syntax for regular expressions,
I used the expression ^[^ \t\n\r]+ to require that the field contain at
least one line starting with a character that is not a space or a tab.

The class generates the following Javascript code, which fails only in
Internet Explorer because it contains the characters \t\n\r .

  if((formulario.endereco.value.search
  && formulario.endereco.value.search(new RegExp("^[^ \t\n\r]+","g"))<0))
  {
   alert('N'+unescape('%E3')+'o foi indicado um endere'+unescape('%E7')+'o v'+unescape('%E1')+'lido.')
   formulario.endereco.focus()
   return false
  }

Does anyone know why this happens only with Internet Explorer and how to
work around it?

A similar situation occurs with the validation of URL fields. In that case
I used the expression

^(http|https|ftp)://(([A-Za-z0-9_]|\-)+\.)+[A-Za-z]{2,4}(:[0-9]+)?/

  if((formulario.url.value.search
  && formulario.url.value.search(new RegExp(
       "^(http|https|ftp)://(([A-Za-z0-9_]|\-)+\.)+[A-Za-z]{2,4}(:[0-9]+)?/","g"))<0)
  || formulario.url.value=='')
  {
   alert('N'+unescape('%E3')+'o foi indicado um endere'+unescape('%E7')+'o de p'+unescape('%E1')+'gina v'+unescape('%E1')+'lido.')
   theform.elements[0].focus()
   form_submitted=false
   return false
  }

The problem now is with the // characters in sequence.  Does anyone know why
this happens only with Internet Explorer and how to work around it?


Regards,
Manuel Lemos

Web Programming Components using PHP Classes.
Look at: http://phpclasses.UpperDesign.com/?user=mlemos@acm.org
--
E-mail: mlemos@...
URL: http://www.mlemos.e-na.net/
PGP key: http://www.mlemos.e-na.net/ManuelLemos.pgp
--

#223 From: # aurelio marinho jargas <aurelio@...>
Date: Wed, Jul 26, 2000 5:00 am
Subject: Re: Regular expressions in Internet Explorer's Javascript
 
hi manuel,

@ 26/7, Manuel Lemos:
> I know this list is about sed, but since Revista do Linux recommends
> this list for discussing regular-expression questions, let's see whether
> the experts on duty can solve this puzzle.

right, sed & regular expressions, here we are...
i've never used javascript and don't know its quirks, but i'll
throw in my two cents, maybe it helps...


> One of the cases is validating an address entered in a TEXTAREA field
> (1 or more lines).  Following the Javascript syntax for regular
> expressions, I used the expression ^[^ \t\n\r]+ to require that the
> field contain at least one line starting with a character that is not
> a space or a tab.
>
> The class generates the following Javascript code, which fails only in
> Internet Explorer because it contains the characters \t\n\r .
>
>  if((formulario.endereco.value.search
>  && formulario.endereco.value.search(new RegExp("^[^ \t\n\r]+","g"))<0))

two things:

   i suppose the g parameter passed to the RegExp function is for
global searches (several on the same line). but you put the ^
anchor at the start of the RE, to match the beginning of the
line, and there can be only one beginning of line, so the "g" is
unnecessary and may be what is causing the error. try removing
the "g". if "g" doesn't mean global, ignore this guess.

   since you are looking for a line that does NOT start with a
blank, the + quantifier at the end is unnecessary, as a single
space at the start already invalidates the string. i doubt this
is the problem, but maybe the RE engine is getting lost trying to
match more than one \n in a row? try removing the +.


> A similar situation occurs with the validation of URL fields. In that
> case I used the expression
> ^(http|https|ftp)://(([A-Za-z0-9_]|\-)+\.)+[A-Za-z]{2,4}(:[0-9]+)?/ .
>
>  if((formulario.url.value.search
>  && formulario.url.value.search(new
> RegExp("^(http|https|ftp)://(([A-Za-z0-9_]|\-)+\.)+[A-Za-z]{2,4}(:[0-9]+)?/","g"))<0)


if removing the "g" fixed the previous RE, the same applies to
this one, since it also has the start-of-line marker ^.

another guess is to avoid very complex structures such as
((...|...)+...)+ or (...+)?. i don't know how javascript's RE
engine behaves with these multiple possibilities, but you could
at least eliminate one alternation (...|...):

(([A-Za-z0-9_]|\-)+\.)+
could become:
([-A-Za-z0-9_]+\.)+

because a - at the beginning of a [] class is not special
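
A quick way to try the simplified expression outside the browser is `grep -E`, which takes the same syntax for the pieces used here. This is only a sketch: the function name `is_url` and the sample URLs are invented, and it says nothing about how Internet Explorer's engine behaves.

```shell
# is_url STRING: hypothetical helper, succeeds when STRING matches the
# URL expression with the alternation folded into the character class
is_url() {
    printf '%s\n' "$1" |
        grep -Eq '^(http|https|ftp)://([-A-Za-z0-9_]+\.)+[A-Za-z]{2,4}(:[0-9]+)?/'
}

# invented sample URLs
for u in 'http://www.my-site.com/' 'ftp://example.org:21/pub' 'http://localhost/'; do
    if is_url "$u"; then echo "ok   $u"; else echo "bad  $u"; fi
done
```

The hyphenated host name still matches, which is the point of putting the - first inside the class; the last sample fails because the expression requires at least one dot in the host name.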

other than that the RE is spot on, and quite complete by the way.
of course, i'm assuming that the javascript syntax for escapes
is correct &:) or maybe this version of explorer has some quirk
(bug), or doesn't accept the "g" option, or its syntax differs
from other versions, or the string has to be passed in
'single quotes', or ...

well manuel, these were my guesses as a total layman in
javascript who is interested in REs. if they help in any way,
i'm glad &:)


--
s/:(/>(/×^a]×http://www.verde666.org×^[:wq

#224 From: paulo_sofia@...
Date: Thu, Jul 27, 2000 10:43 pm
Subject: Deleting part of a file
 
#225 From: andreia <andreia@...>
Date: Fri, Jul 28, 2000 1:13 am
Subject: Re: Deleting part of a file
 
hello

i tried with:

sed '/ENV/p' | sed '1,/ENV/d'

the first sed duplicates the [ENVIAR] line, and the second deletes from
the first line of the file through the first [ENVIAR].

i tried using only the second sed, but it also deletes the line that
contains [ENVIAR].

there must be a less clumsy way of doing this, but this one works. :)


On 27.07.00, paulo_sofia@... wrote:

> Hello everyone.
>
> I'm writing a script and need to delete part of a file automatically.
>
> My question is the following: I have a file like the example below:
>
>
> AAAAAAAAAA
>
> AAAAAAAAAA
>
> AAAAAAAAAA
>
> BBBBBBBBBB
>
> VVVVVVV
>
> DDDDDDDDD
>
> [ENVIAR]
>
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
> dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
> eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
> rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
>
>
>
> - I need to delete the first N lines up to the word [ENVIAR] (whatever
> comes before the word [ENVIAR] in the file doesn't interest me).
>
> - The string I'm searching for, to delete the lines before it, is
> in square brackets.
>
> - The final file would look like this:
>
> [ENVIAR]
>
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
> dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
> eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
> rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
>
>
> I'm counting on your help.
>
> Paulo Sofia
> Hewlett-Packard Brazil
> Operations Implementation Team
> mailto:paulo_sofia@...
> Phone:  +55 11 7297-4257
> Cellular: +55 11 9219-7834 (Mobile)
> Telnet: 725-4257
> Internal WEB Site: http://oscnt.brazil.hp.com/oi
>
>
>
>

--
Andréia

#226 From: andreia <andreia@...>
Date: Fri, Jul 28, 2000 1:33 am
Subject: Re: Deleting part of a file
 
even better:

sed -n /ENV/,\$p

-n is the 'silent' option: sed prints nothing that is not explicitly
requested. here, what was requested (with p) is the range from ENV to
the end of the file ($).

there's no need to escape the $, as long as the expression is placed
in single quotes.
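
A sketch of the single-quoted form, matching the bracketed word itself rather than just ENV. The function name `keep_from_marker` and the shortened sample input are invented for illustration.

```shell
# keep_from_marker: hypothetical wrapper, prints from the first [ENVIAR] line to the end
keep_from_marker() {
    sed -n '/\[ENVIAR\]/,$p'
}

# shortened, invented version of the sample file from the question
printf 'AAAAAAAAAA\nBBBBBBBBBB\n[ENVIAR]\naaaa\ndddd\n' | keep_from_marker
```

The brackets are escaped so that they are taken literally instead of opening a character class.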

told you there was a more 'elegant' way of doing it. :)


--
Andréia

Copyright © 2010 Yahoo! do Brasil Internet Ltda. All rights reserved.