Regexp::Common::comment -- provide regexes for comments.
    use Regexp::Common qw /comment/;

    while (<>) {
        /$RE{comment}{C}/       and  print "Contains a C comment\n";
        /$RE{comment}{C++}/     and  print "Contains a C++ comment\n";
        /$RE{comment}{PHP}/     and  print "Contains a PHP comment\n";
        /$RE{comment}{Java}/    and  print "Contains a Java comment\n";
        /$RE{comment}{Perl}/    and  print "Contains a Perl comment\n";
        /$RE{comment}{awk}/     and  print "Contains an awk comment\n";
        /$RE{comment}{HTML}/    and  print "Contains an HTML comment\n";
    }

    use Regexp::Common qw /comment RE_comment_HTML/;

    while (<>) {
        $_ =~ RE_comment_HTML() and  print "Contains an HTML comment\n";
    }
Please consult the manual of Regexp::Common for a general description of how this interface works.
Do not use this module directly, but load it via Regexp::Common.
This module gives you regular expressions for comments in various languages.
Below, the comments of each of the languages are described.
The patterns are available as $RE{comment}{LANG}, for each language LANG. Some languages have variants; how to get the patterns for the variants is described at the individual languages.
Unless mentioned otherwise, {-keep} sets $1, $2, $3 and $4 to the entire comment, the opening marker, the content of the comment, and the closing marker (for many languages, the latter is a newline) respectively.
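The default {-keep} behaviour can be sketched as follows. This is a minimal illustration, assuming the C pattern follows the four-capture convention described above; the helper name c_comment_parts is made up for this example and is not part of the module.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical helper (not part of Regexp::Common): returns the list
# (entire comment, opening marker, content, closing marker) for the
# first C comment found in $text, or an empty list if there is none.
sub c_comment_parts {
    my ($text) = @_;
    return $text =~ /$RE{comment}{C}{-keep}/ ? ($1, $2, $3, $4) : ();
}

my @parts = c_comment_parts ("int x = 1; /* set x */ int y;");
if (@parts) {
    print "entire:  [$parts[0]]\n";
    print "opening: [$parts[1]]\n";
    print "content: [$parts[2]]\n";
    print "closing: [$parts[3]]\n";
}
```

Languages whose entry says that only $1 is set would return just the entire comment; the other three captures should not be relied upon there.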
Comments start with a backslash (\), and last till the end of the line. See http://homepages.cwi.nl/%7Esteven/abc/.
Comments start with --, and last till the end of the line.
Comments start with # or //, and last till the end of the line.
Comments start with ; and last till the end of the line. See also http://www.wurb.com/if/devsys/12.
Comments start with --, and last till the end of the line. See also http://w1.132.telia.com/~u13207378/alan/manual/alanTOC.html.
Comments start with the keyword comment, and end with a ;. See http://www.masswerk.at/algol60/report.htm.
Comments are delimited either by #, or by one of the keywords co or comment. The keywords should not be part of another word. See http://westein.arb-phys.uni-dortmund.de/~wb/a68s.txt. With {-keep}, only $1 will be set, returning the entire comment.
Comments start with /* and end with */.
Comments start with # and end at the end of the line.
Comments start with /* and end with */.
The pattern is available as $RE{comment}{BASIC}{mvEnterprise}. Comments in this language start with a !, a * or the keyword REM, and last till the end of the line. See http://www.rainingdata.com/products/beta/docs/mve/50/ReferenceManual/Basic.pdf. With {-keep}, $1 will be set to the entire comment. This pattern requires perl 5.8.0 or newer.
Comments start with // and continue till the end of the line. See also http://www.catseye.mb.ca/esoteric/b-juliet/index.html.
;
. See http://www.catseye.mb.ca/esoteric/befunge/98/spec98.html.
Comments start with <?c_, and end with c_?>. See http://www.livejournal.com/doc/server/bml.index.html.
The language uses only the characters <, >, [, ], +, -, . and ,. Any other characters are considered comments. With {-keep}, $1 is set to the entire comment.
Comments start with /* and end with */.
Comments start with /* and end with */. See http://cs.uas.arizona.edu/classes/453/programs/C--Spec.html.
There are two forms of comments: comments that start with // and last till the end of the line, and comments that start with /*, and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.
There are two forms of comments: comments that start with // and last till the end of the line, and comments that start with /*, and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment. See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csspec/html/vclrfcsharpspec_C.asp.
Comments start with (*, end with *), and can be nested. See http://www.cs.caltech.edu/courses/cs134/cs134b/book.pdf and http://pauillac.inria.fr/caml/index-eng.html.
There are two forms of comments: comments that start with // and last till the end of the line, and comments that start with /*, and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment. See http://developer.nvidia.com/attach/3722.
In CLU, a comment starts with a percent sign (%), and ends with the next newline. See ftp://ftp.lcs.mit.edu:/pub/pclu/CLU-syntax.ps and http://www.pmg.lcs.mit.edu/CLU.html.
Comments start with a semi-colon (;) and last till the end of the line. See http://www.rbnn.com/cql/.
Comments start with //, and end with the end of the line.
Comments either start with //, or are nested comments, delimited with /* and */. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.
There are two forms of comments: comments that start with // and last till the end of the line, and comments that start with /*, and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment. JavaScript is Netscape's implementation of ECMAScript. See http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf, and http://www.ecma-international.org/publications/standards/Ecma-262.htm.
Comments start with --, and last till the end of the line.
Comments start with { and end with }. See http://wouter.fov120.com/false/false.txt.
There are two forms of comments: comments that start with // and last till the end of the line, and comments that start with /*, and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.
Comments start with \, and end with the end of the line. See also http://docs.sun.com/sb/doc/806-1377-10.
There are two forms of Fortran. There's free form Fortran, which has comments that start with !, and end at the end of the line. The pattern for this is given by $RE{comment}{Fortran}. Fixed form Fortran, which has been obsoleted, has comments that start with C, c or * in the first column, or with ! anywhere but the sixth column. The pattern for this is given by $RE{comment}{Fortran}{fixed}. See also http://www.cray.com/craydoc/manuals/007-3692-005/html-007-3692-005/.
;
.
Comments start with # and last for the rest of the line.
,
.
See http://www.dangermouse.net/esoteric/haifu.html.
{- and -}. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.
In HTML, comments only appear inside a comment declaration. A comment declaration starts with a <!, and ends with a >. Inside this declaration, we have zero or more comments. Comments start with -- and end with --, and are optionally followed by whitespace. The pattern $RE{comment}{HTML} recognizes those comment declarations (and hence more than a comment). Note that this is not the same as something that starts with <!-- and ends with -->, because the following will be matched completely:

    <!-- First Comment -- --> Second Comment <!-- -- Third Comment -->

Do not be fooled by what your favourite browser thinks is an HTML comment.
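The behaviour described above can be checked with a short script. This is only a sketch: the helper name first_html_comment_declaration is hypothetical, and the extra pair of parentheses is added by the example so that the match is captured without using {-keep}.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical helper (not part of Regexp::Common): returns the first
# HTML comment declaration found in $text, or undef if there is none.
sub first_html_comment_declaration {
    my ($text) = @_;
    return $text =~ /($RE{comment}{HTML})/ ? $1 : undef;
}

# The example string from the text above.
my $html  = '<!-- First Comment -- --> Second Comment <!-- -- Third Comment -->';
my $match = first_html_comment_declaration ($html);

print "The whole string is one comment declaration\n"
    if defined $match && $match eq $html;
```

A browser would typically treat "Second Comment" as visible text, which is exactly the difference the paragraph above warns about.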
If {-keep}
is used, the following are returned:
<!
.
>
.
Comments either start with ! (which cannot be followed by a \), or are nested comments, delimited with !\ and \!. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.
Comments start with # and end at the next newline. See http://www.toolsofcomputing.com/IconHandbook/IconHandbook.pdf, http://www.cs.arizona.edu/icon/index.htm, and http://burks.bton.ac.uk/burks/language/icon/index.htm.
Comments start with NOT or N'T, and can optionally be preceded by the keywords DO and PLEASE. If both keywords are used, PLEASE precedes DO. Keywords are separated by whitespace.
Comments start with NB., and last till the end of the line. See http://www.jsoftware.com/books/help/primer/contents.htm, and http://www.jsoftware.com/.
There are two forms of comments: comments that start with // and last till the end of the line, and comments that start with /*, and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.
There are two forms of comments: comments that start with // and last till the end of the line, and comments that start with /*, and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment. JavaScript is Netscape's implementation of ECMAScript. See http://www.mozilla.org/js/language/E262-3.pdf, and http://www.mozilla.org/js/language/.
Comments start with % and end at the end of the line.
Comments start with a semi-colon (;) and last till the end of the line.
Comments start with /* and end with */.
Comments start with ;, and last till the end of the line.
Comments start with --, and last till the end of the line. See also http://www.lua.org/manual/manual.html.
In M (aka MUMPS), comments start with a semi-colon, and last till the end of a line. The language specification requires the semi-colon to be preceded by one or more linestart characters. Those characters default to a space, but that's configurable. This requirement, of preceding the comment with linestart characters, is not tested for. See ftp://ftp.intersys.com/pub/openm/ism/ism64docs.zip, http://mtechnology.intersys.com/mproducts/openm/index.html, and http://mcenter.com/mtrc/index.html.
Comments start with # and continue to the end of the line, including the newline. The pattern $RE{comment}{m4} matches such comments. In m4, it is possible to change the starting token though. See http://wolfram.schneider.org/bsd/7thEdManVol2/m4/m4.pdf, http://www.cs.stir.ac.uk/~kjt/research/pdf/expl-m4.pdf, and http://www.gnu.org/software/m4/manual/.
In Modula-2, comments start with (*, and end with *). Comments may be nested. See http://www.modula2.org/.
In Modula-3, comments start with (*, and end with *). Comments may be nested. See http://www.m3.org/.
Comments start with # and last for the rest of the line.
Comments are either one-line comments starting with # (like Perl), or multiline comments delimited by /* and */ (like C). Under {-keep}, only $1 will be set. See also http://www.nickle.org.
Comments start with (* and end with *). See http://www.oberon.ethz.ch/oreport.html.
There are many implementations of Pascal. This module provides patterns for comments of several implementations.
$RE{comment}{Pascal}
Comments start with either {, or (*, and end with } or *). This means that {*) and (*} are considered to be comments. Many Pascal applications don't allow this. See http://www.pascal-central.com/docs/iso10206.txt.
$RE{comment}{Alice}
Comments start with { and end with }. Comments are not allowed to contain newlines. See http://www.templetons.com/brad/alice/language/.
$RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free} and $RE{comment}{Pascal}{GPC}
The Delphi Pascal, Free Pascal and the Gnu Pascal Compiler implementations of Pascal all have comments that either start with // and last till the end of the line, are delimited with { and }, or are delimited with (* and *). Patterns for those comments are given by $RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free} and $RE{comment}{Pascal}{GPC} respectively. These patterns only set $1 when {-keep} is used, which will then include the entire comment. See http://info.borland.com/techpubs/delphi5/oplg/, http://www.freepascal.org/docs-html/ref/ref.html and http://www.gnu-pascal.de/gpc/.
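As an illustration of the three comment styles just described, the following sketch collects every comment from a small piece of Pascal source using the Delphi pattern. The helper name delphi_comments and the sample source are made up for this example; the same loop should work with the Free and GPC patterns.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical helper (not part of Regexp::Common): returns all comments
# found in $source, using {-keep} so that $1 holds the entire comment.
sub delphi_comments {
    my ($source) = @_;
    my @comments;
    while ($source =~ /$RE{comment}{Pascal}{Delphi}{-keep}/g) {
        push @comments, $1;
    }
    return @comments;
}

# Sample source containing one comment of each style.
my $source = "x := 1; { first }\n"
           . "y := 2; (* second *)\n"
           . "z := 3; // third\n";

my @found = delphi_comments ($source);
print scalar (@found), " comments found\n";
```

Note that this sketch does not skip string literals, so a comment marker inside a quoted Pascal string would also be reported.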
$RE{comment}{Pascal}{Workshop}
The Workshop Pascal compiler, from SUN Microsystems, allows comments that are delimited with either { and }, delimited with (* and *), delimited with /* and */, or starting and ending with a double quote ("). When {-keep} is used, only $1 is set, and returns the entire comment.
Comments either start with ! and last till the end of the line, or start with /* and end with */. With {-keep}, $1 will be set to the entire comment.
Comments either start with # or // and last till the end of the line, or are delimited by /* and */. With {-keep}, $1 will be set to the entire comment.
Comments start with . or ;, and end with the next newline. See http://www.mmcctech.com/pl-b/plb-0010.htm.
Comments start with /* and end with */.
Comments either start with -- and run till the end of the line, or start with /* and end with */.
Comments start with #, and continue till the end of the line.
Comments start with //, and last till the end of the line.
Comments start with #, and continue till the end of the line.
Comments start with ` (a backtick), and continue till the end of the line.
In QML, comments start with # and last till the end of the line. See http://www.questionmark.com/uk/qml/overview.doc.
Comments start with # and end with the following newline. See http://www.r-project.org/.
Comments start with ; and last till the end of the line.
Comments start with # and last till the end of the line.
Comments start with ;, and last till the end of the line. See http://schemers.org/.
Comments start with # and end at the end of the line.
;
. See http://www.catseye.mb.ca/esoteric/shelta/index.html.
There are two forms of comments. First, there is the line comment, which starts with a # and includes the rest of the line (just like Perl). Second, there is the multiline, nested comment, which is delimited by (* and *). Under {-keep}, only $1 is set, and is set to the entire comment. This pattern needs at least Perl version 5.6.0. See http://www.cs.berkeley.edu/~ug/slide/docs/slide/spec/spec_frame_intro.shtml.
Comments start with % and last for the rest of the line.
"
.
Comments start with ;, and last till the end of the line.
Comments are delimited by ". Double quotes can appear inside comments by doubling them.
Standard SQL uses comments starting with two or more dashes, and ending at the end of the line.
MySQL does not follow the standard. Instead, it allows comments that start with a # or "-- " (that's two dashes and a space) ending with the following newline, and comments starting with /*, and ending with the next ; or */ that isn't inside single or double quotes. A pattern for this is returned by $RE{comment}{SQL}{MySQL}. With {-keep}, only $1 will be set, and it returns the entire comment.
Comments start with # and continue till the end of the line.
Comments start with % and end at the end of the line.
Comments start with \", and continue till the end of the line.
Comments start with // and continue to the end of the line. See http://www.ubercode.com.
Comments start with ", and end at the end of the line.
Comments start with ||, and end with !!.
Comments start with ;, and continue till the end of the line.
Comments start with a ' character, and end at the following newline. See http://dave2.rocketjump.org/rad/zzthelp/lang.html.
$Log: comment.pm,v $
Revision 2.120 2008/05/26 15:46:07 abigail Fix "Variable "%s" is not available
Revision 2.119 2008/05/26 15:43:52 abigail Fixed bug in pattern for Pascal comments
Revision 2.118 2008/05/23 21:30:09 abigail Changed email address
Revision 2.117 2008/05/23 21:28:01 abigail Changed license
Revision 2.116 2005/03/16 00:00:02 abigail CQL, INTERCAL, R
Revision 2.115 2005/01/09 23:12:03 abigail BML comments
Revision 2.114 2004/12/18 11:43:06 abigail POD: HTML comments end in >, not <
Revision 2.113 2004/12/15 22:06:51 abigail Fixed regex for J comments
Revision 2.112 2004/06/09 21:44:48 abigail New languages
Revision 2.111 2003/09/24 08:39:35 abigail Stupid "syntax" warning issues false positives
Revision 2.110 2003/08/19 21:27:55 abigail Nickle language
Revision 2.109 2003/08/13 10:07:39 abigail Added patterns for C--, C#, Cg and SLIDE comments
Revision 2.108 2003/08/01 11:30:25 abigail Comments for 'QML' and 'PL/SQL'
Revision 2.107 2003/05/25 21:33:48 abigail POD nits from Bryan C. Warnock
Revision 2.106 2003/03/12 22:25:42 abigail
    - More generic setup to define comments for various languages.
    - Expanded and redid the documentation for comment.pm.
    - Comments for Advisor, Advsys, Alan, Algol 60, Algol 68, B, BASIC (mvEnterprise), Forth, Fortran (both fixed and free form), fvwm2, mutt, Oberon, 6 versions of Pascal, PEARL (one of the at least four...), PL/B, PL/I, slrn, Squeak.
Revision 2.105 2003/03/09 19:04:42 abigail
    - More generic setup to define comments for various languages.
    - Expanded and redid the documentation for comment.pm. Now every language has its own paragraph, describing its comment, and pointers to webpages.
    - Comments for Advisor, Advsys, Alan, Algol 60, Algol 68, B, BASIC (mvEnterprise), Forth, Fortran (both fixed and free form), fvwm2, mutt, Oberon, 6 versions of Pascal, PEARL (one of the at least four...), PL/B, PL/I, slrn, Squeak.
Revision 2.104 2003/02/21 14:48:06 abigail Crystal Reports
Revision 2.103 2003/02/11 09:39:08 abigail Added
Revision 2.102 2003/02/07 15:23:54 abigail Lua and FPL
Revision 2.101 2003/02/01 22:55:31 abigail Changed Copyright years
Revision 2.100 2003/01/21 23:19:40 abigail The whole world understands RCS/CVS version numbers, that 1.9 is an older version than 1.10. Except CPAN. Curse the idiot(s) who think that version numbers are floats (in which universe do floats have more than one decimal dot?). Everything is bumped to version 2.100 because CPAN couldn't deal with the fact one file had version 1.10.
Revision 1.19 2002/11/06 13:51:34 abigail Minor POD changes.
Revision 1.18 2002/09/18 18:13:01 abigail Fixes for 5.005
Revision 1.17 2002/09/04 17:04:24 abigail Q-BAL
Revision 1.16 2002/08/27 16:50:50 abigail Patterns for Beatnik, Befunge-98, Funge-98 and W*.
Revision 1.15 2002/08/22 17:04:03 abigail SMITH added
Revision 1.14 2002/08/22 16:41:25 abigail
    + Added function 'id' and 'from_to' with associated data.
    + Added function 'combine' for languages having multiple syntaxes.
    + Added 'Shelta'
Revision 1.13 2002/08/21 16:00:32 abigail beta-Juliet, Portia, ILLGOL and Brainfuck.
Revision 1.12 2002/08/20 17:40:37 abigail
    - Created a 'nested' function (simplified version from Regexp::Common::balanced).
    - Comments that use 'from' to eol or balanced (nested) delimiters are now generated from a data array.
    - Added Hugo and Haifu.
Revision 1.11 2002/08/05 12:16:58 abigail Fixed 'Regex::' and 'Rexexp::' typos to 'Regexp::' (Found my Mike Castle).
Revision 1.10 2002/07/31 23:33:16 abigail Documented that Haskell and Dylan comments need at least 5.6.0.
Revision 1.9 2002/07/31 23:12:29 abigail Dylan and Haskell comments can be nested, hence version 5.6.0 of Perl is needed to be able to make a regex matching them.
Revision 1.8 2002/07/31 14:48:16 abigail Added LOGO (to please petdance)
Revision 1.7 2002/07/31 13:06:41 abigail Dealt with -keep for Haskell and Dylan.
Revision 1.6 2002/07/31 00:54:00 abigail Added comments for Haskell, Dylan, Smalltalk and MySQL.
Revision 1.5 2002/07/30 16:38:23 abigail Added support for the languages: LaTeX, Tcl, TeX and troff.
Revision 1.4 2002/07/26 16:48:12 abigail Simplied datastructure for the languages that use single line comments.
Revision 1.3 2002/07/26 16:37:20 abigail Added new languages: Ada, awk, Eiffel, Java, LPC, PHP, Python, REBOL, Ruby, vi and zonefile.
Revision 1.2 2002/07/25 22:37:44 abigail Added 'use strict'. Added 'no_defaults' to 'use Regex::Common' to prevent loaded of all defaults.
Revision 1.1 2002/07/25 19:56:07 abigail Modularizing Regexp::Common.
See Regexp::Common for a general description of how to use this interface.
Damian Conway (damian@conway.org)
This package is maintained by Abigail (regexp-common@abigail.be).
Bugs are bound to be plenty.
For a start, there are many common regexes missing. Send them in to regexp-common@abigail.be.
This software is Copyright (c) 2001 - 2008, Damian Conway and Abigail.
This module is free software, and may be used under any of the following licenses:
1) The Perl Artistic License. See the file COPYRIGHT.AL.
2) The Perl Artistic License 2.0. See the file COPYRIGHT.AL2.
3) The BSD Licence. See the file COPYRIGHT.BSD.
4) The MIT Licence. See the file COPYRIGHT.MIT.