NaturalDocs::Languages::Advanced

The base class for all languages that have full support in Natural Docs.  Each one will have a custom parser capable of documenting undocumented aspects of the code.

Summary
NaturalDocs::Languages::Advanced - The base class for all languages that have full support in Natural Docs.
Implementation
Members - The class is implemented as a blessed arrayref.
Functions
New - Returns a new language object and adds it to NaturalDocs::Languages.
Tokens - Returns the tokens found by ParseForCommentsAndTokens().
SetTokens - Replaces the tokens.
ClearTokens - Resets the token list.
AutoTopics - Returns the arrayref of automatically generated topics, or undef if none.
AddAutoTopic - Adds a NaturalDocs::Parser::ParsedTopic to AutoTopics().
ClearAutoTopics - Resets the automatic topic list.
ScopeRecord - Returns an arrayref of NaturalDocs::Languages::Advanced::ScopeChange objects describing how and when the scope changed throughout the file.
Parsing Functions - These functions are good general language building blocks.
ParseForCommentsAndTokens - Loads the passed file, sends all appropriate comments to NaturalDocs::Parser->OnComment(), and breaks the rest into an arrayref of tokens.
TokenizeLine - Converts the passed line to tokens as described in ParseForCommentsAndTokens and adds them to Tokens().
TryToSkipString - If the position is on a string delimiter, moves the position to the token following the closing delimiter, or past the end of the tokens if there is none.
SkipRestOfLine - Moves the position to the token following the next line break, or past the end of the tokens array if there is none.
SkipUntilAfter - Moves the position to the token following the next occurrence of a particular token sequence, or past the end of the tokens array if it never occurs.
IsFirstLineToken - Returns whether the position is at the first token of a line, not including whitespace.
IsLastLineToken - Returns whether the position is at the last token of a line, not including whitespace.
IsAtSequence - Returns whether the position is at a sequence of tokens.
IsBackslashed - Returns whether the position is after a backslash.
Scope Functions - These functions provide a nice scope stack implementation for language-specific parsers to use.
ClearScopeStack - Clears the scope stack for a new file.
StartScope - Records a new scope level.
EndScope - Records the end of the current scope level.
ScopeSymbol - Returns the symbol that ends the current scope level, or undef if we are at the top level.
CurrentScope - Returns the current calculated scope, or undef if global.
CurrentNamespace - Returns the current calculated namespace, or undef if none.
CurrentPackage - Returns the current calculated package or class, or undef if none.
CurrentProtection - Returns the current protection, or undef if none.
SetNamespace - Sets the namespace for the current scope level.
SetPackage - Sets the package or class for the current scope level.
SetProtection - Sets the protection for the current scope level.
Support Functions
AddToScopeRecord - Adds a change to the scope record, condensing unnecessary entries.
CreateString - Converts the specified tokens into a string and returns it.

Implementation

Members

The class is implemented as a blessed arrayref.  The following constants are used as indexes.

TOKENS - An arrayref of tokens used in all the Parsing Functions.
SCOPE_STACK - An arrayref of NaturalDocs::Languages::Advanced::Scope objects serving as a scope stack for parsing.  There will always be one available, with a symbol of undef, for the top level.
SCOPE_RECORD - An arrayref of NaturalDocs::Languages::Advanced::ScopeChange objects, as generated by the scope stack.  If there is more than one change per line, only the last is stored.
AUTO_TOPICS - An arrayref of NaturalDocs::Parser::ParsedTopics generated automatically from the code.
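
To make the blessed-arrayref layout concrete, here is a minimal stand-alone sketch.  The package name My::Language is hypothetical, the numeric values of the index constants are assumptions (the documentation names the constants but not their values), and the sketch does not register anything with NaturalDocs::Languages or pre-populate the scope stack.

```perl
use strict;
use warnings;

{
    package My::Language;    # hypothetical stand-in, not the real class

    # Index constants into the object's array.  The numeric values are assumed.
    use constant TOKENS       => 0;
    use constant SCOPE_STACK  => 1;
    use constant SCOPE_RECORD => 2;
    use constant AUTO_TOPICS  => 3;

    sub New {
        my ($class) = @_;
        my $object = [];
        $object->[TOKENS]       = [];
        $object->[SCOPE_STACK]  = [];
        $object->[SCOPE_RECORD] = [];
        $object->[AUTO_TOPICS]  = undef;
        return bless $object, $class;
    }

    # Accessors read and write the array slots through the constants.
    sub Tokens      { return $_[0]->[TOKENS]; }
    sub SetTokens   { my ($self, $tokens) = @_; $self->[TOKENS] = $tokens; }
    sub ClearTokens { $_[0]->[TOKENS] = []; }
}

my $language = My::Language->New();
$language->SetTokens([ 'sub', ' ', 'Foo', "\n" ]);
print scalar @{ $language->Tokens() }, " tokens stored\n";
```

Storing members in array slots rather than hash keys keeps lookups cheap and makes a misspelled member name a compile-time error under strict, since the constant would not exist.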

Functions

New

sub New

Returns a new language object and adds it to NaturalDocs::Languages.

Tokens

sub Tokens

Returns the tokens found by ParseForCommentsAndTokens().

SetTokens

sub SetTokens #(tokens)

Replaces the tokens.

ClearTokens

sub ClearTokens

Resets the token list.  You may want to do this after parsing is over to save memory.

AutoTopics

sub AutoTopics

Returns the arrayref of automatically generated topics, or undef if none.

AddAutoTopic

sub AddAutoTopic #(topic)

Adds a NaturalDocs::Parser::ParsedTopic to AutoTopics().

ClearAutoTopics

sub ClearAutoTopics

Resets the automatic topic list.  Not necessary if you call ParseForCommentsAndTokens().

ScopeRecord

sub ScopeRecord

Returns an arrayref of NaturalDocs::Languages::Advanced::ScopeChange objects describing how and when the scope changed throughout the file.

Parsing Functions

These functions are good general language building blocks.  Use them to create your language-specific parser.

All functions work on Tokens() and assume it is set by ParseForCommentsAndTokens().

ParseForCommentsAndTokens

sub ParseForCommentsAndTokens #(sourceFile, lineCommentSymbols, openingCommentSymbols, closingCommentSymbols)

Loads the passed file, sends all appropriate comments to NaturalDocs::Parser->OnComment(), and breaks the rest into an arrayref of tokens.  Tokens are defined as

  • All consecutive alphanumeric and underscore characters.
  • All consecutive whitespace.
  • A single line break.  It will always be “\n”; you don’t have to worry about platform differences.
  • A single character not included above, which is usually a symbol.  Multiple consecutive ones each get their own token.

The result will be placed in Tokens().
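
The four token rules above can be illustrated with a single regular expression.  This is only a sketch of the splitting rules, not the actual implementation: TokenizeLineSketch is a hypothetical helper, it assumes ASCII alphanumerics and treats only spaces and tabs as whitespace, and it ignores comment handling entirely.

```perl
use strict;
use warnings;

# Splits one line (passed without its line break) into tokens using the
# rules above, then appends a "\n" token for the line break.
sub TokenizeLineSketch {
    my ($line) = @_;

    # Alternation order matters: word runs, then whitespace runs, then any
    # single remaining character, so each symbol becomes its own token.
    my @tokens = $line =~ /([A-Za-z0-9_]+|[ \t]+|.)/g;

    push @tokens, "\n";
    return @tokens;
}

my @tokens = TokenizeLineSketch('my $count += 10;');

# Show the line break token as a visible \n when printing.
print join('|', map { $_ eq "\n" ? '\n' : $_ } @tokens), "\n";
```

Note how the two-character operator += comes out as two separate tokens, which is why functions such as IsAtSequence() accept a sequence of tokens rather than a single string.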

Parameters

sourceFile - The source file to load and parse.
lineCommentSymbols - An arrayref of symbols that designate line comments, or undef if none.
openingCommentSymbols - An arrayref of symbols that designate the start of multiline comments, or undef if none.
closingCommentSymbols - An arrayref of symbols that designate the end of multiline comments, or undef if none.

TokenizeLine

sub TokenizeLine #(line)

Converts the passed line to tokens as described in ParseForCommentsAndTokens and adds them to Tokens().  Also adds a line break token after it.

TryToSkipString

sub TryToSkipString #(indexRef, lineNumberRef, openingDelimiter, closingDelimiter, startContentIndexRef, endContentIndexRef)

If the position is on a string delimiter, moves the position to the token following the closing delimiter, or past the end of the tokens if there is none.  Assumes all other characters are allowed in the string, the delimiter itself is allowed if it’s preceded by a backslash, and line breaks are allowed in the string.

Parameters

indexRef - A reference to the position’s index into Tokens().
lineNumberRef - A reference to the position’s line number.
openingDelimiter - The opening string delimiter, such as a quote or an apostrophe.
closingDelimiter - The closing string delimiter, if different.  If not defined, assumes the same as openingDelimiter.
startContentIndexRef - A reference to a variable in which to store the index of the first token of the string’s content.  May be undef.
endContentIndexRef - A reference to a variable in which to store the index of the end of the string’s content, which is one past the last index of content.  May be undef.

Returns

Whether the position was on the passed delimiter or not.  The index, line number, and content index ref variables will be updated only if true.
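
The behavior described above can be approximated with the following stand-alone sketch.  TryToSkipStringSketch is a hypothetical function that takes the token arrayref as an explicit first argument instead of reading Tokens(); the escape handling (a backslash token escapes whatever token follows it) is an assumption based on the description, not a copy of the real code.

```perl
use strict;
use warnings;

# Returns 1 and updates the referenced variables if the position is on the
# opening delimiter; returns 0 and changes nothing otherwise.
sub TryToSkipStringSketch {
    my ($tokens, $indexRef, $lineNumberRef, $opening, $closing,
        $startContentIndexRef, $endContentIndexRef) = @_;

    $closing = $opening if !defined $closing;
    return 0 if $$indexRef >= @$tokens || $tokens->[$$indexRef] ne $opening;

    my $index = $$indexRef + 1;
    my $line  = $$lineNumberRef;
    my $start = $index;

    while ($index < @$tokens && $tokens->[$index] ne $closing) {
        # Step over a backslash so the escaped token is consumed below.
        $index++ if $tokens->[$index] eq "\\";

        if ($index < @$tokens) {
            $line++ if $tokens->[$index] eq "\n";
            $index++;
        }
    }

    my $end = $index;                   # one past the last content token
    $index++ if $index < @$tokens;      # step past the closing delimiter

    $$startContentIndexRef = $start if defined $startContentIndexRef;
    $$endContentIndexRef   = $end   if defined $endContentIndexRef;
    $$indexRef      = $index;
    $$lineNumberRef = $line;
    return 1;
}

# Tokens for:  x = "a\"b";
my @tokens = ('x', ' ', '=', ' ', '"', 'a', "\\", '"', 'b', '"', ';', "\n");
my ($index, $lineNumber) = (4, 1);
my ($start, $end);

if (TryToSkipStringSketch(\@tokens, \$index, \$lineNumber, '"', undef, \$start, \$end)) {
    print "content: ", join('', @tokens[$start .. $end - 1]), "\n";
}
```

After the call the position sits on the semicolon, and the content indexes bracket the four tokens between the quotes, including the escaped quote.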

SkipRestOfLine

sub SkipRestOfLine #(indexRef, lineNumberRef)

Moves the position to the token following the next line break, or past the end of the tokens array if there is none.  Useful for line comments.

Note that it skips blindly.  It assumes there cannot be anything of interest, such as a string delimiter, between the position and the end of the line.

Parameters

indexRef - A reference to the position’s index into Tokens().
lineNumberRef - A reference to the position’s line number.
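
As a rough illustration of the skipping behavior, the sketch below advances past a line comment.  SkipRestOfLineSketch is a hypothetical stand-alone function that receives the token arrayref explicitly; incrementing the line number when the line break is passed is an assumption about how the line number reference is maintained.

```perl
use strict;
use warnings;

# Moves the position to the token after the next "\n", or past the end of
# the token array if no line break remains.
sub SkipRestOfLineSketch {
    my ($tokens, $indexRef, $lineNumberRef) = @_;

    # Blindly pass over everything up to the line break.
    while ($$indexRef < @$tokens && $tokens->[$$indexRef] ne "\n") {
        $$indexRef++;
    }

    if ($$indexRef < @$tokens) {
        $$indexRef++;         # step past the line break itself
        $$lineNumberRef++;
    }
}

# Tokens for a line comment followed by code on the next line.
my @tokens = ('#', ' ', 'note', "\n", 'x');
my ($index, $lineNumber) = (0, 1);

SkipRestOfLineSketch(\@tokens, \$index, \$lineNumber);
print "now at token '$tokens[$index]' on line $lineNumber\n";
```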

SkipUntilAfter

sub SkipUntilAfter #(indexRef, lineNumberRef, token, token, token ...)

Moves the position to the token following the next occurrence of a particular token sequence, or past the end of the tokens array if it never occurs.  Useful for multiline comments.

Note that it skips blindly.  It assumes there cannot be anything of interest, such as a string delimiter, between the position and the matched token sequence.

Parameters

indexRef - A reference to the position’s index.
lineNumberRef - A reference to the position’s line number.
token - A token that must be matched.  Can be specified multiple times to match a sequence of tokens.
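
Because symbols are tokenized one character at a time, a closing comment symbol such as */ has to be matched as a two-token sequence.  The sketch below shows one way this could work; both functions are hypothetical stand-alone versions that take the token arrayref explicitly, and the sketch assumes the sequence itself contains no line break tokens.

```perl
use strict;
use warnings;

# Returns 1 if the tokens starting at $index match @sequence exactly.
sub IsAtSequenceSketch {
    my ($tokens, $index, @sequence) = @_;

    return 0 if $index + @sequence > @$tokens;

    for my $offset (0 .. $#sequence) {
        return 0 if $tokens->[$index + $offset] ne $sequence[$offset];
    }
    return 1;
}

# Advances past the next occurrence of @sequence, counting line breaks on
# the way.  Ends past the last token if the sequence never occurs.
sub SkipUntilAfterSketch {
    my ($tokens, $indexRef, $lineNumberRef, @sequence) = @_;

    while ($$indexRef < @$tokens) {
        if (IsAtSequenceSketch($tokens, $$indexRef, @sequence)) {
            $$indexRef += @sequence;
            return;
        }
        $$lineNumberRef++ if $tokens->[$$indexRef] eq "\n";
        $$indexRef++;
    }
}

# Tokens for a two-line block comment followed by " x".
my @tokens = ('/', '*', ' ', 'a', "\n", ' ', 'b', ' ', '*', '/', ' ', 'x');
my ($index, $lineNumber) = (2, 1);    # position just after the opening symbol

SkipUntilAfterSketch(\@tokens, \$index, \$lineNumber, '*', '/');
print "index $index, line $lineNumber\n";
```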

IsFirstLineToken

sub IsFirstLineToken #(index)

Returns whether the position is at the first token of a line, not including whitespace.

Parameters

index - The index of the position.

IsLastLineToken

sub IsLastLineToken #(index)

Returns whether the position is at the last token of a line, not including whitespace.

Parameters

index - The index of the position.
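
Both line-position checks, IsFirstLineToken() and IsLastLineToken(), come down to looking past any adjacent whitespace token for a line break or the edge of the token array.  The following is a sketch under that assumption, using hypothetical stand-alone functions that take the token arrayref explicitly and treat only spaces and tabs as whitespace.

```perl
use strict;
use warnings;

sub IsWhitespaceTokenSketch {
    my ($token) = @_;
    return $token =~ /^[ \t]+$/ ? 1 : 0;
}

# True if only whitespace lies between the start of the line and $index.
sub IsFirstLineTokenSketch {
    my ($tokens, $index) = @_;

    my $previous = $index - 1;
    $previous-- while $previous >= 0 && IsWhitespaceTokenSketch($tokens->[$previous]);

    return ($previous < 0 || $tokens->[$previous] eq "\n") ? 1 : 0;
}

# True if only whitespace lies between $index and the end of the line.
sub IsLastLineTokenSketch {
    my ($tokens, $index) = @_;

    my $next = $index + 1;
    $next++ while $next < @$tokens && IsWhitespaceTokenSketch($tokens->[$next]);

    return ($next >= @$tokens || $tokens->[$next] eq "\n") ? 1 : 0;
}

# Tokens for an indented "return x; " line followed by "y".
my @tokens = (' ', 'return', ' ', 'x', ';', ' ', "\n", 'y');

print "return is first: ", IsFirstLineTokenSketch(\@tokens, 1), "\n";
print "semicolon is last: ", IsLastLineTokenSketch(\@tokens, 4), "\n";
```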

IsAtSequence

sub IsAtSequence #(index, token, token, token ...)

Returns whether the position is at a sequence of tokens.

Parameters

index - The index of the position.
token - A token to match.  Specify multiple times to specify the sequence.

IsBackslashed

sub IsBackslashed #(index)

Returns whether the position is after a backslash.

Parameters

index - The index of the position.

Scope Functions

These functions provide a nice scope stack implementation for language-specific parsers to use.  The default implementation makes the following assumptions.

  • Namespaces and packages completely replace one another, rather than concatenating.  If you call SetPackage(), it completely replaces the previous package for the current scope.  You need to concatenate manually if that’s the behavior you want.
  • Namespaces and packages inherit.  So if a scope level doesn’t set its own, the namespace and package are the same as the parent scope’s.
  • Protection applies to the current level only and does not inherit.  So if one is not set for the current scope level, CurrentProtection() will return undef rather than the parent scope’s value.
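
The three assumptions above can be demonstrated with a small stand-alone model of the scope stack.  Everything here is hypothetical: the function names carry a Sketch suffix, each scope level is a plain hashref rather than a NaturalDocs::Languages::Advanced::Scope object, and line numbers and the scope record are left out to keep the inheritance rules in focus.

```perl
use strict;
use warnings;

# The stack always holds a top level whose closing symbol is undef.
my @scopeStack = ({ 'symbol' => undef, 'namespace' => undef,
                    'package' => undef, 'protection' => undef });

sub StartScopeSketch {
    my ($symbol, $namespace, $package, $protection) = @_;
    push @scopeStack, { 'symbol' => $symbol, 'namespace' => $namespace,
                        'package' => $package, 'protection' => $protection };
}

# Never pops the top level.
sub EndScopeSketch { pop @scopeStack if @scopeStack > 1; }

# Replaces the package of the current level outright; no concatenation.
sub SetPackageSketch { $scopeStack[-1]->{'package'} = $_[0]; }

# Packages inherit: use the innermost level that set one.
sub CurrentPackageSketch {
    for my $scope (reverse @scopeStack) {
        return $scope->{'package'} if defined $scope->{'package'};
    }
    return undef;
}

# Protection does not inherit: only the current level is consulted.
sub CurrentProtectionSketch { return $scopeStack[-1]->{'protection'}; }

StartScopeSketch('}', undef, 'Outer', 'public');
StartScopeSketch('}', undef, undef, undef);    # nested level sets nothing

print "package: ", CurrentPackageSketch(), "\n";
print "protection: ", (defined(CurrentProtectionSketch()) ? 'set' : 'undef'), "\n";
```

In the nested level the package is still Outer because it inherits, while the protection reads as undef because it does not.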

ClearScopeStack

sub ClearScopeStack

Clears the scope stack for a new file.  Not necessary if you call ParseForCommentsAndTokens().

StartScope

sub StartScope #(symbol, lineNumber, namespace, package, protection)

Records a new scope level.

Parameters

symbol - The closing symbol of the scope.
lineNumber - The line number where the scope begins.
namespace - The namespace of the scope.  Undef means no change.
package - The package or class of the scope.  Undef means no change.
protection - The protection of the scope, such as public/private/protected.  Undef means no change.

EndScope

sub EndScope #(lineNumber)

Records the end of the current scope level.  Note that this is blind; you need to manually check ScopeSymbol() if you need to determine if it is correct to do so.

Parameters

lineNumber - The line number where the scope ends.

ScopeSymbol

sub ScopeSymbol

Returns the symbol that ends the current scope level, or undef if we are at the top level.

CurrentScope

sub CurrentScope

Returns the current calculated scope, or undef if global.  The default implementation just returns CurrentPackage().  If your language supports namespaces, override this function to join CurrentNamespace() and CurrentPackage().
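
An override for a namespace-aware language might look like the sketch below.  The two packages are hypothetical stand-ins (a hash-based miniature of the base class, not the real arrayref-based one), and joining with a period is an assumed separator; a real parser would use whatever its language calls for.

```perl
use strict;
use warnings;

{
    package Sketch::Base;    # hypothetical miniature of the base class

    sub New {
        my ($class, %fields) = @_;
        return bless { %fields }, $class;
    }

    sub CurrentNamespace { return $_[0]->{'namespace'}; }
    sub CurrentPackage   { return $_[0]->{'package'}; }

    # Default behavior as described above: just the package.
    sub CurrentScope     { return $_[0]->CurrentPackage(); }
}

{
    package Sketch::WithNamespaces;    # hypothetical language subclass
    our @ISA = ('Sketch::Base');

    # Joins namespace and package, skipping whichever is undefined, and
    # returns undef when neither is set (the global scope).
    sub CurrentScope {
        my ($self) = @_;
        my @parts = grep { defined $_ }
                    ($self->CurrentNamespace(), $self->CurrentPackage());
        return @parts ? join('.', @parts) : undef;
    }
}

my $parser = Sketch::WithNamespaces->New('namespace' => 'Gui', 'package' => 'Widget');
print $parser->CurrentScope(), "\n";
```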

CurrentNamespace

sub CurrentNamespace

Returns the current calculated namespace, or undef if none.

CurrentPackage

sub CurrentPackage

Returns the current calculated package or class, or undef if none.

CurrentProtection

sub CurrentProtection

Returns the current protection, or undef if none.  Assumes protection doesn’t inherit like package and namespace do.

SetNamespace

sub SetNamespace #(namespace, lineNumber)

Sets the namespace for the current scope level.

Parameters

namespace - The new namespace.
lineNumber - The line number the new namespace starts on.

SetPackage

sub SetPackage #(package, lineNumber)

Sets the package or class for the current scope level.

Parameters

package - The new package.
lineNumber - The line number the new package starts on.

SetProtection

sub SetProtection #(protection, lineNumber)

Sets the protection for the current scope level.

Parameters

protection - The new protection level.
lineNumber - The line number the new protection starts on.

Support Functions

AddToScopeRecord

sub AddToScopeRecord #(newScope, lineNumber)

Adds a change to the scope record, condensing unnecessary entries.

Parameters

newScope - What the scope changed to.
lineNumber - Where the scope changed.
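
One plausible reading of "condensing unnecessary entries" is sketched below.  The rule that only the last change on a line is kept comes from the SCOPE_RECORD description; dropping an entry that repeats the previous scope is an assumption.  The function names are hypothetical and each record entry is a plain [ scope, lineNumber ] pair rather than a ScopeChange object.

```perl
use strict;
use warnings;

my @scopeRecord;    # list of [ scope, lineNumber ] pairs

# Compares two scopes where undef means the global scope.
sub SameScopeSketch {
    my ($left, $right) = @_;
    return 1 if !defined $left && !defined $right;
    return 0 if !defined $left || !defined $right;
    return $left eq $right ? 1 : 0;
}

sub AddToScopeRecordSketch {
    my ($newScope, $lineNumber) = @_;

    # Only the last change on a given line is kept.
    if (@scopeRecord && $scopeRecord[-1]->[1] == $lineNumber) {
        pop @scopeRecord;
    }

    # Assumed condensing rule: skip entries that do not change the scope.
    if (@scopeRecord && SameScopeSketch($scopeRecord[-1]->[0], $newScope)) {
        return;
    }

    push @scopeRecord, [ $newScope, $lineNumber ];
}

AddToScopeRecordSketch('A', 3);
AddToScopeRecordSketch('B', 3);      # same line: replaces the 'A' entry
AddToScopeRecordSketch('B', 7);      # no actual change: skipped
AddToScopeRecordSketch(undef, 9);    # back to the global scope

print scalar(@scopeRecord), " entries recorded\n";
```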

CreateString

sub CreateString #(startIndex, endIndex)

Converts the specified tokens into a string and returns it.

Parameters

startIndex - The starting index to convert.
endIndex - The ending index, which is not inclusive.

Returns

The string.
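
Since the ending index is exclusive, the conversion amounts to joining an array slice that stops one short of endIndex.  A minimal sketch, using a hypothetical stand-alone function that takes the token arrayref explicitly:

```perl
use strict;
use warnings;

# Joins tokens from $startIndex up to, but not including, $endIndex.
sub CreateStringSketch {
    my ($tokens, $startIndex, $endIndex) = @_;
    return join('', @{$tokens}[$startIndex .. $endIndex - 1]);
}

my @tokens = ('my', ' ', '$', 'x', ';', "\n");

# Indexes 0 through 3 are converted; the semicolon at index 4 is excluded.
print CreateStringSketch(\@tokens, 0, 4), "\n";
```

The exclusive end matches the endContentIndexRef convention used by TryToSkipString(), so the two content indexes it stores can be passed straight through.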
