pygments.lexer

Base lexer classes.

copyright: Copyright 2006-2013 by the Pygments team, see AUTHORS.
license: BSD, see LICENSE for details.
class pygments.lexer.Lexer(**options)

Lexer for a specific language.

Basic options recognized:

stripnl
Strip leading and trailing newlines from the input (default: True).

stripall
Strip all leading and trailing whitespace from the input (default: False).

ensurenl
Make sure that the input ends with a newline (default: True). This is required for some lexers that consume input linewise. New in Pygments 1.3.

tabsize
If given and greater than 0, expand tabs in the input (default: 0).

encoding
If given, must be an encoding name. This encoding will be used to convert the input string to Unicode if it is not already a Unicode string (default: 'latin1'). Can also be 'guess' to use a simple UTF-8 / Latin-1 detection, or 'chardet' to use the chardet library, if it is installed.
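
These options are passed as keyword arguments when a lexer is instantiated. A brief sketch using the stock PythonLexer:

    from pygments.lexers import PythonLexer

    # expand tabs to 4 spaces, strip surrounding whitespace,
    # and decode byte input as UTF-8
    lexer = PythonLexer(tabsize=4, stripall=True, encoding='utf-8')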
add_filter(filter_, **options)

Add a new stream filter to this lexer.
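
The filter can be given as an instance or, as in this sketch, by its registered name, with options forwarded to it:

    from pygments.lexers import PythonLexer

    lexer = PythonLexer()
    # uppercase all keywords in the emitted token stream
    lexer.add_filter('keywordcase', case='upper')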

alias_filenames = []

Secondary file name globs

aliases = []

Shortcuts for the lexer

static analyse_text(text)

Has to return a float between 0 and 1 that indicates if a lexer wants to highlight this text. Used by guess_lexer. If this method returns 0, the lexer will not highlight the text in any case; if it returns 1, highlighting with this lexer is guaranteed.

The LexerMeta metaclass automatically wraps this function so that it works like a static method (no self or cls parameter) and the return value is automatically converted to float. If the return value is an object whose boolean value is False, it is treated the same as 0.0.
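
A sketch of how a subclass might implement this; the lexer and the shebang it looks for are hypothetical:

    from pygments.lexer import RegexLexer
    from pygments.token import Text

    class MyLangLexer(RegexLexer):
        # Hypothetical lexer, shown only to illustrate analyse_text.
        name = 'MyLang'
        aliases = ['mylang']
        tokens = {
            'root': [
                (r'.+\n?', Text),
                (r'\n+', Text),
            ],
        }

        def analyse_text(text):
            # No self/cls: LexerMeta wraps this into a static method.
            if text.startswith('#!/usr/bin/env mylang'):
                return 1.0  # certain: this lexer should be used
            return 0.0      # decline the text entirely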

filenames = []

File name globs

get_tokens(text, unfiltered=False)

Return an iterable of (tokentype, value) pairs generated from text. If unfiltered is set to True, the filtering mechanism is bypassed even if filters are defined.

This method also preprocesses the text, i.e. it expands tabs, strips the input if requested, and applies registered filters.
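
For example (exact token types may vary between Pygments versions):

    from pygments.lexers import PythonLexer

    for tokentype, value in PythonLexer().get_tokens('x = 1\n'):
        print(tokentype, repr(value))
    # e.g. Token.Name 'x', Token.Operator '=', Token.Literal.Number.Integer '1', ...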

get_tokens_unprocessed(text)

Return an iterable of (index, tokentype, value) tuples, where index is the starting position of the token within the input text. In subclasses, implement this method as a generator to maximize effectiveness.
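
A quick way to see the tuples, using a concrete lexer that implements this method:

    from pygments.lexers import PythonLexer

    for index, tokentype, value in PythonLexer().get_tokens_unprocessed('x = 1\n'):
        print(index, tokentype, repr(value))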

mimetypes = []

MIME types

name = None

Name of the lexer

priority = 0

Priority, should multiple lexers match and no content is provided

class pygments.lexer.RegexLexer(**options)

Base for simple stateful regular expression-based lexers. Simplifies the lexing process so that you need only provide a list of states and regular expressions.

flags = 8

Flags for compiling the regular expressions; defaults to re.MULTILINE (hence the numeric value 8).

get_tokens_unprocessed(text, stack=('root',))

Split text into (index, tokentype, value) tuples.

stack is the initial state stack (default: ['root']).

tokens = {}

Dict of {'state': [(regex, tokentype, new_state), ...], ...}

The initial state is 'root'. new_state can be omitted to signify no state transition. If it is a string, the state is pushed on the stack and changed. If it is a tuple of strings, all states are pushed on the stack and the current state will be the topmost. It can also be combined('state1', 'state2', ...) to signify a new, anonymous state combined from the rules of two or more existing ones. Furthermore, it can be '#pop' to signify going back one step in the state stack, or '#push' to push the current state on the stack again.

The tuple can also be replaced with include('state'), in which case the rules from the state named by the string are included in the current one.
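
A minimal sketch of a complete tokens definition, showing include, a push into a 'string' state, and '#pop' (the language being lexed is hypothetical):

    from pygments.lexer import RegexLexer, include
    from pygments.token import Comment, Keyword, Number, String, Text

    class ExampleLexer(RegexLexer):
        # Hypothetical lexer for illustration only.
        name = 'Example'
        aliases = ['example']
        filenames = ['*.ex']

        tokens = {
            'whitespace': [
                (r'\s+', Text),
            ],
            'root': [
                include('whitespace'),       # splice in shared rules
                (r'#.*$', Comment.Single),
                (r'\b(if|else|end)\b', Keyword),
                (r'\d+', Number.Integer),
                (r'"', String, 'string'),    # push the 'string' state
                (r'.', Text),
            ],
            'string': [
                (r'[^"\\]+', String),
                (r'\\.', String.Escape),
                (r'"', String, '#pop'),      # return to the previous state
            ],
        }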

class pygments.lexer.ExtendedRegexLexer(**options)

A RegexLexer that uses a context object to store its state.

get_tokens_unprocessed(text=None, context=None)

Split text into (index, tokentype, value) tuples. If context is given, use this lexer context instead.
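
A sketch of the callback style this class enables; the rule and callback are hypothetical, but the (lexer, match, ctx) signature and the manual update of ctx.pos are how the context object is used:

    from pygments.lexer import ExtendedRegexLexer
    from pygments.token import Keyword, Text

    def header_callback(lexer, match, ctx):
        # Callbacks yield (index, tokentype, value) tuples and must
        # advance the context position themselves.
        yield match.start(), Keyword, match.group(0)
        ctx.pos = match.end()

    class HeaderLexer(ExtendedRegexLexer):
        # Hypothetical lexer for illustration only.
        name = 'Header'
        tokens = {
            'root': [
                (r'^BEGIN\b', header_callback),
                (r'.+\n?', Text),
                (r'\n+', Text),
            ],
        }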

class pygments.lexer.DelegatingLexer(_root_lexer, _language_lexer, _needle=Token.Other, **options)

This lexer takes two lexers as arguments: a root lexer and a language lexer. First, everything is scanned using the language lexer; afterwards, all Other tokens are lexed using the root lexer.

The lexers from the template lexer package use this base lexer.
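
A sketch modeled on the HTML+PHP combination from the template lexers: the PHP lexer runs first and leaves plain markup as Other tokens, which the HTML lexer then processes:

    from pygments.lexer import DelegatingLexer
    from pygments.lexers import HtmlLexer, PhpLexer

    class HtmlPhpLexer(DelegatingLexer):
        name = 'HTML+PHP'
        aliases = ['html+php']

        def __init__(self, **options):
            # root lexer first, language lexer second
            super().__init__(HtmlLexer, PhpLexer, **options)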

class pygments.lexer.LexerContext(text, pos, stack=None, end=None)

A helper object that holds lexer position data.

class pygments.lexer.include

Indicates that a state should include rules from another state.
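
A fragment sketch (the rules themselves are hypothetical):

    from pygments.lexer import include
    from pygments.token import Comment, Text

    tokens = {
        'comments': [
            (r'#.*$', Comment.Single),
        ],
        'root': [
            include('comments'),  # splice in the rules from 'comments'
            (r'.+\n?', Text),
        ],
    }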

pygments.lexer.bygroups(*args)

Callback that yields multiple actions for each group in the match.
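
For example, a rule that tokenizes a (hypothetical) function definition group by group:

    from pygments.lexer import RegexLexer, bygroups
    from pygments.token import Keyword, Name, Text

    class DefLexer(RegexLexer):
        # Hypothetical lexer for illustration only.
        name = 'Def'
        tokens = {
            'root': [
                # group 1 -> Keyword, group 2 -> Text, group 3 -> Name.Function
                (r'(def)(\s+)([a-zA-Z_]\w*)',
                 bygroups(Keyword, Text, Name.Function)),
                (r'\s+', Text),
                (r'.', Text),
            ],
        }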

pygments.lexer.using(_other, **kwargs)

Callback that processes the match with a different lexer.

The keyword arguments are forwarded to the lexer, except state which is handled separately.

state specifies the state that the new lexer will start in. It can be an enumerable such as ('root', 'inline', 'string'), or a simple string, in which case that state is assumed to sit on top of the root state.

Note: For that to work, _other must not be an ExtendedRegexLexer.
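
A sketch that hands embedded script content to the stock JavascriptLexer; the surrounding lexer and rules are hypothetical:

    import re

    from pygments.lexer import RegexLexer, bygroups, using
    from pygments.lexers import JavascriptLexer
    from pygments.token import Name, Text

    class PageLexer(RegexLexer):
        # Hypothetical lexer for illustration only.
        name = 'Page'
        flags = re.MULTILINE | re.DOTALL  # let .*? span newlines

        tokens = {
            'root': [
                # delegate the tag body to the JavaScript lexer
                (r'(<script>)(.*?)(</script>)',
                 bygroups(Name.Tag, using(JavascriptLexer), Name.Tag)),
                (r'.', Text),
            ],
        }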
