Changes to Django/Reading the Template System (Parsing Templates)

 [[Reading Django>Djangoを読む]]
 
 #contents
 
 *Introduction [#n7eb059c]
 
 A recap of last time: calling get_template in django.template.loader eventually leads to the get_template method of the Loader class in django.template.loaders.base.
 
 #code(Python){{
     def get_template(self, template_name, template_dirs=None, skip=None):
         """
         Calls self.get_template_sources() and returns a Template object for
         the first template matching template_name. If skip is provided,
         template origins in skip are ignored. This is used to avoid recursion
         during template extending.
         """
         tried = []
 
         args = [template_name]
         # RemovedInDjango20Warning: Add template_dirs for compatibility with
         # old loaders
         if func_supports_parameter(self.get_template_sources, 'template_dirs'):
             args.append(template_dirs)
 
         for origin in self.get_template_sources(*args):
             if skip is not None and origin in skip:
                 tried.append((origin, 'Skipped'))
                 continue
 
             try:
                 contents = self.get_contents(origin)
             except TemplateDoesNotExist:
                 tried.append((origin, 'Source does not exist'))
                 continue
             else:
                 return Template(
                     contents, origin, origin.template_name, self.engine,
                 )
 
         raise TemplateDoesNotExist(template_name, tried=tried)
 }}
 
 This time we step into the creation of the Template instance inside that method.
 
 *django/template/base.py [#f18eaeb3]
 
 First, let's check what Template actually is. It is imported at the top of the loader module.
 
 #code(Python){{
 from django.template import Origin, Template, TemplateDoesNotExist
 }}
 
 django/template/__init__.py
 
 #code(Python){{
 # Template parts
 from .base import (                                                     # NOQA isort:skip
     Context, Node, NodeList, Origin, RequestContext, StringOrigin, Template,
     Variable,
 )
 }}
 
 Hmm, a little convoluted. In any case, the Template class is defined in base.py under django/template. Here is Template's __init__ method:
 
 #code(Python){{
 class Template(object):
     def __init__(self, template_string, origin=None, name=None, engine=None):
         try:
             template_string = force_text(template_string)
         except UnicodeDecodeError:
             raise TemplateEncodingError("Templates can only be constructed "
                                         "from unicode or UTF-8 strings.")
         # If Template is instantiated directly rather than from an Engine and
         # exactly one Django template engine is configured, use that engine.
         # This is required to preserve backwards-compatibility for direct use
         # e.g. Template('...').render(Context({...}))
         if engine is None:
             from .engine import Engine
             engine = Engine.get_default()
         if origin is None:
             origin = Origin(UNKNOWN_SOURCE)
         self.name = name
         self.origin = origin
         self.engine = engine
         self.source = template_string
         self.nodelist = self.compile_nodelist()
 }}
 
 Compilation appears to be done by the compile_nodelist method.
 
 #code(Python){{
     def compile_nodelist(self):
         """
         Parse and compile the template source into a nodelist. If debug
         is True and an exception occurs during parsing, the exception is
         is annotated with contextual line information where it occurred in the
         template source.
         """
         if self.engine.debug:
             lexer = DebugLexer(self.source)
         else:
             lexer = Lexer(self.source)
 
         tokens = lexer.tokenize()
         parser = Parser(
             tokens, self.engine.template_libraries, self.engine.template_builtins,
             self.origin,
         )
 
         try:
             return parser.parse()
         except Exception as e:
             if self.engine.debug:
                 e.template_debug = self.get_exception_info(e, e.token)
             raise
 }}
 
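 Apart from a little extra handling for debugging, this is a typical compilation pipeline. Namely:
 
 +Tokenize the source with the Lexer
 +Convert the tokens into nodes with the Parser
 
 Let's look at each in turn.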
 
 **Lexer [#fd23b4d3]
 
 Lexer is defined in the same base.py.
 
 #code(Python){{
     def tokenize(self):
         """
         Return a list of tokens from a given template_string.
         """
         in_tag = False
         lineno = 1
         result = []
         for bit in tag_re.split(self.template_string):
             if bit:
                 result.append(self.create_token(bit, None, lineno, in_tag))
             in_tag = not in_tag
             lineno += bit.count('\n')
         return result
 }}
 
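 tag_re is defined near the top of base.py. Text enclosed in {% and %} is a block (the so-called tag), text enclosed in {{ and }} is a variable reference, and text enclosed in {# and #} is a comment.
 Note: because the code contains }}, it is pasted without syntax highlighting.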
 
  BLOCK_TAG_START = '{%'
  BLOCK_TAG_END = '%}'
  VARIABLE_TAG_START = '{{'
  VARIABLE_TAG_END = '}}'
  COMMENT_TAG_START = '{#'
  COMMENT_TAG_END = '#}'
  
  # match a variable or block tag and capture the entire tag, including start/end
  # delimiters
  tag_re = (re.compile('(%s.*?%s|%s.*?%s|%s.*?%s)' %
            (re.escape(BLOCK_TAG_START), re.escape(BLOCK_TAG_END),
             re.escape(VARIABLE_TAG_START), re.escape(VARIABLE_TAG_END),
             re.escape(COMMENT_TAG_START), re.escape(COMMENT_TAG_END))))
 
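 The documentation for [[the behavior of split on regular expression objects>https://docs.python.jp/3/library/re.html#re.split]] says the following:
 
  If capturing parentheses are used in pattern, then the text of all groups
  in the pattern are also returned as part of the resulting list.
 
 Indeed, running split on the tutorial template divides it into alternating parts: text outside tags at the even indexes and the tags themselves at the odd indexes.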
 
  ['', '{% if latest_question_list %}',
   '\n    <ul>\n    ', '{% for question in latest_question_list %}',
   '\n        <li><a href="/polls/', '{{ question.id }}',
   '/">', '{{ question.question_text }}',
   '</a></li>\n    ', '{% endfor %}',
   '\n    </ul>\n', '{% else %}',
   '\n    <p>No polls are available.</p>\n', '{% endif %}']
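The split behavior can be reproduced on a smaller input. The following is a minimal sketch that rebuilds tag_re exactly as quoted above; the sample template string is made up for illustration.

```python
import re

BLOCK_TAG_START, BLOCK_TAG_END = '{%', '%}'
VARIABLE_TAG_START, VARIABLE_TAG_END = '{{', '}}'
COMMENT_TAG_START, COMMENT_TAG_END = '{#', '#}'

# Same pattern as in base.py: one capturing group around the whole tag.
tag_re = re.compile('(%s.*?%s|%s.*?%s|%s.*?%s)' % (
    re.escape(BLOCK_TAG_START), re.escape(BLOCK_TAG_END),
    re.escape(VARIABLE_TAG_START), re.escape(VARIABLE_TAG_END),
    re.escape(COMMENT_TAG_START), re.escape(COMMENT_TAG_END),
))

# Hypothetical sample template, not taken from the tutorial.
source = 'Hello {{ name }}!{% if x %}yes{% endif %}'

# Because the pattern has a capturing group, the matched tags are kept
# in the result instead of being thrown away.
bits = tag_re.split(source)

text_parts = bits[0::2]  # even indexes: text outside tags
tag_parts = bits[1::2]   # odd indexes: the tags themselves
```

Here bits ends with an empty string because the source ends with a tag; the text and tag parts still alternate strictly.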
 
 create_token builds a Token object according to the kind of tag.
 
 #code(Python){{
     def create_token(self, token_string, position, lineno, in_tag):
         """
         Convert the given token string into a new Token object and return it.
         If in_tag is True, we are processing something that matched a tag,
         otherwise it should be treated as a literal string.
         """
         if in_tag and token_string.startswith(BLOCK_TAG_START):
             # The [2:-2] ranges below strip off *_TAG_START and *_TAG_END.
             # We could do len(BLOCK_TAG_START) to be more "correct", but we've
             # hard-coded the 2s here for performance. And it's not like
             # the TAG_START values are going to change anytime, anyway.
             block_content = token_string[2:-2].strip()
             if self.verbatim and block_content == self.verbatim:
                 self.verbatim = False
         if in_tag and not self.verbatim:
             if token_string.startswith(VARIABLE_TAG_START):
                 token = Token(TOKEN_VAR, token_string[2:-2].strip(), position, lineno)
             elif token_string.startswith(BLOCK_TAG_START):
                 if block_content[:9] in ('verbatim', 'verbatim '):
                     self.verbatim = 'end%s' % block_content
                 token = Token(TOKEN_BLOCK, block_content, position, lineno)
             elif token_string.startswith(COMMENT_TAG_START):
                 content = ''
                 if token_string.find(TRANSLATOR_COMMENT_MARK):
                     content = token_string[2:-2].strip()
                 token = Token(TOKEN_COMMENT, content, position, lineno)
         else:
             token = Token(TOKEN_TEXT, token_string, position, lineno)
         return token
 }}
 
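 verbatim disables template processing (that is, tags written inside a verbatim block are treated as TOKEN_TEXT).
 
 Now for in_tag, which indicates whether the token_string being processed is a tag or not. In the calling tokenize method it is flipped on every loop iteration: False, True, False, and so on. It may seem odd that this is enough, but as the split example above shows, the input is divided as text, tag, text, so simply flipping the flag is sufficient to tell tags from text.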
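The flag flipping can be tried out in isolation. Below is a simplified sketch, not Django's Lexer: the function name tokenize, the literal pattern, and the (kind, contents) tuples are stand-ins chosen for illustration, and verbatim handling is left out.

```python
import re

# Literal equivalent of tag_re, written out by hand for this sketch.
tag_re = re.compile(r'(\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\})')

def tokenize(source):
    """Return (kind, contents) pairs; a simplified stand-in for Lexer.tokenize."""
    in_tag = False
    result = []
    for bit in tag_re.split(source):
        if bit:
            if not in_tag:
                result.append(('text', bit))
            elif bit.startswith('{{'):
                result.append(('var', bit[2:-2].strip()))
            elif bit.startswith('{%'):
                result.append(('block', bit[2:-2].strip()))
            else:
                result.append(('comment', bit[2:-2].strip()))
        # Flip even when bit is empty: split() yields '' between two
        # adjacent tags, which keeps the text/tag alternation intact.
        in_tag = not in_tag
    return result

tokens = tokenize('{% if x %}{{ x }}{% endif %} done')
```

The input deliberately places tags back to back. split returns an empty string between them, so the flag stays in step even though no text token is emitted.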
 
 **Parser [#k6de54b7]
 
 Parser is also defined in base.py. The parse method is long, so only the important parts are excerpted here.
 
 ***Variables [#z5262705]
 
 #code(Python){{
             elif token.token_type == 1:  # TOKEN_VAR
                 if not token.contents:
                     raise self.error(token, 'Empty variable tag on line %d' % token.lineno)
                 try:
                     filter_expression = self.compile_filter(token.contents)
                 except TemplateSyntaxError as e:
                     raise self.error(token, e)
                 var_node = VariableNode(filter_expression)
                 self.extend_nodelist(nodelist, var_node, token)
 }}
 
 compile_filter merely creates a FilterExpression object, so let's look at FilterExpression's __init__ method (only the parts of interest are excerpted).
 
 #code(Python){{
     def __init__(self, token, parser):
         self.token = token
         matches = filter_re.finditer(token)
         var_obj = None
         filters = []
         for match in matches:
             if var_obj is None:
                 var, constant = match.group("var", "constant")
                 if constant:
                     pass  # omitted
                 elif var is None:
                     pass  # omitted
                 else:
                     var_obj = Variable(var)
             else:
                 pass  # omitted
 
         self.filters = filters
         self.var = var_obj
 }}
 
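 filter_re is defined just above the FilterExpression class definition. Because the regular expression also accounts for filter arguments it gets fairly elaborate, but it is built up hierarchically by making good use of %-formatting.
 
 In any case, a variable ends up as the following structure of objects:
 
  VariableNode
    FilterExpression
      Variable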
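To make the nesting concrete, here is a rough sketch with toy classes that only borrow the names VariableNode, FilterExpression and Variable. It is an assumption-laden simplification: it splits on '|' and ':' naively, whereas the real code uses filter_re and copes with quoted arguments and constants.

```python
class Variable:
    """Toy stand-in: remembers the dotted lookup path."""
    def __init__(self, var):
        self.var = var
        self.lookups = tuple(var.split('.'))


class FilterExpression:
    """Toy stand-in: one Variable plus a list of (filter name, argument)."""
    def __init__(self, token):
        self.token = token
        var, *filters = token.split('|')  # naive; breaks on '|' inside quotes
        self.var = Variable(var.strip())
        self.filters = []
        for f in filters:
            name, _, arg = f.partition(':')
            self.filters.append((name.strip(), arg.strip() or None))


class VariableNode:
    """Toy stand-in: the node that ends up in the nodelist."""
    def __init__(self, filter_expression):
        self.filter_expression = filter_expression


node = VariableNode(FilterExpression('question.question_text|upper'))
```

Walking node.filter_expression.var.lookups gives the attribute path, and node.filter_expression.filters lists the filters to apply in order.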
 
 ***Blocks [#acaad311]
 
 #code(Python){{
             elif token.token_type == 2:  # TOKEN_BLOCK
                 try:
                     command = token.contents.split()[0]
                 except IndexError:
                     raise self.error(token, 'Empty block tag on line %d' % token.lineno)
                 if command in parse_until:
                     # A matching token has been reached. Return control to
                     # the caller. Put the token back on the token list so the
                     # caller knows where it terminated.
                     self.prepend_token(token)
                     return nodelist
                 # Add the token to the command stack. This is used for error
                 # messages if further parsing fails due to an unclosed block
                 # tag.
                 self.command_stack.append((command, token))
                 # Get the tag callback function from the ones registered with
                 # the parser.
                 try:
                     compile_func = self.tags[command]
                 except KeyError:
                     self.invalid_block_tag(token, command, parse_until)
                 # Compile the callback into a node object and add it to
                 # the node list.
                 try:
                     compiled_result = compile_func(self, token)
                 except Exception as e:
                     raise self.error(token, e)
                 self.extend_nodelist(nodelist, compiled_result, token)
                 # Compile success. Remove the token from the command stack.
                 self.command_stack.pop()
 }}
 
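 Two things happen here.
 
 +The compile function registered for the command is looked up and called, and its result is added to the NodeList
 +When a terminating command (one listed in parse_until) is reached, processing stops and the method returns. By the look of it, the call in the first item is what calls parse recursively
 
 So from here on we go into the specifics.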
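The interplay of the two steps above can be sketched with a toy parser. MiniParser, its tuple-based tokens and the nested-tuple output are all hypothetical simplifications; error handling, the command stack and variable tokens are left out.

```python
class MiniParser:
    """Toy parser showing parse_until and recursion through compile functions."""

    def __init__(self, tokens, tags):
        self.tokens = list(tokens)  # (kind, contents) pairs
        self.tags = tags            # command name -> compile function

    def next_token(self):
        return self.tokens.pop(0)

    def prepend_token(self, token):
        self.tokens.insert(0, token)

    def parse(self, parse_until=()):
        nodelist = []
        while self.tokens:
            token = self.next_token()
            kind, contents = token
            if kind == 'text':
                nodelist.append(('text', contents))
            elif kind == 'block':
                command = contents.split()[0]
                if command in parse_until:
                    # Terminator reached: put it back so the caller sees it.
                    self.prepend_token(token)
                    return nodelist
                # The compile function may call self.parse() again.
                nodelist.append(self.tags[command](self, token))
        return nodelist


def do_if(parser, token):
    body = parser.parse(('endif',))  # recursive call, stops at endif
    parser.next_token()              # consume the endif token
    return ('if', token[1].split()[1:], body)


tokens = [('text', 'a'), ('block', 'if x'), ('text', 'b'),
          ('block', 'endif'), ('text', 'c')]
tree = MiniParser(tokens, {'if': do_if}).parse()
```

The outer parse call never sees the endif token: the inner call puts it back, and do_if consumes it before returning its node.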
 
 *Tag libraries [#t6672795]
 
 **Loading tags [#ree379ba]
 
 The template example in the tutorial used if and for. Needless to say, these are built-in tags. Let's check when they are set up. Working backwards, we start with the Parser constructor.
 
 #code(Python){{
     def __init__(self, tokens, libraries=None, builtins=None, origin=None):
         self.tokens = tokens
         self.tags = {}
         self.filters = {}
         self.command_stack = []
 
         if libraries is None:
             libraries = {}
         if builtins is None:
             builtins = []
 
         self.libraries = libraries
         for builtin in builtins:
             self.add_library(builtin)
         self.origin = origin
 }}
 
 It is a little roundabout, but the entries are added to tags in the add_library method.
 
 #code(Python){{
     def add_library(self, lib):
         self.tags.update(lib.tags)
         self.filters.update(lib.filters)
 }}
 
 Next, the place where the Parser is created. This is the compile_nodelist method of django.template.base.Template that we saw near the beginning.
 
 #code(Python){{
         parser = Parser(
             tokens, self.engine.template_libraries, self.engine.template_builtins,
             self.origin,
         )
 }}
 
 engine is passed in from django.template.loaders.base.Loader.
 
 #code(Python){{
                 return Template(
                     contents, origin, origin.template_name, self.engine,
                 )
 }}
 
 Now we look for whoever passed engine to the Loader. It is passed in at a place we skipped last time, where the Loader object is created (self here is a django.template.engine.Engine).
 
 #code(Python){{
     def find_template_loader(self, loader):
         if isinstance(loader, (tuple, list)):
             args = list(loader[1:])
             loader = loader[0]
         else:
             args = []
 
         if isinstance(loader, six.string_types):
             loader_class = import_string(loader)
             return loader_class(self, *args)
         else:
             raise ImproperlyConfigured(
                 "Invalid value in template loaders configuration: %r" % loader)
 }}
 
 So engine is a django.template.engine.Engine.
 
 Let's check what is set in template_builtins.
 
 #code(Python){{
 class Engine(object):
     default_builtins = [
         'django.template.defaulttags',
         'django.template.defaultfilters',
         'django.template.loader_tags',
     ]
 
     def __init__(self, dirs=None, app_dirs=False, context_processors=None,
                  debug=False, loaders=None, string_if_invalid='',
                  file_charset='utf-8', libraries=None, builtins=None, autoescape=True):
         # omitted
         if builtins is None:
             builtins = []
 
         # omitted
         self.builtins = self.default_builtins + builtins
         self.template_builtins = self.get_template_builtins(self.builtins)
 }}
 
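 get_template_builtins essentially just imports the listed modules, so let's look at django.template.defaulttags.
 
 **How tag libraries work [#z8f614f7]
 
 The first half of defaulttags.py consists of class definitions for the nodes that presumably correspond to the tags, and the second half defines the tag compile functions. Let's look at one of the compile functions.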
 
 #code(Python){{
 @register.tag('if')
 def do_if(parser, token):
     # {% if ... %}
     bits = token.split_contents()[1:]
     condition = TemplateIfParser(parser, bits).parse()
     nodelist = parser.parse(('elif', 'else', 'endif'))
     conditions_nodelists = [(condition, nodelist)]
     token = parser.next_token()
 
     # {% elif ... %} (repeatable)
     # omitted
 
     # {% else %} (optional)
     if token.contents == 'else':
         nodelist = parser.parse(('endif',))
         conditions_nodelists.append((None, nodelist))
         token = parser.next_token()
 
     # {% endif %}
     assert token.contents == 'endif'
 
     return IfNode(conditions_nodelists)
 }}
 
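 Going into TemplateIfParser would take too long, so it is skipped here. In any case, this produces a conditions_nodelists list like the following.
 
  [
    (object representing the expression written in if, NodeList used when the if is True),
    (None, NodeList used for the else case)
  ]
 
 Note that the register in @register is a Library object from django.template.library.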
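One way to read this structure is as an ordered list of branches where None means "always matches". The sketch below is only an illustration of that reading, not the IfNode implementation: pick_branch is a hypothetical helper, conditions are plain callables, and nodelists are plain lists of strings.

```python
def pick_branch(conditions_nodelists, context):
    """Return the nodelist of the first branch whose condition holds."""
    for condition, nodelist in conditions_nodelists:
        # None marks the else branch, which matches unconditionally.
        if condition is None or condition(context):
            return nodelist
    return []


# Hypothetical data shaped like the tutorial template's if/else.
conditions_nodelists = [
    (lambda ctx: bool(ctx.get('latest_question_list')), ['<ul>...</ul>']),
    (None, ['<p>No polls are available.</p>']),
]

with_polls = pick_branch(conditions_nodelists, {'latest_question_list': [1]})
without_polls = pick_branch(conditions_nodelists, {})
```

Because the else entry is appended last, it is only reached when every earlier condition has failed.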
 
 #code(Python){{
 register = Library()
 }}
 
 Here is the tag method of the Library class:
 
 #code(Python){{
     def tag(self, name=None, compile_function=None):
         if name is None and compile_function is None:
             # @register.tag()
             return self.tag_function
         elif name is not None and compile_function is None:
             if callable(name):
                 # @register.tag
                 return self.tag_function(name)
             else:
                 # @register.tag('somename') or @register.tag(name='somename')
                 def dec(func):
                     return self.tag(name, func)
                 return dec
         elif name is not None and compile_function is not None:
             # register.tag('somename', somefunc)
             self.tags[name] = compile_function
             return compile_function
         else:
             pass  # omitted
 }}
 
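 In this case, name is not None (it is a str) and compile_function is None. That is:
 
 +Execution enters the first elif and then its else branch, and the nested function dec is returned. Up to this point it is just an ordinary function call
 +As part of the decorator mechanism, dec is called with the decorated function, which runs the tag method with two arguments
 +The second elif branch runs and the function is registered in tags
 
 That is how it works.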
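The three steps can be traced with a cut-down Library that keeps only the two branches involved. This is a sketch based on the excerpt above; the other branches are replaced by a ValueError, and do_if here is a placeholder rather than the real compile function.

```python
class Library:
    """Cut-down version keeping only the branches used by @register.tag('if')."""

    def __init__(self):
        self.tags = {}

    def tag(self, name=None, compile_function=None):
        if name is not None and compile_function is None:
            # Step 1: @register.tag('somename') returns the decorator dec.
            def dec(func):
                # Step 2: Python calls dec(func), which re-enters tag()
                # with both arguments.
                return self.tag(name, func)
            return dec
        elif name is not None and compile_function is not None:
            # Step 3: the function is registered under its tag name.
            self.tags[name] = compile_function
            return compile_function
        raise ValueError('call style not covered by this sketch')


register = Library()


@register.tag('if')
def do_if(parser, token):
    return 'IfNode placeholder'
```

The decorator line is equivalent to writing do_if = register.tag('if')(do_if), so register.tags['if'] ends up pointing at the undecorated function itself.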
 
 *Conclusion [#l3167c17]
 
 We have now followed the loading and parsing of a template file. All that remains is rendering the template.
 To be continued next time.