texlive[72373] Master/texmf-dist: jupynotex (24sep24)

commits+karl at tug.org
Tue Sep 24 22:31:05 CEST 2024


Revision: 72373
          https://tug.org/svn/texlive?view=revision&revision=72373
Author:   karl
Date:     2024-09-24 22:31:05 +0200 (Tue, 24 Sep 2024)
Log Message:
-----------
jupynotex (24sep24)

Modified Paths:
--------------
    trunk/Master/texmf-dist/doc/latex/jupynotex/README.md
    trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.py
    trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.sty

Removed Paths:
-------------
    trunk/Master/texmf-dist/doc/latex/jupynotex/example/
    trunk/Master/texmf-dist/doc/latex/jupynotex/tests/

Modified: trunk/Master/texmf-dist/doc/latex/jupynotex/README.md
===================================================================
--- trunk/Master/texmf-dist/doc/latex/jupynotex/README.md	2024-09-24 20:30:44 UTC (rev 72372)
+++ trunk/Master/texmf-dist/doc/latex/jupynotex/README.md	2024-09-24 20:31:05 UTC (rev 72373)
@@ -13,15 +13,15 @@
 
 All you need to do is include the `jupynotex.py` and `jupynotex.sty` files in your LaTeX project, and use the package from your any of your `.tex` files:
 
-    \usepackage{jupynotex}
+    `\usepackage{jupynotex}`
 
 After that, you can include a whole Jupyter Notebook in your file just specifying it's file name:
 
-    \jupynotex{file_name_for_your_notebook.ipynb}
+    `\jupynotex{file_name_for_your_notebook.ipynb}`
 
 If you do not want to include it completely, you can optionally specify which cells:
 
-    \jupynotex[<which cells>]{sample.ipynb}
+    `\jupynotex[<which cells>]{sample.ipynb}`
 
 The cells specification can be numbers separated by comma, or ranges using dashes (defaulting to first and last if any side is not included).
 
@@ -48,20 +48,71 @@
     `\jupynotex[3,12-]{somenote.ipynb}`
 
 
+## Configurations available
+
+The whole package can be configured when included in your project:
+
+    \usepackage[OPTIONS]{jupynotex}
+
+Global options available:
+
+- `output-text-limit=N` where N is a number; it will wrap all outputs that exceed that quantity of columns
+
+Also, each cell(s) can be configured when included in your .tex files:
+
+    `\jupynotex[3, OPTIONS]{yournotebook.ipynb}`
+
+Cell options available:
+
+- `output-image-size=SIZE` where SIZE is a valid .tex size (a number with an unit, e.g. `70mm`); it will set any image in the output of those cells to the indicated size
+
+
 ## Full Example
 
 Check the `example` directory in this project.
 
-There you will find an example `notebook.ipynb`, an `example.tex` file that includes cells from that notebook in different ways, and a `build` script.
+There you will find different notebook examples and `.tex` files using them. Also there's a build script to easily run on any of the examples, like:
 
+    ./build cell_ranges.tex
+
 Play with it. Enjoy.
 
 
+# Supported cell types
+
+Jupyter has several types of cells, `jupynotex` supports most of those. If you find one that is not supported, please open an issue with an example.
+
+In any case, only the "code" cells are included when processing a notebook (no markdown titles, for example, to make it easy for the developer to find the numbers of cells to include).
+
+Supported cell types in the output:
+
+- `execute_result`: this may have multiple types of information inside; if an image is present, it will be included; otherwise, if a latex output is present, it will be included directly (so the latex is really parsed later by the LaTeX system); else the plain text will be included (verbatim).
+
+- `stream`: the different text lines will be included (verbatim)
+                result.extend(_verbatimize(x.rstrip() for x in item["text"]))
+
+- `display_data`: the image will be included
+
+- `error`: in this case the Traceback will be parsed, sanitized and included in the output keeping its structure (verbatim)
+
+Two types of images are currently supported (for the `execute_result` or `display_data` output types):
+
+- PNG: used directly
+
+- SVG: converted to PDF (requires `inkscape` to be present in the system), and the PDF is then included
+
+
 # Dependencies
 
-You need Python 3 in your system, and the [tcolorbox](https://ctan.org/pkg/tcolorbox) module in your LaTeX toolbox.
+You need Python 3 in your system, and the following modules in your LaTeX toolbox:
 
+- [tcolorbox](https://ctan.org/pkg/tcolorbox)
 
+- [minted](https://www.ctan.org/pkg/minted)
+
+To support SVG images in the notebook, [inkscape](https://inkscape.org/) needs to be installed and in the system's PATH.
+
+
 # Feedback & Development
 
 Please open any issue or ask any question [here](https://github.com/facundobatista/jupynotex/issues/new).
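
The cells specification described in the README above (comma-separated numbers, dash ranges that default to the first and last cell, plus `key=value` cell options) can be sketched as a standalone function. This is only an illustration adapted from the `parse_cells` method in the `jupynotex.py` diff that follows; the name `parse_cells_spec`, the explicit `maxlen` parameter, and the extra check for a spec without any cell are assumptions made so the sketch is self-contained.

```python
def parse_cells_spec(spec, maxlen):
    """Return (sorted cell numbers, options dict) for a spec like '3,12-,k=v'.

    Hypothetical standalone helper; `maxlen` is the number of code cells.
    """
    if not spec:
        raise ValueError("Empty cells spec not allowed")

    cells = set()
    options = {}
    for group in (part.strip() for part in spec.split(",")):
        if "=" in group:
            # a cell option such as output-image-size=70mm
            key, value = group.split("=", maxsplit=1)
            options[key] = value
            continue

        if set(group) - set("0123456789-"):
            raise ValueError("Found forbidden characters in cells definition")

        if "-" in group:
            # a range; an empty side defaults to the first or last cell
            cfrom, cto = group.split("-")
            cfrom = 1 if cfrom == "" else int(cfrom)
            cto = maxlen if cto == "" else int(cto)
            if cfrom >= cto:
                raise ValueError("Range 'from' needs to be smaller than 'to'")
            cells.update(range(cfrom, cto + 1))
        else:
            cells.add(int(group))

    if not cells:
        # assumption of this sketch: a spec with only options is rejected
        raise ValueError("No cells requested")

    cells = sorted(cells)
    if cells[0] < 1:
        raise ValueError("Cells need to be >=1")
    if maxlen < cells[-1]:
        raise ValueError("Notebook is shorter than the requested cells")
    return cells, options
```

For instance, `parse_cells_spec("3, output-image-size=70mm", 10)` yields cell 3 together with the image-size option, which is how a per-cell option travels inside the same optional argument as the cell numbers.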

Modified: trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.py
===================================================================
--- trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.py	2024-09-24 20:30:44 UTC (rev 72372)
+++ trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.py	2024-09-24 20:31:05 UTC (rev 72373)
@@ -1,64 +1,202 @@
-# Copyright 2020 Facundo Batista
+# Copyright 2020-2024 Facundo Batista
 # All Rights Reserved
 # Licensed under Apache 2.0
 
-"""USAGE: jupynote.py notebook.ipynb cells
+"""Convert a jupyter notebook into latex for inclusion in documents."""
 
-    cells is a string with which cells to include, separate groups
-    with comma, ranges with dash (with defaults to start and end.
-"""
-
+import argparse
 import base64
 import json
+import pathlib
+import re
+import subprocess
 import sys
 import tempfile
+import textwrap
 import traceback
 
+# message to help people to report potential problems
+REPORT_MSG = """
 
-def _verbatimize(lines):
+Please report the issue in
+https://github.com/facundobatista/jupynotex/issues/new
+including the latex log. Thanks!
+"""
+
+# basic verbatim start/end
+VERBATIM_BEGIN = [r"\begin{footnotesize}", r"\begin{verbatim}"]
+VERBATIM_END = [r"\end{verbatim}", r"\end{footnotesize}"]
+
+# highlighters for different languages (block beginning and ending)
+HIGHLIGHTERS = {
+    'python': ([r'\begin{minted}[fontsize=\footnotesize]{python}'], [r'\end{minted}']),
+    None: (VERBATIM_BEGIN, VERBATIM_END),
+}
+
+# the different formats to be used when error or all ok
+FORMAT_ERROR = r"colback=red!5!white,colframe=red!75!"
+FORMAT_OK = (
+    r"coltitle=red!75!black, colbacktitle=black!10!white, "
+    r"halign title=right, fonttitle=\sffamily\mdseries\scshape\footnotesize")
+
+# a little mark to put in the continuation line(s) when text is wrapped
+WRAP_MARK = "↳"
+
+# the options available for command line
+CMDLINE_OPTION_NAMES = {
+    "output-text-limit": "The column limit for the output text of a cell",
+}
+
+
+def _validator_positive_int(value):
+    """Validate value is a positive integer."""
+    value = value.strip()
+    if not value:
+        return
+
+    value = int(value)
+    if value <= 0:
+        raise ValueError("Value must be greater than zero.")
+    return value
+
+
+def _process_plain_text(lines, config_options=None):
     """Wrap a series of lines around a verbatim indication."""
-    result = [r"\begin{verbatim}"]
+    if config_options is None:
+        config_options = {}
+
+    result = []
+    result.extend(VERBATIM_BEGIN)
     for line in lines:
-        result.append(line.rstrip())
-    result.append(r"\end{verbatim}")
+        line = line.rstrip()
+
+        # clean color escape codes (\u001b plus \[Nm where N are one or more digits)
+        line = re.sub(r"\x1b\[[\d;]+m", "", line)
+
+        # split too long lines
+        limit = config_options.get("output-text-limit")
+        if limit and line:
+            firstline, *restlines = textwrap.wrap(line, limit)
+            lines = [firstline]
+            for line in restlines:
+                lines.append(f"    {WRAP_MARK} {line}")
+        else:
+            lines = [line]
+
+        result.extend(lines)
+    result.extend(VERBATIM_END)
     return result
 
 
-def _save_content(data):
-    """Save the received b64encoded data to a temp file."""
-    _, fname = tempfile.mkstemp(suffix='.png')
-    with open(fname, 'wb') as fh:
-        fh.write(base64.b64decode(data))
-    return fname
+class ItemProcessor:
+    """Process each item according to its type with a (series of) function(s)."""
 
+    def __init__(self, cell_options, config_options):
+        self.cell_options = cell_options
+        self.config_options = config_options
 
+    def get_item_data(self, item):
+        """Extract item information using different processors."""
+
+        data = item['data']
+        for mimetype, *functions in self.PROCESSORS:
+            if mimetype in data:
+                content = data[mimetype]
+                break
+        else:
+            raise ValueError("Image type not supported: {}".format(data.keys()))
+
+        for func in functions:
+            content = func(self, content)
+
+        return content
+
+    def process_plain_text(self, lines):
+        """Process plain text."""
+        return _process_plain_text(lines, self.config_options)
+
+    def process_png(self, image_data):
+        """Process a PNG: just save the received b64encoded data to a temp file."""
+        _, fname = tempfile.mkstemp(suffix='.png')
+        with open(fname, 'wb') as fh:
+            fh.write(base64.b64decode(image_data))
+        return fname
+
+    def process_svg(self, image_data):
+        """Process a SVG: save the data, transform to PDF, and then use that."""
+        _, svg_fname = tempfile.mkstemp(suffix='.svg')
+        _, pdf_fname = tempfile.mkstemp(suffix='.pdf')
+        raw_svg = ''.join(image_data).encode('utf8')
+        with open(svg_fname, 'wb') as fh:
+            fh.write(raw_svg)
+
+        cmd = ['inkscape', '--export-text-to-path', '--export-pdf={}'.format(pdf_fname), svg_fname]
+        subprocess.run(cmd)
+
+        return pdf_fname
+
+    def include_graphics(self, fname):
+        """Wrap a filename in an includegraphics structure."""
+        fname_no_backslashes = fname.replace("\\", "/")  # do not leave backslashes in Windows
+        width = self.cell_options.get("output-image-size", r"1\textwidth")
+        return r"\includegraphics[width={}]{{{}}}".format(width, fname_no_backslashes)
+
+    def listwrap(self, item):
+        """Wrap an item in a list for processors that return that single item."""
+        return [item]
+
+    # mimetype and list of functions to apply; order is important here as we want to
+    # prioritize getting some mimetypes over others when multiple are present
+    PROCESSORS = [
+        ('text/latex',),
+        ('image/svg+xml', process_svg, include_graphics, listwrap),
+        ('image/png', process_png, include_graphics, listwrap),
+        ('text/plain', process_plain_text),
+    ]
+
+
 class Notebook:
     """The notebook converter to latex."""
 
-    def __init__(self, path):
-        with open(path, 'rt', encoding='utf8') as fh:
-            nb_data = json.load(fh)
+    GLOBAL_CONFIGS = {
+        "output-text-limit": _validator_positive_int,
+    }
 
-        self._cells = nb_data['cells']
+    def __init__(self, notebook_path, config_options):
+        self.config_options = self._validate_config(config_options)
+        self.cell_options = {}
+        nb_data = json.loads(notebook_path.read_text())
 
-    def __len__(self):
-        return len(self._cells)
+        # get the language, to highlight
+        lang = nb_data['metadata']['language_info']['name']
+        self._highlight_delimiters = HIGHLIGHTERS.get(lang, HIGHLIGHTERS[None])
 
+        # get all cells excluding markdown ones
+        self._cells = [x for x in nb_data['cells'] if x['cell_type'] != 'markdown']
+
+    def _validate_config(self, config):
+        """Validate received configuration."""
+        for key, value in list(config.items()):
+            validator = self.GLOBAL_CONFIGS[key]
+            new_value = validator(value)
+            config[key] = new_value
+        return config
+
     def _proc_src(self, content):
         """Process the source of a cell."""
         source = content['source']
         result = []
         if content['cell_type'] == 'code':
-            result.extend(_verbatimize(source))
-        elif content['cell_type'] == 'markdown':
-            # XXX: maybe we could parse this?
-            result.extend(_verbatimize(source))
+            begin, end = self._highlight_delimiters
+            result.extend(begin)
+            result.extend(line.rstrip() for line in source)
+            result.extend(end)
         else:
             raise ValueError(
                 "Cell type not supported when processing source: {!r}".format(
                     content['cell_type']))
 
-        return '\n'.join(result) + '\n'
+        return '\n'.join(result)
 
     def _proc_out(self, content):
         """Process the output of a cell."""
@@ -67,27 +205,30 @@
             return
 
         result = []
+        processor = ItemProcessor(self.cell_options, self.config_options)
         for item in outputs:
             output_type = item['output_type']
-            if output_type == 'execute_result':
-                data = item['data']
-                if 'image/png' in data:
-                    fname = _save_content(data['image/png'])
-                    result.append(r"\includegraphics{{{}}}".format(fname))
-                elif 'text/latex' in data:
-                    result.extend(data["text/latex"])
-                else:
-                    result.extend(_verbatimize(data["text/plain"]))
+            if output_type in ('execute_result', 'display_data'):
+                more_content = processor.get_item_data(item)
             elif output_type == 'stream':
-                result.extend(_verbatimize(x.rstrip() for x in item["text"]))
-            elif output_type == 'display_data':
-                data = item['data']
-                fname = _save_content(data['image/png'])
-                result.append(r"\includegraphics{{{}}}".format(fname))
+                more_content = processor.process_plain_text(item["text"])
+            elif output_type == 'error':
+                raw_traceback = item['traceback']
+                tback_lines = []
+                for raw_line in raw_traceback:
+                    internal_lines = raw_line.split('\n')
+                    for line in internal_lines:
+                        line = re.sub(r"\x1b\[\d.*?m", "", line)  # sanitize
+                        if set(line) == {'-'}:
+                            # ignore separator, as our graphical box already has one
+                            continue
+                        tback_lines.append(line)
+                more_content = processor.process_plain_text(tback_lines)
             else:
                 raise ValueError("Output type not supported in item {!r}".format(item))
+            result.extend(more_content)
 
-        return '\n'.join(result) + '\n'
+        return '\n'.join(result)
 
     def get(self, cell_idx):
         """Return the content from a specific cell in the notebook.
@@ -99,58 +240,71 @@
         output = self._proc_out(content)
         return source, output
 
+    def parse_cells(self, spec):
+        """Convert the cells spec to a range of ints."""
+        if not spec:
+            raise ValueError("Empty cells spec not allowed")
 
-def _parse_cells(spec, maxlen):
-    """Convert the cells spec to a range of ints."""
-    if not spec:
-        raise ValueError("Empty cells spec not allowed")
-    if set(spec) - set('0123456789-,'):
-        raise ValueError(
-            "Found forbidden characters in cells definition (allowed digits, '-' and ',')")
+        maxlen = len(self._cells)
 
-    cells = set()
-    groups = spec.split(',')
-    for group in groups:
-        if '-' in group:
-            cfrom, cto = group.split('-')
-            cfrom = 1 if cfrom == '' else int(cfrom)
-            cto = maxlen if cto == '' else int(cto)
-            if cfrom >= cto:
+        cells = set()
+        options = {}
+        groups = [x.strip() for x in spec.split(',')]
+        valid_chars = set('0123456789-,')
+        for group in groups:
+            if '=' in group:
+                k, v = group.split("=", maxsplit=1)
+                options[k] = v
+                continue
+
+            if set(group) - valid_chars:
                 raise ValueError(
-                    "Range 'from' need to be smaller than 'to' (got {!r})".format(group))
-            cells.update(range(cfrom, cto + 1))
-        else:
-            cells.add(int(group))
-    cells = sorted(cells)
+                    "Found forbidden characters in cells definition (allowed digits, '-' and ',')")
 
-    if any(x < 1 for x in cells):
-        raise ValueError("Cells need to be >=1")
-    if maxlen < cells[-1]:
-        raise ValueError(
-            "Notebook loaded of len {}, smaller than requested cells: {}".format(maxlen, cells))
+            if '-' in group:
+                cfrom, cto = group.split('-')
+                cfrom = 1 if cfrom == '' else int(cfrom)
+                cto = maxlen if cto == '' else int(cto)
+                if cfrom >= cto:
+                    raise ValueError(
+                        "Range 'from' need to be smaller than 'to' (got {!r})".format(group))
+                cells.update(range(cfrom, cto + 1))
+            else:
+                cells.add(int(group))
+        cells = sorted(cells)
 
-    return cells
+        if any(x < 1 for x in cells):
+            raise ValueError("Cells need to be >=1")
+        if maxlen < cells[-1]:
+            raise ValueError(
+                f"Notebook loaded of len {maxlen}, smaller than requested cells: {cells}")
 
+        self.cell_options = options
+        return cells
 
-def main(notebook_path, cells_spec):
+
+def main(notebook_path, cells_spec, config_options):
     """Main entry point."""
-    nb = Notebook(notebook_path)
-    cells = _parse_cells(cells_spec, len(nb))
+    nb = Notebook(notebook_path, config_options)
+    cells = nb.parse_cells(cells_spec)
 
     for cell in cells:
         try:
             src, out = nb.get(cell)
-        except Exception:
+        except Exception as exc:
             title = "ERROR when parsing cell {}".format(cell)
-            print(
-                r"\begin{{tcolorbox}}"
-                r"[colback=red!5!white,colframe=red!75!,title={{{}}}]".format(title))
+            print(r"\begin{{tcolorbox}}[{}, title={{{}}}]".format(FORMAT_ERROR, title))
+            print(exc)
+            _parts = _process_plain_text(REPORT_MSG.split('\n'))
+            print('\n'.join(_parts))
+            print(r"\end{tcolorbox}")
+
+            # send title and traceback to stderr, which will appear in compilation log
             tb = traceback.format_exc()
-            print('\n'.join(_verbatimize(tb.split('\n'))))
-            print(r"\end{tcolorbox}")
+            print(tb, file=sys.stderr)
             continue
 
-        print(r"\begin{{tcolorbox}}[title=Cell {{{:02d}}}]".format(cell))
+        print(r"\begin{{tcolorbox}}[{}, title=Cell {{{:02d}}}]".format(FORMAT_OK, cell))
         print(src)
         if out:
             print(r"\tcblower")
@@ -159,8 +313,19 @@
 
 
 if __name__ == "__main__":
-    if len(sys.argv) != 3:
-        print(__doc__)
-        exit()
+    parser = argparse.ArgumentParser()
+    parser.add_argument("notebook_path", type=pathlib.Path, help="The path to the notebook.")
+    parser.add_argument(
+        "cells_spec",
+        type=str,
+        help=(
+            "A string specifying which cells to include; use comma to separate groups, "
+            "dash for ranges (with defaults to start and end)"
+        )
+    )
+    for option, explanation in CMDLINE_OPTION_NAMES.items():
+        parser.add_argument(option, type=str, help=explanation)
+    args = parser.parse_args()
 
-    main(*sys.argv[1:3])
+    config_options = {option: getattr(args, option) for option in CMDLINE_OPTION_NAMES}
+    main(args.notebook_path, args.cells_spec, config_options)
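
The `output-text-limit` handling in `_process_plain_text` above combines two steps: removing terminal color escape codes and wrapping long lines with a continuation mark. A minimal sketch of just those two steps follows, without the verbatim begin/end lines. The function name `clean_and_wrap` and the `limit` parameter are illustrative, and, unlike the committed code, the sketch strips trailing whitespace after removing the escape codes so that a line consisting only of color codes ends up empty instead of reaching `textwrap.wrap`.

```python
import re
import textwrap

WRAP_MARK = "↳"  # same continuation mark as in the diff above
ANSI_COLOR = re.compile(r"\x1b\[[\d;]+m")  # ESC [ digits/semicolons m


def clean_and_wrap(lines, limit=None):
    """Strip color codes from each line and wrap it at `limit` columns."""
    result = []
    for line in lines:
        line = ANSI_COLOR.sub("", line).rstrip()
        if limit and line:
            # first chunk is kept as is, the rest get the continuation mark
            first, *rest = textwrap.wrap(line, limit)
            result.append(first)
            result.extend(f"    {WRAP_MARK} {chunk}" for chunk in rest)
        else:
            result.append(line)
    return result
```

Note that the continuation prefix is added after wrapping, so continuation lines can exceed `limit` by the width of the prefix; the committed code behaves the same way.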

Modified: trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.sty	2024-09-24 20:30:44 UTC (rev 72372)
+++ trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.sty	2024-09-24 20:31:05 UTC (rev 72373)
@@ -1,7 +1,21 @@
-\ProvidesPackage{jupynotex}[0.1]
+\ProvidesPackage{jupynotex}[1.0]
 
 \usepackage{tcolorbox}
+\usepackage{pgfopts}
 
+\newcommand*\jupynotex@outputtextlimit@value{}
+
+
+\pgfkeys{
+  /jupynotex/.cd ,
+    output-text-limit/.store in=\jupynotex@outputtextlimit@value
+}
+
+\ProcessPgfPackageOptions{/jupynotex}
+
 \newcommand{\jupynotex}[2][-]{
-    \input|"python3 jupynotex.py #2 #1"
+    \input|"python3 jupynotex.py '#2' '#1' '\jupynotex@outputtextlimit@value'"
 }
+
+\endinput
+
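
The new `\jupynotex` macro passes three quoted positional arguments to the script: the notebook path, the cells spec, and the (possibly empty) `output-text-limit` value. The sketch below shows how an argparse parser built like the one in the `jupynotex.py` diff would receive them; the helper names `build_parser` and `parse` are illustrative only. Because `output-text-limit` is registered as a positional argument, argparse keeps the hyphens in the attribute name, so the value has to be read with `getattr`.

```python
import argparse
import pathlib


def build_parser():
    """Build a parser with the same three positionals as the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("notebook_path", type=pathlib.Path)
    parser.add_argument("cells_spec", type=str)
    # positional names keep their hyphens, so this is not args.output_text_limit
    parser.add_argument("output-text-limit", type=str)
    return parser


def parse(argv):
    """Return (notebook path, cells spec, raw output-text-limit string)."""
    args = build_parser().parse_args(argv)
    return args.notebook_path, args.cells_spec, getattr(args, "output-text-limit")
```

When the package is loaded without options, the third argument arrives as an empty string, which `_validator_positive_int` in the diff turns into `None` (no wrapping).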


