texlive[72373] Master/texmf-dist: jupynotex (24sep24)
commits+karl at tug.org
Tue Sep 24 22:31:05 CEST 2024
Revision: 72373
https://tug.org/svn/texlive?view=revision&revision=72373
Author: karl
Date: 2024-09-24 22:31:05 +0200 (Tue, 24 Sep 2024)
Log Message:
-----------
jupynotex (24sep24)
Modified Paths:
--------------
trunk/Master/texmf-dist/doc/latex/jupynotex/README.md
trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.py
trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.sty
Removed Paths:
-------------
trunk/Master/texmf-dist/doc/latex/jupynotex/example/
trunk/Master/texmf-dist/doc/latex/jupynotex/tests/
Modified: trunk/Master/texmf-dist/doc/latex/jupynotex/README.md
===================================================================
--- trunk/Master/texmf-dist/doc/latex/jupynotex/README.md 2024-09-24 20:30:44 UTC (rev 72372)
+++ trunk/Master/texmf-dist/doc/latex/jupynotex/README.md 2024-09-24 20:31:05 UTC (rev 72373)
@@ -13,15 +13,15 @@
All you need to do is include the `jupynotex.py` and `jupynotex.sty` files in your LaTeX project, and use the package from any of your `.tex` files:
- \usepackage{jupynotex}
+ `\usepackage{jupynotex}`
After that, you can include a whole Jupyter Notebook in your file just by specifying its file name:
- \jupynotex{file_name_for_your_notebook.ipynb}
+ `\jupynotex{file_name_for_your_notebook.ipynb}`
If you do not want to include it completely, you can optionally specify which cells:
- \jupynotex[<which cells>]{sample.ipynb}
+ `\jupynotex[<which cells>]{sample.ipynb}`
The cells specification can be numbers separated by comma, or ranges using dashes (defaulting to first and last if any side is not included).
@@ -48,20 +48,71 @@
`\jupynotex[3,12-]{somenote.ipynb}`
+## Configurations available
+
+The whole package can be configured when included in your project:
+
+ \usepackage[OPTIONS]{jupynotex}
+
+Global options available:
+
+- `output-text-limit=N` where N is a positive integer; it will wrap any output line that exceeds that number of columns
+
+Also, each cell(s) can be configured when included in your .tex files:
+
+ `\jupynotex[3, OPTIONS]{yournotebook.ipynb}`
+
+Cell options available:
+
+- `output-image-size=SIZE` where SIZE is a valid LaTeX length (a number with a unit, e.g. `70mm`); it will set the width of any image in the output of those cells to the indicated size
+
+
## Full Example
Check the `example` directory in this project.
-There you will find an example `notebook.ipynb`, an `example.tex` file that includes cells from that notebook in different ways, and a `build` script.
+There you will find different notebook examples and `.tex` files using them. Also there's a build script to easily run on any of the examples, like:
+ ./build cell_ranges.tex
+
Play with it. Enjoy.
+# Supported cell types
+
+Jupyter has several types of cells, and `jupynotex` supports most of them. If you find one that is not supported, please open an issue with an example.
+
+In any case, only the "code" cells are included when processing a notebook (no markdown titles, for example, to make it easy for the developer to find the numbers of cells to include).
+
+Supported cell types in the output:
+
+- `execute_result`: this may have multiple types of information inside; if an image is present, it will be included; otherwise, if a LaTeX output is present, it will be included directly (so the LaTeX is really parsed later by the LaTeX system); else the plain text will be included (verbatim).
+
+- `stream`: the different text lines will be included (verbatim)
+
+
+- `display_data`: the image will be included
+
+- `error`: in this case the Traceback will be parsed, sanitized and included in the output keeping its structure (verbatim)
+
+Two types of images are currently supported (for the `execute_result` and `display_data` cell types):
+
+- PNG: used directly
+
+- SVG: converted to PDF (this needs `inkscape` present in the system), and the resulting PDF is included
+
+
# Dependencies
-You need Python 3 in your system, and the [tcolorbox](https://ctan.org/pkg/tcolorbox) module in your LaTeX toolbox.
+You need Python 3 in your system, and the following modules in your LaTeX toolbox:
+- [tcolorbox](https://ctan.org/pkg/tcolorbox)
+- [minted](https://www.ctan.org/pkg/minted)
+
+To support SVG images in the notebook, [inkscape](https://inkscape.org/) needs to be installed and in the system's PATH.
+
+
# Feedback & Development
Please open any issue or ask any question [here](https://github.com/facundobatista/jupynotex/issues/new).
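
For readers of this commit mail, the cells specification described in the README (comma-separated numbers, dash ranges that default to the first and last cell, and optional `key=value` cell options) can be sketched as below. This is a simplified illustration adapted from the `parse_cells` method in this revision, not the packaged code: the function name `parse_cells_spec` and the parameter `total_cells` are hypothetical, and the character and bounds validation done by the real method is left out.

```python
def parse_cells_spec(spec, total_cells):
    """Split a cells spec into sorted cell numbers and a dict of cell options.

    Hypothetical helper for illustration; validation is intentionally omitted.
    """
    if not spec:
        raise ValueError("Empty cells spec not allowed")

    cells = set()
    options = {}
    for group in (part.strip() for part in spec.split(",")):
        # option groups are checked first, as option names may contain dashes
        if "=" in group:
            key, value = group.split("=", maxsplit=1)
            options[key] = value
            continue

        if "-" in group:
            # a missing side of the range defaults to the first or last cell
            start, end = group.split("-")
            start = 1 if start == "" else int(start)
            end = total_cells if end == "" else int(end)
            cells.update(range(start, end + 1))
        else:
            cells.add(int(group))

    return sorted(cells), options


print(parse_cells_spec("3,12-", 14))
print(parse_cells_spec("1-2, output-image-size=70mm", 5))
```

The option check comes before the range check because an option such as `output-image-size=70mm` also contains dashes.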
Modified: trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.py
===================================================================
--- trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.py 2024-09-24 20:30:44 UTC (rev 72372)
+++ trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.py 2024-09-24 20:31:05 UTC (rev 72373)
@@ -1,64 +1,202 @@
-# Copyright 2020 Facundo Batista
+# Copyright 2020-2024 Facundo Batista
# All Rights Reserved
# Licensed under Apache 2.0
-"""USAGE: jupynote.py notebook.ipynb cells
+"""Convert a jupyter notebook into latex for inclusion in documents."""
- cells is a string with which cells to include, separate groups
- with comma, ranges with dash (with defaults to start and end.
-"""
-
+import argparse
import base64
import json
+import pathlib
+import re
+import subprocess
import sys
import tempfile
+import textwrap
import traceback
+# message to help people to report potential problems
+REPORT_MSG = """
-def _verbatimize(lines):
+Please report the issue in
+https://github.com/facundobatista/jupynotex/issues/new
+including the latex log. Thanks!
+"""
+
+# basic verbatim start/end
+VERBATIM_BEGIN = [r"\begin{footnotesize}", r"\begin{verbatim}"]
+VERBATIM_END = [r"\end{verbatim}", r"\end{footnotesize}"]
+
+# highlighters for different languages (block beginning and ending)
+HIGHLIGHTERS = {
+ 'python': ([r'\begin{minted}[fontsize=\footnotesize]{python}'], [r'\end{minted}']),
+ None: (VERBATIM_BEGIN, VERBATIM_END),
+}
+
+# the different formats to be used when error or all ok
+FORMAT_ERROR = r"colback=red!5!white,colframe=red!75!"
+FORMAT_OK = (
+ r"coltitle=red!75!black, colbacktitle=black!10!white, "
+ r"halign title=right, fonttitle=\sffamily\mdseries\scshape\footnotesize")
+
+# a little mark to put in the continuation line(s) when text is wrapped
+WRAP_MARK = "↳"
+
+# the options available for command line
+CMDLINE_OPTION_NAMES = {
+ "output-text-limit": "The column limit for the output text of a cell",
+}
+
+
+def _validator_positive_int(value):
+ """Validate value is a positive integer."""
+ value = value.strip()
+ if not value:
+ return
+
+ value = int(value)
+ if value <= 0:
+ raise ValueError("Value must be greater than zero.")
+ return value
+
+
+def _process_plain_text(lines, config_options=None):
"""Wrap a series of lines around a verbatim indication."""
- result = [r"\begin{verbatim}"]
+ if config_options is None:
+ config_options = {}
+
+ result = []
+ result.extend(VERBATIM_BEGIN)
for line in lines:
- result.append(line.rstrip())
- result.append(r"\end{verbatim}")
+ line = line.rstrip()
+
+ # clean color escape codes (\u001b plus \[Nm where N are one or more digits)
+ line = re.sub(r"\x1b\[[\d;]+m", "", line)
+
+ # split too long lines
+ limit = config_options.get("output-text-limit")
+ if limit and line:
+ firstline, *restlines = textwrap.wrap(line, limit)
+ lines = [firstline]
+ for line in restlines:
+ lines.append(f" {WRAP_MARK} {line}")
+ else:
+ lines = [line]
+
+ result.extend(lines)
+ result.extend(VERBATIM_END)
return result
-def _save_content(data):
- """Save the received b64encoded data to a temp file."""
- _, fname = tempfile.mkstemp(suffix='.png')
- with open(fname, 'wb') as fh:
- fh.write(base64.b64decode(data))
- return fname
+class ItemProcessor:
+ """Process each item according to its type with a (series of) function(s)."""
+ def __init__(self, cell_options, config_options):
+ self.cell_options = cell_options
+ self.config_options = config_options
+ def get_item_data(self, item):
+ """Extract item information using different processors."""
+
+ data = item['data']
+ for mimetype, *functions in self.PROCESSORS:
+ if mimetype in data:
+ content = data[mimetype]
+ break
+ else:
+ raise ValueError("Image type not supported: {}".format(data.keys()))
+
+ for func in functions:
+ content = func(self, content)
+
+ return content
+
+ def process_plain_text(self, lines):
+ """Process plain text."""
+ return _process_plain_text(lines, self.config_options)
+
+ def process_png(self, image_data):
+ """Process a PNG: just save the received b64encoded data to a temp file."""
+ _, fname = tempfile.mkstemp(suffix='.png')
+ with open(fname, 'wb') as fh:
+ fh.write(base64.b64decode(image_data))
+ return fname
+
+ def process_svg(self, image_data):
+ """Process a SVG: save the data, transform to PDF, and then use that."""
+ _, svg_fname = tempfile.mkstemp(suffix='.svg')
+ _, pdf_fname = tempfile.mkstemp(suffix='.pdf')
+ raw_svg = ''.join(image_data).encode('utf8')
+ with open(svg_fname, 'wb') as fh:
+ fh.write(raw_svg)
+
+ cmd = ['inkscape', '--export-text-to-path', '--export-pdf={}'.format(pdf_fname), svg_fname]
+ subprocess.run(cmd)
+
+ return pdf_fname
+
+ def include_graphics(self, fname):
+ """Wrap a filename in an includegraphics structure."""
+ fname_no_backslashes = fname.replace("\\", "/") # do not leave backslashes in Windows
+ width = self.cell_options.get("output-image-size", r"1\textwidth")
+ return r"\includegraphics[width={}]{{{}}}".format(width, fname_no_backslashes)
+
+ def listwrap(self, item):
+ """Wrap an item in a list for processors that return that single item."""
+ return [item]
+
+ # mimetype and list of functions to apply; order is important here as we want to
+ # prioritize getting some mimetypes over others when multiple are present
+ PROCESSORS = [
+ ('text/latex',),
+ ('image/svg+xml', process_svg, include_graphics, listwrap),
+ ('image/png', process_png, include_graphics, listwrap),
+ ('text/plain', process_plain_text),
+ ]
+
+
class Notebook:
"""The notebook converter to latex."""
- def __init__(self, path):
- with open(path, 'rt', encoding='utf8') as fh:
- nb_data = json.load(fh)
+ GLOBAL_CONFIGS = {
+ "output-text-limit": _validator_positive_int,
+ }
- self._cells = nb_data['cells']
+ def __init__(self, notebook_path, config_options):
+ self.config_options = self._validate_config(config_options)
+ self.cell_options = {}
+ nb_data = json.loads(notebook_path.read_text())
- def __len__(self):
- return len(self._cells)
+ # get the language, to highlight
+ lang = nb_data['metadata']['language_info']['name']
+ self._highlight_delimiters = HIGHLIGHTERS.get(lang, HIGHLIGHTERS[None])
+ # get all cells excluding markdown ones
+ self._cells = [x for x in nb_data['cells'] if x['cell_type'] != 'markdown']
+
+ def _validate_config(self, config):
+ """Validate received configuration."""
+ for key, value in list(config.items()):
+ validator = self.GLOBAL_CONFIGS[key]
+ new_value = validator(value)
+ config[key] = new_value
+ return config
+
def _proc_src(self, content):
"""Process the source of a cell."""
source = content['source']
result = []
if content['cell_type'] == 'code':
- result.extend(_verbatimize(source))
- elif content['cell_type'] == 'markdown':
- # XXX: maybe we could parse this?
- result.extend(_verbatimize(source))
+ begin, end = self._highlight_delimiters
+ result.extend(begin)
+ result.extend(line.rstrip() for line in source)
+ result.extend(end)
else:
raise ValueError(
"Cell type not supported when processing source: {!r}".format(
content['cell_type']))
- return '\n'.join(result) + '\n'
+ return '\n'.join(result)
def _proc_out(self, content):
"""Process the output of a cell."""
@@ -67,27 +205,30 @@
return
result = []
+ processor = ItemProcessor(self.cell_options, self.config_options)
for item in outputs:
output_type = item['output_type']
- if output_type == 'execute_result':
- data = item['data']
- if 'image/png' in data:
- fname = _save_content(data['image/png'])
- result.append(r"\includegraphics{{{}}}".format(fname))
- elif 'text/latex' in data:
- result.extend(data["text/latex"])
- else:
- result.extend(_verbatimize(data["text/plain"]))
+ if output_type in ('execute_result', 'display_data'):
+ more_content = processor.get_item_data(item)
elif output_type == 'stream':
- result.extend(_verbatimize(x.rstrip() for x in item["text"]))
- elif output_type == 'display_data':
- data = item['data']
- fname = _save_content(data['image/png'])
- result.append(r"\includegraphics{{{}}}".format(fname))
+ more_content = processor.process_plain_text(item["text"])
+ elif output_type == 'error':
+ raw_traceback = item['traceback']
+ tback_lines = []
+ for raw_line in raw_traceback:
+ internal_lines = raw_line.split('\n')
+ for line in internal_lines:
+ line = re.sub(r"\x1b\[\d.*?m", "", line) # sanitize
+ if set(line) == {'-'}:
+ # ignore separator, as our graphical box already has one
+ continue
+ tback_lines.append(line)
+ more_content = processor.process_plain_text(tback_lines)
else:
raise ValueError("Output type not supported in item {!r}".format(item))
+ result.extend(more_content)
- return '\n'.join(result) + '\n'
+ return '\n'.join(result)
def get(self, cell_idx):
"""Return the content from a specific cell in the notebook.
@@ -99,58 +240,71 @@
output = self._proc_out(content)
return source, output
+ def parse_cells(self, spec):
+ """Convert the cells spec to a range of ints."""
+ if not spec:
+ raise ValueError("Empty cells spec not allowed")
-def _parse_cells(spec, maxlen):
- """Convert the cells spec to a range of ints."""
- if not spec:
- raise ValueError("Empty cells spec not allowed")
- if set(spec) - set('0123456789-,'):
- raise ValueError(
- "Found forbidden characters in cells definition (allowed digits, '-' and ',')")
+ maxlen = len(self._cells)
- cells = set()
- groups = spec.split(',')
- for group in groups:
- if '-' in group:
- cfrom, cto = group.split('-')
- cfrom = 1 if cfrom == '' else int(cfrom)
- cto = maxlen if cto == '' else int(cto)
- if cfrom >= cto:
+ cells = set()
+ options = {}
+ groups = [x.strip() for x in spec.split(',')]
+ valid_chars = set('0123456789-,')
+ for group in groups:
+ if '=' in group:
+ k, v = group.split("=", maxsplit=1)
+ options[k] = v
+ continue
+
+ if set(group) - valid_chars:
raise ValueError(
- "Range 'from' need to be smaller than 'to' (got {!r})".format(group))
- cells.update(range(cfrom, cto + 1))
- else:
- cells.add(int(group))
- cells = sorted(cells)
+ "Found forbidden characters in cells definition (allowed digits, '-' and ',')")
- if any(x < 1 for x in cells):
- raise ValueError("Cells need to be >=1")
- if maxlen < cells[-1]:
- raise ValueError(
- "Notebook loaded of len {}, smaller than requested cells: {}".format(maxlen, cells))
+ if '-' in group:
+ cfrom, cto = group.split('-')
+ cfrom = 1 if cfrom == '' else int(cfrom)
+ cto = maxlen if cto == '' else int(cto)
+ if cfrom >= cto:
+ raise ValueError(
+ "Range 'from' need to be smaller than 'to' (got {!r})".format(group))
+ cells.update(range(cfrom, cto + 1))
+ else:
+ cells.add(int(group))
+ cells = sorted(cells)
- return cells
+ if any(x < 1 for x in cells):
+ raise ValueError("Cells need to be >=1")
+ if maxlen < cells[-1]:
+ raise ValueError(
+ f"Notebook loaded of len {maxlen}, smaller than requested cells: {cells}")
+ self.cell_options = options
+ return cells
-def main(notebook_path, cells_spec):
+
+def main(notebook_path, cells_spec, config_options):
"""Main entry point."""
- nb = Notebook(notebook_path)
- cells = _parse_cells(cells_spec, len(nb))
+ nb = Notebook(notebook_path, config_options)
+ cells = nb.parse_cells(cells_spec)
for cell in cells:
try:
src, out = nb.get(cell)
- except Exception:
+ except Exception as exc:
title = "ERROR when parsing cell {}".format(cell)
- print(
- r"\begin{{tcolorbox}}"
- r"[colback=red!5!white,colframe=red!75!,title={{{}}}]".format(title))
+ print(r"\begin{{tcolorbox}}[{}, title={{{}}}]".format(FORMAT_ERROR, title))
+ print(exc)
+ _parts = _process_plain_text(REPORT_MSG.split('\n'))
+ print('\n'.join(_parts))
+ print(r"\end{tcolorbox}")
+
+ # send title and traceback to stderr, which will appear in compilation log
tb = traceback.format_exc()
- print('\n'.join(_verbatimize(tb.split('\n'))))
- print(r"\end{tcolorbox}")
+ print(tb, file=sys.stderr)
continue
- print(r"\begin{{tcolorbox}}[title=Cell {{{:02d}}}]".format(cell))
+ print(r"\begin{{tcolorbox}}[{}, title=Cell {{{:02d}}}]".format(FORMAT_OK, cell))
print(src)
if out:
print(r"\tcblower")
@@ -159,8 +313,19 @@
if __name__ == "__main__":
- if len(sys.argv) != 3:
- print(__doc__)
- exit()
+ parser = argparse.ArgumentParser()
+ parser.add_argument("notebook_path", type=pathlib.Path, help="The path to the notebook.")
+ parser.add_argument(
+ "cells_spec",
+ type=str,
+ help=(
+ "A string specifying which cells to include; use comma to separate groups, "
+ "dash for ranges (with defaults to start and end)"
+ )
+ )
+ for option, explanation in CMDLINE_OPTION_NAMES.items():
+ parser.add_argument(option, type=str, help=explanation)
+ args = parser.parse_args()
- main(*sys.argv[1:3])
+ config_options = {option: getattr(args, option) for option in CMDLINE_OPTION_NAMES}
+ main(args.notebook_path, args.cells_spec, config_options)
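
The effect of `output-text-limit` on a single output line can be sketched in isolation as follows. This is a minimal illustration based on `_process_plain_text` from the diff above, under some assumptions: the helper name `wrap_output_line` is hypothetical, the continuation prefix is copied as it appears in this mail (its original indentation may differ), and trailing whitespace is stripped after removing the color codes so that a line holding only spaces and escape codes is never passed to `textwrap.wrap`.

```python
import re
import textwrap

WRAP_MARK = "↳"  # same continuation mark as in jupynotex.py
ANSI_COLOR = re.compile(r"\x1b\[[\d;]+m")  # ESC [ digits/semicolons m


def wrap_output_line(line, limit=None):
    """Return the list of lines to emit for one output line (hypothetical helper)."""
    # remove color escape codes first, then trailing whitespace
    line = ANSI_COLOR.sub("", line).rstrip()

    # no limit configured, or nothing to wrap: emit the line unchanged
    if not limit or not line:
        return [line]

    first, *rest = textwrap.wrap(line, limit)
    return [first] + [f" {WRAP_MARK} {part}" for part in rest]


print(wrap_output_line("aaaa bbbb cccc", 9))
print(wrap_output_line("\x1b[31mError\x1b[0m"))
```

Note that the limit applies to the wrapped text itself; the mark and its padding are added afterwards, so continuation lines may be a few columns longer than the limit.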
Modified: trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.sty 2024-09-24 20:30:44 UTC (rev 72372)
+++ trunk/Master/texmf-dist/tex/latex/jupynotex/jupynotex.sty 2024-09-24 20:31:05 UTC (rev 72373)
@@ -1,7 +1,21 @@
-\ProvidesPackage{jupynotex}[0.1]
+\ProvidesPackage{jupynotex}[1.0]
\usepackage{tcolorbox}
+\usepackage{pgfopts}
+\newcommand*\jupynotex@outputtextlimit@value{}
+
+
+\pgfkeys{
+ /jupynotex/.cd ,
+ output-text-limit/.store in=\jupynotex@outputtextlimit@value
+}
+
+\ProcessPgfPackageOptions{/jupynotex}
+
\newcommand{\jupynotex}[2][-]{
- \input|"python3 jupynotex.py #2 #1"
+ \input|"python3 jupynotex.py '#2' '#1' '\jupynotex@outputtextlimit@value'"
}
+
+\endinput
+
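
The quoting added to the `\jupynotex` macro means the script always receives three separate arguments, even when the notebook name contains spaces or no text limit was configured (the last argument is then an empty string, which `_validator_positive_int` turns into `None`). The sketch below illustrates this under two assumptions: the command is tokenized by a POSIX-style shell, approximated here with `shlex.split`, and the parser is a reduced stand-in for the one in `jupynotex.py` (same positional names, no types or help texts). The helper names `build_parser` and `parse_command` are hypothetical.

```python
import argparse
import shlex


def build_parser():
    """Reduced stand-in for the argument parser declared in jupynotex.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("notebook_path")
    parser.add_argument("cells_spec")
    # positional name kept with dashes, as in the CMDLINE_OPTION_NAMES loop
    parser.add_argument("output-text-limit")
    return parser


def parse_command(command):
    """Tokenize a shell command line and parse the script arguments."""
    argv = shlex.split(command)
    # skip the interpreter and the script name
    return vars(build_parser().parse_args(argv[2:]))


print(parse_command("python3 jupynotex.py 'my notes.ipynb' '3,12-' ''"))
```

On platforms whose shell does not treat single quotes as quoting characters, the tokenization may differ from what is shown here.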