Metadata-Version: 2.4
Name: rfc3987-syntax
Version: 1.1.0
Summary: Helper functions to syntactically validate strings according to RFC 3987.
Project-URL: Homepage, https://github.com/willynilly/rfc3987-syntax
Project-URL: Documentation, https://github.com/willynilly/rfc3987-syntax#readme
Project-URL: Issues, https://github.com/willynilly/rfc3987-syntax/issues
Project-URL: Source, https://github.com/willynilly/rfc3987-syntax
Author: Jan Kowalleck
Author-email: Will Riley
License-Expression: MIT
License-File: LICENSE
Keywords: RFC 3987,RFC3987,parser,syntax,validator
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Requires-Dist: lark>=1.2.2
Provides-Extra: testing
Requires-Dist: pytest>=8.3.5; extra == 'testing'
Description-Content-Type: text/markdown

# rfc3987-syntax

Helper functions to parse and validate the **syntax** of terms defined in **[RFC 3987](https://www.rfc-editor.org/info/rfc3987)**, the IETF standard for Internationalized Resource Identifiers (IRIs).

## 🎯 Purpose

The goal of `rfc3987-syntax` is to provide a **lightweight, permissively licensed Python module** for validating that strings conform to the **ABNF grammar defined in RFC 3987**.

These helpers are:

- ✅ Strictly aligned with the **syntax rules of RFC 3987**
- ✅ Released under a **permissive MIT license**
- ✅ Designed for both **open source and proprietary use**
- ✅ Powered by [Lark](https://github.com/lark-parser/lark), a fast, EBNF-based parser

> 🧠 **Note:** This project focuses on **syntax validation only**. RFC 3987 specifies **additional semantic rules** (e.g., Unicode normalization, BiDi constraints, percent-encoding requirements) that must be enforced separately.

## 📄 License, Attribution, and Citation

**`rfc3987-syntax`** is licensed under the [MIT License](LICENSE), which allows reuse in both open source and commercial software.

This project:

- ❌ Does **not** depend on the `rfc3987` Python package (GPL-licensed)
- ✅ Uses [`lark`](https://github.com/lark-parser/lark), licensed under MIT
- ✅ Implements grammar from **[RFC 3987](https://datatracker.ietf.org/doc/html/rfc3987)**, using **[RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986)** where RFC 3987 delegates syntax

> ⚠️ This project is **not affiliated with or endorsed by** the authors of RFC 3987 or the `rfc3987` Python package.

Please cite this software in accordance with the enclosed CITATION.cff file.

## ⚠️ Limitations

The grammar and parser enforce **only the ABNF syntax** defined in RFC 3987. The following are **not validated** and must be handled separately for full compliance:

- ✅ Unicode **Normalization Form C (NFC)**
- ✅ Bidirectional text (**BiDi**) constraints (RFC 3987 §4.1)
- ✅ **Port number ranges** (must be 0–65535)
- ✅ Valid **IPv6 compression** (only one `::`, max segments)
- ✅ Context-aware **percent-encoding** requirements

ChatGPT 4o was used during the original development process. Errors may exist due to this assistance. Additional review, testing, and bug fixes by human experts are welcome.

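Two of the checks listed under Limitations (NFC normalization and port ranges) can be layered on top of syntax validation with the standard library alone. The sketch below is only an illustration and is not part of `rfc3987-syntax`: the helper names `is_nfc` and `has_valid_port` are hypothetical, and the port check leans on `urllib.parse.urlsplit`, which is a URI splitter and not a full RFC 3987 parser.

```python
import unicodedata
from urllib.parse import urlsplit


def is_nfc(value: str) -> bool:
    """Return True if the string is already in Unicode Normalization Form C.

    Hypothetical helper, not provided by rfc3987-syntax.
    """
    return unicodedata.is_normalized("NFC", value)


def has_valid_port(value: str) -> bool:
    """Return True if the authority has no port or a port within 0-65535.

    Hypothetical helper; relies on urlsplit(...).port raising ValueError
    for non-numeric or out-of-range ports.
    """
    try:
        urlsplit(value).port
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(is_nfc("http://example.com/caf\u00e9"))        # precomposed e-acute
    print(is_nfc("http://example.com/cafe\u0301"))       # e + combining acute
    print(has_valid_port("http://example.com:8080/"))
    print(has_valid_port("http://example.com:70000/"))
```

A caller would typically run the syntax check first and apply semantic checks like these only to strings that pass it.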
## 📦 Installation

```bash
pip install rfc3987-syntax
```

## 🛠 Usage

### List all supported "terms" (i.e., non-terminals and terminals within ABNF production rules) used to validate the syntax of an IRI according to RFC 3987

```python
from rfc3987_syntax import RFC3987_SYNTAX_TERMS

print("Supported terms:")
for term in RFC3987_SYNTAX_TERMS:
    print(term)
```

### Syntactically validate a string using the general-purpose validator

```python
from rfc3987_syntax import is_valid_syntax

if is_valid_syntax(term='iri', value='http://github.com'):
    print("✓ Valid IRI syntax")

# 'bob' has no scheme, so it is not an IRI...
if not is_valid_syntax(term='iri', value='bob'):
    print("✗ Invalid IRI syntax")

# ...but it is a valid relative IRI reference.
if is_valid_syntax(term='iri_reference', value='bob'):
    print("✓ Valid IRI-reference syntax")
```

### Alternatively, use term-specific helpers to validate RFC 3987 syntax

```python
from rfc3987_syntax import is_valid_syntax_iri
from rfc3987_syntax import is_valid_syntax_iri_reference

if is_valid_syntax_iri('http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax_iri('bob'):
    print("✗ Invalid IRI syntax")

if is_valid_syntax_iri_reference('bob'):
    print("✓ Valid IRI-reference syntax")
```

### Get the Lark parse tree for a syntax validation (useful for additional semantic validation)

```python
from lark import ParseTree  # type alias used in the annotation below
from rfc3987_syntax import parse

ptree: ParseTree = parse(term="iri", value="http://github.com")
print(ptree)
```

## 📚 Sources

This grammar was derived from:

- **[RFC 3987 – Internationalized Resource Identifiers (IRIs)]**
  → Defines IRI syntax and extensions to URI (e.g. Unicode characters, `ucschar`)
  → https://datatracker.ietf.org/doc/html/rfc3987

- **[RFC 3986 – Uniform Resource Identifier (URI): Generic Syntax]**
  → Provides reusable components like `scheme`, `authority`, `ipv4address`, etc.
  → https://datatracker.ietf.org/doc/html/rfc3986

> 📝 When `RFC 3986` is listed as the source, it is **used in accordance with RFC 3987**, which explicitly references it for foundational elements.

### Rule-to-Source Mapping

| Rule/Component | Source | Notes |
|----------------------|------------|-------|
| `iri` | RFC 3987 | Top-level IRI rule |
| `iri_reference` | RFC 3987 | Top-level IRI Reference rule |
| `absolute_iri` | RFC 3987 | Top-level Absolute IRI rule |
| `scheme` | RFC 3986 | Referenced by RFC 3987 §2.2 |
| `ihier_part` | RFC 3987 | IRI-specific hierarchy |
| `irelative_ref` | RFC 3987 | IRI-specific relative ref |
| `irelative_part` | RFC 3987 | IRI-specific relative part |
| `iauthority` | RFC 3986 | Standard URI authority |
| `ipath_abempty` | RFC 3986 | Path format variant |
| `ipath_absolute` | RFC 3986 | Absolute path |
| `ipath_noscheme` | RFC 3986 | Path disallowing scheme prefix |
| `ipath_rootless` | RFC 3986 | Used in non-scheme contexts |
| `iquery` | RFC 3987 | Query extension to URI |
| `ifragment` | RFC 3987 | Fragment extension to URI |
| `ipchar`, `isegment` | RFC 3986 | Path characters and segments |
| `isegment_nz_nc` | RFC 3987 | IRI-specific path constraint |
| `iunreserved` | RFC 3987 | Includes `ucschar` |
| `ucschar`, `iprivate`| RFC 3987 | Unicode support |
| `sub_delims` | RFC 3986 | Reserved characters |
| `ip_literal` | RFC 3986 | IPv6 or IPvFuture in `[]` |
| `ipv6address` | RFC 3986 | Expanded forms only |
| `ipvfuture` | RFC 3986 | Forward-compatible |
| `ipv4address` | RFC 3986 | Dotted-decimal IPv4 |
| `ls32` | RFC 3986 | Final 32 bits of IPv6 |
| `h16`, `dec_octet` | RFC 3986 | Hex and decimal chunks |
| `port` | RFC 3986 | Optional numeric |
| `pct_encoded` | RFC 3986 | Percent encoding (e.g. `%20`) |
| `alpha`, `digit`, `hexdig` | RFC 3986 | Character classes |
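
To give a feel for what the RFC 3986 rows in the table express, the sketch below transcribes two of them, `dec_octet` and `ipv4address`, from the RFC 3986 ABNF into a regular expression. This is an illustration only: the package itself expresses its grammar with Lark, and the names `DEC_OCTET`, `IPV4ADDRESS`, and `matches_ipv4address` are hypothetical.

```python
import re

# dec-octet = DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35
# Longer alternatives come first so each octet is matched as a whole.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")


def matches_ipv4address(value: str) -> bool:
    """Return True if the whole string matches the IPv4address rule.

    Hypothetical helper for illustration; not part of rfc3987-syntax.
    """
    return IPV4ADDRESS.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("192.168.0.1", "256.1.1.1", "01.2.3.4", "1.2.3"):
        print(candidate, matches_ipv4address(candidate))
```

Note that the ABNF admits no leading zeros in an octet, which is why `01.2.3.4` is rejected alongside the out-of-range `256.1.1.1`.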