# PR-012 - PBS Lexer Byte-Offset Spans

## Briefing

Lexer spans are currently tracked as Java `String` character indices. The PBS syntax spec requires stable byte offsets. This PR aligns token/span attribution with byte offsets and keeps diagnostics deterministic.

## Motivation

Without byte offsets, diagnostics and downstream attribution diverge on non-ASCII sources, violating the lexical contract.

## Target

- `prometeu-frontend-pbs` lexer and span attribution behavior.
- Diagnostics and AST attribution consumers that depend on lexer spans.

## Scope

- Convert lexer position accounting to UTF-8 byte offsets.
- Preserve existing tokenization semantics.
- Keep parser/semantics APIs unchanged.

## Method

- Introduce byte-accurate cursor accounting in lexer scanning.
- Emit token start/end using byte offsets.
- Validate compatibility with parser and diagnostics sinks.
- Add regression fixtures with non-ASCII source content.

## Acceptance Criteria

- All emitted tokens include UTF-8 byte offsets.
- Diagnostics from lexer/parser over non-ASCII sources point to correct byte spans.
- Existing ASCII tests remain green.
- New non-ASCII span tests are added and deterministic.

## Tests

- Extend lexer tests with UTF-8 multibyte identifiers/strings.
- Add parser span-attribution tests over multibyte source.
- Run full `prometeu-frontend-pbs` test suite.

## Non-Goals

- Changing token classes or grammar.
- Changing message wording policy.
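
## Illustrative Sketch

The byte-accurate cursor accounting listed under Method could look roughly like the following. This is a minimal sketch, not the actual `prometeu-frontend-pbs` lexer: the names `ByteOffsetSpans`, `Span`, `utf8Length`, and `wordSpans` are hypothetical, and the "tokenizer" simply splits on whitespace to keep the example short. It assumes spans are half-open `[startByte, endByte)` ranges, which the spec may define differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the real lexer API.
final class ByteOffsetSpans {

    /** Hypothetical span type: half-open [startByte, endByte) in UTF-8 bytes. */
    static final class Span {
        final int startByte;
        final int endByte;

        Span(int startByte, int endByte) {
            this.startByte = startByte;
            this.endByte = endByte;
        }
    }

    /** Number of bytes UTF-8 uses to encode one Unicode code point. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    /**
     * Returns the byte span of every run of non-whitespace code points.
     * The cursor advances two counters in lockstep: the Java char index
     * (used to read the String) and the UTF-8 byte offset (used for spans).
     * Caveat: an unpaired surrogate is counted as 3 bytes here, whereas
     * String.getBytes(UTF_8) would replace it with a 1-byte '?'.
     */
    static List<Span> wordSpans(String source) {
        List<Span> spans = new ArrayList<>();
        int charIndex = 0;   // index into the UTF-16 String
        int byteOffset = 0;  // offset into the UTF-8 encoding
        int startByte = -1;  // -1 means "not inside a token"

        while (charIndex < source.length()) {
            int cp = source.codePointAt(charIndex);
            boolean whitespace = Character.isWhitespace(cp);

            if (!whitespace && startByte < 0) {
                startByte = byteOffset;                       // token begins
            } else if (whitespace && startByte >= 0) {
                spans.add(new Span(startByte, byteOffset));   // token ends
                startByte = -1;
            }

            charIndex += Character.charCount(cp);  // 1 or 2 UTF-16 units
            byteOffset += utf8Length(cp);          // 1 to 4 UTF-8 bytes
        }
        if (startByte >= 0) {
            spans.add(new Span(startByte, byteOffset));  // token at end of input
        }
        return spans;
    }
}
```

For the input `let café = 1`, the second token covers char indices 4 to 8 but bytes 4 to 9, because `é` occupies two UTF-8 bytes. Every span after it shifts by one byte relative to the character-index view, which is the divergence described under Motivation.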