feisty meow concerns codebase  2.140
textual::parser_bits Class Reference

Warehouses some functions that are often useful during text parsing.

#include <parser_bits.h>

Public Types

enum  line_ending { LF_AT_END = -15 , CRLF_AT_END , NO_ENDING }
 An enumeration of the separator character(s) used for text files.
 

Static Public Member Functions

static const char * eol_to_chars (line_ending ending)
 returns the C string form for the "ending" value.
 
static line_ending platform_eol ()
 provides the appropriate ending on the current OS platform.
 
static const char * platform_eol_to_chars ()
 provides the characters that make up this platform's line ending.
 
static void translate_CR_for_platform (basis::astring &to_translate)
 flips embedded EOL characters for this platform's needs.
 
static basis::astring substitute_env_vars (const basis::astring &text, bool leave_unknown=true)
 resolves embedded environment variables in "text".
 
static bool is_printable_ascii (char to_check)
 returns true if "to_check" is a normally visible ASCII character.
 
static bool white_space_no_cr (char to_check)
 reports if "to_check" is white space but not a carriage return.
 
static bool is_eol (char to_check)
 returns true if "to_check" is part of an end-of-line sequence.
 
static bool white_space (char to_check)
 returns true if the character "to_check" is considered a white space.
 
static bool is_alphanumeric (char look_at)
 returns true if "look_at" is one of the alphanumeric characters.
 
static bool is_alphanumeric (const char *look_at, int len)
 returns true if the char ptr "look_at" is all alphanumeric characters.
 
static bool is_alphanumeric (const basis::astring &look_at, int len)
 returns true if the string "look_at" is all alphanumeric characters.
 
static bool is_alpha (char look_at)
 returns true if "look_at" is one of the alphabetical characters.
 
static bool is_alpha (const char *look_at, int len)
 returns true if the char ptr "look_at" is all alphabetical characters.
 
static bool is_alpha (const basis::astring &look_at, int len)
 returns true if the string "look_at" is all alphabetical characters.
 
static bool is_numeric (char look_at)
 returns true if "look_at" is a valid numerical character.
 
static bool is_numeric (const char *look_at, int len)
 returns true if "look_at" is all valid numerical characters.
 
static bool is_numeric (const basis::astring &look_at, int len)
 returns true if the "look_at" string has only valid numerical chars.
 
static bool is_hexadecimal (char look_at)
 returns true if "look_at" is one of the hexadecimal characters.
 
static bool is_hexadecimal (const char *look_at, int len)
 returns true if "look_at" is all hexadecimal characters.
 
static bool is_hexadecimal (const basis::astring &look_at, int len)
 returns true if the string "look_at" is all hexadecimal characters.
 
static bool is_identifier (char look_at)
 returns true if "look_at" is a valid identifier character.
 
static bool is_identifier (const char *look_at, int len)
 returns true if "look_at" is composed of valid identifier characters.
 
static bool is_identifier (const basis::astring &look_at, int len)
 like is_identifier() above but operates on a string.
 

Detailed Description

Warehouses some functions that are often useful during text parsing.

Definition at line 24 of file parser_bits.h.

Member Enumeration Documentation

◆ line_ending

An enumeration of the separator character(s) used for text files.

on unix, every line in a text file has a line feed (LF) character appended to the line. on ms-dos and ms-windows, each line has a carriage return (CR) and line feed (LF) appended instead. a synonym for the line_ending is "eol" which stands for "end of line".

Enumerator
LF_AT_END 

Unix standard is LF_AT_END ("\n").

CRLF_AT_END 

DOS standard is CRLF_AT_END ("\r\n").

NO_ENDING 

No additional characters added as line endings.

Definition at line 31 of file parser_bits.h.

Member Function Documentation

◆ eol_to_chars()

const char * textual::parser_bits::eol_to_chars ( line_ending  ending)
static

returns the C string form for the "ending" value.

Definition at line 45 of file parser_bits.cpp.

Referenced by loggers::eol_aware::get_ending().

◆ is_alpha() [1/3]

bool textual::parser_bits::is_alpha ( char  look_at)
static

returns true if "look_at" is one of the alphabetical characters.

This includes a to z in either case.

Definition at line 138 of file parser_bits.cpp.

References basis::range_check().

Referenced by filesystem::filename::canonicalize().

◆ is_alpha() [2/3]

bool textual::parser_bits::is_alpha ( const basis::astring & look_at,
int  len 
)
static

returns true if the string "look_at" is all alphabetical characters.

Definition at line 148 of file parser_bits.cpp.

References basis::astring::observe().

◆ is_alpha() [3/3]

bool textual::parser_bits::is_alpha ( const char *  look_at,
int  len 
)
static

returns true if the char ptr "look_at" is all alphabetical characters.

Definition at line 141 of file parser_bits.cpp.

◆ is_alphanumeric() [1/3]

bool textual::parser_bits::is_alphanumeric ( char  look_at)
static

returns true if "look_at" is one of the alphanumeric characters.

This includes a to z in either case and 0 to 9.

Definition at line 121 of file parser_bits.cpp.

References basis::range_check().

Referenced by filesystem::filename::canonicalize(), fix_project_references.fix_project_references::replace_within_string(), and phrase_replacer.phrase_replacer::replace_within_string().

◆ is_alphanumeric() [2/3]

bool textual::parser_bits::is_alphanumeric ( const basis::astring & look_at,
int  len 
)
static

returns true if the string "look_at" is all alphanumeric characters.

Definition at line 135 of file parser_bits.cpp.

References basis::astring::observe().

Referenced by fix_project_references.fix_project_references::replace_within_string(), and phrase_replacer.phrase_replacer::replace_within_string().

◆ is_alphanumeric() [3/3]

bool textual::parser_bits::is_alphanumeric ( const char *  look_at,
int  len 
)
static

returns true if the char ptr "look_at" is all alphanumeric characters.

Definition at line 128 of file parser_bits.cpp.

Referenced by fix_project_references.fix_project_references::replace_within_string(), and phrase_replacer.phrase_replacer::replace_within_string().

◆ is_eol()

bool textual::parser_bits::is_eol ( char  to_check)
static

returns true if "to_check" is part of an end-of-line sequence.

this returns true for both the '\r' and '\n' characters.

Definition at line 68 of file parser_bits.cpp.

◆ is_hexadecimal() [1/3]

bool textual::parser_bits::is_hexadecimal ( char  look_at)
static

returns true if "look_at" is one of the hexadecimal characters.

This includes a to f in either case and 0 to 9.

Definition at line 104 of file parser_bits.cpp.

References basis::range_check().

◆ is_hexadecimal() [2/3]

bool textual::parser_bits::is_hexadecimal ( const basis::astring & look_at,
int  len 
)
static

returns true if the string "look_at" is all hexadecimal characters.

Definition at line 118 of file parser_bits.cpp.

References basis::astring::observe().

◆ is_hexadecimal() [3/3]

bool textual::parser_bits::is_hexadecimal ( const char *  look_at,
int  len 
)
static

returns true if "look_at" is all hexadecimal characters.

Definition at line 111 of file parser_bits.cpp.

References is_hexadecimal().

Referenced by is_hexadecimal().
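
The relationship between the single-character test and the buffer test can be sketched as follows. The names `is_hex_char_sketch` and `is_hex_run_sketch` are hypothetical stand-ins, and rejecting a null or empty buffer is an assumption; the real overloads may behave differently on empty input.

```cpp
// True for 0 to 9 and a to f in either case.
inline bool is_hex_char_sketch(char c)
{
  return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
      || (c >= 'A' && c <= 'F');
}

// True only if every one of the first "len" characters is hexadecimal.
// Assumption: a null or empty buffer is rejected.
inline bool is_hex_run_sketch(const char *look_at, int len)
{
  if (!look_at || len <= 0) return false;
  for (int i = 0; i < len; i++)
    if (!is_hex_char_sketch(look_at[i])) return false;
  return true;
}
```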

◆ is_identifier() [1/3]

bool textual::parser_bits::is_identifier ( char  look_at)
static

returns true if "look_at" is a valid identifier character.

this just allows alphanumeric characters and underscore.

Definition at line 151 of file parser_bits.cpp.

References basis::range_check().

◆ is_identifier() [2/3]

bool textual::parser_bits::is_identifier ( const basis::astring & look_at,
int  len 
)
static

like is_identifier() above but operates on a string.

Definition at line 167 of file parser_bits.cpp.

References basis::astring::observe().

◆ is_identifier() [3/3]

bool textual::parser_bits::is_identifier ( const char *  look_at,
int  len 
)
static

returns true if "look_at" is composed of valid identifier characters.

additionally, identifiers cannot start with a number.

Definition at line 159 of file parser_bits.cpp.
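
A minimal sketch of the identifier rule described above: alphanumerics and underscore are allowed, and the first character may not be a digit. The function names are hypothetical, and treating empty input as invalid is an assumption.

```cpp
// True for letters, digits, and underscore.
inline bool is_identifier_char_sketch(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || (c >= '0' && c <= '9') || c == '_';
}

// True if the first "len" characters form a valid identifier.
// Assumption: null or empty input is rejected.
inline bool is_identifier_sketch(const char *look_at, int len)
{
  if (!look_at || len <= 0) return false;
  // identifiers cannot start with a number.
  if (look_at[0] >= '0' && look_at[0] <= '9') return false;
  for (int i = 0; i < len; i++)
    if (!is_identifier_char_sketch(look_at[i])) return false;
  return true;
}
```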

◆ is_numeric() [1/3]

bool textual::parser_bits::is_numeric ( char  look_at)
static

returns true if "look_at" is a valid numerical character.

Definition at line 170 of file parser_bits.cpp.

References basis::range_check().

◆ is_numeric() [2/3]

bool textual::parser_bits::is_numeric ( const basis::astring & look_at,
int  len 
)
static

returns true if the "look_at" string has only valid numerical chars.

Definition at line 184 of file parser_bits.cpp.

References basis::astring::observe().

◆ is_numeric() [3/3]

bool textual::parser_bits::is_numeric ( const char *  look_at,
int  len 
)
static

returns true if "look_at" is all valid numerical characters.

this allows the '-' character for negative numbers also (but only for first character if the char* or astring versions are used). does not support floating point numbers or exponential notation yet.

Definition at line 175 of file parser_bits.cpp.
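
The leading-minus rule can be illustrated with the sketch below. `is_numeric_sketch` is a hypothetical stand-in, not the library's implementation. As assumptions, it rejects empty input and accepts a lone "-", since the description does not say how either case is handled.

```cpp
// True if the first "len" characters are digits, optionally preceded
// by a single '-' in the first position.
// Assumptions: empty input is rejected; a lone "-" is accepted.
inline bool is_numeric_sketch(const char *look_at, int len)
{
  if (!look_at || len <= 0) return false;
  for (int i = 0; i < len; i++) {
    const char c = look_at[i];
    const bool digit = (c >= '0' && c <= '9');
    const bool leading_minus = (i == 0 && c == '-');
    if (!digit && !leading_minus) return false;
  }
  return true;
}
```

Decimal points and exponents fail the check, matching the note that floating point and exponential notation are not supported.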

◆ is_printable_ascii()

bool textual::parser_bits::is_printable_ascii ( char  to_check)
static

returns true if "to_check" is a normally visible ASCII character.

this is defined very simply by it being within the range of 32 to 126. that entire range should be printable in ASCII. before 32 we have control characters. after 126 we have potentially freakish looking characters. this is obviously not appropriate for utf-8 or unicode.

Definition at line 62 of file parser_bits.cpp.
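
The range test amounts to a two-sided comparison. The sketch below uses the hypothetical name `is_printable_ascii_sketch` and only assumes the documented bounds of 32 and 126.

```cpp
// True for characters in the closed range 32 (space) to 126 ('~').
// Control characters, DEL (127), and bytes with the high bit set all
// fall outside the range, whether "char" is signed or unsigned.
inline bool is_printable_ascii_sketch(char to_check)
{
  return to_check >= 32 && to_check <= 126;
}
```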

◆ platform_eol()

parser_bits::line_ending textual::parser_bits::platform_eol ( )
static

provides the appropriate ending on the current OS platform.

Definition at line 31 of file parser_bits.cpp.

◆ platform_eol_to_chars()

const char * textual::parser_bits::platform_eol_to_chars ( )
static

provides the characters that make up this platform's line ending.

Definition at line 59 of file parser_bits.cpp.

Referenced by textual::xml_generator::add_content(), and textual::xml_generator::generate().

◆ substitute_env_vars()

astring textual::parser_bits::substitute_env_vars ( const basis::astring & text,
bool  leave_unknown = true 
)
static

resolves embedded environment variables in "text".

replaces the names of any environment variables in "text" with the variable's value and returns the resulting string. the variable names are marked by a single dollar before an alphanumeric identifier (underscores are valid), for example: $PATH. if the "leave_unknown" flag is true, then any unmatched variables are left in the text with a question mark instead of a dollar sign. if it's false, then they are simply replaced with nothing at all.

Definition at line 187 of file parser_bits.cpp.

References basis::astring::find(), basis::environment::get(), basis::astring::insert(), basis::astring::length(), basis::negative(), basis::astring::substring(), basis::astring::t(), and basis::astring::zap().
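
The substitution rules above can be sketched with std::string in place of basis::astring. Everything named here (`env_lookup_sketch`, `getenv_lookup_sketch`, `substitute_env_vars_sketch`) is hypothetical, and the injectable lookup parameter is an addition for testability that the real function does not have. As further assumptions, a dollar sign with no identifier after it is copied unchanged, and substituted values are not rescanned for more variables.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical lookup: returns true and fills "value" if "name" is known.
using env_lookup_sketch =
    std::function<bool(const std::string &name, std::string &value)>;

// Default lookup backed by the process environment.
inline bool getenv_lookup_sketch(const std::string &name, std::string &value)
{
  const char *found = std::getenv(name.c_str());
  if (!found) return false;
  value = found;
  return true;
}

// True for characters allowed in a variable name.
inline bool env_name_char_sketch(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || (c >= '0' && c <= '9') || c == '_';
}

inline std::string substitute_env_vars_sketch(const std::string &text,
    bool leave_unknown = true,
    const env_lookup_sketch &lookup = getenv_lookup_sketch)
{
  std::string out;
  std::size_t i = 0;
  while (i < text.size()) {
    if (text[i] != '$') { out += text[i++]; continue; }
    // gather the identifier that follows the dollar sign.
    std::size_t j = i + 1;
    while (j < text.size() && env_name_char_sketch(text[j])) j++;
    if (j == i + 1) { out += '$'; i++; continue; }  // lone dollar (assumption).
    const std::string name = text.substr(i + 1, j - i - 1);
    std::string value;
    if (lookup(name, value)) out += value;          // known variable.
    else if (leave_unknown) out += "?" + name;      // '$' becomes '?'.
    // otherwise the unknown variable is dropped entirely.
    i = j;
  }
  return out;
}
```

Passing a custom lookup makes the behavior reproducible regardless of what the host environment defines.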

◆ translate_CR_for_platform()

void textual::parser_bits::translate_CR_for_platform ( basis::astring & to_translate)
static

flips embedded EOL characters for this platform's needs.

runs through the string "to_translate" and changes any CR or CRLF combinations into the EOL (end-of-line) character that's appropriate for this operating system.

Definition at line 74 of file parser_bits.cpp.

References basis::astring::end(), basis::astring::insert(), and basis::astring::zap().
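
A rough sketch of this kind of in-place normalization follows, using std::string and an explicit target ending rather than platform detection. `translate_eol_sketch` is a hypothetical name. Treating a lone LF as a line break alongside CR and CRLF is an assumption, since the description above mentions only CR and CRLF.

```cpp
#include <string>

// Rewrites every line break in "to_translate" to "target_eol".
// Assumption: CRLF, lone CR, and lone LF each count as one line break.
inline void translate_eol_sketch(std::string &to_translate,
    const std::string &target_eol)
{
  std::string out;
  out.reserve(to_translate.size());
  for (std::size_t i = 0; i < to_translate.size(); i++) {
    const char c = to_translate[i];
    if (c == '\r') {
      // treat CRLF as a single break by skipping the LF that follows.
      if (i + 1 < to_translate.size() && to_translate[i + 1] == '\n') i++;
      out += target_eol;
    } else if (c == '\n') {
      out += target_eol;
    } else {
      out += c;
    }
  }
  to_translate.swap(out);
}
```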

◆ white_space()

bool textual::parser_bits::white_space ( char  to_check)
static

returns true if the character "to_check" is considered a white space.

this set includes tab ('\t'), space (' '), carriage return ('\r'), and line feed ('\n').

Definition at line 71 of file parser_bits.cpp.

References is_eol().

◆ white_space_no_cr()

bool textual::parser_bits::white_space_no_cr ( char  to_check)
static

reports if "to_check" is white space but not a carriage return.

returns true if the character "to_check" is considered a white space, but is not part of an end of line combo (both '\n' and '\r' are disallowed). the allowed set includes tab ('\t') and space (' ') only.

Definition at line 65 of file parser_bits.cpp.
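
The three white space predicates relate to each other as sketched below. The names are hypothetical stand-ins. Composing `white_space_sketch` from the other two is an assumption, suggested only by the note that white_space() references is_eol().

```cpp
// True for the characters that can appear in an end-of-line sequence.
inline bool is_eol_sketch(char c) { return c == '\r' || c == '\n'; }

// True for tab and space only; end-of-line characters are excluded.
inline bool white_space_no_cr_sketch(char c) { return c == ' ' || c == '\t'; }

// True for tab, space, carriage return, and line feed.
inline bool white_space_sketch(char c)
{
  return white_space_no_cr_sketch(c) || is_eol_sketch(c);
}
```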


The documentation for this class was generated from the following files: