Provides a file-like object that takes care of all the things you commonly want to do when processing a text file that has some line-by-line syntax: strip comments (as long as "#" is your comment character), skip blank lines, join adjacent lines by escaping the newline (i.e. a backslash at end of line), and strip leading and/or trailing whitespace. All of these are optional and independently controllable.

Provides a 'warn()' method so you can generate warning messages that report the physical line number, even if the logical line in question spans multiple physical lines. Also provides 'unreadline()' for implementing line-at-a-time lookahead.

Constructor is called as:

    TextFile (filename=None, file=None, **options)

It raises RuntimeError if both 'filename' and 'file' are None; 'filename' should be a string, and 'file' a file object (or something that provides 'readline()' and 'close()' methods). It is recommended that you supply at least 'filename', so that TextFile can include it in warning messages. If 'file' is not supplied, TextFile creates its own using 'io.open()'.

The options are all boolean except 'errors', and affect the value returned by 'readline()':

  strip_comments [default: true]
    strip from "#" to end-of-line, as well as any whitespace leading up to the "#", unless it is escaped by a backslash
  lstrip_ws [default: false]
    strip leading whitespace from each line before returning it
  rstrip_ws [default: true]
    strip trailing whitespace (including line terminator!) from each line before returning it
  skip_blanks [default: true]
    skip lines that are empty *after* stripping comments and whitespace. (If both lstrip_ws and rstrip_ws are false, then some lines may consist solely of whitespace: these will *not* be skipped, even if 'skip_blanks' is true.)
  join_lines [default: false]
    if a backslash is the last non-newline character on a line after stripping comments and whitespace, join the following line to it to form one "logical line"; if N consecutive lines end with a backslash, then N+1 physical lines will be joined to form one logical line.
  collapse_join [default: false]
    strip leading whitespace from lines that are joined to their predecessor; only matters if (join_lines and not lstrip_ws)
  errors [default: 'strict']
    error handler used to decode the file content

Note that since 'rstrip_ws' can strip the trailing newline, the semantics of 'readline()' must differ from those of the builtin file object's 'readline()' method! In particular, 'readline()' returns None for end-of-file: an empty string might just be a blank line (or an all-whitespace line), if 'rstrip_ws' is true but 'skip_blanks' is not.
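Because end-of-file is signalled by None rather than by an empty string, a caller's read loop has to test for None explicitly. The following is a minimal sketch of such a loop; 'StubReader' and 'read_all' are hypothetical names used only for illustration, and the stub imitates nothing but the 'readline()' contract described above.

```python
class StubReader:
    """Hypothetical stand-in: returns queued lines, then None at end-of-file."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        return self._lines.pop(0) if self._lines else None


def read_all(reader):
    """Collect every line until readline() reports end-of-file with None."""
    result = []
    while True:
        line = reader.readline()
        if line is None:  # end-of-file; an empty string is NOT end-of-file
            break
        result.append(line)
    return result


# The empty string in the middle is a blank line, so it must be kept.
lines = read_all(StubReader(["first", "", "third"]))
```

A loop written as "while line:" would stop at the blank line, which is why the explicit "is None" test matters when 'skip_blanks' is false.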
Construct a new TextFile object. At least one of 'filename' (a string) and 'file' (a file-like object) must be supplied. The keyword argument options are described above and affect the values returned by 'readline()'.
    def __init__(self, filename=None, file=None, **options):
        """Construct a new TextFile object.  At least one of 'filename'
           (a string) and 'file' (a file-like object) must be supplied.
           The keyword argument options are described above and affect
           the values returned by 'readline()'."""
        if filename is None and file is None:
            raise RuntimeError("you must supply either or both of 'filename' and 'file'")

        # set values for all options -- either from client option hash
        # or fallback to default_options
        for opt in self.default_options.keys():
            if opt in options:
                setattr(self, opt, options[opt])
            else:
                setattr(self, opt, self.default_options[opt])

        # sanity check client option hash
        for opt in options.keys():
            if opt not in self.default_options:
                raise KeyError("invalid TextFile option '%s'" % opt)

        if file is None:
            self.open(filename)
        else:
            self.filename = filename
            self.file = file
            self.current_line = 0   # assuming that file is at BOF!

        # 'linebuf' is a stack of lines that will be emptied before we
        # actually read from the file; it's only populated by an
        # 'unreadline()' operation
        self.linebuf = []
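The option handling in the constructor can be pictured as merging caller-supplied keywords over a table of defaults and rejecting unknown names. The sketch below is a standalone illustration, not the class's own code: 'DEFAULT_OPTIONS' and 'resolve_options' are hypothetical names, and the default values are taken from the option descriptions above.

```python
# Defaults as listed in the option descriptions (assumed to match the class).
DEFAULT_OPTIONS = {
    "strip_comments": True,
    "lstrip_ws": False,
    "rstrip_ws": True,
    "skip_blanks": True,
    "join_lines": False,
    "collapse_join": False,
    "errors": "strict",
}


def resolve_options(**options):
    """Return the effective option table, raising KeyError on unknown names."""
    for opt in options:
        if opt not in DEFAULT_OPTIONS:
            raise KeyError("invalid TextFile option '%s'" % opt)
    effective = dict(DEFAULT_OPTIONS)  # copy so the defaults stay untouched
    effective.update(options)
    return effective


effective = resolve_options(join_lines=True, rstrip_ws=False)
```

Unlike the constructor above, this sketch validates before merging; the observable result for valid input is the same.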
Close the current file and forget everything we know about it (filename, current line number).
    def gen_error(self, msg, line=None):
        # Build "<filename>, line N: <msg>" (or "lines N-M" for a range)
        outmsg = []
        if line is None:
            line = self.current_line
        outmsg.append(self.filename + ", ")
        if isinstance(line, (list, tuple)):
            outmsg.append("lines %d-%d: " % tuple(line))
        else:
            outmsg.append("line %d: " % line)
        outmsg.append(str(msg))
        return "".join(outmsg)
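To make the message format concrete, here is a small standalone sketch of the same formatting rule. 'format_location' is a hypothetical helper written for this example, and the file name and messages are made up.

```python
def format_location(filename, line, msg):
    """Prefix 'msg' with the file name and a line number or line range."""
    if isinstance(line, (list, tuple)):
        where = "lines %d-%d: " % tuple(line)  # two-element range
    else:
        where = "line %d: " % line  # single physical line
    return filename + ", " + where + str(msg)


single = format_location("setup.cfg", 7, "unknown key")
spanning = format_location("setup.cfg", [3, 5], "unterminated value")
```

Here 'single' is "setup.cfg, line 7: unknown key" and 'spanning' is "setup.cfg, lines 3-5: unterminated value".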
Open a new file named 'filename'. This overrides both the 'filename' and 'file' arguments to the constructor.
Read and return a single logical line from the current file (or from an internal buffer if lines have previously been "unread" with 'unreadline()'). If the 'join_lines' option is true, this may involve reading multiple physical lines concatenated into a single string. Updates the current line number, so calling 'warn()' after 'readline()' emits a warning about the physical line(s) just read. Returns None on end-of-file, since the empty string can occur if 'rstrip_ws' is true but 'skip_blanks' is not.
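The relationship between logical lines and the physical line range reported by 'warn()' can be illustrated with a simplified sketch. It assumes trailing whitespace is always stripped and lines are always joined (roughly 'rstrip_ws' and 'join_lines' both true) and ignores comments entirely; 'join_logical_lines' is a hypothetical helper, not part of the class.

```python
def join_logical_lines(physical_lines):
    """Return (text, first_line, last_line) tuples, one per logical line."""
    logical = []
    buildup = ""
    start = None
    for number, raw in enumerate(physical_lines, start=1):
        text = raw.rstrip()
        if start is None:
            start = number  # first physical line of this logical line
        if text.endswith("\\"):
            buildup += text[:-1]  # drop the backslash, keep accumulating
            continue
        logical.append((buildup + text, start, number))
        buildup, start = "", None
    if start is not None:  # a continuation ran into end-of-file
        logical.append((buildup, start, len(physical_lines)))
    return logical


joined = join_logical_lines(["a \\\n", "b \\\n", "c\n", "d\n"])
```

Two lines ending in a backslash produce one logical line built from three physical lines, matching the "N consecutive backslashes join N+1 lines" rule.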
    def readline(self):
        """Read and return a single logical line from the current file (or
           from an internal buffer if lines have previously been "unread"
           with 'unreadline()').  If the 'join_lines' option is true, this
           may involve reading multiple physical lines concatenated into a
           single string.  Updates the current line number, so calling
           'warn()' after 'readline()' emits a warning about the physical
           line(s) just read.  Returns None on end-of-file, since the empty
           string can occur if 'rstrip_ws' is true but 'skip_blanks' is
           not."""
        # If any "unread" lines waiting in 'linebuf', return the top
        # one.  (We don't actually buffer read-ahead data -- lines only
        # get put in 'linebuf' if the client explicitly does an
        # 'unreadline()'.)
        if self.linebuf:
            line = self.linebuf[-1]
            del self.linebuf[-1]
            return line

        buildup_line = ''

        while True:
            # read the line, make it None if EOF
            line = self.file.readline()
            if line == '':
                line = None

            if self.strip_comments and line:

                # Look for the first "#" in the line.  If none, never
                # mind.  If we find one and it's the first character, or
                # is not preceded by "\", then it starts a comment --
                # strip the comment, strip whitespace before it, and
                # carry on.  Otherwise, it's just an escaped "#", so
                # unescape it (and any other escaped "#"'s that might be
                # lurking in there) and otherwise leave the line alone.

                pos = line.find("#")
                if pos == -1:   # no "#" -- no comments
                    pass

                # It's definitely a comment -- either "#" is the first
                # character, or it's elsewhere and unescaped.
                elif pos == 0 or line[pos-1] != "\\":
                    # Have to preserve the trailing newline, because it's
                    # the job of a later step (rstrip_ws) to remove it --
                    # and if rstrip_ws is false, we'd better preserve it!
                    # (NB. this means that if the final line is all comment
                    # and has no trailing newline, we will think that it's
                    # EOF; I think that's OK.)
                    eol = '\n' if line[-1] == '\n' else ''
                    line = line[0:pos] + eol

                    # If all that's left is whitespace, then skip line
                    # *now*, before we try to join it to 'buildup_line' --
                    # that way constructs like
                    #   hello \\
                    #   # comment that should be ignored
                    #   there
                    # result in "hello there".
                    if line.strip() == "":
                        continue
                else:   # it's an escaped "#"
                    line = line.replace("\\#", "#")

            # did previous line end with a backslash? then accumulate
            if self.join_lines and buildup_line:
                # oops: end of file
                if line is None:
                    self.warn("continuation line immediately precedes "
                              "end-of-file")
                    return buildup_line

                if self.collapse_join:
                    line = line.lstrip()
                line = buildup_line + line

                # careful: pay attention to line number when incrementing it
                # (a list means [first, last] physical line of the logical line)
                if isinstance(self.current_line, list):
                    self.current_line[1] = self.current_line[1] + 1
                else:
                    self.current_line = [self.current_line,
                                         self.current_line + 1]
            # just an ordinary line, read it as usual
            else:
                if line is None:   # eof
                    return None

                # still have to be careful about incrementing the line number!
                if isinstance(self.current_line, list):
                    self.current_line = self.current_line[1] + 1
                else:
                    self.current_line = self.current_line + 1

            # strip whitespace however the client wants (leading and
            # trailing, or one or the other, or neither)
            if self.lstrip_ws and self.rstrip_ws:
                line = line.strip()
            elif self.lstrip_ws:
                line = line.lstrip()
            elif self.rstrip_ws:
                line = line.rstrip()

            # blank line (whether we rstrip'ed or not)? skip to next line
            # if appropriate
            if (line == '' or line == '\n') and self.skip_blanks:
                continue

            if self.join_lines:
                if line[-1] == '\\':
                    buildup_line = line[:-1]
                    continue

                if line[-2:] == '\\\n':
                    buildup_line = line[0:-2] + '\n'
                    continue

            # well, I guess there's some actual content there: return it
            return line
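The comment-handling branch of the listing above can be isolated into a small function to see its effect on individual lines. This is a sketch that mirrors only that branch; 'strip_comment' is a hypothetical name, and whitespace before the "#" is left in place here because, in the listing, it is the later whitespace-stripping step that removes it.

```python
def strip_comment(line):
    """Apply the '#' rule to one physical line, keeping any trailing newline."""
    pos = line.find("#")
    if pos == -1:
        return line  # no "#" at all: nothing to do
    if pos == 0 or line[pos - 1] != "\\":
        # First "#" is unescaped: cut the comment, preserve the newline.
        eol = "\n" if line.endswith("\n") else ""
        return line[:pos] + eol
    # First "#" is escaped: unescape every "\#" and keep the line.
    return line.replace("\\#", "#")


with_comment = strip_comment("name = value  # note\n")
escaped = strip_comment("color = \\#fff\n")
```

Note that only the first "#" decides which branch runs, so a line with an escaped "#" followed by a real comment keeps the comment text.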
Read and return the list of all logical lines remaining in the current file.
Push 'line' (a string) onto an internal buffer that will be checked by future 'readline()' calls. Handy for implementing a parser with line-at-a-time lookahead.
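As an illustration of line-at-a-time lookahead, the sketch below reads lines belonging to one section and pushes back the header of the next section once it has been seen. 'LookaheadReader' and 'read_section' are hypothetical names; the reader imitates only the 'readline()'/'unreadline()' pair described here, and the "[" header convention is an assumption made for the example.

```python
class LookaheadReader:
    """Hypothetical reader with a push-back stack, modelled on 'linebuf'."""

    def __init__(self, lines):
        self._source = iter(lines)
        self._linebuf = []  # stack of "unread" lines

    def readline(self):
        if self._linebuf:
            return self._linebuf.pop()  # most recently unread line first
        return next(self._source, None)  # None signals end-of-file

    def unreadline(self, line):
        self._linebuf.append(line)


def read_section(reader):
    """Collect lines up to, but not including, the next '[' header."""
    body = []
    while True:
        line = reader.readline()
        if line is None:
            break
        if line.startswith("["):
            reader.unreadline(line)  # leave the header for the next caller
            break
        body.append(line)
    return body


reader = LookaheadReader(["a = 1", "b = 2", "[next]", "c = 3"])
first = read_section(reader)
header = reader.readline()
```

After 'read_section' returns, the header line is still available to the caller, which is the point of pushing it back.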
Print (to stderr) a warning message tied to the current logical line in the current file. If the current logical line in the file spans multiple physical lines, the warning refers to the whole range, e.g. "lines 3-5". If 'line' is supplied, it overrides the current line number; it may be a list or tuple to indicate a range of physical lines, or an integer for a single physical line.
    def warn(self, msg, line=None):
        """Print (to stderr) a warning message tied to the current logical
           line in the current file.  If the current logical line in the
           file spans multiple physical lines, the warning refers to the
           whole range, e.g. "lines 3-5".  If 'line' is supplied, it overrides
           the current line number; it may be a list or tuple to indicate a
           range of physical lines, or an integer for a single physical
           line."""
        # relies on 'sys' being imported at module level
        sys.stderr.write("warning: " + self.gen_error(msg, line) + "\n")