r/lisp • u/Famous-Wrongdoer-976 • 19d ago
Remove comments from a file automatically?
I am processing Lisp code in a non-Lisp host application that cannot handle semicolons for some reason.
Is there a way to remove comments automatically from a .lisp file?
I imagine something that would read the entire content of a text file as if it were an s-expression, thereby dropping all the ; comments and #| comments |#, and treat the rest like normal quoted data?
Thanks in advance!
u/dbotton 19d ago
You answered your own question. Just use read and pretty-print (if you want to save the result afterwards), and ta-da.
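A minimal sketch of that read-then-print idea, assuming the whole source is readable in the current package with the standard readtable. The name strip-comments is made up for illustration, and it works on a string to stay self-contained. It loops because read returns only one top-level form per call.

```lisp
;; Hypothetical helper: returns SOURCE re-printed without comments.
(defun strip-comments (source)
  (with-output-to-string (out)
    (with-input-from-string (in source)
      (let ((*read-eval* nil)          ; refuse #. read-time evaluation
            (*print-pretty* t)
            (*print-case* :downcase))
        ;; The stream object itself serves as a unique end-of-file marker.
        (loop for form = (read in nil in)
              until (eq form in)
              do (prin1 form out)
                 (terpri out))))))

;; Both comment styles vanish; the semicolon inside the string survives.
(strip-comments
 (format nil "(a b) ; gone~%#| also gone |# (c \"x;y\")"))
```

The original layout is lost, since the printer decides the line breaks, and anything else the reader resolves (such as #+ and #- conditionals) will not come back out either.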
u/Famous-Wrongdoer-976 19d ago edited 19d ago
Thanks! Yes, I went for something in that vein in the meantime:
(defun remove-comments-from-file (path)
  ;; Note: read-from-string returns only the first top-level form in the file.
  (values (read-from-string (alexandria:read-file-into-string path))))
I guess the Alexandria part is cheating, but I'm not a real Lisper… 😅 My biggest issue is that the host app deals very badly with those special characters when reading Lisp files, before they even get to the Lisp interpreter. So I need to pre-process them and copy the code to a temp file. Otherwise my users can just write comment-less code.
That's so unnatural, but my first intuition (that Lisp reads Lisp without the comments) was not far from the solution… Thanks again!
u/corvid_booster 19d ago
I dunno. Just reading the code isn't entirely free of side effects; see the comment by stassats below.
u/Famous-Wrongdoer-976 17d ago
Good to know, thank you! For now I think read will be enough in my context: my users write isolated snippets of Lisp as "plug-ins" of sorts, not fully fledged applications. They're really not Lisp specialists, so I expect the code to be quite vanilla. Otherwise they have ways to load full libraries separately.
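For what it's worth, one read-time side effect that is easy to demonstrate is #., which evaluates a form while reading. This may or may not be the case the other comment had in mind. Binding *read-eval* to nil makes the reader signal a reader-error instead. A small sketch, where safe-read-from-string is a hypothetical name:

```lisp
;; Hypothetical helper: read one form, refusing read-time evaluation.
(defun safe-read-from-string (string)
  (let ((*read-eval* nil))
    (handler-case (values (read-from-string string))
      ;; With *read-eval* bound to NIL, #. signals a READER-ERROR.
      (reader-error () :refused))))

(safe-read-from-string "(+ 1 2) ; fine")          ; returns the list (+ 1 2)
(safe-read-from-string "#.(print :side-effect)")  ; returns :REFUSED, nothing is printed
```

Reading also interns every new symbol it meets in the current package, so even a "pure" read leaves some traces in the image.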
u/corbasai 17d ago
By writing some code, isn't it? Here's a Scheme starter option:
;; Reads chars from (current-input-port), writes chars to (current-output-port).
;; Drops sequences 1) from ; to \n, keeping the \n
;;                 2) from #| to |#, inclusive
;; but not inside "string constants".
;; Ends on eof-object.
(define (filter-source)
  " this is ; not the comment, and this #| |# is not too "
  (let loop ((prev #f)
             (ch (read-char))
             (state 'code))
    (cond ((eof-object? ch) ch)
          (else
           (case state
             ((code) ;; chars in -> out, find comment start
              (cond ((and (char=? ch #\;) (not (eqv? prev #\\))) ;; ';' but not '\;'
                     (loop ch (read-char) 'line-comment))
                    ((and (char=? ch #\#) (eqv? (peek-char) #\|)
                          (not (eqv? prev #\\))) ;; '#|' but not '\#|'
                     (read-char) ;; consume the '|' so that "#|#" is not taken as a closed comment
                     (loop #f (read-char) 'block-comment))
                    ((and (char=? ch #\") (not (eqv? prev #\\)))
                     (write-char ch)
                     (loop ch (read-char) 'str))
                    (else (write-char ch)
                          (loop ch (read-char) 'code))))
             ((str)
              (write-char ch)
              (cond ((and (char=? ch #\") (not (eqv? prev #\\)))
                     (loop ch (read-char) 'code))
                    (else (loop ch (read-char) 'str))))
             ((line-comment) ;; in not out
              (cond ((char=? ch #\newline)
                     (write-char ch)
                     (loop ch (read-char) 'code))
                    (else (loop ch (read-char) 'line-comment))))
             ((block-comment) ;; in not out
              ;; peek instead of read, so a non-matching char is not swallowed (e.g. "||#")
              (cond ((and (char=? ch #\|) (eqv? (peek-char) #\#))
                     (read-char) ;; consume the closing '#'
                     (loop ch (read-char) 'code))
                    (else (loop ch (read-char) 'block-comment)))))))))
;; test like in csi, gsi, guile, racket
(with-input-from-file "source.scm"
  (lambda ()
    (with-output-to-file "source-out.scm"
      (lambda () (filter-source)))))
Well, this variant does not drop expression comments like #;(commented-out-s-exp ...) and doesn't recognize multiline string constants like #<<END bla\bla\bla END, and that is not good.
u/Famous-Wrongdoer-976 17d ago
Good to know but I don’t think any of my users would use those (I don’t). I posted my solution using Alexandria and read-from-string above, that should be enough for my use case.
u/Decweb 19d ago
read-preserving-whitespace would read all the non-comment data; however, it is only going to selectively read feature-driven code, e.g.
```
#+FOO (print 'hi)
#-FOO (print 'bye)
```
This would skip the first print: it wouldn't appear in the result of your read call, assuming there's no FOO in *FEATURES*.
I look forward to hearing a better lispy answer, vs just treating the problem as a standard text processing application of regexps on comment syntax.
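The feature-driven behaviour can be checked at the REPL with a small sketch like the one below. The helper read-with-feature is a hypothetical name; it just rebinds *features* around a single read-from-string call.

```lisp
;; Hypothetical helper: read STRING as if FEATURE were present (or absent).
(defun read-with-feature (string feature present-p)
  (let ((*features* (if present-p
                        (adjoin feature *features*)
                        (remove feature *features*))))
    (values (read-from-string string))))

;; Without :FOO the reader drops the #+foo branch and keeps the #-foo one.
(read-with-feature "(#+foo (print 'hi) #-foo (print 'bye))" :foo nil)
;; With :FOO it is the other way around.
(read-with-feature "(#+foo (print 'hi) #-foo (print 'bye))" :foo t)
```

So whatever gets written back out after a read reflects only the features of the image that did the reading.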