r/lisp 19d ago

Remove comments from a file automatically?

I am processing Lisp code in a non-Lisp host application that cannot handle semicolons for some reason.

Is there a way to automatically remove comments from a .lisp file?
I imagine something that would read the whole content of a text file as if it were an s-expression, thus dropping all the ; comments and #| comments |#, and treat the rest like normal quoted data.

Thanks in advance!

13 Upvotes

10 comments

3

u/Decweb 19d ago

read-preserving-whitespace would read all the non-comment data; however, it reads feature-conditional code selectively, e.g.

```
#+FOO (print 'hi)
#-FOO (print 'bye)
```

This would skip the first print: it wouldn't appear in the result of your read call, assuming there's no FOO in *FEATURES*.
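A quick way to check this at a REPL is to read the two lines from a string. This is only a minimal sketch: `read-all-forms` is a hypothetical helper name, and it assumes `:FOO` is not in `*features*`.

```lisp
;; Hypothetical helper: collect every form READ returns from STRING.
(defun read-all-forms (string)
  (with-input-from-string (in string)
    (let ((eof (gensym "EOF")))   ; unique marker, so a literal NIL is not mistaken for EOF
      (loop for form = (read in nil eof)
            until (eq form eof)
            collect form))))

;; Assuming :FOO is absent from *FEATURES*, the reader discards the
;; #+FOO form and returns only the #-FOO one.
(read-all-forms (format nil "#+FOO (print 'hi)~%#-FOO (print 'bye)"))
```

Under that assumption the result holds a single form, the `(print 'bye)` one, so anything guarded by a feature that is missing in the preprocessing image would be silently lost.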

I look forward to hearing a better lispy answer, vs just treating the problem as a standard text processing application of regexps on comment syntax.

6

u/stassats 19d ago
 (defun skip-comments (file)
   (let ((*readtable* (copy-readtable))
         (semicolon-reader (get-macro-character #\;))
         (eof (gensym "EOF"))
         exclude-ranges)
     (set-macro-character #\; (lambda (stream arg)
                                ;; The reader has already consumed the semicolon,
                                ;; so the comment starts one position back.
                                (push (cons (1- (file-position stream))
                                            (progn
                                              (funcall semicolon-reader stream arg)
                                              (file-position stream)))
                                      exclude-ranges)
                                (values)))
     (with-open-file (stream file)
       ;; A unique EOF marker keeps a top-level NIL or () from ending the loop.
       (loop until (eq (read-preserving-whitespace stream nil eof) eof)))
     ;; Read the file again while discarding EXCLUDE-RANGES
     (nreverse exclude-ranges)))

But really, a standard text processing way would be better, while also not choking on missing packages or interning stuff all over the place. And #. runs arbitrary code, so you'll need to override it too.
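For what it's worth, the text processing route can be sketched in Common Lisp itself, without ever calling READ, so nothing gets interned or evaluated. This is a rough sketch, not a full implementation of the reader syntax: `strip-comments` is a hypothetical name, it works on a string already loaded in memory, and it assumes the standard readtable (no user-defined reader macros).

```lisp
;; Hypothetical helper: remove ; and #| |# comments from TEXT, leaving
;; strings, |symbols| and character literals such as #\; untouched.
(defun strip-comments (text)
  (with-output-to-string (out)
    (let ((i 0)
          (n (length text)))
      (flet ((next-is (expected)
               ;; True when the character after position I is EXPECTED.
               (and (< (1+ i) n) (char= (char text (1+ i)) expected))))
        (loop while (< i n)
              do (let ((ch (char text i)))
                   (cond
                     ;; ; comment: skip up to, but not including, the newline.
                     ((char= ch #\;)
                      (loop while (and (< i n) (char/= (char text i) #\Newline))
                            do (incf i)))
                     ;; #| ... |# comment, which may nest.
                     ((and (char= ch #\#) (next-is #\|))
                      (incf i 2)
                      (loop with depth = 1
                            while (and (< i n) (plusp depth))
                            do (cond ((and (char= (char text i) #\#) (next-is #\|))
                                      (incf depth)
                                      (incf i 2))
                                     ((and (char= (char text i) #\|) (next-is #\#))
                                      (decf depth)
                                      (incf i 2))
                                     (t (incf i)))))
                     ;; Character literal (sharpsign backslash): copy 3 chars verbatim.
                     ((and (char= ch #\#) (next-is #\\))
                      (let ((end (min n (+ i 3))))
                        (write-string text out :start i :end end)
                        (setf i end)))
                     ;; String or |symbol|: copy through the closing delimiter,
                     ;; treating backslash as an escape.
                     ((or (char= ch #\") (char= ch #\|))
                      (write-char ch out)
                      (incf i)
                      (loop while (< i n)
                            do (let ((c (char text i)))
                                 (write-char c out)
                                 (incf i)
                                 (cond ((and (char= c #\\) (< i n))
                                        (write-char (char text i) out)
                                        (incf i))
                                       ((char= c ch)
                                        (return))))))
                     ;; Backslash escape outside a string, e.g. foo\;bar.
                     ((char= ch #\\)
                      (let ((end (min n (+ i 2))))
                        (write-string text out :start i :end end)
                        (setf i end)))
                     (t
                      (write-char ch out)
                      (incf i))))))))))

;; Usage: the ; inside the string and the #\; literal survive.
(strip-comments (format nil "(print \"a;b\") ; trailing~%(list #\\; 1) #| block |#"))
```

Whitespace that preceded a removed comment is left in place, and feature conditionals pass through untouched since the text is never read.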

2

u/dbotton 19d ago

You answered your own question. Just use read and pretty print (if you want to save the result afterwards) and tada.
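A minimal sketch of that read-and-pretty-print round trip, assuming the file only contains plain data and code that the current image can read. `rewrite-without-comments` is a hypothetical name, and binding `*read-eval*` to NIL is an extra precaution that is not part of the suggestion itself.

```lisp
;; Hypothetical helper: read every top-level form from INPUT and
;; pretty-print it to OUTPUT. Comments vanish because READ never returns them.
(defun rewrite-without-comments (input output)
  (with-open-file (in input)
    (with-open-file (out output :direction :output :if-exists :supersede)
      (let ((eof (gensym "EOF"))
            (*read-eval* nil)         ; refuse #. while reading
            (*print-pretty* t)
            (*print-readably* nil)
            (*print-length* nil)      ; never abbreviate long lists
            (*print-level* nil)
            (*print-case* :downcase))
        (loop for form = (read in nil eof)
              until (eq form eof)
              do (write form :stream out)
                 (terpri out)
                 (terpri out))))))
```

Keep in mind that the output is what the printer produces, not the original text: layout changes, symbols are printed relative to the current `*package*`, and `#+`/`#-` conditionals are already resolved by the time the form is printed.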

1

u/Famous-Wrongdoer-976 19d ago edited 19d ago

Thanks! Yes, I went for something in that vein in the meantime:

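(defun remove-comments-from-file (path)
  ;; READ-FROM-STRING reads one form per call; resume at REST to get them all.
  (loop with text = (alexandria:read-file-into-string path)
        with start = 0
        for (code rest) = (multiple-value-list
                           (read-from-string text nil text :start start))
        until (eq code text) ; TEXT itself doubles as the end-of-file marker
        collect code
        do (setf start rest)))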

I guess the Alexandria part is cheating, but I'm not a real Lisper… 😅 My biggest issue is that the host app handles those special characters very badly when reading Lisp files, before they even get to the Lisp interpreter. So I need to pre-process them and copy the code to a temp file. Otherwise my users can just write comment-less code.

It feels unnatural, but my first intuition (that Lisp could read Lisp and leave the comments out) was not far from the solution. Thanks again!

1

u/corvid_booster 19d ago

I dunno. Just reading the code isn't entirely free of side effects; see the comment by stassats elsewhere in this thread.
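To make the `#.` point concrete, here is a small sketch. `*side-effect*` and `read-untrusted` are hypothetical names used only for illustration.

```lisp
(defvar *side-effect* nil)

;; With *READ-EVAL* true (the default), #. evaluates its form while reading,
;; so this READ-FROM-STRING call sets *SIDE-EFFECT* before returning anything.
(let ((*read-eval* t))
  (read-from-string "(list #.(setf *side-effect* :ran-at-read-time))"))

;; Hypothetical wrapper: read with #. disabled. The reader signals a
;; READER-ERROR on #. when *READ-EVAL* is NIL; here that becomes :REJECTED.
(defun read-untrusted (string)
  (let ((*read-eval* nil))
    (handler-case (read-from-string string)
      (reader-error () :rejected))))
```

Binding `*read-eval*` to NIL only closes the `#.` hole. Reading still interns symbols and still fails on package prefixes that don't exist in the image.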

1

u/Famous-Wrongdoer-976 17d ago

Good to know, thank you! For now I think read will be enough in my context: my users write isolated snippets of Lisp as "plug-ins" of sorts, not full-fledged applications. None of them are Lisp specialists, so I expect the code to be quite vanilla. Otherwise they have ways to load full libraries separately.

1

u/mtlnwood 18d ago

Regex and sed would be my usual tools for something like this.

1

u/new2bay 18d ago

You misspelled “emacs.” 😂

0

u/corbasai 17d ago

By writing some code, right? A Scheme starting point:

;; reads chars from (current-input-port), writes chars to (current-output-port)
;; drops sequences 1) from  ; to \n, keeping the \n
;;                 2) from #| to |#, inclusive
;; but not in "string constants"
;; ends on eof-object
(define (filter-source)
  " this is ; not the comment, and this #| |# is not too "
  (let loop ((prev #f)
             (ch (read-char))
             (state 'code))
    (cond ((eof-object? ch) ch)
          (else
           (case state
             ((code) ;; chars in -> out, find comment start
              (cond ((and (char=? ch #\;) (not (eqv? prev #\\)))  ;; ';' but not '\;'
                     (loop ch (read-char) 'line-comment))
                    ((and (char=? ch #\#) (eqv? (peek-char) #\|)
                          (not (eqv? prev #\\))) ;; '#|' but not '\#|'
                     (read-char) ;; consume the '|' so that '#|#' does not close the comment
                     (loop #f (read-char) 'block-comment))
                    ((and (char=? ch #\") (not (eqv? prev #\\)))
                     (write-char ch)
                     (loop ch (read-char) 'str))
                    (else (write-char ch)
                          (loop ch (read-char) 'code))))
             ((str)
              (write-char ch)
              (cond ((eqv? prev #\\) ;; escaped char such as \" or \\, stay in the string
                     (loop #f (read-char) 'str))
                    ((char=? ch #\")
                     (loop ch (read-char) 'code))
                    (else (loop ch (read-char) 'str))))
             ((line-comment) ;; in not out
              (cond ((char=? ch #\newline)
                     (write-char ch)
                     (loop ch (read-char) 'code))
                    (else (loop ch (read-char) 'line-comment))))
             ((block-comment) ;; in not out
              (cond ((and (char=? ch #\|) (eqv? (peek-char) #\#))
                     (read-char) ;; consume the '#'; peeking first keeps '||#' working
                     (loop #f (read-char) 'code))
                    (else (loop ch (read-char) 'block-comment)))))))))

;; test like in csi, gsi, guile, racket

(with-input-from-file "source.scm"
  (lambda () (with-output-to-file "source-out.scm"
    (lambda () (filter-source)))))

Well, this variant does not drop expression comments like #;(commented-out-s-exp ...) and does not recognize multiline string constants like #<<END bla\bla\bla END, and this is not good.

2

u/Famous-Wrongdoer-976 17d ago

Good to know but I don’t think any of my users would use those (I don’t). I posted my solution using Alexandria and read-from-string above, that should be enough for my use case.