Channel: feed2exec:2a675780889a7fada8137b2cd2097a0994c1d443 commits

recover from feedparser exceptions (Closes: #964597)

In the aforementioned bug report, feed2exec crashes brutally (with a backtrace, and without completing the run) on the following feed:

http://www.agendadulibre.org/events.rss?region=12

The full backtrace is:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/feedparser.py", line 3774, in _gen_georss_coords
        t = [nxt(), nxt()][::swap and -1 or 1]
    StopIteration

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/bin/feed2exec", line 11, in <module>
        load_entry_point('feed2exec==0.15.0', 'console_scripts', 'feed2exec')()
      File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
        return self.main(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
        rv = self.invoke(ctx)
      File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
        return callback(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/click/decorators.py", line 27, in new_func
        return f(get_current_context().obj, *args, **kwargs)
      File "/usr/lib/python3/dist-packages/feed2exec/__main__.py", line 124, in fetch
        st.fetch(parallel, force=force, catchup=catchup)
      File "/usr/lib/python3/dist-packages/feed2exec/feeds.py", line 162, in fetch
        self.dispatch(feed, feed.parse(body), None, force)
      File "/usr/lib/python3/dist-packages/feed2exec/feeds.py", line 399, in parse
        data = feedparser.parse(body)
      File "/usr/lib/python3/dist-packages/feedparser.py", line 3965, in parse
        saxparser.parse(source)
      File "/usr/lib/python3.7/xml/sax/expatreader.py", line 111, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib/python3.7/xml/sax/xmlreader.py", line 125, in parse
        self.feed(buffer)
      File "/usr/lib/python3.7/xml/sax/expatreader.py", line 217, in feed
        self._parser.Parse(data, isFinal)
      File "../Modules/pyexpat.c", line 471, in EndElement
      File "/usr/lib/python3.7/xml/sax/expatreader.py", line 381, in end_element_ns
        self._cont_handler.endElementNS(pair, None)
      File "/usr/lib/python3/dist-packages/feedparser.py", line 2060, in endElementNS
        self.unknown_endtag(localname)
      File "/usr/lib/python3/dist-packages/feedparser.py", line 704, in unknown_endtag
        method()
      File "/usr/lib/python3/dist-packages/feedparser.py", line 1471, in _end_georss_point
        geometry = _parse_georss_point(self.pop('geometry'))
      File "/usr/lib/python3/dist-packages/feedparser.py", line 3783, in _parse_georss_point
        coords = list(_gen_georss_coords(value, swap, dims))
    RuntimeError: generator raised StopIteration

I can also reproduce this with plain feedparser:

    python3 -c 'import feedparser; feedparser.parse("http://www.agendadulibre.org/events.rss?region=12")'

So, in feed2exec's defense, this is purely feedparser's fault. Still, play the defensive programming game and do not let feedparser failing on a single feed crash the entire run. And even if it did, we should still show the user something nicer than a backtrace (although, to be fair, maybe we should show *developers* the full backtrace).

The feed is, at the time of writing, valid according to this:

http://www.feedvalidator.org/check.cgi?url=http%3A%2F%2Fwww.agendadulibre.org%2Fevents.rss%3Fregion%3D12

I am not adding it to the test suite because it is not clear that doing so is legally allowed. According to this, the agendadulibre.org source code is free software (AGPL):

https://www.agendadulibre.org/pages/infos#puis-je-utiliser-le-logiciel-de-lagenda-du-libre-pour-mon-agenda

... but this is more worrisome:

https://www.agendadulibre.org/pages/infos#a-nametraitementtraitement-des-donnes-personnellesa

In English, it says that people may request that their personal data be taken down, which is a fair policy for hosting an agenda, but could be painful if I store that data in a git repository. So trust me: it's broken, and this fixes it, kind of.
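The patch itself is not reproduced here. As a rough illustration of the idea only, here is a minimal Python sketch, assuming a hypothetical fetch_all() loop and a hypothetical fake_parse() stand-in for feedparser.parse() (neither is feed2exec's or feedparser's real code). Each feed is parsed inside its own try/except, so one failure becomes a short ERROR line and the full traceback only goes to DEBUG. The hypothetical _gen_coords() generator is modelled on the `t = [nxt(), nxt()]` line in the backtrace: on Python 3.7 and later (PEP 479), a StopIteration escaping a generator body is re-raised as RuntimeError("generator raised StopIteration"), which is the final error in the trace.

```python
import logging

logger = logging.getLogger("feed2exec-sketch")


def _gen_coords(value):
    """Hypothetical stand-in modelled on feedparser's _gen_georss_coords.

    Pulls coordinates in pairs with next() inside a `while True` loop. Once
    the values run out, next() raises StopIteration inside this generator;
    since Python 3.7 (PEP 479) that is re-raised as RuntimeError.
    """
    values = (float(v) for v in value.split())
    while True:
        lat, lon = next(values), next(values)
        yield (lon, lat)


def fake_parse(body):
    """Hypothetical stand-in for feedparser.parse(body)."""
    if "georss:point" in body:
        # list() drains the generator, which always ends in RuntimeError.
        return {"points": list(_gen_coords("45.75 4.85"))}
    return {"entries": []}


def fetch_all(feeds, parse):
    """Parse every feed; an exception from one feed is logged, not fatal."""
    results, failures = {}, {}
    for name, body in feeds.items():
        try:
            results[name] = parse(body)
        except Exception as exc:  # the parser may raise almost anything
            # One short line for users; full traceback only at DEBUG level.
            logger.error("%s: cannot parse feed: %s", name, exc)
            logger.debug("traceback for %s", name, exc_info=True)
            failures[name] = exc
    return results, failures


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    feeds = {
        "broken": "<rss><georss:point>45.75 4.85</georss:point></rss>",
        "fine": "<rss></rss>",
    }
    ok, failed = fetch_all(feeds, fake_parse)
    print(sorted(ok), sorted(failed))
```

Running it logs one ERROR line for the broken feed and then prints `['fine'] ['broken']`: the healthy feed is still processed. Catching Exception rather than BaseException keeps Ctrl-C (KeyboardInterrupt) working, and running with DEBUG logging enabled would give developers the full backtrace back.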
