From b4339df4caa93f44abe0cd8d9d4b8c5888662421 Mon Sep 17 00:00:00 2001
From: Gareth Rees
Date: Fri, 25 Apr 2014 16:59:34 +0100
Subject: Work around bug#77932 in pdftohtml

Sometimes pdftohtml will generate thousands of images when converting an
image embedded in a PDF. This causes a request spike when a user tries to
view the converted PDF as HTML.

See https://bugs.freedesktop.org/show_bug.cgi?id=77932 for the bug report.
---
 spec/lib/attachment_to_html/adapters/pdf_spec.rb | 37 ++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

(limited to 'spec/lib/attachment_to_html/adapters')

diff --git a/spec/lib/attachment_to_html/adapters/pdf_spec.rb b/spec/lib/attachment_to_html/adapters/pdf_spec.rb
index c02b157e4..da79b2de0 100644
--- a/spec/lib/attachment_to_html/adapters/pdf_spec.rb
+++ b/spec/lib/attachment_to_html/adapters/pdf_spec.rb
@@ -58,6 +58,43 @@ describe AttachmentToHTML::Adapters::PDF do
             adapter.success?.should be_false
         end
 
+        it 'is not successful if the body contains more than 50 images' do
+            # Sometimes pdftohtml extracts images incorrectly, resulting
+            # in thousands of PNGs being created for one image. This creates
+            # a huge request spike when the converted attachment is requested.
+            #
+            # See bug report https://bugs.freedesktop.org/show_bug.cgi?id=77932
+
+            # Construct mocked HTML output with 51 images
+            invalid = <<-DOC
+
+
+
+            Microsoft Word - FOI 12-01605 Resp 1.doc
+
+
+
+
+
+
+
+            DOC
+
+            (3..51).each { |i| invalid += %Q() }
+
+            invalid += <<-DOC
+
+            Some Content
+
+
+
+            DOC
+            AlaveteliExternalCommand.stub(:run).and_return(invalid)
+
+            adapter.success?.should be_false
+        end
+
     end
 
 end
-- 
cgit v1.2.3
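This view of the commit is limited to the spec directory, so the adapter change that makes the new spec pass is not shown. The following is a minimal Ruby sketch of the kind of guard the spec implies. It assumes that pdftohtml's output is rejected by counting `<img>` tags against a threshold of 50. The names `MAX_IMAGES`, `image_count`, `too_many_images?` and `mocked_html` are hypothetical and do not come from Alaveteli.

```ruby
# Hypothetical sketch, not Alaveteli's actual code: the patch only adds a
# spec, so the constant and method names here are illustrative assumptions.
MAX_IMAGES = 50

# Count <img> tags in pdftohtml's HTML output. A regex is enough for a
# sketch; real code might prefer a proper HTML parser.
def image_count(html)
  html.scan(/<img\b/i).length
end

# Treat the conversion as failed when it produced more than MAX_IMAGES
# images, the symptom described in pdftohtml bug #77932.
def too_many_images?(html)
  image_count(html) > MAX_IMAGES
end

# Build mocked output with `n` images followed by some text, similar in
# spirit to the spec, which assembles 51 images from heredocs and a loop.
def mocked_html(n)
  imgs = (1..n).map { |i| %Q(<img src="page#{i}.png"/>) }.join("\n")
  "<html><body>\n#{imgs}\nSome Content\n</body></html>"
end

puts too_many_images?(mocked_html(51)) # prints true
puts too_many_images?(mocked_html(50)) # prints false
```

A fixed threshold is a blunt heuristic. It cannot distinguish a genuinely image-heavy PDF from the bug's runaway output. Its advantage is that the check is cheap and fails before thousands of PNGs are served to a browser.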