I was chatting to me mate Andrew Myers on Twitter this morning, and he brought this Twitter exchange about - basically - the death of CFQuickDocs at Adobe's hands to my attention:
Is CFQuickDocs not working anymore :( not getting results for CF9 #coldfusion
— Mark Mandel (@Neurotic) September 13, 2013
@Neurotic no and I told him ages ago, I think adobe broke it by moving their docs. Try http://t.co/FM8RmB66F0
— Russ Michaels (@RussMichaels) September 13, 2013
@RussMichaels errr... We broke it? If it was screen scraping you can't really blame us, right?
— Raymond Camden (@cfjedimaster) September 13, 2013
@cfjedimaster no but as they all broke at the same time it seems like a fair assumption
— Russ Michaels (@RussMichaels) September 14, 2013
@RussMichaels just saying - I wouldn't say it is our "fault" per as ;)
— Raymond Camden (@cfjedimaster) September 14, 2013
@cfjedimaster ok "adobe moving their focs is the cause", is that better ?
— Russ Michaels (@RussMichaels) September 14, 2013
@RussMichaels heh sure. Never a good idea to rely on scraping IMO
— Raymond Camden (@cfjedimaster) September 14, 2013
@cfjedimaster @RussMichaels CF doco is Creative Commons licensed yeah? Is there any way to get an export of the data other than scraping?
— am2605 (@am2605) September 14, 2013
@am2605 @RussMichaels no idea
— Raymond Camden (@cfjedimaster) September 14, 2013
@cfjedimaster @RussMichaels um pretty sure the answer is no. Which is why people resort to screen scraping
— am2605 (@am2605) September 14, 2013
@cfjedimaster @RussMichaels I would say we can blame @Adobe, yes. Unless There's like an API to request the docs that we're unaware of?
— Adam Cameron (@dacCfml) September 14, 2013
This irks me a bit... Andrew's right (I think) in that there's no way of getting at this data other than scraping the pages: there seems to be no API to get it. So Ray's response (although he's speaking frankly, which I appreciate, and not giving an official Adobe position) is a bit unhelpfully dismissive here. This is unusual for him!
So... anyway... screw Adobe. I've decided I'm gonna scrape all the doc data I can, convert it to JSON, and stick it somewhere for people to access. As a first step I'm just gonna stick the JSON docs up on github somewhere (for everyone to use, as is the intent of Creative Commons), and from there maybe build a site that exposes a REST API to fetch individual documents or something. I have no actual plan of attack yet, I hasten to add.
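To give a rough idea of what the "scrape a page, turn it into JSON" step might look like, here's a minimal Python sketch. It's an illustration under assumptions, not a plan: the names (`DocPageParser`, `page_to_json`) are made up, the choice of which tags to keep is a guess at what a doc page contains, and it works on an HTML string already in hand rather than fetching anything from Adobe's site.

```python
import json
from html.parser import HTMLParser


class DocPageParser(HTMLParser):
    """Hypothetical helper: collects the <title> text plus the text of
    each heading (h1-h3) and paragraph on a page."""

    # Assumption: these tags hold the content worth keeping.
    CAPTURED = {"title", "h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []      # list of {"tag": ..., "text": ...}
        self._current = None  # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Only start capturing if we are not already inside a captured tag.
        if tag in self.CAPTURED and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.blocks.append({"tag": tag, "text": text})
            self._current = None


def page_to_json(html, source_url):
    """Convert one HTML page (as a string) into a JSON document."""
    parser = DocPageParser()
    parser.feed(html)
    parser.close()
    record = {"source": source_url, "title": parser.title, "content": parser.blocks}
    return json.dumps(record, indent=2)
```

Calling `page_to_json(html_string, url)` on each saved page would give one JSON document per page, which could then be committed to a repo as-is. Real doc pages will doubtless need more careful handling (code samples, tables, navigation cruft) than this does.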
One interesting thing I notice. The CF9 docs are explicitly covered by Creative Commons:
However the ColdFusion 10 docs are not. They are, instead, explicitly copyrighted. So I shan't be scraping those ones until I get clarification (which I will actively seek, and report back). This does seem rather contrary to the spirit of wikifying the docs... so they are soliciting public input, but basically keep all that work for themselves. Not very community-spirited. Just soliciting slave-labour, really.
The CF8 docs are explicitly copyrighted, in the Legal Notices page:
If this guide is distributed with software that includes an end user agreement, this guide, as well as the software described in it, is furnished under license and may be used or copied only in accordance with the terms of such license. Except as permitted by any such license, no part of this guide may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, recording, or otherwise, without the prior written permission of Adobe Systems Incorporated.
I shall seek written permission to archive it for public use.
The ColdFusion MX 7 docs are copyrighted to Macromedia (and by implication, Adobe):
ColdFusion Documentation
Copyright © 1997-2004 Macromedia, Inc.
All rights reserved.
(I'll stop checking now).
OK. I'm off to work out how best to scrape all this CF9 documentation.
Comments / suggestions, as always, welcomed.
--
Adam