Links

Content Skeleton

This Page

Previous topic

SCRAPE BUILD.XML

Reviewing scrape layout in XMLDB and file systemΒΆ

  1. The SVG dynamic URLs from the prior scrape are listed in $HFAG_SCRAPE_FOLDER/log.xml

Make svg ‘’‘vi’‘’sible with ‘’‘xf’‘’ ie xmllint --format:

xf ./hfagc/indv/BR_-521_-321+90xoBR_-521_-321+333+443/BR_-521_-321+90xoBR_-521_-321+333+443.svg > ~/heprez/svg/BR_-521_-321+90xoBR_-521_-321+333+443.svg

Somehow the rez-classify.xml seem out of place within scrape watershed /db/cache/hfagc/20120417 they would seem better suited together with averages undef /db/hfagc_prod/end_of_2011/

-rwur-ur-- admin dba rez-classify-key.xml Thu Apr 19 17:33:55 CST 2012
-rwur-ur-- admin dba rez-classify.xml Thu Apr 19 17:34:03 CST 2012

Notice that scrape rerunnig tends not to update tex precursors and other tex, so changes that impact these tend to require forcing to get them to update:

-rw-r--r--  1 blyth  admin  62 19 Apr 20:01 hfagc/00101/03101/03101-table_body.tex
-rw-r--r--  1 blyth  admin  62 19 Apr 20:01 hfagc/00101/04101/04101-table_body.tex
-rw-r--r--  1 blyth  admin  2779 19 Apr 20:01 hfagc/99999/99999/99999-table_body.tex

The indv.html and comb.html are valid docbook xml “chapter” fragments with instrumentation such as the dyn-links:

simon:20120417 blyth$ find db -type f -exec ls -l {} \;
-rw-r--r--  1 blyth  admin  39644 20 Apr 13:18 db/cache/hfagc/20120417/00101/comb.html
-rw-r--r--  1 blyth  admin  14216 20 Apr 13:24 db/cache/hfagc/20120417/BR_-521_-311+-82xBR_-82_-211+111+443/indv.html
-rw-r--r--  1 blyth  admin  17612 20 Apr 13:26 db/cache/hfagc/20120417/BR_-521_-311+-87xBR_-87_-211+100443/indv.html
-rw-r--r--  1 blyth  admin  17539 20 Apr 13:23 db/cache/hfagc/20120417/BR_-521_-311+-87xBR_-87_-211+443/indv.html
...

The named *.html are finished static pages, which are not valid xml:

simon:20120417 blyth$ find hfagc -type f -name '*.html' -exec ls -l {} \;
-rw-r--r--  1 blyth  admin  23926 20 Apr 13:18 hfagc/00101/00101.html
-rw-r--r--  1 blyth  admin  16776 20 Apr 13:24 hfagc/indv/BR_-521_-311+-82xBR_-82_-211+111+443/BR_-521_-311+-82xBR_-82_-211+111+443.html
-rw-r--r--  1 blyth  admin  17179 20 Apr 13:26 hfagc/indv/BR_-521_-311+-87xBR_-87_-211+100443/BR_-521_-311+-87xBR_-87_-211+100443.html
-rw-r--r--  1 blyth  admin  17158 20 Apr 13:23 hfagc/indv/BR_-521_-311+-87xBR_-87_-211+443/BR_-521_-311+-87xBR_-87_-211+443.html
-rw-r--r--  1 blyth  admin  16212 20 Apr 13:19 hfagc/indv/BR_-521_-321+81/BR_-521_-321+81.html
-rw-r--r--  1 blyth  admin  20981 20 Apr 13:22 hfagc/indv/BR_-521_-321+81xBR_81_-211+211+443/BR_-521_-321+81xBR_81_-211+211+443.html
...