Search in HTML and HTM files - creating a
local search engine
This article is about searching text in .HTM
and .HTML files. It answers three questions: what an HTM file is, why you
would use this file format, and what the best way to search it for text is.
What is an HTML file?
HyperText Markup Language (HTML) is the lingua
franca for publishing hypertext on the World Wide Web. It is a
non-proprietary format based upon SGML,
and can be created and processed by a wide range of tools, from simple
plain text editors, where you type it in from scratch, to sophisticated WYSIWYG
authoring tools.
HTML 4.0 was first released as a W3C Recommendation on 18
December 1997. This specification has since been
superseded by HTML 4.01. More information about HTML is available at www.w3.org.
Why do I need to search in HTML files?
Because opening each file one by one takes too much time. You may
have saved web pages or downloaded whole websites, and now you need a way
to search within the files you got.
Is there a native Windows tool to search in
HTML files?
Unfortunately, no. "Find file" in Windows cannot search
text within HTM files, because Windows has no means of
converting an .htm or .html file into plain text.
If you want full access to your whole knowledge base,
you should use a dedicated search utility with HTML search
support (such as FSA, described below).
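Converting HTML to plain text is the step that makes a text search work: searching the raw file would also match tag names and would miss text written with entities such as `&amp;`. The sketch below shows one way to do that conversion with Python's standard `html.parser` module. It is an illustration under our own assumptions (skip `<script>` and `<style>`, collapse whitespace), not FSA's actual filter; `TextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: "&amp;" arrives as "&"
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html_source):
    """Return the text of an HTML string with whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


page = "<html><head><style>p { color: red }</style></head><body><p>Tom &amp; Jerry</p></body></html>"
print(html_to_text(page))  # Tom & Jerry
```

Joining chunks with a space means text from adjacent elements such as `<p>a</p><p>b</p>` becomes "a b" rather than "ab", so words from neighbouring blocks are not glued together.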
What is the simplest solution for searching
HTML files? An HTML search engine.
We created File Search Assistant (FSA) to make your search fast and
effective.
Run File Search Assistant, enter your search criteria and
click the "Search" button.
1. Enter the *.htm mask in the "File Name" field. This tells FSA to
search only HTM files.
2. Use the "Search in" drop-down list to specify the location to search.
3. Enter the keywords you want to find in the "Keyword" field.
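The three steps above (file mask, location, keyword) map onto a simple directory walk. A minimal Python sketch, assuming UTF-8 files and a case-insensitive substring match; `find_files` is a hypothetical helper, and for brevity it searches the raw file content rather than converted text.

```python
import fnmatch
import os
import tempfile


def find_files(root, mask, keyword):
    """Yield paths under root whose file name matches mask and whose content contains keyword.

    The match is a case-insensitive substring test on the raw file content;
    a real HTML search would convert each file to plain text first.
    """
    needle = keyword.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, mask):  # step 1: file name mask
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                if needle in f.read().lower():  # step 3: keyword
                    yield path


# Demo (step 2: location): a throwaway folder with one matching and two non-matching files.
with tempfile.TemporaryDirectory() as tmp:
    samples = {"dog.htm": "<p>My dog</p>", "notes.txt": "dog", "cat.htm": "<p>My cat</p>"}
    for name, body in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(body)
    found = [os.path.basename(p) for p in find_files(tmp, "*.htm", "DOG")]

print(found)  # ['dog.htm']
```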
Useful notes:
- Use custom search options to create custom search
groups; they make your searches faster. Read
more in the online manual.
- Custom search options let you skip the step where
you specify the search mask and the path to search.
- You can use search expressions as a keyword, for
example +dog -cat +"my cat".
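The example query combines required and excluded terms. The parser below is a sketch that assumes `+term` means the term must appear, `-term` means it must not, and double quotes group a phrase; this reading is inferred from the example, not taken from FSA's documentation, and `parse_query` and `matches` are hypothetical names.

```python
import shlex


def parse_query(query):
    """Split a query such as '+dog -cat +"my cat"' into (required, excluded) term lists.

    Assumed semantics: '+term' must appear, '-term' must not appear, a bare
    term is treated as required, and double quotes group a phrase.
    """
    required, excluded = [], []
    for token in shlex.split(query):  # shlex keeps quoted phrases together
        if token.startswith("-"):
            excluded.append(token[1:].lower())
        elif token.startswith("+"):
            required.append(token[1:].lower())
        else:
            required.append(token.lower())
    return required, excluded


def matches(query, text):
    """Case-insensitive substring check of text against the parsed query."""
    required, excluded = parse_query(query)
    haystack = text.lower()
    return all(t in haystack for t in required) and not any(t in haystack for t in excluded)


print(parse_query('+dog -cat +"my cat"'))        # (['dog', 'my cat'], ['cat'])
print(matches("+dog -cat", "My dog sleeps"))      # True
print(matches("+dog -cat", "My dog and my cat"))  # False
```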
OK. Now tell me how FSA is more
effective than other HTML search tools.
Three key features make FSA extremely useful:
Custom Search Options - you can
point FSA to the exact folders where your files are stored and where you
want to search. Once you have created a search group, you can reuse it for
every later search.
Preview Pane - the preview pane shows you
the small piece of the file where the keyword (or keywords, if your query
contains several terms) was found. Specify your search criteria and
click the "Search" button to see how it works.
FSA highlights the found keyword(s) so you can decide whether you
need this particular file. If not, click the "Next File"
button. If you are not sure, click the "Next Fragment"
button to show the next text fragment containing the keyword(s).
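The preview pane behaviour can be sketched as a generator that yields one fragment per keyword hit, with the hit marked. Assumptions for this sketch: a case-insensitive literal match, a fixed number of context characters on each side, and `[[...]]` standing in for on-screen highlighting; `fragments` is a hypothetical name.

```python
import re


def fragments(text, keyword, width=30):
    """Yield one fragment per case-insensitive hit, with `width` characters of context
    on each side and the hit wrapped in [[...]] to stand in for highlighting."""
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        yield text[start:m.start()] + "[[" + m.group(0) + "]]" + text[m.end():end]


text = "The dog barked. Later the Dog slept by the fire."
preview = fragments(text, "dog", width=8)
print(next(preview))  # The [[dog]] barked.
print(next(preview))  # ter the [[Dog]] slept b
```

Each `next()` call on the generator plays the role of "Next Fragment"; switching to a new generator for another file plays the role of "Next File".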
HTML searches - generating a search report
At any time you can generate a search report from the search results and
use it later.
FSA puts the following in a search report:
- text from the found files with highlighted keywords;
- information about the found files: size, path and relevance to the
search phrase;
- a link to each found file so you can open it right from the search report.
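A report with these three parts can be sketched as a small HTML table generator. Assumptions for this sketch: relevance is simply the number of case-insensitive keyword hits, the excerpt is the first 200 characters of the raw file, and `build_report` and `highlight` are hypothetical names; this is not FSA's report format.

```python
import html
import os
import re
import tempfile
from pathlib import Path


def highlight(text, pattern):
    """HTML-escape text and wrap every match of pattern in <mark>...</mark>."""
    out, last = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[last:m.start()]))
        out.append("<mark>" + html.escape(m.group(0)) + "</mark>")
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)


def build_report(paths, keyword, excerpt_chars=200):
    """Return an HTML table with a link, size, relevance and highlighted excerpt per file.

    Relevance is the number of case-insensitive keyword hits; rows are sorted by it.
    A hit that straddles the excerpt cut-off is not highlighted.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    rows = []
    for path in paths:
        p = Path(path).resolve()  # as_uri() needs an absolute path
        text = p.read_text(encoding="utf-8", errors="replace")
        hits = len(pattern.findall(text))
        rows.append((hits, p, p.stat().st_size, highlight(text[:excerpt_chars], pattern)))
    rows.sort(key=lambda row: row[0], reverse=True)
    body = "".join(
        f'<tr><td><a href="{html.escape(p.as_uri())}">{html.escape(str(p))}</a></td>'
        f"<td>{size}</td><td>{hits}</td><td>{excerpt}</td></tr>"
        for hits, p, size, excerpt in rows
    )
    return "<table><tr><th>File</th><th>Size (bytes)</th><th>Hits</th><th>Excerpt</th></tr>" + body + "</table>"


with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "pets.htm")
    with open(page, "w", encoding="utf-8") as f:
        f.write("<p>My dog and your dog</p>")
    report = build_report([page], "dog")

print(report.count("<mark>dog</mark>"))  # 2
```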
Tell me more about searching in HTML files
Search Speed
It depends on the computer you are using. You can download FSA
and try it for free.
Will it find any text in an HTML file?
Yes. FSA also searches the alt text of images.
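Alt text lives in an attribute rather than in the page's visible text, so a converter has to collect it separately. A minimal sketch with Python's standard `html.parser`; `AltTextCollector` is a hypothetical name, and the sketch collects `alt` attributes from any tag, not only `<img>`.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects non-empty alt attribute values, e.g. from <img> tags."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are already lower-cased.
        for name, value in attrs:
            if name == "alt" and value:
                self.alts.append(value)


collector = AltTextCollector()
collector.feed('<p>Our pets</p><img src="rex.jpg" alt="Rex the dog"><img src="x.gif">')
collector.close()
print(collector.alts)  # ['Rex the dog']
```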
Can I use FSA to search an HTM file as a binary file?
Yes. You can turn off the "HTML to txt" filter at any time; FSA will then
read HTML files as binary files.
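The difference between the two modes is easy to see on a tiny example. This sketch compares a raw byte search (markup included, as in binary mode) with a crude text mode that replaces tags with spaces using a regular expression and then decodes entities; the regex-based stripping is a simplification for illustration, and neither function reflects FSA's internals.

```python
import html
import re


def binary_search(data, keyword, encoding="utf-8"):
    """Search the raw bytes, markup included (binary mode)."""
    return keyword.encode(encoding) in data


def text_search(data, keyword, encoding="utf-8"):
    """Crude text mode: replace tags with spaces, decode entities, then search."""
    text = re.sub(r"<[^>]*>", " ", data.decode(encoding, errors="replace"))
    return keyword in html.unescape(text)


raw = b'<a href="menu.htm">Fish &amp; Chips</a>'
print(binary_search(raw, "href"), text_search(raw, "href"))                  # True False
print(binary_search(raw, "Fish & Chips"), text_search(raw, "Fish & Chips"))  # False True
```

Binary mode is the one to use when you are looking for markup itself, such as attribute names or link targets; text mode is the one to use for what a reader sees on the page.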
If my HTML file is in a ZIP archive, can FSA find text in it?
Yes, just check the "search in zip files" checkbox.
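Searching inside a ZIP archive means reading each HTML member and searching its content. A sketch using Python's standard `zipfile` module, assuming UTF-8 members and a case-insensitive substring match; `search_zip` and its `*.htm*` default mask are hypothetical choices.

```python
import fnmatch
import io
import zipfile


def search_zip(zip_source, keyword, mask="*.htm*"):
    """Return names of archive members matching mask whose content contains keyword."""
    hits = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Skip folders and members whose (lower-cased) name does not match the mask.
            if info.is_dir() or not fnmatch.fnmatch(info.filename.lower(), mask):
                continue
            text = zf.read(info).decode("utf-8", errors="replace")
            if keyword.lower() in text.lower():
                hits.append(info.filename)
    return hits


# Demo: build a small archive in memory instead of on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("site/index.html", "<p>Welcome to the dog show</p>")
    zf.writestr("site/style.css", "body { color: black }")
    zf.writestr("site/about.htm", "<p>About cats</p>")
buffer.seek(0)
print(search_zip(buffer, "dog"))  # ['site/index.html']
```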
Can it find HTML files on a network disk?
Yes. FSA can search files on both local and network disks.
Get File Search Assistant
You can get File Search Assistant right now. It is a file search
tool that lets you search popular file types on your local hard disk
and across a network.