Patent visualisation

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Patent visualisation is an application of information visualisation. The number of patents has been increasing steadily,[1] thus forcing companies to consider intellectual property as a part of their strategy.[2] So patent visualisation like patent mapping is used to quickly view patents portfolio.

Patent visualisation dedicated software began to appear in 2000 like Aureka from Aurigin now owned by Thomson Reuters.[3] Many patent and portfolio analytics platforms, such as Relecura, Innography and Patent iNSIGHT Pro,[4] offer options to visualise specific data within patent documents by creating Topic Maps,[5] Priority Maps, IP Landscape reports[6] etc. Taking advantage of the innate visual language, software have been developed to convert patents in clear infographics or maps, to allow the analyst to "get insight into the data" and draw conclusions.[7] Also referred as patinformatics,[8] it is the "science of analysing patent information to discover relationships and trends that would be difficult to see when working with patent documents on a one-and-one basis".

Patents contain two types of information: Structured data like publication number which are processed by data-mining and unstructured text like title, abstract and claims which are used with text mining.[9]

Data mining[edit]

The main step in processing structured information lies on data-mining. Data mining[10] emerged as a science in the late 1980s. Used in computer science and genetic algorithms, data mining is the union of statistics, artificial intelligence and machine learning to assist in the analysis of patents.[11] Patent data mining extracts information from the structured data of the patent document.[12] These structured data are bibliographic fields like location or date or status :

Structured fields[edit]

Structured data Description Business Intelligence use
Datas Patent contain different identifying data such as priority, publication data and the issue date
  • Priority data regroup priority number assigned for the first application, the corresponding date and priority country.
  • The publication data encompass the publication number given when the patent is published, 18 months after filling and the publication date.
  • The issue date is the data the patent is granted, usually 3.5 years after filling depending on the patent office.
Priority application data often propose information such as priority country. Crossing dates and locations fields offer a global vision of a technology in time and space.
Assignee Patent assignees are organizations or individuals, owners of the patent's invention. The field can offer a ranking of the principal actors of the environment thus allowing seeing potential competitors or partners.
Inventor Inventors developed the invention Inventors field combined to the assignee field can create a social network and follow field experts.
Classification The classification will regroup inventions with similar technologies. The most commonly used is the International Patent Classification IPC. However patent organizations have their own classification for instance the European Patent Office with ECLA. Grouping patents by thematics offers an overview of the corpus and the potential applications of studied technology.
Status The legal status is an easy way to access report that lets you view the legal status for all members of a patent family in a single view. Patent family and legal status searching is very important for litigation and even competitive intelligence.

Advantages[edit]

Data mining offers a statistical analysis tool to study filing patterns of competitors, locate the main patent filers within a specific area of technology. This type of approach can be very helpful to monitor competitors environment, moves and innovation trends and gives a macro-view of a technology status in order to evaluates its maturity and complexity.

Text-Mining[edit]

Principle[edit]

Text-mining is used to search through unstructured full-text patents.[13] This technique is very well known due to the Internet development, its success in bioinformatics and now in the intellectual property environment.[14]

Text mining is based on a statistical approach of words recurrence or occurrence in the patents corpus.[15] An algorithm decomposes the corpus into a text sea, extracts words and expressions from title, summary and claims and gather them by declension. The conjunctions such as "and" or "if" are labeled as non-information bearing words and are stored in the stopword list. These stoplists can be personalized, in order to create an accurate analysis. Next, the algorithm will rank the words by weight, according to their frequency in the patent's corpus and the documents frequency containing this word. It literally fishes the whole Text Sea for words or expressions and counts their occurrence. The score for each word is calculated using this formula:[16][17]

Weight=\frac{Term\ Frequency}{Document\ Frequency}=\frac{Frequency\ of\ the word\ or\ expression\ in\ the\ Text\ Sea}{Number\ of\ documents\ containing\ the\ expression\ or\ word}

According to this, a frequently used word in several documents will have less weight or score than a frequently used word in a few patents. Words under a minimum weight are eliminated, only to have left a list of n pertinent words or descriptors. Then each patent is associated to the descriptors found in the selected document. Further, in the process of clusterization, these descriptors are used as subsets, in which the patent are regrouped or they can be used as tags to place the patents in predetermined categories for example keywords from International Patent Classifications.

There are four different full-text parts that can be processed with text-mining :

  • Title
  • Abstract
  • Claim
  • Patent Full-Text

Software offer different combinations but using title, abstract and claim is generally the most used, having a good balance between interferences and relevancy.

Advantages[edit]

Text-mining approache has numerous advantages. First, it is useful to narrow down a search or quickly evaluate a patent corpus. For instance, if a query has taken irrelevant documents, a multi level clustering hierarchy will identify them in order to delete them and refine the search. Moreover, this approach offers the possibility to create internal taxonomies specific to a corpus, thus preparing possible mapping.

Visualisations[edit]

Further information: Patent map

This art of allying patent analysis and informatic tools offers an overview of the environment through value-added visualisations. As patent contain two types of information, structured and unstructured one, visualisations can be distinguished in two categories. Structured data can be rendered with data mining in macrothematic maps and statistical analysis. Whereas unstructured information extracted by text-mining are represented in a more intuitive way like clouds, cluster maps, 2D keyword map.

Data mining visualisation[edit]

Visualisation Picture Description Business Intelligence use
Matrix chart Picture Graphic organizer used to summarize a multidimensional data set in a grid Data comparison
Location map Picture Map with overlaid data values on geographic regions
  • Spatial patterns
  • Find innovative countries
Bar chart Picture Graph with rectangular bars proportional to the values that they represent, useful for numerical comparisons. Data evolution
Line graph Picture Graph used to summarize how two parameters are related and how they vary. Data evolution and relationships
Pie chart Picture Circular chart divided into sections, to illustrate proportions. Data comparison
Bubble chart Picture 3-axis 2D chart which enables visualization similar to the Magic quadrant chart.
  • Market maturity
  • Competitive analysis
  • Licensing opportunities

Text mining visualisation[edit]

Visualisation Description Business Intelligence use
Tree list Hierarchy list
  • Evaluating the data relevancy
  • Creating taxonomy
  • Relationship between concepts
Tag cloud Full text of concepts. The size of each word is determined by its frequency in the corpus
  • Evaluating the data relevancy
  • More visual than the tree list
2D keyword map[18] Tomographic map with quantitative representation of relief, usually using contour lines and colors. Distance on the map will be proportional to the difference between patent themes[19]
  • Landscape vision of thematics
  • Similarity vision with SOM
  • Monitoring competitors
2D hierarchical cluster map with quantitative and qualitative representation of document set association to topic, usually using quantized cells and colors. Size of topic cells may represent patent count per topic relative to overall document set. Density and distribution inside of a topic cell may be proportional to document count relative to association to the topic and strength of association, respectively.
  • Landscape vision of thematics
  • Monitoring competitors or a technology space
  • Identifying trends in an defined patent set
Text is decomposed into logical groupings and sub-groupings, then represented as a navigable hierarchy of those groupings by means of proportionate circle arcs.
  • Landscape vision of thematics
  • Monitoring a technology space
  • Interactive navigation and granularity

Visualisation for both data-mining and text-mining[edit]

Mapping visualisations can be used for both text-mining and data-mining results.

Visualisation Picture Description Business Intelligence use
Tree Map Picture Visualization of hierarchical structures. Each data item, or row in the data set, is represented by a rectangle, whose area is proportional to selected parameters.
  • Landscape vision of hierarchical thematics
  • Position of competitors or technology by thematics
Network map Picture In a network diagram, entities are connected to each other in the form of a node and link diagram.
  • Relationship visions
  • Monitoring similar competitors or technologies
Picture In the citation map, the date of citation is visualized on the x axis and each individual citation takes an entry on the y axis. A strong vertical line indicated the filing date of the patent, allowing one to see which citations are cited by the patent as opposed to those which cite the patent.
  • Qualitative and quantitative view of citation history and density

Uses[edit]

What can patent visualisation highlights:[20][21]

  • Competitors
  • Partners
  • New innovations
  • Technologic environment description[22]
  • Networks

Field application:[23][24]

See also[edit]

References[edit]

  1. ^ http://www.wipo.int/export/sites/www/ipstats/en/statistics/patents/pdf/941e_2010.pdf
  2. ^ Kevin G. Rivette, David Kline “Discovering new value in intellectual property” Harvard Business Review ( January–February 2000)
  3. ^ http://thomsonreuters.com/products_services/intellectual_property/ip_products/a-z/aureka/
  4. ^ http://www.intellogist.com/wiki/Patent_iNSIGHT_Pro
  5. ^ http://www.relecura.com/reports/Relecura_Whitepaper_-_Topic_Maps.pdf
  6. ^ http://www.patentinsightpro.com/techreports/1113/Tech%20Insight%20Report%20-%20Graphene.pdf
  7. ^ Daniel A Keim et IEEE Computer Society, “Information visualization and visual data mining,” IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 8 (2002): 1--8.
  8. ^ Anthony J. Trippe, “Patinformatics: Tasks to tools,” World Patent Information 25, n°. 3 (September 2003): 211-221.
  9. ^ Laura Ruotsalainen, “Data mining tools for technology and competitive intelligence” VTT Research Notes 2451(October 2008)
  10. ^ http://www.data-mining-software.com/data_mining_history.htm
  11. ^ http://www.exforsys.com/tutorials/data-mining/how-data-mining-is-evolving.html
  12. ^ Sungjoo Lee, Byungun Yoon, et Yongtae Park, “An approach to discovering new technology opportunities: Keyword-based patent map approach,” Technovation 29, n°. 6 (Juin): 481-497.
  13. ^ http://comminfo.rutgers.edu/~msharp/text_mining.htm
  14. ^ Sholom Weiss and al, Text Mining : Predictive Methods for Analyzing Unstructured Information, 1er ed. (Springer 2004).
  15. ^ Antoine Blanchard “La cartographie des brevets” La Recherche n°.398 (2006) : 82-83
  16. ^ Gerard Salton et Christopher Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management 24, n°. 5 (1988): 513-523.
  17. ^ Y Kim, J Suh, et S Park, “Visualization of patent analysis for emerging technology,” Expert Systems with Applications 34, no. 3 (4, 2008): 1804-1812.
  18. ^ http://www.infovis.net/printMag.php?num=160&lang=2
  19. ^ Sungjoo Lee, Byungun Yoon, et Yongtae Park, “An approach to discovering new technology opportunities: Keyword-based patent map approach,” Technovation 29, n°. 6 (Juin): 481-497.
  20. ^ Miyake, M., Mune, Y. and Himeno, K. “Strategic Intellectual Property Portfolio Management: Technology Appraisal by Using the “Technology Heat Map”, Nomura Research Institute (NRI) Papers, n°. 83, (December 2004).
  21. ^ Charles Boulakia “Patent mapping,” http://sciencecareers.sciencemag.org/career_development/previous_issues/articles/1190/patent_mapping
  22. ^ Richard Seymour, “Platinum Group Metals Patent Analysis and Mapping,” Platinum Metals Review 52, n°. 4 (10, 2008): 231-240.
  23. ^ Susan E Cullen, “Introduction, From acorns to oak trees : how patent audits help innovations reach their full potential” IP Value 2010 - An International Guide for the Boardroom : 26--30
  24. ^ Charles Boulakia “Patent mapping,” http://sciencecareers.sciencemag.org/career_development/previous_issues/articles/1190/patent_mapping