5 - Foundational paper, CiteSpace: Text Mining and Visualization in Scientific Literature, Disc 3: Towards an explanatory and computational theory of scientific discovery
Journal of Informetrics 3 (2009) 191–209

Towards an explanatory and computational theory of scientific discovery

Chaomei Chen (a, b), Yue Chen (b), Mark Horowitz (a), Haiyan Hou (b), Zeyuan Liu (b), Donald Pellegrino (a)

a College of Information Science and Technology, Drexel University, USA
b The WISE Lab, Dalian University of Technology, China

Article history: Received 1 September 2008; received in revised form 12 February 2009; accepted 17 March 2009

Keywords: Theory of scientific discovery; Transformative scientific discoveries; Theory of structural holes; Intellectual brokerage; Knowledge diffusion; Information foraging

Abstract

We propose an explanatory and computational theory of transformative discoveries in science. The theory is derived from a recurring theme found in a diverse range of scientific change, scientific discovery, and knowledge diffusion theories in philosophy of science, sociology of science, social network analysis, and information science. The theory extends the concept of structural holes from social networks to a broader range of associative networks found in science studies, especially including networks that reflect underlying intellectual structures such as co-citation networks and collaboration networks. The central premise is that connecting otherwise disparate patches of knowledge is a valuable mechanism of creative thinking in general and transformative scientific discovery in particular. In addition, the premise consistently explains the value of connecting people from different disciplinary specialties. The theory not only explains the nature of transformative discoveries in terms of the brokerage mechanism but also characterizes the subsequent diffusion process as optimal information foraging in a problem space. Complementary to epidemiological models of diffusion, foraging-based conceptualizations offer a unified framework for arriving at insightful discoveries
and optimizing subsequent pathways of search in a problem space. Structural and temporal properties of potentially high-impact scientific discoveries are derived from the theory to characterize the emergence and evolution of intellectual networks of a field. Two Nobel Prize winning discoveries, the discovery of Helicobacter pylori and gene targeting techniques, and a discovery in string theory demonstrated such properties. Connections to and differences from existing approaches are discussed. The primary value of the theory is that it provides not only a computational model of intellectual growth, but also concrete and constructive explanations of where one may find insightful inspirations for transformative scientific discoveries.

© 2009 Elsevier Ltd. All rights reserved.

CiteSpace is freely available at http://cluster.cis.drexel.edu/~cchen/citespace/
Color versions of the figures in this article are available at http://cluster.cis.drexel.edu/~cchen/papers/JOI/
Corresponding author at: College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, USA. E-mail address: chaomei.chen@cis.drexel.edu (C. Chen).
1751-1577/$ – see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.joi.2009.03.004

1. Introduction

The intellectual structure of a scientific field is an abstraction of the collective knowledge of scientists in the field, including scholarly publications and other forms of intellectual assets. Scientific change refers to profound changes of the intellectual structure of a field. In this article, we will focus on the nature and key mechanisms of scientific discoveries that could lead to such fundamental changes: transformative scientific discoveries.

The nature of scientific change has been studied from many distinct perspectives, notably including philosophy of science (Collins, 1998; Laudan et al., 1986; Schaffner, 1992), sociology (Fuchs, 1993; Griffith [...]; Heinze [...]; Heinze, Shapira, Senker, [...]; Hummon [...]; Small [...]; Sullivan, Koester, White, [...]; Wagner-Dobler, 1999). Scientific literature has increasingly become one of the most essential sources for these studies. Social network analysis and complex network analysis also provide valuable perspectives (Barabási et al., 2002; Newman, 2001; Redner, 2004; Snijders, 2001; Valente, 1996; Wasserman [...] Simon, Langley, [...] Fuchs, 1993; Morris [...] Mullins, Hargens, Hecht, [...] Small [...] Bettencourt, Kaiser, Kaur, Castillo-Chavez, [...] Liben-Nowell [...] Nowakowska, 1973).

Epidemic models consider variables such as contact rates between scientists, latency and recovery times. The contact rate between scientists is found to be the single important factor to speed up the diffusion of knowledge. Other potentially applicable models of diffusion include ant colony and random walk models. In an ant colony model (Dorigo [...] Simon, 1981).

In particular, Perkins distinguished two types of problem spaces. In a Homing Space, there are many clues and signposts such that navigating in such spaces is relatively easy. In contrast, a
Klondike Space has very few such clues. The sparseness of clues is illustrated by Perkins (p. 498) in a widely known case of sudden insight: Charles Darwin's discovery of the principle of natural selection. According to Darwin's autobiography, in October 1838, he conceived the principle while he “happened to read for amusement Malthus on Population.” What is remarkable is that the next person arrived at the same principle 20 years later. What is even more remarkable is that the person, Alfred Russel Wallace, arrived at the idea while reading the same 1826 book by Malthus!

How could one increase the odds of stumbling on such clues? It becomes clear, from Sandstrom's notion of bibliographic microhabitats to Perkins' characterizations of Homing and Klondike spaces, that finding and recognizing clues is essential for both information foragers and problem solvers. Research in the data mining community on interestingness is particularly relevant (Hilderman [...] Liqiang [...] the previously stated hypotheses have not been falsified yet, and are less and less likely to be so determined; (2) the interesting ideas or work, which deny widely accepted assumptions, state new relationships between old ideas, or propose new mechanisms, but do not require the reader to adopt wholly new ways of thinking; and (3) paradigm shifts and transformative discoveries. Interesting ideas are enlightening and surprising in a non-threatening way; in fact, the surprise is generally a pleasant one, in contrast to the experience of living through a shift of paradigm, especially when one's accepted paradigm is being replaced by a more successful one.

2.3.2. Literature-based discovery

Swanson and his colleagues pioneered a literature-based discovery approach to identify potentially valuable hypotheses (Swanson, 1986a, 1986b, 1987; Swanson [...] Lindsay [...] namely, historical accounts of scientific discoveries, psychological experiments with nonscientists working on tasks related to scientific discoveries, direct
observation of ongoing scientific laboratories, and computational modeling of scientific discovery processes, by viewing them through the lens of the theory of human problem solving. The authors then considered these types of studies against a list of evaluative criteria, such as face validity, fine or coarse-grained, new phenomena, rigor and precision, social and motivational factors.

Many scholars have studied information and discovery pathways. Small presented a series of examples from the history of science in which discoveries can be modeled as navigation between pairs of established experimental or theoretical findings (Small, 2000). One of his examples was from atomic physics in the early 20th century. There was no direct connection between experimental evidence on the spectrum for atomic hydrogen and evidence for hydrogen's nuclear structure until Niels Bohr's 1913 model for the hydrogen atom using a quantum hypothesis. Similarly, the Müller-and-Bednorz discovery of superconductivity was also seen as creating a path between the field of superconductivity and a class of compounds previously not thought to be promising candidates for superconductivity (Holton, Chang, [...] Small, 2000).

We notice a recurring theme in the diverse conceptualizations of scientific change. That is, profound scientific change tends to be connected to a broad range of brokerage mechanisms. Burt's structural holes are found not only in social networks but also in associative networks of intellectual, semantic, and other types of interrelationships. Because information flow around a structural hole is limited by the topological structure, those who are in the brokerage positions inherit advantages from their positions in such networks. Furthermore, structural holes in intellectual and cognitive networks appear to be a vital source of inspiration and creativity. Creative scientists draw inspirations from other disciplines. Research has found that great philosophers tend to be the ones who stayed in touch
with competing schools of philosophy (Guiffre, 1999). Creative scientists are the ones who have the ability to communicate effectively with otherwise disconnected peers (Heinze [...] Barabási et al., 2002). The popularity of a node can be broadly defined by an attribute function of the node, such as prestige or age, or by other ranking mechanisms. Such processes often result in scale-free networks, which are characterized by power law distributions of node degrees. While earlier preferential attachment models assume that each newly arriving node is fully aware of the prestigious status of every existing node, more recent studies have relaxed the assumption to ranking functions defined on a subset of the existing nodes instead (Fortunato, Flammini, [...] Chen [...] Freeman, 1977; Kleinberg, 2002).

\[ \Sigma_1(v, G, T, \sigma_{\mathrm{burst}}) = \Bigl(\prod_{i=\mathrm{burst}} \sigma_i\Bigr) = \sigma_{\mathrm{burst}} \tag{2} \]

\[ \Sigma_1(v, G, T, \sigma_{\mathrm{centrality}}) = \prod_{i=\mathrm{centrality}} \sigma_i = \sigma_{\mathrm{centrality}} \tag{3} \]

\[ \Sigma_1(v, G, T, \sigma_{\mathrm{citation}}) = \Bigl(\prod_{i=\mathrm{citation}} \sigma_i\Bigr) = \sigma_{\mathrm{citation}} \tag{4} \]

\[ \Sigma_2(v, G, T, \sigma_{\mathrm{burst}}, \sigma_{\mathrm{centrality}}) = \Bigl(\prod_{i=\mathrm{burst},\,\mathrm{centrality}} \sigma_i\Bigr)^{1/2} = \sqrt{\sigma_{\mathrm{burst}}\,\sigma_{\mathrm{centrality}}} \tag{5} \]

\[ \Sigma_3(v, G, T, \sigma_{\mathrm{burst}}, \sigma_{\mathrm{centrality}}, \sigma_{\mathrm{citation}}) = \Bigl(\prod_{i=\mathrm{burst},\,\mathrm{centrality},\,\mathrm{citation}} \sigma_i\Bigr)^{1/3} = \sqrt[3]{\sigma_{\mathrm{burst}}\,\sigma_{\mathrm{centrality}}\,\sigma_{\mathrm{citation}}} \tag{6} \]

Note that $\Sigma_1(\sigma_{\mathrm{citation}})$, a special case of the generic definition, ranks the significance of a reference based on its citations, as seen in earlier efforts for predicting Nobel Prize winners based on citation counts (Garfield, 1992). We will also compare pair-wise Pearson correlation coefficients between the $\Sigma_1$, $\Sigma_2$ and $\Sigma_3$ indices of centrality, burst, and citation frequency in order to identify the simplest and most effective metrics among them.

In summary, our theory suggests that $\Sigma$ indices would be a good indicator of potential transformative discoveries. Furthermore, once a reference is identified with a high $\Sigma$
index, the theory provides an explanatory framework such that we can focus on the precise brokerage connections at work. The theory also suggests alternative ways to model the evolution of a network by taking brokerage connections into account.

According to our theory, a subset of Nobel Prize discoveries will be transformative discoveries. More transformative discoveries would be expected from the recipients of a variety of other awards in science. In addition, we expect that transformative discoveries can be identified by these $\Sigma$ metrics at an earlier stage than by single-dimensional ranking systems.

In terms of diffusion, we expect that transformative discoveries in general will lead to a more rapid and sustained diffusion process. If we see the diffusion process as an information foraging process by the scientific community as a whole, transformative discoveries, i.e., brokerage connections across structural holes, would have a higher perceived profitability, which would motivate and stimulate the diffusion process. It also follows that the domain-wide foraging process will spend more time with transformative discoveries than other patches of scientific knowledge.

4. Illustrative examples

We consider three examples as our initial verification of the theory. We choose two topic areas which have received Nobel Prize awards recently, namely, peptic ulcer and gene targeting, and string theory in physics as the third topic area.

4.1. Procedure

In each case study, CiteSpace (Chen, 2006) was used to construct a co-citation network of the references relevant to the chosen topic. We followed the general procedure described in (Chen, 2004, 2006). Bibliographic records were retrieved from the Web of Science with a topical search for articles only. Reviews, editorials, and other document types were excluded from the analysis. CiteSpace uses a time-slicing mechanism to generate a synthesized panoramic network
visualization based on a series of snapshots of the evolving network across consecutive time slices. Each node in the network represents a reference cited by records in the retrieved dataset. A line connecting two nodes represents one or more co-citation instances involving the two references. Colors of co-citation links correspond to the earliest year in which co-citation associations were first made. Each node is shown with a tree-ring of citation history in the same color scheme, representing the history of citations received by the underlying reference.

Structural-hole and burst properties are depicted in two distinct colors, purple and red, in visualizations. If a node is rendered with a purple ring, it has strong betweenness centrality. The purple color can only appear as the color of the outermost rim of a node. The thickness of the purple ring is proportional to the degree of the centrality: the thicker the ring, the stronger the betweenness centrality. In contrast, if a node has red rings, these red rings represent the presence and strength of its burst property. Red can appear as the color of any inner ring of the tree ring of a node. The presence of one or more red rings on a node indicates that a significant citation burst was detected. In other words, there was a period of time in which citations to the reference increased sharply with respect to other references in the pool, hence the name CiteSpace.

4.2. Case study I: peptic ulcer

The Nobel Prize in Physiology or Medicine for 2005 was awarded jointly to Barry J. Marshall and J. Robin Warren for their discovery of “the bacterium Helicobacter pylori and its role in gastritis and peptic ulcer disease.” We choose peptic ulcer as the topic area. According to Marshall's Nobel Prize lecture (Marshall, 2005), Marshall and Warren conducted a study in the 1980s and found 100% of 13 patients with duodenal ulcer were infected by Helicobacter pylori. They discovered that peptic ulcer was caused by a bacterial
infection, unlike the then predominant understanding that ulcers were caused by other reasons such as stress and acid in the stomach. The discovery established that very young children acquired the Helicobacter organism, a chronic infection which caused a lifelong susceptibility to peptic ulcers. Helicobacter was generally accepted after 1994 as the cause of most gastroduodenal diseases including peptic ulcer and gastric cancer.

We analyzed a co-citation network of peptic ulcer research to identify structural and temporal properties associated with the Helicobacter pylori discovery. Bibliographic records on peptic ulcer between 1980 and 2007 were retrieved from the Web of Science with a topic search for peptic ulcer. CiteSpace was used to construct a co-citation network of peptic ulcer research between 1980 and 2007.

Fig. 1 shows a series of 5-year snapshots of the co-citation network as it evolved over time. In each diagram, five colors correspond to the 5 years in the order of blue, cyan, green, yellow, and orange. Thus, an orange cluster would be formed in the 5th year of a given 5-year interval. For example, a node with an essentially green tree-ring indicates that the reference was mostly cited in the 3rd year of the time interval. The captions below network snapshots record the time interval, the number of nodes, the number of co-citation links, and three thresholds. For example, the caption “1981–1985, N=210, E=2038, 3/3/20” under the first snapshot of the network means that the network was formed between 1981 and 1985, consisting of 210 references and 2038 co-citation pairs. Each reference has received at least three citations in one of the 5 years during this period.

According to independent sources (Pincock, 2005), the first major publication of the Helicobacter pylori discovery was (Marshall [...] Chen [...] both have shown strong betweenness centrality and burstness. Maldacena-1998 is not only strong in both centrality and burstness, it is also the most cited reference in this
dataset. We contacted Juan Maldacena directly and asked him to identify the nature of his major contributions in this article to string theory. The transformative nature is evident in his reply: “It connected two different kinds of theories: (1) particle theories or gauge theories and (2) string theory. Many of the papers on string dualities (and this is one of them) connect different theories. This one connects string theory to more conventional particle theories.” Maldacena's contribution is highlighted

Fig. 5. A diffusion map of gene targeting research between 1985 and 2007. Selection criteria are at least 15 citations for citing articles and top 30 cited articles per time slice. Polygons represent clusters of co-cited papers. Each cluster is labeled by title phrases selected from papers citing the cluster. Red lines depict co-citations made in the current year. The concentrations of red lines track the context in which co-citation clusters are referenced.

Fig. 6. A co-citation network of references cited between 1990 and 2003 in string theory. Polchinski-1995 marked the
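
The composite indices in Eqs. (2)–(6) are each a geometric mean of one, two, or three component measures (burst, centrality, citation). The following Python sketch illustrates that arithmetic only. The function name `sigma_index`, the dictionary layout, and the numeric values are hypothetical; they are not taken from the article or from CiteSpace, and the sketch assumes the component values are non-negative.

```python
import math
from typing import Mapping, Sequence


def sigma_index(components: Mapping[str, float], names: Sequence[str]) -> float:
    """Geometric mean of the selected component measures.

    With one name this returns the component itself (as in Eqs. 2-4);
    with two or three names it returns the square or cube root of the
    product (as in Eqs. 5 and 6). Assumes non-negative component values.
    """
    if not names:
        raise ValueError("at least one component name is required")
    product = math.prod(components[name] for name in names)
    return product ** (1.0 / len(names))


# Hypothetical component values for a single cited reference
# (illustrative numbers only, not data from the article).
node = {"burst": 4.0, "centrality": 0.25, "citation": 8.0}

# One-component index: just the citation measure itself.
sigma1_citation = sigma_index(node, ["citation"])

# Two-component index: sqrt(4.0 * 0.25).
sigma2 = sigma_index(node, ["burst", "centrality"])

# Three-component index: cube root of (4.0 * 0.25 * 8.0).
sigma3 = sigma_index(node, ["burst", "centrality", "citation"])
```

Because the components are multiplied, a reference with a zero value on any selected component receives a zero composite index in this sketch, so a high two- or three-component value requires the reference to score on every selected dimension at once.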