AI613 2/2553 Class Blog

Thursday, 24 February 2011

Final Exam Guidelines for AI613

The exam has 24 questions in total: 20 short-answer questions worth 1 point each, plus 4 long questions worth 10 points.

Students should review the reading journals they prepared; some questions also come from points I worked in while teaching in class.

There is one case question from the chapters I taught.

Don't forget to read about the IT Hype Cycle as well.

Good luck.

Aj. Peter

Monday, 14 February 2011

Marketing Job opening @ Aquacorp

The owner of Aquacorp is looking for a few marketing students who might be interested in working for him. The role would be along the lines of a social media consultant. If any of you are interested, you can email me at peterractham.tuATgmail.com

http://www.aquacorp.co.th/

Quantum Computer

Origins of the Computer

In the 1830s, Charles Babbage worked on the design of the Analytical Engine, a machine intended to support every kind of calculation. Although it was never completed, his theory and design were later recognized as the prototype of the computer.

In 1936, Alan Turing developed Babbage's ideas into the foundational theory of data processing used in computers. The electronic computers built in the years that followed represented values with vacuum tubes, and binary digits, 0 and 1, became the basis of their operation.

In the 1960s, vacuum tubes were replaced by transistors made from semiconductors, which switch electrical signals on and off in place of the vacuum tubes. Transistors have the advantages of being smaller and generating less heat in use than vacuum tubes.

Soon after, in the 1970s, many transistors were combined into ever smaller circuits, giving rise to the microprocessor. The first microprocessors to appear were the Intel 4004 from Intel, the TMS 1000 from Texas Instruments, and the Central Air Data Computer from Garrett AiResearch.

In 1994, Peter Shor discovered a new algorithm that applies quantum mechanics, the theory describing the properties of things smaller than atoms, to a mathematical problem at the heart of today's computers. This discovery sparked the merging of physics and computer science into a new field known as "quantum computing".

Quantum Computers

A quantum computer is a processing device that uses quantum mechanical properties to operate on data. It relies on the magnetic property of an electron's rotation, which is not rotation in the ordinary sense, known as spin. An electron has spin +1/2 (spin up) or -1/2 (spin down), and switching back and forth between these two spins can be mapped onto the digital, or binary, system that underlies today's computers, with spin +1/2 standing for 1 and spin -1/2 standing for 0. This property led to the development of computer systems with a new kind of basic unit, called "quantum computers".

In today's computer systems the basic unit of data is the bit, and one bit shows only a single state in the binary system, either 1 or 0. The basic unit of data in a quantum computer is the qubit. Because a qubit follows the indeterminate behavior of electron spin, a single qubit can be in both states, 1 and 0, at the same time, a condition physicists call superposition. As a result, n qubits can represent 2ⁿ states at once, whereas n ordinary bits hold only one of those states at a time (n is the number of bits). A measurement still returns only one of the states, so how much faster a quantum computer is depends on the algorithm. For example, suppose a computer with two bits and a quantum computer with 2 qubits process the same problem. The quantum computer can operate on all four candidates (“0,0” “0,1” “1,0” “1,1”) in a single processing step, while an ordinary computer needs four steps to cover all four.
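The two-qubit example can be illustrated with a small classical simulation. This is only a sketch: the function name `apply_hadamard` and the list-based state vector are assumptions made for illustration, and simulating qubits on an ordinary computer does not give any quantum speedup. It shows that putting each of 2 qubits into superposition yields a state with equal weight on all four combinations.

```python
import math

def apply_hadamard(state, qubit, n):
    """Apply a Hadamard gate to one qubit of an n-qubit state vector."""
    h = 1 / math.sqrt(2)
    new_state = [0.0] * len(state)
    mask = 1 << (n - 1 - qubit)  # bit of the basis-state index owned by this qubit
    for index, amplitude in enumerate(state):
        if amplitude == 0:
            continue
        zero_index = index & ~mask  # same basis state with this qubit set to 0
        one_index = index | mask    # same basis state with this qubit set to 1
        sign = -1 if index & mask else 1  # |1> maps to (|0> - |1>) / sqrt(2)
        new_state[zero_index] += h * amplitude
        new_state[one_index] += sign * h * amplitude
    return new_state

# Start in "0,0" and put both qubits into superposition.
state = [1.0, 0.0, 0.0, 0.0]
for qubit in range(2):
    state = apply_hadamard(state, qubit, 2)

# Probability of reading each two-bit answer when the qubits are measured.
probabilities = {format(i, "02b"): abs(a) ** 2 for i, a in enumerate(state)}
for label, p in probabilities.items():
    print(label, round(p, 3))
```

Each of the four answers carries the same probability, which is the sense in which the 2-qubit register holds all four candidates at once.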

In addition, many qubits can be linked together so that the state of each qubit affects, and is correlated with, the state of the others. This phenomenon is called entanglement. For example, if one qubit is spin up, the other qubit will be spin down.
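The spin up / spin down example can be sketched as a sampling experiment. The state below, an equal mix of "01" and "10", is one assumed example of an entangled pair, and the helper `measure` is a hypothetical name; the code only mimics the measurement statistics classically.

```python
import math
import random

# Amplitudes over two-qubit outcomes (1 = spin up, 0 = spin down).
# Only "01" and "10" are present, so the two spins are always opposite.
amp = 1 / math.sqrt(2)
entangled_state = {"01": amp, "10": amp}

def measure(state, rng):
    """Pick one outcome with probability equal to the squared amplitude."""
    labels = list(state)
    weights = [abs(state[label]) ** 2 for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
outcomes = [measure(entangled_state, rng) for _ in range(1000)]

# Whatever the first qubit shows, the second always shows the opposite.
always_opposite = all(o[0] != o[1] for o in outcomes)
print(always_opposite)
```

Neither qubit has a fixed value before measurement, yet the pair of results is never "00" or "11".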

Benefits of Quantum Computers

1. Quantum Dense Coding / Teleportation

Data transmission time can be reduced. Because 1 qubit can carry both states, 2 bits of data can be condensed into a single transmitted qubit (in dense coding this relies on a pair of entangled qubits shared in advance). This makes data transfer faster and raises the capability of the computer.

2. Quantum Cryptography

In terms of data transmission security, a quantum computer can attack the public-key encryption that protects e-mail, passwords, bank account details, and other data, causing these security measures to fail. However, the same principles can be used to create encryption keys that quantum computing cannot break. This new technology is called quantum cryptography. In principle it can detect whenever a third party interferes with or intercepts the transmission, which protects the data against theft.
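How interception becomes detectable can be sketched with a simplified classical simulation. The post does not name a protocol; the sketch below assumes BB84-style key exchange with an "intercept and resend" eavesdropper, and the function name `bb84_error_rate` and its parameters are hypothetical. The idea is that measuring a qubit in the wrong basis disturbs it, so an eavesdropper leaves errors in the bits the sender and receiver compare.

```python
import random

def bb84_error_rate(n_bits, eavesdrop, seed=0):
    """Fraction of mismatched bits among positions where both bases agree."""
    rng = random.Random(seed)
    kept = 0
    errors = 0
    for _ in range(n_bits):
        bit = rng.randint(0, 1)
        sender_basis = rng.choice("+x")
        photon_basis, photon_bit = sender_basis, bit
        if eavesdrop:
            # The eavesdropper guesses a basis; a wrong guess gives a random bit.
            spy_basis = rng.choice("+x")
            spy_bit = photon_bit if spy_basis == photon_basis else rng.randint(0, 1)
            photon_basis, photon_bit = spy_basis, spy_bit  # resent in the spy's basis
        receiver_basis = rng.choice("+x")
        if receiver_basis == photon_basis:
            received = photon_bit
        else:
            received = rng.randint(0, 1)
        if receiver_basis == sender_basis:  # only matching-basis positions are kept
            kept += 1
            errors += received != bit
    return errors / kept if kept else 0.0

clean_rate = bb84_error_rate(4000, eavesdrop=False)
tapped_rate = bb84_error_rate(4000, eavesdrop=True)
print(clean_rate, round(tapped_rate, 2))
```

Without an eavesdropper the compared bits agree exactly; with one, roughly a quarter of them disagree in this idealized model, which is the signal that the line was tapped. Real channels also have noise, so practical systems compare the error rate against a threshold.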

Example Uses

Because quantum computer technology is still in the research and development stage, it is not yet widely used. Development has so far been aimed at research within organizations or at use by large companies specifically, since the technology still requires high investment. One example of use that has been made public:

D-Wave has raised more than 44 million US dollars from business partners over the past 5 years, with Rose leading development of a chip with as many as 128 qubits. The Google image recognition team demonstrated the chip's search capability, distinguishing objects across hundreds of thousands of photographs in just a few seconds.

Quantum Computers in the Future
With quantum computer technology, scientists predict that within the next 20 years we will be able to control things around us directly with our brains by wearing a headband that exchanges data with the brain. Scientists further imagine that chips will operate inside homes, furniture, cars, and public utilities, making life more convenient. It will also become easy to communicate using voice alone, and keyboards may disappear.

นาย ศุลี พิเชฐสกุล 5202113014
นางสาว ศิโสภา อุทิศสัมพันธ์กุล 5202115001

Web 2.0 Slide

http://www.4shared.com/file/L9riwzko/AI613_lastlecture__1_.html

Presentation: TEXT MINING

Text Mining

Text Mining Definition

Text mining is a process applied to text (usually in large volumes) to discover patterns, trends, and relationships hidden in that body of text. It draws on statistics, pattern recognition, machine learning, mathematics, document processing, text processing, and natural language processing, and it uses information extraction carried out automatically by computer programs. The results of the analysis are presented as new knowledge and can also show relationships within the new information. In other words, text mining discovers information that was not previously known or recorded. This differs from searching, where the searcher looks for something already known and already written down or recorded.

The Purpose of Text Mining

The purpose of Text Mining is to process unstructured (textual) information, extract meaningful numeric indices from the text, and, thus, make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms. Information can be extracted to derive summaries for the words contained in the documents or to compute summaries for the documents based on the words contained in them.
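As a minimal sketch of what "extracting numeric indices from text" can look like, the code below builds a document-term count matrix. The three sample documents and the `tokenize` helper are assumptions made for illustration; real pipelines usually add steps such as stop-word removal and weighting.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical sample documents.
documents = [
    "The battery life is great",
    "The battery died after a week",
    "Great screen, great price",
]

# One column per distinct word, one row per document.
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})
matrix = []
for doc in documents:
    counts = Counter(tokenize(doc))
    matrix.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in matrix:
    print(row)
```

Each row is now a numeric summary of one document, and each column a summary of one word across documents, which is the form most data mining algorithms expect.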

Knowledge from Text Mining

- - Document Summarization

Reduces the complexity and size of a text document without losing its meaning or essential content. A clear example is Google: when you search, Google shows a portion of the content of each result so that you get an overview of the website before clicking through.

- - Document Classification

A technique for sorting documents into categories. We must know in advance how many categories (classes) the documents are to be sorted into, so using this technique requires first training the system (train model) to recognize the pattern of documents in each class. For example, when signing up with a free e-mail provider there is a terms-of-service window; if you read all of the terms, you will find that one of them grants the provider permission to read the content of your mail. One reason is to filter spam out of normal e-mail. Another use of document classification is to categorize the posts on a social network in order to analyze them or track trends on various topics.
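The train-then-classify idea can be sketched with a small naive Bayes spam filter. Naive Bayes is one common choice assumed here for illustration, not necessarily what any e-mail provider uses, and the four training messages and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labelled_docs):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen class/word pairs from zeroing the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
model = train(training_data)
print(classify("free prize money", model))
print(classify("project meeting on monday", model))
```

The classes ("spam" and "ham") are fixed before training, which is the point made above: classification needs the categories and labelled examples up front.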

- - Document Clustering

Divides text documents into groups by measuring the similarities and differences in the documents' features. It can be applied in search engines to organize a large amount of data into smaller groups, or categories. When a user enters a keyword or search term, the search engine searches the target category first, which reduces search time compared with searching the entire database.
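Grouping by similarity can be sketched as follows. The sketch assumes cosine similarity over word counts and a simple greedy grouping with a hypothetical threshold of 0.3; production systems typically use algorithms such as k-means on weighted vectors.

```python
import math
import re
from collections import Counter

def vectorize(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(documents, threshold=0.3):
    """Put each document in the first group whose first member is similar enough."""
    vectors = [vectorize(doc) for doc in documents]
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # nothing similar yet, so start a new group
    return groups

# Hypothetical sample documents.
documents = [
    "cheap flights to bangkok",
    "stock market closes higher",
    "cheap hotel and flights in bangkok",
    "stock market falls on bank news",
]
groups = cluster(documents)
print(groups)
```

Unlike classification, no categories are supplied in advance; the travel and finance groups emerge from the word overlap alone.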

The Text Mining Process

1. Understand the problem

2. Understand the data

3. Prepare the data (Text Corpus: Training set, Test set)

4. Build a model from an algorithm

5. Evaluate

6. Deploy
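Steps 3 to 5 above can be sketched end to end. Everything in this sketch is an assumption for illustration: the eight labelled reviews, the rule of holding out every fourth document as the test set, and the simple word-voting model.

```python
from collections import Counter, defaultdict

# Step 3: a small labelled corpus (hypothetical data).
corpus = [
    ("great product works well", "positive"),
    ("terrible support never again", "negative"),
    ("broken on arrival terrible", "negative"),
    ("works great very happy", "positive"),
    ("happy with the product", "positive"),
    ("never buying again broken", "negative"),
    ("very happy great value", "positive"),
    ("terrible quality broken", "negative"),
]

# Step 3 (continued): hold out every fourth document as the test set.
test_set = corpus[3::4]
training_set = [doc for i, doc in enumerate(corpus) if i % 4 != 3]

# Step 4: build a model. Each word remembers the labels it appeared with.
votes = defaultdict(Counter)
for text, label in training_set:
    for word in text.split():
        votes[word][label] += 1

def predict(text):
    """Sum the label votes of the known words and return the winner."""
    tally = Counter()
    for word in text.split():
        tally.update(votes.get(word, {}))
    return tally.most_common(1)[0][0] if tally else "unknown"

# Step 5: evaluate on documents the model has not seen.
correct = sum(predict(text) == label for text, label in test_set)
accuracy = correct / len(test_set)
print(len(training_set), len(test_set), accuracy)
```

Keeping the test set out of training is what makes the evaluation in step 5 meaningful; a corpus this small only demonstrates the mechanics.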

Applications for Text Mining

- - Analyzing open-ended survey responses.

In survey research (e.g., marketing), it is not uncommon to include various open-ended questions pertaining to the topic under investigation. The idea is to permit respondents to express their "views" or opinions without constraining them to particular dimensions or a particular response format. This may yield insights into customers' views and opinions that might otherwise not be discovered when relying solely on structured questionnaires designed by "experts." For example, you may discover a certain set of words or terms that are commonly used by respondents to describe the pros and cons of a product or service (under investigation), suggesting common misconceptions or confusion regarding the items in the study.

- - Automatic processing of messages, emails, etc.

Another common application for text mining is to aid in the automatic classification of texts. For example, it is possible to "filter" out automatically most undesirable "junk email" based on certain terms or words that are not likely to appear in legitimate messages, but instead identify undesirable electronic mail. In this manner, such messages can automatically be discarded. Such automatic systems for classifying electronic messages can also be useful in applications where messages need to be routed (automatically) to the most appropriate department or agency; e.g., email messages with complaints or petitions to a municipal authority are automatically routed to the appropriate departments; at the same time, the emails are screened for inappropriate or obscene messages, which are automatically returned to the sender with a request to remove the offending words or content.

- - Analyzing warranty or insurance claims, diagnostic interviews, etc.

In some business domains, the majority of information is collected in open-ended, textual form. For example, warranty claims or initial medical (patient) interviews can be summarized in brief narratives, or when you take your automobile to a service station for repairs, typically, the attendant will write some notes about the problems that you report and what you believe needs to be fixed. Increasingly, those notes are collected electronically, so those types of narratives are readily available for input into text mining algorithms. This information can then be usefully exploited to, for example, identify common clusters of problems and complaints on certain automobiles, etc. Likewise, in the medical field, open-ended descriptions by patients of their own symptoms might yield useful clues for the actual medical diagnosis.

- - Investigating competitors by crawling their web sites.

Another type of potentially very useful application is to automatically process the contents of Web pages in a particular domain. For example, you could go to a Web page, and begin "crawling" the links you find there to process all Web pages that are referenced. In this manner, you could automatically derive a list of terms and documents available at that site, and hence quickly determine the most important terms and features that are described. It is easy to see how these capabilities could efficiently deliver valuable business intelligence about the activities of competitors.
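The crawl-and-count idea can be sketched without touching the network. The three-page `SITE` dictionary stands in for a competitor's web site and is entirely hypothetical, as are the names `PageParser` and `crawl`; a real crawler would fetch pages over HTTP and should respect the site's robots.txt.

```python
import re
from collections import Counter, deque
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collect the links and the words found on one page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z]+", data.lower()))

# Hypothetical in-memory web site: path -> HTML.
SITE = {
    "/": '<h1>Acme Widgets</h1><a href="/products">Products</a> <a href="/news">News</a>',
    "/products": '<p>Solar widgets and wind widgets</p><a href="/">Home</a>',
    "/news": '<p>Acme opens new solar factory</p><a href="/products">Products</a>',
}

def crawl(start, fetch):
    """Breadth-first crawl from start; fetch(url) returns HTML or None."""
    seen, queue, terms = {start}, deque([start]), Counter()
    while queue:
        html = fetch(queue.popleft())
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        terms.update(parser.words)
        for link in parser.links:
            if link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)
    return seen, terms

pages, term_counts = crawl("/", SITE.get)
print(sorted(pages))
print(term_counts.most_common(3))
```

The resulting term counts are the "list of terms and documents available at that site" in miniature.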

Areas where text mining has been used

- - Security applications

Many text mining software packages are marketed towards security applications, particularly analysis of plain text sources such as Internet news. Text mining is also involved in the study of text encryption.

- - Biomedical applications

A range of text mining applications in the biomedical literature has been described. One example is PubGene that combines biomedical text mining with network visualization as an Internet service. Another text mining example is GoPubMed.org. Semantic similarity has also been used by text-mining systems, namely, GOAnnotator.

- - Software and applications

Research and development departments of major companies, including IBM and Microsoft, are researching text mining techniques and developing programs to further automate the mining and analysis processes. Text mining software is also being researched by different companies working in the area of search and indexing in general as a way to improve their results.

- - Online Media applications

Text mining is being used by large media companies, such as the Tribune Company, to disambiguate information and to provide readers with greater search experiences, which in turn increases site "stickiness" and revenue. Additionally, on the back end, editors are benefiting by being able to share, associate and package news across properties, significantly increasing opportunities to monetize content.

- - Marketing applications

Text mining is starting to be used in marketing as well, more specifically in analytical Customer relationship management. Coussement and Van den Poel (2008) apply it to improve predictive analytics models for customer churn (customer attrition).

- - Sentiment analysis

Sentiment analysis may involve analysis of movie reviews for estimating how favorable a review is for a movie. Such an analysis may require a labeled data set or labeling of the affectivity of words. A resource for affectivity of words has been made for WordNet.
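Scoring a review from the affectivity of its words can be sketched as below. The tiny `AFFECT` lexicon is made up for illustration and is not taken from WordNet or any published resource; the function name `favorability` is likewise hypothetical.

```python
import re

# Hypothetical word-affectivity lexicon: +1 favorable, -1 unfavorable.
AFFECT = {
    "great": 1, "wonderful": 1, "moving": 1,
    "boring": -1, "awful": -1, "dull": -1,
}

def favorability(review):
    """Average affect of the known words; 0.0 if none are known."""
    words = re.findall(r"[a-z]+", review.lower())
    scores = [AFFECT[w] for w in words if w in AFFECT]
    return sum(scores) / len(scores) if scores else 0.0

print(favorability("A wonderful, moving film with a great cast"))
print(favorability("Boring plot and awful acting"))
print(favorability("Great cast but a dull script"))
```

A lexicon approach needs no labelled reviews, but it misses negation and sarcasm, which is one reason a labelled data set is often used instead.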

- - Academic applications

The issue of text mining is of importance to publishers who hold large databases of information requiring indexing for retrieval. This is particularly true in scientific disciplines, in which highly specific information is often contained within written text. Therefore, initiatives have been taken such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access.

Source:

http://www.statsoft.com/textbook/text-mining/
http://en.wikipedia.org/wiki/Text_mining
http://people.ischool.berkeley.edu/~hearst/text-mining.html
http://th.wikipedia.org/wiki/%E0%B8%81%E0%B8%B2%E0%B8%A3%E0%B8%97%E0%B8%B3%E0%B9%80%E0%B8%AB%E0%B8%A1%E0%B8%B7%E0%B8%AD%E0%B8%87%E0%B8%82%E0%B9%89%E0%B8%AD%E0%B8%84%E0%B8%A7%E0%B8%B2%E0%B8%A1
http://www.stks.or.th/blog/?p=125
Presentation File:
https://cid-5b9b03b3908a57c4.office.live.com/view.aspx/Presentation%5E_textmining.pptx?Bsrc=Docmail&Bpub=SDX.Docs&wa=wsignin1.0

พรพิตรา สิทธิประศาสน์ 5302115224

รัฐวิชญ์ (สรุจ) รัตนสิมานนท์ 5202112701